National Repository of Grey Literature: 64 records found (showing records 11 - 20)
A school analysis as a possible source of treebanks (?)
Konárová, Marie ; Vidová Hladká, Barbora (advisor) ; Zeman, Daniel (referee)
The aim of this thesis is to explore whether data from school sentence analyses can be used for tagging words in language corpora. To test this hypothesis, a set of sentences was selected from a common Czech language textbook. Students of selected primary and secondary schools were asked to perform a syntactic analysis of these sentences. The data were collected using Capek, a prototype sentence analysis editor. The editor is still being developed, in part based on feedback from the students and teachers who used it during data collection. Several transformation rules were developed for converting data from the school sentence analyses into the data structures used within the Prague Dependency corpus. The accuracy of the conversion using the proposed rules was tested together with the accuracy of the students' results.
Automatic Resolution of Pronoun Coreference in Czech
Košarko, Ondřej ; Mírovský, Jiří (advisor) ; Vidová Hladká, Barbora (referee)
Title: Automatic Resolution of Pronoun Coreference in Czech Author: Ondřej Košarko Department: ÚFAL MFF UK Supervisor: RNDr. Jiří Mírovský, Ph.D. Supervisor's e-mail address: mirovsky@ufal.mff.cuni.cz Abstract: The aim of this thesis is to introduce a procedure for automatic pronominal coreference resolution in Czech texts. The text is morphologically and analytically annotated according to the system of the Prague Dependency Treebank. The procedure uses a machine learning method, trained on a set of manually annotated data from the Prague Dependency Treebank. Evaluation of the results is also part of this thesis. Keywords: pronominal coreference, automatic resolution, machine learning
Natural Language Interface for online webcasts
Macošek, Jan ; Hajič, Jan (advisor) ; Vidová Hladká, Barbora (referee)
This text describes the development of a natural language interface for online webcasts. The webcasts are converted from text to speech and then played by the electronic rabbit Nabaztag. The user controls the device with voice commands, so the text also covers training acoustic models with the HTK Toolkit and using these models to recognize speech with the Julius speech recognizer. Searching for the webcasts and processing them is also described, along with some problems that occurred during speech synthesis of sports-oriented texts.
Automatic Identification of Semantic Preferences for Verb Valency Complementations
Vandas, Karel ; Lopatková, Markéta (advisor) ; Vidová Hladká, Barbora (referee)
Verb valency plays an important role in describing the behaviour of verbs and connects the surface realisation of language with its semantics. A verb itself usually encodes several readings, and its complementations help to identify the correct one. So far, valency complementations of verbs have mostly been studied from the morphological and syntactic points of view. The purpose of this thesis is to examine the possibilities of automatically identifying semantic preferences for valency complementations of verbs. The thesis discusses the performance of a system with different levels of available verb valency information in connection with cluster analysis. It also contains an evaluation section that compares the available methods.
Semantic disambiguation using Distributional Semantics
Prodanovic, Srdjan ; Hana, Jiří (advisor) ; Vidová Hladká, Barbora (referee)
In statistical models of semantics, word meanings are determined solely on the basis of their distributional properties. The basic resource here is a single lexicon that can be used for various tasks, in which word meanings are represented as vectors in a vector space and word similarities as distances between their vector representations. Using these similarities, the suitability of terms in a given context can be computed and used for a range of tasks, one of which is Word Sense Disambiguation. In this thesis, several different approaches to vector space models were investigated and implemented so that their performance could be cross-evaluated on the Word Sense Disambiguation task of the Prague Dependency Treebank.
Quantifying Determiners from the Distributional Semantics View
Gutiérrez Vasques, María Ximena ; Lopatková, Markéta (advisor) ; Vidová Hladká, Barbora (referee)
Title: Quantifying Determiners from the Distributional Semantics View Author: Maria Ximena Gutierrez Vasques Department: Institute of Formal and Applied Linguistics Supervisor: doc. RNDr. Markéta Lopatková, Ph.D. Abstract: Distributional semantics is a modern approach to capturing the semantics of natural language. One topic that has not yet received sufficient attention within this approach is the automatic detection of logical relations such as entailment. This thesis builds on the work of Baroni, Bernardi, Do and Shan (2012), who study the entailment relation between quantifying expressions. The cited work detects entailment using SVM classifiers trained on semantic vectors representing the entailment relation. The experiments described there did not focus on tuning the parameters of the SVM classifier; in this thesis we therefore return to the original experiments on entailment between quantified noun phrases, propose new classifier configurations, and optimize the parameter settings. We compare the achieved prediction accuracy with the original results and show that an SVM classifier with a quadratic polynomial kernel achieves better results....
Functional Arabic Morphology: Formal System and Implementation
Smrž, Otakar ; Vidová Hladká, Barbora (advisor) ; Hajič, Jan (referee) ; Habash, Nizar Y. (referee)
Functional Arabic Morphology is a formulation of the Arabic inflectional system seeking the working interface between morphology and syntax. ElixirFM is its high-level implementation that reuses and extends the Functional Morphology library for Haskell. Inflection and derivation are modeled in terms of paradigms, grammatical categories, lexemes and word classes. The computation of analysis or generation is conceptually distinguished from the general-purpose linguistic model. The lexicon of ElixirFM is designed with respect to abstraction, yet is no more complicated than printed dictionaries. It is derived from the open-source Buckwalter lexicon and is enhanced with information drawn from the syntactic annotations of the Prague Arabic Dependency Treebank. MorphoTrees is the idea of building effective and intuitive hierarchies over the information provided by computational morphological systems. MorphoTrees are implemented for Arabic as an extension to the Perl-based TrEd annotation environment. The Encode Arabic libraries for Haskell and Perl process the non-trivial and multi-purpose ArabTeX notation, which encodes Arabic orthographies and phonetic transcriptions in parallel.
Assessing the impact of manual corrections in the Groningen Meaning Bank
Weck, Benno ; Lopatková, Markéta (advisor) ; Vidová Hladká, Barbora (referee)
The Groningen Meaning Bank (GMB) project develops a corpus with rich syntactic and semantic annotations. Annotations in GMB are generated semi-automatically and stem from two sources: (i) initial annotations from a set of standard NLP tools, and (ii) corrections and refinements by human annotators. For example, on the part-of-speech level of annotation there are currently 18,000 such corrections, so-called Bits of Wisdom (BOWs). To apply this information to improve the NLP processing, we experimented with using the BOWs to retrain the part-of-speech tagger and found that it can be improved to correct up to 70% of identified errors within held-out data. Moreover, an improved tagger helps to raise the performance of the parser. Preferring sentences with a high rate of verified tags in retraining has proven to be the most reliable approach. With a simulated active learning experiment using Query-by-Uncertainty (QBU) and Query-by-Committee (QBC), we proved that selectively sampling sentences for retraining yields better results with less data than random selection. In an additional pilot study we found that a standard maximum-entropy part-of-speech tagger can be augmented so that it uses already known tags to enhance its tagging decisions on an entire sequence without retraining a new model first.
Tracking the Activation of Objects in Texts
Václ, Jan ; Vidová Hladká, Barbora (advisor) ; Novák, Michal (referee)
The notion of salience in discourse analysis models how the activation of referred objects evolves in the flow of text. The salience algorithm was already defined and briefly tested in earlier research; we present a reproduction of its results on a larger scale using data from the Prague Discourse Treebank 1.0. The results are then collected into an accessible form and analyzed both visually and quantitatively in the context of the two main sources of salience - coreference relations and topic-focus articulation. Finally, attempts are made to use the salience information in the machine learning NLP tasks of document clustering and topic modeling.
Detection of Suspicious Annotations
Václ, Jan ; Vidová Hladká, Barbora (advisor) ; Hana, Jiří (referee)
This work describes a machine learning approach to checking part-of-speech annotation and presents its implementation - a system called MissTagger. The checking procedure covers both error detection and error correction. MissTagger employs a simplified instance-based learning algorithm in which the words in the text are treated as instances. The part-of-speech tags of a fixed-length context are selected as features; no lexical information is included. The words whose tags make up this context are chosen based on either the linear or the dependency-tree structure of the sentence. Two languages, Czech and English, are examined in the evaluation experiments.
