National Repository of Grey Literature: 70 records found, showing records 11 - 20
Deep Automatic Analysis of English (Hloubková automatická analýza angličtiny)
Dušek, Ondřej ; Hajič, Jan (advisor) ; Vidová Hladká, Barbora (referee)
This thesis gives an account of our studies of deep, or semantic, analysis of English, particularly as expressed through predicate-argument structure. Our main goal is to create a system for automatic inference of semantic relations between predicates and arguments, a task known as semantic role labeling. We developed a framework for parallel processing of our experiments, integrating third-party machine learning tools and implementing well-known as well as novel procedures. We investigated the current approaches to the problem and proposed several improvements, such as new classification features, separate handling of adverbial modifiers, and special treatment of rare predicates. Based on our research, we designed and implemented our own semantic analysis system, consisting of predicate disambiguation and argument classification subtasks. We evaluated our solution using the CoNLL 2009 Shared Task English corpus.
Matching Images to Texts
Hajič, Jan ; Pecina, Pavel (advisor) ; Průša, Daniel (referee)
We build a joint multimodal model of text and images for automatically assigning illustrative images to journalistic articles. We approach the task as an unsupervised representation learning problem of finding a common representation that abstracts from individual modalities, inspired by the multimodal Deep Boltzmann Machine of Srivastava and Salakhutdinov. We use state-of-the-art image content classification features obtained from the Convolutional Neural Network of Krizhevsky et al. as input "images" and entire documents instead of keywords as input texts. A deep learning and experiment management library, Safire, has been developed. We have not been able to create a successful retrieval system because of difficulties with training neural networks on very sparse word observations. However, we have gained a substantial understanding of the nature of these difficulties and are therefore confident that we will be able to improve in future work.
Verb Valency Frames Disambiguation
Semecký, Jiří ; Hajič, Jan (advisor) ; Krbec, Pavel (referee) ; Lopatková, Markéta (referee)
Semantic analysis has become a bottleneck for many natural language applications. Machine translation, automatic question answering, dialog management, and others rely on high-quality semantic analysis. Verbs are central elements of clauses and strongly influence the realization of whole sentences. Therefore, the semantic analysis of verbs plays a key role in the analysis of natural language. We believe that solid disambiguation of verb senses can boost the performance of many real-life applications. In this thesis, we investigate the potential of statistical disambiguation of verb senses. Each verb occurrence can be described by diverse types of information. We investigate which information is worth considering when determining the sense of a verb. Different types of classification methods are tested on this task. In particular, we compared the Naive Bayes classifier, decision trees, a rule-based method, maximum entropy, and support vector machines. The proposed methods are thoroughly evaluated on two different Czech corpora, VALEVAL and the Prague Dependency Treebank. A significant improvement over the baseline is observed.
Named Entities and Ontologies Using Deep Learning Methods (Pojmenované entity a ontologie metodami hlubokého učení)
Rafaj, Filip ; Hajič, Jan (advisor) ; Žabokrtský, Zdeněk (referee)
In this master's thesis we describe a method for linking named entities in a given text to a knowledge base, a task known as Named Entity Linking. Using a deep neural architecture together with BERT contextualized word embeddings, we created a semi-supervised model that jointly performs Named Entity Recognition and Named Entity Disambiguation. The model outputs a Wikipedia ID for each entity detected in an input text. To compute contextualized word embeddings we used pre-trained BERT without making any changes to it (no fine-tuning). We experimented with components of our model and various versions of BERT embeddings. Moreover, we tested several different ways of using the contextual embeddings. Our model is evaluated using standard metrics and surpasses the scores of models that defined the state of the art before pre-trained contextualized models became widespread. The scores of our model are comparable to those of current state-of-the-art models.
Netgraph - A Tool for Searching in the Prague Dependency Treebank 2.0
Mírovský, Jiří ; Hajič, Jan (advisor) ; Rosen, Alexandr (referee) ; Ondruška, Roman (referee)
This thesis connects three things that already existed. First, there was the Prague Dependency Treebank 2.0, one of the most advanced treebanks in the linguistic world. Second, there was a very limited but extremely intuitive search tool, Netgraph 1.0. Third, there were users longing for a simple and intuitive tool that would be powerful enough to search the Prague Dependency Treebank. In the thesis, we study the annotation of the Prague Dependency Treebank 2.0, especially on the tectogrammatical layer, which is by far the most complex layer of the treebank, and assemble a list of requirements for a query language that would allow searching for and studying all linguistic phenomena annotated in the treebank. We propose an extension to the query language of the existing search tool Netgraph 1.0 and show that the extended query language satisfies the list of requirements. We also show how all principal linguistic phenomena annotated in the treebank can be searched for with the query language. The proposed query language has also been implemented; we present the search tool and describe its data format. An attached CD-ROM contains the installation of the tool.
Multilingual Collocation Database (Vícejazyčná databáze kolokací)
Helcl, Jindřich ; Hajič, Jan (advisor) ; Mareček, David (referee)
Collocations are groups of words that co-occur more often than would be expected from how often they appear separately. They also include phrases that give a new meaning to a group of otherwise unrelated words. This thesis aims to find collocations in large data and to create a database that allows their retrieval. Pointwise Mutual Information (PMI), a value based on word frequency, is computed to find the collocations. Word groups with the highest PMI values are considered candidates for good collocations. Chosen collocations are stored in a database in a format that allows searching with Apache Lucene. Part of the thesis is to create a Web user interface as a quick and easy way to search collocations. If this service is fast enough and the collocations are good, translators will be able to use it to find proper equivalents in the target language. Students of a foreign language will also be able to use it to extend their vocabulary. Such a database will be created independently in several languages, including Czech and English.
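The PMI scoring mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation, which the abstract does not describe. It assumes that collocation candidates are adjacent word pairs (bigrams) and that rare pairs are filtered by a frequency threshold. The function names `bigram_pmi` and `top_collocations` and the parameter `min_count` are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Return {(w1, w2): PMI} for adjacent word pairs seen at least min_count times.

    Assumption: candidates are adjacent pairs; the thesis may use other units.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            # PMI is unreliable for rare pairs, so skip them (assumed heuristic).
            continue
        p_xy = count / n_bi          # estimated probability of the pair
        p_x = unigrams[w1] / n_uni   # estimated probability of the first word
        p_y = unigrams[w2] / n_uni   # estimated probability of the second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def top_collocations(tokens, k=5, min_count=2):
    """Return the k word pairs with the highest PMI."""
    scores = bigram_pmi(tokens, min_count)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Calling `top_collocations(text.split(), k=10)` on a tokenized corpus would return the ten highest-scoring pairs. A real system would also need proper tokenization and a much larger corpus for the estimates to be meaningful.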
Natural Language Interface for online webcasts
Macošek, Jan ; Hajič, Jan (advisor) ; Vidová Hladká, Barbora (referee)
This text describes the development of a natural language interface for online webcasts. These webcasts are transformed from text to speech and then played by the electronic rabbit Nabaztag. The user can control it by voice commands, so the text also focuses on training acoustic models with the HTK Toolkit and on using these models to recognize speech with the Julius speech recognizer. Searching for the webcasts and processing them is also described, along with some problems that occurred during speech synthesis of sport-oriented texts.
