National Repository of Grey Literature: 9 records found
Syntactically-based classification of Czech sentences
Kríž, Vincent ; Vidová Hladká, Barbora (advisor) ; Mírovský, Jiří (referee)
Classifying sentences as syntactically meaningful is a useful task for natural language processing applications such as machine translation, search engines and question answering systems. Theoretical linguistics views language as a system of layers. In our project, the term 'to-be-meaningful' will be specified with respect to this view; namely, the morphological and syntactic layers will be considered. A knowledge-based algorithm that classifies a string of Czech words as either meaningful or meaningless will be proposed and implemented. Before classification, strings will be pre-processed by external modules. Czech will be used as the object language.
Classifier for semantic patterns of English verbs
Kríž, Vincent ; Holub, Martin (advisor) ; Bojar, Ondřej (referee)
The goal of this diploma thesis is to design, implement and evaluate classifiers for the automatic classification of semantic patterns of English verbs according to a pattern lexicon that draws on Corpus Pattern Analysis. We use a pilot collection of 30 sample English verbs as training and test data sets. We employ standard machine learning methods: in our experiments we use decision trees, k-nearest neighbours (kNN), support vector machines (SVM) and AdaBoost. Among other things, we concentrate on feature design and selection, experimenting with both morpho-syntactic and semantic features. Our results show that morpho-syntactic features are the most important for statistically driven semantic disambiguation. Nevertheless, for some verbs semantic features play an important role.
Detecting semantic relations in texts and their integration with external data resources
Kríž, Vincent ; Vidová Hladká, Barbora (advisor)
We present a strategy to automate the extraction of semantic relations from texts. Both machine learning and rule-based techniques are investigated and the impact of different linguistic knowledge is analyzed for the various approaches. To implement the extraction system RExtractor, several natural language processing tools have been improved: from sentence splitting and tokenization modules to dependency syntax parsers. Furthermore, we created the Czech Legal Text Treebank with several layers of linguistic annotation, which is used to train and test each stage of the proposed system. As a result of the performed work, new Semantic Web resources and tools are available for automatic processing of texts.
Detecting semantic relations in texts and their integration with external data resources
Kríž, Vincent ; Vidová Hladká, Barbora (advisor) ; Harašta, Jakub (referee) ; Pecina, Pavel (referee)
We present a strategy to automate the extraction of semantic relations from texts. Both machine learning and rule-based techniques are investigated and the impact of different linguistic knowledge is analyzed for the various approaches. To implement the extraction system RExtractor, several natural language processing tools have been improved: from sentence splitting and tokenization modules to dependency syntax parsers. Furthermore, we created the Czech Legal Text Treebank with several layers of linguistic annotation, which is used to train and test each stage of the proposed system. As a result of the performed work, new Semantic Web resources and tools are available for automatic processing of texts.
Detecting semantic relations in texts and their integration with external data resources
Kríž, Vincent ; Vidová Hladká, Barbora (advisor)
We present a strategy to automate the extraction of semantic relations from texts. Both machine learning and rule-based techniques are investigated and the impact of different linguistic knowledge is analyzed for the various approaches. To implement the extraction system RExtractor, several natural language processing tools have been improved: from sentence splitting and tokenization modules to dependency syntax parsers. Furthermore, we created the Czech Legal Text Treebank with several layers of linguistic annotation, which is used to train and test each stage of the proposed system. As a result of the performed work, new Semantic Web resources and tools are available for automatic processing of texts.
Automatic concordance extraction from the Internet
Macháček, Dominik ; Kríž, Vincent (advisor) ; Vidová Hladká, Barbora (referee)
Concordances are sentences containing a given target word. They are valuable research objects in all fields of linguistics. A large number of concordances is needed for solving lexical disambiguation problems, yet language corpora cannot supply a sufficient number of concordances for some English verbs. In this thesis we describe the design and implementation of a console application for the automatic extraction of a given number of English concordances. The application takes as input a target word, a part of speech and a number of sentences. It then searches the Internet and extracts the desired number of English sentences containing the target word as the given part of speech. We also created a Python library that allows the application to be adapted to any other language, and published it on PyPI. The work also includes a web page that lets users try out the application through a web interface.
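As a rough illustration of what a concordance is, the sketch below filters sentences from a text by a target word. It is not the thesis application: the function name `find_concordances`, the naive regex sentence splitter and the sample text are all assumptions made for this example, and the part-of-speech check described in the abstract is deliberately left out.

```python
import re


def find_concordances(text, target, limit):
    """Return up to `limit` sentences that contain `target` as a whole word.

    Hypothetical helper for illustration only; no part-of-speech filtering.
    """
    # Naive sentence splitting on ., ! or ? followed by whitespace (an assumption).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Whole-word, case-insensitive match of the target word.
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    hits = [s for s in sentences if pattern.search(s)]
    return hits[:limit]


sample = "She runs daily. The run was long. They run fast. Running is fun."
print(find_concordances(sample, "run", 2))
```

In the sample, "run" occurs once as a noun and once as a verb, which is why a real extractor would also need a part-of-speech tagger to keep only the requested reading.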
Framework for information extraction from the large language data sets
Kuboň, David ; Kríž, Vincent (advisor) ; Bednárek, David (referee)
This thesis describes the FAFEFI program, which focuses on n-gram and skip-gram extraction from large data sets. The thesis presents two different approaches to passing input data to the program. It also describes the design of data structures for representing n-grams and skip-grams in memory, the extraction algorithm, memory-friendly options for saving extracted data, and their final composition into output feature vectors. The program also offers a variety of extra functions, such as a line filter and a line modifier, and many configurable parameters, ranging from in-file separators to the formatting of output file names. Moreover, the program can optionally save data immediately after extraction from the training set and provides tools for cluster parallelization.
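For readers unfamiliar with the terms, the following is a minimal sketch of n-gram and skip-gram extraction from a token list. It does not reproduce FAFEFI's data structures or algorithm; the function names are hypothetical, and the skip-gram definition used here (n tokens in their original order with at most k tokens skipped in total) is an assumption that may differ from the one in the thesis.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Contiguous n-grams: every run of n adjacent tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, k):
    """k-skip-n-grams: n tokens in order, skipping at most k tokens in total."""
    result = []
    for start in range(len(tokens)):
        # The remaining n-1 tokens are chosen from the next n-1+k positions,
        # so the whole gram spans at most n+k tokens.
        window = range(start + 1, min(start + n + k, len(tokens)))
        for rest in combinations(window, n - 1):
            result.append((tokens[start],) + tuple(tokens[i] for i in rest))
    return result


tokens = ["a", "b", "c", "d"]
print(ngrams(tokens, 2))
print(skipgrams(tokens, 2, 1))
```

With k set to 0 the skip-grams coincide with the ordinary n-grams, which is a convenient sanity check for the sketch.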
Classifier for semantic patterns of English verbs
Kríž, Vincent ; Holub, Martin (advisor) ; Bojar, Ondřej (referee)
The goal of this diploma thesis is to design, implement and evaluate classifiers for the automatic classification of semantic patterns of English verbs according to a pattern lexicon that draws on Corpus Pattern Analysis. We use a pilot collection of 30 sample English verbs as training and test data sets. We employ standard machine learning methods: in our experiments we use decision trees, k-nearest neighbours (kNN), support vector machines (SVM) and AdaBoost. Among other things, we concentrate on feature design and selection, experimenting with both morpho-syntactic and semantic features. Our results show that morpho-syntactic features are the most important for statistically driven semantic disambiguation. Nevertheless, for some verbs semantic features play an important role.
Syntactically-based classification of Czech sentences
Kríž, Vincent ; Mírovský, Jiří (referee) ; Vidová Hladká, Barbora (advisor)
Classifying sentences as syntactically meaningful is a useful task for natural language processing applications such as machine translation, search engines and question answering systems. Theoretical linguistics views language as a system of layers. In our project, the term 'to-be-meaningful' will be specified with respect to this view; namely, the morphological and syntactic layers will be considered. A knowledge-based algorithm that classifies a string of Czech words as either meaningful or meaningless will be proposed and implemented. Before classification, strings will be pre-processed by external modules. Czech will be used as the object language.
