National Repository of Grey Literature: 8 records found
Discourse Relations in Czech
Poláková, Lucie ; Hajičová, Eva (advisor) ; Hoffmannová, Jana (referee) ; Pešek, Ondřej (referee)
This doctoral thesis is devoted to the linguistic analysis of discourse relations as one aspect of discourse coherence. Discourse relations are semantic relations holding between propositions in a discourse (discourse arguments). The aim of the thesis is a comprehensive description of discourse relations in Czech and its application in an annotation scheme in the Prague Dependency Treebank. The thesis is divided into three parts. The first part focuses on the theoretical description of discourse relations and analyzes the adequacy of various methodological concepts for corpus processing. The second part describes in detail the proposed scheme for annotating discourse relations and the process of building the corpus, including an evaluation of the consistency of the annotated data. Finally, the last part of the thesis addresses some problematic issues that arose in applying the proposed scheme and looks for possible solutions.
Usage of Verbal Adjectives in Contemporary Czech
Fílová, Hana ; Lehečková, Eva (advisor) ; Adam, Robert (referee)
The aim of this bachelor thesis is to describe and summarize the most characteristic features of verbal adjectives. It is divided into two parts. The first part provides a theoretical description of verbal adjectives based on Czech grammar books, handbooks and research articles. It covers the function, form, classification and use of verbal adjectives in contemporary Czech, and it also identifies some controversial points and inaccuracies in these descriptions. The second part of the thesis is devoted to research on selected features and focuses on the following types of verbal adjectives: -cí, -vší, -ný and -tý, as used in professional and journalistic styles. The controversial points and inaccuracies identified in the first part are examined on material from the Czech National Corpus. The research focuses on the following features of verbal adjectives: the number and nature of the preserved syntactic arguments, the word order of the adjectives and their governing nouns, and the syntactic function of the adjectives.
Large Multilingual Corpus (Velký mnohojazyčný korpus)
Majliš, Martin ; Žabokrtský, Zdeněk (advisor) ; Spousta, Miroslav (referee)
This thesis introduces the W2C Corpus, which covers 97 languages with more than 10 million words for each language and a total size of 10.5 billion words. The corpus was built by crawling the Internet, and this work describes the methods and tools used for its construction. The complete process consisted of building an initial corpus from Wikipedia, developing a language recognizer for 122 languages, implementing a distributed system for crawling and parsing web pages and, finally, removing duplicates. A comparative analysis of texts from Wikipedia and from the Internet is provided at the end of the thesis. The analysis is based on basic statistics such as average word and sentence length, conditional entropy and perplexity.
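The statistics named in the W2C abstract can be illustrated with a short Python sketch. It is not the thesis's code, and its definitions are assumptions: the helper `text_statistics` is hypothetical, tokenization is naive (sentences split on `.`, `!` and `?`, words on whitespace), conditional entropy is taken to mean the entropy of a word given the preceding word, H(W2|W1) = -sum p(w1, w2) * log2 p(w2 | w1), estimated from bigram counts, and perplexity is taken as 2^H.

```python
import math
import re
from collections import Counter


def text_statistics(text: str) -> dict:
    """Hypothetical helper computing corpus statistics of the kind named above.

    Assumptions (not taken from the thesis): sentences end with '.', '!' or '?',
    words are whitespace-separated tokens, and conditional entropy is
    H(next word | previous word) estimated from within-sentence bigram counts.
    """
    sentences = [s.split() for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [w for s in sentences for w in s]
    if not words:
        raise ValueError("text contains no words")

    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sent_len = len(words) / len(sentences)

    # Count bigrams (w1, w2) and how often each w1 occurs as a bigram history.
    bigrams = Counter()
    history = Counter()
    for s in sentences:
        for prev, nxt in zip(s, s[1:]):
            bigrams[(prev, nxt)] += 1
            history[prev] += 1

    total = sum(bigrams.values())
    # H(W2|W1) = -sum over bigrams of p(w1, w2) * log2 p(w2 | w1)
    entropy = 0.0
    for (w1, _w2), count in bigrams.items():
        p_joint = count / total
        p_cond = count / history[w1]
        entropy -= p_joint * math.log2(p_cond)

    return {
        "avg_word_length": avg_word_len,
        "avg_sentence_length": avg_sent_len,
        "conditional_entropy": entropy,
        # Perplexity assumed here to be 2 ** H (entropy measured in bits).
        "perplexity": 2 ** entropy,
    }


if __name__ == "__main__":
    print(text_statistics("a b. a c."))
```

For example, `text_statistics("a b. a c.")` reports an average sentence length of 2.0 words, a conditional entropy of 1.0 bit (after `a`, the next word is `b` or `c` with equal probability) and a perplexity of 2.0. Comparing such figures for Wikipedia text and general web text is the kind of contrast the abstract describes, though a real corpus would need proper tokenization and sentence segmentation.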