keywords:"language corpus" - Search Results - Digital Repository


	Discourse Relations in Czech
	Poláková, Lucie ; Hajičová, Eva (advisor) ; Hoffmannová, Jana (referee) ; Pešek, Ondřej (referee)
	This doctoral thesis is devoted to the linguistic analysis of discourse relations as one of the aspects of discourse coherence. Discourse relations are semantic relations holding between propositions in a discourse (discourse arguments). The aim of the thesis is a complex description of discourse relations in Czech and its application in an annotation scheme in the Prague Dependency Treebank. The thesis is divided into three parts. The first focuses on the theoretical description of discourse relations and on an analysis of the adequacy of various methodological concepts in corpus processing. The second part describes in detail the proposed scheme for the annotation of discourse relations and the process of building the corpus, including an evaluation of the consistency of the annotated data. Finally, the last part of the thesis addresses some problematic issues that arose in applying the proposed scheme and looks for possible solutions.
	Usage of Verbal Adjectives in Contemporary Czech
	Fílová, Hana ; Lehečková, Eva (advisor) ; Adam, Robert (referee)
	The aim of this bachelor thesis is to describe and summarize the most characteristic features of verbal adjectives. It is divided into two parts. The first part contains a theoretical description of verbal adjectives, based on information from Czech grammar books, handbooks and research articles. It describes the function, form, classification and use of verbal adjectives in contemporary Czech, and it also identifies some controversial points or inaccuracies in these descriptions. The second part of the thesis is dedicated to research on selected features. It focuses on the following types of verbal adjectives: -cí, -vší, -ný, -tý, as used in professional and journalistic style. The controversial points or inaccuracies identified in the first part are examined using material from the Czech National Corpus. The research focuses on the following features of verbal adjectives: the number and nature of the preserved syntactic arguments, the word-order position of the adjectives and their dominating noun, and the syntactic function of the adjectives.
	Large Multilingual Corpus (Velký mnohojazyčný korpus)
	Majliš, Martin ; Žabokrtský, Zdeněk (advisor) ; Spousta, Miroslav (referee)
	This thesis introduces the W2C Corpus, which contains 97 languages with more than 10 million words for each of them, for a total size of 10.5 billion words. The corpus was built by crawling the Internet. This work describes the methods and tools used for its construction. The complete process consisted of building an initial corpus from Wikipedia, developing a language recognizer for 122 languages, implementing a distributed system for crawling and parsing web pages and, finally, removing duplicates. A comparative analysis of the texts of Wikipedia and the Internet is provided at the end of the thesis. The analysis is based on basic statistics such as average word and sentence length, conditional entropy and perplexity.
	Non-gradable Adjectives in Latin
	Hrach, Petr ; Pultrová, Lucie (advisor) ; Muchnová, Dagmar (referee)
	Language functions as an object of quantitative linguistics
	Králík, Jan
	Three examples from the Czech corpus.
