National Repository of Grey Literature
Text summarization
Majliš, Martin ; Pecina, Pavel (advisor) ; Schlesinger, Pavel (referee)
The present work explains the basic principles of automatic summarization and evaluation, and the fundamental concepts used in this field. It also describes a system for automatic text summarization and evaluation, CSummaK (Czech Summarization Kit). The system includes basic algorithms for creating sentence-extract summaries (Centroid, Lead, Position, Random, Relevance Measure, etc.) and for their evaluation (Precision, Recall, F-Measure, etc.), all of which are described in this work. The system was used to produce automatic extracts from news articles. Another system was developed for obtaining reference extracts; it allows users to create extracts from news articles on-line. The work also evaluates the quality of the individual algorithms and their combinations under different parameters, together with a discussion of possible practical applications.
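The abstract names standard extraction-based metrics. As a rough illustration only (not the CSummaK implementation), sentence-level Precision, Recall, and F-Measure against a reference extract, plus the Lead baseline mentioned above, can be sketched in Python like this:

    def extract_scores(system_sents, reference_sents):
        # Sentence-level Precision/Recall/F-Measure for an extractive
        # summary, treating sentences as set members. Illustrative
        # sketch only; not the thesis's actual code.
        system = set(system_sents)
        reference = set(reference_sents)
        overlap = len(system & reference)
        precision = overlap / len(system) if system else 0.0
        recall = overlap / len(reference) if reference else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        return precision, recall, f_measure

    def lead_summary(sentences, k=3):
        # The Lead baseline simply takes the first k sentences.
        return sentences[:k]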
Large Multilingual Corpus
Majliš, Martin ; Žabokrtský, Zdeněk (advisor) ; Spousta, Miroslav (referee)
This thesis introduces the W2C Corpus, which covers 97 languages with more than 10 million words for each language and a total size of 10.5 billion words. The corpus was built by crawling the Internet. The work describes the methods and tools used for its construction. The complete process consisted of building an initial corpus from Wikipedia, developing a language recognizer for 122 languages, implementing a distributed system for crawling and parsing webpages, and finally removing duplicates. A comparative analysis of the texts of Wikipedia and the Internet is provided at the end of the thesis. The analysis is based on basic statistics such as average word and sentence length, conditional entropy, and perplexity.
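For the statistics named in the abstract, a minimal sketch follows, assuming whitespace tokenization, period-based sentence splitting, and character-bigram conditional entropy on non-empty text; the thesis's exact definitions and tooling may differ:

    import math
    from collections import Counter

    def corpus_stats(text):
        # Average word and sentence length under naive splitting.
        words = text.split()
        sentences = [s for s in text.split('.') if s.strip()]
        avg_word_len = sum(len(w) for w in words) / len(words)
        avg_sent_len = len(words) / len(sentences)

        # Conditional entropy H(c_i | c_{i-1}) estimated from
        # character bigram counts, and the derived perplexity 2^H.
        bigrams = Counter(zip(text, text[1:]))
        contexts = Counter(text[:-1])
        total = sum(bigrams.values())
        h = 0.0
        for (prev, cur), n in bigrams.items():
            p_xy = n / total            # joint probability p(prev, cur)
            p_cond = n / contexts[prev] # conditional p(cur | prev)
            h -= p_xy * math.log2(p_cond)
        return avg_word_len, avg_sent_len, h, 2 ** h

Lower conditional entropy (and perplexity) indicates more predictable text, which is the kind of contrast the thesis draws between Wikipedia and general web text.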