National Repository of Grey Literature: 32 records found (records 11 - 20 shown)
Information Extraction from Wikipedia
Valušek, Ondřej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis deals with the automatic extraction of the types of English Wikipedia articles and of their attributes. Several machine-learning approaches are presented. In addition, important attributes are extracted, such as date of birth from articles about people, area from articles about lakes, and many more. Using the system presented in this thesis, one can generate a well-structured knowledge base from a file of Wikipedia articles (a so-called dump file) and a small training set of a few correctly classified articles. Such a knowledge base can then be used for semantic enrichment of text. During this process, a file of so-called definition words is generated. Definition words are features extracted by natural-text analysis and could also be used for purposes other than those of this thesis. There is also a component that determines which articles were added, deleted or modified between the creation of two different knowledge bases.
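The abstract does not say how the change-detection component works. As a minimal sketch only, assuming each knowledge base can be reduced to a mapping from article title to a content fingerprint (the function name `diff_knowledge_bases` and the data layout are hypothetical, not taken from the thesis), the three categories could be computed like this:

```python
def diff_knowledge_bases(old: dict[str, str], new: dict[str, str]):
    """Compare two snapshots mapping article title -> content fingerprint.

    Hypothetical layout: the thesis does not describe its data format.
    Returns three sorted lists: (added, deleted, modified).
    """
    added = sorted(new.keys() - old.keys())       # only in the newer snapshot
    deleted = sorted(old.keys() - new.keys())     # only in the older snapshot
    modified = sorted(
        title for title in old.keys() & new.keys()
        if old[title] != new[title]               # present in both, content differs
    )
    return added, deleted, modified


old_kb = {"Lake Baikal": "a1", "Ada Lovelace": "b2", "Prague": "c3"}
new_kb = {"Lake Baikal": "a1", "Ada Lovelace": "b9", "Brno": "d4"}
added, deleted, modified = diff_knowledge_bases(old_kb, new_kb)
```

Comparing fingerprints rather than full article text keeps the comparison cheap; any stable hash of the article content would serve as the fingerprint in this sketch.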
Information Extraction from Wikipedia
Musil, Martin ; Otrusina, Lubomír (referee) ; Schmidt, Marek (advisor)
This bachelor thesis deals with the problem of automatic information extraction from text. The goal is to create an application that uses extraction patterns to capture knowledge from articles on the online information server Wikipedia. We first explain the basic terms of the subject; the main part of the thesis focuses on the experiments and, above all, on the implementation, which is divided into two parts: text processing and the subsequent information extraction. The conclusion analyses the results of the experiments and the efficiency of the created rules.
Authorship and actorship on Czech Wikipedia
Sedláček, Štěpán ; Abu Ghosh, Yasar (advisor) ; Kuřík, Bohuslav (referee)
The author carried out an ethnographic study of Czech Wikipedia in which he mapped the human and non-human actors involved in the creation of an internet encyclopedia. As part of this process, he himself became one of the users and reflected on how authorship, the collective compiling of meanings, and supervision are constructed.
Creating a Bilingual Dictionary using Wikipedia
Ivanova, Angelina ; Zeman, Daniel (advisor) ; Straňák, Pavel (referee)
Title: Creating a Bilingual Dictionary using Wikipedia. Author: Angelina Ivanova. Department/Institute: Institute of Formal and Applied Linguistics (32-ÚFAL). Supervisor of the master thesis: RNDr. Daniel Zeman, Ph.D. Abstract: Machine-readable dictionaries play an important role in the research area of computational linguistics. They have gained popularity in fields such as machine translation and cross-language information extraction. In this thesis we investigate the quality and content of bilingual English-Russian dictionaries generated from the Wikipedia link structure. Wiki-dictionaries differ dramatically from traditional dictionaries: the recall of the basic terminology on Mueller's dictionary was 7.42%. Machine translation experiments with the Wiki-dictionary incorporated into the training set resulted in a rather small, but statistically significant, drop in translation quality compared to the experiment without the Wiki-dictionary. We supposed that the main reason was the domain difference between the dictionary and the corpus, and we obtained some evidence that the model with the incorporated dictionary performed better on a test set collected from Wikipedia articles. In this work we show how big the difference between the dictionaries developed from the Wikipedia link structure and the traditional...
Open educational resources in context of environmental studies
Petiška, Eduard ; Papík, Richard (advisor) ; Nečas, Vlastimil (referee) ; Feberová, Jitka (referee)
The dissertation deals with the issue of open educational resources (OER) in the context of environmental studies. The thesis has been framed as a set of four thematically related studies published as peer-reviewed articles in relevant professional journals. The introductory section deals with the broader context of the issue of open educational resources and clarifies the thematic context of individual publications. The first study discusses the issue of open educational resources, the evaluation of their quality and possibilities of use in environmental disciplines; at the same time, it analyzes various types of resources available for environmental education in the Czech environment. Based on this, research was carried out among students of environmental disciplines at Czech universities on a sample of 233 respondents. The results of the research are presented in the second study, which focuses on quality assessment, while the third study deals with the general methods of OER use. Findings show that the most widely used resource among students is Wikipedia, which they also consider to be a relatively high-quality resource. For these reasons, the last study focuses further on Wikipedia and the proposed verifiability of claims by respected sources as a quality indicator for the use of Wikipedia in...
Machine Learning for Natural Language Question Answering
Sasín, Jonáš ; Fajčík, Martin (referee) ; Smrž, Pavel (advisor)
This thesis deals with natural language question answering using Czech Wikipedia. Question answering systems are experiencing growing popularity, but most of them are developed for English. The main purpose of this work is to explore the available possibilities and datasets and to create such a system for Czech. The thesis focuses on two approaches. One uses the English ALBERT model together with machine translation of passages. The other uses multilingual BERT. Several variants of the system are compared in this work. Possibilities for retrieving relevant passages are also discussed. A standard evaluation is provided for every tested variant of the system. The best version of the system was evaluated on the SQAD v3.0 dataset, reaching 0.44 EM and 0.55 F1, which is an excellent result compared to other existing systems. The main contribution of this work is the analysis of existing possibilities and the setting of a benchmark for the further development of better systems for Czech.
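The abstract reports EM and F1 without defining them. The sketch below shows how these two metrics are commonly computed for extractive question answering (exact string match, and token-overlap F1 in the style of SQuAD evaluation). It is an assumption that the thesis evaluates this way, and the simple lowercase-and-split normalization is likewise only illustrative.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    # Assumed normalization: lowercase and split on whitespace.
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("Praha", "praha")
f1 = f1_score("Karel IV", "Karel IV Lucemburský")
```

Dataset-level scores such as the 0.44 EM and 0.55 F1 quoted above would then be averages of these per-question values over all test questions.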
Information Retrieval in Czech Wikipedia
Balgar, Marek ; Bartík, Vladimír (referee) ; Chmelař, Petr (advisor)
The main task of this Master's thesis is to understand the problems of information retrieval and text classification. The main research focuses on text data, semantic dictionaries and, especially, the knowledge inferred from Wikipedia. The thesis also describes the implementation of a querying system based on the acquired knowledge. Finally, the properties and possible improvements of the system are discussed.
Identifying Entity Types and Attributes Across Languages
Švub, Daniel ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The aim of this thesis is to analyze articles in the Wikipedia internet encyclopedia and to convert their natural-language text into a structured database of persons, places and other entities. The core of the implemented program is the determination of an entity's type based on its typical characteristics, and the extraction of the most important attributes of that entity in the Czech and Slovak languages. The result is a knowledge base that allows simple searching and sorting of information. Thanks to its easy extensibility, the program can be extended to identify other types of entities and other features, as well as to support other languages.
