National Repository of Grey Literature: 147 records found, showing records 1-10
Information Extraction from Wikipedia
Jurišica, Rudolf ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The goal of this thesis is to reduce the number of unknown referenced entities in Czech Wikipedia articles. This was achieved by reusing existing solutions created by the KNOT research group at FIT BUT and by developing a set of new programs. These programs run automatically every month, whenever a new version of Wikipedia is released. They add new names to the knowledge base, generate their derived forms, and edit the articles directly on Wikipedia.
Automatic Additions and Corrections of Wikidata and Wikipedia Based on Information Extraction
Hložek, Matej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis focuses on the creation of a system for automatic extraction of data from English-language articles of the online encyclopedia Wikipedia. Depending on the class assigned by a text classifier, different types of information are extracted from the natural-language text and from the so-called infoboxes of individual Wikipedia articles. The final product of the system is a knowledge base containing all extracted data together with the assigned class. A notable part of the system is an article extractor that extracts infoboxes and the first paragraphs of articles from a so-called wikidump file.
Word Sense Clustering
Haljuk, Petr ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis deals with the semantic similarity of words. It describes the design and implementation of a system that finds the most similar words and measures the semantic similarity of words. The system uses the Word2Vec model from the Gensim library and learns the relations between words from the Common Crawl corpus.
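Word2Vec represents each word as a dense vector, and "most similar words" are usually ranked by cosine similarity between those vectors (Gensim exposes this through `KeyedVectors.most_similar` and `KeyedVectors.similarity`). The sketch below reimplements only the ranking step in plain Python so it runs without Gensim; the three-dimensional vectors in `toy_vectors` are made up for illustration and are not trained embeddings or the thesis's data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical 3-dimensional embeddings, invented for this example only.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.95],
    "banana": [0.12, 0.18, 0.9],
}

print(most_similar("king", toy_vectors, topn=2))
```

With these made-up vectors, "queen" ranks first for "king", because the two vectors point in almost the same direction.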
Word Sense Clustering
Bárta, Jakub ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis deals with the design and implementation of a modular system focused on semantic similarity. The system can stem the corpus and analyze it in different ways: through a co-occurrence matrix or through LSA (latent semantic analysis).
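A co-occurrence matrix counts how often each word appears near every other word; words used in similar contexts end up with similar rows, which can then be compared with cosine similarity. The following is a minimal sketch assuming a symmetric window of two tokens and whitespace tokenization without stemming; the window size, the toy corpus, and the function names are illustrative choices, not taken from the thesis. LSA would additionally apply a truncated SVD to such a matrix, which is omitted here.

```python
import math
from collections import defaultdict


def cooccurrence_matrix(sentences, window=2):
    """Count how often each pair of words occurs within `window` tokens."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def row_cosine(counts, w1, w2):
    """Cosine similarity between the co-occurrence rows of two words."""
    r1, r2 = counts.get(w1, {}), counts.get(w2, {})
    dot = sum(r1[c] * r2.get(c, 0) for c in r1)
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Tiny illustrative corpus.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]
matrix = cooccurrence_matrix(corpus, window=2)
print(round(row_cosine(matrix, "cat", "dog"), 3))
print(round(row_cosine(matrix, "cat", "milk"), 3))
```

Here "cat" and "dog" score higher than "cat" and "milk", since both animals share the contexts "the", "drinks", and "chases".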
Query Answering over Wikipedia for Mobile Devices on the Android Platform
Kováč, Andrej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis deals with the development of a system for query answering over Wikipedia for mobile devices running the Android operating system. This technical report describes the theoretical background of the topic as well as the implementation of the server system and the client-side application. Part of the thesis is dedicated to testing the system, and the final part outlines possible future development.
Named Entity Disambiguation in Slovak
Križan, Samuel ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The thesis deals with named entity recognition and disambiguation. A basic system was created that includes all prerequisites necessary for named entity disambiguation in Slovak. Part of the system builds a knowledge base from an export of the Slovak Wikipedia. This knowledge base was then compared with a knowledge base obtained from Wikidata, which revealed that the main contribution of the Wikipedia-based knowledge base for Slovak is greater coverage of entities linked to the Slovak Wikipedia and better determination of entity classes. In addition, the morphological dictionary of the KNOT@FIT research group was updated, which yielded an improvement of 33-39 %. The work anticipates future use in extending the system with a disambiguation module and improving the coverage of alternative names.
Acquiring Thesauri from Wikipedia
Novák, Ján ; Schmidt, Marek (referee) ; Otrusina, Lubomír (advisor)
This thesis deals with automatically acquiring thesauri from Wikipedia. It describes Wikipedia as a suitable data set for thesaurus acquisition, as well as methods for computing the semantic similarity of terms. The thesis also describes the concepts and implementation of a system for automatic thesaurus acquisition. Finally, the implemented system is evaluated using standard metrics such as precision and recall.
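Precision is the fraction of suggested terms that are correct, and recall is the fraction of correct terms that were suggested; their harmonic mean is the F1 score. The sketch below assumes the evaluation compares the set of related terms proposed for one headword against a gold-standard set; the example terms for "car" are invented for illustration and are not results from the thesis.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 of `predicted` against `gold`."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical thesaurus entries for the headword "car".
suggested = ["automobile", "vehicle", "road", "engine"]
reference = ["automobile", "vehicle", "truck"]

p, r, f1 = precision_recall_f1(suggested, reference)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# prints: precision=0.500 recall=0.667 f1=0.571
```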
Wikipedia Page Classification
Suchý, Ondřej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The goal of this work is to design and implement a system that selects Wikipedia articles relevant to a given topic in order to reduce the size of an offline version of Wikipedia. The problem was solved using information retrieval methods implemented with the Elasticsearch search engine. The system tries to determine the user's area of interest from the given keywords and selects articles from that area. This is achieved by measuring the similarity of articles and adding all articles from frequent categories to the selection. For queries over the Simple English Wikipedia, the output files are usually smaller than 30 MB.
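The category-expansion step described above can be sketched as follows: count how often each category occurs among the articles already selected by similarity search, then add every article that belongs to a category occurring at least `min_count` times. The threshold, the in-memory `article_categories` mapping, and the example articles are hypothetical; the thesis performs retrieval through Elasticsearch, which this sketch does not model.

```python
from collections import Counter


def expand_by_frequent_categories(selected, article_categories, min_count=2):
    """Add all articles from categories that occur at least `min_count`
    times among the already selected articles (hypothetical threshold)."""
    freq = Counter(cat for title in selected
                   for cat in article_categories.get(title, ()))
    frequent = {cat for cat, n in freq.items() if n >= min_count}
    expanded = set(selected)
    for title, cats in article_categories.items():
        if frequent.intersection(cats):
            expanded.add(title)
    return expanded


# Illustrative article-to-category mapping, not real Wikipedia data.
article_categories = {
    "Prague": ["Cities in Czechia", "Capitals in Europe"],
    "Brno": ["Cities in Czechia"],
    "Ostrava": ["Cities in Czechia"],
    "Vienna": ["Capitals in Europe"],
    "Python (programming language)": ["Programming languages"],
}
# Articles assumed to have been returned by the similarity search.
selected = {"Prague", "Brno"}

print(sorted(expand_by_frequent_categories(selected, article_categories)))
```

Only "Cities in Czechia" occurs twice among the selected articles, so "Ostrava" is added while "Vienna" is not.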
Automatic Keyword Suggestion
Šimara, Svatopluk ; Škoda, Petr (referee) ; Otrusina, Lubomír (advisor)
This thesis deals with automatic keyword suggestion. The suggestion is based solely on statistical methods. Diploma theses and similar documents are used for the analysis, and the statistical methods are tested and evaluated in detail on these documents. Only the most successful methods are chosen for the final keyword suggestion. Finally, the suggested keywords are compared with manually assigned keywords.
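The abstract does not name the statistical methods used; TF-IDF is one common choice and is shown here only as an assumption. Each term of a document is scored by its relative frequency in that document times the logarithm of how rare it is across a background corpus, so words present in every document (such as "the") get a score of zero. The corpus, the `tfidf_keywords` function, and the manual keyword set are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(document, corpus, topn=3):
    """Rank terms of `document` by TF-IDF against a background `corpus`.
    Ties are broken alphabetically so the output is deterministic."""
    tokens = tokenize(document)
    tf = Counter(tokens)
    n_docs = len(corpus)
    doc_sets = [set(tokenize(d)) for d in corpus]
    scores = {}
    for term, count in tf.items():
        df = sum(1 for s in doc_sets if term in s)
        idf = math.log(n_docs / max(df, 1))  # 0 for terms found in every document
        scores[term] = (count / len(tokens)) * idf
    return sorted(scores, key=lambda t: (-scores[t], t))[:topn]


# Hypothetical background corpus; the first entry is the document to label.
corpus = [
    "the parser extracts the infobox from the wikipedia dump and the parser stores the infobox",
    "the classifier assigns a class to the article and the article is stored",
    "the knowledge base is stored in shared memory and served to the client",
]
suggested = tfidf_keywords(corpus[0], corpus, topn=3)
manual = {"infobox", "parser", "wikipedia"}  # hypothetical manual keywords
print(suggested)
print(sorted(set(suggested) & manual))  # overlap with the manual keywords
```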
Semantic Enrichment Component
Doležal, Jan ; Otrusina, Lubomír (referee) ; Dytrych, Jaroslav (advisor)
This master's thesis describes the Semantic Enrichment Component (SEC), which searches for entities (e.g., persons or places) in an input text document and returns information about them. The goals of this component are to provide a single interface to named entity recognition tools, to enable parallel document processing, to save memory when using the knowledge base, and to speed up access to its content. To achieve these goals, the output format of the named entity recognition tools was specified, a tool for storing the preprocessed knowledge base in shared memory was implemented, and the component was built on a client-server architecture.
