National Repository of Grey Literature: 60 records found (showing records 29 - 38)
Named Entity Recognition in Text (Rozpoznání pojmenovaných entit v textu)
Süss, Martin
This thesis deals with named entity recognition (NER) in text, implemented with machine learning techniques. Recently introduced techniques for building word embedding models produce word vectors that can encode many useful relationships between words in text data, such as their syntactic or semantic similarity. Modern NER systems use these vectors as features to improve their quality. However, only a few of them investigate in detail how much these vectors affect recognition and whether they can be optimized for even better recognition quality. This thesis examines various factors that may affect the quality of word embeddings, and thus the resulting quality of the NER system. A series of experiments was performed to examine these factors, such as corpus quality and size, vector dimensions, text preprocessing techniques, and various algorithms (Word2Vec, GloVe and FastText) and their parameters. The results yield useful findings that can be applied when creating word vectors and thus indirectly increase the resulting quality of NER systems.
System for Web Data Source Integration
Kolečkář, David ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The thesis aims to design and implement a web application for the integration of web data sources. For data integration, a method using the domain model of the target information system was applied. The work describes the individual methods used for extracting information from web pages. The text describes the process of designing the system architecture, including a description of the chosen technologies and tools. The main part of the work is the implementation and testing of the final web application, which is written in Java and the Angular framework. The outcome of the work is a web application that allows its users to define web data sources and save data in the target database.
Extraction of Semantic Relations from Text
Pospíšil, Milan ; Schmidt, Marek (referee) ; Smrž, Pavel (advisor)
Many semi-structured documents exist today that we want to convert to a structured form. The goal of this work is to create a system that makes this task more automated. This can be a difficult problem, because most of these documents are not generated by a computer, so the system has to tolerate differences. We also need some semantic understanding, which is why we restrict ourselves to the domain of meeting minutes.
Detecting semantic relations in texts and their integration with external data resources
Kríž, Vincent ; Vidová Hladká, Barbora (advisor) ; Harašta, Jakub (referee) ; Pecina, Pavel (referee)
We present a strategy to automate the extraction of semantic relations from texts. Both machine learning and rule-based techniques are investigated and the impact of different linguistic knowledge is analyzed for the various approaches. To implement the extraction system RExtractor, several natural language processing tools have been improved: from sentence splitting and tokenization modules to dependency syntax parsers. Furthermore, we created the Czech Legal Text Treebank with several layers of linguistic annotation, which is used to train and test each stage of the proposed system. As a result of the performed work, new Semantic Web resources and tools are available for automatic processing of texts.
Detecting semantic relations in texts and their integration with external data resources
Kríž, Vincent ; Vidová Hladká, Barbora (advisor)
We present a strategy to automate the extraction of semantic relations from texts. Both machine learning and rule-based techniques are investigated and the impact of different linguistic knowledge is analyzed for the various approaches. To implement the extraction system RExtractor, several natural language processing tools have been improved: from sentence splitting and tokenization modules to dependency syntax parsers. Furthermore, we created the Czech Legal Text Treebank with several layers of linguistic annotation, which is used to train and test each stage of the proposed system. As a result of the performed work, new Semantic Web resources and tools are available for automatic processing of texts.
Identifying Entity Types and Attributes Across Languages
Švub, Daniel ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The goal of this thesis is to analyze articles in the Wikipedia internet encyclopedia and to convert their natural-language text into a structured database of persons, places and other entities. The core of the implemented program is determining the type of an entity based on its typical characteristics and extracting the most important attributes of this entity in the Czech and Slovak languages. The result is a knowledge base allowing simple searching and sorting of information. Thanks to its easy extensibility, identification of other entity types and other features can be added to the program, as well as support for other languages.
Extracting Information from Medical Texts
Zvára, Karel ; Svátek, Vojtěch (advisor) ; Veselý, Arnošt (referee) ; Skalská, Hana (referee)
The aim of my work was to identify the specific features of Czech medical reports with regard to the possibility of extracting specific information from them. For my work, I had a total of 268 anonymized narrative medical reports from two outpatient departments. I studied standards for preserving electronic health records and for transferring clinical information between healthcare information systems. I also participated in the process of implementing an electronic medical record in the field of dentistry. First of all, I tried to process narrative medical reports using natural language processing (NLP) tools. I came to the conclusion that narrative medical reports in the Czech language are very different from typical Czech text, especially because they mostly contain short telegraphic phrases and lack typical Czech sentence structure. They also contain many misspellings, acronyms and abbreviations. Another problem was the absence of Czech translations of the main international classification systems. Therefore I decided to continue the research by developing a method for pre-processing the input text for translation and its semantic annotation. The main objective of this part of the research was to propose a method and support software for interactive correction...
Web User Interface for a Information Extraction Tool
Pokorný, Jan ; Křivka, Zbyněk (referee) ; Burget, Radek (advisor)
This work describes the design and implementation of a JavaScript application that serves as a user interface for a data extraction tool. The application offers an environment in which the user manages extraction tasks. Tasks are created using interactive graphs. This functionality is achieved using current trends in JavaScript applications, which are described in the work: in particular, the React library and the Redux state manager.
Web Page Segmentation Algorithms Based on Clustering
Lengál, Tomáš ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
This report deals with the segmentation of web pages, which is an important discipline of information extraction. In the first part, we describe several general ways to implement it. We then introduce the Box Clustering Segmentation method, which takes a slightly different approach to segmentation. In the second half, we describe the implementation of this method as a part of the FITLayout framework and the final testing.
Processing of Czech court decisions
Maslowski, Bohdan ; Vidová Hladká, Barbora (advisor) ; Nečaský, Martin (referee)
Title: Processing of Czech court decisions. Author: Bohdan Maslowski. Department: Institute of Formal and Applied Linguistics. Supervisor: Mgr. Barbora Vidová Hladká, Ph.D. Abstract: The objective of this thesis is a comparison of various language processing methods for Czech case-law documents. In particular, two tasks have been addressed: extraction of information about parties (names, roles, addresses, etc.) and document classification by two criteria, subject and result. Machine learning methods are evaluated and compared to a rule-based approach. For the purpose of training and evaluating classifiers, a corpus of 400 Czech case-law documents has been created and manually annotated. The thesis includes a web application used for demonstrating the results of the different approaches and a tool for running and evaluating testing scenarios. Keywords: natural language processing, information extraction, legislative domain, machine learning, rule-based systems
