keywords:"extrakce informací" - Search Results - Digital Repository

	Named Entity Recognition in Text (Rozpoznání pojmenovaných entit v textu) Süss, Martin This thesis deals with named entity recognition (NER) in text, realized by machine learning techniques. Recently, techniques for creating word embedding models have been introduced. These word vectors can encode many useful relationships between words in text data, such as their syntactic or semantic similarity. Modern NER systems use these vector features to improve their quality. However, only a few of them investigate in detail how much impact these vectors have on recognition and whether they can be optimized for even greater recognition quality. This thesis examines various factors that may affect the quality of word embeddings, and thus the resulting quality of the NER system. A series of experiments has been performed to examine these factors, such as corpus quality and size, vector dimensions, text preprocessing techniques, and various algorithms (Word2Vec, GloVe and FastText) and their parameters. The results bring useful findings that can be applied when creating word vectors and thus indirectly increase the resulting quality of NER systems. Detailed record
	System for Web Data Source Integration Kolečkář, David ; Bartík, Vladimír (referee) ; Burget, Radek (advisor) The thesis aims to design and implement a web application for the integration of web data sources. For data integration, a method using the domain model of the target information system was applied. The work describes the individual methods used for extracting information from web pages. The text describes the process of designing the architecture of the system, including a description of the chosen technologies and tools. The main part of the work is the implementation and testing of the final web application, which is written in Java and the Angular framework. The outcome of the work is a web application that allows its users to define web data sources and save data in the target database. Detailed record
	Extraction of Semantic Relations from Text Pospíšil, Milan ; Schmidt, Marek (referee) ; Smrž, Pavel (advisor) Many semi-structured documents exist today that we want to convert to structured form. The goal of this work is to create a system that makes this task more automated. This can be a difficult problem because most of these documents are not generated by a computer, so the system has to tolerate differences between them. Some semantic understanding is also needed, which is why we restrict the work to the domain of meeting minutes. Detailed record
	Detecting semantic relations in texts and their integration with external data resources Kríž, Vincent ; Vidová Hladká, Barbora (advisor) ; Harašta, Jakub (referee) ; Pecina, Pavel (referee) We present a strategy to automate the extraction of semantic relations from texts. Both machine learning and rule-based techniques are investigated and the impact of different linguistic knowledge is analyzed for the various approaches. To implement the extraction system RExtractor, several natural language processing tools have been improved: from sentence splitting and tokenization modules to dependency syntax parsers. Furthermore, we created the Czech Legal Text Treebank with several layers of linguistic annotation, which is used to train and test each stage of the proposed system. As a result of the performed work, new Semantic Web resources and tools are available for automatic processing of texts. Detailed record
	Detecting semantic relations in texts and their integration with external data resources Kríž, Vincent ; Vidová Hladká, Barbora (advisor) We present a strategy to automate the extraction of semantic relations from texts. Both machine learning and rule-based techniques are investigated and the impact of different linguistic knowledge is analyzed for the various approaches. To implement the extraction system RExtractor, several natural language processing tools have been improved: from sentence splitting and tokenization modules to dependency syntax parsers. Furthermore, we created the Czech Legal Text Treebank with several layers of linguistic annotation, which is used to train and test each stage of the proposed system. As a result of the performed work, new Semantic Web resources and tools are available for automatic processing of texts. Detailed record
	Identifying Entity Types and Attributes Across Languages Švub, Daniel ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor) The aim of this thesis is to analyze articles in the Wikipedia internet encyclopedia and to convert their natural-language text into a structured database of persons, places and other entities. The core of the implemented program is determining the type of an entity based on its typical characteristics and extracting the most important attributes of that entity in the Czech and Slovak languages. The result is a knowledge base allowing simple searching and sorting of information. Thanks to its easy extensibility, identification of other types of entities and other features can be added to the program, as well as support for other languages. Detailed record
	Extracting Information from Medical Texts Zvára, Karel ; Svátek, Vojtěch (advisor) ; Veselý, Arnošt (referee) ; Skalská, Hana (referee) The aim of my work was to identify the specific features of Czech medical reports with respect to the possibility of extracting specific information from them. For my work, I had a total of 268 anonymized narrative medical reports from two outpatient departments. I have studied standards for preserving electronic health records and for transferring clinical information between healthcare information systems. I have also participated in the process of implementing an electronic medical record in the field of dentistry. First of all, I tried to process narrative medical reports using natural language processing (NLP) tools. I came to the conclusion that narrative medical reports in the Czech language differ greatly from typical Czech text, especially because they mostly contain short telegraphic phrases and lack typical Czech sentence structure. They also contain many misspellings, acronyms and abbreviations. Another problem was the absence of Czech translations of the main international classification systems. Therefore I decided to continue the research by developing a method for pre-processing the input text for translation and its semantic annotation. The main objective of this part of the research was to propose a method and support software for interactive correction... Detailed record
	Web User Interface for a Information Extraction Tool Pokorný, Jan ; Křivka, Zbyněk (referee) ; Burget, Radek (advisor) This work describes the design and implementation of a JavaScript application that serves as a user interface for a data extraction tool. The application offers an environment in which the user manages extraction tasks. Tasks are created using interactive graphs. This functionality is achieved using current trends in JavaScript application development, which are described in the work, in particular the React library and the Redux state manager. Detailed record
	Web Page Segmentation Algorithms Based on Clustering Lengál, Tomáš ; Bartík, Vladimír (referee) ; Burget, Radek (advisor) This report deals with the segmentation of web pages, which is an important discipline of information extraction. In the first part, we describe several general ways to implement it. We then introduce the Box Clustering Segmentation method, which takes a slightly different approach to segmentation. In the second half, we describe the implementation of this method as part of the FITLayout framework and its final testing. Detailed record
	Processing of Czech court decisions Maslowski, Bohdan ; Vidová Hladká, Barbora (advisor) ; Nečaský, Martin (referee) Title: Processing of Czech court decisions Author: Bohdan Maslowski Department: Institute of Formal and Applied Linguistics Supervisor: Mgr. Barbora Vidová Hladká, Ph.D. Abstract: The objective of this thesis is to compare various language processing methods for Czech case-law documents. In particular, two tasks have been addressed: the extraction of information about parties (names, roles, addresses, etc.) and document classification by two criteria, subject and result. Machine learning methods are evaluated and compared to a rule-based approach. For the purpose of training and evaluating classifiers, a corpus of 400 Czech case-law documents has been created and manually annotated. The thesis includes a web application used for demonstrating the results of the different approaches and a tool for running and evaluating testing scenarios. Keywords: natural language processing, information extraction, legislative domain, machine learning, rule-based systems Detailed record
