keywords:"data extraction" - Search Results - Digital Repository


	System for Recognizing Disinformation in Web Environment Večerka, Lukáš ; Žádník, Martin (referee) ; Strnadel, Josef (advisor) This work deals with the design, implementation, and verification of a system for automatic recognition of disinformation on the web. It addresses the issue of disinformation spread in the online environment and its impact on society. It focuses on training several Czech transformer language models for disinformation recognition and further automatic extraction of content from Czech online newspapers and their analysis using text classification and natural language processing through deep learning methods. The results of these analyses are then presented in a web user interface with the aim of providing a platform for verifying articles, authors, and sources. The interface could be used for data annotation by experts for continuous improvement of language models. Detailed record
	Development of YARA-X ecosystem Ďuriš, Tomáš ; Křivka, Zbyněk (referee) ; Regéciová, Dominika (advisor) The aim of this work is to extend the tools for the YARA language and to create a unified ecosystem around them. The focus is on incorporating modules that can gather information about the structure of executable files. Additionally, a module that can present the obtained information to the user in multiple formats is proposed. An interactive environment has been created for evaluating YARA rules and enhancing the overall ecosystem by using an error-tolerant parsing algorithm. The proposed solution enables the seamless integration and utilization of existing tools while addressing the limitations of the original YARA ecosystem. The output of the work is an extended system with tools that facilitate the debugging of YARA rules, obtaining information from executable files, and visualizing it. The final solution has been thoroughly tested, utilized by analysts, and integrated into the main YARA-X branch. Detailed record
	Automatic Additions and Corrections of Wikidata and Wikipedia Based on Information Extraction Hložek, Matej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor) This bachelor's thesis focuses on creating a system for automatic extraction of data from English-language articles of the internet encyclopedia Wikipedia. Depending on the class assigned by a text classifier, different types of information are extracted from the natural language text and from the so-called infoboxes of individual Wikipedia articles. The final product of the system is a knowledge base containing all extracted data and the classified type. A notable part of the system is an article extractor that extracts infoboxes and the first paragraphs of articles from a so-called wikidump file. Detailed record
	Detection of Nudity in an Image Data Pešková, Daniela ; Orság, Filip (referee) ; Goldmann, Tomáš (advisor) The focus of this thesis is the creation of a tool capable of detecting nudity in image data. This is achieved by training a model to detect the incriminated body parts and by creating an algorithm capable of detecting skin. The resulting tools can be used for automatic detection of nudity in images. The first part of the thesis covers the theory of neural networks and computer vision, with a focus on skin detection. The second part describes the approach chosen for creating the dataset, the process of building and training a model capable of detecting nudity in an image, and the algorithmic approach. Detailed record
	Intelligent Data Scraping in a Web Browser Maštera, František ; Bartík, Vladimír (referee) ; Burget, Radek (advisor) The goal of this thesis is to extract data from web pages without knowledge of their internal structure. The idea is to recognize the structure using an algorithm and input information about the content that the user wants to extract. The structure analysis is then followed by the content extraction itself. An average success rate of over 80% was achieved on selected sets of websites. The resulting algorithm represents a new approach to data extraction and can be deployed in the real world or serve as a basis for further development. Detailed record
	Extension of Apache Tika with Industrial File Formats Text Extraction Rešetár, René ; Burget, Radek (referee) ; Rychlý, Marek (advisor) The goal of the bachelor's thesis was to extend the parsers of the Apache Tika project with data and table extraction from industrial document formats from laboratory instruments. These data will be stored in a structured format according to a certain scheme. In the theoretical part, the supplied industrial formats, the Apache Tika project and the possibilities of its expansion were examined. In the practical part, a tool was designed and implemented, which classifies documents using the Apache Tika project, processes them, creates structured data from them in the JSON format and subsequently validates them. Finally, a set of tests was created to verify and demonstrate the properties of the solution. Detailed record
	Image object detection using template Novák, Pavel ; Mašek, Jan (referee) ; Burget, Radim (advisor) This thesis focuses on image object detection using a template. Its main contribution is a new method for feature extraction from the Histogram of Oriented Gradients using a set of comparators. The thesis describes the methods used for image comparison and feature extraction. The main part is devoted to the Histogram of Oriented Gradients method, on which the work is based. A small training data set (100 items) is used, verified by cross-validation (X-Validation) and followed by tests on real scenes. The success rate achieved with cross-validation is 98% for the SVM algorithm. Detailed record
	Sentiment Analysis in Automotive Industry Bezák, Adam ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor) The main aim of this thesis is to become familiar with the basic methods of sentiment analysis on social networks. The thesis is aimed at the automotive industry, although the same principle can be applied in any other domain. The practical part consists of obtaining data from social networks, analyzing them, and indexing them into an ElasticSearch database. Another goal of the thesis is to visualize these data in a web portal. The portal provides various statistics on the leading automobile brands, an overview of new trends, and aspect visualization for individual cars. Detailed record
	Relationship between Changes in Betting Odds and Results of Football Matches Jurkovič, Juraj ; Bartík, Vladimír (referee) ; Zendulka, Jaroslav (advisor) The goal of this thesis is to demonstrate techniques for solving web scraping and knowledge discovery tasks. The case study focuses on the extraction of data from bookmaker websites and the subsequent analysis of the collected data. The thesis demonstrates the implementation of a web scraping task in Python. It describes selected implementation details for developing such a system and proposes a database schema that can be used for this purpose. The collected data are analyzed using statistical methods, and frequent patterns in odds movements are discovered using the Apriori algorithm. The discovered relationships and frequent patterns are presented to the end user. Detailed record
	Methods of Data Extraction from the Web Perina, Lukáš ; Křivka, Zbyněk (referee) ; Burget, Radek (advisor) The purpose of this bachelor thesis is to design the architecture of, and subsequently implement, an application for data extraction (web scraping) from web documents. Unlike conventional methods, the extraction is based on defining data types and regular expressions for the requested elements. Extraction is performed in such a way that the detailed structure of a given web document need not be known, and a single definition can be used to detect the requested elements on different web pages. The algorithm achieves an overall accuracy of 85.51% and a recall of 80.28%. This approach can significantly reduce the time required for analyzing web pages, and it removes the structure of the code as a determining factor when creating web scraping requests. Detailed record
