National Repository of Grey Literature: 24 records found, showing records 11 - 20
Data Extraction from PDF Documents
Bartošák, Michal ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The work focuses on extracting information from medical records saved in PDF format, which were created by heart pacemakers during regular patient monitoring in the hospital. The result of this work is a desktop application written in Java that retrieves and analyzes data from records using PDFBox and pdf2dom libraries. The output of the application is a CSV file, which represents the acquired values in table form, as well as extracted images that are saved to a user-defined output folder. Application testing on records from three different companies proved that record extraction is highly reliable (with overall precision and recall metrics reaching almost 100 % in every test), provided that the application arguments are correctly set.
Automated Processing of PDF Document Contents
Gajdošík, Štefan ; Rychlý, Marek (referee) ; Burget, Radek (advisor)
This bachelor thesis deals with the extraction of data generated by pacemakers. It introduces the PDF document format and tools for working with PDF documents, describes techniques for data extraction, and presents the implementation of an application for automatic data extraction in the Python programming language.
Extension of Apache Tika with Industrial File Formats Text Extraction
Rešetár, René ; Burget, Radek (referee) ; Rychlý, Marek (advisor)
The goal of the bachelor's thesis was to extend the parsers of the Apache Tika project with data and table extraction from industrial document formats from laboratory instruments. These data will be stored in a structured format according to a certain scheme. In the theoretical part, the supplied industrial formats, the Apache Tika project and the possibilities of its expansion were examined. In the practical part, a tool was designed and implemented, which classifies documents using the Apache Tika project, processes them, creates structured data from them in the JSON format and subsequently validates them. Finally, a set of tests was created to verify and demonstrate the properties of the solution.
Methods of Data Extraction from the Web
Perina, Lukáš ; Křivka, Zbyněk (referee) ; Burget, Radek (advisor)
The purpose of this bachelor thesis is to design the architecture of, and subsequently implement, an application for data extraction (web scraping) from web documents. Unlike conventional methods, the extraction is based on defining data types and regular expressions for the requested elements. As a result, the detailed structure of a given web document does not need to be known, and a single definition can be used to detect the requested elements on different web pages. The algorithm achieves an overall accuracy of 85.51% and a recall of 80.28%. This approach can significantly reduce the time required to analyze web pages, and the structure of the code is no longer a determining factor when creating web scraping requests.
Intelligent Data Scraping in a Web Browser
Maštera, František ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The goal of this thesis is to extract data from web pages without knowledge of their internal structure. The structure is recognized by an algorithm from user-supplied input information about the content to be extracted. The structure analysis is then followed by the content extraction itself. An average success rate of over 80% was achieved on selected sets of websites. The resulting algorithm represents a new approach to data extraction and can be deployed in the real world or serve as a basis for further development.
Portal for Aggregation of Data from Web Sources
Mikita, Tibor ; Křivka, Zbyněk (referee) ; Burget, Radek (advisor)
This thesis deals with data extraction and data aggregation from heterogeneous web sources. The goal is to create a platform and a functional web application using appropriate technologies. The main focus of the thesis is on the application design and implementation. The application domain is accommodation or apartment rental. Data is extracted using either the portal API or a wrapper, and the obtained data is stored in a document database. The resulting system makes it possible to obtain rental ads from multiple web sources at the same time and to present them in a uniform way.
Environment for analyzing suspicious device
Procházka, Jan ; Martinásek, Zdeněk (referee) ; Malina, Lukáš (advisor)
This bachelor thesis focuses on the design of an environment for analyzing a suspicious device. Such a device may be, for example, a disk contaminated by malicious code or a mobile device. The aim of this work is to design an efficient and simple solution using open-source products. The resulting environment should be capable of performing both surface and in-depth data analysis. The theoretical part provides background information on the problem and covers terms such as sandbox, malware, and Android. These are described with a view to understanding the analysis of malware that occurs predominantly on mobile devices. The practical part describes the hardware and software used to build the environment and contains examples of analyses of external devices contaminated by malicious code. These examples mainly concern Android mobile devices.
Relationship between Changes in Betting Odds and Results of Football Matches
Jurkovič, Juraj ; Bartík, Vladimír (referee) ; Zendulka, Jaroslav (advisor)
The goal of this thesis is to demonstrate techniques for solving web scraping and knowledge discovery tasks. The case study focuses on extracting data from bookmaker websites and subsequently analyzing the collected data. The thesis demonstrates the implementation of a web scraping task in Python. It describes selected implementation details for developing such a system and proposes a database schema that can be used for this purpose. The collected data is analyzed using statistical methods, and frequent patterns in odds movements are discovered using the Apriori algorithm. The discovered relationships and frequent patterns are presented to the end user.
Reliability Analysis of Forensic Tools for Examining Small Digital Devices (Analýza spolehlivosti forenzních nástrojů pro zkoumání malé digitální techniky)
PĚSTOVÁ, Karolína
This bachelor thesis deals with forensic tools for investigating small digital devices. The theoretical part describes the principles of digital forensic analysis and approaches to examining mobile phones. In the practical part, selected mobile phones are analyzed with tools for investigating small digital devices. The results are then evaluated, and a solution for acquiring the most relevant data is proposed.
Sentiment Analysis in Automotive Industry
Bezák, Adam ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The main theme of this thesis is to introduce the basic methods of sentiment analysis on social networks. The thesis focuses on the automotive industry, although the same principle can be applied to any other field. The practical part consists of obtaining data from social networks, analyzing it, and indexing it into an ElasticSearch database. Another goal of the thesis is to visualize this data through a web portal. The resulting web portal provides various statistics on the leading automobile brands, an overview of new trends, and an aspect visualization for individual cars.
