National Repository of Grey Literature: 21 records found, showing records 11–20
Extension of Apache Tika with Industrial File Formats Text Extraction
Rešetár, René ; Burget, Radek (referee) ; Rychlý, Marek (advisor)
The goal of this bachelor's thesis was to extend the parsers of the Apache Tika project with data and table extraction from industrial document formats produced by laboratory instruments. The extracted data are stored in a structured format according to a given schema. The theoretical part examines the supplied industrial formats, the Apache Tika project and the ways in which it can be extended. In the practical part, a tool was designed and implemented that classifies documents using Apache Tika, processes them, converts them into structured JSON data and then validates the result. Finally, a set of tests was created to verify and demonstrate the properties of the solution.
Automated Detection of Relations in Data Structures
Nováček, Pavel ; Fiedor, Tomáš (referee) ; Smrčka, Aleš (advisor)
This thesis deals with automated knowledge acquisition from structured data, specifically the detection of relations between data types in tree-structured data. The thesis is part of the Testos platform, which aims at automating software testing. The goal was to design and implement a solution that automatically plans and executes detections over samples of real data structures. The detections themselves are performed by external modules, called detectors, that cooperate with the solution. The resulting tool is a service that implements an algorithm for communicating with the detectors via a well-defined protocol: it sends them detection requests in parallel and processes their results. The service is managed and assigned tasks through an HTTP API created for this purpose. The detection results, i.e. the meanings and relations of the input data, are used by other tools of the Testos platform to generate new test data whose structure matches the input samples of real data.
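The abstract does not specify the detector protocol or the service internals. Purely as an illustration of the dispatch idea, the Python sketch below fans one set of samples out to several detectors in parallel and gathers their answers. The detectors here are plain in-process functions standing in for the external modules, and every name (`run_detections`, `is_integer`, `is_email`) is hypothetical rather than part of Testos.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

# Hypothetical detector interface: a detector inspects one sample value and
# returns a label describing its meaning, or None if it does not recognise it.
# In the real service each detector is an external module reached via a protocol.
Detector = Callable[[Any], Optional[str]]


def is_integer(value: Any) -> Optional[str]:
    # bool is a subclass of int in Python, so exclude it explicitly
    return "integer" if isinstance(value, int) and not isinstance(value, bool) else None


def is_email(value: Any) -> Optional[str]:
    # Deliberately naive check, for illustration only
    return "email" if isinstance(value, str) and "@" in value else None


def _label_all(detector: Detector, samples: List[Any]) -> List[Optional[str]]:
    # Apply one detector to every sample, keeping the sample order.
    return [detector(sample) for sample in samples]


def run_detections(samples: List[Any],
                   detectors: Dict[str, Detector]) -> Dict[str, List[Optional[str]]]:
    """Send the samples to every detector in parallel and collect results per detector."""
    with ThreadPoolExecutor() as pool:
        # One job per detector; the jobs run concurrently in the thread pool.
        futures = {name: pool.submit(_label_all, det, samples)
                   for name, det in detectors.items()}
        # result() blocks until the corresponding detector has finished.
        return {name: fut.result() for name, fut in futures.items()}
```

For example, `run_detections([42, "a@b.cz", True], {"int": is_integer, "email": is_email})` returns `{"int": ["integer", None, None], "email": [None, "email", None]}`.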
Interactive Visualization of XML
Kubíček, Daniel ; Stryka, Lukáš (referee) ; Chmelař, Petr (advisor)
This bachelor's thesis deals with the visualization of XML. The first part briefly describes the XML language and its basic characteristics, summarizing the uses of XML, its advantages, its syntax and its logical structure. The thesis also describes how XML is read in Java. It then turns to XML visualization and briefly summarizes the properties, advantages and disadvantages of the available visualization tools. The Implementation chapter explains the choice of a suitable programming language and development environment. It briefly reviews the available modeling tools and gives, for each, the reason it could not be used. The developed application offers two ways of displaying XML elements: a basic view and an aggregated view. The result is intended to be a practical, usable visualization tool, although there is room for many extensions.
Automated Detection of Types in Data Structures
Oháňka, Martin ; Hruška, Martin (referee) ; Smrčka, Aleš (advisor)
This bachelor's thesis deals with data structure synthesis for software testing. In particular, it focuses on analysing real data in order to detect data types for subsequent test data generation. The analysis is performed in two layers: a control system that schedules and invokes individual detections, and a set of data detectors. The thesis covers the analysis and implementation of a tool consisting of a set of data type detectors over tree-structured data such as JSON, YAML or XML. The detectors aim to determine the semantics of the values in the analysed structure and the dependencies between the data. The set can easily be extended as needed to detect more complex meanings and dependencies. The results of these analyses can be used to generate new test data for software testing.
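A minimal sketch of what a type detector over tree-structured data might look like, assuming the input has already been parsed from JSON into Python dicts, lists and scalars. It walks the tree and labels each leaf by trying a few regular-expression detectors in order. The labels, patterns and function names are illustrative assumptions, not the detectors implemented in the thesis, and dependencies between values are not covered.

```python
import re
from typing import Any, Dict

# Hypothetical semantic detectors for string values, tried in order;
# the first pattern that matches the whole string wins.
PATTERNS = [
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("email", re.compile(r"[^@\s]+@[^@\s]+\.[a-z]+", re.IGNORECASE)),
    ("integer", re.compile(r"-?\d+")),
]


def detect_scalar(value: Any) -> str:
    """Return a semantic label for a scalar value from a parsed JSON document."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    for label, pattern in PATTERNS:
        if pattern.fullmatch(value):
            return label
    return "string"


def detect_types(node: Any, path: str = "$") -> Dict[str, str]:
    """Walk a parsed JSON tree and map each leaf path to its detected label."""
    if isinstance(node, dict):
        result: Dict[str, str] = {}
        for key, child in node.items():
            result.update(detect_types(child, f"{path}.{key}"))
        return result
    if isinstance(node, list):
        result = {}
        for index, child in enumerate(node):
            result.update(detect_types(child, f"{path}[{index}]"))
        return result
    return {path: detect_scalar(node)}
```

For instance, `{"mail": "x@y.cz", "tags": ["a", "12"]}` yields `{"$.mail": "email", "$.tags[0]": "string", "$.tags[1]": "integer"}`. New detectors can be added by appending to `PATTERNS`, which loosely mirrors the extensibility the abstract mentions.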
Generating Structured Test Data
Olšák, Ondřej ; Holík, Lukáš (referee) ; Smrčka, Aleš (advisor)
The goal of this bachelor's thesis is to create a tool for generating files with structured data content. These files are intended as test data for testing a program's input space. The thesis focuses on tree-structured data. The tool integrates test data generation tools previously implemented within the Testos framework in order to satisfy a user-defined coverage criterion. It can generate a set of files in JSON or XML format containing test data that satisfy the each choice (ECC), base choice (BCC) or pair-wise (PWC) coverage criterion.
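ECC and BCC are standard input space coverage criteria. The sketch below assumes each parameter (for example a JSON field) has already been partitioned into a list of representative values, and shows how test cases satisfying each criterion can be built. Treating the first value of each list as the base choice is an assumption made here. Pair-wise coverage is omitted because it needs a more involved construction, and the function names are hypothetical.

```python
from typing import Dict, List

# A test is one concrete assignment of a value to every parameter (e.g. a JSON field).
Test = Dict[str, object]


def each_choice(params: Dict[str, List[object]]) -> List[Test]:
    """Each Choice Coverage: every value of every parameter appears in at least one test."""
    count = max(len(values) for values in params.values())
    # Test i takes the i-th value of each parameter, wrapping around for shorter lists.
    return [{name: values[i % len(values)] for name, values in params.items()}
            for i in range(count)]


def base_choice(params: Dict[str, List[object]]) -> List[Test]:
    """Base Choice Coverage: start from a base test, then vary one parameter at a time.

    Assumption: the first value in each list is that parameter's base choice.
    """
    base = {name: values[0] for name, values in params.items()}
    tests = [dict(base)]
    for name, values in params.items():
        for value in values[1:]:
            # All other parameters keep their base value.
            tests.append({**base, name: value})
    return tests
```

For `{"age": [30, -1, 0], "active": [True, False]}`, `each_choice` yields 3 tests and `base_choice` yields 4 (the base test plus one test per non-base value). Each resulting dictionary could then be serialized as one JSON or XML test file.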
Detectors of Structured Data for Generating Test Data
Znojil, Ondřej ; Turoňová, Lenka (referee) ; Smrčka, Aleš (advisor)
This work focuses on the design and implementation of a utility for analysing structured data in commonly used formats such as JSON or XML. The utility is one of many components of the Testos platform, a set of mutually communicating testing utilities. The primary goal of the thesis is to create an analyzer for testing purposes. The utility aggregates input data, determines how often each data entity occurs and abstracts its scalar values.
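The aggregation step described above could look roughly like this, assuming the input is a collection of already-parsed JSON documents. The sketch counts how many documents contain each leaf path and abstracts each scalar value to its JSON type name. The path notation (`$.a.b`, with `[*]` standing for any list element) and the function names are illustrative choices, not taken from the thesis.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Set, Tuple


def _type_name(value: Any) -> str:
    # Abstract a scalar value to the name of its JSON type.
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def aggregate(documents: Iterable[Any]) -> Tuple[Counter, Dict[str, Set[str]]]:
    """Count in how many documents each leaf path occurs and which scalar types it holds."""
    occurrences: Counter = Counter()
    types: Dict[str, Set[str]] = defaultdict(set)

    def walk(node: Any, path: str, seen: Set[str]) -> None:
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, f"{path}.{key}", seen)
        elif isinstance(node, list):
            # All list items share one abstract path, so "$.tags[*]" covers every element.
            for child in node:
                walk(child, f"{path}[*]", seen)
        else:
            seen.add(path)
            types[path].add(_type_name(node))

    for doc in documents:
        seen: Set[str] = set()
        walk(doc, "$", seen)
        occurrences.update(seen)  # each path is counted at most once per document
    return occurrences, dict(types)
```

Counting a path once per document, rather than once per value, makes it easy to tell required fields (present in every document) from optional ones.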
Question Answering over Structured Data
Birger, Mark ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis deals with question answering over structured data. In most cases, structured data are represented as linked graphs, but hiding the underlying data structure is essential if such systems are to be used as part of natural-language interfaces. A question answering system was designed and developed as part of this work. In contrast to traditional question answering systems, which are based on linguistic analysis or statistical methods, our system explores the provided graph and generates semantic links from input question-answer pairs. The developed system is independent of the data structure, but for evaluation we used datasets from Wikidata and DBpedia. The quality of the resulting system and of the investigated approach was evaluated on a prepared dataset using standard metrics.
Design and Implementation of System for Aggregations of Real Estate Offers in the Czech Republic
Drobník, Jakub ; Kučera, Jan (advisor) ; Chlapek, Dušan (referee)
This diploma thesis deals with the design and implementation of software for aggregating real estate offers in the Czech Republic. Its aim is to create a system that aggregates real estate offer data from web pages. The thesis consists of two main parts. The first part describes the context in which the system was created and discusses ways of retrieving data from websites, especially data extraction using automated robots. The second part describes the design and implementation of the system, including the requirements defined by the author and the sponsor. The outcome of the thesis is a prototype that aggregates data from real estate portals into a prepared database. The main contribution of the thesis is an example of one possible approach to aggregating data from a particular market segment into a database.
Application of text mining methods for analysis of users movie reviews
Palatínus, Vojtěch ; Matějka, Martin (advisor) ; Novotný, Ota (referee)
This thesis defines the challenges of working with unstructured data. Specifically, it focuses on transforming unstructured data into structured data using text mining methods and takes a closer look at the so-called Big Data phenomenon. Its goals are to introduce the problems that arise when working with unstructured data, to show how such data can be transformed into a structured format using text mining methods, and to analyse the mined data from user reviews published on The Internet Movie Database website. The thesis aims to familiarize the reader with unstructured data and to demonstrate, on a worked example, how text mining methods can extract relevant information from this type of data.
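As a very small illustration of turning unstructured text into structured data, the sketch below converts free-text reviews into a document-term table of word counts (a bag-of-words representation), one of the simplest text mining transformations. The stop-word list and tokenization rule are simplified assumptions. The abstract does not say which methods the thesis actually used.

```python
import re
from collections import Counter
from typing import Dict, List

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "was", "of", "this"}


def review_to_terms(review: str) -> Counter:
    """Turn one free-text review into a term-frequency record."""
    # Lowercase, then keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", review.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def reviews_to_table(reviews: List[str]) -> Dict[str, List[int]]:
    """Build a document-term table: one column per term, one count per review."""
    counts = [review_to_terms(r) for r in reviews]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for missing terms, so absent words get a zero count.
    return {term: [c[term] for c in counts] for term in vocabulary}
```

For the reviews `"The plot was great, great acting."` and `"Boring plot."`, the table is `{"acting": [1, 0], "boring": [0, 1], "great": [2, 0], "plot": [1, 1]}`, a structured form that ordinary analysis tools can work with.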
ScraperWiki Tutorial
Levine, Thomas
The objective of the workshop, or rather hackathon, was to get data into a structured format and join it with data from other sources, while giving an overview of web scraping and showing by example what it makes possible. Thomas identified targets for web scraping, explained how to navigate the complexity of different types of web pages, and presented this material in several half-hour and hour-long modules aimed at different audiences.
