National Repository of Grey Literature
Visual Pattern Detection in Web Pages
Kotraš, Martin ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The thesis addresses information extraction from web pages by searching for visual patterns, that is, spatial relations between areas of the page and shared visual styles of those areas, and extends this technique with new methods to improve the results. It relies on a user-specified ontological data model that describes which data items are to be extracted from the given web page and what the individual items look like, mainly in terms of their text. As part of the work, a Java console application, VizGet, was created; it uses the FitLayout framework to obtain a visual model of the page. Testing on 7 different domains, including a list of the best movies, e-shop products, and weather forecasts, showed that about 75 % of subtests reach an F-score above 85 % and more than 90 % reach an F-score above 60 %, while 45 % of subtests achieve an F-score of 100 %. VizGet can therefore be deployed in non-critical practical applications, and it remains open to further extension and improvement.
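For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F-score is commonly computed for an extraction task. It assumes the standard balanced F1 definition (harmonic mean of precision and recall); the abstract does not state which variant the thesis uses, and the function name and the counts below are hypothetical illustrations, not values from the thesis.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F1 score from raw extraction counts (assumed definition)."""
    if true_positives == 0:
        # No correct extractions: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical subtest: 8 items extracted correctly, 2 spurious, 2 missed.
example = f_score(8, 2, 2)
```

Under this definition, a subtest reaches 100 % only when every expected item is extracted and nothing spurious is returned.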
Interactive Generator of Syntax of Heterogeneous Data Structures
Kotraš, Martin ; Janoušek, Vladimír (referee) ; Smrčka, Aleš (advisor)
Today, software systems are often composed of several components that exchange data over various communication channels. Although a number of standardized data encoding formats exist, developers still create their own, mostly tailored to the specific use of the software they are building. Validating input data is an essential part of verifying quality and minimizing data transmission errors. The first step toward validation is to formalize the language that describes the data structures. The most general formalism for this purpose is a grammar of the language in a standard notation, e.g. BNF, ABNF, or EBNF. However, writing a language-specific grammar can be error-prone for an inexperienced developer. The aim of this project is a simple application for creating a grammar from a sample of data. The work addresses the generation of a grammar and of validation code snippets from a sample string of the language, e.g. source code in a programming language. The user proceeds by successively marking parts of the uploaded string, naming them, and assigning properties to them. Tools for splitting rules, merging rule prefixes and/or suffixes, creating lists, and optimizing the resulting rules support this process. As part of the work, a single-page web application was created; it performed relatively well when tested on JSON and XML, and it was possible to create a more general grammar despite problems with a weak parser. Thanks to this work, even less experienced users can create more general grammars for their strings and use them for validation. In addition, the work provides a basis for further research in this area and is open to further improvement.
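To illustrate what "a grammar plus a validation snippet" can look like, here is a minimal sketch for a toy language of bracketed integer lists. The grammar, the function name `validate`, and the sample strings are hypothetical and are not taken from the thesis or from the output of its application; the validator is a hand-written recursive-descent check, assumed here purely for illustration.

```python
# Toy grammar in EBNF (hypothetical example):
#   list   = "[" [ number { "," number } ] "]" ;
#   number = digit { digit } ;
#   digit  = "0" | "1" | ... | "9" ;

DIGITS = set("0123456789")


def validate(text: str) -> bool:
    """Return True if text is a sentence of the toy list grammar above."""
    pos = 0

    def peek():
        # Current character, or None at end of input.
        return text[pos] if pos < len(text) else None

    def number() -> bool:
        nonlocal pos
        start = pos
        while peek() in DIGITS:
            pos += 1
        return pos > start  # at least one digit is required

    def lst() -> bool:
        nonlocal pos
        if peek() != "[":
            return False
        pos += 1
        if peek() == "]":  # empty list
            pos += 1
            return True
        if not number():
            return False
        while peek() == ",":
            pos += 1
            if not number():
                return False
        if peek() != "]":
            return False
        pos += 1
        return True

    # The whole input must be consumed for the string to be valid.
    return lst() and pos == len(text)
```

Each grammar rule maps to one small function, which is why a rule-by-rule grammar is a convenient starting point for generated validation code.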
