National Repository of Grey Literature: 19 records found (showing 1 - 10)
Intelligent Data Scraping in a Web Browser
Maštera, František ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The goal of this thesis is to extract data from web pages without knowledge of their internal structure. The approach is to recognize the structure algorithmically, using input information supplied by the user about the content to be extracted. The structure analysis is then followed by the content extraction itself. An average success rate of over 80% was achieved on selected sets of websites. The resulting algorithm represents a new approach to data extraction and can be deployed in practice or serve as a basis for further development.
Web page segmentation utilizing clustering techniques
Zelený, Jan ; Šimko, Marián (referee) ; Kliegr, Tomáš (referee) ; Zendulka, Jaroslav (advisor)
Information retrieval and other techniques for mining data from web pages are gaining importance as web technologies develop and as the amount of information stored on the web, its sole carrier, grows. Along with this information, however, the amount of content that is irrelevant in the context of the presented information grows as well. This is one of the reasons why preprocessing of information stored on the web deserves intensive attention. Segmentation algorithms are one possible means of preprocessing. This thesis deals with the use of clustering techniques both to make existing algorithms more efficient and to find entirely new algorithms usable for web page segmentation.
A Tool for Recognition and Verification of Spedition Orders
Kalivoda, Vojtěch ; Hradiš, Michal (referee) ; Herout, Adam (advisor)
The aim of this work is to design and implement a web tool that facilitates the work of dispatchers at forwarding and transport companies through automated recognition of important information in orders. Thanks to the recognition, dispatchers no longer have to retype all information manually, which saves time. Order recognition is based on finding entities in a document, representing their surroundings with vectors using word2vec models, and subsequent classification using convolutional neural networks. The tool can recognize 20 types of information in real time with an average success rate of 72.35%. As part of the work, a dataset of almost 1,700 orders was collected, of which 141 were annotated. The work also includes a web application that serves as an interface for the tool and for data collection.
Optimization of processing of documents in library of Bohuslav Martinu Institute
Váchová, Veronika ; Stöcklová, Anna (advisor) ; Kolínová, Pavlína (referee)
This thesis describes the library of the Bohuslav Martinu Institute. It first describes the history and current activities of the Institute. The main topic of the work is a decomposition of the library system of the local library and information center. It describes the static elements (the library collection in detail, the catalogues and the library equipment) and the dynamic elements (acquisition, organisation and services). The practical part contains a description of the current state of document cataloguing and a proposal for minor improvements.
Methodology for processing plan documentation
Fencl, Petr
The presented methodology addresses a clearly defined task: processing a given volume of plan documentation within a limited time frame and with limited financial resources. Because 94% of the plans are on tracing paper, the whole process can be treated as registry and archival processing in which the scanner is the material's last contact with a printing device. The methodology aims to describe the process from the collection of the matrices to their final storage in the archive. The considerations that determine the workflow, and the decision-making process itself, are very important for handling the complexity of the task. For example, archival plans are typically stored individually in a paper sleeve (pocket) with a description and identification number on the carton. In our case, 6,500 plans require 6,500 sheets of sufficient size, e.g. of 70 g/m² paper. This amount corresponds to two pallets of paper, each about 80 cm high, i.e. a stack of material 1.6 m high. Once the sheets are made into pockets, the stack is already 3.2 m high. Once the matrices are inserted and the pockets are stored without being bound on a pallet, the height triples, to almost 10 m of material.
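The storage estimate in this abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the methodology itself: the function name is hypothetical, and it assumes that each pallet is about 80 cm high, that making the sheets into pockets doubles the stack height (1.6 m to 3.2 m), and that inserting the matrices and storing the pockets loosely triples it again, as the figures in the abstract suggest.

```python
def stack_heights_cm(pallets: int, pallet_height_cm: int) -> tuple[int, int, int]:
    """Return stack heights in cm: flat sheets, empty pockets, filled pockets.

    The factors 2 and 3 are assumptions read off the abstract's own figures
    (1.6 m -> 3.2 m -> "almost 10 m"); they are not measured values.
    """
    flat_sheets = pallets * pallet_height_cm   # paper as delivered on pallets
    empty_pockets = 2 * flat_sheets            # sheets made into pockets
    filled_pockets = 3 * empty_pockets         # matrices inserted, stored loosely
    return flat_sheets, empty_pockets, filled_pockets


flat, pockets, filled = stack_heights_cm(pallets=2, pallet_height_cm=80)
print(f"flat sheets: {flat / 100} m, pockets: {pockets / 100} m, filled: {filled / 100} m")
```

Working in whole centimetres keeps the arithmetic exact; the final figure of 960 cm matches the abstract's "almost 10 m".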
Methodologies for processing personal papers with a significant share plan documentation
Šárka, Steinová
This methodological guide focuses on organizing personal papers with a significant proportion of plan documentation. It aims to provide instructions for arranging, identifying and retrieving objects recorded in the plan documentation of important landscape architects.
