National Repository of Grey Literature
Advanced Web Crawler
Činčera, Jaroslav ; Jirák, Ota (referee) ; Trchalík, Roman (advisor)
This Master's thesis describes the design and implementation of an advanced web crawler. The crawler can be configured by the user and is designed to browse the web according to specified parameters. It can acquire and evaluate the content of web pages. It is configured by creating projects, which consist of different types of steps. The user can create a simple action such as downloading a page or submitting a form, or can build larger, more complex projects.
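To make the project-and-steps configuration concrete, the following is a minimal Python sketch, not the thesis code: the Step and Project classes, their fields, and the example URLs are assumptions chosen only to illustrate a "download page" step followed by a "form submission" step.

```python
# Illustrative sketch of a crawler project built from typed steps.
# All names and fields here are assumptions, not the thesis implementation.
from dataclasses import dataclass, field
from typing import Dict, List

import requests  # assumed HTTP client


@dataclass
class Step:
    kind: str                                # e.g. "download" or "submit_form"
    url: str
    form_data: Dict[str, str] = field(default_factory=dict)


@dataclass
class Project:
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self) -> List[str]:
        """Execute the steps in order and collect page bodies for evaluation."""
        pages = []
        with requests.Session() as session:
            for step in self.steps:
                if step.kind == "download":
                    pages.append(session.get(step.url, timeout=10).text)
                elif step.kind == "submit_form":
                    pages.append(session.post(step.url, data=step.form_data, timeout=10).text)
        return pages


# Example configuration: download a page, then submit a search form.
project = Project("demo", [
    Step("download", "https://example.org/"),
    Step("submit_form", "https://example.org/search", {"q": "crawler"}),
])
```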
Incremental Web Crawling with BUbiNG System
Ondřej, Karel ; Fajčík, Martin (referee) ; Škoda, Petr (advisor)
This bachelor's thesis deals with extending the BUbiNG system for incremental crawling. It describes the main problems of incremental web crawling and how other open-source systems approach it. As a result, BUbiNG supports re-visiting pages using two commonly used strategies: the first always re-visits a page after the same interval, while the second adjusts the interval between visits according to how frequently the page changes.
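The two re-visit strategies can be illustrated with a short sketch; this is not BUbiNG's implementation, and the interval bounds and the halving/doubling factors are assumptions.

```python
# Sketch of the two re-visit scheduling strategies from the abstract.

def fixed_interval(last_interval_s: float) -> float:
    """Strategy 1: always re-visit the page after the same interval."""
    return last_interval_s


def adaptive_interval(last_interval_s: float, page_changed: bool,
                      min_s: float = 3600, max_s: float = 30 * 86400) -> float:
    """Strategy 2: shorten the interval when the page changed since the last
    visit, lengthen it when it did not, clamped to [min_s, max_s]
    (the bounds and factors are illustrative assumptions)."""
    factor = 0.5 if page_changed else 2.0
    return min(max(last_interval_s * factor, min_s), max_s)
```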
Web API Blocking
Frandel, Martin ; Hranický, Radek (referee) ; Polčák, Libor (advisor)
The aim of this work is to capture the web APIs used on the top 1 000 000 pages of the Tranco ranking, together with their subpages, using the Web API Manager extension; to analyze and categorize the obtained data; to design a mechanism for the JShelter extension that blocks individual web APIs evaluated as tracking or advertising; and to implement and test that mechanism. In total, 2 973 276 web pages were analyzed. The captured data were aggregated with respect to web API insecurity and analyzed, and the results are described in the thesis, with some API calls being blocked up to 93.33 % of the time. I developed a method for identifying problematic APIs and, using polynomial regression, found polynomials that describe the blocking behavior towards individual web APIs and their methods. I implemented the blocking functionality in the JShelter extension and successfully tested the solution.
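As an illustration of the regression step, the sketch below fits a polynomial to made-up blocking rates with NumPy; only the fitting technique reflects the abstract, while the data points, the x-axis, and the choice of degree are assumptions.

```python
# Illustrative polynomial fit to the share of blocked calls for one API method.
# The numbers below are invented; only the technique (polynomial regression) is shown.
import numpy as np

pages_crawled = np.array([1e4, 1e5, 5e5, 1e6, 2e6, 2.97e6])
blocked_share = np.array([0.40, 0.55, 0.70, 0.80, 0.88, 0.9333])  # fraction blocked

# Fit a degree-2 polynomial describing the blocking behavior.
coeffs = np.polyfit(np.log10(pages_crawled), blocked_share, deg=2)
model = np.poly1d(coeffs)
print(model(np.log10(1.5e6)))  # predicted blocked share at 1.5 M pages
```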
Automated Collection and Structuring of Data from Web Sources
Zahradník, Roman
This diploma thesis deals with the creation of a solution for continuous data acquisition from web sources. The application automatically navigates web pages, extracts data using dedicated selectors, and then standardizes it for further processing in data mining.
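A minimal sketch of selector-driven extraction in Python follows; the selectors, field names, and example URL are assumptions, not taken from the thesis.

```python
# Sketch: download a page and pull out one value per named CSS selector,
# normalizing whitespace so downstream mining sees clean strings.
import requests
from bs4 import BeautifulSoup  # assumed HTML parser


def extract(url: str, selectors: dict) -> dict:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for name, css in selectors.items():
        node = soup.select_one(css)
        record[name] = node.get_text(strip=True) if node else None
    return record


row = extract("https://example.org/product/1",
              {"title": "h1.title", "price": "span.price"})
```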
Evaluation of the quality of IT services through the analysis of unstructured data
Zimmermann, Radim ; Vencovský, Filip (advisor) ; Karkošková, Soňa (referee)
The aim of this work is to obtain and analyze unstructured data about ISPs in the UK from the website http://www.ispreview.co.uk using the KNIME Analytics Platform. The first seven chapters describe the theoretical tools that enable me to reach this goal. In the practical part, I downloaded the data using a web crawler, analyzed the indexed search keywords, and compared two providers from the customer's perspective. The results were visualized using graphs and described. The contribution of my work takes the form of feedback and reports for business executives.
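The keyword comparison step could look roughly like the following; the thesis itself works in KNIME, so this Python fragment with placeholder review texts is only an illustration of counting and comparing keywords per provider.

```python
# Illustrative keyword-frequency comparison for two providers.
# The review texts are placeholders, not data from the thesis.
from collections import Counter
import re

reviews = {
    "provider_a": ["slow connection in the evening", "great customer support"],
    "provider_b": ["fast fibre, fair price", "support slow to respond"],
}

for provider, texts in reviews.items():
    words = re.findall(r"[a-z]+", " ".join(texts).lower())
    print(provider, Counter(words).most_common(3))
```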
Monitoring of the Internet and its benefits to business using tools from SAS Institute
Moravec, Petr ; Pour, Jan (advisor) ; Rott, Ondrej (referee)
This thesis focuses on ways of obtaining information from the World Wide Web. The introduction covers theoretical approaches to data collection; its main part is devoted to web crawlers as a means of collecting data from the Internet, followed by alternative methods such as the Google Search API. The next part of the thesis is dedicated to SAS products and their role in reporting and Internet monitoring. The SAS Intelligence Platform is presented as the company's core platform, within which concrete SAS solutions can be found; the SAS Web Crawler and Semantic Server are described as part of the SAS Content Categorization solution. While the first two parts of the thesis are theoretical, the third and closing part presents practical examples of Internet data collection, realized mainly in SAS. The practical part builds directly on the theoretical one and cannot be separated from it.
