keywords:"web scraping" - Search Results - Digital Repository


	System for Recognizing Disinformation in Web Environment Večerka, Lukáš ; Žádník, Martin (referee) ; Strnadel, Josef (advisor) This work deals with the design, implementation, and verification of a system for automatic recognition of disinformation on the web. It addresses the spread of disinformation in the online environment and its impact on society. It focuses on training several Czech transformer language models for disinformation recognition, and on the automatic extraction of content from Czech online newspapers and its analysis using text classification and natural language processing through deep learning methods. The results of these analyses are then presented in a web user interface with the aim of providing a platform for verifying articles, authors, and sources. The interface could also be used by experts to annotate data for the continuous improvement of the language models.
	Integration of Web Data Sources to Information Systems Hrubý, Erik ; Zaklová, Kristýna (referee) ; Burget, Radek (advisor) The goal of this work is to create a library for integrating data from web resources, such as HTML documents, into information systems. The library is implemented in Java, and a programmer can use it to map data from an HTML document quickly and easily to Java data structures (objects), which can then be used freely in an information system running on the Java platform. The programmer supplies the library with their own implementation, in which annotations describe how the given values should be located using a CSS selector or an XPath expression. The Jsoup library is used to download the web document.
	Data Mining Based Web Analyzer of Job Advertisements Wittner, Alex ; Dzurenda, Petr (referee) ; Sikora, Marek (advisor) The goal of this bachelor's thesis was to automate the submission of new job advertisements by entering a URL in the existing web application https://rewire.informacni-bezpecnost.cz, whose purpose is to collect job advertisements in the field of cybersecurity together with a detailed analysis of job competencies. The job advertisements are analyzed using the Aho-Corasick multi-pattern search algorithm, written in Java. A Python script using the Selenium library obtains information from the submitted job advertisements. The resulting implementation and web page are built with PHP and the ReactJS library, which uses JavaScript.
	Automatic Webpage Reconstruction Serečun, Viliam ; Ryšavý, Ondřej (referee) ; Veselý, Vladimír (advisor) Many legal institutions require a burden of proof regarding web content. This thesis deals with a problem connected to web reconstruction and archiving. The primary goal is to provide an open source solution that satisfies the requirements of legal institutions. This work presents two main products. The first is a framework that serves as a fundamental building block for developing web scraping and web archiving applications. The second is a web application prototype that demonstrates how the framework is used. The application output is a MAFF archive file, which comprises a reconstructed web page, a web page screenshot, and a meta information table. This table shows information about the collected data, server information such as the IP addresses and ports of the device where the original web page is located, and a timestamp.
	System for Web Data Source Integration Kolečkář, David ; Bartík, Vladimír (referee) ; Burget, Radek (advisor) The thesis aims to design and implement a web application for the integration of web data sources. For data integration, a method using a domain model of the target information system was applied. The work describes the individual methods used for extracting information from web pages, as well as the process of designing the system architecture, including the chosen technologies and tools. The main part of the work is the implementation and testing of the final web application, which is written in Java and the Angular framework. The outcome is a web application that allows its users to define web data sources and save data in the target database.
	Sentiment Analysis of Czech and Slovak Social Networks and Web Discussions Sojka, Matěj ; Dočekal, Martin (referee) ; Smrž, Pavel (advisor) Thanks to digitalization, the spread of opinions in the population has accelerated sharply in recent years; however, the need to understand them has not changed. The goal of this thesis was to create a system for automatic data collection from social media and web discussions and for sentiment analysis in the Czech and Slovak languages. The system has a web interface for visualizing results and configuring data analysis. It can offer the user topics that it considers to occur in the selected data, and it can group posts based on user-defined opinions.
	Relationship between Changes in Betting Odds and Results of Football Matches Jurkovič, Juraj ; Bartík, Vladimír (referee) ; Zendulka, Jaroslav (advisor) The goal of this thesis is to demonstrate techniques for solving web scraping and knowledge discovery tasks. The case study focuses on extracting data from bookmaker websites and subsequently analyzing the collected data. The thesis demonstrates the implementation of a web scraping task in Python, describes selected implementation details for developing such a system, and proposes a database schema that can be used for this purpose. The collected data is analyzed using statistical methods, and frequent patterns in odds movements are discovered using the Apriori algorithm. The discovered relationships and frequent patterns are presented to the end user.
	Sentiment Analysis of Czech Social Networks and Web Discussions on Retail Chains Bolješik, Michal ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor) The goal of this thesis is to design and implement a system that analyzes data from the web mentioning Czech grocery chain stores. The implemented system can download such data automatically, perform sentiment analysis on it, extract locations and chain store names from it, and index it. The system also includes a user interface showing the results of the analyses. The first part of the thesis surveys the state of the art in collecting data from the web, sentiment analysis, and document indexing. A description of the system's design and implementation follows. The last part of the thesis evaluates the implemented system.
	Methods of Data Extraction from the Web Perina, Lukáš ; Křivka, Zbyněk (referee) ; Burget, Radek (advisor) The purpose of this bachelor thesis is to design the architecture of, and subsequently implement, an application for data extraction (web scraping) from web documents. Unlike conventional methods, the extraction is based on defining the data types and regular expressions of the requested elements. As a result, it is not necessary to know the detailed structure of a given web document, and a single definition can be used to detect the requested elements on different web pages. The algorithm achieves an overall accuracy of 85.51% and a recall of 80.28%. This approach can significantly reduce the time required to analyze web pages, and the structure of the code need not be a determining factor when creating web scraping requests.
	Identification of Cryptocurrency Users Zrnčík, Henrich ; Matoušek, Petr (referee) ; Veselý, Vladimír (advisor) 3 January 2009 is considered to be the beginning of the cryptocurrency era. This work deals with one way of obtaining information about cryptocurrency users: retrieving it from public websites, where users often knowingly or unknowingly publish their cryptocurrency address for various purposes (donation accounts or online payments that include personal data), which makes it possible to link their crypto account with their real identity. The information is obtained through web scraping. The result is an implemented application capable of automatically collecting data about the identity of cryptocurrency users and storing it in a structured database.
