National Repository of Grey Literature 12 records found  previous11 - 12  jump to record: Search took 0.01 seconds. 
ScraperWiki Tutorial
Levine, Thomas
The objective of the workshop, or better hackathon, was to get the data into a structured format, and join it with data from another sources – together with an overview and showing by example what is possible with scraping. Thomas identified targets for web scraping and navigating the complexity of different types of web pages and introduced that in a few half-hour-long and hour-long modules that catered to different audiences.
Slides: Download fulltextPDF
Data quality management in small and medium enterprises
Zelený, Pavel ; Pour, Jan (advisor) ; Novotný, Ota (referee)
This diploma thesis deals with the data quality management. There are many tools and methodologies to support the data quality management even in Czech market but they are all only for large companies. Small and middle companies can't afford them because of high cost. The first goal of this thesis is to summarize principles of the methodologies and then on the base of the methodologies to suggest more simple methodology for small and middle companies. In the second part of thesis is created and adapted the methodology for a specific company. The first step is to choose the data area of interest in the company. Because of impossibility to buy a software tool to clean data, there are defined relatively simple rules which are base source to create cleaning scripts in SQL language. The scripts are used for automatic data cleaning. On the base of next analyze is decided what data should be cleaned manually. In the next step are described recommendations how to remove duplicities from the database. There is used a functionality of the company's production system. The last step of the methodology is to create a control mechanism which have to keep the required data quality in future. At the end of thesis is made a data research in four data sources. All these sources are from companies using the same production system. The reason of research is to present the overview of data quality and to help with decision about cleaning data in the companies also.

National Repository of Grey Literature : 12 records found   previous11 - 12  jump to record:
Interested in being notified about new results for this query?
Subscribe to the RSS feed.