Original title: Datawarehouse
Authors: Ragab Negm, Hussein Mohamed Abdelhaq ; Merunka, Vojtěch (advisor) ; Martin, Martin (referee)
Document type: Master’s theses
Language: eng
Publisher: Česká zemědělská univerzita v Praze
Abstract: Firms are producing data at ever-increasing rates and finding new ways to use it to create business value. The volumes of data generated create a need for better and cheaper storage options that also allow the data to be utilized. Data warehouses have emerged as the most appropriate tool for this task. However, data warehouses come with significant costs, both human and financial. The pool of technologies for implementing data warehouses is diverse. This project aims to provide a comparative implementation using two of these technologies, namely Microsoft SQL Server and Apache Hadoop. The project covers the different phases of building a data warehouse: the requirements specification phase; the design phase, including a compact comparison between the entity-relationship and dimensional modeling design techniques and the process of building a dimensional model based on the application data sources; and the extract-transform-load phase. The two technologies are then compared in terms of data capacity, data loading, connectivity, and data querying. The project concludes that the choice between Microsoft SQL Server and Apache Hadoop is not a matter of recommending one over the other, but should be based on the needs, resources, and existing ecosystem. Hadoop would be the choice for larger amounts of data, unstructured or irregular data formats, and when licensing fees are unaffordable. On the other hand, Microsoft SQL Server would be a better choice when the data is structured, the anticipated data volumes are suitable, and the rest of the ecosystem is Microsoft-based. Future development of this project should cover new ways to make Hadoop more efficient with smaller data volumes.

Institution: Czech University of Life Sciences Prague
Document availability information: Available in the CZU repository.
Original record: https://is.czu.cz/zp/index.pl?podrobnosti_zp=202160

Permalink: http://www.nusl.cz/ntk/nusl-257745


The record appears in these collections:
Universities and colleges > Public universities > Czech University of Life Sciences Prague
Academic theses (ETDs) > Master’s theses
Record created 2016-09-21, last modified 2022-03-03


No fulltext