National Repository of Grey Literature: 38 records found, displaying records 19-28.
Big Data
Bútora, Matúš ; Bartík, Vladimír (referee) ; Hruška, Tomáš (advisor)
The aim of this bachelor's thesis is to describe the Big Data domain and OLAP aggregation operations, which are then applied using the Apache Hadoop technology. Most of the work focuses on describing this technology. The last chapter covers the application of the aggregation operations and their implementation, followed by the conclusion of the work and possibilities for future development.
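To make the aggregation step concrete, here is a minimal, hypothetical sketch (not taken from the thesis) of an OLAP-style SUM-with-GROUP-BY expressed as a Hadoop MapReduce job; the CSV layout (region,product,amount) is an assumption.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SumByRegion {

  // Emits (region, amount) for every input record.
  public static class GroupMapper
      extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      ctx.write(new Text(fields[0]),
                new DoubleWritable(Double.parseDouble(fields[2])));
    }
  }

  // Sums all amounts that share the same region key.
  public static class SumReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context ctx)
        throws IOException, InterruptedException {
      double sum = 0;
      for (DoubleWritable v : values) sum += v.get();
      ctx.write(key, new DoubleWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "sum-by-region");
    job.setJarByClass(SumByRegion.class);
    job.setMapperClass(GroupMapper.class);
    job.setCombinerClass(SumReducer.class);  // safe: SUM is associative
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```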
Integration of Big Data and data warehouse
Kiška, Vladislav ; Novotný, Ota (advisor) ; Kerol, Valeria (referee)
This master's thesis deals with the problem of data integration between a Big Data platform and an enterprise data warehouse. Its main goal is to create a complete transfer system that moves data from a data warehouse to this platform using a tool suitable for the task; the system should also store and manage metadata about previous transfers. The theoretical part describes the concepts behind Big Data, gives a brief introduction to their history, and presents the factors that led to the need for this new approach. Subsequent chapters describe the main principles and attributes of these technologies and discuss the benefits of implementing them within an enterprise. The thesis also describes the technologies known as Business Intelligence, their typical use cases, and their relation to Big Data. A shorter chapter presents the main components of the Hadoop system and the most popular related applications. The practical part consists of implementing a system that executes and manages transfers from a traditional relational database, here representing a data warehouse, to a cluster of a few computers running Hadoop. This part also includes a summary of the most used applications for moving data into Hadoop and a design of the metadata database schema used to manage these transfers and to store transfer metadata.
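As an illustration of the metadata bookkeeping such a system performs, here is a hypothetical sketch of recording one finished transfer over plain JDBC; the table and column names are assumptions, not the thesis's actual schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.Instant;

public class TransferMetadataLogger {
  // Assumed DDL: CREATE TABLE transfer_log(source_table VARCHAR, rows_moved BIGINT,
  //                                        status VARCHAR, finished_at TIMESTAMP)
  public static void logTransfer(String jdbcUrl, String user, String password,
                                 String sourceTable, long rowsMoved, String status)
      throws Exception {
    String sql = "INSERT INTO transfer_log (source_table, rows_moved, status, finished_at)"
               + " VALUES (?, ?, ?, ?)";
    try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
         PreparedStatement ps = con.prepareStatement(sql)) {
      ps.setString(1, sourceTable);
      ps.setLong(2, rowsMoved);
      ps.setString(3, status);
      ps.setTimestamp(4, Timestamp.from(Instant.now()));
      ps.executeUpdate();
    }
  }
}
```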
Apache Hadoop as analytics platform
Brotánek, Jan ; Novotný, Ota (advisor) ; Kerol, Valeria (referee)
This diploma thesis focuses on integrating the Hadoop platform into an existing data warehouse architecture. The theoretical part describes the properties of Big Data together with methods and models for processing them, discusses the Hadoop framework, its components and distributions, and covers the components that enable end users, developers, and analysts to access a Hadoop cluster. The practical part presents a case study of batch data extraction from an existing data warehouse on the Oracle platform using the Sqoop tool, transformation of the data in relational structures of the Hive component, and uploading it back to the original source. Compression of data and query efficiency depending on various storage formats are also discussed. The quality and consistency of the manipulated data are checked during all phases of the process. A part of the practical section covers capturing and storing stream data: the Flume tool is used to capture the stream, and the data is then transformed with the Pig tool. The purpose of implementing the process is to move a portion of the data and its processing from the existing data warehouse to the Hadoop cluster; to that end, a process for integrating the existing data warehouse with the Hortonworks Data Platform and its components was designed.
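A typical downstream step after a Sqoop import is querying the landed data through Hive. Here is a hypothetical sketch using Hive's JDBC interface (HiveServer2); the host, credentials, and the staging table name "sales_stg" are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");   // HiveServer2 JDBC driver
    try (Connection con = DriverManager.getConnection(
             "jdbc:hive2://hadoop-master:10000/default", "hive", "");
         Statement st = con.createStatement();
         ResultSet rs = st.executeQuery(
             "SELECT region, SUM(amount) FROM sales_stg GROUP BY region")) {
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
      }
    }
  }
}
```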
Distributed Processing of IP flow Data
Krobot, Pavel ; Kořenek, Jan (referee) ; Žádník, Martin (advisor)
This thesis deals with distributed processing of IP flow data. Its main goal is to implement a software collector that can store and process huge amounts of network data. Hadoop, an open-source framework for distributed processing of large data sets based on the MapReduce paradigm, was studied for this purpose. Experiments with this system provided a comparison with current systems and revealed the framework's weaknesses. Based on this knowledge, a specification and scheme for extending the current software collector were created. Following this scheme, a query framework was implemented for the proposed collector, which is considered the most critical part in the field of distributed processing of IP flow data. Experiments with this implementation show significant performance growth and linear scalability for some types of queries.
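A hypothetical sketch of the kind of query such a framework answers: total bytes per source IP, restricted to one destination port. The CSV layout (src_ip,dst_ip,dst_port,bytes) is an assumption about the flow records, not the collector's actual format.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BytesPerSource {

  public static class FlowMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] f = value.toString().split(",");
      if (f[2].equals("53"))  // keep only DNS flows (example filter)
        ctx.write(new Text(f[0]), new LongWritable(Long.parseLong(f[3])));
    }
  }

  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable v : values) total += v.get();
      ctx.write(key, new LongWritable(total));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "bytes-per-source");
    job.setJarByClass(BytesPerSource.class);
    job.setMapperClass(FlowMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```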
Processing and Visualization of Military Sensor Data
Boychuk, Maksym ; Burget, Radek (referee) ; Rychlý, Marek (advisor)
This thesis deals with creating, visualizing, and processing data in a military environment. The task is to design and implement a system that enables the creation, visualization, and processing of ESM data. The result is the ESMBD application, which supports both a classical approach, i.e. a relational database, and Big Data technologies for data storage and manipulation. A comparison of data processing speed between the classical approach (a PostgreSQL database) and Big Data technologies (the Cassandra database and Hadoop) was carried out as well.
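A hypothetical sketch of the relational half of such a speed comparison: timing one query against PostgreSQL over JDBC. The Cassandra and Hadoop variants would be timed the same way with their respective clients; the database name, table, and credentials below are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PostgresTiming {
  public static void main(String[] args) throws Exception {
    long t0 = System.nanoTime();
    try (Connection con = DriverManager.getConnection(
             "jdbc:postgresql://localhost:5432/esm", "esm", "secret");
         Statement st = con.createStatement();
         ResultSet rs = st.executeQuery(
             "SELECT sensor_id, COUNT(*) FROM esm_records GROUP BY sensor_id")) {
      long rows = 0;
      while (rs.next()) rows++;  // force full result consumption before stopping the clock
      long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
      System.out.println(rows + " groups in " + elapsedMs + " ms");
    }
  }
}
```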
Scalable machine learning using Hadoop and Mahout tools
Kryške, Lukáš ; Atassi, Hicham (referee) ; Burget, Radim (advisor)
This bachelor's thesis compares several tools for building a scalable machine learning platform and describes their advantages and disadvantages. It also practically demonstrates the functionality of such a platform built on the Apache Hadoop and Apache Mahout tools and measures the performance of the K-Means algorithm on a total of five computing nodes.
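To make the measured algorithm concrete, here is a hypothetical single-node K-Means sketch in plain Java; Mahout distributes essentially this assign/update loop as iterated MapReduce jobs across the cluster's nodes.

```java
import java.util.Arrays;
import java.util.Random;

public class KMeansSketch {
  /** Runs K-Means on 2-D points for a fixed number of iterations; returns final centroids. */
  static double[][] kmeans(double[][] points, int k, int iterations) {
    Random rnd = new Random(42);
    double[][] centroids = new double[k][];
    for (int i = 0; i < k; i++)               // seed centroids from random points
      centroids[i] = points[rnd.nextInt(points.length)].clone();

    for (int iter = 0; iter < iterations; iter++) {
      double[][] sums = new double[k][2];
      int[] counts = new int[k];
      for (double[] p : points) {             // assignment step: nearest centroid
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < k; c++) {
          double dx = p[0] - centroids[c][0], dy = p[1] - centroids[c][1];
          double d = dx * dx + dy * dy;
          if (d < bestDist) { bestDist = d; best = c; }
        }
        sums[best][0] += p[0]; sums[best][1] += p[1]; counts[best]++;
      }
      for (int c = 0; c < k; c++)             // update step: recompute means
        if (counts[c] > 0)
          centroids[c] = new double[] { sums[c][0] / counts[c], sums[c][1] / counts[c] };
    }
    return centroids;
  }

  public static void main(String[] args) {
    double[][] pts = { {1, 1}, {1.2, 0.8}, {8, 8}, {8.1, 7.9} };
    System.out.println(Arrays.deepToString(kmeans(pts, 2, 10)));
  }
}
```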
Scalable preprocessing of data using Hadoop tool
Marinič, Michal ; Šmirg, Ondřej (referee) ; Burget, Radim (advisor)
The thesis is concerned with scalable pre-processing of data using Hadoop, a tool for processing large volumes of data. The theoretical part explains the functioning and structure of the basic elements of the Hadoop distributed file system and the MapReduce method of parallel processing. The practical part describes the implementation of a basic Hadoop cluster in pseudo-distributed mode, suitable for easy program debugging, and an implementation of a Hadoop cluster in fully distributed mode that simulates practical deployment.
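In pseudo-distributed mode all daemons run on one host, so a client only needs to point at localhost. A hypothetical smoke-test sketch is below; the port 9000 matches a common fs.defaultFS value from core-site.xml but is an assumption about the cluster's configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PseudoDistributedClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://localhost:9000");  // single-host HDFS NameNode
    try (FileSystem fs = FileSystem.get(conf)) {
      fs.mkdirs(new Path("/user/demo/input"));          // simple reachability check
      System.out.println("HDFS reachable at " + fs.getUri());
    }
  }
}
```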
Optimization of the Hadoop Platform for Distributed Computation
Čecho, Jaroslav ; Smrčka, Aleš (referee) ; Letko, Zdeněk (advisor)
This thesis focuses on improving the Apache Hadoop framework by offloading some computation to a graphics card using the NVIDIA CUDA technology. The Apache Hadoop software library is a framework that allows the distributed processing of large data sets across clusters of computers using a simple programming model called MapReduce. NVIDIA CUDA is a platform that allows one to use a graphics card for general-purpose computation. The thesis contains a description and experimental implementations of computations inside the Hadoop framework that can benefit from being executed on a graphics card.
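A hypothetical sketch of the offloading pattern only: a mapper that buffers its numeric inputs and hands the whole batch to native code (where a CUDA kernel would run) through JNI. The library name "cudakernels" and the native method are assumptions; the thesis's actual experiments differ.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GpuOffloadMapper
    extends Mapper<LongWritable, Text, NullWritable, DoubleWritable> {

  static { System.loadLibrary("cudakernels"); }  // libcudakernels.so, built separately

  // Implemented in C/CUDA behind a JNI binding (assumed, not shown here).
  private static native double[] gpuTransform(double[] batch);

  private final List<Double> buffer = new ArrayList<>();

  @Override
  protected void map(LongWritable key, Text value, Context ctx) {
    buffer.add(Double.parseDouble(value.toString()));  // accumulate one batch per task
  }

  @Override
  protected void cleanup(Context ctx) throws IOException, InterruptedException {
    double[] batch = new double[buffer.size()];
    for (int i = 0; i < batch.length; i++) batch[i] = buffer.get(i);
    for (double r : gpuTransform(batch))               // one GPU call per map task
      ctx.write(NullWritable.get(), new DoubleWritable(r));
  }
}
```

Batching in cleanup() amortizes the host-to-device transfer cost, which is the usual bottleneck when pairing Hadoop tasks with a GPU.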
Implementation of Regular Expression Grouping in MapReduce Paradigm
Šafář, Martin ; Dvořák, Milan (referee) ; Kaštil, Jan (advisor)
The greatest contribution of this thesis is the design and implementation of a program that uses the MapReduce paradigm and Apache Hadoop to accelerate regular expression grouping. The thesis also describes the algorithms used for regular expression grouping and proposes some improvements to them. Experiments carried out in this thesis show that a cluster of 20 computers can speed up the grouping tenfold.
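One hypothetical way to phrase regex grouping as a MapReduce job is sketched below. The grouping heuristic here (bucketing expressions by their leading literal prefix) is a deliberately simple stand-in for the thesis's algorithms; the reducer then unions each bucket into a single alternation pattern.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RegexGrouping {

  public static class PrefixMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String regex = value.toString().trim();
      StringBuilder prefix = new StringBuilder();   // leading literal characters
      for (char c : regex.toCharArray()) {
        if (Character.isLetterOrDigit(c)) prefix.append(c); else break;
      }
      ctx.write(new Text(prefix.length() > 0 ? prefix.toString() : "_misc"),
                new Text(regex));
    }
  }

  public static class UnionReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      StringBuilder joined = new StringBuilder();   // builds (r1)|(r2)|...
      for (Text v : values) {
        if (joined.length() > 0) joined.append('|');
        joined.append('(').append(v).append(')');
      }
      ctx.write(key, new Text(joined.toString()));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "regex-grouping");
    job.setJarByClass(RegexGrouping.class);
    job.setMapperClass(PrefixMapper.class);
    job.setReducerClass(UnionReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```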
Hadoop and Business Intelligence
Kerner, Josef ; Šperková, Lucie (advisor) ; Augustín, Jakub (referee)
The main purpose of this thesis is to describe how integrating the Hadoop platform into existing Business Intelligence technologies and processes can augment their data processing and analysis capabilities when facing Big Data. It further describes the reasons the Hadoop application ecosystem was founded and introduces the functionality of its primary components. It continues with an overview of the architecture of Hadoop's higher-level components and their use in existing Business Intelligence processes such as data ingestion, transformation, and analysis. The last theoretical chapter describes specific areas where the Hadoop platform and Big Data are used in data warehousing, text mining, and predictive analytics. The practical part provides a particular use case: an implementation of a Big Data ETL process in the field of financial markets and trading, with a detailed explanation of the corresponding necessities such as the data model, the ETL code, and proposed metrics, which can be further implemented to achieve an increased return on investment.
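As a hypothetical example of one metric such an ETL pipeline could emit, here is a volume-weighted average price (VWAP) over one day's trades for a single instrument; the metric choice is illustrative, as the thesis proposes its own metrics.

```java
public final class Metrics {
  /** VWAP = sum(price_i * volume_i) / sum(volume_i). */
  public static double vwap(double[] prices, long[] volumes) {
    double notional = 0;
    long totalVolume = 0;
    for (int i = 0; i < prices.length; i++) {
      notional += prices[i] * volumes[i];
      totalVolume += volumes[i];
    }
    return totalVolume == 0 ? Double.NaN : notional / totalVolume;
  }
}
```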
