National Repository of Grey Literature : 24 records found (records 5 - 14). Search took 0.01 seconds.
Implementation of Regular Expression Grouping in MapReduce Paradigm
Šafář, Martin ; Dvořák, Milan (referee) ; Kaštil, Jan (advisor)
The main contribution of this thesis is the design and implementation of a program that uses the MapReduce paradigm and Apache Hadoop to accelerate regular expression grouping. The thesis also describes the algorithms used for regular expression grouping and proposes several improvements to them. Experiments carried out in this thesis show that a cluster of 20 computers can speed up the grouping tenfold.
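The map/reduce shape of such a grouping computation can be sketched as follows. This is an illustrative sketch only: the similarity key used here (the set of literal characters in each pattern) is a hypothetical stand-in, not the grouping heuristic from the thesis, and the function names are invented for the example.

```python
import re
from collections import defaultdict

def map_phase(patterns):
    # Map: emit (similarity key, pattern). The key here is a toy
    # heuristic -- the sorted set of literal characters left after
    # stripping common regex metacharacters.
    for p in patterns:
        key = "".join(sorted(set(re.sub(r"[\\.\[\]()*+?|^$]", "", p))))
        yield key, p

def reduce_phase(pairs):
    # Reduce: collect all patterns sharing a key into one group.
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    return dict(groups)

groups = reduce_phase(map_phase(["abc*", "a.b.c", "xy+z"]))
```

In a real Hadoop job the map and reduce phases run on separate cluster nodes, with the framework shuffling pairs by key between them; the sketch runs both phases in one process.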
Big Data
Bútora, Matúš ; Bartík, Vladimír (referee) ; Hruška, Tomáš (advisor)
The aim of this bachelor thesis is to describe Big Data and OLAP aggregate operations, which are applied using Apache Hadoop technology. Most of the work focuses on describing this technology. The last chapter covers the application of the aggregate operations and their implementation, followed by the conclusion of the work and possibilities for future development.
Application for Big Data
Blaho, Matúš ; Bartík, Vladimír (referee) ; Hruška, Tomáš (advisor)
This work deals with the description and analysis of the Big Data concept, its processing, and its use in decision support. The suggested processing is based on the MapReduce model designed for Big Data processing. The theoretical part of the work is largely devoted to the Hadoop system, which implements this model; understanding it is key to properly designing applications that run within it. The work also contains designs for specific Big Data processing applications. The implementation part of the thesis describes Hadoop system administration, the implementation of the MapReduce applications, and their testing over data sets.
Scalable preprocessing of data using Hadoop tool
Marinič, Michal ; Šmirg, Ondřej (referee) ; Burget, Radim (advisor)
The thesis is concerned with scalable pre-processing of data using Hadoop, a tool for processing large volumes of data. The first, theoretical part explains the function and structure of the basic elements of the Hadoop Distributed File System and of the MapReduce method of parallel processing. The latter, practical part describes the implementation of a basic Hadoop cluster in pseudo-distributed mode for easy program debugging, and also the implementation of a Hadoop cluster in fully distributed mode for simulating practical deployment.
Optimization of the Hadoop Platform for Distributed Computation
Čecho, Jaroslav ; Smrčka, Aleš (referee) ; Letko, Zdeněk (advisor)
This thesis focuses on improving the Apache Hadoop framework by offloading some computation to a graphics card using NVIDIA CUDA technology. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model called MapReduce. NVIDIA CUDA is a platform that allows one to use a graphics card for general-purpose computation. This thesis contains a description and experimental implementations of suitable computations inside the Hadoop framework that can benefit from being executed on a graphics card.
Efficient kNN classification of malware from HTTPS data
Maroušek, Jakub ; Lokoč, Jakub (advisor) ; Galamboš, Leo (referee)
An important task of Network Intrusion Detection Systems (NIDS) is to detect malicious communication in computer network traffic. Traditional detection approaches, which analyze the content of network packets, are becoming insufficient with the increased usage of the encrypted HTTPS protocol. Previous research shows, however, that high-level properties of HTTPS communication, such as the duration of a request or the number of bytes sent and received between client and server, may be successfully used to detect behavioral patterns of malware activity. We study approximate k-NN similarity joins as one method of building a classifier that recognizes malicious communication. Three MapReduce-based and one centralized approximate k-NN join methods are reimplemented in order to support large volumes of high-dimensional data. Finally, we thoroughly evaluate all methods on different datasets containing vectors of up to 1000 dimensions and compare multiple aspects of each approach concerning scalability, approximation precision, and classification precision.
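The core classification step can be illustrated with a minimal exact (centralized) k-NN with majority voting. This sketch uses toy two-dimensional vectors and invented labels; the thesis works with high-dimensional HTTPS feature vectors and approximate MapReduce-based joins, which this example omits.

```python
import math
from collections import Counter

def knn_classify(query, data, labels, k=3):
    # Rank all training vectors by Euclidean distance to the query,
    # then take a majority vote among the k nearest neighbours.
    nearest = sorted(range(len(data)),
                     key=lambda i: math.dist(query, data[i]))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

label = knn_classify((5, 6),
                     [(0, 0), (0, 1), (5, 5), (6, 5)],
                     ["benign", "benign", "malware", "malware"])
```

The exact variant scales quadratically with the dataset size, which is precisely why the thesis turns to approximate, distributed k-NN joins for large volumes of traffic.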
Distributed Processing of IP flow Data
Krobot, Pavel ; Kořenek, Jan (referee) ; Žádník, Martin (advisor)
This thesis deals with the distributed processing of IP flow data. Its main goal is to provide an implementation of a software collector that allows storing and processing huge amounts of network data. Hadoop, an open-source framework for the distributed processing of large data sets based on the MapReduce paradigm, was studied for this purpose. Experiments with this system provided a comparison with current systems and revealed weaknesses of the framework. Based on this knowledge, a specification and scheme for an extension of the current software collector were created within this work. Following this scheme, a query framework for the collector was implemented, which is considered the most critical component in the distributed processing of IP flow data. Results of experiments with the created implementation show significant performance growth and the ability to scale linearly for some types of queries.
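A typical flow query of the kind such a collector answers is an aggregation over flow records, which maps naturally onto MapReduce. The sketch below sums bytes per source address; the record layout and function names are hypothetical, chosen only to show the shape of the computation, not the thesis's actual query framework.

```python
from collections import defaultdict

def map_flows(flows):
    # Map: emit (source IP, byte count) for each flow record,
    # where a record is a (src_ip, dst_ip, bytes) tuple.
    for src, _dst, nbytes in flows:
        yield src, nbytes

def reduce_sums(pairs):
    # Reduce: sum the byte counts per source IP.
    totals = defaultdict(int)
    for key, val in pairs:
        totals[key] += val
    return dict(totals)

flows = [("10.0.0.1", "10.0.0.9", 100),
         ("10.0.0.1", "10.0.0.8", 50),
         ("10.0.0.2", "10.0.0.9", 10)]
totals = reduce_sums(map_flows(flows))
```

Queries of this shape shard well: each mapper processes a disjoint slice of the flow archive, which is what makes near-linear scaling plausible for such aggregations.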
