National Repository of Grey Literature : 16 records found (showing records 11-16)
Computational tasks for solving parallel data processing
Rexa, Denis ; Uher, Václav (referee) ; Mašek, Jan (advisor)
The goal of this diploma thesis was to create four laboratory exercises for the course "Parallel Data Processing", in which students explore the options and capabilities of Apache Spark as a parallel computing platform. The work also covers the basic setup and use of the Apache Kafka technology and the NoSQL database Apache Cassandra. The other two lab assignments focus on the Travelling Salesman Problem. The first of these was designed to demonstrate the difficulty of the task, confronting the student with an exponential increase in complexity. The second consists of an optimization algorithm that solves the problem on a cluster; this algorithm is subjected to performance measurements. The thesis concludes with recommendations for optimization and a comparison of runs with different numbers of computing devices.
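
A minimal sketch (not the thesis code) of the brute-force approach the first TSP lab describes: the number of candidate tours grows as (n-1)!, so distributing the search with Spark only delays the exponential wall. The city coordinates below are made-up assumptions.

    from itertools import permutations
    from math import dist  # Python 3.8+
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tsp-brute-force").getOrCreate()
    sc = spark.sparkContext

    # Hypothetical city coordinates; any point set works.
    cities = [(0.0, 0.0), (1.0, 5.0), (4.0, 2.0), (6.0, 6.0), (3.0, 7.0)]

    def tour_length(order):
        # Close the loop: every tour starts and ends at city 0.
        path = (0,) + order + (0,)
        return sum(dist(cities[a], cities[b]) for a, b in zip(path, path[1:]))

    # Distribute all (n-1)! candidate tours across the cluster.
    tours = sc.parallelize(list(permutations(range(1, len(cities)))))
    best = tours.map(lambda t: (tour_length(t), t)).min()
    print(best)
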
Web Application for Graphical Description and Execution of Spark Tasks
Hmeľár, Jozef ; Burget, Radek (referee) ; Rychlý, Marek (advisor)
This master's thesis deals with Big Data processing in the distributed system Apache Spark, using tools that allow remote submission and execution of Spark tasks through a web interface. The author first describes the Spark environment and then focuses on the Apache Livy project, which offers a REST API for running Spark tasks. Contemporary solutions that allow interactive data analysis are surveyed. The author then presents his own application design for the interactive construction and launching of Spark tasks through a graph representation, covering both the web client and the server part of the application. The next section presents the implementation of both parts and, last but not least, a demonstration of the result on a typical task. The created application provides an intuitive interface for working comfortably with the Apache Spark environment, for creating custom components, and for a number of other options that are standard in today's web applications.
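
A minimal sketch of the Livy REST calls such an application builds on: create an interactive session, wait for it to become idle, then submit a code statement. The endpoint address is an assumption (8998 is Livy's default port).

    import time
    import requests

    LIVY = "http://localhost:8998"

    # Start an interactive PySpark session.
    session = requests.post(f"{LIVY}/sessions", json={"kind": "pyspark"}).json()
    sid = session["id"]

    # Poll until the session is idle, then submit a statement.
    while requests.get(f"{LIVY}/sessions/{sid}/state").json()["state"] != "idle":
        time.sleep(1)

    stmt = requests.post(
        f"{LIVY}/sessions/{sid}/statements",
        json={"code": "sc.parallelize(range(100)).sum()"},
    ).json()
    print(stmt["id"], stmt["state"])
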
Network Traces Analysis Using Apache Spark
Béder, Michal ; Veselý, Vladimír (referee) ; Ryšavý, Ondřej (advisor)
The aim of this thesis is to show how to design and implement an application for network trace analysis using the Apache Spark distributed system. The implementation can be divided into three parts: loading data from distributed HDFS storage, analysis of the supported network protocols, and distributed data processing. The web-based notebook Apache Zeppelin is used as the data visualization tool. The resulting application is able to analyze individual packets as well as entire flows, and it supports JSON and pcap as input data formats. The goal of the application is to enable Big Data processing; the input data format and the number of allocated cores have the greatest impact on its performance.
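
A minimal sketch of the distributed-processing step, assuming JSON flow records (one of the two input formats the thesis supports) stored in HDFS; the path and the field names (src_ip, bytes) are hypothetical, not the thesis's actual schema.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("net-traces").getOrCreate()

    # Load flow records exported as JSON from a distributed HDFS store.
    flows = spark.read.json("hdfs:///traces/flows.json")

    # Aggregate transferred bytes per source address ("top talkers").
    top_talkers = (flows.groupBy("src_ip")
                        .agg(F.sum("bytes").alias("total_bytes"))
                        .orderBy(F.desc("total_bytes")))
    top_talkers.show(10)
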
Computational tasks for Parallel data processing course
Horečný, Peter ; Rajnoha, Martin (referee) ; Mašek, Jan (advisor)
The goal of this thesis was to create laboratory exercises for the course "Parallel data processing" that introduce students to the options and capabilities of the Apache Spark technology. The exercises focus on basic operations and data preprocessing as well as on concepts and algorithms of machine learning. By following the instructions, students solve real-world problems using algorithms for linear regression, classification, clustering, and frequent pattern mining, which shows them the real usage and advantages of Spark. The input data are databases of Czech and Slovak companies containing a large amount of information, which must be prepared, filtered, and sorted for further processing in the first exercise. The students also become acquainted with functional programming, because the exercises contain not whole programs but fragments of instructions, which are not repeated in the following exercises. By working through all the exercises, they gain a comprehensive overview of the possibilities of Spark.
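
A minimal sketch of one of the listed lab topics, linear regression with Spark MLlib. The input file and column names (employees, capital, revenue) are made-up assumptions, not the Czech and Slovak company dataset the exercises actually use.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.appName("mllib-lab").getOrCreate()

    df = spark.read.csv("companies.csv", header=True, inferSchema=True)

    # Assemble numeric columns into the single feature vector MLlib expects.
    assembler = VectorAssembler(inputCols=["employees", "capital"],
                                outputCol="features")
    train = assembler.transform(df).select("features", "revenue")

    model = LinearRegression(labelCol="revenue").fit(train)
    print(model.coefficients, model.intercept)
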
Machine learning in the field of Big Data
Šimánek, Michal ; Kerol, Valeria (advisor) ; Novotný, Ota (referee)
This bachelor's thesis is devoted to machine learning in the field of Big Data. The main aim is to map and evaluate the current state of machine learning on Big Data, to select and compare the most widely used machine learning libraries for the Apache Spark tool, and to provide a guide on how to implement algorithms from the selected libraries. The theoretical part explains the concept of Big Data, the tools Apache Hadoop and Apache Spark, and machine learning itself, and describes the most widely used machine learning libraries for Apache Spark along with the comparison metrics. The practical part is oriented toward implementing algorithms from the selected libraries, writing the implementation guide, and, based on the outcomes of these implementations, comparing the selected libraries from different points of view. The contribution of this thesis is to introduce the problems of machine learning on Big Data, describe the most widely used machine learning libraries, and compare the selected ones while providing a guide on how to implement their algorithms.
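
A minimal sketch of the kind of comparison such a thesis performs: train two Spark MLlib classifiers on the same split and compare them with one evaluation metric (AUC). The input file is hypothetical and assumed to already contain a "features" vector and a binary "label" column.

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression, RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("ml-compare").getOrCreate()

    data = spark.read.parquet("prepared_data.parquet")
    train, test = data.randomSplit([0.8, 0.2], seed=42)

    evaluator = BinaryClassificationEvaluator()  # areaUnderROC by default
    for clf in (LogisticRegression(), RandomForestClassifier()):
        model = clf.fit(train)
        auc = evaluator.evaluate(model.transform(test))
        print(type(clf).__name__, auc)
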
Hadoop and Business Intelligence
Kerner, Josef ; Šperková, Lucie (advisor) ; Augustín, Jakub (referee)
The main purpose of this thesis is to describe how integrating the Hadoop platform into existing Business Intelligence technologies and processes can augment their data processing and analysis capabilities when dealing with Big Data. It also explains why the Hadoop application ecosystem was founded and informs the reader about the functionality of its primary components. It continues with an overview of the architecture of the higher-level Hadoop components and their use in existing Business Intelligence processes such as data ingestion, transformation, and analysis. The last theoretical chapter focuses on specific areas where the Hadoop platform and Big Data are utilized: data warehousing, text mining, and predictive analytics. From the practical point of view, a particular use case is provided: an implementation of a Big Data ETL process in the field of financial markets and trading, with a detailed explanation of the corresponding necessities such as the data model, the ETL code, and proposed metrics, which can be further implemented to achieve an increased return on investment.
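
A minimal sketch of a Big Data ETL step of the kind the use case describes: ingest raw trade records, aggregate them, and write a partitioned table for downstream BI queries. The paths and column names (timestamp, symbol, volume, price) are hypothetical, not the thesis's actual data model.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder.appName("trades-etl")
             .enableHiveSupport().getOrCreate())

    # Extract: raw trade records landed in HDFS as CSV.
    raw = spark.read.csv("hdfs:///landing/trades/*.csv",
                         header=True, inferSchema=True)

    # Transform: daily volume and average price per instrument.
    daily = (raw.withColumn("trade_date", F.to_date("timestamp"))
                .groupBy("trade_date", "symbol")
                .agg(F.sum("volume").alias("volume"),
                     F.avg("price").alias("avg_price")))

    # Load: partition by date so BI tools can prune their scans.
    (daily.write.mode("overwrite")
          .partitionBy("trade_date")
          .parquet("hdfs:///dw/trades_daily"))
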
