National Repository of Grey Literature: 7 records found
Computational tasks for Parallel data processing course
Horečný, Peter ; Rajnoha, Martin (referee) ; Mašek, Jan (advisor)
The goal of this thesis was to create laboratory exercises for the course "Parallel data processing" that introduce students to the options and capabilities of Apache Spark. The exercises focus on basic operations and data preprocessing and on the concepts and algorithms of machine learning. By following the instructions, students solve real-world problems using algorithms for linear regression, classification, clustering and frequent pattern mining, which shows them the practical use and advantages of Spark. The input data are databases of Czech and Slovak companies containing a large amount of information, which must be prepared, filtered and sorted in the first exercise for further processing. The students also become familiar with functional programming, because the exercises contain not whole programs but only fragments of instructions, which are not repeated in later exercises. By completing all the exercises, they gain a comprehensive overview of the possibilities of Spark.
Association Rules Mining
Dvořák, Michal ; Chmelař, Petr (referee) ; Stryka, Lukáš (advisor)
The main goal of this bachelor's thesis is the design and implementation of an application that compares the performance and time consumption of selected algorithms for mining frequent itemsets and association rules. The algorithms Apriori, AprioriTIDList and AprioriItemSet and a method using an FP-tree were chosen for demonstration. The tests were run over various amounts of data and with different minimum support and confidence values. The application was implemented in the object-oriented language C#, with a relational database provided by MS SQL Server 2008 as the data source.
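The minimum support and confidence thresholds mentioned in this abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis (which was implemented in C#); the function names and the toy transactions are hypothetical and only show how the two measures are usually defined.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical toy data, for illustration only.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

print(support({"bread", "milk"}, transactions))       # 2 of 4 transactions
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread transactions
```

A rule such as bread → milk would be reported only if both values reach the chosen minimum thresholds.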
Knowledge Discovery over Data Warehouses
Pumprla, Ondřej ; Chmelař, Petr (referee) ; Stryka, Lukáš (advisor)
This Master's thesis deals with the principles of the data mining process, especially the mining of association rules. It lays out the theoretical background, including a general description and the principles of data warehouse creation. On the basis of this theory, an application for association rule mining is implemented. The application requires data in transactional form or multidimensional data organized in a star schema. The implemented algorithms for finding frequent patterns are Apriori and FP-tree. The system allows the parameters of the mining process to be set in various ways. Validation tests and efficiency evaluations were also carried out. In terms of support for association rule search, the resulting application is more applicable and robust than the existing systems it was compared with, SAS Miner and Oracle Data Miner.
Frequent Pattern Discovery in a Data Stream
Dvořák, Michal ; Hlosta, Martin (referee) ; Zendulka, Jaroslav (advisor)
Frequent-pattern mining from databases has been widely studied. Unfortunately, the algorithms designed for databases are not suitable for data stream processing. In frequent-pattern mining from data streams, it is important to manage both the sets of items and their history: not only the history of frequent itemsets, but also the history of potentially frequent itemsets that can become frequent later. This requires more memory and computational power. This thesis describes two algorithms, Lossy Counting and FP-stream. An effective implementation of these algorithms in C# is an integral part of the thesis, and the two algorithms are also compared.
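As a rough illustration of why stream mining keeps potentially frequent entries alongside frequent ones, the following Python sketch follows the commonly described single-item form of Lossy Counting. It is an assumption-laden sketch, not the thesis's C# implementation: it counts individual items rather than itemsets, and the names `lossy_count` and `frequent` are hypothetical.

```python
import math


def lossy_count(stream, epsilon):
    """Approximate item counts over a stream with error at most epsilon * n."""
    width = math.ceil(1 / epsilon)   # bucket width
    counts = {}                      # item -> [frequency, max undercount]
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)
        if item in counts:
            counts[item][0] += 1
        else:
            # A new entry may have been missed in up to bucket - 1 earlier buckets.
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
            for k in stale:
                del counts[k]
    return counts, n


def frequent(counts, n, min_support, epsilon):
    """Items whose recorded frequency is at least (min_support - epsilon) * n."""
    return {k for k, (f, _) in counts.items() if f >= (min_support - epsilon) * n}


# Hypothetical stream: 'a' is 50 %, 'b' 30 %, 'c' and 'd' 10 % each.
stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "b"] * 10
counts, n = lossy_count(stream, epsilon=0.1)
print(frequent(counts, n, min_support=0.2, epsilon=0.1))
```

Rare items are pruned at bucket boundaries, which bounds memory, while each surviving entry records how much its count may have been underestimated.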
