National Repository of Grey Literature: 84 records found (records 75 - 84)
The use of statistical methods in data mining in predicting consumer behaviour for Internet purchases
Podzimková, Michaela ; Vilikus, Ondřej (advisor) ; Berka, Petr (referee)
Data mining is a new discipline that has emerged with the growing amount of stored data and the increasing need to extract the information hidden in it. It focuses on mining potentially useful information from large data sets and lies at the intersection of statistics, machine learning, artificial intelligence, databases and other fields. The aim of this thesis is to present the data mining process, with an emphasis on its connection to statistics, and to describe a selection of statistical methods that are widely used in this field and were also applied to the data mining problem addressed in this thesis. Real data from purchases in an online store show that different methods give different results and yield interesting information about purchasing behavior, and also that not all methods are applicable to every type of task.
Explaining Anomalies with Sapling Random Forests
Pevný, T. ; Kopp, Martin
The main objective of anomaly detection algorithms is to find samples that deviate from the majority. Although a vast number of algorithms designed for this already exist, almost none of them explain why a particular sample was labelled as an anomaly. To address this issue, we propose an algorithm called Explainer, which explains how a sample differs from the rest in disjunctive normal form (DNF), a form that is easy for humans to understand. Since Explainer treats anomaly detection algorithms as black boxes, it can be applied in many domains to simplify the investigation of anomalies. The core of Explainer is a set of specifically trained trees, which we call sapling random forests. Since their training is fast and memory-efficient, the whole algorithm is lightweight and applicable to large databases, data streams, and real-time problems. The correctness of Explainer is demonstrated on a wide range of synthetic and real-world datasets.
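The abstract does not spell out how a sapling is grown, so the following is only a minimal Python sketch under stated assumptions: each hypothetical "sapling" is a shallow tree grown greedily to separate one already-flagged anomaly from a random subsample of normal samples, the root-to-leaf path isolating the anomaly is read off as a conjunction of atomic conditions, and the paths of several saplings are joined into a disjunction (DNF). The names `grow_sapling`, `explain` and `dnf_holds`, the midpoint thresholds and the greedy split rule are illustrative choices, not Explainer's actual implementation; the anomaly detector itself stays a black box.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation of an atomic condition: (feature index, "<=" or ">", threshold)
Condition = Tuple[int, str, float]
Conjunction = List[Condition]


def satisfies(x: Sequence[float], clause: Conjunction) -> bool:
    """True if sample x meets every atomic condition of the conjunction."""
    for f, op, t in clause:
        if op == "<=" and not x[f] <= t:
            return False
        if op == ">" and not x[f] > t:
            return False
    return True


def grow_sapling(anomaly: Sequence[float], normals: List[Sequence[float]],
                 max_depth: int = 3) -> Conjunction:
    """Greedily grow the root-to-leaf path that isolates `anomaly` from `normals`.

    At each level, the axis-aligned split leaving the fewest normal samples on the
    anomaly's side is chosen; its threshold is the midpoint between the anomaly's
    value and the nearest normal value on the other side.
    """
    path: Conjunction = []
    remaining = list(normals)
    for _ in range(max_depth):
        if not remaining:
            break  # the anomaly's leaf is already pure
        best = None  # (number of normals kept, condition, kept normals)
        for f, a in enumerate(anomaly):
            above = [x[f] for x in remaining if x[f] > a]
            below = [x[f] for x in remaining if x[f] < a]
            candidates = []
            if above:
                candidates.append((f, "<=", (a + min(above)) / 2))
            if below:
                candidates.append((f, ">", (a + max(below)) / 2))
            for cond in candidates:
                kept = [x for x in remaining if satisfies(x, [cond])]
                if best is None or len(kept) < best[0]:
                    best = (len(kept), cond, kept)
        if best is None or best[0] == len(remaining):
            break  # no split makes further progress
        path.append(best[1])
        remaining = best[2]
    return path


def explain(anomaly: Sequence[float], normals: List[Sequence[float]],
            n_trees: int = 5, sample_size: int = 20, max_depth: int = 3,
            seed: int = 0) -> List[Conjunction]:
    """Grow saplings on random subsamples of normals; their distinct paths form a DNF."""
    rng = random.Random(seed)
    dnf: List[Conjunction] = []
    for _ in range(n_trees):
        sample = rng.sample(normals, min(sample_size, len(normals)))
        clause = grow_sapling(anomaly, sample, max_depth)
        if clause and clause not in dnf:
            dnf.append(clause)
    return dnf


def dnf_holds(x: Sequence[float], dnf: List[Conjunction]) -> bool:
    """A DNF is satisfied if any of its conjunctions is."""
    return any(satisfies(x, clause) for clause in dnf)


# Toy usage: normal samples on a grid in [0, 0.9]^2, anomaly far out in feature 1.
normals = [(i / 10, j / 10) for i in range(10) for j in range(10)]
anomaly = (0.5, 10.0)
explanation = explain(anomaly, normals)
print(explanation)
```

Because each sapling only has to isolate a single point, growing it touches a small subsample and a handful of splits, which is in line with the abstract's point that the approach is lightweight.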
Interpreting and Clustering Outliers with Sapling Random Forests
Kopp, Martin ; Pevný, T. ; Holeňa, Martin
The main objective of outlier detection is to find samples that deviate considerably from the majority. Such outliers, often referred to as anomalies, are increasingly important because they help uncover interesting events within data. Consequently, a considerable number of statistical and data mining techniques for identifying anomalies have been proposed in the last few years, but only a few works even mention why a sample was labelled as an anomaly. We therefore propose a method based on specifically trained decision trees, called sapling random forests. Our method is able to interpret the output of an arbitrary anomaly detector. The explanation is given either as a subset of features in which the sample deviates the most, or as conjunctions of atomic conditions, which can be viewed as antecedents of logical rules that are easily understandable by humans. To simplify the investigation of suspicious samples even further, we propose two methods of clustering anomalies into groups. Such clusters can be investigated at once, saving time and human effort. The feasibility of our approach is demonstrated on several synthetic datasets and one real-world dataset.
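The abstract mentions two clustering methods without describing them. As a hypothetical illustration only (not necessarily either of the paper's methods), anomalies can be grouped by how similar their explanations are, measured here as the Jaccard similarity of the sets of features their explanations use. The explanation format (a list of conjunctions of `(feature, op, threshold)` conditions), the greedy single-pass grouping and the `min_similarity` parameter are all assumptions.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical explanation format: a DNF given as a list of conjunctions
Condition = Tuple[int, str, float]
Explanation = Sequence[Sequence[Condition]]


def feature_set(explanation: Explanation) -> FrozenSet[int]:
    """Features mentioned anywhere in an explanation."""
    return frozenset(f for clause in explanation for f, _, _ in clause)


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard similarity of two feature sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_anomalies(explanations: Dict[str, Explanation],
                      min_similarity: float = 0.5) -> List[List[str]]:
    """Greedy single-pass grouping: each anomaly joins the first cluster whose
    representative feature set is similar enough, otherwise it starts a new one."""
    clusters: List[Tuple[FrozenSet[int], List[str]]] = []
    for name, expl in explanations.items():
        fs = feature_set(expl)
        for representative, members in clusters:
            if jaccard(fs, representative) >= min_similarity:
                members.append(name)
                break
        else:
            clusters.append((fs, [name]))
    return [members for _, members in clusters]


# Hypothetical explanations of four anomalies
explanations = {
    "a1": [[(1, ">", 5.0)]],
    "a2": [[(1, ">", 4.2), (3, "<=", 0.1)]],
    "a3": [[(0, "<=", -2.0)]],
    "a4": [[(0, "<=", -1.5)], [(2, ">", 7.0)]],
}
print(cluster_anomalies(explanations))
```

A greedy single pass keeps the sketch short, but its result depends on the order of the input; a hierarchical clustering over the same similarity would avoid that.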
Statistical Expectation of High Energy Physics Data Sets Separation Algorithms
Hakl, František
The article focuses on applying the basic results of statistical learning theory known as Probably Approximately Correct (PAC) learning to the evaluation and post-processing of unique physics data obtained from particle accelerator detectors. Its aim is not to separate the measured data directly but to evaluate the appropriateness of the separation methods used. The main principles and results of PAC learning theory are briefly summarized, and the main characteristics of selected multivariate data separation algorithms are studied from the point of view of the VC dimension. Finally, based on actual data sets obtained from the Tevatron DØ experiment, some practical hints for choosing a separation method and for numerical computation are derived.
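To make the VC-dimension viewpoint concrete, here is a small Python sketch of one classical form of Vapnik's bound, not necessarily the form used in the article: with probability at least 1 - δ over m independent training samples, every classifier h from a class of VC dimension d < m satisfies R(h) <= R_emp(h) + sqrt((d (ln(2m/d) + 1) - ln(δ/4)) / m), where R is the true error rate and R_emp the training error. The function names and example values (a linear separator on 10 input variables, whose VC dimension is 11) are illustrative assumptions.

```python
import math


def vc_confidence(m: int, d: int, delta: float) -> float:
    """VC confidence term sqrt((d * (ln(2m/d) + 1) - ln(delta/4)) / m).

    m: number of training samples, d: VC dimension of the classifier family,
    delta: allowed failure probability. Requires 0 < d < m and 0 < delta < 1.
    """
    if not (0 < d < m) or not (0 < delta < 1):
        raise ValueError("require 0 < d < m and 0 < delta < 1")
    return math.sqrt((d * (math.log(2 * m / d) + 1) - math.log(delta / 4)) / m)


def risk_upper_bound(empirical_risk: float, m: int, d: int, delta: float = 0.05) -> float:
    """Upper bound on the true error that holds with probability at least 1 - delta."""
    return empirical_risk + vc_confidence(m, d, delta)


# A linear separator on n = 10 input variables has VC dimension d = n + 1 = 11.
for m in (1_000, 10_000, 100_000):
    print(m, round(risk_upper_bound(0.10, m, d=11), 3))
```

The confidence term shrinks roughly like sqrt(d ln m / m), which is why a more flexible separator (larger d) needs correspondingly more events before its training error can be trusted.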
Datamining - teorie a praxe (Data mining: theory and practice)
Popelka, Aleš ; Maryška, Miloš (advisor) ; Machač, Ivo (referee)
This thesis deals with the technology called data mining. It first describes data mining as an independent discipline, then its processing methods and most common uses. Data mining is then explained with the help of methodologies that describe all phases of the process of knowledge discovery in databases: CRISP-DM and SEMMA. The study presents data mining methods and particular algorithms: decision trees, neural networks and genetic algorithms. This theoretical introduction is followed by a practical application that searches for the causes of meningoencephalitis in a particular sample of patients. Decision trees in Clementine, one of the leading data mining tools, were used for the analysis.
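The abstract does not say which split criterion the Clementine trees used. As an assumption-labeled sketch, the entropy-based information gain of the ID3 family is shown below (C4.5 normalizes the same quantity into a gain ratio): at each node, the attribute whose split most reduces class entropy is chosen. The attribute names and toy rows are hypothetical and unrelated to the patient data in the thesis.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a class-label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy reduction obtained by splitting `labels` on a categorical attribute."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows: List[dict], attributes: List[str], target: str) -> str:
    """Attribute with the highest information gain, i.e. the next split of the tree."""
    labels = [r[target] for r in rows]
    return max(attributes, key=lambda a: information_gain([r[a] for r in rows], labels))


# Hypothetical toy rows: attr_a separates the outcome perfectly, attr_b not at all.
rows = [
    {"attr_a": "high", "attr_b": "x", "outcome": "yes"},
    {"attr_a": "high", "attr_b": "y", "outcome": "yes"},
    {"attr_a": "low", "attr_b": "x", "outcome": "no"},
    {"attr_a": "low", "attr_b": "y", "outcome": "no"},
]
print(best_attribute(rows, ["attr_a", "attr_b"], "outcome"))
```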
Using data mining to manage an enterprise
Prášil, Zdeněk ; Pour, Jan (advisor) ; Novotný, Ota (referee)
The thesis focuses on data mining and its use in enterprise management. It is divided into a theoretical and a practical part. The aim of the theoretical part was to identify: 1/ the most used data mining methods, 2/ typical application areas, 3/ typical problems solved in these application areas. The aim of the practical part was: 1/ to demonstrate the use of data mining in a small Czech e-shop to understand the structure of its sales data, 2/ to demonstrate how data mining analysis can help improve marketing results. My review of the literature found that decision trees, linear and logistic regression, neural networks, segmentation methods and association rules are the most widely used data mining methods. CRM and marketing, financial institutions, insurance and telecommunication companies, retail and manufacturing are the application areas that use data mining the most. Typical data mining tasks in these areas focus on the relationships between marketing, sales and customers, with the goal of doing better business. The analysis of the e-shop data revealed which types of goods are bought together. Based on this finding, I proposed that a strategy supporting this type of shopping is crucial for the business's success. In conclusion, I showed that data mining methods are also suitable for a small e-shop and can improve its marketing strategy.
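Finding goods that are bought together is the classic market-basket form of association rule mining. A minimal sketch, assuming pairwise rules are enough and using hypothetical baskets: the support of a pair is the fraction of baskets containing both items, and the confidence of A -> B is the fraction of baskets with A that also contain B. The thesis does not state which algorithm or thresholds it used; `pair_rules`, `min_support` and `min_confidence` are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def pair_rules(baskets: List[Iterable[str]], min_support: float = 0.3,
               min_confidence: float = 0.6) -> List[Tuple[str, str, float, float]]:
    """Return rules (antecedent, consequent, support, confidence) over item pairs."""
    n = len(baskets)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates, fix pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A pair yields two candidate rules, one in each direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


# Hypothetical e-shop baskets
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "jam"],
    ["milk"],
]
for rule in pair_rules(baskets):
    print(rule)
```

Counting all pairs directly is adequate for a small catalogue; algorithms such as Apriori prune candidate item sets when larger combinations or bigger data are involved.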
Classification of strategic plans under conditions of risk - investment decision making using decision trees
Jíchová, Romana
In my thesis I dealt with capital investment decision making, methods for classifying investments, and decision making under risk and uncertainty. The aim of the thesis was to apply mathematical methods to the selection of investment options. The main task was to show how decision trees, graphical tools that describe the actions available to the decision maker, can be used for this purpose. The practical part describes the process of constructing a decision tree on two examples: the sale of real estate and oil extraction.
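Decision trees like those in the thesis are usually evaluated by rolling them back from the leaves: a chance node is replaced by the probability-weighted average of its branches, and a decision node by its best option. The sketch below assumes a risk-neutral expected monetary value criterion (the thesis may also use other criteria for decisions under risk and uncertainty); the node classes, payoffs and probabilities are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Outcome:
    payoff: float  # terminal payoff, e.g. profit in some currency unit


@dataclass
class Chance:
    branches: List[Tuple[float, "Node"]]  # (probability, subtree)


@dataclass
class Decision:
    options: List[Tuple[str, "Node"]]  # (option label, subtree)


Node = Union[Outcome, Chance, Decision]


def rollback(node: Node) -> Tuple[float, List[str]]:
    """Return the node's expected value and the options chosen at decision nodes.

    Chance nodes average their branches by probability; decision nodes keep the
    option with the highest value. The returned path only lists decisions that
    are reached without passing through a chance node.
    """
    if isinstance(node, Outcome):
        return node.payoff, []
    if isinstance(node, Chance):
        if abs(sum(p for p, _ in node.branches) - 1.0) > 1e-9:
            raise ValueError("branch probabilities must sum to 1")
        return sum(p * rollback(child)[0] for p, child in node.branches), []
    best_label, (best_value, best_path) = max(
        ((label, rollback(child)) for label, child in node.options),
        key=lambda item: item[1][0],
    )
    return best_value, [best_label] + best_path


# Hypothetical oil-extraction decision: drill (uncertain result) or do not drill.
tree = Decision(options=[
    ("drill", Chance(branches=[(0.3, Outcome(500.0)), (0.7, Outcome(-100.0))])),
    ("do not drill", Outcome(0.0)),
])
value, plan = rollback(tree)
print(value, plan)
```

With these made-up numbers, drilling has an expected value of 0.3 × 500 + 0.7 × (-100) = 80, which beats 0 for not drilling, so the rollback recommends drilling.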
