National Repository of Grey Literature: 204 records found, showing records 194 - 203
Interpreting and Clustering Outliers with Sapling Random Forests
Kopp, Martin ; Pevný, T. ; Holeňa, Martin
The main objective of outlier detection is to find samples that deviate considerably from the majority. Such outliers, often referred to as anomalies, are increasingly important because they help to uncover interesting events within data. Consequently, a considerable number of statistical and data mining techniques for identifying anomalies have been proposed in the last few years, but only a few works even mention why a given sample was labelled as an anomaly. We therefore propose a method based on specifically trained decision trees, called sapling random forest. Our method is able to interpret the output of an arbitrary anomaly detector. The explanation is given either as a subset of features in which the sample deviates most, or as conjunctions of atomic conditions, which can be viewed as antecedents of logical rules easily understandable by humans. To simplify the investigation of suspicious samples even further, we propose two methods of clustering anomalies into groups. Such clusters can be investigated at once, saving time and human effort. The feasibility of our approach is demonstrated on several synthetic datasets and one real-world dataset.
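The idea of explaining an anomaly as a conjunction of atomic conditions can be illustrated with a minimal sketch. This is not the authors' sapling random forest: it grows only a single greedy tree path that separates one anomalous sample from a set of normal samples, and the function name `explain_anomaly`, the `max_depth` parameter and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple

# One atomic condition: (feature index, "<=" or ">", threshold).
Condition = Tuple[int, str, float]


def explain_anomaly(anomaly: List[float],
                    normals: List[List[float]],
                    max_depth: int = 3) -> List[Condition]:
    """Greedily build one tree path that isolates `anomaly` from `normals`.

    Hypothetical helper: returns the conjunction of conditions that the
    anomaly satisfies and that together exclude as many normals as possible.
    """
    remaining = list(normals)
    rule: List[Condition] = []
    for _ in range(max_depth):
        if not remaining:
            break  # the anomaly is already isolated
        best = None  # (number of normals excluded, condition)
        for j, value in enumerate(anomaly):
            below = [x[j] for x in remaining if x[j] < value]
            above = [x[j] for x in remaining if x[j] > value]
            if below:
                # Threshold halfway between the anomaly and the nearest smaller normal.
                t = (max(below) + value) / 2
                if best is None or len(below) > best[0]:
                    best = (len(below), (j, ">", t))
            if above:
                # Threshold halfway between the anomaly and the nearest larger normal.
                t = (min(above) + value) / 2
                if best is None or len(above) > best[0]:
                    best = (len(above), (j, "<=", t))
        if best is None:
            break  # no feature distinguishes the anomaly from the rest
        j, op, t = best[1]
        rule.append((j, op, t))
        # Keep only the normals that fall on the anomaly's side of the split.
        if op == ">":
            remaining = [x for x in remaining if x[j] > t]
        else:
            remaining = [x for x in remaining if x[j] <= t]
    return rule


# Toy data (made up): the sample deviates mainly in feature 1.
normals = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
rule = explain_anomaly([1.5, 9.0], normals)
print(rule)  # [(1, '>', 5.5)]
```

The set of feature indices that appear in the returned rule corresponds to the "subset of features" form of the explanation, while the rule itself corresponds to the conjunction form.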
Repository for results of association rules data mining tasks in SEWEBAR project
Marek, Tomáš ; Šimůnek, Milan (advisor) ; Svátek, Vojtěch (referee)
This diploma thesis covers the design and implementation of the I:ZI Repository application. I:ZI Repository manages a repository of data mining tasks and their results and provides functions for searching that repository. It is a REST API built on top of Java EE technology; a Berkeley XML database is used for storing data mining tasks. I:ZI Repository was created on the basis of the XQuery search application. It has a completely new structure compared to the XQuery search application, while retaining all of its functionality. I:ZI Repository adds support for more general search queries, fuzzy approaches to searching, and clustering of search results. The implementation also includes enhanced logging of application activity, aimed at incoming search queries and outgoing search results. Results of application testing are included as well.
Business Intelligence principles and their use in questionnaire investigation
Hanuš, Václav ; Maryška, Miloš (advisor) ; Novotný, Ota (referee)
This thesis focuses on the practical use of data mining and business intelligence tools. The main goals are to process source data into a suitable form and to try out a chosen tool on test cases. As input data I used a database created by processing forms from a survey that verified the level of IT and economics knowledge at Czech universities. The data were modified into a form that allows processing with the data mining tools included in Microsoft SQL Server 2008. I chose two cases to verify the potential of these tools. The first case focused on clustering with the Microsoft Clustering algorithm. The main task was to sort the universities into clusters by comparing their attributes, namely the number of credits in each knowledge group. I had to deal with two problems. First, it was necessary to reduce the number of subject groups; otherwise there was a danger of creating too many clusters that I could not name. Second, the credit values were unequal across groups, which caused a problem with the weights of these groups. The solution was in the end quite simple: I merged similar groups into larger ones with a more general category, and for the unequal values I used a parameter for each new group and transformed it to a scale of 0-5. The second case focused on a prediction task using the Microsoft Logistic Regression and Microsoft Neural Network algorithms. The goal was to predict the number of currently enrolled students. I had historical data from the years 2001-2009; a predictive model was built from them and I could compare its predictions with the real data. Here, too, the source data had to be transformed, otherwise the tested tool could not process them. The original data were stored in a view instead of a table and contained not only the desired objects but other types as well, for example broken down by sex. The solution was to create a new table in the database containing only the objects relevant to the test case. The last problem came up when I tried to use the prediction model to predict data for 2010, for which there were no real data in the table. The software reported an error and could not make a prediction. While searching Microsoft technical support I found some threads that refer to a similar problem, so it is possible that this is a system error which will be fixed in a forthcoming update. Completing these cases gave me enough clues to assess the abilities of these Microsoft tools. Given my earlier school experience with data mining tools from IBM (formerly SPSS) and SAS, I can judge whether the tested tools can match the software from the major data mining suppliers on the market and whether they can be used for serious deployment.
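The transformation of unequal credit values to a 0-5 scale mentioned in the abstract could look roughly like the sketch below. The abstract does not say how the scale was computed, so min-max rescaling is an assumption here, and the function name `rescale_to_0_5` and the sample credit totals are hypothetical.

```python
from typing import Dict


def rescale_to_0_5(credits: Dict[str, float]) -> Dict[str, float]:
    """Map credit totals onto a 0-5 scale with min-max rescaling (assumed method)."""
    lo, hi = min(credits.values()), max(credits.values())
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return {name: 0.0 for name in credits}
    return {name: 5.0 * (value - lo) / (hi - lo) for name, value in credits.items()}


# Made-up credit totals for one merged knowledge group at three universities.
print(rescale_to_0_5({"A": 10, "B": 30, "C": 50}))  # {'A': 0.0, 'B': 2.5, 'C': 5.0}
```

Putting every merged group on the same 0-5 range keeps a group with large raw credit totals from dominating the distance computations used in clustering.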
Testing Random Forests for Unix and Windows
Jiřina, Marcel ; Jiřina jr., M.
Cluster Analysis Using Genetic Algorithms
Kudová, Petra
We study the application of genetic algorithms to clustering and propose the Clustering Genetic Algorithm. In experiments we have shown that it
The importance of body image in marketing communications
Kučerová, Dana ; Koudelka, Jan (advisor) ; Stříteský, Václav (referee)
The importance of body image in marketing communications. A comparison of ideal and real beauty measures using a clustering method and MML-TGI data.
Web Analytics: Identification of new trends
Slavík, Michal ; Kliegr, Tomáš (advisor) ; Nekvasil, Marek (referee)
The goal of this thesis is to identify the main trends in the field of tools used to analyse web traffic. The necessary theoretical background is extracted from relevant literature, and field research is used to gather the knowledge of practitioners. The following trends have been identified: a growth in demand for Web Analytics software, an increasing interest in Web Analytics courses, an expansion of measurement of Web 2.0 and social networks, and the use of semantic information as the most fruitful area of academic research. The thesis also presents the main techniques of Web Usage Mining: association rules, sequential patterns, and clustering. A section about query categorization is also included. According to the field research, practitioners express the most interest in clustering. The first two chapters present Web Analytics in general and introduce the main aspects of current applications. The third chapter covers theoretical research, and the fifth presents the results of the field research. The fourth chapter raises the point that the terminology of Web Analytics is not unified.
