National Repository of Grey Literature: 132 records found (showing records 123 - 132)
Searching for semantic information in textual data using latent analysis
Řezníček, Pavel
The first part of the thesis gives a theoretical introduction to text mining methods: information retrieval, classification, and clustering. The LSA (latent semantic analysis) method is presented as an advanced model for representing textual data. The work then describes the source data and the preprocessing and preparation methods used to improve the effectiveness of text mining. For each selected text mining method, evaluation metrics are defined and the programs used, whether existing or newly implemented, are presented. Finally, the results of experiments comparing the effects of different types of preprocessing and different source-data models are presented and discussed.
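As a rough illustration of the LSA idea, the sketch below projects the documents of a small term-document matrix into a k-dimensional latent space. It is a minimal sketch, not the thesis's implementation: it assumes a raw count matrix and, to stay within the standard library, replaces a proper SVD routine with power iteration plus deflation on AᵀA. The function names `lsa_doc_vectors` and `cosine` and the toy matrix are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lsa_doc_vectors(term_doc, k, iters=200):
    """Map each document (column of term_doc) to k latent coordinates.

    The eigenvectors of G = A^T A are the right singular vectors v_c of A and its
    eigenvalues are sigma_c^2, so document j gets coordinates sigma_c * v_c[j]
    (the columns of S_k V_k^T in the truncated SVD A ~ U_k S_k V_k^T).
    """
    n_terms, n_docs = len(term_doc), len(term_doc[0])
    gram = [[sum(term_doc[t][i] * term_doc[t][j] for t in range(n_terms))
             for j in range(n_docs)] for i in range(n_docs)]
    coords = [[0.0] * k for _ in range(n_docs)]
    for c in range(k):
        # Deterministic, normalized start vector for power iteration
        v = [1.0 / (i + 1) for i in range(n_docs)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(row, v)) for row in gram]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T G v estimates the eigenvalue sigma_c^2
        gv = [sum(g * x for g, x in zip(row, v)) for row in gram]
        eig = max(sum(a * b for a, b in zip(v, gv)), 0.0)
        sigma = math.sqrt(eig)
        for j in range(n_docs):
            coords[j][c] = sigma * v[j]
        # Deflation: subtract the found component so the next pass finds the next one
        gram = [[gram[i][j] - eig * v[i] * v[j] for j in range(n_docs)]
                for i in range(n_docs)]
    return coords


# Rows: car, auto, engine, flower, petal; columns: four tiny documents
A = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
docs = lsa_doc_vectors(A, k=2)
print(round(cosine(docs[0], docs[1]), 3))
```

Documents 0 and 1 share only the term "engine", so their raw cosine similarity is 0.5; in the two-dimensional latent space they point in nearly the same direction. For real collections, a library SVD routine would be the usual choice.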
Automation of stopword generation
Krupník, Jiří
This diploma thesis focuses on automating the generation of stopwords as one method of preprocessing textual documents. It analyses the influence of stopword removal on the results of data mining tasks (classification and clustering). First, text mining techniques and frequently used algorithms are described. Methods for creating domain-specific stopword lists are then described in detail. Finally, the implementation and the results of testing on large collections of text files are presented and discussed.
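One common, simple way to generate a corpus-specific stopword list is to flag words that occur in a large share of the documents. The sketch below assumes whitespace-tokenized text and a hypothetical document-frequency threshold `max_df`; it illustrates the general idea and is not necessarily one of the methods the thesis evaluates.

```python
from collections import Counter


def generate_stopwords(documents, max_df=0.8):
    """Return words whose document frequency ratio exceeds max_df, sorted."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        # set(): count each word at most once per document
        df.update(set(doc.lower().split()))
    return sorted(word for word, count in df.items() if count / n > max_df)


corpus = [
    "the engine of the car",
    "the car has a new engine",
    "the flower in the garden",
    "a petal of the flower",
]
print(generate_stopwords(corpus))               # ['the']
print(generate_stopwords(corpus, max_df=0.4))   # ['a', 'car', 'engine', 'flower', 'of', 'the']
```

The second call shows why such thresholds need tuning: with a low `max_df`, content words such as "car" or "engine" also end up on the list, which could hurt classification and clustering.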
Application of preprocessing methods in mining knowledge from textual data
Kotíková, Michaela
The diploma thesis focuses on preprocessing unstructured textual data for text mining. A series of text mining experiments is designed and carried out. Based on their output, the thesis evaluates how different textual data preprocessing techniques affect the whole text mining process and its results.
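A typical preprocessing chain of the kind evaluated here can be sketched as lowercasing, tokenization, stopword removal, and term weighting. The sketch below assumes TF-IDF weighting with a natural-log IDF and a regular-expression tokenizer that keeps runs of Unicode letters, so Czech diacritics survive; the function names and the toy stopword list are hypothetical, and stemming or lemmatization is omitted.

```python
import math
import re
from collections import Counter


def preprocess(text, stopwords):
    """Lowercase, keep runs of Unicode letters, drop stopwords."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in stopwords]


def tfidf(token_lists):
    """TF-IDF weights per document: (count / doc length) * log(N / df)."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


stop = {"the", "of", "a", "in", "s"}
docs = ["The engine of the car.", "The car's engine failed.", "A flower in the garden."]
vecs = tfidf([preprocess(d, stop) for d in docs])
print(vecs[1])
```

A term that occurs in every document gets weight 0, while a term unique to one document (here "failed") gets the highest weight within it.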
Repository for results of association rules data mining tasks in SEWEBAR project
Marek, Tomáš ; Šimůnek, Milan (advisor) ; Svátek, Vojtěch (referee)
This diploma thesis aims at the design and implementation of the I:ZI Repository application, which manages a repository of data mining tasks and their results and provides functions for searching this repository. I:ZI Repository is a REST API built on Java EE technology; a Berkeley XML database is used to store the data mining tasks. I:ZI Repository was created on the basis of the XQuery search application. Although its structure is completely new, it retains all the functionality of the XQuery search application. In addition, I:ZI Repository supports more general search queries, fuzzy search approaches, and clustering of search results. The implementation also includes enhanced logging of application activity, focused on incoming search queries and outgoing search results. Results of application testing are included as well.
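The abstract does not say how the fuzzy search works. Purely as an illustration of the general idea, the sketch below scores each stored rule by the fraction of query literals it contains, so partial matches are returned instead of only exact ones. The rule representation (sets of attribute=value strings), the threshold, and the name `fuzzy_search` are all hypothetical; the actual application is built on Java EE and stores tasks in a Berkeley XML database.

```python
def fuzzy_search(rules, query, threshold=0.5):
    """Rank rules by the share of query literals they contain.

    rules: {rule_id: (antecedent_literals, consequent_literals)}
    query: set of literals such as "age=young"
    """
    if not query:
        return []
    hits = []
    for rule_id, (antecedent, consequent) in rules.items():
        literals = set(antecedent) | set(consequent)
        score = len(query & literals) / len(query)
        if score >= threshold:
            hits.append((rule_id, score))
    # Best matches first; ties broken by rule id for stable output
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


rules = {
    "r1": ({"age=young", "income=high"}, {"loan=yes"}),
    "r2": ({"age=young"}, {"loan=no"}),
    "r3": ({"district=Prague"}, {"loan=yes"}),
}
print(fuzzy_search(rules, {"age=young", "loan=yes"}))
# [('r1', 1.0), ('r2', 0.5), ('r3', 0.5)]
```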
Post-processing of association rules by multicriterial clustering method
Kejkula, Martin ; Rauch, Jan (advisor) ; Berka, Petr (referee) ; Máša, Petr (referee)
Association rule mining is one of several approaches to knowledge discovery in databases. Paradoxically, data mining itself can produce so many association rules that a new knowledge management problem arises: thousands or even more association rules can easily hold in a data set. The goal of this work is to design a new method for post-processing association rules. The method should be independent of software and domain. Its output should be a structured description of the whole set of discovered association rules that helps the user work with the discovered rules. The approach I used to reach this goal is to split the association rules into clusters, where each cluster contains rules that are more similar to each other than to rules from other clusters. The output of the method is the definition and description of these clusters. The main contribution of this Ph.D. thesis is the new Multicriterial clustering association rules method described here. A secondary contribution is a discussion of previously published association rule post-processing methods. The new method produces clusters of rules that cannot be obtained with any of the earlier post-processing methods. According to user expectations, these clusters are more relevant and more effective than the results of earlier association rule clustering methods. The method is based on two orthogonal clusterings of the same set of association rules. One clustering is based on interestingness measures (confidence, support, interest, etc.). The second clustering is inspired by document clustering in information retrieval. Representing rules as vectors, like documents, is central to this thesis. The thesis is organized as follows. Chapter 2 identifies the role of association rules in the KDD (knowledge discovery in databases) process, using KDD methodologies (CRISP-DM, SEMMA, GUHA, RAMSYS).
Chapter 3 defines association rules and introduces their characteristics (including interestingness measures). Chapter 4 introduces current association rule post-processing methods. Chapter 5 is an introduction to cluster analysis. Chapter 6 describes the new Multicriterial clustering association rules method. Chapter 7 consists of several experiments. Chapter 8 discusses possible uses and further development of the new method.
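The two views of a rule that the method combines can be sketched with a few helper functions. This is a hypothetical illustration, not the Multicriterial clustering association rules method itself: it computes support, confidence, and lift (the "interest" measure) from a GUHA-style four-fold table, and turns each rule into a bag-of-attributes vector compared with cosine similarity, as one would compare documents. Using attribute names rather than full attribute=value literals as terms is an assumption; either feature set could then be passed to an ordinary clustering algorithm.

```python
import math
from collections import Counter


def interestingness(a, b, c, d):
    """Support, confidence and lift from a four-fold table.

    a: antecedent and succedent both hold, b: antecedent only,
    c: succedent only, d: neither.
    """
    n = a + b + c + d
    support = a / n
    confidence = a / (a + b)
    lift = confidence / ((a + c) / n)  # also called "interest"
    return support, confidence, lift


def rule_vector(antecedent, succedent):
    """Treat a rule like a document: its attribute names are the terms."""
    return Counter(lit.split("=")[0] for lit in [*antecedent, *succedent])


def cosine(u, v):
    """Cosine similarity of two Counter vectors."""
    dot = sum(u[t] * v[t] for t in u)  # Counter yields 0 for missing terms
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


print(interestingness(30, 10, 20, 40))  # (0.3, 0.75, 1.5)
r1 = rule_vector(["age=young", "income=high"], ["loan=yes"])
r2 = rule_vector(["age=old"], ["loan=no"])
print(round(cosine(r1, r2), 3))  # 0.816
```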
Business Intelligence principles and their use in questionnaire investigation
Hanuš, Václav ; Maryška, Miloš (advisor) ; Novotný, Ota (referee)
This thesis focuses on the practical use of data mining and business intelligence tools. The main goals are to process the source data into a suitable form and to test the chosen tool on a test case. As input data I used a database created by processing questionnaires from a survey verifying the level of IT and economics knowledge at Czech universities. These data were modified into a form that allows them to be processed with the data mining tools included in Microsoft SQL Server 2008. I chose two cases to verify the potential of these tools. The first case focused on clustering with the Microsoft Clustering algorithm. The main task was to sort the universities into clusters by comparing their attributes, which were the amounts of credits in each knowledge group. I had to deal with two problems. First, it was necessary to reduce the number of subject groups; otherwise there was a risk of creating too many clusters that I could not name. Second, the credit values in the groups were unequal, which in turn caused a problem with the weights of these groups. The solution was ultimately quite simple. I merged similar groups into larger formations with more general categories. For the unequal values, I used a parameter for each new group and transformed it to a 0-5 scale. The second case focused on a prediction task using the Microsoft Logistic Regression algorithm and the Microsoft Neural Network algorithm. Here the goal was to predict the number of currently enrolled students. I had historical data from 2001-2009, built a predictive model from them, and compared the prediction with the real data. In this case too, the source data had to be transformed; otherwise the tested tool could not process them. The original data were stored in a view instead of a table and contained not only the desired objects but also other types of objects, for example broken down by sex.
The solution was to create a new table in the database containing only the objects relevant to the test case. The last problem came up when I tried to use the prediction model to predict data for 2010, for which there were no real data in the table. The software reported an error and could not make the prediction. While researching Microsoft technical support, I found some threads referring to a similar problem, so this may be a system error that will be fixed in a forthcoming update. Completing these cases gave me enough evidence to assess the capabilities of these Microsoft tools. Based on my earlier school experience with data mining tools from IBM (formerly SPSS) and SAS, I can judge whether the tested tools can match the software of the major data mining vendors on the market and whether they can be used for serious deployment.
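The abstract does not specify how the credit values were transformed to the 0-5 scale. One plausible reading is a per-group min-max rescaling, sketched below under that assumption; the group names and credit values are made up for illustration. Rescaling each group separately keeps a group with large absolute credit counts from dominating the distance computations in clustering.

```python
def scale_to_range(values, low=0.0, high=5.0):
    """Min-max scale values linearly onto [low, high]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [low for _ in values]  # no spread: map everything to the bottom
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]


# Hypothetical credits per university for two merged subject groups
credits = {"IT": [20, 45, 70], "economics": [60, 90, 120]}
scaled = {group: scale_to_range(vals) for group, vals in credits.items()}
print(scaled)  # {'IT': [0.0, 2.5, 5.0], 'economics': [0.0, 2.5, 5.0]}
```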
Cluster Analysis and Textual Data
Húsek, Dušan ; Řezanková, H. ; Snášel, Václav
The applicability of cluster analysis to large textual databases is studied. The main principles of clustering algorithms are discussed and compared from the point of view of their applicability in this field.
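To make the discussion of clustering algorithms more concrete, here is a minimal sketch of one algorithm often applied to text: k-means on length-normalized term-frequency vectors with cosine (dot-product) similarity, sometimes called spherical k-means. It is not necessarily one of the algorithms the authors compare; the naive initialization from the first k documents, the fixed iteration count, and all function names are assumptions for illustration.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector {term: weight} to unit length."""
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {t: x / norm for t, x in vec.items()} if norm else dict(vec)


def dot(u, v):
    """Dot product of two sparse vectors, iterating over the shorter one."""
    if len(u) > len(v):
        u, v = v, u
    return sum(x * v.get(t, 0.0) for t, x in u.items())


def spherical_kmeans(docs, k, iters=10):
    vectors = [normalize(Counter(d.lower().split())) for d in docs]
    centroids = [dict(v) for v in vectors[:k]]  # naive init: first k documents
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each document to the centroid with the highest cosine similarity
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        # Recompute each centroid as the normalized sum of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the old centroid if a cluster becomes empty
            total = Counter()
            for v in members:
                total.update(v)
            centroids[c] = normalize(total)
    return labels


docs = ["car engine repair", "flower garden petal", "engine oil car", "garden flower seeds"]
print(spherical_kmeans(docs, k=2))  # [0, 1, 0, 1]
```

Normalizing vectors makes the dot product equal to cosine similarity, so long and short documents on the same topic land in the same cluster.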
Web Analytics: Identification of new trends
Slavík, Michal ; Kliegr, Tomáš (advisor) ; Nekvasil, Marek (referee)
The goal of this thesis is to identify the main trends in the field of tools used to analyse web traffic. The necessary theoretical background is drawn from relevant literature, and field research was chosen to gather practitioners' knowledge. The following trends have been identified: growing demand for Web Analytics software, increasing interest in Web Analytics courses, expanding measurement of Web 2.0 and social networks, and the use of semantic information as the most fruitful area of academic research. The thesis also presents the main techniques of Web Usage Mining: association rules, sequential patterns, and clustering. A section on query categorization is also included. According to the field research, practitioners are most interested in clustering. The first two chapters present Web Analytics in general and introduce the main aspects of current applications. The third chapter covers theoretical research, and the fifth presents the results of the field research. The fourth chapter points out that Web Analytics terminology is not unified.
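Web Usage Mining techniques work on sessions of page requests. As a deliberately simplified stand-in for sequential pattern mining (full algorithms such as GSP or PrefixSpan also find longer and non-contiguous patterns), the sketch below counts direct page-to-page transitions and keeps those whose session-level support reaches a threshold. The session data, URLs, and function name are hypothetical, and sessions are assumed to be already reconstructed from the server log.

```python
from collections import Counter


def frequent_transitions(sessions, min_support=0.5):
    """Return {(page_a, page_b): support} for direct transitions a -> b.

    Support is the fraction of sessions containing the transition at least once.
    """
    n = len(sessions)
    counts = Counter()
    for pages in sessions:
        counts.update(set(zip(pages, pages[1:])))  # each transition once per session
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/about"],
    ["/home", "/blog"],
    ["/products", "/cart"],
]
print(sorted(frequent_transitions(sessions).items()))
# [(('/home', '/products'), 0.5), (('/products', '/cart'), 0.5)]
```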
