National Repository of Grey Literature: 56 records found, showing records 47 - 56
Post-processing of association rules by multicriterial clustering method
Kejkula, Martin ; Rauch, Jan (advisor) ; Berka, Petr (referee) ; Máša, Petr (referee)
Association rule mining is one of several approaches to knowledge discovery in databases. Paradoxically, data mining itself can produce so many association rules that a new knowledge management problem arises: thousands of association rules, or even more, can easily hold in a single data set. The goal of this work is to design a new method for post-processing association rules. The method should be independent of software and domain. Its output should be a structured description of the whole set of discovered association rules, and it should help the user work with the discovered rules. The approach taken is to split the association rules into clusters. Each cluster should contain rules that are more similar to each other than to rules from other clusters. The output of the method is the definition and description of these clusters. The main contribution of this Ph.D. thesis is the new Multicriterial clustering association rules method described here. A secondary contribution is the discussion of previously published methods for post-processing association rules. The new method outputs clusters of rules that cannot be obtained by any of the earlier post-processing methods. According to user expectations, the clusters are more relevant and more effective than any earlier association rules clustering results. The method is based on two orthogonal clusterings of the same set of association rules. One clustering is based on interestingness measures (confidence, support, interest, etc.). The second clustering is inspired by document clustering in information retrieval. The representation of rules as vectors, like documents, is fundamental to this thesis. The thesis is organized as follows. Chapter 2 identifies the role of association rules in the KDD (knowledge discovery in databases) process, using KDD methodologies (CRISP-DM, SEMMA, GUHA, RAMSYS). 
Chapter 3 defines the association rule and introduces characteristics of association rules (including interestingness measures). Chapter 4 introduces current association rules post-processing methods. Chapter 5 is an introduction to cluster analysis. Chapter 6 describes the new Multicriterial clustering association rules method. Chapter 7 consists of several experiments. Chapter 8 discusses possible uses and further development of the new method.
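The two views of a rule mentioned in the abstract above (interestingness measures on one side, a document-like vector on the other) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the four-fold counts `a, b, c, d`, the reading of "interest" as lift, and the bag-of-items vector with cosine similarity are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def measures(a, b, c, d):
    """Interestingness measures from a four-fold table (hypothetical helper).

    a: rows satisfying antecedent and succedent
    b: rows satisfying antecedent only
    c: rows satisfying succedent only
    d: rows satisfying neither
    """
    n = a + b + c + d
    support = a / n
    confidence = a / (a + b)
    # Lift (assumed here to be what the abstract calls "interest"):
    # confidence relative to the overall frequency of the succedent.
    lift = confidence / ((a + c) / n)
    return support, confidence, lift


def rule_vector(antecedent, succedent):
    """Represent a rule as a bag of its items, like a bag-of-words document."""
    return Counter(list(antecedent) + list(succedent))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Made-up counts and item names, for illustration only.
support, confidence, lift = measures(30, 10, 20, 40)
r1 = rule_vector(["age=young", "smoker=yes"], ["risk=high"])
r2 = rule_vector(["age=young"], ["risk=high"])
similarity = cosine(r1, r2)
```

A clustering step could then group rules once by their measure triples and once by their vector similarity, which is the sense in which the two clusterings look at the same rule set from different angles.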
Methodology of development and deployment of Business Intelligence solutions in Small and Medium Sized Enterprises
Rydzi, Daniel ; Jandoš, Jaroslav (advisor) ; Vlček, Radim (referee) ; Slánský, David (referee)
This dissertation deals with the development and implementation of Business Intelligence (BI) solutions for Small and Medium Sized Enterprises (SME) in the Czech Republic. It represents the culmination of the author's effort to date to complete a methodological model for developing this kind of application for SMEs using in-house skills and a minimum of external resources and costs. The thesis can be divided into five major parts. The first part, which describes the technologies used, is divided into two chapters. The first chapter describes the current state of the Business Intelligence concept and also contains an original taxonomy of Business Intelligence solutions. The second chapter describes two Knowledge Discovery in Databases (KDD) techniques that were used to build the BI solutions introduced in the case studies. The second part describes the area of Czech SMEs, the environment in which the thesis was written and to which it is meant to contribute. This environment is covered by one chapter that defines how SMEs differ from large corporations. The author also explains why he personally focuses on this area. The third major part introduces the results of a survey conducted among Czech SMEs with the support of the Department of Information Technologies of the Faculty of Informatics and Statistics of the University of Economics in Prague. The survey had three objectives. The first was to map the readiness of Czech SMEs for the development and deployment of BI solutions. The second was to determine the major problems and consequent decisions of Czech SMEs that could be supported by BI solutions, and the third was to determine the top factors preventing SMEs from developing and deploying BI solutions. The fourth part is the core of the thesis. Its two chapters describe the original Methodology for development and deployment of BI solutions by SMEs, as well as the other methodologies that were studied. 
The original methodology is partly based on the well-known CRISP-DM methodology. Finally, the last part describes a particular company that has become a testing ground for the author's theories and that supports his research. Further chapters introduce case studies of the development and deployment of BI solutions in this company, which were built using contemporary BI and KDD techniques in accordance with the original methodology. In that sense, these case studies verified the theoretical methodology in real use.
Fuzzy GUHA
Ralbovský, Martin ; Rauch, Jan (advisor) ; Svátek, Vojtěch (referee) ; Holeňa, Martin (referee) ; Vojtáš, Peter (referee)
The GUHA method is one of the oldest methods of exploratory data analysis, which is regarded as part of the scientific area of data mining or knowledge discovery in databases (KDD). Unlike many other data mining methods, the GUHA method has firm theoretical foundations in logic and statistics. Within the method, finding interesting knowledge corresponds to finding special formulas in a sufficiently rich logical calculus, called observational calculus. The main topic of the thesis is the application of the "fuzzy paradigm" to the GUHA method. By the term "fuzzy paradigm" we mean approaches that use many-valued membership degrees or truth values, namely fuzzy set theory and fuzzy logic. The thesis does not aim to cover all aspects of this application; it focuses mainly on:
- Association rules as the most prevalent type of formulas mined by the GUHA method
- Usage of fuzzy data
- Logical aspects of fuzzy association rules mining
- Comparison of the GUHA theory to the mainstream fuzzy association rules
- Implementation of the theory using the bit string approach
The thesis thoroughly elaborates the theory of fuzzy association rules, using the theoretical apparatus of both fuzzy set theory and fuzzy logic. Fuzzy set theory is used mainly to compare the GUHA method to existing mainstream approaches to formalizing fuzzy association rules, which were studied in detail. Fuzzy logic is used to define a novel class of logical calculi, called logical calculi of fuzzy association rules (LCFAR), for the logical representation of fuzzy association rules. The problem of the existence of deduction rules in LCFAR is dealt with in depth. A suitable part of the proposed theory is implemented in the Ferda system using the bit string approach. In this approach, characteristics of the examined objects are represented as strings of bits, which in the crisp case enables efficient computation. 
In order to maintain this feature in the fuzzy case as well, thorough low-level testing of data structures and algorithms for fuzzy bit strings was carried out as part of the thesis.
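The idea behind the bit string approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Ferda implementation: the helper names are hypothetical, the crisp bit string is held in a Python integer, and the fuzzy counterpart is assumed to be a list of membership degrees combined with a t-norm (minimum by default, product as an alternative).

```python
from operator import mul


def crisp_bitstring(flags):
    """Pack one boolean per examined object into an integer bit string."""
    bits = 0
    for i, flag in enumerate(flags):
        if flag:
            bits |= 1 << i
    return bits


def crisp_count(bits):
    """Number of objects whose bit is set."""
    return bin(bits).count("1")


def fuzzy_and(u, v, tnorm=min):
    """Element-wise conjunction of two fuzzy bit strings using a t-norm."""
    return [tnorm(x, y) for x, y in zip(u, v)]


def fuzzy_count(u):
    """Sum of membership degrees (the fuzzy analogue of counting set bits)."""
    return sum(u)


# Crisp case: a conjunction of two attributes is a single bitwise AND.
# The attribute names and values are made up for illustration.
smoker = crisp_bitstring([True, True, False, True])
hypertension = crisp_bitstring([True, False, False, True])
both_crisp = crisp_count(smoker & hypertension)

# Fuzzy case: membership degrees in [0, 1] replace the single bits.
smoker_f = [1.0, 0.5, 0.0, 0.25]
hypertension_f = [0.5, 0.5, 1.0, 1.0]
both_min = fuzzy_count(fuzzy_and(smoker_f, hypertension_f))
both_product = fuzzy_count(fuzzy_and(smoker_f, hypertension_f, tnorm=mul))
```

The crisp conjunction costs one machine-level AND per word of bits, whereas the fuzzy version needs one t-norm evaluation per object, which suggests why keeping the fuzzy case efficient called for dedicated testing.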
The GUHA Method, Data Preprocessing and Mining. Position Paper
Hájek, Petr ; Feglar, T. ; Rauch, J. ; Coufal, David
The paper surveys basic principles and foundations of the GUHA method, relation to some well-known data mining systems, main publications, existing implementations and future plans.
Application of the Ac4ft-Miner procedure to medical data
Nekvapil, Viktor ; Rauch, Jan (advisor) ; Šimůnek, Milan (referee)
This bachelor thesis deals with the data mining procedure Ac4ft-Miner, implemented in the LISp-Miner system, which is developed at the Department of Information and Knowledge Engineering at the University of Economics, Prague. The first aim of the thesis is to describe the procedure in a simple, understandable way. The second aim is to apply the procedure to medical data and present examples of its use. A further aim is to create a methodology of use for doctors from the experience obtained. These aims are reached by using many examples, which demonstrate theoretical concepts on concrete data, and by pursuing a simple visualisation of the tasks (analytical questions) solved by the procedure. The output of the thesis is a coherent text with many examples separated from the continuous text, so that a reader familiar with a particular topic can skip the examples and proceed to the next issue. A further result is an outline of the graphical presentation of analytical questions. Both the examples and the graphical presentation will be used further in the SEWEBAR project, of which this thesis is a part. The methodology of use of the procedure for doctors takes the form of advice on using the tool, which should contribute to the further research that is needed. This is because the high complexity of the procedure does not allow formulating general conclusions usable in the methodology. Chapter 1 characterizes the overall process of Knowledge Discovery in Databases represented by the CRISP-DM Methodology. Chapter 2 presents theoretical concepts related to Ac4ft-Miner. Chapter 3 deals with action rules. Chapter 4 addresses possibilities of defining the input and interpreting the output of Ac4ft-Miner. Chapter 5 describes the research conducted on the real medical data set ADAMEK and states the methodology and examples of the output. 
Chapter 6 summarises the experience obtained and formulates the methodology of use of Ac4ft-Miner for doctors.
Comparison of the potency of application KDD methods and statistical methods in the analysis of ADAMEK data
Líbal, Petr ; Rauch, Jan (advisor) ; Berka, Petr (referee)
This bachelor thesis compares association rules and logistic regression. The medical data set Adamek was used for this comparison. The relationship between attributes belonging to the groups Physical examinations and Difficulty was studied. Both methods are described theoretically, and their connection with related common areas is mentioned: market basket analysis in the case of association rules, linear regression in the case of logistic regression. Before the analysis, the attributes are described with basic statistics and the distribution of their values is illustrated graphically. In both cases, the analysis proceeds in the same way. First, the relationship for each individual difficulty is examined, and then the relationship for difficulties in general. In conclusion, the results of both methods are compared.
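One way to see how the two methods relate is the simplest possible case: a single binary attribute and a binary outcome. The Python sketch below is an illustration only, with made-up counts that do not come from the Adamek data and with hypothetical function names. It relies on the standard closed form of the maximum-likelihood logistic regression fit with one binary predictor, where the slope equals the log odds ratio of the 2x2 table.

```python
from math import exp, log


def rule_confidence(a, b):
    """Confidence of the rule 'attribute present => difficulty present'."""
    return a / (a + b)


def logistic_fit_binary(a, b, c, d):
    """Closed-form logistic regression fit for one binary predictor.

    a: attribute present, difficulty present
    b: attribute present, difficulty absent
    c: attribute absent, difficulty present
    d: attribute absent, difficulty absent
    """
    intercept = log(c / d)            # log odds of difficulty when attribute absent
    slope = log((a * d) / (b * c))    # log odds ratio
    return intercept, slope


def predict(intercept, slope, x):
    """Fitted probability of the difficulty for x in {0, 1}."""
    return 1.0 / (1.0 + exp(-(intercept + slope * x)))


# Made-up counts, for illustration only.
a, b, c, d = 30, 10, 20, 40
confidence = rule_confidence(a, b)
intercept, slope = logistic_fit_binary(a, b, c, d)
probability_present = predict(intercept, slope, 1)
```

In this one-attribute setting the fitted probability for the attribute being present coincides with the confidence of the corresponding rule; the methods start to differ once several attributes are combined.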
Web Analytics: Identification of new trends
Slavík, Michal ; Kliegr, Tomáš (advisor) ; Nekvasil, Marek (referee)
The goal of this thesis is to identify the main trends in the field of tools used to analyse web traffic. The necessary theoretical background is extracted from the relevant literature, and field research was chosen to gather the knowledge of practitioners. The following trends have been identified: a growth in demand for Web Analytics software, an increasing interest in Web Analytics courses, an expansion of the measurement of Web 2.0 and social networks, and the use of semantic information as the most fruitful area of academic research. The thesis also presents the main techniques of Web Usage Mining: association rules, sequential patterns, and clustering. A section about query categorization is also included. According to the field research, practitioners express the most interest in clustering. The first two chapters present Web Analytics in general and introduce the main aspects of current applications. The third chapter covers theoretical research, and the fifth presents the results of the field research. The fourth chapter raises the point that the terminology of Web Analytics is not unified.
Ontology Learning and Information Extraction for the Semantic Web
Kavalec, Martin ; Berka, Petr (advisor) ; Štěpánková, Olga (referee) ; Snášel, Václav (referee)
The work gives an overview of its three main topics: the semantic web, information extraction and ontology learning. A method for identifying relevant information on web pages is described and experimentally tested on the pages of companies offering products and services. The method is based on the analysis of a sample of web pages and their position in the Open Directory catalogue. Furthermore, a modification of an association rules mining algorithm is proposed and experimentally tested. In addition to identifying a relation between ontology concepts, it suggests a possible name for the relation.
