National Repository of Grey Literature: 36 records found, showing records 27 - 36
Classification Framework
Koroncziová, Dominika ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor)
The goal of this work is the design and implementation of machine learning software based on the RapidMiner library. The finished application integrates the most commonly used algorithms and processes implemented in RapidMiner into an easily usable program. The application contains a simple command line interface, as well as a graphical interface to simplify the selection of multiple parameters. The program also provides a tool for creating standalone programs that can be used for classification with a pre-trained model. Beyond the original requirements, support for textual data from Wikipedia was also implemented, providing a tool for downloading and preprocessing the data so that it can be used as training input. This text focuses on the specifics of the algorithms and classifiers used, on their features and uses, and describes the design and implementation of the system. As part of this work, several tests were run to validate the efficiency and functionality of the program. The test results are included at the end of the thesis.
Semantic Similarity of Articles
Veselovský, Martin ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor)
This bachelor's thesis deals with modelling the structure of semantic relationships among English-language articles. It introduces existing methods for representing articles and computing their similarity. The baseline method is the vector space model, which represents a document as a vector of words, with each word weighted by importance using the TF-IDF method. More advanced modelling methods are then described: latent semantic analysis (LSA) and latent Dirichlet allocation (LDA). The thesis also deals with semantically annotated articles, where the weights of annotation words are computed by the stochastic gradient descent method. The results are evaluated on a prepared test corpus of documents for which a reference similarity evaluation is available.
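As an illustration of the baseline described in this abstract, the following is a minimal sketch of a TF-IDF weighted vector space model with cosine similarity. It is not the thesis's implementation: the function names, the whitespace tokenization, the length-normalized term frequency, the unsmoothed logarithmic IDF, and the sample sentences are all assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Weight = relative term frequency * log(N / df); an assumed variant.
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stock markets fell sharply today".split(),
]
vectors = tf_idf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

In this toy corpus the first two documents share weighted terms and so score higher than the first and third, which share none. A term that occurs in every document gets weight zero under this IDF variant, which is one reason smoothed variants are often preferred in practice.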
Representation of Text and Its Influence on Categorization
Šabatka, Ondřej ; Chmelař, Petr (referee) ; Bartík, Vladimír (advisor)
The thesis deals with machine processing of textual data. In the theoretical part, issues related to natural language processing are described and different ways of pre-processing and representation of text are also introduced. The thesis also focuses on the usage of N-grams as features for document representation and describes some algorithms used for their extraction. The next part includes an outline of classification methods used. In the practical part, an application for pre-processing and creation of different textual data representations is suggested and implemented. Within the experiments made, the influence of these representations on accuracy of classification algorithms is analysed.
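To make the idea of N-grams as document features concrete, here is a small sketch of word-level and character-level N-gram extraction. The function names and the lowercase, whitespace-based tokenization are assumptions for illustration; the abstract does not say which extraction algorithms the thesis uses.

```python
from collections import Counter

def word_ngrams(tokens, n):
    """Return all contiguous word N-grams of length n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n):
    """Return all contiguous character N-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_features(text, n):
    """Count word N-grams in a text; the counts can serve as a feature vector."""
    tokens = text.lower().split()  # assumed tokenization: lowercase + whitespace
    return Counter(word_ngrams(tokens, n))

# Hypothetical example sentence.
features = ngram_features("to be or not to be", 2)
```

Inputs shorter than n simply produce no N-grams, so short documents yield an empty feature set rather than an error.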
Mining of Textual Data from the Web for Speech Recognition
Kubalík, Jakub ; Plchot, Oldřich (referee) ; Mikolov, Tomáš (advisor)
The initial goal of this project was to study language modelling for speech recognition and techniques for obtaining textual data from the Web. The text introduces the basic techniques of speech recognition and describes in more detail language models based on statistical methods. In particular, the thesis deals with criteria for evaluating the quality of language models and speech recognition systems. The text further describes data mining models and techniques, especially information retrieval. It then presents the problems associated with obtaining data from the web and, in contrast, introduces the Google search engine. Part of the project was the design and implementation of a system for obtaining text from the web, which is described in due detail. However, the main goal of the thesis was to verify whether data obtained from the Web can bring any benefit to speech recognition. The described techniques therefore try to find the optimal way to use data obtained from the Web to improve sample language models as well as models deployed in real recognition systems.
Derivation of Dictionary for Process Inspector Tool on SharePoint Platform
Pavlín, Václav ; Masařík, Karel (referee) ; Kreslíková, Jitka (advisor)
This master's thesis presents methods for mining important pieces of information from text. It analyses the problem of term extraction from large document collections and describes an implementation using the C# language and Microsoft SQL Server. The system uses stemming and a number of statistical methods for term extraction. The project also compares the methods used and suggests a process for deriving the dictionary.
Methods of Web Page Classification
Nachtnebl, Viktor ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
This work deals with methods of web page classification. It explains the concept of classification and the different features of web pages used to classify them. It then analyses the representation of a page and describes in detail a classification method that works with a hierarchical category model and is able to create new categories dynamically. The second half presents the implementation of the chosen method and describes the results.
Using of Data Mining Method for Analysis of Social Networks
Novosad, Andrej ; Očenášek, Pavel (referee) ; Bartík, Vladimír (advisor)
This thesis discusses data mining of social media. It introduces the topic of data mining and possible mining methods. The thesis also explores social media and social networks, what they are able to offer, and what problems they bring. The APIs of three social networking sites are examined, along with the opportunities they provide for data mining. Techniques of text mining and document classification are explored. The implementation of a web application that mines data from the social site Twitter using the SVM algorithm is described. The implemented application classifies tweets based on their text, where the classes represent the tweets' continents of origin. Several experiments, executed both in the RapidMiner software and in the implemented web application, are then proposed and their results examined.
Actual Events Tracker
Odstrčilík, Martin ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor)
The goal of this master's thesis project was to develop an application for tracking current events in the users' surrounding area. The application should allow users to view events, create new events, and add comments to existing ones. Beyond the implementation of the application, the project deals with an analysis of the problem. The analysis includes a comparison with existing solutions and a survey of available technologies and frameworks applicable to the implementation. Another part of this work describes the theory behind data classification, which is used internally for event and comment analysis. This work also includes the design of the application, including the user interface, software architecture, database, communication protocol, and data classifiers. The main part of the project, the implementation, is described afterwards. The work closes with a summary of the whole process and some ideas for enhancing the application in the future.
Semantic Similarity of Texts
Bradáč, Václav ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This paper deals with determining the semantic similarity of texts, with a focus on scalability. It includes a theoretical overview of the tools used to implement the system and run it on test data. The test corpus contains expert articles in English. The aim is to analyse these articles, which are modified to facilitate the analysis, and to find their semantic analogues. One of the most heavily used tools is the representation of data in a vector space model.
Improved Prediction of Social Tags Using Data Mining
Harár, Pavol ; Galáž, Zoltán (referee) ; Kříž, Jiří (advisor)
This master’s thesis deals with using text mining as a method to predict the tags of articles. It describes an iterative way of handling big data files, parsing the data, cleaning the data, and scoring the terms in an article using TF-IDF. It describes in detail the flow of a program written in the Python 3.4.3 programming language. The result of processing more than 1 million articles from the Wikipedia database is a dictionary of English terms. Using this dictionary, one can determine the most important terms of an article in a corpus of articles. The relevance of the resulting tags supports the method used in this case.
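The iterative handling of large files mentioned in this abstract can be sketched as a streaming pass that never holds the whole corpus in memory. This is only an assumed illustration, not the thesis's program: the one-article-per-line input format, the letters-only cleaning rule, and all names below are hypothetical.

```python
import io
import re
from collections import Counter

# Assumed cleaning rule: keep only runs of ASCII letters, lowercased.
TOKEN_RE = re.compile(r"[a-z]+")

def iter_articles(handle):
    """Yield one cleaned token list per non-empty line, reading lazily."""
    for line in handle:
        tokens = TOKEN_RE.findall(line.lower())
        if tokens:
            yield tokens

def document_frequencies(articles):
    """Accumulate per-term document frequencies in a single streaming pass."""
    df = Counter()
    n_articles = 0
    for tokens in articles:
        df.update(set(tokens))  # count each term once per article
        n_articles += 1
    return df, n_articles

# Hypothetical in-memory stand-in for a large file with one article per line.
sample = io.StringIO("Python is a language.\n\nPython 3 runs scripts!\n")
df, n_articles = document_frequencies(iter_articles(sample))
```

The resulting document frequencies are the corpus-wide statistics that a TF-IDF scoring step needs, and only the counter, not the articles themselves, stays in memory between lines.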
