National Repository of Grey Literature: 12 records found
Stemming Methods Used in Text Mining
Adámek, Tomáš ; Chmelař, Petr (referee) ; Bartík, Vladimír (advisor)
The main theme of this master's thesis is a description of text mining. It focuses on English texts and their automatic preprocessing. The main part of the thesis analyses several stemming algorithms (Lovins, Porter and Paice/Husk). Stemming is a procedure that automatically conflates semantically related terms by means of rule sets. The next part describes the design of an application for various types of stemming algorithms. The application is based on the Java platform and uses the Swing graphics library and the MVC architecture. The following chapter describes the implementation of the application and of the stemming algorithms. The last part describes experiments with the stemming algorithms and compares the algorithms in terms of their effect on text classification results.
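The idea of conflating related terms through a rule set can be illustrated with a minimal sketch. The rules, the minimum stem length, and the names `RULES`, `toy_stem` and `conflate` below are assumptions made for illustration only; they are not the Lovins, Porter or Paice/Husk rule sets, and they are not taken from the thesis (whose application is written in Java).

```python
# Hypothetical, ordered suffix rules: (suffix to strip, replacement).
# Longer suffixes come first so that "sses" is tried before "s".
RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ing", ""),
    ("ed", ""),
    ("ss", "ss"),  # keep a final double "s" intact
    ("s", ""),
]

# Assumed guard: never produce a stem shorter than this.
MIN_STEM_LENGTH = 3


def toy_stem(word: str) -> str:
    """Apply the first matching rule that leaves a long enough stem."""
    word = word.lower()
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate
    return word


def conflate(words):
    """Group words that share the same stem."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    print(conflate(["connect", "connected", "connecting", "connects"]))
```

With these toy rules, the four inflected forms of "connect" end up in one group, which is the kind of conflation a stemmer is meant to provide before classification. Real rule sets are considerably larger and add conditions on the remaining stem.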
Scala Programming Language and Its Use for Data Analysis
Kohout, Tomáš ; Bartík, Vladimír (referee) ; Zendulka, Jaroslav (advisor)
This thesis compares the Scala programming language with other languages commonly used for data analysis. The languages are evaluated in the following categories: data manipulation and visualization, machine learning, and concurrent processing capabilities. The evaluation then shows the strengths and weaknesses of Scala. The strengths are demonstrated in an application for email categorization.
Processing of User Reviews
Cihlářová, Dita ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
Very often, people buy goods on the Internet that they cannot see or try. They therefore rely on the reviews of other customers. However, there may be too many reviews for a person to go through quickly and comfortably. The aim of this work is to offer an application that can recognize, in Czech reviews, which features of a product are commented on most and whether the commentary is positive or negative. The results can save e-shop customers a lot of time and provide interesting feedback to the manufacturers of the products.
Estimation of Emotions from a Text
Dufková, Aneta ; Fajčík, Martin (referee) ; Szőke, Igor (advisor)
This thesis describes the process of estimating emotions from text using machine learning. The process starts with a survey of existing methods and continues with the choice of a suitable method and with experiments. It uses several datasets, combines them and tests different text preprocessing techniques. The result is a web interface that uses the pretrained model to estimate emotions from Twitter posts.
Assessment and implementation of text data preprocessing in neural network models
Ratnasari, Febiyanti
In the realm of text data processing, text preprocessing has traditionally played a significant role. However, with the growing prominence of neural network models and novel representations of textual data, the importance of text preprocessing has been relatively understated. To address this, the present research endeavors to investigate the potential benefits of employing a composite of multiple text data preprocessing techniques in conjunction with a neural network-based text processing model.
A Fast and Trainable Tokenizer for Natural Languages
Maršík, Jiří ; Bojar, Ondřej (advisor) ; Spousta, Miroslav (referee)
In this thesis, we present a data-driven system for disambiguating token and sentence boundaries. The implemented system is highly configurable and versatile, to the point that it can segment unbroken Chinese text. The tokenizer relies on maximum entropy classifiers and requires a sample of tokenized and segmented text as training data. The program is accompanied by a tool for reporting tokenization performance, which helps to develop and tune the tokenization process rapidly. The system was built with multi-platform libraries only and with an emphasis on speed and correctness. After a necessary survey of other tools for text tokenization and segmentation and a short introduction to maximum entropy modelling, a large part of the thesis focuses on the particular implementation we developed and its evaluation.
Statistical methods in stylometry
Dupal, Pavel ; Kaspříková, Nikola (advisor) ; Šulc, Zdeněk (referee)
The aim of this thesis is to provide an overview of some of the commonly used methods in the area of authorship attribution (stylometry). The text begins with a recap of the field's history from the end of the 19th century to the present, and the required terminology from the field of text mining is presented and explained. What follows is a list of selected methods from multidimensional statistics (principal components analysis, cluster analysis) and machine learning (Support Vector Machines, Naive Bayes) and their application to stylometric problems, including several methods created specifically for use in this field (bootstrap consensus tree, contrast analysis). Finally, these same methods are applied to a practical problem of authorship verification based on a corpus built from the works of four internet writers.
