National Repository of Grey Literature: 15 records found, showing 1 - 10
Stemming Methods Used in Text Mining
Adámek, Tomáš ; Chmelař, Petr (referee) ; Bartík, Vladimír (advisor)
The main theme of this master's thesis is text mining. It focuses on English texts and their automatic preprocessing. The main part of the thesis analyses several stemming algorithms (Lovins, Porter and Paice/Husk). Stemming is a procedure that automatically conflates semantically related terms by means of rule sets. The next part describes the design of an application supporting various stemming algorithms. The application is built on the Java platform, using the Swing graphics library and the MVC architecture. The following chapter describes the implementation of the application and of the stemming algorithms. The last part describes experiments with the stemming algorithms and compares them in terms of their effect on text classification results.
Knowledge Discovery from Text Data in the Python Language
Homola, Ján ; Hynek, Jiří (referee) ; Bartík, Vladimír (advisor)
This bachelor thesis deals with knowledge discovery from text data, more specifically with the classification of text-based user reviews. Through experiments, the thesis focuses on methods for preprocessing text data and compares different classification methods on selected datasets. The thesis concludes with an evaluation of the results of the experiments, which were performed using the implemented application.
Scala Programming Language and Its Use for Data Analysis
Kohout, Tomáš ; Bartík, Vladimír (referee) ; Zendulka, Jaroslav (advisor)
This thesis compares the Scala programming language with other languages commonly used for data analysis. The languages are evaluated in the following categories: data manipulation and visualization, machine learning, and concurrent processing capabilities. The evaluation shows the strengths and weaknesses of Scala. The strengths are demonstrated on an application for email categorization.
Processing of User Reviews
Cihlářová, Dita ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
People very often buy goods on the Internet that they cannot see or try. They therefore rely on the reviews of other customers. However, there may be too many reviews for a person to go through quickly and comfortably. The aim of this work is to offer an application that can recognize, in Czech reviews, which features of a product are commented on most and whether the comments are positive or negative. The results can save e-shop customers a lot of time and provide interesting feedback to product manufacturers.
Estimation of Emotions from a Text
Dufková, Aneta ; Fajčík, Martin (referee) ; Szőke, Igor (advisor)
This thesis describes the process of estimating emotions from text using machine learning. The process starts with a survey of existing methods and continues with choosing a suitable method and experimenting with it. The work uses several datasets, combines them, and tests different text preprocessing techniques. The result is a web interface that uses the pretrained model and allows users to estimate emotions from Twitter posts.
Assessment and implementation of text data preprocessing in neural network models
Ratnasari, Febiyanti
In the realm of text data processing, text preprocessing has traditionally played a significant role. However, with the growing prominence of neural network models and novel representations of textual data, the importance of text preprocessing has been relatively understated. To address this, the present research endeavors to investigate the potential benefits of employing a composite of multiple text data preprocessing techniques in conjunction with a neural network-based text processing model.
Text Analysis in Specialized Translation: Accuracy and Error Rate
Parobková, Alžbeta ; Marcoň, Petr (referee) ; Dohnal, Přemysl (advisor)
The thesis focuses on surveying and applying methods of text analysis and machine translation to evaluate the quality of technical texts translated by automatic machine translation. The practical part uses these methods to implement an algorithm for identifying and classifying errors. The practical part also includes applying and training a neural model to correct these errors. The comparison of the error rate and accuracy of translation by different translators is then demonstrated not only qualitatively but also quantitatively, using standard metrics.
Fast and Trainable Tokenizer for Natural Languages
Maršík, Jiří ; Bojar, Ondřej (advisor) ; Spousta, Miroslav (referee)
In this thesis, we present a data-driven system for disambiguating token and sentence boundaries. The implemented system is highly configurable and versatile, to the point that its tokenization abilities allow it to segment unbroken Chinese text. The tokenizer relies on maximum entropy classifiers and requires a sample of tokenized and segmented text as training data. The program is accompanied by a tool for reporting tokenization performance, which helps to rapidly develop and tune the tokenization process. The system was built with multi-platform libraries only and with an emphasis on speed and correctness. After a necessary survey of other tools for text tokenization and segmentation and a short introduction to maximum entropy modelling, a large part of the thesis focuses on the particular implementation we developed and its evaluation.
