keywords:"text preprocessing" - Search Results - Digital Repository

	Scala Programming Language and Its Use for Data Analysis. Kohout, Tomáš; Bartík, Vladimír (referee); Zendulka, Jaroslav (advisor). This thesis compares the Scala programming language with other languages commonly used for data analysis. The languages are evaluated in the following categories: data manipulation and visualization, machine learning, and concurrent processing capabilities. The evaluation then shows the strengths and weaknesses of Scala, and the strengths are demonstrated on an application for email categorization.
	Processing of User Reviews. Cihlářová, Dita; Burget, Radek (referee); Bartík, Vladimír (advisor). People often buy goods on the Internet that they cannot see or try beforehand, so they rely on reviews from other customers. However, there may be too many reviews for a person to go through quickly and comfortably. The aim of this work is an application that recognizes, in Czech reviews, which product features are commented on most and whether the comments are positive or negative. The results can save e-shop customers a lot of time and provide useful feedback to product manufacturers.
	Statistical methods in stylometry. Dupal, Pavel; Kaspříková, Nikola (advisor); Šulc, Zdeněk (referee). The aim of this thesis is to provide an overview of commonly used methods in authorship attribution (stylometry). The text begins with a recap of the field's history from the end of the 19th century to the present, and introduces and explains the required terminology from text mining. It then presents selected methods from multivariate statistics (principal component analysis, cluster analysis) and machine learning (Support Vector Machines, Naive Bayes) and their application to stylometric problems, including several methods created specifically for this field (bootstrap consensus tree, contrast analysis). Finally, these methods are applied to a practical authorship verification problem based on a corpus built from the works of four Internet writers.
	Fast and Trainable Tokenizer for Natural Languages (Rychlý a trénovatelný tokenizér pro přirozené jazyky). Maršík, Jiří; Bojar, Ondřej (advisor); Spousta, Miroslav (referee). In this thesis, we present a data-driven system for disambiguating token and sentence boundaries. The system is highly configurable and versatile enough to segment unbroken Chinese text. The tokenizer relies on maximum entropy classifiers and requires a sample of tokenized and segmented text as training data. It is accompanied by a tool that reports tokenization performance, which helps to rapidly develop and tune the tokenization process. The system was built using only multi-platform libraries, with an emphasis on speed and correctness. After a survey of other tools for text tokenization and segmentation and a short introduction to maximum entropy modelling, a large part of the thesis focuses on the implementation we developed and its evaluation.
	Stemming Methods Used in Text Mining. Adámek, Tomáš; Chmelař, Petr (referee); Bartík, Vladimír (advisor). The main theme of this master's thesis is text mining, specifically the automatic preprocessing of English texts. The main part of the thesis analyses several stemming algorithms (Lovins, Porter, and Paice/Husk). Stemming is a procedure that automatically conflates semantically related terms using rule sets. The next part describes the design of an application for running various stemming algorithms. The application is built on the Java platform using the Swing graphics library and the MVC architecture. The following chapter describes the implementation of the application and of the stemming algorithms. The last part describes experiments with the stemming algorithms and compares them in terms of text classification results.
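For the review-processing thesis (Cihlářová), the abstract does not say how product features and their polarity are detected. The following is a minimal sketch assuming a simple lexicon-based approach: it counts mentions of known product features and sums positive and negative words within a small window around each mention. The lexicons `FEATURES`, `POSITIVE` and `NEGATIVE`, the `window` size, the function names and the English examples are all hypothetical; the thesis itself works with Czech reviews.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicons (assumption): the thesis analyses Czech reviews,
# and its actual feature and polarity resources are not described.
FEATURES = {"battery", "screen", "price"}
POSITIVE = {"great", "good", "excellent", "cheap"}
NEGATIVE = {"bad", "poor", "weak", "expensive"}


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def feature_sentiment(reviews, window=2):
    """Count feature mentions and the net polarity of words within `window` tokens."""
    mentions = Counter()
    polarity = defaultdict(int)
    for review in reviews:
        tokens = tokenize(review)
        for i, tok in enumerate(tokens):
            if tok not in FEATURES:
                continue
            mentions[tok] += 1
            # Context = up to `window` tokens on each side of the mention
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            polarity[tok] += sum(w in POSITIVE for w in context)
            polarity[tok] -= sum(w in NEGATIVE for w in context)
    return mentions, dict(polarity)


if __name__ == "__main__":
    reviews = [
        "Great battery, but the screen is poor.",
        "The battery is good and the price is cheap.",
        "Bad screen.",
    ]
    mentions, polarity = feature_sentiment(reviews)
    # Most-commented features first, each with its net polarity score
    for feature, count in mentions.most_common():
        print(feature, count, polarity[feature])
```

Ranking by mention count answers the "most commented" question, while the signed score gives a crude positive/negative verdict; a system for Czech would likely also need lemmatization and negation handling.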
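The stylometry overview (Dupal) lists Naive Bayes among the machine learning methods used for authorship attribution. As an illustration, here is a minimal multinomial Naive Bayes classifier over word counts with add-one (Laplace) smoothing. The class name `NaiveBayesAuthor`, the tokenizer and the two-author toy texts are hypothetical and not taken from the thesis corpus; stylometric studies usually restrict features to the most frequent (often function) words, which this sketch does not do.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesAuthor:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, authors):
        self.word_counts = defaultdict(Counter)  # author -> word frequencies
        self.doc_counts = Counter(authors)       # author -> number of training texts
        for text, author in zip(texts, authors):
            self.word_counts[author].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(texts)
        return self

    def log_posterior(self, text, author):
        """Unnormalised log P(author | text): log prior plus summed log likelihoods."""
        counts = self.word_counts[author]
        total = sum(counts.values())
        score = math.log(self.doc_counts[author] / self.total_docs)
        for word in tokenize(text):
            if word not in self.vocab:
                continue  # words never seen in training carry no evidence here
            score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        """Return the author with the highest posterior score."""
        return max(self.doc_counts, key=lambda a: self.log_posterior(text, a))


if __name__ == "__main__":
    # Hypothetical toy corpus with two authors, "A" and "B"
    texts = ["the cat sat upon the mat", "upon the hill the cat",
             "a dog ran over a log", "a dog and a frog"]
    authors = ["A", "A", "B", "B"]
    model = NaiveBayesAuthor().fit(texts, authors)
    print(model.predict("the cat upon the wall"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and smoothing keeps a single unseen word from ruling an author out.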
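The Maršík tokenizer relies on maximum entropy classifiers trained on segmented text. For a binary decision such as "is this period a sentence boundary?", a maximum entropy model with indicator features is equivalent to logistic regression. The sketch below assumes that framing and covers only sentence boundaries, not token boundaries; the feature set, the training sentences, the learning rate, the epoch count and all function names are hypothetical and are not taken from the thesis.

```python
import math
import re
from collections import defaultdict

CANDIDATE = re.compile(r"[.!?]")  # characters that may end a sentence


def features(text, i):
    """Indicator features for the candidate boundary character text[i] (hypothetical set)."""
    before = text[:i].split()
    after = text[i + 1:].split()
    prev_word = before[-1].lower() if before else "<s>"
    next_word = after[0] if after else "</s>"
    space_or_end = i + 1 == len(text) or text[i + 1].isspace()
    return {
        "bias",
        "prev=" + prev_word,
        "next_upper=" + str(next_word[0].isupper()),
        "at_end=" + str(not after),
        "space_after=" + str(space_or_end),
        "prev_short=" + str(len(prev_word) <= 3),  # crude abbreviation cue
    }


def prob(weights, feats):
    """P(boundary | features) under a binary maximum entropy (logistic) model."""
    z = sum(weights.get(f, 0.0) for f in feats)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative scores
    return e / (1.0 + e)


def training_examples(sentences):
    """Turn pre-segmented sentences into (features, is_boundary) pairs."""
    text = " ".join(sentences)
    boundaries, offset = set(), 0
    for sentence in sentences:
        offset += len(sentence)
        boundaries.add(offset - 1)  # index of the sentence-final character
        offset += 1                 # skip the joining space
    return [(features(text, m.start()), m.start() in boundaries)
            for m in CANDIDATE.finditer(text)]


def train(examples, epochs=100, lr=0.1):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, is_boundary in examples:
            error = float(is_boundary) - prob(weights, feats)
            for f in feats:
                weights[f] += lr * error
    return weights


def segment(text, weights, threshold=0.5):
    """Split text at candidates whose boundary probability exceeds the threshold."""
    sentences, start = [], 0
    for m in CANDIDATE.finditer(text):
        if prob(weights, features(text, m.start())) > threshold:
            sentences.append(text[start:m.end()].strip())
            start = m.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences


# Hypothetical training sample of already segmented sentences
TRAINING = [
    "Dr. Smith arrived at noon.",
    "He met Mr. Jones.",
    "They talked for an hour.",
    "Mrs. Brown was late.",
    "The price was 3.50 dollars.",
    "Everyone left.",
]

if __name__ == "__main__":
    weights = train(training_examples(TRAINING))
    print(segment("Mr. Smith met Dr. Jones. They left.", weights))
```

Because every feature is just a string, adding a new cue (for example membership in a list of known abbreviations) only means emitting another string from `features`.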
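The stemming thesis (Adámek) compares the Lovins, Porter and Paice/Husk stemmers. To show what a stemming rule set looks like, here is a sketch of only steps 1a and 1b of Porter's algorithm: plural removal, then removal of -eed, -ed and -ing with the follow-up repairs. Steps 1c to 5 are omitted, so the output is not a full Porter stem; the function names (including `repair` and `porter_step1ab`) are hypothetical, and this is not the thesis's Java implementation.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter's definition: y counts as a vowel when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Porter's m in [C](VC)^m[V]: the number of vowel-to-consonant transitions."""
    m, prev_vowel = 0, False
    for i in range(len(stem)):
        if is_consonant(stem, i):
            if prev_vowel:
                m += 1
            prev_vowel = False
        else:
            prev_vowel = True
    return m


def contains_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(word):
    return len(word) >= 2 and word[-1] == word[-2] and is_consonant(word, len(word) - 1)


def ends_cvc(word):
    """*o condition: consonant-vowel-consonant ending whose last letter is not w, x or y."""
    n = len(word)
    return (n >= 3 and is_consonant(word, n - 3) and not is_consonant(word, n - 2)
            and is_consonant(word, n - 1) and word[-1] not in "wxy")


def step1a(word):
    """Plurals: sses -> ss, ies -> i, ss -> ss, s -> (removed)."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def step1b(word):
    """(m>0) eed -> ee; (*v*) ed -> (removed); (*v*) ing -> (removed)."""
    if word.endswith("eed"):
        stem = word[:-3]
        return stem + "ee" if measure(stem) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            return repair(stem) if contains_vowel(stem) else word
    return word


def repair(stem):
    """Follow-up rules applied only after -ed or -ing was removed."""
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"
    if ends_double_consonant(stem) and stem[-1] not in "lsz":
        return stem[:-1]
    if measure(stem) == 1 and ends_cvc(stem):
        return stem + "e"
    return stem


def porter_step1ab(word):
    return step1b(step1a(word.lower()))


if __name__ == "__main__":
    for w in ["caresses", "ponies", "agreed", "hopping", "filing", "falling"]:
        print(w, "->", porter_step1ab(w))
```

The measure m is what lets a rule such as (m>0) EED -> EE shorten "agreed" to "agree" while leaving a short word like "feed" untouched.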
