National Repository of Grey Literature: 32 records found, showing records 23 - 32
Popularity Meter
Hajič, Jan ; Bojar, Ondřej (advisor) ; Popel, Martin (referee)
Having the possibility of automatically tracking a person's popularity in the newspapers is an idea appealing not just to those in the media spotlight. While sentiment (subjectivity) analysis is a rapidly growing subfield of computational linguistics, no data from the news domain are yet available for Czech. We have therefore started building a manually annotated polarity corpus of sentences from Czech news texts; however, these texts have proven rather unwieldy for such processing. We have also designed a classifier which should be able to track popularity based on this corpus; the classifier has been tested on a corpus of product reviews of domestic appliances, and some introductory testing has been done on the nascent news corpus. As a model, we simply extract a unigram polarity lexicon from the data. We then use three related methods for identifying lemma polarity and a number of simple filters for feature selection. On the domestic appliance data, our simplest model has achieved results comparable to the state of the art; however, the properties of Czech news texts and preliminary results hint that a more linguistically oriented approach might be preferable.
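To illustrate what a unigram polarity lexicon with a simple filter could look like, here is a minimal Python sketch. It is not the thesis's implementation: the relative-frequency scoring, the `min_count` frequency filter, the label names, and the function names `build_lexicon` and `classify` are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_lexicon(labeled_sentences, min_count=1):
    """Estimate P(label | lemma) by relative frequency (assumed scoring method).

    labeled_sentences: iterable of (list_of_lemmas, label) pairs.
    min_count: hypothetical frequency filter; rarer lemmas are dropped.
    """
    counts = defaultdict(Counter)
    for lemmas, label in labeled_sentences:
        for lemma in lemmas:
            counts[lemma][label] += 1
    lexicon = {}
    for lemma, by_label in counts.items():
        total = sum(by_label.values())
        if total >= min_count:
            lexicon[lemma] = {lab: c / total for lab, c in by_label.items()}
    return lexicon


def classify(lemmas, lexicon, labels=("positive", "negative"), default="neutral"):
    """Sum per-lemma polarity scores; return `default` when the top two labels tie."""
    scores = {label: 0.0 for label in labels}
    for lemma in lemmas:
        for label, p in lexicon.get(lemma, {}).items():
            if label in scores:
                scores[label] += p
    ranked = sorted(labels, key=lambda label: scores[label], reverse=True)
    if scores[ranked[0]] == scores[ranked[1]]:
        return default
    return ranked[0]


# Toy data invented for this sketch (not from the thesis corpora).
train = [
    (["good", "washer"], "positive"),
    (["bad", "washer"], "negative"),
    (["good", "quiet"], "positive"),
]
lexicon = build_lexicon(train)
```

With this toy data, `classify(["bad", "washer"], lexicon)` favours the negative label, while a sentence made up only of unseen lemmas falls back to the default label.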
Feature selection for text classification with Naive Bayes
Lux, Erik ; Petříčková, Zuzana (advisor) ; Petříček, Martin (referee)
The work presents the field of document classification. It describes existing techniques with emphasis on the Naive Bayes classifier. Several existing feature selection methods suitable for the Naive Bayes classifier are discussed. This theoretical background is the basis for the implementation of a classification library based on the Naive Bayes method. Besides the classification program, the library provides a range of document preprocessing tools. They make it possible to work with different types of documents and, more importantly, they significantly reduce redundant document dimensions. Finally, we tested the library on two different datasets and compared the implemented feature selection methods. The functionality of the whole library is practically verified by integrating it into the open-source email client Mailpuccino.
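As a rough illustration of how feature selection and a Naive Bayes classifier fit together, the following Python sketch trains a multinomial Naive Bayes model on a reduced vocabulary. The abstract does not name the selection methods the library implements, so document-frequency ranking is used here purely as a stand-in; the function names and the Laplace smoothing are likewise assumptions of this example.

```python
import math
from collections import Counter


def select_features(docs, k):
    """Keep the k terms with the highest document frequency (stand-in method)."""
    df = Counter()
    for tokens, _ in docs:
        df.update(set(tokens))
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:k]}


def train_nb(docs, vocab):
    """Multinomial Naive Bayes with Laplace smoothing over the selected vocabulary."""
    class_docs = Counter()
    term_counts = {}
    for tokens, label in docs:
        class_docs[label] += 1
        counts = term_counts.setdefault(label, Counter())
        counts.update(t for t in tokens if t in vocab)
    model = {}
    for label, counts in term_counts.items():
        total = sum(counts.values())
        log_prior = math.log(class_docs[label] / len(docs))
        log_lik = {
            t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab
        }
        model[label] = (log_prior, log_lik)
    return model


def predict(model, tokens):
    """Return the label with the highest log posterior; unselected terms are ignored."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[t] for t in tokens if t in log_lik)
    return max(model, key=score)
```

Selecting features before training shrinks the per-class probability tables, which is the dimensionality reduction the abstract refers to; terms outside the selected vocabulary simply do not contribute to the score.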
Crude Oil Price Forecast based on Text News
Skalický, Jan ; Bojar, Ondřej (advisor) ; Žabokrtský, Zdeněk (referee)
A whole range of algorithms exists for forecasting crude oil prices. In this thesis we bring a new perspective to this issue and introduce our project COPF. Using a maximum entropy classifier, we try to predict the change in crude oil price from text information available on the Internet. We take advantage of the knowledge of experts in the field. As a part of the thesis, we tested and improved the precision of COPF. We have found that this approach poses a lot of interesting problems. In its current state, the precision of our prediction surpasses the baseline, but for further development it is necessary to obtain more data sources. Our algorithm was never intended as a stand-alone method, but it may nicely complement numerical algorithms.
Cleaning, extraction of text and transformation of web pages into vertical format
Švaňa, Miloš ; Otrusina, Lubomír (referee) ; Dytrych, Jaroslav (advisor)
This thesis deals with the extraction of text from web pages, the recognition of important content, and its transformation to vertical format, which can be used as a suitable input for other natural language processing tasks. It analyzes the existing solution and its components with emphasis on its disadvantages, and describes the design and implementation of a new solution based on the knowledge obtained.
Adaptive RSS Reader
Luža, Jindřich ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The purpose of this bachelor's thesis is to explore the possibility of enhancing a common RSS reader with an extension that allows the user to filter an RSS feed according to a content-based classification of its items into groups. The thesis discusses problems in classification in general and in text classification in particular. Next, it covers the theoretical aspects of the RSS format that need to be considered when implementing an RSS reader module, along with a prototype of the module. Finally, testing of the classifier used is presented.
Intelligent Mailbox
Pohlídal, Antonín ; Drozd, Michal (referee) ; Chmelař, Petr (advisor)
This master's thesis deals with the use of text classification for sorting incoming emails. First, Knowledge Discovery in Databases is described and text classification with selected methods is analyzed in detail. Next, the thesis describes email communication and the SMTP, POP3 and IMAP protocols. The following part contains the design of a system that classifies incoming emails and describes the related technologies, i.e. Apache James Server, PostgreSQL and RapidMiner. The implementation of all necessary components is then described. The last part contains experiments with the email server using the Enron Dataset.
Actual Events Tracker
Odstrčilík, Martin ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor)
The goal of the master's thesis project was to develop an application for tracking actual events in the surrounding area of its users. The application should allow users to view events, create new events and add comments to existing ones. Beyond the implementation of the application, the project deals with an analysis of the presented problem. The analysis includes a comparison with existing solutions and a search for available technologies and frameworks applicable to the implementation. Another part of this work describes the theory behind the data classification that is used internally for event and comment analysis. The work also includes the design of the application, including the design of the user interface, software architecture, database, communication protocol and data classifiers. The main part of the project, the implementation, is described afterwards. The work closes with a summary of the whole process and some ideas for enhancing the application in the future.
A Classification of a Syndicated Content
Matušov, Izidor ; Očenášek, Pavel (referee) ; Smrčka, Aleš (advisor)
This work deals with the classification of syndicated content as a possible way of organizing the content. The classification uses algorithms for natural language processing. The main contributions are applying a word sense disambiguation algorithm to enhance the classification, eliminating the learning stage, and using a readability test to improve the user experience. The application is implemented as an extensible server-client model. Future work is discussed at the end.
Application of finite mixtures to text document classification
Novovičová, Jana ; Malík, Antonín
Finite mixture modelling of class-conditional distributions is a standard method in statistical pattern recognition. We propose using a mixture of multinomial distributions as a model of the class-conditional distribution for the text document classification task. Vector document representations based on a bag-of-words or unigram approach are employed. An experimental comparison of the proposed model and the standard models was performed using the Reuters-21578 database.
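To make the idea of a multinomial mixture as a class-conditional model concrete, the Python sketch below evaluates the log-likelihood of one bag-of-words vector under such a mixture. It only shows the scoring step under assumed, already-estimated parameters; how the paper fits the mixture is not stated in the abstract, and the function name and argument layout are hypothetical.

```python
import math


def log_mixture_likelihood(counts, weights, thetas):
    """Log-likelihood of a bag-of-words vector under a mixture of multinomials.

    counts:  term counts of one document, indexed by term id.
    weights: mixing weights of the components (assumed to sum to 1).
    thetas:  one term-probability vector per component; entries for terms that
             occur in the document are assumed strictly positive (smoothed).
    The multinomial coefficient is omitted because it does not depend on the
    class and so does not change which class scores highest.
    """
    component_logs = [
        math.log(w) + sum(n * math.log(theta[t]) for t, n in enumerate(counts) if n)
        for w, theta in zip(weights, thetas)
    ]
    # Log-sum-exp keeps the sum stable for long documents.
    top = max(component_logs)
    return top + math.log(sum(math.exp(v - top) for v in component_logs))
```

A classifier built on this would compute the score once per class, add the log class prior, and pick the largest value; with a single component the model reduces to the ordinary multinomial model.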
