National Repository of Grey Literature : 55 records found (records 32 - 41)
Generator of computer descriptions
Matějka, Jan ; Rosa, Rudolf (advisor) ; Dušek, Ondřej (referee)
This thesis deals with the problem of generating coherent and well-formed sentences from structured data. The goal of the thesis is to create a tool that makes it easier to generate brief descriptions of electronics from parameters given as structured data. Such a tool can be useful, for example, for e-shops selling electronics. The first part of the thesis introduces possible solutions to this problem. The thesis then describes the data needed to solve the problem, including ways of acquiring the data and its structure. Two selected solutions are then described, including their implementation. Finally, the thesis examines the advantages and disadvantages of the selected solutions and evaluates the texts generated by the created tool.
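The abstract does not show the tool itself; a minimal template-based sketch of the data-to-text idea it describes (all parameter names and the template are invented for illustration) might look like:

```python
# Minimal data-to-text sketch: turn structured product parameters into a
# short description by filling a template. Illustrative only; the thesis
# compares more sophisticated generation approaches.

def describe_product(params):
    """Render a one-sentence description from a parameter dict."""
    template = ("The {brand} {model} offers a {display}\" display, "
                "{ram} GB of RAM and {storage} GB of storage.")
    return template.format(**params)

laptop = {"brand": "Acme", "model": "X1", "display": 14,
          "ram": 16, "storage": 512}
print(describe_product(laptop))
```

A template baseline like this is trivially well-formed but rigid; data-driven generators trade that guarantee for fluency and variety.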
Analysis of Textual User Reviews of a Selected Group of Products
Valovič, Roman
This work focuses on the design of a system that identifies frequently discussed product features in product reviews, summarizes them, and displays them to the user together with their sentiment. The work deals with natural language processing, with a specific focus on the Czech language. The reader is introduced to text preprocessing methods and their impact on the quality of the analysis results. The identification of the most discussed product features is carried out by cluster analysis using the K-Means algorithm, under the assumption that sufficiently internally homogeneous clusters will represent individual product features. A further area explored in this work is the representation of documents using word embeddings and the potential of using this vector space as input for machine learning algorithms.
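The clustering step can be sketched as follows: word vectors that lie close together in the embedding space are grouped so that each cluster approximates one discussed product feature. The 2-d vectors below are hand-made toys; a real system would use learned embeddings.

```python
# K-Means over toy "embeddings": each resulting cluster stands in for one
# product feature (e.g. battery vs. screen). Vectors are fabricated so the
# grouping is obvious.
import numpy as np
from sklearn.cluster import KMeans

words = ["battery", "charge", "power", "screen", "display", "resolution"]
vectors = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.85, 0.15],   # battery-related words
    [0.1, 0.9], [0.2, 0.8], [0.15, 0.85],   # screen-related words
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
clusters = {}
for word, label in zip(words, kmeans.labels_):
    clusters.setdefault(int(label), []).append(word)
print(clusters)
```

Whether such clusters are "sufficiently internally homogeneous" to name a feature is exactly the assumption the thesis sets out to test.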
XML Databases for Dictionary Data Management
Samia, Michel ; Dytrych, Jaroslav (referee) ; Smrž, Pavel (advisor)
This diploma thesis deals with dictionary data processing, especially for data in XML-based formats. First, the reader is acquainted with the linguistic and lexicographical terms used in this work. Then particular types of lexicographical data formats and specific formats are introduced, and their advantages and disadvantages are discussed. According to previously set criteria, the LMF format was chosen for the design and implementation of a Python application, which focuses especially on intelligent merging of multiple dictionaries into one. After passing all unit tests, this application was used for processing LMF dictionaries located on the faculty server of the research group for natural language processing. Finally, the advantages and disadvantages of this application are discussed and ways of further usage and extension are suggested.
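The core merging idea can be sketched in plain Python on a simplified lemma-to-senses structure; the actual application works on LMF XML, which is considerably more involved, and the data shapes below are invented for illustration.

```python
# Sketch of "intelligent merging": union several {lemma: [senses]}
# dictionaries while deduplicating senses and preserving first-seen order.

def merge_dictionaries(*dicts):
    """Merge lemma -> sense-list dictionaries, skipping duplicate senses."""
    merged = {}
    for d in dicts:
        for lemma, senses in d.items():
            seen = merged.setdefault(lemma, [])
            for sense in senses:
                if sense not in seen:
                    seen.append(sense)
    return merged

a = {"bank": ["financial institution"], "bat": ["flying mammal"]}
b = {"bank": ["river side", "financial institution"]}
print(merge_dictionaries(a, b))
```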
Comparison of approaches to text classification
Knížek, Jan ; Hana, Jiří (advisor) ; Vidová Hladká, Barbora (referee)
The focus of this thesis is short text classification. Short text is the prevailing form of text on e-commerce and review platforms such as Yelp, Tripadvisor or Heureka. As the popularity of online communication increases, it is becoming infeasible for users to filter information manually, so it is increasingly important to recognise the relevant information in text. Classification of reviews is especially challenging because they have limited structure, use informal language, contain a high number of errors and rely heavily on context and common knowledge. One possible application of machine learning is to automatically filter data and show users only relevant pieces of information. We work with restaurant reviews from Yelp and aim to predict their usefulness. Most restaurants have relatively many reviews, yet only a few are truly useful. Our objective is to compare machine learning methods for predicting usefulness.
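A comparison of this kind typically scores several off-the-shelf classifiers on the same features; the sketch below is a hedged illustration of that setup with toy data and invented labels, not the thesis's actual experiment on Yelp reviews.

```python
# Compare two standard classifiers on identical TF-IDF features.
# Data and usefulness labels are toys; real experiments use Yelp reviews.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

reviews = ["great food and a helpful detailed review",
           "useful tips on parking and the menu",
           "meh", "ok i guess", "bad", "detailed guide to the place"]
useful = [1, 1, 0, 0, 0, 1]   # toy usefulness labels

X = TfidfVectorizer().fit_transform(reviews)
scores = {name: clf.fit(X, useful).score(X, useful)
          for name, clf in [("logreg", LogisticRegression()),
                            ("naive_bayes", MultinomialNB())]}
print(scores)
```

In practice the scores would come from held-out data, not training accuracy as in this toy.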
A chatbot for the banking domain
Schmidtová, Patrícia ; Dušek, Ondřej (advisor) ; Rosa, Rudolf (referee)
This thesis designs, implements and evaluates a task-based chatbot that is expected to answer questions and give advice in the banking domain. We present an extendable natural language understanding (NLU) module based on the GATE framework, which serves to create interpretations of the user's utterances. We implement a rule-based dialog manager component responsible for answering based on the NLU's interpretations and a stored context. We also implement a template-based natural language generation module. We then evaluate the chatbot with human testers, verifying that it performs well in most cases and identifying areas for future improvement.
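The last two stages of such a pipeline can be sketched briefly: a rule-based dialog manager maps an NLU interpretation plus stored context to a dialogue act, and template-based NLG renders it. The intents, slots and templates below are invented; the thesis's actual rules and templates are not in the abstract.

```python
# Toy rule-based dialog manager + template NLG for a banking domain.
# All intent/slot names are illustrative assumptions.

TEMPLATES = {
    "balance": "Your {account} account balance is {amount}.",
    "fallback": "Sorry, I did not understand. Could you rephrase?",
}

def dialog_manager(interpretation, context):
    """Map an NLU interpretation and stored context to a response act."""
    if interpretation.get("intent") == "ask_balance":
        return ("balance", {"account": interpretation.get("account", "main"),
                            "amount": context.get("balance", "unknown")})
    return ("fallback", {})

def generate(act, slots):
    """Template-based NLG: fill the template chosen by the dialog manager."""
    return TEMPLATES[act].format(**slots)

context = {"balance": "120 EUR"}
act, slots = dialog_manager({"intent": "ask_balance",
                             "account": "savings"}, context)
print(generate(act, slots))
```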
Chatbot for Smart Cities
Jusko, Ján ; Herout, Adam (referee) ; Zemčík, Pavel (advisor)
The aim of this work is to simplify access to information for citizens of the city of Brno and at the same time to innovate the way citizens communicate with their city. The problem is solved by creating a conversational agent, the chatbot Kroko. Using artificial intelligence and a Czech language analyzer, the agent is able to understand and respond to a certain set of textual, natural language queries. The agent is available on the Messenger platform and has a knowledge base that includes data provided by the city council. After extensive user testing with a total of 76 citizens of the city, it turned out that up to 97% of respondents like the idea of a city-oriented chatbot and can imagine using it regularly. The main finding of this work is that the general public can easily adopt and effectively use a chatbot. The results motivate further development of practical applications of conversational agents.
Word2vec Models with Added Context Information
Šůstek, Martin ; Rozman, Jaroslav (referee) ; Zbořil, František (advisor)
This thesis is concerned with the explanation of word2vec models. Even though word2vec was introduced recently (2013), many researchers have already tried to extend, understand or at least use the model, because it provides surprisingly rich semantic information. This information is encoded in an N-dimensional vector representation and can be recalled by performing algebraic operations on the vectors. In addition, I suggest model modifications in order to obtain different word representations; to achieve this, I use public image datasets. The thesis also includes parts dedicated to a word2vec extension based on convolutional neural networks.
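The "algebraic operations" in question are analogy queries of the kind word2vec is famous for: the nearest neighbour of vec(king) - vec(man) + vec(woman) is queen. The 2-d vectors below are hand-crafted so the analogy holds; real word2vec vectors are learned and high-dimensional.

```python
# Word-analogy sketch over toy vectors: answer "king - man + woman = ?"
# by cosine similarity, excluding the query words themselves.
import numpy as np

vec = {"king": np.array([1.0, 1.0]), "queen": np.array([1.0, 0.0]),
       "man": np.array([0.1, 1.0]), "woman": np.array([0.1, 0.0]),
       "apple": np.array([-1.0, 0.5])}

def analogy(a, b, c):
    """Return the word whose vector is closest (cosine) to a - b + c."""
    target = vec[a] - vec[b] + vec[c]
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    candidates = [w for w in vec if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(vec[w], target))

print(analogy("king", "man", "woman"))
```

Excluding the query words from the candidates matters: the raw nearest neighbour of the target vector is often one of the inputs.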
Content classification in legal documents
Bečvarová, Lucia ; Žabokrtský, Zdeněk (advisor) ; Holub, Martin (referee)
This thesis presents applied research for the needs of the company Datlowe, s.r.o., aimed at automatic processing of legal documents. The goal of the work is to design, implement and evaluate a classification module that is able to assign categories to the paragraphs of the documents. Several classification algorithms are used, evaluated and compared to each other, and subsequently combined to obtain the best models. The outcome is a prediction module which was successfully integrated into the entire document processing system. Further contributions, alongside the classification module, are the measurement of inter-annotator agreement and the introduction of a new set of features for classification.
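One simple way to combine several classifiers, consistent with the abstract's description though not necessarily the thesis's exact method, is majority voting over their per-paragraph predictions; the predictions below are hard-coded toys.

```python
# Majority-vote combination of per-paragraph label predictions from
# several base classifiers. Labels and predictions are illustrative.
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine [model][paragraph] label lists by per-paragraph majority."""
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions_per_model)]

svm_preds  = ["header", "clause", "clause", "signature"]
nb_preds   = ["header", "clause", "header", "signature"]
tree_preds = ["clause", "clause", "clause", "signature"]

print(majority_vote([svm_preds, nb_preds, tree_preds]))
```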
User simulation for statistical dialogue systems
Michlíková, Vendula ; Jurčíček, Filip (advisor) ; Žabokrtský, Zdeněk (referee)
The purpose of this thesis is to develop and evaluate user simulators for a spoken dialogue system. The created simulators operate on the dialogue act level. We implemented a bigram simulator as a baseline system. Based on the baseline, we created another bigram simulator that is trained on dialogue acts without slot values. The third implemented simulator is similar to an implementation of a dialogue manager: it tracks its dialogue state and learns a dialogue strategy based on that state using supervised learning. The user simulators are implemented in Python 2.7, in the ALEX framework for dialogue system development. The simulators are developed for the PTICS application, which operates in the domain of public transport information, and are trained and evaluated using real human-machine dialogues collected with the PTICS application.
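The baseline bigram simulator can be sketched compactly: estimate P(next act | previous act) from dialogue-act sequences, then emit the most likely next user act. The dialogues below are toys (real training data comes from PTICS), and this sketch is in Python 3 rather than the thesis's Python 2.7.

```python
# Bigram user-simulator sketch: count act transitions in toy dialogues
# and predict the most frequent follower of a given act.
from collections import Counter, defaultdict

dialogues = [
    ["hello", "inform", "request", "bye"],
    ["hello", "request", "inform", "bye"],
    ["hello", "inform", "inform", "bye"],
]

bigrams = defaultdict(Counter)
for acts in dialogues:
    for prev, nxt in zip(acts, acts[1:]):
        bigrams[prev][nxt] += 1

def next_act(prev):
    """Most likely act following `prev` under the bigram model."""
    return bigrams[prev].most_common(1)[0][0]

print(next_act("hello"))   # "inform": it follows "hello" in 2 of 3 dialogues
```

A real simulator would sample from the transition distribution rather than always taking the argmax, so that generated dialogues vary.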
Assessing the impact of manual corrections in the Groningen Meaning Bank
Weck, Benno ; Lopatková, Markéta (advisor) ; Vidová Hladká, Barbora (referee)
The Groningen Meaning Bank (GMB) project develops a corpus with rich syntactic and semantic annotations. Annotations in the GMB are generated semi-automatically and stem from two sources: (i) initial annotations from a set of standard NLP tools, and (ii) corrections and refinements by human annotators. For example, on the part-of-speech level of annotation there are currently 18,000 such corrections, so-called Bits of Wisdom (BOWs). To apply this information to boost NLP processing, we experiment with using the BOWs to retrain the part-of-speech tagger and find that the tagger can be improved to correct up to 70% of identified errors on held-out data. Moreover, an improved tagger helps to raise the performance of the parser. Preferring sentences with a high rate of verified tags during retraining proved to be the most reliable strategy. With a simulated active learning experiment using Query-by-Uncertainty (QBU) and Query-by-Committee (QBC), we showed that selectively sampling sentences for retraining yields better results with less data than random selection. In an additional pilot study we found that a standard maximum-entropy part-of-speech tagger can be augmented so that it uses already-known tags to enhance its tagging decisions on an entire sequence without first retraining a new model.
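The Query-by-Uncertainty selection step reduces to a small ranking routine: retrain on the sentences the current tagger is least confident about. The confidence scores below are invented; in a real setup they would come from the tagger's per-sentence probabilities.

```python
# Query-by-Uncertainty sketch: pick the k sentences with the lowest
# model confidence as the next batch to annotate/retrain on.

def select_for_retraining(sentence_confidences, k):
    """Return ids of the k sentences with the lowest confidence score."""
    ranked = sorted(sentence_confidences, key=sentence_confidences.get)
    return ranked[:k]

confidences = {"s1": 0.99, "s2": 0.41, "s3": 0.87, "s4": 0.35, "s5": 0.92}
print(select_for_retraining(confidences, 2))
```

Query-by-Committee replaces the single confidence score with disagreement among several taggers, but the selection loop is otherwise the same.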
