National Repository of Grey Literature: 25 records found (showing 1 - 10)
Word Sense Clustering
Haljuk, Petr ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This Bachelor's thesis deals with the semantic similarity of words. It describes the design and implementation of a system that searches for the most similar words and measures the semantic similarity of words. The system uses the Word2Vec model from the GenSim library. It learns the relations among words from the CommonCrawl corpus.
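As a rough illustration of what "searching for the most similar words" means for vector-based models, the sketch below ranks words by cosine similarity. It is not the thesis's implementation and does not use GenSim: the vectors in `TOY_VECTORS` are invented for the example, and the helper names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. Real Word2Vec
# embeddings are learned from a corpus and usually have far more dimensions.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.9, 0.4],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`, best first."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    for other, score in most_similar("cat", TOY_VECTORS):
        print(f"{other}: {score:.3f}")
```

With these toy values, "dog" ranks above "car" for the query "cat", because its vector points in nearly the same direction.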
Word2vec Models with Added Context Information
Šůstek, Martin ; Rozman, Jaroslav (referee) ; Zbořil, František (advisor)
This thesis explains the word2vec models. Even though word2vec was introduced only recently (2013), many researchers have already tried to extend, understand, or at least use the model because it provides surprisingly rich semantic information. This information is encoded in an N-dimensional vector representation and can be recalled by performing algebraic operations on the vectors. In addition, I suggest modifications to the model in order to obtain a different word representation. To achieve that, I use public picture datasets. The thesis also includes parts dedicated to a word2vec extension based on a convolutional neural network.
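The idea of recovering semantic information through operations on word vectors is often shown with analogy arithmetic. The following is a minimal sketch under stated assumptions, not the thesis's method: the 2-dimensional vectors are hand-made so that one axis loosely stands for "royalty" and the other for "gender", and the helper names `cosine` and `analogy` are hypothetical.

```python
import math

# Hand-made 2-D vectors (axis 0 ~ "royalty", axis 1 ~ "gender").
# These values are assumptions chosen for the example, not trained embeddings.
TOY = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


if __name__ == "__main__":
    # "king" - "man" + "woman" lands on "queen" for these toy vectors.
    print(analogy("king", "man", "woman", TOY))
```

Excluding the three input words from the candidates mirrors common practice, since the nearest vector to the computed target is otherwise often one of the inputs.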
Word Sense Clustering
Hošták, Viliam Samuel ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis deals with the semantic similarity of words. It describes and compares existing models that are currently used for this purpose. It discusses the design and implementation of a system for corpus preprocessing, semantic modelling, and retrieval of semantically related words. The resulting system supports the distributional semantic models Word2vec, FastText, and GloVe.
Advanced Machine-Learning Methods for Text Classification
Dočekal, Martin ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis deals with advanced machine-learning methods for text classification. The methods are described first, and a text classification system is then built on them. The system also provides tools for document preprocessing and classifier evaluation. The thesis describes the use of the system in a real-life task.
Word Sense Clustering
Jadrníček, Zbyněk ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis focuses on the semantic similarity of words in English. It first introduces the theory of word sense clustering, then describes selected methods and tools related to the topic. In the practical part, we design and implement a system for determining semantic similarity using the Word2Vec tool, focusing in particular on biomedical texts from the MEDLINE database. The thesis closes with a discussion of the results and some ideas for improving the system.
Identifying Term Similarity in Information Technology Domain
Smutka, Miloslav ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis covers the design, implementation, and evaluation of a system for retrieving semantically related words. Word relations are determined using the word2vec model from the gensim library.
A Tool for Recognition and Verification of Spedition Orders
Kalivoda, Vojtěch ; Hradiš, Michal (referee) ; Herout, Adam (advisor)
The aim of this work is to design and implement a web tool that will facilitate the work of dispatchers of forwarding and transport companies through automated recognition of important information in orders. Thanks to the recognition, not all information has to be manually rewritten by dispatchers, which saves time. Order recognition is based on finding entities in a document, representing their surroundings with vectors using word2vec models, and subsequent classification using convolutional neural networks. The tool can recognize 20 types of information in real time with an average success rate of 72.35%. As part of the work, a dataset of almost 1,700 orders was collected and 141 of them were annotated. Part of the work is a web application that serves as an interface for the tool and data collection.
Binary Classification of Customer Incidents Using NLP Methods
Pokorný, Jiří
This bachelor's thesis focuses on building a model for binary classification of customer incidents within the SAP system. The final category of an incident is predicted by classifying its individual sentences. The text used is in English. To compare traditional and modern approaches to text classification and to obtain optimal results, a series of experiments is carried out using different methods of dataset balancing, vector representation, and classification. Finally, the results are analyzed and recommendations are formulated for further development, including applying the knowledge gained within the SAP environment.
Mining Parallel Corpora from the Web
Kúdela, Jakub ; Holubová, Irena (advisor)
Title: Mining Parallel Corpora from the Web Author: Bc. Jakub Kúdela Author's e-mail address: jakub.kudela@gmail.com Department: Department of Software Engineering Thesis supervisor: Doc. RNDr. Irena Holubová, Ph.D. Supervisor's e-mail address: holubova@ksi.mff.cuni.cz Thesis consultant: RNDr. Ondřej Bojar, Ph.D. Consultant's e-mail address: bojar@ufal.mff.cuni.cz Abstract: Statistical machine translation (SMT) is one of the most popular approaches to machine translation today. It uses statistical models whose parameters are derived from the analysis of a parallel corpus required for the training. The existence of a parallel corpus is the most important prerequisite for building an effective SMT system. Various properties of the corpus, such as its volume and quality, highly affect the results of the translation. The web can be considered as an ever-growing source of considerable amounts of parallel data to be mined and included in the training process, thus increasing the effectiveness of SMT systems. The first part of this thesis summarizes some of the popular methods for acquiring parallel corpora from the web. Most of these methods search for pairs of parallel web pages by looking for the similarity of their structures. However, we believe there still exists a non-negligible amount of parallel...
