National Repository of Grey Literature: 15 records found
Information Retrieval in Text Data
Tkadlčík, Luboš ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
This thesis examines text data mining and information retrieval. It describes the most common representations of text documents and retrieval strategies. The aim of the thesis is to design and implement an application that performs information retrieval using the vector space model. The application implements three similarity measures: the cosine measure, the Jaccard coefficient and the Dice coefficient. The results achieved are assessed, and possible continuations of the project are outlined.
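For readers unfamiliar with the three measures named in this abstract, the following is a minimal Python sketch of their common vector-space forms. The abstract does not state which formulation the thesis uses, so the extended (real-valued) Jaccard and Dice definitions below are an assumption, and the function names are purely illustrative.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length term-weight vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine measure: dot(a, b) / (|a| * |b|)."""
    norm_a = math.sqrt(_dot(a, a))
    norm_b = math.sqrt(_dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return _dot(a, b) / (norm_a * norm_b)


def jaccard(a, b):
    """Extended Jaccard coefficient: dot / (|a|^2 + |b|^2 - dot)."""
    d = _dot(a, b)
    denom = _dot(a, a) + _dot(b, b) - d
    return d / denom if denom else 0.0


def dice(a, b):
    """Dice coefficient: 2 * dot / (|a|^2 + |b|^2)."""
    denom = _dot(a, a) + _dot(b, b)
    return 2 * _dot(a, b) / denom if denom else 0.0


# Hypothetical query and document vectors over a three-term vocabulary.
query = [1, 1, 0]
document = [1, 0, 1]
scores = (cosine(query, document), jaccard(query, document), dice(query, document))
```

All three functions return 0.0 for an all-zero vector rather than raising a division error, which is a design choice of this sketch and not something taken from the thesis.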
Word Sense Clustering
Jadrníček, Zbyněk ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis focuses on the semantic similarity of words in English. It first introduces the theory of word sense clustering and then describes selected methods and tools related to the topic. In the practical part, we design and implement a system for determining semantic similarity using the Word2Vec tool, focusing in particular on biomedical texts from the MEDLINE database. The thesis concludes with a discussion of the results and some ideas for improving the system.
DNS Data Analysis for Mobile Device Identification Purposes
Sporni, Alex ; Bartík, Vladimír (referee) ; Burgetová, Ivana (advisor)
This bachelor's thesis deals with the identification of mobile devices based on DNS data analysis. It provides a theoretical introduction to the computer communication model and explains the importance of DNS in network communication between devices. It also presents the provided data sets, which contain real communication of mobile devices. These data sets must be parsed with a suitable technique and stored in a database to make data manipulation easier in the later stages of implementation. The work further describes the individual data processing techniques. It also describes in detail the methodology for evaluating relevance using TF-IDF and the application of cosine similarity to identify mobile devices. The main output of this work is the evaluation of the achieved results.
Semantic Similarity of Articles
Veselovský, Martin ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor)
This bachelor's thesis deals with modelling the structure of semantic relationships among articles in English. It introduces existing methods for representing articles and computing their similarity. The baseline method is the vector space model, which represents a document as a vector of words; the words are weighted by importance using the TF-IDF method. Next, the advanced modelling methods Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) are described. The thesis also deals with semantically annotated articles, where the weights of annotation words are computed by the Stochastic Gradient Descent method. The results are evaluated on a prepared test corpus of documents for which a reference similarity evaluation is available.
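As an illustration of the TF-IDF weighting mentioned in this abstract, here is a minimal Python sketch of one common variant (relative term frequency multiplied by the natural logarithm of the inverse document frequency). The abstract does not say which variant the thesis uses, so the formula, the function name `tf_idf` and the toy corpus are all assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term of each tokenized document.

    docs: list of token lists.
    Returns one {term: weight} dict per document, where
    weight = (count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


# Hypothetical two-document corpus, already tokenized.
corpus = [["semantic", "similarity"], ["semantic", "annotation"]]
weights = tf_idf(corpus)
```

In this variant a term that occurs in every document, such as "semantic" in the toy corpus, receives weight zero, so only the distinguishing terms contribute to a later similarity computation.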
Semantic Similarity of Texts
Hajdin, Martin ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This paper deals with determining the semantic similarity of texts, focusing on the categorization of web documents, in this case bookmarks. Part of the work is a theoretical overview of the methods used for the system implementation. It also describes the design and implementation of the various methods used in the system. Finally, the paper evaluates these methods, testing the chosen methods against specified criteria.
Combining text-based and vision-based semantics
Tran, Binh Giang ; Holub, Martin (advisor) ; Straková, Jana (referee)
Learning and representing semantics is one of the most important tasks contributing significantly to several growing areas, as shown by the success stories in the recent survey of Turney and Pantel (2010). In this thesis, we present an innovative (and first) framework for creating a multimodal distributional semantic model from state-of-the-art text- and image-based semantic models. We evaluate this multimodal semantic model on simulating similarity judgements, concept clustering and the newly introduced BLESS benchmark. We also propose an effective algorithm, namely Parameter Estimation, to integrate text- and image-based features in order to obtain a robust multimodal system. Our experiments show that the technique is very promising: across all experiments, our best multimodal model takes first place. A relative comparison with other text-based models indicates that our model is on a par with other state-of-the-art models. We explore various types of visual features, including SIFT and other color SIFT channels, to gain preliminary insights into how computer-vision techniques should be applied in the natural language processing domain. Importantly, in this thesis, we show evidence that adding visual features (as the perceptual information coming from...
Semantic disambiguation using Distributional Semantics
Prodanovic, Srdjan ; Hana, Jiří (advisor) ; Vidová Hladká, Barbora (referee)
In statistical models of semantics, word meanings are derived solely from their distributional properties. The basic resource is a single lexicon that can be used for various tasks, in which word meanings are represented as vectors in a vector space and word similarities as distances between their vectors. Using similarity measures, the suitability of terms in a given context can be computed and used for a range of tasks, one of which is Word Sense Disambiguation. In this thesis, several different approaches to vector space models were investigated and implemented in order to cross-evaluate their performance on the Word Sense Disambiguation task on the Prague Dependency Treebank.