National Repository of Grey Literature: 60 records found (showing records 27 - 36)
Content classification in legal documents
Bečvarová, Lucia ; Žabokrtský, Zdeněk (advisor) ; Holub, Martin (referee)
This thesis presents applied research carried out for the company Datlowe, s.r.o., aimed at the automatic processing of legal documents. The goal of the work is to design, implement and evaluate a classification module that assigns categories to the paragraphs of the documents. Several classification algorithms are used, evaluated and compared with each other, and then combined to obtain the best models. The outcome is a prediction module that was successfully integrated into the entire document processing system. Other contributions, along with the classification module, are the measurement of inter-annotator agreement and the introduction of a new set of features for classification.
SVM classifiers and heuristics for feature selection
Krupka, Tomáš ; Holub, Martin (advisor) ; Kopa, Miloš (referee)
In machine learning applications with a large number of computer-generated features, selecting just a subset of the features is often desirable. The Recursive Feature Elimination (SVM-RFE) algorithm proposed by Guyon et al. (2002) selects features based on their contribution to the decision rule of an SVM model, and has achieved state-of-the-art performance on the Gene Selection for Cancer Classification task (Tan et al. (2010)). This thesis expands on that work and proposes a novel modification of the SVM-RFE feature selection method called Evaluation-Based RFE (EB-RFE). This heuristic significantly improves the performance of the SVM classifier in comparison to the original SVM-RFE on the studied machine learning task. In addition to the performance gain, the proposed algorithm has also proven, in experimental use, to have two other desirable properties. Firstly, EB-RFE produces much smaller feature subsets than SVM-RFE, which leads to more compact models. Secondly, unlike SVM-RFE, the EB-RFE heuristic scales easily with computational time, well beyond the possibilities of current high-end consumer CPUs.
Semantic information from FrameNet and the possibility of its transfer to Czech data
Limburská, Adéla ; Lopatková, Markéta (advisor) ; Holub, Martin (referee)
The thesis focuses on transferring FrameNet annotation from English to Czech and the possibilities of using the resulting data for automatic frame prediction in Czech. The first part, annotation transfer, has been performed in two ways. First, a parallel corpus of English sentences and their human-created Czech translations (PCEDT) was used. Second, a much larger parallel corpus was created using machine translation of FrameNet example sentences. This corpus was then used to transfer the annotation as well. The resulting data were partially evaluated and some of the automatically detectable errors were filtered out. Subsequently, the data were used as an input for two machine learning methods, decision trees and support vector machines. Since neither of the machine learning experiments brought impressive results, further manual correction of the data annotation was performed, which helped increase the accuracy of the prediction. However, as the accuracy reported in related papers is notably higher, the thesis also discusses different approaches to feature selection and the possibility of further improvement of the prediction results using these methods.
The Internet and Private International Law
Holub, Martin ; Pauknerová, Monika (advisor) ; Pfeiffer, Magdalena (referee)
The main focus of the thesis is the issue of determining jurisdiction in matters of tort, delict or quasi-delict with regard to the internet. The author finds that the general rules for determining jurisdiction are suitable for use even in disputes arising in connection with the internet. However, strict application of these rules would lead to undesirable results. It is therefore necessary to construe the general rules in a way that takes into account the unique characteristics of the internet environment. Given that courts are mainly responsible for the interpretation and application of the general rules, significant decisions of European and American courts are thoroughly analyzed. Even though the main focus of the thesis is the decisions of the courts, recent findings of jurisprudence and recommendations of international bodies are taken into account as well. In the opening chapters, the unique characteristics of the internet and the basic rules for determining special jurisdiction are presented. Although the issue of determining jurisdiction in contracts is also mentioned in chapter 3, this topic exceeds the scope of this work and is discussed mainly in connection with the "targeting" criterion, which is also significant for non-contractual issues. Chapters 4 and...
Query expansion for medical information retrieval
Bibyna, Feraena ; Pecina, Pavel (advisor) ; Holub, Martin (referee)
One of the challenges in medical information retrieval is the terminology gap between the documents (commonly written by medical professionals using medical jargon) and the queries (commonly composed by non-professionals using lay terms). In this thesis, we investigate the effect of query expansion, using a domain-specific knowledge resource, to deal with this challenge. We use the Unified Medical Language System (UMLS), a repository of biomedical vocabularies, and utilize two of its resources: the Metathesaurus and the Semantic Network. We use the query sets and document set provided by the CLEF eHealth organizers. The query sets, provided for the medical information retrieval shared task, represent two different use cases of medical information retrieval. We experiment with query expansion using synonymous terms and non-synonymous concepts, blind relevance feedback, field weighting, and linear interpolation of different systems.
Automatic suggestion of illustrative images
Odcházel, Ondřej ; Pecina, Pavel (advisor) ; Holub, Martin (referee)
The objective of this thesis is to implement a web application for recommending stock photos. The application takes newspaper articles in Czech or English as input and, based on the text itself, suggests appropriate stock photos. The implemented application also searches for images by visual similarity. The thesis deals with theoretical aspects of keyword extraction and text language detection. It further analyzes the possibilities of efficient search for similar vectors, which are used in the search component for visually similar images. It also describes the options for developing a modern web frontend and backend. The quality of the algorithm for recommending stock photos is tested on users.
Automatic construction of semantic networks
Kirschner, Martin ; Pecina, Pavel (advisor) ; Holub, Martin (referee)
The presented work explores the possibilities of automatic construction and expansion of semantic networks using machine learning methods. The main focus is on the feature retrieval procedure for the data set. The work presents a method of semantic relation retrieval, based on the distributional hypothesis and trained on data from the Czech WordNet. We also show the first results for the Czech language in this area of research. The thesis also includes a set of software for processing and evaluating input data, and an overview and discussion of its results on real-world data. The resulting tools can process data on the order of hundreds of millions of words. The research part of the thesis used Czech morphologically and syntactically annotated data, but the methods are not language dependent.
Combining text-based and vision-based semantics
Tran, Binh Giang ; Holub, Martin (advisor) ; Straková, Jana (referee)
Learning and representing semantics is one of the most important tasks contributing significantly to several growing areas, as shown by the success stories in the recent survey of Turney and Pantel (2010). In this thesis, we present an innovative (and first) framework for creating a multimodal distributional semantic model from state-of-the-art text- and image-based semantic models. We evaluate this multimodal semantic model on simulating similarity judgements, concept clustering and the newly introduced BLESS benchmark. We also propose an effective algorithm, namely Parameter Estimation, to integrate text- and image-based features in order to obtain a robust multimodal system. Our experiments show that the technique is very promising. Across all experiments, our best multimodal model ranks first. A relative comparison with other text-based models justifies the claim that our model ranks alongside other state-of-the-art models. We explore various types of visual features, including SIFT and other color SIFT channels, in order to gain preliminary insights into how computer-vision techniques should be applied in the natural language processing domain. Importantly, in this thesis, we show evidence that adding visual features (as the perceptual information coming from...
Automatic construction of semantic networks
Kirschner, Martin ; Pecina, Pavel (advisor) ; Holub, Martin (referee)
The presented work explores the possibilities of automatic construction and expansion of semantic networks using machine learning methods. The main focus is on the feature retrieval procedure for the data set. The work presents a robust method of semantic relation retrieval, based on the distributional hypothesis and trained on data from the Czech WordNet. We also show the first results for the Czech language in this area of research. The thesis also includes a set of software for processing and evaluating input data, and an overview and discussion of its results on real-world data. The resulting tools can process data on the order of hundreds of millions of words. The research part of the thesis used Czech morphologically and syntactically annotated data, but the methods are not language dependent.
Classifier for semantic patterns of English verbs
Kríž, Vincent ; Holub, Martin (advisor) ; Bojar, Ondřej (referee)
The goal of the diploma thesis is to design, implement and evaluate classifiers for automatic classification of semantic patterns of English verbs according to a pattern lexicon that draws on the Corpus Pattern Analysis. We use a pilot collection of 30 sample English verbs as training and test data sets. We employ standard methods of machine learning. In our experiments we use decision trees, k-nearest neighbours (kNN), support vector machines (SVM) and AdaBoost algorithms. Among other things, we concentrate on feature design and selection. We experiment with both morpho-syntactic and semantic features. Our results show that the morpho-syntactic features are the most important for statistically driven semantic disambiguation. Nevertheless, for some verbs the use of semantic features plays an important role.
