keywords:"zpracování přirozeného jazyka" - Search Results - Digital Repository

guest :: login Digital Repository
		Search		Submit		Help		About

Home > Search Results: keywords:"zpracování přirozeného jazyka"

Search:

Search Tips :: Advanced Search

Search collections:

Sort by:	Display results:	Output format:

	Syntactic Analyzer for Czech Language Beneš, Vojtěch ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor) Master’s thesis describes theoretical basics, solution design, and implementation of constituency (phrasal) parser for Czech language, which is based on a part of speech association into phrases. Created program works with manually built and annotated Czech sample corpus to generate probabilistic context free grammar within runtime machine learning. Parser implementation, based on extended CKY algorithm, then for the input Czech sentence decides if the sentence can be generated by the created grammar and for the positive cases constructs the most probable derivation tree. This result is then compared with the expected parse to evaluate constituency parser success rate. Detailed record
	Extraction of Semantic Relations from Text Schmidt, Marek ; Burget, Radek (referee) ; Smrž, Pavel (advisor) Extraction of semantic relations from English text is the topic of this thesis. It focuses on exploitation of a dependency parser. A method based on syntactic patterns is proposed and evaluated in addition to evaluation of several statistical methods over syntactic features. It applies the methods for extraction of a hypernymy relation and evaluates it on the WordNet thesaurus. A system for extraction of semantic relations from text is designed and implemented based on these methods. Detailed record
	Word Sense Disambiguation Kraus, Michal ; Glembek, Ondřej (referee) ; Smrž, Pavel (advisor) The master's thesis deals with sense disambiguation of Czech words. Reader is informed about task's history and used algorithms are introduced. There are naive Bayes classifier, AdaBoost classifier, maximum entrophy method and decision trees described in this thesis. Used methods are clearly demonstrated. In the next parts of this thesis are used data also described. Last part of the thesis describe reached results. There are some ideas to improve the system at the end of the thesis. Detailed record
	Information Extraction from Biomedical Texts Knoth, Petr ; Burget, Radek (referee) ; Smrž, Pavel (advisor) Recently, there has been much effort in making biomedical knowledge, typically stored in scientific articles, more accessible and interoperable. As a matter of fact, the unstructured nature of such texts makes it difficult to apply knowledge discovery and inference techniques. Annotating information units with semantic information in these texts is the first step to make the knowledge machine-analyzable. In this work, we first study methods for automatic information extraction from natural language text. Then we discuss the main benefits and disadvantages of the state-of-art information extraction systems and, as a result of this, we adopt a machine learning approach to automatically learn extraction patterns in our experiments. Unfortunately, machine learning techniques often require a huge amount of training data, which can be sometimes laborious to gather. In order to face up to this tedious problem, we investigate the concept of weakly supervised or bootstrapping techniques. Finally, we show in our experiments that our machine learning methods performed reasonably well and significantly better than the baseline. Moreover, in the weakly supervised learning task we were able to substantially bring down the amount of labeled data needed for training of the extraction system. Detailed record
	Word Sense Clustering Haljuk, Petr ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor) This Bachelor's thesis deals with the semantic similarity of words . It describes the design and the implementation of a system, which searches for the most similar words and measures the semantic similarity of words . The system uses the Word2Vec model from GenSim library . It learns the relations among words from CommonCrawl corpus . Detailed record
	Automatic Link Detection in Parts of Audiovisual Documents Sychra, Marek ; Černocký, Jan (referee) ; Szőke, Igor (advisor) This paper deals with topic detection. Specifically link detection - finding similarities amongst a group of short documents according to their topic and story segmentation - finding borders between two topically different parts in a large document. The main motivation for research was practical application with the use of presentation materials from lectures at FIT (linking parts of different lectures and courses). The solution of link detection is achieved by text and word analysis, which includes learning the meaning and importance of each word. Story segmentation uses this while searching for the boundaries. Both parts of the problem (link detection, story segmentation) gave great results while testing with a standard dataset (world news reports). During evaluation of lecture processing the success rate was lower, but still good. Detailed record
	Word Sense Clustering Jadrníček, Zbyněk ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor) This thesis is focused on the problem of semantic similarity of words in English language. At first reader is informed about theory of word sense clustering, then there are described chosen methods and tools related to the topic. In the practical part we design and implement system for determining semantic similarity using Word2Vec tool, particularly we focus on biomedical texts of MEDLINE database. At the end of the thesis we discuss reached results and give some ideas to improve the system. Detailed record
	Eight ICT trends that will change libraries Černý, Michal Informační společnost i rychle se rozvíjející ICT proměňují vše kolem nás – od vzdělávání, přes dopravu až třeba právě po knihovny. Příspěvek představí osm technologií, které do deseti let začnou měnit knihovny téměř k nepoznání: Internet věcí; big data; veřejné multimediální displeje; firemní sociální sítě; cloud; nové mobilní sítě; zpracování přirozeného jazyka či sémantické technologie. Co bude tato změna znamenat pro knihovny? Jak se změní jejich postavení v informační společnosti? Video: MP4 Detailed record
	Klasifikace entit pomocí Wikipedie a WordNetu Kliegr, Tomáš ; Rauch, Jan (advisor) ; Berka, Petr (referee) ; Smrž, Pavel (referee) ; Žabokrtský, Zdeněk (referee) This dissertation addresses the problem of classification of entities in text represented by noun phrases. The goal of this thesis is to develop a method for automated classification of entities appearing in datasets consisting of short textual fragments. The emphasis is on unsupervised and semi-supervised methods that will allow for fine-grained character of the assigned classes and require no labeled instances for training. The set of target classes is either user-defined or determined automatically. Our initial attempt to address the entity classification problem is called Semantic Concept Mapping (SCM) algorithm. SCM maps the noun phrases representing the entities as well as the target classes to WordNet. Graph-based WordNet similarity measures are used to assign the closest class to the noun phrase. If a noun phrase does not match any WordNet concept, a Targeted Hypernym Discovery (THD) algorithm is executed. The THD algorithm extracts a hypernym from a Wikipedia article defining the noun phrase using lexico-syntactic patterns. This hypernym is then used to map the noun phrase to a WordNet synset, but it can also be perceived as the classification result by itself, resulting in an unsupervised classification system. SCM and THD algorithms were designed for English. While adaptation of these algorithms for other languages is conceivable, we decided to develop the Bag of Articles (BOA) algorithm, which is language agnostic as it is based on the statistical Rocchio classifier. Since this algorithm utilizes Wikipedia as a source of data for classification, it does not require any labeled training instances. WordNet is used in a novel way to compute term weights. It is also used as a positive term list and for lemmatization. A disambiguation algorithm utilizing global context is also proposed. We consider the BOA algorithm to be the main contribution of this dissertation. Experimental evaluation of the proposed algorithms is performed on the WordSim353 dataset, which is used for evaluation in the Word Similarity Computation (WSC) task, and on the Czech Traveler dataset, the latter being specifically designed for the purpose of our research. BOA performance on WordSim353 achieves Spearman correlation of 0.72 with human judgment, which is close to the 0.75 correlation for the ESA algorithm, to the author's knowledge the best performing algorithm for this gold-standard dataset, which does not require training data. The advantage of BOA over ESA is that it has smaller requirements on preprocessing of the Wikipedia data. While SCM underperforms on the WordSim353 dataset, it overtakes BOA on the Czech Traveler dataset, which was designed specifically for our entity classification problem. This discrepancy requires further investigation. In a standalone evaluation of THD on Czech Traveler dataset the algorithm returned a correct hypernym for 62% of entities. Detailed record
	Extrakce informací z textu Michalko, Boris ; Labský, Martin (advisor) ; Svátek, Vojtěch (referee) ; Nováček, Jan (referee) Cieľom tejto práce je preskúmať dostupné systémy pre extrakciu informácií a možnosti ich použitia v projekte MedIEQ. Teoretickú časť obsahuje úvod do oblasti extrakcie informácií. Popisujem účel, potreby a použitie a vzťah k iným úlohám spracovania prirodzeného jazyka. Prechádzam históriou, nedávnym vývojom, meraním výkonnosti a jeho kritikou. Taktiež popisujem všeobecnú architektúru IE systému a základné úlohy, ktoré má riešiť, s dôrazom na extrakciu entít. V praktickej časti sa nacházda prehľad algoritmov používaných v systémoch pre extrakciu informácií. Opisujem oba typy algoritmov ? pravidlové aj štatistické. V ďalšej kapitole je zoznam a krátky popis existujúcich voľných systémov. Nakoniec robím vlastný experiment s dvomi systémami ? LingPipe a GATE na vybraných korpusoch. Meriam rôzne výkonnostné štatistiky. Taktiež som vytvoril malý slovník a regulárny výraz pre email aby som demonštroval taktiež pravidlá pre extrahovanie určitých špecifických informácií. Detailed record

Interested in being notified about new results for this query?
Subscribe to the RSS feed.

Digital Repository :: :: :: ::
Powered by v1.1.2
Maintained by

This site is also available in the following languages:
Česky English