National Repository of Grey Literature: 164 records found, showing records 1 - 10
System for Update of Annotations in Corpora
Vrša, Štěpán ; Smrž, Pavel (referee) ; Dytrych, Jaroslav (advisor)
The goal of this thesis is to create a system that allows users to display and update annotations of large corpus data in the MG4J format. The thesis analyzes the current solution for managing corpus data and annotations and briefly describes the non-trivial SEC and MG4J tools used in it. The core of the system is updating annotations in MG4J and subsequently updating the MG4J indexes. The system performs these operations with an acceptable response time. The thesis also deals with updating entities in a knowledge base.
Searching Semantically Annotated Texts
Grešová, Katarína ; Smrž, Pavel (referee) ; Dytrych, Jaroslav (advisor)
This thesis deals with semantic search over indexes of big text data. Its aim is to design and implement a search engine with a web user interface that enables dynamic configuration of access to indexes and editing of annotations in the text. The thesis analyzes the current search engine solution and its shortcomings, which results in a specification of requirements for a search engine that is suitable for common use and fulfils the potential of all search-engine-related tools. The thesis also describes the design, implementation and testing of the resulting system, which includes an extension in the form of global constraints that increases the accuracy with which the requested search result can be described.
Plagiarism detection of text documents
Lízal, Radek ; Vítek, Martin (referee) ; Smital, Lukáš (advisor)
This diploma thesis introduces the definition of plagiarism, distinguishes the types of plagiarism that often occur in practice, and describes the ways of identifying suspect texts. The means of detection are essential; therefore a whole chapter is dedicated to them. For detection purposes, it is vital to pre-process the data to reduce the computational demands of the program. The thesis gives an overview of some programs that are already used for plagiarism detection. The following chapter introduces selected indicators which have been implemented in the MATLAB environment to create a detector of plagiarism in text documents. The created program is described in chapter eight. The applied indicators and the detector response are described in a chapter called Indications testing. The testing proved the quality of these indicators. The results, together with the pros and cons of the particular methods, are discussed in the conclusion.
Syntactic Analyzer for Czech Language
Beneš, Vojtěch ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor)
This master’s thesis describes the theoretical basics, solution design, and implementation of a constituency (phrasal) parser for the Czech language, which is based on grouping parts of speech into phrases. The created program works with a manually built and annotated Czech sample corpus to generate a probabilistic context-free grammar by machine learning at runtime. The parser implementation, based on an extended CKY algorithm, then decides for an input Czech sentence whether the sentence can be generated by the created grammar and, in the positive case, constructs the most probable derivation tree. This result is then compared with the expected parse to evaluate the success rate of the constituency parser.
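The parsing step described in this abstract can be illustrated with a minimal sketch of probabilistic CKY. It is not the thesis implementation: the toy English grammar, its probabilities, and the names `BINARY`, `LEXICAL` and `cky_parse` are all assumptions made for illustration, and the grammar is assumed to be in Chomsky normal form.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (not taken from the thesis).
BINARY = {  # (left child, right child) -> [(parent, probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("Det", "N"): [("NP", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
}
LEXICAL = {  # word -> [(part-of-speech tag, probability)]
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5)],
    "cat": [("N", 0.5)],
    "sees": [("V", 1.0)],
}


def cky_parse(words, start="S"):
    """Return (most probable tree, probability), or None if no parse exists."""
    n = len(words)
    # best[(i, j)][label] = (log probability, backpointer) for words[i:j]
    best = defaultdict(dict)
    for i, word in enumerate(words):
        for tag, p in LEXICAL.get(word, []):
            best[(i, i + 1)][tag] = (math.log(p), word)
    # Fill longer spans from shorter ones, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, (lp, _) in best[(i, k)].items():
                    for right, (rp, _) in best[(k, j)].items():
                        for parent, p in BINARY.get((left, right), []):
                            score = math.log(p) + lp + rp
                            cell = best[(i, j)]
                            if parent not in cell or score > cell[parent][0]:
                                cell[parent] = (score, (k, left, right))
    if start not in best[(0, n)]:
        return None  # the sentence is not generated by the grammar

    def build(i, j, label):
        # Follow backpointers to reconstruct the derivation tree.
        _, back = best[(i, j)][label]
        if isinstance(back, str):
            return (label, back)
        k, left, right = back
        return (label, build(i, k, left), build(k, j, right))

    return build(0, n, start), math.exp(best[(0, n)][start][0])
```

Calling `cky_parse("the dog sees the cat".split())` returns a nested tuple rooted at `S` together with the probability of the derivation, while a sentence the toy grammar cannot generate yields `None`. Log probabilities are used so that products of many small rule probabilities do not underflow.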
Systems for checking electronic texts
Zouhar, Petr ; Malý, Jan (referee) ; Pfeifer, Václav (advisor)
The work deals with the possibilities of checking electronic texts, whether source codes or standard text documents. The first chapter is devoted to a brief explanation of the term plagiarism and its characteristics. We then describe the methods and metrics used to detect plagiarism, and turn to the detection of plagiarism in free text and in source codes. We describe how a file is preprocessed and how the basic units that represent the document in the comparison are chosen. Source codes have an exact syntax; therefore the chapter describing the checking of source codes pays attention to syntactic and semantic analysis. The second half of the work focuses on the practical part, particularly on programs intended for checking source codes. The programs are divided into freely available and commercial ones. This is followed by their brief description and, where a free trial is available, the results of the comparison. For this purpose we created a corpus of source codes. At the end of the work we focus on the design of a program which compares two source codes on the basis of statistical similarities.
Czech-Slovak Machine Translation
Kadlec, Peter ; Kouřil, Jan (referee) ; Smrž, Pavel (advisor)
The aim of this bachelor thesis was to become familiar with the methods used in machine translation, to design and implement a system for translation from Czech to Slovak, and finally to evaluate the created system using standard metrics.
Extending System for Acquiring, Processing, and Analysing Large Web Text Collections
Matějka, Jiří ; Dytrych, Jaroslav (referee) ; Smrž, Pavel (advisor)
The aim of the thesis is to extend the existing system for collecting, downloading, processing and analyzing web pages. This work deals with the automation of all processes, brings new tools into the existing system and offers new versions of some tools involved in the processing system and also offers new procedures and ideas.
Mining of Textual Data from the Web for Speech Recognition
Kubalík, Jakub ; Plchot, Oldřich (referee) ; Mikolov, Tomáš (advisor)
The initial goal of this project was to study language modelling for speech recognition and techniques for acquiring textual data from the Web. The text introduces the basic techniques of speech recognition and describes in more detail language models based on statistical methods. In particular, the work deals with criteria for evaluating the quality of language models and speech recognition systems. The text further describes data mining models and techniques, especially information retrieval. It then presents the problems associated with acquiring data from the Web and, in contrast, introduces the Google search engine. Part of the project was the design and implementation of a system for acquiring text from the Web, which is described in appropriate detail. However, the main goal of the work was to verify whether data obtained from the Web can bring any benefit to speech recognition. The described techniques therefore try to find the optimal way to use data obtained from the Web to improve sample language models as well as models deployed in real recognition systems.
Sophisticated methods for electronic text checking
Flégl, Jan ; Malý, Jan (referee) ; Pfeifer, Václav (advisor)
The work is about plagiarism of source codes and text documents. We would like to describe commonly known methods, learn something about commercial programs and make our own plagiarism detection software. At the beginning of the theoretical part we will define plagiarism. We will also learn something about the history of plagiarism and its situation in the Czech Republic. We will find out something about syntactic analysis, the tools we can use to detect plagiarism and how to discover it. We will demonstrate how the metrics work on simple exercises. We will explain how the graphical method based on line comparison works. We will define the advantages and disadvantages of all methods. At the end of the theoretical part we will find out something about commercial programs. In the practical part we will make our own program which compares two source codes using statistical approaches. We will check its function and its ability to detect plagiarism on a corpus of source codes which we will create.
A corpus perspective on engineering terminology in popularization
Volek, Pavel ; Smutný, Milan (referee) ; Haupt, Jaromír (advisor)
This project deals with a corpus-based approach to the analysis of technical terminology in popularization. This is achieved by comparing the popular contexts in which a given term occurs with scientific and academic ones. The thesis also contains a short introduction to the concept of corpus linguistics.
