National Repository of Grey Literature
Effect of the Czech Stemming Algorithm on the Document Retrieval
Pytelka, Petr ; Strossa, Petr (advisor) ; Pinkas, Otakar (referee)
This thesis deals with measuring the quality of the stemming/lemmatization algorithm for the Czech language in document processing systems and provides an analysis of the results. The theoretical part of the thesis describes the principles of full-text search, the possibilities of implementation, and the common problems which have to be solved in connection with the processing of natural language. Methods of evaluating the quality of lemmatization, using recall and precision, are discussed. In addition, the theoretical part covers the method of measuring the indexes of under-stemming and over-stemming, which can be applied for a more detailed evaluation. An experiment for evaluating the lemmatization algorithms is proposed in the second part of the thesis. A specialized application has been developed to perform the experiment in three different systems, namely Apache Lucene, the PostgreSQL database system and Microsoft SQL Server. The experiment is based on the Prague Dependency Treebank corpus. It has been carried out both for the corpus as a whole and for selected word classes separately. Further analysis of the results for the Czech stemmer in Apache Lucene leads to a proposal for several modifications of the algorithm. These modifications result in measurable improvements. The results achieved show how the metrics discussed, together with the values measured, can be used to improve the lemmatization algorithms and thus the full-text search for the Czech language.
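As a rough illustration of the recall and precision measures mentioned in the abstract, the following minimal sketch computes both for a single query. It is not taken from the thesis: the function name `precision_recall` and the document identifiers are hypothetical, and the standard set-based definitions are assumed.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers of documents the search system returned
    relevant:  identifiers of documents judged relevant for the query
    Both names are illustrative assumptions, not part of the thesis.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were found
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents returned, three relevant overall,
# two of the relevant ones among the returned documents.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
print(p, r)
```

In general, a more aggressive stemmer tends to raise recall (more word forms match) at the cost of precision, which is why the two values are usually reported together.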
Storing hierarchical and unstructured data with Java Content Repository
Pytelka, Petr ; Pavlíčková, Jarmila (advisor) ; Feuerlicht, Jiří (referee)
This paper discusses the possibilities of storing hierarchical and unstructured data using the JSR-170 and JSR-283 standards, "Content Repository for Java". The paper is grounded in graph theory, and it presents a definition of hierarchical data based on that theory. Other methods of storing data, such as the file system, database systems and content management systems, are discussed. The paper provides a detailed description of the JSR-283 standard itself and its available features. This is followed by a comparison of relational and object-relational databases and the features of the individual techniques of object-relational mapping. The reference implementation, JackRabbit, is described in detail, including the relevant API and its configuration. A case study dealing with the realization of the internal structure of a document management system is part of this paper. Some performance tests were carried out on the reference implementation, and their results are presented in the paper. The conclusion provides a set of criteria for determining the situations in which it is appropriate to use a repository compatible with JSR-170/283 to store hierarchical and unstructured data, or in which the reference implementation JackRabbit can be used.
