National Repository of Grey Literature: 24 records found (showing records 12 - 21)
Pragmatic lemmatizer of Czech language
Vacek, Matěj ; Strossa, Petr (advisor) ; Kliegr, Tomáš (referee)
This thesis focuses on the lemmatization of nouns and adjectives and is based on the morphology of the Czech language. The goal was to create a lemmatizer that stems words with a success rate of at least 90%. At the same time, the lemmatizer should be very simple and consist of as few rules as possible. It was designed to work with real estate adverts, especially houses for sale. The thesis analyses the specific characteristics of this domain, and the lemmatizer was built according to the results of that analysis. The lemmatizer was written in Java. Only three types of rules were used, and overall the lemmatizer produced correct stems for 96.4% of all words.
Effect of the Czech Stemming Algorithm on the Document Retrieval
Pytelka, Petr ; Strossa, Petr (advisor) ; Pinkas, Otakar (referee)
This thesis deals with measuring the quality of stemming/lemmatization algorithms for the Czech language in document processing systems and provides an analysis of the results. The theoretical part of the thesis describes the principles of full-text search, the possibilities of its implementation, and the common problems that have to be solved when processing natural language. Methods of evaluating the quality of lemmatization using recall and precision are discussed. In addition, the theoretical part covers the method of measuring the under-stemming and over-stemming indexes, which can be applied for a more detailed evaluation. An experiment for evaluating lemmatization algorithms is proposed in the second part of the thesis. A specialized application was developed to perform the experiment in three different systems, namely Apache Lucene, the PostgreSQL database system and Microsoft SQL Server. The experiment is based on the Prague Dependency Treebank corpus. It was carried out both for the corpus as a whole and for selected word classes separately. Further analysis of the results for the Czech stemmer in Apache Lucene leads to a proposal for several modifications of the algorithm. These modifications result in measurable improvements. The results show how the metrics discussed, together with the measured values, can be used to improve lemmatization algorithms and thus full-text search for the Czech language.
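The recall and precision measures mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `precision_recall` and the document IDs are hypothetical, and the example only shows the standard set-based definitions applied to one query evaluated with and without stemming.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs of documents the search system returned
    relevant:  IDs of documents judged relevant to the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments for a single query (made-up document IDs).
relevant_docs = {1, 2, 3, 4}
found_without_stemming = {1, 2}        # exact word forms only
found_with_stemming = {1, 2, 3, 9}     # more forms matched, one false hit

print(precision_recall(found_without_stemming, relevant_docs))
print(precision_recall(found_with_stemming, relevant_docs))
```

In this made-up case stemming raises recall at the cost of some precision, which is the kind of trade-off such an evaluation is meant to quantify.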
NoSQL databases
Günzl, Richard ; Palovská, Helena (advisor) ; Strossa, Petr (referee)
This thesis deals with database systems referred to as NoSQL databases. In the second chapter, I explain basic terms and the theory of database systems. A short explanation is dedicated to database systems based on the relational data model and the SQL standardized query language. Chapter Three explains the concept and history of the NoSQL databases, and also presents database models, major features and the use of NoSQL databases in comparison with traditional database systems. In the fourth chapter I focus on the various representatives of NoSQL databases, in particular the ones that are most frequently used. In the next chapter, I have taken a practical look at a NoSQL database, specifically Apache Cassandra. I briefly describe the steps required to launch Apache Cassandra and its administration tools. In this practically-oriented chapter, I also show basic operations performed over a sample database using Cassandra CLI, its interactive command line interface. The purpose of this chapter is to make the reader familiar with the method of working with the Apache Cassandra database system and to point out some of its specific aspects. The primary objective of this thesis is to acquaint readers with the most important features and representatives of NoSQL databases and the potential for their practical use.
Geographic information systems
Vodička, Ondřej ; Palovská, Helena (advisor) ; Strossa, Petr (referee)
The diploma thesis focuses on geographic information systems (GIS). The first part of the thesis introduces GIS, shows their specifics and emphasizes the significance of standardization in the GIS industry. The second part describes the current situation on the GIS market. GIS software is divided into categories according to the functionality provided, and also into open source and commercial products. Individual software products are introduced based on these categories. The next part deals separately with the GIS products offered by the Oracle corporation. The last part provides various possibilities, suggestions and recommendations for designing a GIS architecture using ESRI products.
Information Extraction from Web Pages Using Extraction Ontologies (Extrakce informací z webových stránek pomocí extrakčních ontologií)
Labský, Martin ; Berka, Petr (advisor) ; Strossa, Petr (referee) ; Vojtáš, Peter (referee) ; Snášel, Václav (referee)
Automatic information extraction (IE) from various types of text has become very popular during the last decade. Owing to information overload, there are many practical applications that can utilize semantically labelled data extracted from textual sources like the Internet, emails, intranet documents and even conventional sources like newspapers and magazines. Applications of IE exist in many areas of computer science: information retrieval systems, question answering or website quality assessment. This work focuses on developing IE methods and tools that are particularly suited to extraction from semi-structured documents such as web pages and to situations where available training data is limited. The main contribution of this thesis is the proposed approach of extended extraction ontologies. It attempts to combine extraction evidence from three distinct sources: (1) manually specified extraction knowledge, (2) existing training data and (3) formatting regularities that are often present in online documents. The underlying hypothesis is that an extraction algorithm using evidence of all three types can achieve better extraction accuracy and robustness. The motivation for this work has been the lack of described methods and tools that would exploit these extraction evidence types at the same time. This thesis first describes a statistically trained approach to IE based on Hidden Markov Models which integrates with a picture classification algorithm in order to extract product offers from the Internet, including textual items as well as images. This approach is evaluated using a bicycle sale domain. Several methods of image classification using various feature sets are described and evaluated as well. These trained approaches are then integrated in the proposed novel approach of extended extraction ontologies, which builds on top of the work of Embley [21] by exploiting manual, trained and formatting types of extraction evidence at the same time.
The intended benefit of using extraction ontologies is the quick development of a functional IE prototype, its smooth transition to a deployed IE application and the possibility to leverage each of the three extraction evidence types. Also, since extraction ontologies are typically developed by adapting suitable domain ontologies and the ontology remains at the center of the extraction process, the work related to converting extracted results back to a domain ontology or schema is minimized. The described approach is evaluated using several distinct real-world datasets.
Practical possibilities of using Apache CouchDB
Pultera, Ondřej ; Palovská, Helena (advisor) ; Strossa, Petr (referee)
This bachelor thesis focuses on the practical possibilities of using Apache CouchDB, a document-oriented database system. In the first chapter I explain the basic theoretical terms and principles related to Apache CouchDB. I also briefly introduce database systems based on the relational model. The second chapter describes the architecture and properties of Apache CouchDB. In this chapter I also try to explain the principles of running Apache CouchDB in a distributed system and consider the need for new database systems. In the third chapter I review case studies of successful Apache CouchDB implementations, pointing out scenarios for which Apache CouchDB is a good candidate. In the next chapter I focus on practical usage of the system. I mention the tool for administering Apache CouchDB and describe some settings. I also show examples of basic operations performed through the HTTP interface. The examples are written in the scripting languages PHP and JavaScript. This chapter introduces Apache CouchDB from the point of view of an administrator or developer. The reader of this work should understand the basic concepts of Apache CouchDB and be able to determine the usability of this system for a concrete purpose.
Effective methods of plagiarism detection in large document repositories
Přibil, Jiří ; Jiroušek, Radim (advisor) ; Strossa, Petr (referee) ; Snášel, Václav (referee)
This work focuses on plagiarism detection in large document repositories. It takes into account the real situation that currently needs to be addressed in the university environment in the Czech Republic and proposes a system that can carry out this analysis in real time while still capturing the widest possible range of plagiarism methods. The main contribution of this work is the definition of so-called unordered n-grams, written {n}-grams, which can be used to detect some advanced forms of plagiarism. All cited recommendations relating to the various components of a plagiarism detection system (preprocessing of a document before its insertion into the corpus, representation of documents in the document storage, identification of potential sources of plagiarism for calculating similarity rates, and visualization of the plagiarism analysis) are discussed and appropriately quantified. The result is a set of design parameters for a system that can detect plagiarism in Czech-language documents quickly, accurately and in most of its forms.
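The idea of unordered n-grams can be illustrated with a small sketch. The abstract does not give the exact definition, so the code below rests on an assumption: an {n}-gram is taken to be a window of n consecutive tokens whose internal order is ignored. The helper names and the Jaccard similarity measure are illustrative choices, not the thesis's method.

```python
def ordered_ngrams(tokens, n):
    """Classic n-grams: every window of n consecutive tokens, order kept."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def unordered_ngrams(tokens, n):
    """Assumed {n}-grams: the same windows, but with order inside the
    window ignored (each window is stored as a sorted tuple)."""
    return {tuple(sorted(tokens[i:i + n])) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# A made-up example: the suspect text swaps the first two words.
source = "the quick brown fox jumps".split()
suspect = "quick the brown fox jumps".split()

print(jaccard(ordered_ngrams(source, 3), ordered_ngrams(suspect, 3)))
print(jaccard(unordered_ngrams(source, 3), unordered_ngrams(suspect, 3)))
```

Under this assumption, a local change of word order lowers the ordered n-gram similarity more than the unordered one, which suggests why order-insensitive windows may help against plagiarism disguised by reordering words.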
The design of database for hotels and guest houses
Vojtek, Michal ; Palovská, Helena (advisor) ; Strossa, Petr (referee)
This bachelor's thesis presents an analysis, design and implementation of a database solution for hotels and guest houses. The main objective of the thesis is to design an efficient database solution covering the functionality of booking, online access, invoicing and complaints systems. It is a comprehensive database solution providing information support both in normal operation and for management work. The secondary objectives can be defined as methods of using the database and the identification of the database user. The thesis has been prepared based on my own experience. The first part of the thesis introduces the systems using the database; this does not refer to any specific applications but only to generally defined systems for which the thesis outlines a potential use for the database. The second part deals with the possibilities offered by the database for specific persons who come in contact with this database. The third part describes a conceptual data model, explaining all its entities and relationships and clarifying the use of integrity constraints. The fourth part introduces another integrity constraint and analyses a database adjustment for RDBMS Oracle Database Express Edition 10g, in which the functionality of the database was tested. The last part proposes security measures and defines access authorizations for tables, introduces created views limiting the volume of data visible for users, and stored procedures serving to automate certain activities.
Database design for a website about runners
Šimůnek, Dominik ; Palovská, Helena (advisor) ; Strossa, Petr (referee)
The aim of this bachelor thesis is to design a database for a website about competitors in athletics. This includes creating a web application that allows website visitors to add, edit or delete information, mostly about the athletes. The thesis emphasises precise specification of the requirements on web functionality. The most suitable database model is created according to these requirements and then implemented in the MySQL database system. Database design is the major part of this work. The second part of the thesis is dedicated to building the web application in PHP. The functionality of the database model and application is then verified using data created for the purposes of this thesis or imported from other available data sources to provide a larger sample of data about the athletes. The web application will be available on the internet to any visitor.
Design of the database for forwarding agency
Kolařík, Vít ; Palovská, Helena (advisor) ; Strossa, Petr (referee)
The goal of this bachelor thesis was to design a database for an application of a transport company that would be able to file orders and manage its own trucks. I proceeded from an analysis through conceptual and physical models of the database to testing the database with simple commands in Oracle 10g Express Edition. Besides the design, I also dealt with user access to the database. The expected benefit is that the customer will be able to trace an order. An employee should be more effective in planning transports, and a manager will have data on the orders and trucks at their disposal. The manager should be able to analyse the data because it will all be in one place.
