Strossa, Petr - Search Results - Digital Repository


	Linguistic Issues in Machine Translation between Czech and Russian Klyueva, Natalia ; Kuboň, Vladislav (advisor) ; Panevová, Jarmila (referee) ; Strossa, Petr (referee) In this thesis we analyze machine translation between the Czech and Russian languages from the perspective of a linguist. We work with two types of machine translation systems: rule-based (TectoMT) and statistical (Moses). We experiment with different setups of these two systems in order to achieve the best possible quality. One of the questions we address in our work is whether the relatedness of the two languages has some impact on machine translation. We explore the output of our two experimental systems and two commercial systems: PC Translator and Google Translate. We make a linguistically motivated classification of errors for the language pair and describe each type of error in detail, analyzing whether it occurred due to some difference between Czech and Russian or is caused by the system architecture. We then compare the usage of some specific linguistic phenomena in the two languages and state how the individual systems cope with mismatches. For some errors, we suggest ways to correct them, and in several cases we implement those suggestions. In particular, we focus on one specific error type: surface valency. We research the mismatches between Czech and Russian valency, extract a lexicon of surface valency frames, incorporate the lexicon into the TectoMT translation pipeline and present... Detailed record
	Management, Retrieval and Access to Electronic Theses and Dissertations Mach, Jan ; Bratková, Eva (advisor) ; Strossa, Petr (referee) ; Souček, Martin (referee) The dissertation is devoted to an analysis of current practice and trends in providing repositories of electronic theses and dissertations (ETDs) in terms of their management, searching and dissemination. The first part presents terminology and the current state of access to ETDs in Czech and foreign repositories and includes the results of a survey of the state of access to ETDs in the Czech Republic, which was completed in 2014 by all public universities. In the second part, a metadata standard is presented, particularly the possibility of mapping EVSKP-MS metadata elements to other metadata formats and their utilization within the OAI-PMH protocol. The issue of access to ETDs is dealt with further in terms of metrics for evaluating the usage of distributed ETDs. Searching for ETDs is also described in case studies, as are recommendations for public tenders for a discovery service and for creating an ETD metadata search server and an associated user interface with faceted search. The final part of the thesis focuses on the issue of plagiarism. This incorporates a presentation and analysis of the most important plagiarism detection systems and a case study of the development of the portal Validátor VŠE to provide access to results of document analysis. Detailed record
	Citing scientific literature using an XML format Jansová, Linda ; Nič, Miloslav (advisor) ; Souček, Martin (referee) ; Strossa, Petr (referee) The dissertation focuses on the use of XML for citation purposes. First, the current situation in citation practice, deeply influenced by developments in electronic publishing, is described. Then selected projects aimed at making citation data machine-understandable and ways to include citation data in XML formats are presented. Citation formats (both XML and other formats) are dealt with in detail. This part is followed by a brief introduction to citation managers as tools used to automate work with citation data. The typology of cited documents is also covered. Furthermore, a detailed analysis of citation practice in the IUPAC color books is included. Based on partial findings, recommended practices for the inclusion of citation data in XML formats were developed. The recommended practices are a set of seven principles which can be summed up as follows: use of UTF-8 encoding, low number of elements and attributes, maximum data structuring, use of controlled vocabularies, data hierarchy and use of links to make connections, use of existing document typology or creation of a new one, use of recursion while working with the data. An experimental format based on the recommended practices has been designed. Citation data from the IUPAC color books are used to show how the format can be implemented in practice. Detailed record
	Methodology for website localization from the perspective of webdesign Čermák, Radim ; Doucek, Petr (advisor) ; Strossa, Petr (referee) ; Hřebíček, Jiří (referee) ; Dědič, Filip (referee) The Internet and websites are today among the most important communication channels for almost all companies. They offer a simple, fast and effective way of communication, which is also available worldwide in a few seconds. With the globalization of the market, more and more companies try to expand their business beyond the territory of their home state. In the current era of start-ups, the Internet is also often a medium that allows the formation of new spheres of business for which the website is an absolutely essential channel. This type of business is Internet-based and very often has international ambitions from the very beginning. Given that each country (or region) can be seen as a distinctive culture, it is advisable to localize websites for the needs of the foreign country. This is exactly the theme of this thesis. The concrete objective of this thesis is to offer a methodology for website localization in terms of web design. The basic building block is the delimitation of a multidisciplinary theoretical framework that examines the concept of culture, together with an extensive literature review providing current insight into the link between websites and culture, i.e. cultural website localization. A suitable method for grasping such a complex concept as culture also emerges from the theoretical framework. Hofstede's cultural dimensions were determined to be the most appropriate method and are then used for the analysis of the cultural determination of web elements. Data for this analysis are collected through a content analysis of websites from nine different countries. The results of the analysis are compared and synthesized with the findings stemming from the literature review.
The final artifact of this thesis, a methodology for website localization from the perspective of web design, builds on this foundation. The proposed methodology is validated by assessing it for the domain of web design. This assessment is based on interviews with experts from different countries and on a concrete example of applying the methodology to a midsize website. Detailed record
	Business rule learning using data mining of GUHA association rules Vojíř, Stanislav ; Strossa, Petr (advisor) ; Pour, Jan (referee) ; Kouba, Zdeněk (referee) ; Gregor, Jiří (referee) In the current highly competitive environment, business information systems should not only effectively support existing business processes but also be dynamically adaptable to changes in the environment. There are increasing efforts to separate the application logic from the business logic in information systems. One appropriate instrument for this separation is the business rule approach. Business rules are simple, understandable rules. They can be used for knowledge externalization and sharing as well as for active control and decisions within business processes. Although the business rule approach has been used for almost 20 years, the various specifications and practical applications of business rules are still a subject of active research. The disadvantage of the business rule approach is the great effort required to obtain the rules. There has to be a domain expert who is able to write them manually. One of the problems addressed by current research is the possibility of (semi)automatic acquisition of business rules from different resources: unstructured documents, historical data etc. This dissertation addresses the problem of acquiring (learning) business rules from the historical data of a company. The main objective of this thesis is to design and validate a method for (semi)automatic learning of business rules using data mining of association rules. Association rules are a known data mining method for discovering interesting relations hidden in data. Association rules are comprehensible and explainable, which makes them suitable for learning business rules.
For this purpose the user can use not only simple rules discovered using the Apriori or FP-Growth algorithms, but also more complex association rules discovered using the GUHA method. This thesis uses the 4ft-Miner procedure implemented in the data mining system LISp Miner. The first part of this thesis describes the relevant topics from the research on business rules and association rules. "Business rules" is not the name of a single specification or standard but rather a label for an approach to modelling business logic. As part of the work, a process is defined for selecting the most appropriate specification of business rules for a given practical use. Subsequently, the author proposes three models for involving data mining of association rules in business rule sets. These models also include the definition of a model for transforming GUHA association rules into business rules for the JBoss Drools system. To enable learning business rules from data mining results over more than one data set, the author proposes a knowledge base. The knowledge base is suitable for interconnecting business rules and data mining of association rules. From the perspective of business rules, the knowledge base is a term dictionary. From the perspective of data mining, the knowledge base contains background knowledge for data preprocessing and the preparation of classification models. The proposed models have been validated using practical implementations in the systems EasyMiner (in conjunction with JBoss Drools) and Erian. The thesis also contains a description of two model use cases based on real data from the fields of marketing and health insurance. Detailed record
	Design of search engine for modern needs Maršálek, Tomáš ; Palovská, Helena (advisor) ; Strossa, Petr (referee) In this work I argue that the field of text search has focused mostly on long text documents, but there is a growing need for efficient short text search, which comes with different user expectations. Because of the reduced data set size requirements, different algorithmic techniques become more computationally affordable. The focus of this work is on approximate and prefix search and on purely text-based ranking methods, which are needed because text statistics are less precise on short text. A basic prototype search engine has been created using the researched techniques. Its capabilities were demonstrated on example search scenarios, and the implementation was compared to two other open source systems representing currently recommended approaches to the short text search problem. The results show the feasibility of the implemented prototype regarding both user expectations and performance. Several options for the future direction of the system are proposed. Detailed record
	Pragmatic lemmatizer of Czech language Vacek, Matěj ; Strossa, Petr (advisor) ; Kliegr, Tomáš (referee) This thesis focuses on the lemmatization of nouns and adjectives. It is based on the morphology of the Czech language. The goal is to create a lemmatizer which can stem words with a success rate of at least 90%. At the same time the lemmatizer should be very simple; it should consist of as few rules as possible. The lemmatizer will be created to work with real estate adverts, especially houses for sale. The thesis will include an analysis of the specific characteristics of this area, and the lemmatizer will be created according to the results of this analysis. The lemmatizer was written in Java. Only three types of rules were used, and overall the lemmatizer created correct stems for 96.4% of all words. Detailed record
	Effect of the Czech Stemming Algorithm on the Document Retrieval Pytelka, Petr ; Strossa, Petr (advisor) ; Pinkas, Otakar (referee) This thesis deals with the measurement of the quality of the stemming/lemmatization algorithm for the Czech language in document processing systems and provides an analysis of the results. The theoretical part of the thesis describes the principles of the full-text search, the possibilities of implementation as well as the common problems which have to be solved in connection with the processing of natural language. Methods of evaluating the quality of lemmatization, using recall and precision, are discussed. In addition, the theoretical part covers the method of measuring the index of under-stemming and over-stemming, which can be applied for the purposes of a more detailed evaluation. An experiment for evaluating the lemmatization algorithms is proposed in the second part of the thesis. A specialized application has been developed to perform the experiment in three different systems, namely Apache Lucene, the PostgreSQL database systems and the Microsoft SQL Server. The experiment is based on the Prague Dependency Treebank corpus. It has been carried out both for the corpus as a whole and for selected word classes separately. Further analysis of the results for Czech stemmer in Apache Lucene leads to a proposal for several modifications of the algorithm. Such modifications result in measurable improvements. The results achieved show how metrics discussed, together with the values measured, can be used for improving the lemmatization algorithms and thus to improve the full-text search for Czech language. Detailed record
	NoSQL databases Günzl, Richard ; Palovská, Helena (advisor) ; Strossa, Petr (referee) This thesis deals with database systems referred to as NoSQL databases. In the second chapter, I explain basic terms and the theory of database systems. A short explanation is dedicated to database systems based on the relational data model and the SQL standardized query language. Chapter Three explains the concept and history of the NoSQL databases, and also presents database models, major features and the use of NoSQL databases in comparison with traditional database systems. In the fourth chapter I focus on the various representatives of NoSQL databases, in particular the ones that are most frequently used. In the next chapter, I have taken a practical look at a NoSQL database, specifically Apache Cassandra. I briefly describe the steps required to launch Apache Cassandra and its administration tools. In this practically-oriented chapter, I also show basic operations performed over a sample database using Cassandra CLI, its interactive command line interface. The purpose of this chapter is to make the reader familiar with the method of working with the Apache Cassandra database system and to point out some of its specific aspects. The primary objective of this thesis is to acquaint readers with the most important features and representatives of NoSQL databases and the potential for their practical use. Detailed record
