National Repository of Grey Literature: 33 records found
Web page segmentation utilizing clustering techniques
Zelený, Jan ; Šimko, Marián (referee) ; Kliegr, Tomáš (referee) ; Zendulka, Jaroslav (advisor)
Information retrieval and other techniques for mining data from web pages are gaining importance as web technologies evolve and as the volume of information stored on the web, as the sole carrier of that information, grows. Along with this volume of information, however, the amount of content that is unimportant in the context of the presented information is also growing. This is one of the reasons why it is important to devote intensive attention to preprocessing the information stored on the web. Segmentation algorithms are one possible means of preprocessing. This thesis deals with the use of clustering techniques to make existing algorithms more efficient, as well as to find entirely new algorithms usable for web page segmentation.
Local and global analytical reports on results of data mining
Reischig, Zdeněk ; Rauch, Jan (advisor) ; Kliegr, Tomáš (referee)
Title: Local and global analytical reports on results of data mining Author: Zdeněk Reischig Department: Department of Software Engineering Supervisor: doc. RNDr. Jan Rauch, CSc. Supervisor's e-mail address: rauch@vse.cz Abstract: The thesis focuses on automated support for creating local analytical reports, on using these reports as data sources for global analytical questions, and on creating global analytical reports. The thesis proposes methods for comparing rules. These methods are suitable for solving global analytical questions and can also help with composing local analytical reports. The thesis also describes different kinds of background knowledge. One kind can be used to eliminate uninteresting rules or to find data matrices with unusual relations. Another is necessary for solving global analytical questions when rules are created over data matrices describing the same properties with different measures, etc. Another important part of the thesis provides an XML structure template for saving the outputs of global analysis. This XML structure can also be used for automated generation of global analytical reports. The last part of the thesis is a case study that shows how to use the guidelines and methods suggested in the previous chapters. Case study...
Cross-Lingual Information Retrieval in the Medical Domain
Saleh, Shadi ; Pecina, Pavel (advisor) ; Hanbury, Allan (referee) ; Kliegr, Tomáš (referee)
In recent years, there has been an exponential growth of the digital content available on the Internet, which has correlated with the increasing number of non-English Internet users due to the spread of the Internet across the globe. This raises the importance of unlocking resources for those who want to look up information not limited to the languages they understand. For example, some users want to use the Internet to find medical content related to their health conditions (self-diagnosis) but do not have access to resources in their language. Cross-Lingual Information Retrieval (CLIR) breaks the language barrier by allowing search for documents written in a language different from the query language. This thesis tackles the task of CLIR in the medical domain and investigates the two main approaches: query translation (QT), where queries are machine translated to the language of the documents, and document translation (DT), where documents are translated to the language of the queries. We proceed with our research by employing Statistical Machine Translation (SMT) systems that are tuned for the QT approach and the DT approach in the medical domain for seven European languages (Czech, German, French, Spanish, Hungarian, Polish and Swedish) and...
Big Data approach to sentiment analysis
Handa, Karandeep ; Berka, Petr (advisor) ; Kliegr, Tomáš (referee)
Get to know the emotions of people about Brexit.
Named Entity Recognition and Linking
Taufer, Pavel ; Straka, Milan (advisor) ; Kliegr, Tomáš (referee)
The goal of this master's thesis is to design and implement a named entity recognition and linking algorithm. Part of this goal is to propose and create a knowledge base that will be used in the algorithm. Because of the limited amount of data for languages other than English, we want to be able to train our method on one language and then transfer the learned parameters to other languages that do not have enough training data. The thesis consists of a description of available knowledge bases and existing methods, and the design and implementation of our own knowledge base and entity linking method. Our method achieves state-of-the-art results on a few variants of the AIDA CoNLL-YAGO dataset. The method also obtains comparable results on a sample of Czech annotated data from the PDT dataset using the parameters trained on the English CoNLL dataset.
The Real Knowledge Discovery Task
Kolafa, Ondřej ; Berka, Petr (advisor) ; Kliegr, Tomáš (referee)
The major objective of this thesis is to perform a real data mining task: classifying holders of term deposit accounts. For this task, anonymous data on bank customers with a low funds position are used. In accordance with the CRISP-DM methodology, the work proceeds through these steps: business understanding, data understanding, data preparation, modeling, evaluation and deployment. The RapidMiner application is used for modeling. The methods and procedures used in the task are described in the theoretical part. Basic concepts of data mining, with special regard to the CRM segment, are introduced, as well as the CRISP-DM methodology and techniques suitable for this task. Because of the difference in the proportions of long-term account holders and non-holders, the data set had to be balanced in favour of holders. In the final stage, twelve models are built. According to the chosen criteria (area under the curve and F-measure), the two best models (logistic regression and a Bayesian network) were selected. In the last stage of the data mining process, a possible real-world utilisation is mentioned. The task is developed only in the form of recommendations, because it cannot be applied to the real situation.
Preference Learning Methods
Pichl, Ota ; Kliegr, Tomáš (advisor) ; Berka, Petr (referee)
This diploma thesis focuses on preference learning. Preferences can be analyzed in many fields, from economics through statistics to informatics. This thesis takes the informatics point of view on preferences. It begins with preferences in general and analyzes their origin in economic science. This knowledge base is then used to analyze the informatics methods employed in preference learning, which also includes machine learning, and to describe how these disciplines are connected. The practical part of the work focuses on employing preferences in informatics practice. The basic tasks and methods are described first, followed by a more detailed analysis of one of the methods (UTA NM). The result consists of the description and implementation of a REST web service that can be used for one of the preference learning tasks.
Analysis of real data from Alza.cz product department using methods of KDD
Válek, Martin ; Berka, Petr (advisor) ; Kliegr, Tomáš (referee)
This thesis deals with data analysis using methods of knowledge discovery in databases (KDD). The goal is to select appropriate methods and tools for implementing a specific project based on real data from the Alza.cz product department. Data analysis is performed using association rules and decision rules in Lisp-Miner and decision trees in RapidMiner. The methodology used is CRISP-DM. The thesis is divided into three main sections. The first section provides a theoretical summary of KDD; it defines basic terms and describes the types of tasks and methods of KDD. The second section introduces the CRISP-DM methodology. The practical part first introduces the company Alza.cz and its goals for this task. Afterwards, the basic structure of the data and its preparation for the next step (data mining) are described. In conclusion, the results are evaluated and the possibility of their use is outlined.
