National Repository of Grey Literature: 201 records found
Multi-Label Classification of Text Documents
Průša, Petr ; Očenášek, Pavel (referee) ; Bartík, Vladimír (advisor)
The master's thesis deals with the automatic classification of text documents. It explains the basic terms and problems of text mining. The thesis explains term clustering and presents some basic clustering algorithms. It also presents several classification methods and examines matrix regression in detail. An application that uses matrix regression for classification was designed and developed. The experiments focused on normalization and thresholding.
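The thresholding step mentioned above turns a vector of per-category scores (for example, the output of a matrix-regression classifier) into a set of labels. The sketch below is not the thesis's implementation; it assumes max-normalization and two common strategies, a fixed score threshold and a top-k rank cut, and the function names are hypothetical.

```python
def normalize(scores):
    """Scale category scores so that the largest one equals 1.0 (max-normalization)."""
    top = max(scores.values())
    if top <= 0:
        # Nothing scored positively, so no category should pass a positive threshold.
        return {c: 0.0 for c in scores}
    return {c: s / top for c, s in scores.items()}


def threshold_labels(scores, threshold=0.5):
    """Score-based thresholding: keep every category whose normalized score reaches the threshold."""
    norm = normalize(scores)
    return sorted(c for c, s in norm.items() if s >= threshold)


def top_k_labels(scores, k=2):
    """Rank-based thresholding: keep the k highest-scoring categories."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sorted(ranked[:k])


scores = {"sport": 0.8, "politics": 0.5, "tech": 0.1}
print(threshold_labels(scores))   # ['politics', 'sport']
print(top_k_labels(scores, k=1))  # ['sport']
```

Normalizing before thresholding lets one threshold value work across documents whose raw scores differ in scale.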
Knowledge Discovery from Time Series
Krutý, Peter ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
This thesis focuses on knowledge discovery from data, specifically from time series. The main objective is to survey the support for this area in the Python programming language and then to design and implement an application that demonstrates and compares selected methods. The methods are demonstrated in experiments on an appropriate data set. The output of the thesis is a comparison of methods for specific tasks and an application implementing the selected methods.
Text Data Clustering
Leixner, Petr ; Burgetová, Ivana (referee) ; Bartík, Vladimír (advisor)
Text data clustering can be used to analyze, navigate, and structure large sets of text or hypertext documents. The basic idea is to group the documents into clusters on the basis of their similarity. The well-known text clustering methods, however, do not adequately address the specific problems of text clustering, such as the high dimensionality of the input data, the very large size of the databases, and the understandability of the cluster descriptions. This work deals with these problems and describes a modern text clustering method based on frequent term sets, which tries to overcome the deficiencies of other clustering methods.
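The idea of clustering by frequent term sets can be sketched as follows: a term set that occurs in at least `min_support` documents defines a candidate cluster (the documents containing all of its terms), and the term set itself serves as a readable cluster description. This is a simplified illustration, not the method described in the thesis. It assumes documents are already tokenized into sets of terms, enumerates term sets by brute force up to `max_size` terms, assigns each document to only one cluster, and uses "largest coverage of not-yet-clustered documents" as a stand-in for the overlap-based selection criteria of published frequent-term-based methods. All names are hypothetical.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each term set found in at least min_support documents to the ids of those documents."""
    vocab = sorted(set().union(*docs))
    result = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocab, size):
            cover = frozenset(i for i, doc in enumerate(docs) if set(terms) <= doc)
            if len(cover) >= min_support:
                result[frozenset(terms)] = cover
    return result


def greedy_term_set_clusters(docs, min_support=2, max_size=2):
    """Greedily pick the term set covering the most still-unclustered documents."""
    candidates = frequent_term_sets(docs, min_support, max_size)
    uncovered = set(range(len(docs)))
    clusters = []
    while uncovered and candidates:
        # Prefer larger coverage; break ties in favor of more specific (longer) term sets.
        best = max(candidates, key=lambda t: (len(candidates[t] & uncovered), len(t)))
        gain = candidates.pop(best) & uncovered
        if not gain:
            break
        clusters.append((sorted(best), sorted(gain)))
        uncovered -= gain
    # Return the clusters (description, member ids) and any documents left unclustered.
    return clusters, sorted(uncovered)


docs = [{"python", "code"}, {"python", "snake"}, {"python", "code", "bug"}, {"snake", "zoo"}]
clusters, unclustered = greedy_term_set_clusters(docs)
print(clusters)  # [(['python'], [0, 1, 2]), (['snake'], [3])]
```

Because candidates are term sets rather than centroids, the method never works in the full high-dimensional vector space, which is one of the problems the abstract lists.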
Text data clustering algorithms
Sedláček, Josef ; Burget, Radim (referee) ; Karásek, Jan (advisor)
The thesis deals with text mining. It describes the theory of text document clustering as well as the algorithms used for clustering. This theory serves as the basis for developing an application for clustering text data. The application is developed in the Java programming language and contains three clustering methods, and the user can choose which method is used to cluster a collection of documents. The implemented methods are K-medoids, BiSec K-medoids (bisecting K-medoids), and SOM (self-organizing maps). The application also includes a validation set, which was created specifically for this diploma thesis and is used for testing the algorithms. Finally, the algorithms are compared based on the results obtained.
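K-medoids differs from k-means in that every cluster center (medoid) is an actual data point, so it works with any dissimilarity, such as cosine distance between document vectors. The sketch below uses the simple alternating (Voronoi-iteration) variant rather than full PAM. It assumes distinct points, a caller-supplied distance function, and a naive initialization with the first k points; it is an illustration, not the thesis's Java implementation, and `k_medoids` is a hypothetical name.

```python
def assign(points, medoids, dist):
    """Label each point with the index of its nearest medoid."""
    return [min(medoids, key=lambda m: dist(p, points[m])) for p in points]


def k_medoids(points, k, dist, max_iter=100):
    """Alternate between assigning points and re-choosing each cluster's medoid."""
    medoids = list(range(k))  # naive initialization: the first k points (assumption)
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for m in medoids:
            members = [i for i, lab in enumerate(labels) if lab == m]
            if not members:
                new_medoids.append(m)  # keep an empty cluster's medoid unchanged
                continue
            # The new medoid is the member with the smallest total distance to the others.
            new_medoids.append(
                min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            )
        if set(new_medoids) == set(medoids):
            break  # converged: medoids did not change
        medoids = new_medoids
    return sorted(medoids), assign(points, medoids, dist)


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
medoids, labels = k_medoids(points, 2, lambda a, b: abs(a - b))
print(medoids)  # [1, 4]
print(labels)   # [1, 1, 1, 4, 4, 4]
```

For documents, `dist` could be replaced by one minus the cosine similarity of their term vectors; the algorithm itself is unchanged.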
DNA Microarrays Data Analysis
Hebelka, Tomáš ; Jaša, Petr (referee) ; Burgetová, Ivana (advisor)
This work is concerned with the analysis of DNA microarray data using cluster analysis. It explains the biological terms gene expression and DNA microarray. It then gives a mathematical and computer-science description of clustering methods and describes how to apply these methods to microarray data. Next, the work presents implementation details of the k-means and DBSCAN clustering methods and introduces an original clustering algorithm, Strom++. A description of the implementation and a user manual for the application follow. Finally, the results achieved are evaluated.
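DBSCAN grows clusters from core points, which have at least `min_pts` neighbors within radius `eps`, and marks points that are not density-reachable from any core point as noise. A minimal sketch, assuming each expression profile is a tuple of numbers compared by Euclidean distance (`math.dist`, Python 3.8+) and a brute-force neighbor search; it is not the thesis's implementation, and Strom++ is not reproduced here.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return a cluster id per point, or NOISE (-1) for points in no cluster."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(profiles, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, DBSCAN does not need the number of clusters in advance and can leave outlying profiles unassigned.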
Object Detection and Tracking Using Interest Points
Bílý, Vojtěch ; Hradiš, Michal (referee) ; Juránek, Roman (advisor)
This paper deals with object detection and tracking using interest points. Existing approaches are described. An improved method based on the Generalized Hough transform and iterative searching of the Hough space is proposed. The generality of the proposed detector is demonstrated on various types of objects. Object tracking is designed as frame-by-frame detection.
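The core of detection with a Generalized Hough transform over interest points is voting: each interest point matched to the object model votes for where the object's reference point (here, its center) should be, and dense peaks in the vote space indicate detections. The sketch below is a deliberately reduced illustration: it assumes translation only (no scale or rotation), a hypothetical list of matches pairing an image position with an offset learned during training, and a single peak search on a coarse grid rather than the iterative Hough-space search proposed in the paper.

```python
from collections import Counter


def hough_vote(matches, cell=10):
    """Accumulate center votes on a grid of cell x cell pixels and return the strongest peak.

    matches: list of ((x, y), (dx, dy)) pairs, where (x, y) is an interest point in the
    image and (dx, dy) is the offset from that point to the object center stored at training time.
    """
    votes = Counter()
    for (x, y), (dx, dy) in matches:
        cx, cy = x + dx, y + dy               # predicted object center
        votes[(cx // cell, cy // cell)] += 1  # quantize into an accumulator cell
    if not votes:
        return None, 0
    (bx, by), count = votes.most_common(1)[0]
    # Report the center of the winning cell and its number of votes.
    return ((bx + 0.5) * cell, (by + 0.5) * cell), count


matches = [((10, 10), (40, 40)), ((60, 20), (-10, 30)), ((30, 80), (20, -30)), ((100, 100), (5, 5))]
print(hough_vote(matches))  # ((55.0, 55.0), 3)
```

Three consistent votes outweigh one mismatched point, which is what makes the voting robust to false interest-point matches.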
Unsupervised Evaluation of Speaker Recognition System
Odehnal, Ondřej ; Plchot, Oldřich (referee) ; Matějka, Pavel (advisor)
This work builds on a modern speaker recognition (SID) system based on x-vectors. The goal of this bachelor's thesis is to design and experimentally evaluate techniques for evaluating an SID system using unannotated audio recordings, i.e., without knowledge of the speaker. For this purpose, an embedding is extracted from each unannotated recording. These embeddings are then used to cluster the recordings and subsequently create pseudo-annotations, on which the SID system is evaluated using the equal error rate (EER) metric. To create the pseudo-annotations, the following unsupervised clustering algorithms were proposed: K-means, Gaussian mixture models (GMM), and agglomerative clustering. After testing, the best experimental procedure turned out to be K-means with the silhouette metric, using cosine similarity as the distance measure. The best method achieved an EER of 5.72 % against a reference EER of 5.15 %, computed with knowledge of the annotations, on a part of the SITW dev-core-core dataset. Similar results were obtained on a part of the SITW eval-core-core dataset, with an estimated EER of 5.86 % and a reference EER of 5.08 %. The difference between the values is 0.57 percentage points for dev-core-core and 0.78 percentage points for eval-core-core. Further tests on the NIST SRE16 and VoxCeleb1 datasets were carried out to verify the correctness of the proposed procedure. In general, the proposed evaluation procedure had an error of approximately 1 %, which is a fairly good result for an unsupervised learning algorithm.
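The equal error rate is the operating point at which the false-rejection rate on target (same-speaker) trials equals the false-acceptance rate on non-target trials; with pseudo-annotations, a trial would count as a target trial when both recordings fall into the same cluster. A minimal sketch, assuming higher scores mean "same speaker" and approximating the EER by sweeping the observed scores as thresholds without interpolation; `equal_error_rate` is a hypothetical name, not part of the thesis's toolkit.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the threshold where false rejections and false acceptances balance."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # Target trials scoring below the threshold are falsely rejected.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # Non-target trials scoring at or above the threshold are falsely accepted.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


print(equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1]))  # 0.25
```

Comparing the EER computed on pseudo-annotated trials with the EER on the true annotations gives the kind of gap the abstract reports (for example, 5.72 % versus 5.15 %).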
Automatic Seccomp Syscall Policy Generator
Tamaškovič, Marek ; Smrčka, Aleš (referee) ; Turoňová, Lenka (advisor)
This thesis deals with the design and implementation of a tool that translates a list of system calls into a policy restricting system calls in the GNU Linux operating system. The motivation for such a tool is to automate the creation of security policies. The thesis addresses how to interpret the list of system calls within the program, as well as how to optimize and minimize the resulting data structure. Three algorithms were used for this: in one case the minimax algorithm was used, and in another the DBSCAN clustering algorithm. The final part of the thesis covers the testing methodology for the tool, both testing individual modules and testing the program as a whole. Complications arose during testing that prevented comprehensive testing of the tool.
Clustering of Biological Sequences
Kubiš, Radim ; Burgetová, Ivana (referee) ; Martínek, Tomáš (advisor)
One of the main reasons for protein clustering is the prediction of protein structure, function, and evolution. Many current tools suffer from high computational complexity due to all-to-all sequence alignment, and the tools that run faster do not reach the accuracy of the others. A further disadvantage is that they work at high similarity thresholds, whereas homologous proteins can share lower sequence identity. The clustering process also often ends upon reaching a stopping condition that does not reflect sufficient cluster quality. This master's thesis describes the design and implementation of a new tool for clustering protein sequences. The new tool should not be computationally demanding, yet it should preserve the required accuracy and produce better clusters. The thesis also describes the testing of the designed tool, the evaluation of the results, and possibilities for its further development.
Data Mining on Oracle Database Server and MS SQL Server
Opršal, Martin ; Chmelař, Petr (referee) ; Stryka, Lukáš (advisor)
This bachelor's thesis deals with knowledge discovery in databases. It focuses on extracting rules from relational databases using Microsoft SQL Server or the Oracle Data Mining server. The practical part concerns the design of applications that run on both servers. These applications are written in ASP.NET and C# for Microsoft SQL Server and in Java for the Oracle server.
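The abstract does not say which kind of rules are mined. Assuming association rules, a common rule type in database data mining, a rule A -> B is usually scored by its support (the share of transactions containing every item of A and B) and its confidence (the support of A and B together divided by the support of A). A minimal sketch of these two measures, with hypothetical function names and sample data:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support(baskets, {"bread", "milk"}))       # 0.5
print(confidence(baskets, {"bread"}, {"milk"}))  # 0.6666666666666666
```

A rule is typically reported only when both measures exceed user-chosen minimum thresholds.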