National Repository of Grey Literature: 127 records found, showing 1-10
Construction of time-space trajectories from multimodal data
Hrbáček, Matěj ; Skopal, Tomáš (advisor) ; Lokoč, Jakub (referee)
With the growth of public camera recordings and video streams in recent years, there is an increasing need for automatic processing with limited human input. An important part of this process is detecting moving objects in the video and grouping individual detections across video frames into trajectories. This thesis presents a set of algorithms for creating trajectories from object detections using a configurable analytic model. The presented algorithms cluster detections, and later also simple trajectories, into complex trajectories based on their features, such as the timestamp (frame), the bounding rectangle in the video frame and, optionally, the image crop defined by the bounding rectangle. To demonstrate the use of the generated trajectories, we then introduce methods for further analysis and data extraction. The first method improves the input detections by adding detections that the detector missed. The second creates a simple semantic description of trajectories to enable further research, such as action analysis or trajectory search.
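As a rough illustration of grouping detections into trajectories by frame and bounding rectangle, the sketch below links detections greedily by intersection-over-union (IoU) with the last box of each open trajectory, and then fills skipped frames by linearly interpolating the boxes. This is a common baseline assumed for illustration, not the configurable analytic model from the thesis; the names `Detection`, `link_detections`, `fill_gaps` and the thresholds `min_iou` and `max_gap` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    frame: int
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_detections(detections: List[Detection], min_iou: float = 0.3,
                    max_gap: int = 2) -> List[List[Detection]]:
    """Greedily attach each detection to the best-overlapping open trajectory."""
    trajectories: List[List[Detection]] = []
    for det in sorted(detections, key=lambda d: d.frame):
        best: Optional[List[Detection]] = None
        best_score = min_iou
        for traj in trajectories:
            gap = det.frame - traj[-1].frame
            # Only trajectories that ended 1..max_gap frames earlier are candidates,
            # so a trajectory never receives two detections from the same frame.
            if 1 <= gap <= max_gap:
                score = iou(traj[-1].box, det.box)
                if score >= best_score:
                    best, best_score = traj, score
        if best is None:
            trajectories.append([det])  # start a new trajectory
        else:
            best.append(det)
    return trajectories


def fill_gaps(traj: List[Detection]) -> List[Detection]:
    """Insert linearly interpolated boxes for frames skipped inside a trajectory."""
    filled = [traj[0]]
    for prev, cur in zip(traj, traj[1:]):
        gap = cur.frame - prev.frame
        for step in range(1, gap):
            t = step / gap
            box = tuple(p + t * (c - p) for p, c in zip(prev.box, cur.box))
            filled.append(Detection(prev.frame + step, box))  # type: ignore[arg-type]
        filled.append(cur)
    return filled
```

Allowing `max_gap` > 1 is what lets a trajectory survive a frame in which the detector failed; `fill_gaps` then recovers a plausible box for that frame.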
Named Entity Recognition and Its Application to Phishing Detection
Pop, Tomáš ; Skopal, Tomáš (advisor) ; Vomlelová, Marta (referee)
This thesis focuses on named entity recognition applied to email phishing detection. Named entity recognition is a classification task that aims to extract information from a text into a predefined set of categories (named entities), such as organizations, person names, or locations. The thesis describes various named entity recognition approaches, ranging from simple uses of neural networks to current state-of-the-art architectures. The most prevalent named entity recognition libraries and their models are compared against each other in terms of computational and predictive performance on the publicly available Enron email dataset. Moreover, differences in named entities between positive (including phishing) and negative emails are measured on a proprietary dataset. Finally, the proprietary dataset is used in an experiment in which a phishing email classification workflow is enriched with named entities, to determine whether named entities help the classifier improve its predictive performance. According to the experimental outcomes, a noticeable dissimilarity was measured between named entities in positive and negative emails. However, in the phishing email classification experiment with the provided dataset, it was concluded that named entities do not offer...
Data Preprocessing Strategies in Imbalanced Data Classification
Haluška, Radovan ; Skopal, Tomáš (advisor) ; Svoboda, Martin (referee)
Learning from imbalanced data has been studied for many years. There are two main approaches used today: data-level and algorithm-level methods. We set out to study resampling methods, which belong to the category of data-level methods. These methods modify the training part of a dataset, as opposed to algorithm-level methods, which modify the classifier itself. Resampling methods are further divided into oversampling and undersampling methods. It is difficult to know which group of methods performs better and which algorithms stand out the most. We conducted an experiment of unprecedented scale: we systematically and robustly compared sixteen preprocessing methods over eighteen imbalanced datasets and summarised the results in this thesis. The results show that oversampling methods outperformed most undersampling methods in both performance and preprocessing time.
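The two resampling families can be illustrated with their simplest members: random oversampling, which duplicates randomly chosen samples of the smaller classes, and random undersampling, which keeps only a random subset of each class. The abstract does not list the sixteen compared methods, so these two are illustrative only, and the function names are hypothetical. As the abstract notes, resampling is applied only to the training part of a dataset, never to the test part.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def _by_class(X: Sequence[T], y: Sequence[Hashable]) -> Dict[Hashable, List[T]]:
    """Group samples by their class label, preserving input order."""
    groups: Dict[Hashable, List[T]] = {}
    for sample, label in zip(X, y):
        groups.setdefault(label, []).append(sample)
    return groups


def random_oversample(X: Sequence[T], y: Sequence[Hashable],
                      seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Duplicate random samples of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    groups = _by_class(X, y)
    target = max(len(g) for g in groups.values())
    X_out, y_out = list(X), list(y)
    for label, members in groups.items():
        for _ in range(target - len(members)):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X: Sequence[T], y: Sequence[Hashable],
                       seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Keep a random subset of each class, as large as the smallest class."""
    rng = random.Random(seed)
    groups = _by_class(X, y)
    target = min(len(g) for g in groups.values())
    X_out: List[T] = []
    y_out: List[Hashable] = []
    for label, members in groups.items():
        for sample in rng.sample(members, target):
            X_out.append(sample)
            y_out.append(label)
    return X_out, y_out
```

The trade-off is visible even here: oversampling keeps all original information but enlarges the training set, while undersampling shrinks it and discards majority-class samples.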
Index Suitable for Similar Search in High-dimensional Spaces
Krejčová, Martina ; Kopecký, Michal (advisor) ; Skopal, Tomáš (referee)
In this work, we focus on indexing and searching high-dimensional data. To this end, we implemented the Metric Index, a similarity search model based on metric spaces that employs many well-known partitioning and filtering principles. The metric space is a general model of similarity, so the implemented index can be used for various kinds of data, and the stored data can be searched efficiently. The internal structure of the data stays hidden: the index only requires an implementation of a feature extraction function, which produces a vector representing the data, and a metric function applicable to the given data. The Metric Index was implemented as a data cartridge, the mechanism for extending the capabilities of the Oracle server. This data cartridge enables indexing of large unstructured data stored in the Oracle server as LOBs.
Integration of Social Networks
Mašíček, Viktor ; Tykal, Jaroslav (advisor) ; Skopal, Tomáš (referee)
Social networks are a current phenomenon, and their integration is beginning to gain importance. The basic idea is to pair identical pieces of information stored in various social networks and to detect inconsistencies between them. The most important data to be integrated are user profiles and contact lists. In our work, we propose the integration of both profiles and contact lists, as well as groups of users that can be created in social networks. Implementation is not part of the work. However, the proposal suggests creating a Main social network in the form of a web application that brokers the integration to end users. From the users' perspective, it is beneficial to display data from all social networks in one place and to automate the detection of differences. Users' data could be used for commercial purposes, of course within legal limits. From this perspective, the biggest contribution of the Main social network is its social graph, which is composed of the social graphs of the individual social networks. Additionally, it would contain information about users' membership in social networks. The proposal consists of three main parts: the main processes of the Main social network, its data model, and the means of gathering information from different networks. The design of the data model was partially inspired by existing...
Genetic algorithms: Characteristic syllables of language
Kuthan, Tomáš ; Lánský, Jan (advisor) ; Skopal, Tomáš (referee)
Syllable-based compression is a new approach to text compression. Dictionaries of common syllables are an important aspect of this approach. They are used to initialize the compression algorithms and greatly affect the compression ratio. Until now, they were created by a rather straightforward analysis of text corpora. We believe that dictionaries created by genetic algorithms may help us lower the compression ratio. In this study, we design such an algorithm and test it on Czech and English texts.
Modification of Pivot Tables method for persistent metric indexing
Moško, Juraj ; Skopal, Tomáš (advisor) ; Hoksza, David (referee)
The pivot tables method is one of the most effective metric access methods in terms of the number of distance computations needed for similarity search. In this work, a new modification of the pivot tables method is proposed that, besides distance computations, is also optimized for the number of I/O operations. The proposed Clustered pivot tables method indexes clusters of similar objects created by another metric access method, the M-tree. Indexing clustered objects benefits searching in the indexed database: because the clusters are paged in secondary memory, a page containing a cluster that does not satisfy a particular query is not accessed at all. Non-relevant objects that lie outside the query range are not loaded into memory, which decreases the number of I/O operations and the total volume of transferred data. The correctness of the proposed approach was demonstrated experimentally, and the experimental results of the proposed method were compared with those of selected metric access methods.
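The filtering that makes pivot tables economical in distance computations follows from the triangle inequality: for any pivot p, d(q, o) >= |d(q, p) - d(o, p)|, so an object whose largest such lower bound exceeds the query radius can be discarded without computing d(q, o). The sketch below shows only this in-memory filtering of the original pivot tables method, with Euclidean distance as an assumed metric; the M-tree clustering and the paging of clusters in secondary memory described above are not modeled. The class and method names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotTable:
    """In-memory pivot table: precomputed distances from every object to every pivot."""

    def __init__(self, objects: Sequence[Vector], pivots: Sequence[Vector],
                 dist: Metric = euclidean) -> None:
        if not pivots:
            raise ValueError("at least one pivot is required")
        self.objects = list(objects)
        self.pivots = list(pivots)
        self.dist = dist
        # table[i][j] = d(objects[i], pivots[j]), computed once at build time.
        self.table = [[dist(o, p) for p in self.pivots] for o in self.objects]

    def range_query(self, q: Vector, radius: float) -> Tuple[List[Vector], int]:
        """Return objects within radius of q and the number of distances computed."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        computed = len(self.pivots)
        result: List[Vector] = []
        for obj, row in zip(self.objects, self.table):
            # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)| for every pivot p.
            lower_bound = max(abs(a - b) for a, b in zip(q_to_pivots, row))
            if lower_bound > radius:
                continue  # filtered out without computing d(q, o)
            computed += 1
            if self.dist(q, obj) <= radius:
                result.append(obj)
        return result, computed
```

For five points on a line with pivots at 0 and 10, a query at 1.5 with radius 0.6 needs four distance computations (two to the pivots, two to surviving candidates) instead of the five a sequential scan would need; the saving grows with the size of the collection.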
Fractal Compression of Time Series
Lysík, Martin ; Skopal, Tomáš (advisor) ; Koubková, Alena (referee)
The aim of this work was to look for one-dimensional fractal distributions in real-world time series and to use them to compress these time series. The usability of these principles for both lossless and lossy compression was examined. Based on an analysis of the problem, a basic compression algorithm was first designed and implemented. It was progressively extended with simple heuristics for better performance, as well as with other techniques intended to reduce its deficiencies. As a result, two extended compression algorithms and one algorithm with a different way of processing the data were created. The properties of these algorithms, their output sizes, and the quality of the decompressed data were compared on several input datasets, and the algorithms were also compared with existing compression algorithms and methods for storing time series data.
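One classic way to exploit self-similarity in a one-dimensional signal is partitioned iterated function system (PIFS) coding, adapted from fractal image compression: each short range block is approximated by a downsampled, affinely scaled copy of a twice-as-long domain block elsewhere in the same signal, and only the domain position, scale `s` and offset `o` are stored. The abstract does not say which scheme the thesis uses, so this is an assumed, simplified illustration; the block size, the scale clamp and the function names are hypothetical. A practical coder would also quantize `s` and `o` and use larger blocks (with the default block of 4 samples, three numbers are stored per four samples), and would store residuals to make the compression lossless.

```python
from typing import List, Tuple

Code = Tuple[int, float, float]  # (domain block start, scale s, offset o)


def _downsample(block: List[float]) -> List[float]:
    """Halve a block's length by averaging neighbouring samples."""
    return [(block[2 * k] + block[2 * k + 1]) / 2 for k in range(len(block) // 2)]


def _fit(d: List[float], r: List[float], max_scale: float = 0.9) -> Tuple[float, float, float]:
    """Least-squares fit r ~ s*d + o, with |s| clamped below 1 so decoding converges."""
    n = len(r)
    md, mr = sum(d) / n, sum(r) / n
    var = sum((x - md) ** 2 for x in d)
    cov = sum((x - md) * (y - mr) for x, y in zip(d, r))
    s = cov / var if var > 0 else 0.0
    s = max(-max_scale, min(max_scale, s))
    o = mr - s * md
    err = sum((s * x + o - y) ** 2 for x, y in zip(d, r))
    return s, o, err


def encode(signal: List[float], block: int = 4) -> List[Code]:
    """Map every range block to the best affine copy of a twice-as-long domain block."""
    if len(signal) % block or len(signal) < 2 * block:
        raise ValueError("signal length must be a multiple of block and >= 2*block")
    domains = range(0, len(signal) - 2 * block + 1, block)
    codes: List[Code] = []
    for r0 in range(0, len(signal), block):
        r = signal[r0:r0 + block]
        best = None
        for d0 in domains:
            s, o, err = _fit(_downsample(signal[d0:d0 + 2 * block]), r)
            if best is None or err < best[3]:
                best = (d0, s, o, err)
        codes.append(best[:3])
    return codes


def decode(codes: List[Code], length: int, block: int = 4, iterations: int = 40) -> List[float]:
    """Iterate the contractive map from an all-zero signal towards its fixed point."""
    y = [0.0] * length
    for _ in range(iterations):
        nxt = [0.0] * length
        for i, (d0, s, o) in enumerate(codes):
            d = _downsample(y[d0:d0 + 2 * block])
            nxt[i * block:(i + 1) * block] = [s * v + o for v in d]
        y = nxt
    return y
```

Decoding needs no copy of the original: because every |s| < 1, repeatedly applying the stored maps to any starting signal converges to their common fixed point, which approximates the encoded series.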
Deletion in coalesced hashing
Mrkva, Lukáš ; Koubková, Alena (advisor) ; Skopal, Tomáš (referee)
Title: Deletion in Coalesced Hashing. Author: Lukáš Mrkva. Department: Department of Software Engineering. Supervisor: RNDr. Alena Koubková, CSc. Supervisor's e-mail address: koubkova@ksi.ms.mff.cuni.cz. Abstract: This thesis is devoted to the DELETE operation in coalesced hashing. First, the principles of hashing and some of its basic variants are introduced. Chapter 3 deals with coalesced hashing and also describes in detail various methods of constructing the hash table, classified by the order of colliding records and by the presence of a cellar. Next, three different algorithms for the DELETE operation are presented, and their implementations for the individual coalesced hashing methods are discussed in detail. The theoretical part is followed by the results of experiments on simulated data, with commentary. The thesis focuses mainly on comparing the time cost of the individual deletion algorithms and on comparing the time needed to search for records before and after part of the table is deleted. The algorithms, implemented in C, and the experimental results are provided on the accompanying CD. Keywords: coalesced hashing, delete
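In coalesced hashing, colliding records are stored in free slots of the same table and linked into chains, and chains starting at different home addresses can merge (coalesce); a cellar is an extra region of slots outside the hash address range reserved for colliding records. This makes DELETE tricky, because removing a record can cut the search path of records stored after it in its chain. The abstract does not describe its three deletion algorithms, so the sketch below uses one simple strategy assumed for illustration: unlink the deleted record together with the rest of its chain and reinsert the records that followed it. It uses late insertion (new records are appended at the end of a chain) and takes free slots from the highest index downwards, so the cellar fills first. The class name and the hash function `key % address_size` are hypothetical.

```python
from typing import List, Optional


class CoalescedHashTable:
    """Coalesced hashing with a cellar and late insertion (illustrative sketch).

    Slots 0..address_size-1 form the address region used by the hash function;
    the cellar_size slots above them form the cellar.
    Invariant: every slot has at most one incoming link, and no link points
    to an empty slot.
    """

    def __init__(self, address_size: int, cellar_size: int) -> None:
        self.address_size = address_size
        size = address_size + cellar_size
        self.keys: List[Optional[int]] = [None] * size
        self.link: List[Optional[int]] = [None] * size

    def _home(self, key: int) -> int:
        return key % self.address_size

    def _free_slot(self) -> int:
        # Scan from the top so that the cellar is consumed before the address region.
        for i in range(len(self.keys) - 1, -1, -1):
            if self.keys[i] is None:
                return i
        raise OverflowError("hash table is full")

    def search(self, key: int) -> Optional[int]:
        """Return the slot holding key, or None."""
        i: Optional[int] = self._home(key)
        while i is not None and self.keys[i] is not None:
            if self.keys[i] == key:
                return i
            i = self.link[i]
        return None

    def insert(self, key: int) -> None:
        i = self._home(key)
        if self.keys[i] is None:
            self.keys[i] = key  # home slot free: no collision
            return
        while self.keys[i] != key:
            nxt = self.link[i]
            if nxt is None:
                # End of the chain reached: store the key in a free slot and link it.
                j = self._free_slot()
                self.keys[j] = key
                self.link[i] = j
                return
            i = nxt
        # The loop ended because the key is already present: nothing to do.

    def delete(self, key: int) -> bool:
        i = self.search(key)
        if i is None:
            return False
        # Cut the chain at the unique predecessor of slot i, if it has one.
        for p, target in enumerate(self.link):
            if target == i:
                self.link[p] = None
                break
        # Free slot i and every slot after it, remembering the displaced keys.
        displaced: List[int] = []
        j: Optional[int] = i
        while j is not None:
            if j != i:
                displaced.append(self.keys[j])  # slots after i are always occupied
            nxt = self.link[j]
            self.keys[j], self.link[j] = None, None
            j = nxt
        # Reinsert the displaced keys so that all their search paths are valid again.
        for k in displaced:
            self.insert(k)
        return True
```

Reinsertion is simple and safe but can be costly for long coalesced chains, and finding the predecessor by scanning the link array costs time proportional to the table size; more refined deletion algorithms avoid some of this work.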
A Search Engine for Mathematics
Mišutka, Jozef ; Galamboš, Leo (advisor) ; Skopal, Tomáš (referee)
The WWW is dominated by search engines such as Google, which are an inseparable part of everyday information seeking. Information retrieval, the theoretical research field concerned with searching, focuses mainly on natural-language constructs: words. In recent years, the field has been extended to other searchable content as well. The amount of mathematical knowledge on the WWW has grown enormously, so the importance of a general mathematical search engine is clear. However, this research field had been abandoned until very recently. Although active research is ongoing, few practical results have been presented. The main goal of this thesis is to fill this gap. A new mathematical search engine was proposed with a focus on applicability. Because full-text search engines are the only ones capable of indexing the WWW effectively, a full-text search engine was used as the basis. The mathematical functionality was designed as an extension, which allows it to exploit all the advantages of the full-text search engine. Most mathematical documents do not contain semantic information; solving this problem was one of the main goals of this thesis. An extensive evaluation showed that the proposed search engine has many advantages. The most important one is the usability over a large...
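One way to let a full-text engine handle formulae, assumed here for illustration rather than taken from the thesis, is to turn each formula into ordinary index terms: its tokens, plus token pairs in which single-letter variables are generalized to a placeholder, so that `a^2 + b^2` can match `x^2 + y^2`. The sketch uses a toy inverted index in place of a real full-text engine; the tokenizer, the `math:` and `math2:` term prefixes and the overlap scoring are all hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set

# LaTeX commands (\sin), single letters, integers, or any other non-space character.
TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+|\S")


def formula_terms(latex: str) -> List[str]:
    """Turn a LaTeX formula into index terms for a full-text engine."""
    tokens = TOKEN.findall(latex)
    # Exact tokens keep precise matches possible.
    terms = ["math:" + t for t in tokens]
    # Generalize single-letter variables so structurally similar formulae share terms.
    general = ["VAR" if re.fullmatch(r"[A-Za-z]", t) else t for t in tokens]
    terms += ["math2:" + a + " " + b for a, b in zip(general, general[1:])]
    return terms


class FormulaIndex:
    """Toy inverted index standing in for a full-text search engine."""

    def __init__(self) -> None:
        self.postings: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, latex: str) -> None:
        for term in set(formula_terms(latex)):
            self.postings[term].add(doc_id)

    def search(self, latex: str) -> List[str]:
        """Rank documents by the number of distinct query terms they share."""
        scores: Counter = Counter()
        for term in set(formula_terms(latex)):
            for doc_id in self.postings.get(term, ()):
                scores[doc_id] += 1
        return [doc_id for doc_id, _ in scores.most_common()]
```

Because the formula is reduced to plain terms, the existing engine's indexing, storage and ranking machinery can be reused unchanged, which is the kind of advantage an extension-based design aims for.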