National Repository of Grey Literature : 51 records found (21 - 30)
Security log anonymization tool focusing on artificial intelligence techniques
Šťastná, Ariela ; Jurek, Michael (referee) ; Safonov, Yehor (advisor)
SIEM systems play a key role in security monitoring. They aggregate, normalize, and filter the collected records, which forms the basis for applying data-mining techniques. In this way, SIEMs represent an excellent source of large volumes of normalized data. These data carry the potential to advance security research, data mining, and artificial intelligence, where they can lead to improvements of existing exploration methods, clearer network scanning, and the detection of more sophisticated attack vectors. However, one of the main obstacles to using these data is the fact that the data in log records are in many cases sensitive and may pose a security risk. For this reason, a tool was created for anonymizing sensitive data in log records while preserving correlations between the data. The main goal of this bachelor's thesis is to address the technical and legal aspects of log processing and anonymization for artificial intelligence. As part of the research, an analysis of the most frequently occurring data in logs was carried out, together with an assessment of their riskiness, resulting in categories of data according to their sensitivity. The thesis further presents an analysis of current SIEM systems together with the meta keys they use.
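The correlation-preserving anonymization described above can be illustrated with a minimal sketch (an assumption for illustration, not the thesis tool): sensitive values such as IP addresses and usernames are replaced with deterministic pseudonyms, so that equal values always map to equal tokens across log lines and correlations survive.

```python
import hashlib
import re

# Hypothetical patterns for two kinds of sensitive fields.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
USER_RE = re.compile(r"user=(\w+)")

def pseudonym(value: str, kind: str) -> str:
    # Deterministic: the same input always yields the same placeholder,
    # so correlations between log lines are preserved.
    digest = hashlib.sha256(value.encode()).hexdigest()[:8]
    return f"{kind}_{digest}"

def anonymize(line: str) -> str:
    line = IP_RE.sub(lambda m: pseudonym(m.group(0), "IP"), line)
    line = USER_RE.sub(lambda m: "user=" + pseudonym(m.group(1), "USR"), line)
    return line
```

Because the mapping is a keyless hash here, a production tool would typically add a secret key or lookup table to prevent re-identification by brute force.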
Assessing the impact of manual corrections in the Groningen Meaning Bank
Weck, Benno ; Lopatková, Markéta (advisor) ; Vidová Hladká, Barbora (referee)
The Groningen Meaning Bank (GMB) project develops a corpus with rich syntactic and semantic annotations. Annotations in GMB are generated semi-automatically and stem from two sources: (i) initial annotations from a set of standard NLP tools, (ii) corrections/refinements by human annotators. For example, on the part-of-speech level of annotation there are currently 18,000 such corrections, so-called Bits of Wisdom (BOWs). To apply this information to boost the NLP processing, we experimented with using the BOWs to retrain the part-of-speech tagger and found that the tagger can be improved to correct up to 70% of identified errors on held-out data. Moreover, an improved tagger helps to raise the performance of the parser. Preferring sentences with a high rate of verified tags in retraining has proven to be the most reliable strategy. With a simulated active learning experiment using Query-by-Uncertainty (QBU) and Query-by-Committee (QBC), we showed that selectively sampling sentences for retraining yields better results with less data than random selection. In an additional pilot study we found that a standard maximum-entropy part-of-speech tagger can be augmented so that it uses already known tags to enhance its tagging decisions on an entire sequence without first retraining a new model.
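The Query-by-Uncertainty selection mentioned above can be sketched as follows (an illustrative sketch, assuming the tagger exposes a per-token probability for its best tag; not the thesis implementation): the sentences the model is least confident about are selected for annotation and retraining first.

```python
# Each corpus item is (sentence, tag_probs), where tag_probs[i] is the
# probability the tagger assigned to its best tag for token i.

def sentence_confidence(tag_probs):
    # Confidence of a sentence = its least confident token.
    return min(tag_probs)

def qbu_select(corpus, k):
    """Return the k most uncertain sentences for retraining."""
    ranked = sorted(corpus, key=lambda item: sentence_confidence(item[1]))
    return [sentence for sentence, _ in ranked[:k]]
```

Query-by-Committee works analogously, ranking by disagreement between several taggers instead of by a single model's confidence.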
Generating Code from Textual Description of Functionality
Kačur, Ján ; Ondřej, Karel (referee) ; Smrž, Pavel (advisor)
The aim of this thesis was to design and implement a system for code generation from a textual description of functionality. In total, two systems were implemented. One of them served as a control prototype; the second was the main product of this thesis. I focused on using smaller, non-pre-trained models. Both systems used a Transformer-type model at their core. The second system, unlike the first, used syntactic decomposition of both the code and the textual descriptions. The data used in both systems originated from the CodeSearchNet project. The target programming language for generation was Python. The second system achieved better quantitative results than the first, with an accuracy of 85% versus 60%. It managed to auto-complete correct code to finish a function definition, though with a larger time delay. This thesis is almost exclusively dedicated to the second system.
Phishing Detection Using Deep Learning Attention Techniques
Safonov, Yehor
In the modern world, electronic communication is the most widely used technology for exchanging messages between users. The growing popularity of email brings considerable security risks and transforms it into a universal tool for spreading phishing content. Even though traditional techniques achieve high accuracy in spam filtering, they often fail to keep up with the rapid growth and evolution of spam techniques. These approaches suffer from overfitting issues, may converge to a poor local minimum, are inefficient in high-dimensional data processing, and have long-term maintainability problems. The main contribution of this paper is to develop and train advanced deep networks which use attention mechanisms for efficient phishing filtering and text understanding. Key aspects of the study lie in a detailed comparison of attention-based machine learning methods, their specifics, and their accuracy when applied to the phishing problem. From a practical point of view, the paper focuses on preprocessing of an email data corpus. Deep learning attention-based models, for instance BERT and XLNet, have been successfully implemented and compared using statistical metrics. The obtained results show indisputable advantages of deep attention techniques compared to the common approaches.
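The attention mechanism at the heart of BERT- and XLNet-style models can be illustrated with a toy computation (an illustration of the standard formula, not code from the paper): Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V, where each query attends to all keys and the resulting weights mix the value vectors.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [dot(q, k) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

A query that strongly matches the first key receives nearly all of that key's value vector; real models run many such heads in parallel over learned projections.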
Machine Learning for Natural Language Question Answering
Sasín, Jonáš ; Fajčík, Martin (referee) ; Smrž, Pavel (advisor)
This thesis deals with natural language question answering over the Czech Wikipedia. Question answering systems are experiencing growing popularity, but most of them are developed for English. The main purpose of this work is to explore the available possibilities and datasets and to create such a system for Czech. In the thesis I focused on two approaches. One of them uses the English model ALBERT together with machine translation of passages. The other utilizes multilingual BERT. Several variants of the system are compared in this work. Possibilities of relevant passage retrieval are also discussed. Standard evaluation is provided for every variant of the tested system. The best system version has been evaluated on the SQAD v3.0 dataset, reaching 0.44 EM and 0.55 F1 score, which is an excellent result compared to other existing systems. The main contribution of this work is the analysis of existing possibilities and the setting of a benchmark for further development of better systems for Czech.
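The passage-retrieval step mentioned above can be sketched with a simple bag-of-words scorer (an illustrative baseline, not the thesis implementation): passages are ranked by cosine similarity with the question, and the best one is handed to the reader model.

```python
import math
from collections import Counter

def vectorize(text):
    # Bag-of-words term counts; a real system would normalize and weight terms.
    return Counter(text.lower().split())

def cosine(a, b):
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(question, passages):
    """Return the passage most similar to the question."""
    q = vectorize(question)
    return max(passages, key=lambda p: cosine(q, vectorize(p)))
```

Production systems replace this with TF-IDF/BM25 or dense retrievers, but the ranking interface is the same.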
Generator of computer descriptions
Matějka, Jan ; Rosa, Rudolf (advisor) ; Dušek, Ondřej (referee)
This thesis deals with the problem of generating coherent and well-formed sentences from structured data. The goal of the thesis is to create a tool that makes it easier to generate brief descriptions of electronics based on parameters given as structured data. The tool can be useful, e.g., for e-shops selling such electronics. The first part of the thesis introduces possible solutions to this problem. The thesis next describes the data needed for solving the problem, including the ways of acquiring such data and the structure of the data. Two selected solutions are then described, including their implementation. The thesis then examines the advantages and disadvantages of the selected solutions and evaluates the texts generated by the created tool.
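One straightforward approach to this data-to-text task is template filling, sketched below (the field names are hypothetical, chosen only for illustration): a dict of product parameters is rendered into a short description sentence.

```python
# A hypothetical template for a phone listing; real systems typically
# select among several templates and handle missing fields.
TEMPLATE = ('The {name} features a {display_size}" display, '
            '{ram} of RAM and {storage} of storage.')

def describe(product: dict) -> str:
    """Render one product's parameters into a brief description."""
    return TEMPLATE.format(**product)
```

Neural data-to-text generators trade this rigidity for fluency and variety, at the cost of possible hallucinated parameters.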
Analysis of textual user reviews for a selected group of products
Valovič, Roman
This work focuses on the design of a system that identifies frequently discussed product features in product reviews, summarizes them, and displays them to the user with their sentiment. The work deals with natural language processing, with a specific focus on the Czech language. The reader is introduced to methods of text preprocessing and their impact on the quality of the analysis results. The identification of the most discussed product features is carried out by cluster analysis using the K-Means algorithm, under the assumption that sufficiently internally homogeneous clusters will represent the individual features of the products. A further area explored in this work is the representation of documents using word embeddings and the potential of this vector space as input for machine learning algorithms.
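The clustering step described above can be sketched with a compact K-Means implementation (illustrative only, not the thesis code): toy embedding vectors are grouped so that each cluster would correspond to reviews mentioning a similar product feature.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Cluster a list of equal-length vectors into k groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: centers move to the mean of their cluster.
        centers = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return clusters
```

In practice the points would be sentence or review embeddings, and k would be chosen by inspecting cluster homogeneity.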
XML Databases for Dictionary Data Management
Samia, Michel ; Dytrych, Jaroslav (referee) ; Smrž, Pavel (advisor)
The following diploma thesis deals with dictionary data processing, especially data in XML-based formats. At first, the reader is acquainted with the linguistic and lexicographical terms used in this work. Then particular types of lexicographical data formats and specific formats are introduced, and their advantages and disadvantages are discussed. According to previously set criteria, the LMF format has been chosen for the design and implementation of a Python application, which focuses especially on the intelligent merging of multiple dictionaries into one. After passing all unit tests, this application has been used for processing LMF dictionaries located on the faculty server of the research group for natural language processing. Finally, the advantages and disadvantages of this application are discussed and ways of further usage and extension are suggested.
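The merging step can be sketched with the standard library's XML tools (a simplified illustration; the element and attribute names below are made up for the example and are not the exact LMF schema): lexical entries are keyed by their lemma, and entries missing from the first dictionary are copied over from the second.

```python
import xml.etree.ElementTree as ET

def merge(xml_a: str, xml_b: str) -> ET.Element:
    """Merge two dictionary documents; entries from B that are not
    already in A (by lemma) are appended to A's root."""
    root_a = ET.fromstring(xml_a)
    root_b = ET.fromstring(xml_b)
    seen = {e.get("lemma") for e in root_a.findall("entry")}
    for entry in root_b.findall("entry"):
        if entry.get("lemma") not in seen:
            root_a.append(entry)
    return root_a
```

A full LMF merger would also reconcile senses and forms inside matching entries rather than deduplicating whole entries.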
