National Repository of Grey Literature: 29 records found (showing 1 - 10).
Generating Code from Textual Description of Functionality
Šamánek, Jan ; Fajčík, Martin (reviewer) ; Smrž, Pavel (supervisor)
As machine learning and neural network models continue to grow, there is an increasing demand for GPU-accelerated resources and algorithms to support them. Large language models have the potential to assist with this task, as they are already used as coding assistants for popular programming languages. If these models could also learn less commonly used paradigms like CUDA, they could help develop and maintain the necessary systems. This thesis aims to explore how well modern language models can learn CUDA as a programming paradigm and to create a training corpus specifically for this purpose.
Automated Truth Discovery
Kočí, Jan ; Ondřej, Karel (reviewer) ; Fajčík, Martin (supervisor)
This thesis aims to (i) better understand the biases and cues exploited by content-based methods in the text of fake news articles and (ii) evaluate their performance in predicting the reliability of articles and media sources. Two different models are implemented. The baseline model uses TF-IDF features and a Multinomial Naive Bayes (MNB) classifier. The second model uses the BERT transformer. To study the cues exploited in the text, interpretability methods are applied. While MNB is interpretable by design, the BERT model is analyzed through the Integrated Gradients explainability method. Both classifiers were trained on a modified version of the NELA-GT-2021 dataset. The thesis suggests preprocessing this dataset, e.g., removing keywords that provide simple cues, which could lead to a more robust classifier. The thesis also presents a novel FNI dataset consisting of 46 manually selected articles. The FNI dataset enables topic-wise analysis (on topics such as covid, football, science, politics, etc.). The analysis revealed several biases of the classifiers. For example, the baseline model was not able to identify unreliable articles about football (0% recall on the FNI dataset) or reliable scientific articles (0% recall on the FNI dataset). Both classifiers were more successful in identifying unreliable articles, with the BERT classifier having a recall of 91% on unreliable and only 78% on reliable articles in the FNI dataset. The interpretability methods also performed better on unreliable articles and were able to identify the sensationalism and shocking headlines used in fake news. The classifiers are also used to predict the credibility of sources. The results are compared with a state-of-the-art method that takes a different approach, using mutual citations of sources to predict their credibility. Another outcome of this thesis is a new challenge set containing articles from the NELA dataset on which the classifiers failed. This challenge set can be used for future research in this area.
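The remark that MNB is "interpretable by design" can be illustrated with a small sketch. It is not the thesis implementation: it fits a multinomial Naive Bayes model on raw token counts rather than TF-IDF weights, the toy headlines and labels are invented for illustration, and the function names (`train_mnb`, `predict`, `top_cues`) are hypothetical. The point it shows is that the per-word log-likelihood ratio between classes can be read off directly as a "cue" strength.

```python
import math
from collections import Counter


def train_mnb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on raw token counts (assumption: no TF-IDF)."""
    classes = sorted(set(labels))
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        # Laplace (add-alpha) smoothing over the whole vocabulary.
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik


def predict(model, doc):
    """Return the class with the highest posterior score; unseen tokens are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


def top_cues(model, target, other, k=3):
    """Words whose log-likelihood ratio most strongly favours `target` over `other`."""
    _, log_lik = model
    ratio = {w: log_lik[target][w] - log_lik[other][w] for w in log_lik[target]}
    return sorted(ratio, key=ratio.get, reverse=True)[:k]


# Toy data, invented purely for illustration (not from NELA-GT-2021 or FNI).
toy_docs = [
    ["shocking", "secret", "cure"],
    ["shocking", "truth", "exposed"],
    ["study", "finds", "modest", "effect"],
    ["council", "approves", "budget"],
]
toy_labels = ["unreliable", "unreliable", "reliable", "reliable"]
model = train_mnb(toy_docs, toy_labels)
```

On this toy data, `top_cues(model, "unreliable", "reliable", k=1)` surfaces the word the classifier leans on most, which is the kind of simple keyword cue the abstract proposes removing during preprocessing.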
Cross Lingual News Article Classification and Automatic Topic Discovery Using Multilingual Language Models
Dufková, Aneta ; Fajčík, Martin (reviewer) ; Kesiraju, Santosh (supervisor)
The goal of this thesis is to perform cross-lingual classification and automatic topic discovery of news articles using pre-trained multilingual language models. No large multilingual dataset is available for this task, so the first contribution of this thesis is to create one. The second aim is to benchmark the multilingual embedding models LaBSE and LASER2 on a classification task. This is done through various experiments, such as training on a limited number of articles and zero-shot learning. Topic discovery is then performed so that an article can be represented not only by categories but also by its most representative words. Lastly, the results of classification and topic discovery are visualized in a simple web application.
Designing a Multilingual Fact-Checking Dataset from Existing Question-Answering Data
Kamenický, Daniel ; Aparovich, Maksim (reviewer) ; Fajčík, Martin (supervisor)
This thesis addresses the lack of multilingual fact-checking datasets that contain annotated evidence grounding the supporting or refuting verdict for a fact. To this end, it explores converting an existing question-answering dataset into a fact-checking dataset. Two approaches for converting question-answer pairs into claims are studied. The first approach creates the dataset with the monolingual pre-trained seq-2-seq model T5. The model is trained on an English dataset, and its inputs and outputs are translated into the desired languages. The second approach uses the multilingual mT5 model, which can take input and generate output in the desired language. For the multilingual model, the training datasets need to be translated. The main problem in this work is machine translation, which achieved only around a 30 % success rate in low-resource languages. The experiments showed better results for claims generated by the monolingual model with machine translation: it achieved a success rate of 88 %, compared with 73 % for the multilingual model. Finally, to analyze possible label-specific claim biases, a TF-IDF classifier based on logistic regression is trained. The classifier, which estimates the probability of a claim's veracity from the claim alone, achieves accuracy close to 0.5 on both converted datasets. The converted datasets can thus be challenging for fact-checking models.
Conversational Agent for Information about Brno
Křištof, Jiří ; Fajčík, Martin (reviewer) ; Smrž, Pavel (supervisor)
The goal of this thesis is to implement a conversational agent that provides information about Brno. The agent uses a three-layer architecture. Machine learning and neural network techniques are used for the question answering itself. In a user test, 58 % of respondents were satisfied with the system and 84 % of users were satisfied with the accuracy of its answers. The contribution of this work is to make it easier for residents and visitors of Brno to obtain information about the city.
Multilingual Open-Domain Question Answering
Slávka, Michal ; Dočekal, Martin (reviewer) ; Fajčík, Martin (supervisor)
This thesis explores automatic Multilingual Open-Domain Question Answering and proposes approaches to this less explored research area. More precisely, it examines whether (i) using an English system is sufficient, (ii) multilingual models can benefit from the question being translated into other languages, or (iii) avoiding translation is the better choice. An English system based on the T5 model that relies on machine translation is compared with natively multilingual systems based on the multilingual mT5 model. The English system with machine translation only slightly outperforms its monolingual counterparts in multiple tasks. Compared to the multilingual models, the English system was trained on a much larger dataset, but the results were comparable. This shows that the use of natively multilingual systems is a promising approach for future research. I also present a method of retrieving multilingual evidence using the BM25 ranking algorithm and compare it with English retrieval. The use of multilingual evidence seems to be beneficial and improves the performance of the systems.
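For readers unfamiliar with BM25, the following is a minimal sketch of the ranking function, not the retrieval code used in the thesis. It assumes pre-tokenized documents, the commonly used defaults k1 = 1.5 and b = 0.75, and the non-negative IDF variant popularized by Lucene; the toy passages and the function name `bm25_scores` are illustrative only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` (non-empty list) against `query`."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF, which stays non-negative for frequent terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1; b controls length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy passages, invented for illustration.
passages = [
    ["brno", "is", "a", "city", "in", "moravia"],
    ["prague", "is", "the", "capital"],
    ["brno", "university", "of", "technology"],
]
scores = bm25_scores(["brno", "university"], passages)
ranking = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
```

The passage matching both query terms ranks first, the passage matching only one ranks second, and the passage with no matching term scores zero. Because the function only needs token lists, the same scoring applies to passages in any language once they are tokenized.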
Machine Learning for Natural Language Question Answering
Sasín, Jonáš ; Fajčík, Martin (reviewer) ; Smrž, Pavel (supervisor)
This thesis deals with natural language question answering over the Czech Wikipedia. Question answering systems are gaining popularity, but most of them are built for English. The goal of this thesis is to explore the available options and datasets and to build such a system for Czech. I focused on two approaches. One uses the English ALBERT model for answer extraction together with machine translation of passages. The other uses the multilingual BERT model. Several variants of the system are compared. Options for retrieving relevant passages are also discussed. All tested variants are evaluated using standard metrics. The best variant was evaluated on the SQAD v3.0 dataset, reaching 0.44 EM and 0.55 F1, which is an excellent result compared with existing systems. The main contribution of this work is an analysis of the available options and the setting of a baseline for the further development of better systems for Czech.
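EM (exact match) and F1 are the usual metrics for extractive question answering. The sketch below shows how they are commonly computed for a single prediction, following the SQuAD-style token-overlap definition. The normalization (lowercasing, stripping edge punctuation, whitespace tokenization) is an assumption; the thesis may normalize Czech text differently, and the function names are illustrative.

```python
from collections import Counter

_PUNCT = ".,;:!?\"'()"


def normalize(text):
    """Lowercase, split on whitespace, and strip punctuation from token edges."""
    tokens = (tok.strip(_PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def exact_match(prediction, reference):
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it occurs in both.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores are then the averages of these per-question values, which is why F1 is typically higher than EM: a prediction such as "v Brně" against the reference "Brně" scores 0 on EM but receives partial credit on F1.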
Visual Question Answering
Kocurek, Pavel ; Ondřej, Karel (reviewer) ; Fajčík, Martin (supervisor)
Visual Question Answering (VQA) is a task in which a system takes an image and a question as input and outputs an answer. Despite many research advances, VQA, unlike image captioning, is rarely used in practice. This work aims to narrow the gap between research and practice. To examine whether VQA can be used by blind and visually impaired people, this thesis proposes a demonstration VQA application and, subsequently, a smartphone application. A study with 20 participants from the community was conducted. First, the participants received the application for two weeks. Then, each of them was asked to fill out a questionnaire. 80 % of respondents rated the accuracy of the VQA application as sufficient or better, and most of them would appreciate it if their image captioning application also supported VQA. Following this finding, this work tries to establish the link between image captioning and VQA. In particular, it studies the informativeness provided by both systems in different scenarios. It collects a novel dataset of 111 images with manually annotated captions and diverse scenes. An experiment comparing the knowledge obtained showed a success rate of 69.9 % and 46.2 % for VQA and image captioning, respectively. In another experiment, participants were able to select the correct caption based on VQA 70.9 % of the time. The results suggest that VQA outperforms image captioning with regard to image details and should therefore be used in practice more often.
Neural Machine Translation for Low-Resource Language Pairs
Filo, Denis ; Fajčík, Martin (reviewer) ; Jon, Josef (supervisor)
This thesis deals with neural machine translation for so-called low-resource languages. The goal was to evaluate current techniques experimentally and to propose improvements to them. The translation systems in this work used the transformer neural network architecture and were trained with the Marian framework. The language pairs selected for the experiments were Slovak-Croatian and Slovak-Serbian. The experiments investigated the transfer learning and semi-supervised learning techniques.
Machine Learning for Question Answering in Czech
Pastorek, Peter ; Fajčík, Martin (reviewer) ; Smrž, Pavel (supervisor)
This Master's thesis deals with training neural networks for question answering in Czech. The neural networks are implemented in the Python programming language using the PyTorch library and are based on the LSTM architecture. They are trained on the Czech SQAD dataset. Because the Czech dataset is smaller than the English datasets, I opted to extend the neural networks with algorithmic procedures. To make these procedures easier to apply and to improve accuracy, I divide question answering into smaller parts.
