Národní úložiště šedé literatury
Designing a Multilingual Fact-Checking Dataset from Existing Question-Answering Data
Kamenický, Daniel ; Aparovich, Maksim (referee) ; Fajčík, Martin (supervisor)
This thesis addresses the lack of multilingual fact-checking datasets that contain annotated evidence grounding the supporting or refuting verdict for a claim. To that end, this work explores converting an existing question-answering dataset into a fact-checking dataset. Two approaches for converting question-answer pairs into claims are studied. The first approach builds the dataset with the monolingual pre-trained sequence-to-sequence model T5: the model is trained on English data, and its inputs and outputs are machine-translated into the target languages. The second approach uses the multilingual mT5 model, which can consume input and generate output directly in the target language; for this model, the training data itself has to be translated. The main bottleneck of this work is machine translation, which achieved only around a 30 % success rate for low-resource languages. The experiments favored claims generated by the monolingual model combined with machine translation: the multilingual model achieved a 73 % success rate, compared to 88 % for the monolingual model. Finally, to analyze possible label-specific claim biases, a TF-IDF-based logistic regression classifier is trained. The classifier, which predicts a claim's veracity from the claim text alone, achieves accuracy close to 0.5 on both converted datasets, suggesting that the converted datasets can be challenging for fact-checking models.
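The claim-only bias probe described in the abstract can be sketched roughly as follows. This is an illustrative scikit-learn sketch, not the thesis code; the function name, the toy data, and the label strings ("SUPPORTED", "REFUTED") are assumptions made for the example.

```python
# Minimal sketch of a claim-only bias probe: a TF-IDF + logistic-regression
# classifier that tries to predict a claim's veracity label from the claim
# text alone. Held-out accuracy near 0.5 would suggest the dataset carries
# little label-specific lexical bias.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def probe_claim_only_bias(claims, labels, seed=0):
    """Train a TF-IDF + logistic regression classifier on claim text only
    and return its accuracy on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        claims, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    # Toy data for illustration; real claims and labels would come from the
    # converted fact-checking dataset.
    claims = ["The Eiffel Tower is in Paris.", "The Moon is made of cheese."] * 50
    labels = ["SUPPORTED", "REFUTED"] * 50
    print(f"claim-only accuracy: {probe_claim_only_bias(claims, labels):.2f}")
```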
