Original title:
Vícejazyčné učení pomocí víceúlohového trénování syntaxe
Translated title:
Multilingual Learning using Syntactic Multi-Task Training
Authors:
Kondratyuk, Daniel; Straka, Milan (advisor); Mareček, David (referee)
Document type:
Master’s thesis
Year:
2019
Language:
English
Abstract:
Recent research has shown promise in multilingual modeling, demonstrating that a single model can learn tasks across several languages. However, typical recurrent neural models fail to scale beyond a small number of related languages, and grouping multiple distant languages together for training can be quite detrimental. This thesis introduces a simple method that does not suffer from this scaling problem, producing a single multi-task model that simultaneously predicts universal part-of-speech tags, morphological features, lemmas, and dependency trees for 124 Universal Dependencies treebanks across 75 languages. Leveraging the multilingual BERT model pretrained on 104 languages, we apply several modifications to it and fine-tune it on all available Universal Dependencies training data. The resulting model, which we call UDify, closely matches or exceeds state-of-the-art UPOS, UFeats, lemma, and (especially) UAS and LAS scores, without requiring any recurrent or language-specific components. We evaluate UDify for multilingual learning, showing that low-resource languages benefit the most from cross-linguistic annotations. We also evaluate UDify for zero-shot learning, with results suggesting that multilingual training provides strong UD predictions even for languages that neither UDify nor BERT...
Institution: Charles University Faculties (theses)
Document availability information: Available in the Charles University Digital Repository. Original record: http://hdl.handle.net/20.500.11956/107286