Original title: Tvorba závislostního korpusu pro jorubštinu s využitím paralelních dat
Translated title: Creating a dependency treebank for Yorùbá using parallel data
Authors: Oluokun, Adedayo ; Zeman, Daniel (advisor) ; Rosa, Rudolf (referee)
Document type: Master’s thesis
Year: 2018
Language: eng
Abstract: The goal of this thesis is to create a dependency treebank for Yorùbá, a language with very few pre-existing machine-readable resources. The treebank follows the Universal Dependencies (UD) annotation standard; in addition, certain language-specific guidelines for Yorùbá were specified. Known techniques for porting resources from resource-rich languages were tested, in particular the projection of annotation across parallel bilingual data. Manual annotation is not the main focus of this thesis; nevertheless, a small portion of the data was verified manually in order to evaluate the annotation quality. In addition, a model was trained on the manually annotated data using UDPipe.
Keywords: annotation; dependency parsing; low-resource; parallel data; part-of-speech tagging; projection; UDPipe; low-resource languages; Universal Dependencies; dependency syntax

Institution: Charles University Faculties (theses) (web)
Document availability information: Available in the Charles University Digital Repository.
Original record: http://hdl.handle.net/20.500.11956/101633

Permalink: http://www.nusl.cz/ntk/nusl-387891


The record appears in these collections:
Universities and colleges > Public universities > Charles University > Charles University Faculties (theses)
Academic theses (ETDs) > Master’s theses
 Record created 2018-11-15, last modified 2022-03-04


No fulltext