Original title:
Unsupervised and Semi-Supervised Multilingual Learning for Resource-Poor Languages
Translated title:
Unsupervised and Semi-Supervised Multilingual Learning for Resource-Poor Languages
Authors:
Tran, Manh-Ke ; Zeman, Daniel (advisor) ; Vidová Hladká, Barbora (referee)
Document type:
Master’s theses
Year:
2012
Language:
eng
Abstract:
This thesis focuses on unsupervised morphological segmentation, a fundamental task in NLP that aims to break words into morphemes. I describe and re-implement the model proposed in Lee et al. (2011) and evaluate it on 4 languages. Moreover, I present a generative model that can use word representations as extra features. The word representations are learned in an unsupervised manner using a neural language model. The experiments show that using the extra features improves the performance of the unsupervised model.
Keywords:
machine learning; morphology; natural language; syntax
Institution:
Charles University Faculties (theses)
Document availability information: Available in the Charles University Digital Repository. Original record: http://hdl.handle.net/20.500.11956/40830