Empirical Models for an Indic Language Continuum (Empirické modely pro indické jazykové kontinuum)

Bafna, Niyati


Original title: Empirické modely pro indické jazykové kontinuum
Translated title: Empirical Models for an Indic Language Continuum
Authors: Bafna, Niyati ; Žabokrtský, Zdeněk (advisor) ; Zeman, Daniel (referee)
Document type: Master’s theses
Year: 2022
Language: eng
Abstract (July 20, 2022): Many Indic languages and dialects of the so-called "Hindi Belt" and surrounding regions in the Indian subcontinent, spoken by more than 100 million people, are severely under-resourced and under-researched in NLP, individually and as a dialect continuum. We first collect monolingual data for 26 Indic languages and dialects, 16 of which were previously zero-resource, and perform exploratory character, lexical and subword cross-lingual alignment experiments for the first time on this linguistic system. We present a novel method for unsupervised cognate/borrowing identification from monolingual corpora designed for low and extremely low resource scenarios, based on combining noisy semantic signals from joint bilingual spaces with orthographic cues modelling sound change; to the best of our knowledge, this is the first work to do so, especially in a (truly) low-resource setup. We create bilingual evaluation lexicons against Hindi for 20 of the languages, and show that our method outperforms both traditional orthography baselines as well as EM-style learnt edit distance matrices, showing that even noisy bilingual embeddings can act as good guides for this task. We release our crawled data in a new collection called...
Keywords: multilingual data|language continuum|Natural Language Processing; vícejazyčná data|jazykové kontinuum|zpracování přirozeného jazyka

Institution: Charles University Faculties (theses) (web)
Document availability information: Available in the Charles University Digital Repository.
Original record: http://hdl.handle.net/20.500.11956/175497

Permalink: http://www.nusl.cz/ntk/nusl-508776

The record appears in these collections:
Universities and colleges > Public universities > Charles University > Charles University Faculties (theses)
Academic theses (ETDs) > Master’s theses

Record created 2022-09-28, last modified 2023-12-17

No fulltext
