Title: Modeling the spread of loanwords in South-East Asia using sailing navigation software and Bayesian networks
Authors: Kratochvíl, F. ; Kratochvíl, Václav ; Saad, G. ; Vomlel, Jiří
Document type: Conference papers
Conference/Event: WUPES 2022: 12th Workshop on Uncertainty Processing, Kutná Hora (CZ), 2022-06-01
Year: 2022
Language: eng
Abstract: A loanword is a word permanently adopted from one language and incorporated into another language without translation. In this paper, we study loanwords in the South-East Asia Archipelago, home to a large number of languages. Our paper is inspired by the work of Hoffmann et al. (2021), in which Bayesian methods are applied to probabilistic modeling of family trees representing the history of language families, and by Haynie et al. (2014), who model the diffusion of a special class of loanwords, so-called Wanderwörter, in languages of Australia, North America, and South America. We assume that in the South-East Asia Archipelago Wanderwörter spread along specific maritime trade routes whose geographical characteristics can help unravel the history of Wanderwörter diffusion in the area. For millennia, trade was conducted using sailing ships, which were constrained by the monsoon system and, in certain areas, also by strong sea currents. Therefore, the travel times of sailing ships, rather than geographical distances, should be considered a major factor determining the intensity of contact among cultures. We use sailing navigation software to estimate travel times between different ports and show that the estimated travel times correspond well to those from a Chinese map of the sea trade routes from the early seventeenth century. We model the spread of loanwords using a probabilistic graphical model, a Bayesian network. We design a novel heuristic Bayesian network structure learning algorithm that learns the structure as a union of spanning trees for graphs of all loanwords in the training dataset. We compare this algorithm with BIC optimal Bayesian networks by measuring how well these models predict the true presence/absence of a loanword. Interestingly, Bayesian networks learned by our heuristic spanning tree-based algorithm provide better results than the BIC optimal Bayesian networks.
Keywords: Bayesian methods; loanwords; probabilistic graphical model
Project number: GA20-18407S (CEP)
Project funder: GA ČR
Source document: Proceedings of the 12th Workshop on Uncertainty Processing, ISBN 978-80-7378-460-7

Institution: Ústav teorie informace a automatizace AV ČR
Document availability: The document is available on an external website.
External file location: http://library.utia.cas.cz/separaty/2022/MTR/kratochvil-0558164.pdf
Original record: http://hdl.handle.net/11104/0332323

NUŠL permanent link: http://www.nusl.cz/ntk/nusl-508632


The record is included in these collections:
Science and research > AV ČR > Ústav teorie informace a automatizace
Conference materials > Conference papers
Record created 2022-09-28, last modified 2023-03-28.

