Original title: Modeling the spread of loanwords in South-East Asia using sailing navigation software and Bayesian networks
Authors: Kratochvíl, F. ; Kratochvíl, Václav ; Saad, G. ; Vomlel, Jiří
Document type: Papers
Conference/Event: WUPES 2022: 12th Workshop on Uncertainty Processing, Kutná Hora (CZ), 2022-06-01
Year: 2022
Language: eng
Abstract: A loanword is a word permanently adopted from one language and incorporated into another language without translation. In this paper, we study loanwords in the South-East Asia Archipelago, home to a large number of languages. Our paper is inspired by the work of Hoffmann et al. (2021), in which Bayesian methods are applied to probabilistic modeling of family trees representing the history of language families, and by Haynie et al. (2014), who model the diffusion of a special class of loanwords, so-called Wanderwörter, in languages of Australia, North America, and South America. We assume that in the South-East Asia Archipelago Wanderwörter spread along specific maritime trade routes whose geographical characteristics can help unravel the history of Wanderwörter diffusion in the area. For millennia, trade was conducted using sailing ships, which were constrained by the monsoon system and, in certain areas, also by strong sea currents. Therefore, rather than geographical distances, the travel times of sailing ships should be considered a major factor determining the intensity of contact among cultures. We use sailing navigation software to estimate travel times between different ports and show that the estimated travel times correspond well to the travel times of a Chinese map of the sea trade routes from the early seventeenth century. We model the spread of loanwords using a probabilistic graphical model, a Bayesian network. We design a novel heuristic Bayesian network structure learning algorithm that learns the structure as a union of spanning trees for graphs of all loanwords in the training dataset. We compare this algorithm with BIC-optimal Bayesian networks by measuring how well these models predict the true presence/absence of a loanword. Interestingly, Bayesian networks learned by our heuristic spanning tree-based algorithm provide better results than the BIC-optimal Bayesian networks.
Keywords: Bayesian methods; loanwords; probabilistic graphical model
Project no.: GA20-18407S (CEP)
Funding provider: GA ČR
Host item entry: Proceedings of the 12th Workshop on Uncertainty Processing, ISBN 978-80-7378-460-7

Institution: Institute of Information Theory and Automation AS ČR
Document availability information: Fulltext is available at external website.
External URL: http://library.utia.cas.cz/separaty/2022/MTR/kratochvil-0558164.pdf
Original record: http://hdl.handle.net/11104/0332323

Permalink: http://www.nusl.cz/ntk/nusl-508632


The record appears in these collections:
Research > Institutes ASCR > Institute of Information Theory and Automation
Conference materials > Papers
 Record created 2022-09-28, last modified 2023-03-28

