Title:
Modeling the spread of loanwords in South-East Asia using sailing navigation software and Bayesian networks
Authors:
Kratochvíl, F. ; Kratochvíl, Václav ; Saad, G. ; Vomlel, Jiří
Document type:
Conference papers
Conference/Event:
WUPES 2022: 12th Workshop on Uncertainty Processing, Kutná Hora (CZ), 2022-06-01
Year:
2022
Language:
eng
Abstract:
A loanword is a word permanently adopted from one language and incorporated into another language without translation. In this paper, we study loanwords in the South-East Asia Archipelago, home to a large number of languages. Our paper is inspired by the work of Hoffmann et al. (2021), who apply Bayesian methods to probabilistic modeling of family trees representing the history of language families, and by Haynie et al. (2014), who model the diffusion of a special class of loanwords, so-called Wanderwörter, in languages of Australia, North America, and South America. We assume that in the South-East Asia Archipelago Wanderwörter spread along specific maritime trade routes whose geographical characteristics can help unravel the history of Wanderwörter diffusion in the area. For millennia, trade was conducted using sailing ships, which were constrained by the monsoon system and, in certain areas, also by strong sea currents. Therefore, rather than geographical distances, the travel times of sailing ships should be considered a major factor determining the intensity of contact among cultures. We use sailing navigation software to estimate travel times between different ports and show that the estimated travel times correspond well to the travel times from a Chinese map of the sea trade routes from the early seventeenth century. We model the spread of loanwords using a probabilistic graphical model, namely a Bayesian network. We design a novel heuristic Bayesian network structure learning algorithm that learns the structure as a union of spanning trees for graphs of all loanwords in the training dataset. We compare this algorithm with BIC-optimal Bayesian networks by measuring how well these models predict the true presence or absence of a loanword. Interestingly, Bayesian networks learned by our heuristic spanning tree-based algorithm provide better results than the BIC-optimal Bayesian networks.
Keywords:
Bayesian methods; loanwords; probabilistic graphical model
Project number:
GA20-18407S (CEP)
Project provider:
GA ČR
Source document:
Proceedings of the 12th Workshop on Uncertainty Processing, ISBN 978-80-7378-460-7