Modeling Russian-Slovak Interlingual Homonymy Via Embeddings

Gajarsky L., Kipchatov M.

DOI: 10.26170/2071-2405-2026-31-1-174-184

Abstract: Interlingual lexical interference has long been an object of scholarly research. It is well documented that when learning a foreign language genetically related to the learner’s native language, increased levels of cross-linguistic interference can occur. From this perspective, interlingual homonyms represent particularly high-risk lexical items, as their formal similarity may lead to incorrect mapping between form and meaning. Embedding models capture distributional differences in language use and broader semantic associations. In the present study, static embedding models (fastText and MUSE) and a deep embedding model (OpenAI) were applied to a dataset of Russian-Slovak lexical pairs consisting of interlingual homonyms and translation equivalents (150 items in each category). Based on similarity patterns observed across the models, a five-level typology of interlingual homonymy was proposed (evident-risk, hidden-risk, medium-risk, conceptual-risk, and asymmetric-risk). The predictive potential of the model was tested on a sample of 46 Slovak learners of Russian as a foreign language. A consistent correspondence between model-based risk predictions and learner performance was observed. Lexical items classified as high-risk produced significantly higher error rates among learners, and an asymmetry between productive and receptive tasks was also observed. The results suggest that embedding models may serve as an empirically grounded tool for supporting vocabulary learning in closely related languages.
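The similarity comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' pipeline: the three-dimensional vectors are invented toy values rather than fastText, MUSE, or OpenAI embeddings, the function name `cosine_similarity` is a hypothetical helper, and the two word pairs are well-known examples chosen for illustration and are not confirmed to be part of the study's dataset. The sketch only shows the general idea that a formally similar pair with divergent meanings would be expected to score lower than a true translation equivalent.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for illustration only (assumption: in practice these
# would come from a cross-lingually aligned embedding space).
toy_pairs = {
    # Russian "родина" (homeland) vs. Slovak "rodina" (family): a false friend
    "родина / rodina": ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
    # Russian "вода" vs. Slovak "voda" (both "water"): a translation equivalent
    "вода / voda": ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
}

scores = {pair: cosine_similarity(u, v) for pair, (u, v) in toy_pairs.items()}

for pair, score in scores.items():
    print(f"{pair}: {score:.2f}")
```

With real embeddings the scores would fall on a continuum, and any cut-off separating risk levels would have to be derived from the data rather than fixed in advance.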

Keywords: interlingual homonymy; vector word representations; predictive didactics; lexis; interference; lexical units; language modeling; embeddings; lexical pairs; interlingual homonyms; Russian language; Russian lexicology; Slovak language; Slovak lexicology; Russian as a foreign language; methods of teaching Russian; Slovak students

For citation (Russian bibliographic format):

Gajarsky, L. Modeling Russian-Slovak Interlingual Homonymy Via Embeddings / L. Gajarsky, M. Kipchatov // Philological Class. – 2026. – Vol. 31, No. 1. – P. 174-184. DOI 10.26170/2071-2405-2026-31-1-174-184.

For citation

Gajarsky, L., Kipchatov, M. (2026). Modeling Russian-Slovak Interlingual Homonymy Via Embeddings. In Philological Class. Vol. 31, No. 1. P. 174-184. DOI 10.26170/2071-2405-2026-31-1-174-184.

About the authors:

Lukas Gajarsky

University of Ss. Cyril and Methodius in Trnava (Trnava, Slovakia)

ORCID ID: https://orcid.org/0000-0001-8090-6977

Mikhail Kipchatov

University of Ss. Cyril and Methodius in Trnava (Trnava, Slovakia)

ORCID ID: https://orcid.org/0000-0003-3021-6390

Publication Timeline:

Date of receipt: 11.02.2026; date of publication: 31.03.2026
