Abstract: The article deals with the problems of using corpus data in educational lexicography. Data from traditional collocation dictionaries, such as the Oxford Collocations Dictionary for Students of English, are compared with data extracted from the British National Corpus (BNC). The BNC is an approximately 100-million-word corpus of written and spoken British English. It is considered a balanced corpus because it contains texts from a wide range of genres and domains. The study employed a corpus manager, a web-based tool for searching and retrieving lexical, grammatical and textual data. This made it possible to analyze the data by generating frequency information, concordances (lists of all occurrences of a search term in a corpus, each shown in the context in which it occurs), keywords and collocations, and by carrying out statistical tests. In addition, data from the Dictionnaire des combinaisons de mots are compared with data from Antidote, a corpus-based electronic dictionary developed by the Canadian software company Druide informatique. Antidote combines multiple dictionaries in a single interface. The entry for each word displays its pronunciation, inflected forms, etymology and other information, together with their respective frequencies. Each word is assigned a frequency index that indicates its relative frequency in a six-billion-word corpus. The most valuable feature of the program is its collocation dictionary. It lists the most significant combinations of the entry word with other words (in which the entry word functions either as the head or as the dependent component), grouped by syntactic function in the sentence and by frequency. The novelty of the work lies in demonstrating the educational potential of corpus data in lexicography, in particular for compiling collocation dictionaries.
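The corpus-manager operations listed above (concordances, normalised frequencies and collocation statistics) can be illustrated with a minimal Python sketch. It assumes a corpus already tokenised into a list of lowercase word forms. The function names `kwic`, `per_million` and `collocates_mi` are hypothetical. Pointwise mutual information (MI) is used here only as one common association measure, in a simplified form without the window-size correction some tools apply. Nothing in the abstract states that the BNC corpus manager or Antidote ranks collocations this way.

```python
import math
from collections import Counter


def kwic(tokens, node, window=4):
    """Key-word-in-context concordance: every occurrence of `node`,
    shown with up to `window` tokens of left and right context."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def per_million(count, corpus_size):
    """Normalise a raw count to a frequency per million tokens,
    so that corpora of different sizes can be compared."""
    return count * 1_000_000 / corpus_size


def collocates_mi(tokens, node, span=1, min_freq=1):
    """Score words that co-occur with `node` within +/- `span` tokens.

    Uses simplified pointwise mutual information (an assumption chosen
    for illustration): MI = log2(f(node, c) * N / (f(node) * f(c))).
    Returns (collocate, score) pairs, highest score first.
    """
    n = len(tokens)
    freq = Counter(tokens)
    pair = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every token inside the window around this occurrence.
        for j in range(max(0, i - span), min(n, i + span + 1)):
            if j != i:
                pair[tokens[j]] += 1
    scores = {}
    for word, f_pair in pair.items():
        if f_pair >= min_freq:  # frequency threshold filters rare pairs
            scores[word] = math.log2(f_pair * n / (freq[node] * freq[word]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy tokenised "corpus", invented for illustration only.
    tokens = ("the heavy rain fell and the heavy snow fell "
              "while light rain continued").split()
    for left, node, right in kwic(tokens, "rain", window=2):
        print(f"{left:>20} [{node}] {right}")
    print(per_million(tokens.count("rain"), len(tokens)))
    for word, score in collocates_mi(tokens, "rain", span=1):
        print(f"{word:<10} {score:.2f}")
```

On real corpus data the `min_freq` threshold matters, because MI tends to favour rare word pairs; this is one reason corpus tools also offer other association measures.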
Specific examples show how linguistic corpora can help identify the semantic, stylistic and syntactic features of words. The paper concludes that corpus data have many advantages over traditional dictionaries, while also noting their limitations for syntactic and semantic analysis. In conclusion, the author outlines a project for developing a corpus-based pedagogical dictionary for students of Russian.
Key words: corpus study; Russian National Corpus (RNC); British National Corpus (BNC); Antidote Druide; lexical collocability; collocations dictionary; attributive collocations

For citation (Russian original):

Goncharenko, I. G. Korpusnye dannye v razrabotke uchebnykh slovarei sochetaemosti [Corpus Data in the Development of Learner’s Collocation Dictionaries] / I. G. Goncharenko // Philological Class. – 2023. – Vol. 28, No. 2. – P. 55–68.

For citation

Goncharenko, I. G. (2023). Using Language Corpora in Building Learner’s Combinatorial Dictionaries. In Philological Class. Vol. 28, No. 2, pp. 55–68.

About the author:

Ilia G. Goncharenko
Russian State Vocational Pedagogical University (Ekaterinburg, Russia)

Publication Timeline:

Date of receipt: 28.08.2022; date of publication: 30.06.2023