DOI: 10.51762/1FK-2021-26-02-07
Abstract: The article discusses the dominant approaches developed within Corpus Linguistics (CL) and their influence on the general theory of language. Drawing on research co-authored with colleagues, the author describes three approaches to linguistic research in CL. First, corpus-informed analysis uses the data collected in a corpus as a source of natural-language examples. Second, corpus-based analysis examines the data not only qualitatively but also quantitatively. Third, corpus-driven analysis treats the research task as the design of an algorithm for processing the data, whose results then require theoretical interpretation or practical application. The article concludes with a discussion of the implications of CL for the general understanding of language. The most important of these are a reduced role for introspection, increased attention to peripheral linguistic phenomena, and reliance on quantitative data. It is still too early to assess the full impact of corpus linguistics on the general theory of language, but it is already clear that syntagmatic connections, in particular idiomatization in a broad sense, have moved into the focus of linguistic attention and are recognized as one of the main phenomena of language and its evolution. Moreover, an adequate description of a language is not limited to the rules governing the interaction of units assigned to different levels; it must also describe all probabilistic parameters of use, from the most individual to the most general. These parameters form a single continuum in which the division into language and speech is conventional.
Key words: Corpus linguistics; theory of language; linguistic studies.

For citation (Russian version):

Kopotev, M. V. O nekotorykh sledstviyakh korpusnoi lingvistiki dlya obshchei teorii yazyka [On Some Implications of Corpus Linguistics for the General Theory of Language] / M. V. Kopotev // Philological Class. 2021. Vol. 26, No. 2. P. 90–102. DOI 10.51762/1FK-2021-26-02-07.

For citation:

Kopotev, M. V. (2021). Some Thoughts on Corpus and General Linguistics. In Philological Class. Vol. 26. No. 2, pp. 90–102. DOI 10.51762/1FK-2021-26-02-07.

About the author:

Mikhail V. Kopotev
Higher School of Economics (Saint Petersburg, Russia)
University of Helsinki (Helsinki, Finland)

Publication Timeline:

Date of receipt: 31.05.2021; date of publication: 30.06.2021.


References:

ACTFL Proficiency Guidelines. (2012). Alexandria, VA, American Council on the Teaching of Foreign Languages.
Ädel, A. (2020). Corpus Compilation. In Paquot, M., Gries, S. (Eds.). A Practical Handbook of Corpus Linguistics. New York, Springer, pp. 3–24.
Anthony, L. (2004). AntConc: A Learner and Classroom Friendly, Multi-Platform Corpus Analysis Toolkit. In Proceedings of IWLeL, pp. 7–13.
Barlow, M., Kemmer, S. (2000). Usage-Based Models of Language. Stanford, CA, Center for the Study of Language and Information.
Cheremisina, M. I., Kolosova, T. A. (1987). Ocherki po teorii slozhnogo predlozheniya [Essays on the Theory of Complex Sentences]. Novosibirsk, Nauka.
Devlin, J. et al. (2018). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In arXiv preprint arXiv:1810.04805. URL: (mode of access: 28.05.2021).
Du Bois, J. W. (1985). Competing Motivations. In Iconicity in Syntax. Vol. 6, pp. 343–365.
Evert, S. (2008). Corpora and Collocations. In Corpus Linguistics: An International Handbook. Vol. 2, pp. 1212–1248.
Fillmore, C. J. (2011). Corpus Linguistics or Computer-Aided Armchair Linguistics. In Svartvik, J. (Ed.). Directions in Corpus Linguistics. Berlin, New York, de Gruyter Mouton, pp. 35–60.
Firth, J. (1957). Papers in Linguistics. London, Oxford University Press.
Goldberg, A. E. (2006). Constructions at Work: The Nature of Generalization in Language. London, Oxford University Press.
Hunston, S. (2000). Pattern Grammar: A Corpus-Driven Approach to the Lexical Grammar of English. Amsterdam, John Benjamins Publishing.
Jakubíček, M. et al. (2013). The TenTen Corpus Family. In 7th International Corpus Linguistics Conference CL 2013, pp. 125–127.
Janda, L. A., Kopotev, M. V., Nesset, T. (2020). Constructions, Their Families, and Their Neighborhoods: the Case of durak durakom ‘a fool times two’. In Russian Linguistics. Vol. 44, pp. 109–127.
Khokhlova, M. V. (2008). Eksperimental’naya proverka metodov vydeleniya kollokatsii [Experimental Evaluation of Collocation Extraction Methods]. In Slavica Helsingiensia. Helsinki, Unigrafia, pp. 343–357.
Kilgarriff, A. et al. (2014). The Sketch Engine: Ten Years on. In Lexicography. Vol. 1. No. 1, pp. 7–36.
Kisselev, O., Klimov, A., Kopotev, M. (2021). Syntactic Complexity Measures as Indices of Language Proficiency in Writing: Focus on Heritage Learners of Russian. In Heritage Language Journal. A Special Issue on Heritage Language Complexity. (In print).
Kopotev, M. V. (2014). Vvedenie v korpusnuyu lingvistiku [Introduction to Corpus Linguistics]. Praha, Animedia Company.
Kopotev, M., Lyashevskaya, O., Mustajoki, A. (2018). Russian Challenges for Quantitative Research. In Quantitative approaches to the Russian language. Routledge, pp. 3–29.
Kopotev, M., Mustajoki, A., Bonch-Osmolovskaya, A. (2021). Corpora in Text-Based Russian Studies. In The Palgrave Handbook of Digital Russia Studies. Cham, Palgrave Macmillan, pp. 299–317.
Kutuzov, A. (2017). WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models. In International Conference on Analysis of Images, Social Networks and Texts. Vol. 661, pp. 155–161.
Lakoff, G. (2008). Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago, University of Chicago Press.
Langacker, R. W. (2010). A Dynamic Usage-Based Model. In Grammar and Conceptualization. Amsterdam, de Gruyter Mouton, pp. 91–146.
MacWhinney, B. E., Malchukov, A. E., Moravcsik, E. E. (2014). Competing Motivations in Grammar and Usage. London, Oxford University Press.
Materialy dlya proekta korpusnogo opisaniya russkoi grammatiki [Materials for a Project of a Corpus-Based Description of Russian Grammar]. URL: http: // (mode of access: 12.05.2021).
McEnery, T., Wilson, A. (1996). Corpus Linguistics: An Introduction. Edinburgh, Edinburgh University Press.
Melchuk, I. A., Iordanskaya, L. N. (2017). Smysl i sochetaemost’ v slovare [Meaning and Combinability in the Dictionary]. Moscow, Yazyki slavyanskikh kul’tur.
Mikolov, T. et al. (2013). Efficient Estimation of Word Representations in Vector Space. In arXiv preprint arXiv:1301.3781. URL: (mode of access: 28.05.2021).
Morozov, N. A. (1916). Lingvisticheskie spektry: Sredstvo dlya otlicheniya plagiatov ot istinnykh proizvedenii togo ili drugogo izvestnogo avtora [Linguistic Spectra: A Tool for Distinguishing Plagiarism from the True Works of One or Another Famous Author]. Petrograd, Tipographiya Imperatorskoi Akademii nauk. 42 p.
Nichols, J. (1981). Predicate Nominals: A Partial Surface Syntax of Russian. Los Angeles, Univ. of California Press.
Peters, M. E. et al. (2018). Deep Contextualized Word Representations. In arXiv preprint arXiv:1802.05365. URL: (mode of access: 28.05.2021).
Pivovarova, L. (2017). Evaluation of Collocation Extraction Methods for the Russian Language. In Quantitative Approaches to the Russian Language. Routledge, pp. 137–157.
Rakhilina, E. V. (2010). Lingvistika konstruktsii [Linguistics of Constructions]. Moscow, Azbukovnik.
Scott, M. (2008). Developing Wordsmith. In International Journal of English Studies. Vol. 8. No. 1, pp. 95–106.
Shansky, N. M. (2010). Frazeologiya sovremennogo russkogo yazyka [Phraseology of the Modern Russian Language]. Moscow, URSS.
Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford, Oxford University Press.
Sinclair, J. (2000). Lexical Grammar. In Naujoji metodologija. Vol. 24, pp. 191–203.
Stefanowitsch, A., Gries, S. T. (2003). Collostructions: Investigating the Interaction of Words and Constructions. In International Journal of Corpus Linguistics. Vol. 8. No. 2, pp. 209–243.
Velichko, A. V. (2016). Predlozheniya frazeologizirovannoi struktury v russkom yazyke. Strukturno-semanticheskoe i funktsional’no-kommunikativnoe issledovanie [Sentences of the Phraseological Structure in the Russian Language. Structural-Semantic and Functional-Communicative Research]. Moscow, MAKS Press.
Vlakhov, A. V. (2010). Prichastiya budushchego vremeni v russkom yazyke [Future Tense Participles in Russian]. Vypusknaya kvalifikatsionnaya rabota bakalavra filologii. Saint Petersburg, SPbSU.
Yagunova, E. V., Pivovarova, L. M. (2010). Priroda kollokatsii v russkom yazyke. Opyt avtomaticheskogo izvlecheniya i klassifikatsii na materiale novostnykh tekstov [The Nature of Collocations in the Russian Language. Experience of Automatic Extraction and Classification Based on News Texts]. In Nauchno-tekhnicheskaya informatsiya. Seriya 2. Vol. 2, pp. 30–40.