DOI: 10.51762/1FK-2022-27-02-02
Abstract: The paper describes the RuTOC project, a corpus of online lessons in Russian as a foreign language (RFL), and presents the first results of corpus analysis. RuTOC is a special type of corpus: a corpus of classroom academic, or educational, discourse. Such collections of text data serve as a basis for discourse and sociolinguistic studies of classroom communication and for research on second language acquisition, and they contribute to the development of pedagogical theory and practice. The study is relevant because, for the first time, samples of classroom communication in Russian language classes have been collected, pre-processed, and annotated; the corpus creates opportunities for evidence-based research in the theory and practice of teaching Russian as a foreign language. The study is also relevant given the increased need to study the peculiarities of online language learning during the pandemic. The paper describes the process of creating the corpus, which includes the following steps: 1) collecting video recordings of RFL classes; 2) developing a standard for transcribing video recordings and creating a collection of transcripts; 3) developing a corpus annotation system; 4) annotating the corpus data; 5) post-processing and analysis of the corpus. Currently, the corpus consists of 40 lesson transcripts with a total duration of more than 56 hours and a total volume of 236,400 words; the first version of the corpus includes Russian language lessons at different educational levels, from pre-university to master's programs, at three Russian universities. The article presents some difficulties and peculiarities of transcribing and annotating the materials, along with the first results of a corpus analysis aimed at identifying the differences between student talk and teacher talk in RFL classes.
The analysis shows that online RFL classes are generally characterized by high interactivity, understood as the ratio of conversational turns to the total amount of speech; at the same time, there is a significant imbalance between the amount of teacher talk and student talk. The paper concludes by suggesting promising directions for research based on RuTOC.
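The two measures mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the transcript format (a list of `(role, text)` pairs, one pair per conversational turn), the function name `talk_metrics`, the whitespace-based word count, and the sample utterances are all assumptions made for illustration only.

```python
from collections import Counter


def talk_metrics(turns):
    """Compute simple talk statistics for one lesson transcript.

    `turns` is an assumed format: a list of (role, text) pairs, where each
    pair is one conversational turn and role is "teacher" or "student".
    """
    words_by_role = Counter()
    for role, text in turns:
        # Assumption: words are counted by splitting on whitespace.
        words_by_role[role] += len(text.split())

    total_words = sum(words_by_role.values())
    if total_words == 0:
        raise ValueError("transcript contains no words")

    return {
        # Interactivity as described in the abstract:
        # number of turns relative to the total amount of speech.
        "interactivity": len(turns) / total_words,
        # Share of all words produced by each role.
        "teacher_share": words_by_role["teacher"] / total_words,
        "student_share": words_by_role["student"] / total_words,
    }


# Invented sample turns, not taken from the corpus.
sample = [
    ("teacher", "Откройте учебник на странице десять"),
    ("student", "Хорошо"),
    ("teacher", "Кто хочет читать первый текст"),
    ("student", "Я хочу"),
]
metrics = talk_metrics(sample)
print(metrics)
```

On real transcripts, the turn segmentation and the word-counting rules would need to follow the corpus's own transcription standard, which this sketch does not attempt to reproduce.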
Key words: Educational discourse corpus; educational discourse; educational texts; foreign language corpus; pedagogical corpus; corpus development; Russian as a foreign language; methods of teaching Russian; information and communication technologies; informatization of education; information educational environment; online lessons.

For citation (in Russian):

Лебедева, М. Ю. RuTOC: корпус онлайн-уроков по русскому языку как иностранному / М. Ю. Лебедева, А. Н. Лапошина, Н. А. Алкснит, Т. В. Ляшенко // Philological Class. – 2022. – Vol. 27 ⋅ №2. – С. 19–29. DOI 10.51762/1FK-2022-27-02-02.

For citation

Lebedeva, M. Yu., Laposhina, A. N., Alksnit, N. A., Lyashenko, T. V. (2022). RuTOC: A Corpus of Online Lessons in Russian as a Foreign Language. In Philological Class. 2022. Vol. 27 ⋅ №2. P. 19–29. DOI 10.51762/1FK-2022-27-02-02.

About the authors:

Maria Yu. Lebedeva

Pushkin State Russian Language Institute (Moscow, Russia)

A. N. Laposhina

Pushkin State Russian Language Institute (Moscow, Russia)



Natalia A. Alksnit

Pushkin State Russian Language Institute (Moscow, Russia)



Tatyana V. Lyashenko

Pushkin State Russian Language Institute (Moscow, Russia)


Publication Timeline:

Date of receipt: 11.05.2022; date of publication: 29.06.2022.

