Abstract: The article examines how productive computer-assisted topic modeling is as a heuristic for the philological analysis of fiction. The study analyzes the results of applying Latent Dirichlet Allocation (LDA) to search for intertextual connections between motifs in two sub-corpora of texts: on the one hand, 62 texts of different genres (stories, essays, novels, critical articles) by S. Dovlatov and, on the other, 35 works of fiction that the writer listed in one of his letters to T. Urzhumova as works that had deeply influenced him and that everybody should read. The algorithm revealed 20 topics, across which all the texts were distributed. Each topic is a chain of words, each weighted by its significance for that topic. Comparing the texts with the topics yielded three “text – topic” correspondences, that is, three groups in which all the texts belong to one common topic: 1) B. Pilyniak’s novel “The Bare Year” and Dovlatov’s story “By the River”; 2) H. G. Wells’s novel “The Time Machine”, E. Hemingway’s story “The Old Man and the Sea” and Dovlatov’s story “Emigrants”; 3) A. Grin’s story “The Commandant of the Port” and Dovlatov’s essay “We Speak Different Languages”. Further philological analysis demonstrated that motifs intersect within each of these groups. This pilot study shows that methods of computer-assisted text analysis, including those based on machine learning, can serve as a philologist’s tool for exploratory search, guiding expert intuition along a path that the algorithm outlines by processing large corpora.
Key words: fiction text; computer-assisted topic modeling method; motif; intertextuality; S. Dovlatov
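
The comparison step described in the abstract (finding topics that unite texts from both sub-corpora) can be illustrated with a minimal sketch. It assumes that a topic model has already produced a weight for every topic in every text and that each text is assigned to its highest-weighted topic; the abstract does not specify the assignment rule, so this is an assumption. The text names, the three-topic setup, and all weights below are hypothetical toy values, not the study's data, and the function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical document-topic weights (each row sums to 1).
# Toy values for illustration only; NOT results from the study.
DOC_TOPICS = {
    "dovlatov_text_1": [0.70, 0.20, 0.10],
    "dovlatov_text_2": [0.10, 0.15, 0.75],
    "dovlatov_text_3": [0.30, 0.60, 0.10],
    "list_text_1": [0.65, 0.25, 0.10],
    "list_text_2": [0.05, 0.15, 0.80],
    "list_text_3": [0.50, 0.30, 0.20],
}
# Names of the texts that form the first sub-corpus (hypothetical).
DOVLATOV = {"dovlatov_text_1", "dovlatov_text_2", "dovlatov_text_3"}


def dominant_topic(weights):
    """Return the index of the highest-weighted topic (assumed assignment rule)."""
    return max(range(len(weights)), key=lambda k: weights[k])


def shared_topics(doc_topics, first_corpus):
    """Map topic index -> (texts from the first sub-corpus, texts from the second).

    Only topics that are dominant in at least one text of EACH sub-corpus
    are kept, since only those suggest a cross-corpus correspondence.
    """
    groups = defaultdict(lambda: ([], []))
    for name, weights in sorted(doc_topics.items()):
        side = 0 if name in first_corpus else 1
        groups[dominant_topic(weights)][side].append(name)
    return {topic: g for topic, g in groups.items() if g[0] and g[1]}


if __name__ == "__main__":
    for topic, (own, read) in sorted(shared_topics(DOC_TOPICS, DOVLATOV).items()):
        print(f"topic {topic}: {own} <-> {read}")
```

A shared topic found this way is only a pointer for the researcher: as the abstract stresses, each candidate group still requires philological analysis of the motifs involved.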

For citation (Russian-style reference):

Kolmogorova, A. V. Computer-Assisted Modeling as an Instrument for Fiction Text Analysis / A. V. Kolmogorova, E. D. Zalevskaya // Philological Class. – 2023. – Vol. 28, No. 2. – P. 22-33.

For citation:

Kolmogorova, A. V., Zalevskaya, E. D. (2023). Computer-Assisted Modeling as an Instrument for Fiction Text Analysis. In Philological Class. Vol. 28, No. 2, pp. 22–33.

About the authors:

Anastasia V. Kolmogorova
National Research University Higher School of Economics (Saint Petersburg, Russia)
ORCID ID: https://orcid.org/0000-0002-6425-2050

Ekaterina D. Zalevskaya
National Research University Higher School of Economics (Saint Petersburg, Russia)
ORCID ID: https://orcid.org/0009-0009-0929-722X

Acknowledgments: This research paper uses the results of the project “Text as Big Data: Modeling Convergent Processes in Language and Speech by Digital Methods”, implemented as part of the HSE University Basic Research Program in 2023.

Publication Timeline:

Date of receipt: 02.05.2023; date of publication: 30.06.2023
