Detection and Classification of Ideological Texts in the Kazakh Language Using Machine Learning and Transformers

DOI:

https://doi.org/10.18778/1731-7533.23.21

Keywords:

ideological text classification, deep learning, transformers, propaganda, radicalization, recruitment

Abstract

Modern information technologies enable the automatic analysis of textual data to detect extremist and propagandistic content. This paper examines deep learning methods and transformer models for the automatic classification of ideologically charged texts in the Kazakh language. Neural network models (CNN, BiLSTM, GRU, and a hybrid CNN+BiLSTM) were compared with transformer models (DistilBERT). Model performance was evaluated using accuracy, recall, precision, and F1-score, together with an error analysis. In the experiments, the hybrid CNN+BiLSTM achieved the highest accuracy (95.11%), outperforming the other models. CNN, BiLSTM, and GRU also achieved high results (92-93%), making them effective for this task. Among the transformers, DistilBERT proved to be the most balanced (85.74%). The study thus indicates that the hybrid CNN+BiLSTM is the most effective solution, while DistilBERT performs best among the transformer models. The findings can be used to develop automatic monitoring and filtering systems for Kazakh-language texts that efficiently identify ideologically charged content.
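The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the class labels ("propaganda", "recruitment", "neutral") and the toy predictions are hypothetical, chosen only to show how accuracy and per-class precision, recall, and F1-score are derived from true and predicted labels.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    scores = [f1 for _, _, f1 in per_class_metrics(y_true, y_pred).values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    y_true = ["propaganda", "propaganda", "recruitment",
              "neutral", "neutral", "neutral"]
    y_pred = ["propaganda", "neutral", "recruitment",
              "neutral", "neutral", "propaganda"]
    print("accuracy:", round(accuracy(y_true, y_pred), 4))
    for label, (p, r, f) in per_class_metrics(y_true, y_pred).items():
        print(label, round(p, 4), round(r, 4), round(f, 4))
    print("macro F1:", round(macro_f1(y_true, y_pred), 4))
```

Reporting per-class values alongside accuracy matters when classes are imbalanced, since a model can reach high accuracy while missing most samples of a rare class.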

Author Biographies

Milana Bolatbek, Al-Farabi Kazakh National University

Milana Bolatbek is a researcher specializing in Artificial Intelligence and Natural Language Processing. She holds a PhD in Information Security Systems and focuses on the development of intelligent systems for text analysis, hate speech detection, and digital content monitoring. Her academic interests include deep learning, computational linguistics, and social media analytics. She has contributed to several interdisciplinary projects integrating linguistics, psychology, and AI for cybersecurity applications. She has co-authored papers published in peer-reviewed and Scopus-indexed journals and actively participates in international conferences on artificial intelligence and data science.

Shynar Mussiraliyeva, Al-Farabi Kazakh National University

Shynar Mussiraliyeva is a researcher in the field of Cyber Security and Data Analytics. She is a professor in the Department of Cybersecurity and Cryptology at Al-Farabi Kazakh National University and has extensive experience in machine learning, natural language processing, and intelligent information systems. Her research focuses on applying AI technologies to problems in cybersecurity, social media analysis, and digital communication. She has published numerous papers in international peer-reviewed and Scopus-indexed journals. She is actively involved in academic collaborations and has supervised several research projects related to AI applications in language and behavior analysis.


Published

2025-12-30

How to Cite

Bolatbek, M., Mussiraliyeva, S., & Baisylbayeva, K. (2025). Detection and Classification of Ideological Texts in the Kazakh Language Using Machine Learning and Transformers. Research in Language, 23, 341–353. https://doi.org/10.18778/1731-7533.23.21

Section

Articles
