Topic Modeling in Sociology Using Social Welfare as an Example: Methodological Challenges and the Human Component

Authors

DOI:

https://doi.org/10.18778/1733-8069.20.4.05

Keywords:

topic modeling, methodology of sociology, social welfare, machine learning, Natural Language Processing

Abstract

Considering the dynamically evolving realms of social sciences influenced by network technologies and digital humanities, it is crucial to examine the adequacy of sociological data analysis methodologies in these new conditions. The availability of extensive digitized datasets not only poses a challenge to “classical” analysis methods developed under different circumstances and for different purposes, but also raises the question of whether the traditional demarcation between quantitative and qualitative methods, marked by a clear boundary, remains relevant in the era of Big Data. In this paper, based on topic modeling utilising Latent Dirichlet Allocation (LDA), we argue that quantitative methods (probabilistic statistical models) are not merely complementary or a starting point for qualitative analyses (the standard approach), but, rather, constitute an integral part of them. This thesis is illustrated through a case study involving the identification of themes within a dataset of 17,278 articles published in Web-of-Science-indexed journals between 1992 and 2020, focusing on social welfare. This empirical case study also serves to formulate meta-theoretical observations regarding the “cohesion” of quantitative and qualitative methods in the context of machine learning and natural language processing.

Downloads

Download data is not yet available.

Author Biographies

Piotr Cichocki, Uniwersytet im. Adama Mickiewicza w Poznaniu

Doktor, socjolog, pracownik badawczo-dydaktyczny zatrudniony na Wydziale Socjologii Uniwersytetu im. Adama Mickiewicza w Poznaniu. Zainteresowania badawcze: monitorowanie postaw społecznych i politycznych w badaniach międzykrajowych, maszynowa analiza tekstu oraz metodologia badań sondażowych.

Mariusz Baranowski, Uniwersytet im. Adama Mickiewicza w Poznaniu

Doktor, socjolog, pracownik badawczo-dydaktyczny zatrudniony na Wydziale Socjologii Uniwersytetu im. Adama Mickiewicza w Poznaniu. Zainteresowania badawcze: socjologia ekonomiczna, socjologia polityki oraz zagadnienia związane z dobrobytem społecznym i transformacją energetyczną.

References

Adler Matthew D. (2019), Measuring Social Welfare: An Introduction, Oxford: Oxford University Press.
Google Scholar

Akhmedov Farkhod, Abdusalomov Akmalbek, Makhmudov Fazliddin, Cho Young I. (2021), LDA-Based Topic Modeling Sentiment Analysis Using Topic/Document/Sentence (TDS) Model, „Applied Sciences”, vol. 11(23), 11091, https://doi.org/10.3390/app112311091
Google Scholar

Altbach Philip G., Wit Hans de (2018), Too much academic research is being published, „University World News”, 7 September, https://www.universityworldnews.com/post.php?story=20180905095203579 [dostęp: 24.09.2024].
Google Scholar

Ananiadou Sophia, Rea Brian, Okazaki Naoaki, Procter Rob, Thomas James (2009), Supporting Systematic Reviews Using Text Mining, „Social Science Computer Review”, vol. 27(4), s. 509–523, https://doi.org/10.1177/0894439309332293
Google Scholar

Asmussen Claus Boye, Møller Charles (2019), Smart literature review: a practical topic modelling approach to exploratory literature review, „Journal of Big Data”, vol. 6(93), s. 1–18, https://doi.org/10.1186/s40537-019-0255-7
Google Scholar

Baranowski Mariusz (2022), Epistemological aspect of topic modelling in the social sciences: Latent Dirichlet Allocation, „Przegląd Krytyczny”, vol. 4(1), s. 7–16, https://doi.org/10.14746/pk.2022.4.1.1
Google Scholar

Baranowski Mariusz, Cichocki Piotr (2021), Good and bad sociology: does topic modelling make a difference?, „Society Register”, vol. 5(4), s. 7–22, https://doi.org/10.14746/sr.2021.5.4.01
Google Scholar

Baranowski Mariusz, Cichocki Piotr, McKinley Jim (2023), Social welfare in the light of topic modelling, „Sociology Compass”, vol. 17(8), e13086, https://doi.org/10.1111/soc4.13086
Google Scholar

Battista Daniele (2024), Political communication in the age of artificial intelligence: an overview of deepfakes and their implications, „Society Register”, vol. 8(2), s. 7–24, https://doi.org/10.14746/sr.2024.8.2.01
Google Scholar

Blei David M., Ng Andrew Y., Jordan Michael I. (2003), Latent Dirichlet Allocation, „Journal of Machine Learning Research”, vol. 3, s. 993–1022.
Google Scholar

Carlsen Hjalmar, Ralund Snore (2022), Computational grounded theory revisited: From computer-led to computer-assisted text analysis, „Big Data & Society”, vol. 9(1), https://doi.org/10.1177/20539517221080146
Google Scholar

Cartwright Dorwin P. (1965), Zastosowania analizy treści, [w:] Stefan Nowak (red.), Metody badań socjologicznych, Warszawa: Państwowe Wydawnictwo Naukowe, s. 149–161.
Google Scholar

Ciziceno Marco (2024), Who will take care of them? A reflection on Southern European welfare regimes, „Society Register”, vol. 8(1), s. 27–42, https://doi.org/10.14746/sr.2024.8.1.02
Google Scholar

DiMaggio Paul, Nag Manish, Blei David (2013), Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of US government arts funding, „Poetics”, vol. 41(6), s. 570–606, https://doi.org/10.1016/j.poetic.2013.08.004
Google Scholar

Duan Jingyuan, Tian Ling, Mao Jianqiao, Li Jiaxin (2022), Optimal social welfare: A many-to-many data transaction mechanism based on double auctions, „Digital Communications and Networks”, vol. 9(5), s. 1230–1241, https://doi.org/10.1016/j.dcan.2022.04.020
Google Scholar

Evans James A., Aceves Pedro (2016), Machine Translation: Mining Text for Social Theory, „Annual Review of Sociology”, vol. 42, s. 21–50, https://doi.org/10.1146/annurev-soc-081715-074206
Google Scholar

Forder Anthony, Caslin Terry, Ponton Geoffrey, Walklate Sandra (2019), Theories of welfare, London: Routledge.
Google Scholar

Hirschberg Julia, Manning Christopher D. (2015), Advances in natural language processing, „Science”, vol. 349(6245), s. 261–266, https://doi.org/10.1126/science.aaa8685
Google Scholar

Isoaho Karoliina, Gritsenko Daria, Mäkelä Eetu (2021), Topic Modeling and Text Analysis for Qualitative Policy Research, „Policy Studies Journal”, vol. 49, s. 300–324, https://doi.org/10.1111/psj.12343
Google Scholar

Jabkowski Piotr, Cichocki Piotr, Kołczyńska Marta (2023), Multi-Project Assessments of Sample Quality in Cross-National Surveys: The Role of Weights in Applying External and Internal Measures of Sample Bias, „Journal of Survey Statistics and Methodology”, vol. 11(2), s. 316–339, https://doi.org/10.1093/jssam/smab027
Google Scholar

Jacobs Thomas, Tschötschel Robin (2019), Topic models meet discourse analysis: a quantitative tool for a qualitative approach, „International Journal of Social Research Methodology”, vol. 22(5), s. 469–485, https://doi.org/10.1080/13645579.2019.1576317
Google Scholar

Jakubowska Honorata, Cichocki Piotr, Jabkowski Piotr (2023), References to sex and gender differences in the social sciences: analysis of journal publication records (1971–2021), „Ruch Prawniczy, Ekonomiczny i Socjologiczny”, vol. 85(4), s. 275–297, https://doi.org/10.14746/rpeis.2023.85.4.14
Google Scholar

Jäger Friedrich, Wiskind Ora (1991), Culture or Society? The Significance of Max Weber’s Thought for Modern Cultural History, „History and Memory”, vol. 3(2), s. 115–140, http://www.jstor.org/stable/25618620
Google Scholar

Koseoglu Suzan, Bozkurt Aras (2018), An exploratory literature review on open educational practices, „Distance Education”, vol. 39(4), s. 441–461, https://doi.org/10.1080/01587919.2018.1520042
Google Scholar

Lasswell Harold D. (1927), The Theory of Political Propaganda, „The American Political Science Review”, vol. 21(3), s. 627–631, https://doi.org/10.2307/1945515
Google Scholar

Lewis Seth C., Zamith Rodrigo, Hermida Alfred (2013), Content Analysis in an Era of Big Data: A Hybrid Approach to Computational and Manual Methods, „Journal of Broadcasting & Electronic Media”, vol. 57(1), s. 34–52, https://doi.org/10.1080/08838151.2012.761702
Google Scholar

Linares Julio, Cabaña Gabriela (2022), Towards an ecology of care: basic income after the nation-state, „Society Register”, vol. 6(3), s. 29–56, https://doi.org/10.14746/sr.2022.6.3.03
Google Scholar

Mayntz Renate, Holm Kurt, Hübner Peter (1976), Wprowadzenie do metod socjologii empirycznej, Warszawa: Państwowe Wydawnictwo Naukowe.
Google Scholar

Midgley James (1997), Social Welfare in Global Context, London: Sage Publications.
Google Scholar

Mohr John W., Bogdanov Petko (2013), Introduction – Topic models: What they are and why they matter, „Poetics”, vol. 41(6), s. 545–569, https://doi.org/10.1016/j.poetic.2013.10.001
Google Scholar

Naskar Debashis, Mokaddem Sidahmed, Rebollo Miguel, Onaindia Eva (2016), Sentiment analysis in social networks through topic modeling, [w:] Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis (eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož: European Language Resources Association, s. 46–53.
Google Scholar

Nelson Laura (2020), Computational Grounded Theory: A Methodological Framework, „Sociological Methods & Research”, vol. 49(1), s. 3–42, https://doi.org/10.1177/0049124117729703
Google Scholar

Nesterova Iana (2023), Responsibilities towards places in a degrowth society: How firms can become more responsible via embracing deep ecology, „Society Register”, vol. 7(1), s. 53–74, https://doi.org/10.14746/sr.2023.7.1.03
Google Scholar

Pääkkönen Juho, Ylikoski Petri (2021), Humanistic interpretation and machine learning, „Synthese”, vol. 199, s. 1461–1497, https://doi.org/10.1007/s11229-020-02806-w
Google Scholar

Praag Bernard M.S. van (1989), The Relativity of the Welfare Concept, „World Institute for Development Research of the United Nations University, Working Paper”, no. 69, s. 1–43.
Google Scholar

R Core Team (2022), _R: A Language and Environment for Statistical Computing_, „R Foundation for Statistical Computing”, Vienna, https://www.R-project.org/ [dostęp: 24.09.2024].
Google Scholar

Roberts Margaret E., Stewart Brandon M., Tingley Dustin (2019), stm: An R Package for Structural Topic Models, „Journal of Statistical Software”, vol. 91(2), s. 1–40, https://doi.org/10.18637/jss.v091.i02
Google Scholar

Silge Julia, Robinson David (2017), Text Mining with R: A Tidy Approach, Sebastopol: O’Reilly.
Google Scholar

Snyder Hannah (2019), Literature review as a research methodology: An overview and guidelines, „Journal of Business Research”, vol. 104, s. 333–339, https://doi.org/10.1016/j.jbusres.2019.07.039
Google Scholar

Syed Shaheen, Spruit Marco (2018), Selecting Priors for Latent Dirichlet Allocation, [w:] IEEE 12th International Conference on Semantic Computing (ICSC), Laguna Hills: IEEE s. 194–202, https://doi.org/10.1109/ICSC.2018.00035
Google Scholar

Thangaraj Muthuraman, Sivakami Muthusamy (2018), Text Classification Techniques: A Literature Review, „Interdisciplinary Journal of Information, Knowledge, and Management”, vol. 13, s. 117–135, https://doi.org/10.28945/4066
Google Scholar

Timms Noel (1980), Social welfare: Why and how?, London: Routledge.
Google Scholar

Titmuss Richard M. (1967), The Welfare Complex in a Changing Society, „The Milbank Memorial Fund Quarterly”, vol. 45(1), s. 9–23, https://doi.org/10.2307/3349045
Google Scholar

Published

2024-11-30 — Updated on 2025-01-10

Versions

How to Cite

Cichocki, P., & Baranowski, M. (2025). Topic Modeling in Sociology Using Social Welfare as an Example: Methodological Challenges and the Human Component. Przegląd Socjologii Jakościowej, 20(4), 98–117. https://doi.org/10.18778/1733-8069.20.4.05 (Original work published November 30, 2024)

Issue

Section

Numer tematyczny: „Metody humanistyki cyfrowej w socjologii jakościowej”

Funding data

Similar Articles

1 2 3 4 5 6 7 8 9 10 > >> 

You may also start an advanced similarity search for this article.