Topic Modeling in Sociology Using Social Welfare as an Example: Methodological Challenges and the Human Component
DOI:
https://doi.org/10.18778/1733-8069.20.4.05
Keywords:
topic modeling, methodology of sociology, social welfare, machine learning, Natural Language Processing
Abstract
As the social sciences evolve under the influence of network technologies and digital humanities, it is crucial to examine whether sociological data analysis methodologies remain adequate under these new conditions. The availability of extensive digitized datasets not only challenges “classical” analysis methods, developed under different circumstances and for different purposes, but also raises the question of whether the traditionally sharp demarcation between quantitative and qualitative methods remains relevant in the era of Big Data. In this paper, drawing on topic modeling with Latent Dirichlet Allocation (LDA), we argue that quantitative methods (probabilistic statistical models) are not merely a complement to, or a starting point for, qualitative analyses (the standard approach) but an integral part of them. This thesis is illustrated through a case study that identifies themes in a dataset of 17,278 articles on social welfare published in Web-of-Science-indexed journals between 1992 and 2020. The case study also serves to formulate meta-theoretical observations regarding the “cohesion” of quantitative and qualitative methods in the context of machine learning and natural language processing.
Versions
- 2024-11-30 (2)
- 2024-11-30 (1)
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Funding data
Narodowe Centrum Nauki (National Science Centre, Poland)
Grant number: 021/05/X/HS6/00067




