Topic Modeling in Sociology Using Social Welfare as an Example: Methodological Challenges and the Human Component
DOI: https://doi.org/10.18778/1733-8069.20.4.05
Keywords: topic modeling, methodology of sociology, social welfare, machine learning, Natural Language Processing
Abstract
As the social sciences evolve rapidly under the influence of network technologies and the digital humanities, it is crucial to examine whether sociological methods of data analysis remain adequate under these new conditions. The availability of extensive digitized datasets not only challenges “classical” analysis methods, which were developed under different circumstances and for different purposes, but also raises the question of whether the traditional, sharply drawn boundary between quantitative and qualitative methods remains relevant in the era of Big Data. Drawing on topic modeling with Latent Dirichlet Allocation (LDA), we argue in this paper that quantitative methods (probabilistic statistical models) are not merely a complement to, or a starting point for, qualitative analysis (the standard approach) but constitute an integral part of it. We illustrate this thesis with a case study that identifies themes in a dataset of 17,278 articles on social welfare published in Web of Science-indexed journals between 1992 and 2020. The empirical case study also provides the basis for meta-theoretical observations on the “cohesion” of quantitative and qualitative methods in the context of machine learning and natural language processing.
Published
Versions
- 2025-01-10 (2)
- 2024-11-30 (1)
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Funding data
Narodowe Centrum Nauki (National Science Centre, Poland), grant number 021/05/X/HS6/00067