A newer version of this article (published 2024-11-30) is available.

Discovering Representations of Democracy in Big Data: Purposive Semantic Sampling for Qualitative and Mixed-Methods Research

Authors

Hubert Plisiecki, Agnieszka Kwiatkowska

DOI:

https://doi.org/10.18778/1733-8069.20.4.02

Keywords:

sampling, purposive sampling, qualitative research, word embeddings, democracy

Abstract

The growing number of large, multi-topic text corpora in the social sciences makes it difficult to select relevant documents for qualitative and mixed-methods research. Traditional sampling methods require intensive manual coding or prior knowledge of the dataset, while unsupervised methods can yield results that are inconsistent with theory-driven coding. To address this, the authors propose purposive semantic sampling, an approach that uses natural language processing with document embeddings computed as a weighted average of word vectors, with weights given by the tf-idf coefficient (term frequency-inverse document frequency). The effectiveness of the approach is demonstrated on the example of democracy, a complex topic that is difficult to retrieve from parliamentary corpora. The proposed method allows reliable and efficient sampling of texts in any research domain that works with Big Data. The authors' contribution comprises validating this NLP approach for the social sciences and humanities and providing a reliable tool for researchers that facilitates in-depth qualitative analysis and exploration of Big Data corpora within computational grounded theory.
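The embedding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional word vectors, the smoothed idf formula, the fallback weight for unseen tokens, and the function names (`idf_weights`, `embed`, `rank_documents`) are all assumptions made for illustration. Each document is represented by the tf-idf-weighted average of its word vectors, and documents are then ranked by cosine similarity to a query embedded in the same way.

```python
import math
from collections import Counter

# Toy 2-dimensional word vectors (hypothetical values, for illustration only).
# In practice these would come from a pretrained word-embedding model.
WORD_VECTORS = {
    "democracy": [1.0, 0.0],
    "election": [0.9, 0.1],
    "vote": [0.8, 0.2],
    "budget": [0.0, 1.0],
    "tax": [0.1, 0.9],
    "the": [0.5, 0.5],
}


def idf_weights(tokenized_docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(tok for doc in tokenized_docs for tok in set(doc))
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def embed(tokens, idf, vectors):
    """Document embedding: tf-idf-weighted average of the word vectors."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    # Term frequency, counting only tokens that have a word vector.
    term_freq = Counter(tok for tok in tokens if tok in vectors)
    for tok, tf in term_freq.items():
        weight = tf * idf.get(tok, 1.0)  # assumption: unseen tokens get idf 1.0
        for k, value in enumerate(vectors[tok]):
            total[k] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no known tokens: return the zero vector
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_tokens, tokenized_docs, vectors=WORD_VECTORS):
    """Return (similarity, document index) pairs, most similar first."""
    idf = idf_weights(tokenized_docs)
    query_vec = embed(query_tokens, idf, vectors)
    scores = [
        (cosine(query_vec, embed(doc, idf, vectors)), i)
        for i, doc in enumerate(tokenized_docs)
    ]
    return sorted(scores, reverse=True)


# Hypothetical, already tokenized mini-corpus.
docs = [
    ["the", "election", "vote", "democracy"],
    ["the", "budget", "tax"],
    ["the", "tax", "vote"],
]

for score, index in rank_documents(["democracy"], docs):
    print(index, round(score, 3))
```

Because frequent, uninformative tokens such as "the" receive a low idf weight, they contribute less to the average than rarer, topic-bearing words. A researcher could then draw the sample from the top of the ranking; where to set the cut-off is a separate decision that this sketch does not address.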

Author Biographies

Hubert Plisiecki - Polish Academy of Sciences, Poland; SWPS University

A PhD candidate in Psychology at the Polish Academy of Sciences (PAN), he works at the intersection of political science, psychology, sociology, linguistics, and machine learning. He holds an MSc in Psychological Research Methods with Data Science. He is a research assistant at the Digital Social Sciences lab at PAN and in the project “Institutionalization of political parties in the parliaments of Central Europe – data mining of parliamentary debates” at SWPS University. His research interests include machine learning in the social sciences and meta-analytic bias detection.

Agnieszka Kwiatkowska - European University Institute, Italy; SWPS University, Poland

An Assistant Professor at SWPS University and a Jean Monnet Fellow at the European University Institute, she holds a PhD in sociology and an MA in political science. Her research focuses on political discourse: how issues are politicized, introduced into parliamentary competition, and become determinants of political behavior. She is Principal Investigator of the project “Institutionalization of political parties in the parliaments of Central Europe – data mining of parliamentary debates” (funded by the National Science Centre), which investigates mixed methods of analyzing parliamentary speeches and voting.

Published

2024-11-30

How to Cite

Plisiecki, H., & Kwiatkowska, A. (2024). Odkrywanie reprezentacji demokracji w Big Data: semantyczny dobór celowy próby do badań jakościowych i mieszanych. Przegląd Socjologii Jakościowej, 20(4), 18–43. https://doi.org/10.18778/1733-8069.20.4.02

Section

Thematic issue: “Digital Humanities Methods in Qualitative Sociology”
