This is an outdated version published on 2024-11-30.

Corpus-Assisted Discourse Studies (CADS) as Support for Qualitative Content Analysis: A Case Study Using SketchEngine in Discourse Research

Authors

Marek Troszyński, Collegium Civitas in Warsaw, https://orcid.org/0000-0002-3653-4018

DOI:

https://doi.org/10.18778/1733-8069.20.4.03

Keywords:

corpus linguistics, SketchEngine, Qualitative Content Analysis, mixed methods

Abstract

The article shows how corpus linguistics tools can serve as an initial stage of qualitative content analysis. It traces the development of Corpus-Assisted Discourse Studies (CADS) and then, in its core part, discusses the functions of SketchEngine, a program that supports CADS. Numerous examples illustrate how CADS methods and SketchEngine functions can be applied to the analysis of Polish press discourse. Because its concordances make it easy to return to the source texts, SketchEngine facilitates the use of mixed methods in discourse research.

Author Biography

Marek Troszyński, Collegium Civitas in Warsaw

PhD, sociologist; heads the Digital Civilization Observatory (Obserwatorium Cywilizacji Cyfrowej) at Collegium Civitas in Warsaw. He studies media discourse on migrants and refugees in Poland and the language of communications from the war in Ukraine. His academic work also addresses hate speech against minorities in Poland. In his research he uses corpus linguistics (CL) methods and tools for automated natural language processing (NLP).

Downloads

PDF (Polish)

Published

2024-11-30

Versions

2024-11-30 (2)
2024-11-30 (1)

How to Cite

Troszyński, M. (2024). Corpus-Assisted Discourse Studies (CADS) as Support for Qualitative Content Analysis: A Case Study Using SketchEngine in Discourse Research. Przegląd Socjologii Jakościowej, 20(4), 44–67. https://doi.org/10.18778/1733-8069.20.4.03