"Sentiment analysis": a method of qualitative data analysis. An application example and an evaluation of the RID dictionary and the Bayesian classification method in qualitative data analysis

Krzysztof Tomanek

doi:10.18778/1733-8069.10.2.07

Author

Krzysztof Tomanek, Jagiellonian University, Institute of Sociology


Keywords:

qualitative data analysis, sentiment analysis, content analysis, text mining, text coding, natural language processing, RID dictionary, naive Bayes classifier, CAQDAS

Abstract

The aim of this article is to present basic methods for classifying qualitative textual data. These methods draw on achievements in fields such as natural language processing and the analysis of unstructured data. I present and compare two analytical techniques applied to textual data. The first is analysis using a thematic dictionary. The second is based on the idea of Bayesian classification and relies on the naive Bayes classifier. I compare the effectiveness of the two techniques in sentiment analysis. I emphasize solutions aimed at building a dictionary that is valid for text classification. I compare the effectiveness of so-called supervised analyses with that of automated analyses. The results I present support the following conclusion: a dictionary that was rated well as a classification tool in the past should go through an evaluation phase when it is applied to new empirical material. In the approach I propose, this is the basic process of adapting an analytical dictionary treated as a tool for text classification.
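The comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the word lists are a hypothetical toy lexicon (not the RID dictionary), the labelled sentences are invented, and the classifier is a plain multinomial naive Bayes with add-one smoothing, which is not necessarily the configuration used in the article. The point is only to show what evaluating a dictionary and a trained classifier against the same labelled material looks like.

```python
import math
from collections import Counter

# Hypothetical toy lexicon for illustration; NOT the RID dictionary.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def dictionary_classify(text):
    """Label a text by counting lexicon hits; no net hits means 'unknown'."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score == 0:
        return "unknown"
    return "pos" if score > 0 else "neg"


def train_naive_bayes(examples):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in text.lower().split():
            counts[tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab


def naive_bayes_classify(text, model):
    """Return the class with the highest log posterior (add-one smoothing)."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(classify, labelled):
    """Share of labelled texts for which the classifier returns the gold label."""
    return sum(classify(text) == gold for text, gold in labelled) / len(labelled)


# Invented training and evaluation material (assumption, not the article's data).
TRAIN = [
    ("great film love it", "pos"),
    ("good acting great plot", "pos"),
    ("awful film hate it", "neg"),
    ("bad acting boring plot", "neg"),
]
TEST = [
    ("boring film", "neg"),
    ("great plot", "pos"),
    ("love the acting", "pos"),
    ("awful plot", "neg"),
]

model = train_naive_bayes(TRAIN)
dict_acc = accuracy(dictionary_classify, TEST)
nb_acc = accuracy(lambda t: naive_bayes_classify(t, model), TEST)
print(f"dictionary accuracy:  {dict_acc:.2f}")
print(f"naive Bayes accuracy: {nb_acc:.2f}")
```

On this toy material the lexicon has no entry for "boring", so the dictionary leaves one text unclassified while the trained classifier picks the word up from the training examples. Four invented sentences say nothing about the relative merit of the two techniques; the sketch only shows the kind of evaluation step a dictionary should pass before it is applied to new material.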


Author biography

Krzysztof Tomanek is a doctoral student at the Institute of Sociology of the Jagiellonian University. His research interests concern loyalty, trust theory, and Quality of Life in social research. His main methodological interests include the application of text mining techniques to qualitative data analysis and qualitative data analysis supported by NLP and SVR solutions. He leads an MNiSW research grant on the Jewish Culture Festival in Kraków (jointly with Dr. Anna Maria Orla-Bukowska). He is the author of nationwide consumer research projects and of publications on the use of advanced content analysis techniques in various CAQDAS environments.
