Sentiment Classification of Bank Clients’ Reviews Written in the Polish Language

Adam Piotr Idczak

doi:10.18778/0208-6018.353.03

Authors

Adam Piotr Idczak, University of Łódź, Faculty of Economics and Sociology, Department of Statistical Methods, Łódź, Poland, https://orcid.org/0000-0001-9676-2410

Keywords:

sentiment analysis, opinion mining, text classification, text mining, logistic regression, naive Bayes classifier

Abstract

It is estimated that approximately 80% of all data gathered by companies are text documents. This article addresses one of the most common problems in text mining: text classification in sentiment analysis, which aims to determine the sentiment of a document. The lack of a defined structure in text makes this problem more challenging and has led to the development of various techniques for determining document sentiment. This paper presents a comparative analysis of two sentiment classification methods: a naive Bayes classifier and logistic regression. The analysed texts are bank clients’ reviews written in the Polish language. The classification used a bag‑of‑n‑grams approach, in which a text document is represented as a set of terms and each term consists of n words. The results show that logistic regression performed better.
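
The bag‑of‑n‑grams representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function name `bag_of_ngrams`, the whitespace tokenisation, the lowercasing, and the sample review are all assumptions made for illustration, and the sketch keeps term counts rather than a plain set of terms.

```python
from collections import Counter


def bag_of_ngrams(text, n=2):
    """Return a Counter mapping each n-word term to its frequency in `text`.

    Hypothetical helper: tokenisation is a plain lowercase whitespace split,
    which is an assumption and ignores punctuation and Polish inflection.
    """
    tokens = text.lower().split()
    # Slide a window of n consecutive words over the token list.
    ngrams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(ngrams)


# Made-up example review, not taken from the analysed data set.
review = "bardzo dobra obsługa bardzo dobra aplikacja"

unigrams = bag_of_ngrams(review, n=1)  # each term is a single word
bigrams = bag_of_ngrams(review, n=2)   # each term is two consecutive words
```

Term counts of this kind are one possible input for classifiers such as naive Bayes or logistic regression; a document shorter than n words simply yields an empty bag.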
