A Modification of the Leacock-Chodorow Measure of the Semantic Relatedness of Concepts

Authors

  • Jerzy Korzeniewski, University of Łódź, Faculty of Economics and Sociology, Department of Demography, Łódź, Poland, https://orcid.org/0000-0001-6526-5921

DOI:

https://doi.org/10.18778/0208-6018.351.06

Keywords:

text mining, WordNet network, semantic relatedness, Leacock-Chodorow measure

Abstract

Measures of the semantic relatedness of concepts fall into two types: knowledge‑based methods and corpus‑based methods. Knowledge‑based techniques draw on human‑made dictionaries, thesauri and other artefacts as a source of knowledge. Corpus‑based techniques assess the semantic similarity of two concepts using large corpora of text documents. Some researchers claim that knowledge‑based measures outperform corpus‑based ones, but it is more important to note that the latter are heavily corpus‑dependent. In this article, we propose to modify the Leacock‑Chodorow measure, which several studies have found to be the best WordNet‑based method of assessing semantic relatedness and which has a very simple formula. We assess our proposal on two popular benchmark sets of concept pairs: the Rubenstein‑Goodenough set of 65 pairs of concepts and the Finkelstein set of 353 pairs of terms. The results show that our proposal outperforms the traditional Leacock‑Chodorow measure.
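
For orientation, the traditional Leacock‑Chodorow measure is commonly written as sim(c1, c2) = −log(len(c1, c2) / (2·D)), where len is the length of the shortest is‑a path between the two concepts and D is the maximum depth of the taxonomy. The Python sketch below illustrates only this standard formula on a small hypothetical taxonomy. The `PARENT` dictionary, the function names and the node‑counting convention for path length are assumptions made for illustration; the sketch does not use WordNet and does not implement the modification proposed in the article, which the abstract does not spell out.

```python
import math

# Hypothetical toy "is-a" taxonomy (child -> parent); NOT taken from WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(concept):
    """Return the nodes from `concept` up to the root, both inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def taxonomy_depth():
    """Maximum number of nodes on any node-to-root path (D in the formula)."""
    return max(len(path_to_root(c)) for c in PARENT)


def path_length(a, b):
    """Number of nodes on the shortest is-a path between a and b.

    Assumption: path length is counted in nodes, so identical concepts
    have length 1 and the logarithm below is always defined.
    """
    up_a = path_to_root(a)
    steps_from_a = {node: i for i, node in enumerate(up_a)}
    for steps_from_b, node in enumerate(path_to_root(b)):
        if node in steps_from_a:  # first hit is the lowest common ancestor
            return steps_from_a[node] + steps_from_b + 1
    raise ValueError("concepts share no common ancestor")


def lch_similarity(a, b):
    """Standard Leacock-Chodorow similarity: -log(len / (2 * D))."""
    return -math.log(path_length(a, b) / (2.0 * taxonomy_depth()))


if __name__ == "__main__":
    for pair in [("dog", "dog"), ("dog", "cat"), ("dog", "car")]:
        print(pair, round(lch_similarity(*pair), 3))
```

In this toy taxonomy the score decreases as the path between the two concepts grows, so "dog" is rated closer to "cat" than to "car". With a real lexical database, the same quantities would be read from the WordNet hypernym hierarchy rather than from a hand‑written dictionary.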

References

Bird S., Loper E., Klein E. (2009), Natural Language Processing with Python, O’Reilly Media Inc., Sebastopol.

Budanitsky A., Hirst G. (2006), Evaluating WordNet‑based Measures of Lexical Semantic Relatedness, “Computational Linguistics”, vol. 32, issue 1, pp. 13–47.

Fellbaum Ch. (ed.) (1998), WordNet: An Electronic Lexical Database, The MIT Press, Cambridge.

Hirst G., St‑Onge D. (1998), Lexical chains as representations of context for the detection and correction of malapropisms, [in:] Ch. Fellbaum (ed.), WordNet: An Electronic Lexical Database, The MIT Press, Cambridge, pp. 305–332.

Jiang J., Conrath D. (1997), Semantic similarity based on corpus statistics and lexical taxonomy, Proceedings of International Conference on Research in Computational Linguistics, Taiwan, pp. 19–33.

Leacock C., Chodorow M. (1998), Combining local context and WordNet similarity for word sense identification, [in:] Ch. Fellbaum (ed.), WordNet: An Electronic Lexical Database, The MIT Press, Cambridge, pp. 265–283.

Lin D. (1998), Automatic retrieval and clustering of similar words, Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING–ACL ’98), Montreal, pp. 296–304.

McInnes B., Pedersen T., Liu Y., Melton G., Pakhomov S. (2014), U‑path: An undirected path‑based measure of semantic similarity, Proceedings of the Annual Symposium of the American Medical Informatics Association, Washington, pp. 882–891.

Resnik P. (1995), Using information content to evaluate semantic similarity, Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, pp. 448–453.

Wu Z., Palmer M. (1994), Verbs semantics and lexical selection, Proceedings of the 32nd annual meeting on Association for Computational Linguistics, ACL ’94, Association for Computational Linguistics, Stroudsburg, pp. 133–138.

Zugang C., Jia S., Yaping Y. (2018), An Approach to Measuring Semantic Relatedness of Geographic Terminologies Using a Thesaurus and Lexical Database Sources, “International Journal of Geo‑Information”, vol. 7(3), pp. 98–12.

Published

2020-12-15

How to Cite

Korzeniewski, J. (2020). A Modification of the Leacock-Chodorow Measure of the Semantic Relatedness of Concepts. Acta Universitatis Lodziensis. Folia Oeconomica, 6(351), 97-106. https://doi.org/10.18778/0208-6018.351.06

Section

Articles