A Modification of the Leacock-Chodorow Measure of the Semantic Relatedness of Concepts

Jerzy Korzeniewski

doi:10.18778/0208-6018.351.06

Author

Jerzy Korzeniewski, University of Łódź, Faculty of Economics and Sociology, Department of Demography, Łódź, Poland, https://orcid.org/0000-0001-6526-5921

Keywords:

text mining, WordNet, semantic similarity of words, Leacock-Chodorow measure

Abstract

Measures of the semantic relatedness of concepts fall into two types: knowledge-based methods and corpus-based methods. Knowledge-based techniques draw on human-made dictionaries and other reference works. Corpus-based techniques assess the semantic similarity of two concepts by referring to large collections of text documents. Some researchers claim that knowledge-based measures outperform corpus-based ones, but a far more important point is that the latter depend very heavily on the corpus used. This article proposes a modification of the best-performing WordNet-based method of measuring the semantic relatedness of concepts, namely the Leacock-Chodorow measure. This measure came out best in several experimental studies, and it can be written as a simple formula. The new proposal was evaluated on two popular benchmark sets of concept pairs: the Rubenstein-Goodenough set of 65 pairs and the Finkelstein set of 353 pairs. The results show that the proposal performed better than the traditional Leacock-Chodorow measure.
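For orientation, the standard (unmodified) Leacock-Chodorow score is commonly written as -log(len / (2·D)), where len is the length of the shortest is-a path between two concepts and D is the maximum depth of the taxonomy. The sketch below illustrates only that baseline formula and a correlation-based comparison against human ratings; it does not reproduce the modification proposed in the article. The function names, the path lengths, the depth value, and the ratings are all hypothetical illustrations, not data from the benchmark sets.

```python
import math


def lch_similarity(path_length: int, max_depth: int) -> float:
    """Baseline Leacock-Chodorow score: -log(path_length / (2 * max_depth)).

    path_length: shortest is-a path between the two concepts, counted in
        nodes (assumed here to be 1 when the concepts coincide).
    max_depth: maximum depth of the taxonomy (assumed known in advance).
    """
    if path_length < 1 or max_depth < 1:
        raise ValueError("path_length and max_depth must be positive")
    return -math.log(path_length / (2.0 * max_depth))


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: path lengths for four concept pairs and made-up
# human relatedness ratings (NOT taken from any benchmark set).
toy_path_lengths = [1, 3, 8, 15]
toy_human_ratings = [3.9, 3.1, 1.2, 0.3]
TOY_MAX_DEPTH = 20  # illustrative value only

toy_scores = [lch_similarity(p, TOY_MAX_DEPTH) for p in toy_path_lengths]
toy_correlation = pearson(toy_scores, toy_human_ratings)
print(toy_scores, toy_correlation)
```

A higher correlation between the computed scores and the human ratings is the usual sense in which one measure is said to perform better than another on such benchmark sets; which correlation coefficient the article uses is not stated in the abstract, so Pearson is an assumption here.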

References

Bird S., Loper E., Klein E. (2009), Natural Language Processing with Python, O’Reilly Media Inc., Sebastopol.

Budanitsky A., Hirst G. (2006), Evaluating WordNet‑based Measures of Lexical Semantic Relatedness, “Computational Linguistics”, vol. 32, issue 1, pp. 13–47.
DOI: https://doi.org/10.1162/coli.2006.32.1.13

Fellbaum Ch. (ed.) (1998), WordNet: An Electronic Lexical Database, The MIT Press, Cambridge.
DOI: https://doi.org/10.7551/mitpress/7287.001.0001

Hirst G., St‑Onge D. (1998), Lexical chains as representations of context for the detection and correction of malapropisms, [in:] Ch. Fellbaum (ed.), WordNet: An Electronic Lexical Database, The MIT Press, Cambridge, pp. 305–332.

Jiang J., Conrath D. (1997), Semantic similarity based on corpus statistics and lexical taxonomy, Proceedings of International Conference on Research in Computational Linguistics, Taiwan, pp. 19–33.

Leacock C., Chodorow M. (1998), Combining local context and WordNet similarity for word sense identification, [in:] Ch. Fellbaum (ed.), WordNet: An Electronic Lexical Database, The MIT Press, Cambridge, pp. 265–283.

Lin D. (1998), Automatic retrieval and clustering of similar words, Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING–ACL ’98), Montreal, pp. 296–304.
DOI: https://doi.org/10.3115/980691.980696

McInnes B., Pedersen T., Liu Y., Melton G., Pakhomov S. (2014), U‑path: An undirected path‑based measure of semantic similarity, Proceedings of the Annual Symposium of the American Medical Informatics Association, Washington, pp. 882–891.

Resnik P. (1995), Using information content to evaluate semantic similarity, Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, pp. 448–453.

Wu Z., Palmer M. (1994), Verbs semantics and lexical selection, Proceedings of the 32nd annual meeting on Association for Computational Linguistics, ACL ’94, Association for Computational Linguistics, Stroudsburg, pp. 133–138.
DOI: https://doi.org/10.3115/981732.981751

Zugang C., Jia S., Yaping Y. (2018), An Approach to Measuring Semantic Relatedness of Geographic Terminologies Using a Thesaurus and Lexical Database Sources, “International Journal of Geo‑Information”, vol. 7(3), pp. 98–12.
DOI: https://doi.org/10.3390/ijgi7030098