MODIFIED WORD SENSE DISAMBIGUATION METHOD BASED ON DISTRIBUTED REPRESENTATION METHODS

Y. A. Kravchenko; Mansour Ali Mahmoud; Mohammad Juman Hussain

Y. A. Kravchenko Southern Federal University
Mansour Ali Mahmoud Southern Federal University
Mohammad Juman Hussain Southern Federal University

Keywords: word sense disambiguation, WordNet, knowledge-based WSD, SemEval, text similarity, text mining

Abstract

In text mining tasks, the textual representation should be not only efficient but also interpretable,
as interpretability makes it possible to understand the operational logic underlying data mining
models. This paper describes a modified Word Sense Disambiguation (WSD) method that extends
two well-known variations of the Lesk WSD approach. Given a word and its context, Lesk
selects the proper meaning by computing the overlap between the context and the definition
(gloss) of each sense of the word. The main contribution of the proposed method is
to replace this "overlap" with a "similarity" between definition and context, and
to expand each definition with the examples that WordNet provides for each sense of the
target word. The similarity is measured with text similarity
functions defined in a distributed semantic space. The proposed method was tested on five
benchmark datasets for word sense disambiguation and compared with several
baseline methods, including simple Lesk, extended Lesk, WordNet 1st sense, Babelfy, and UKB. The
results show that the proposed method outperforms most baseline methods, the exceptions being
Babelfy and WordNet 1st sense.
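
The difference between overlap-based and similarity-based sense selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sense inventory, the two-dimensional word vectors, the stopword list, and all function names are hypothetical, and averaged word vectors with cosine similarity stand in for whichever similarity functions the method actually uses.

```python
import math

# Hypothetical toy sense inventory for "bank". Glosses and examples are
# illustrative only; a real system would read them from WordNet.
SENSES = {
    "bank.financial": {
        "gloss": "an institution that accepts deposits and lends money",
        "examples": ["he cashed a check at the bank"],
    },
    "bank.river": {
        "gloss": "sloping land beside a body of water",
        "examples": ["they sat on the river bank and watched the current"],
    },
}

# Hypothetical 2-d word vectors (axis 0 ~ finance, axis 1 ~ nature).
# A real system would use pretrained embeddings instead.
VECTORS = {
    "money": (1.0, 0.0), "deposits": (0.9, 0.1), "lends": (0.9, 0.0),
    "institution": (0.7, 0.1), "cashed": (0.9, 0.0), "check": (0.8, 0.1),
    "loan": (1.0, 0.1), "interest": (0.8, 0.2),
    "water": (0.0, 1.0), "river": (0.1, 1.0), "land": (0.1, 0.8),
    "sloping": (0.0, 0.7), "current": (0.2, 0.8), "fishing": (0.0, 0.9),
}

STOPWORDS = {"a", "an", "and", "at", "for", "he", "of", "on", "that",
             "the", "they", "we", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def overlap_score(context, sense):
    """Simple Lesk: count content words shared by context and gloss."""
    return len(set(tokenize(context)) & set(tokenize(sense["gloss"])))


def embed(text):
    """Average the vectors of all known words (zero vector if none)."""
    vecs = [VECTORS[w] for w in tokenize(text) if w in VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_score(context, sense):
    """Compare the context with the gloss expanded by the sense examples."""
    expanded = " ".join([sense["gloss"], *sense["examples"]])
    return cosine(embed(context), embed(expanded))


def disambiguate(context, score):
    """Return the sense name with the highest score for this context."""
    return max(SENSES, key=lambda name: score(context, SENSES[name]))


if __name__ == "__main__":
    context = "she asked for a loan with low interest"
    for name, sense in SENSES.items():
        print(name, overlap_score(context, sense),
              round(similarity_score(context, sense), 3))
    print(disambiguate(context, similarity_score))
```

In this toy setting the context shares no content word with either gloss, so the overlap score cannot separate the two senses, while the similarity score still favors the financial sense because "loan" and "interest" lie close to the gloss and example words in the vector space.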


