KEYPHRASE EXTRACTION BASED ON LARGE LANGUAGE MODELS
Abstract
This article addresses the pressing problem of extracting key phrases from natural language texts,
a critical task in natural language processing and text mining. It examines in detail
the main approaches to key phrase (keyword) extraction, covering both traditional methods and modern
approaches based on artificial intelligence. The paper discusses widely used methods in this field,
such as TF-IDF, RAKE, YAKE, and linguistic parser-based methods. These methods rely on statistical
principles and/or graph structures, but they often fail to take the context of the text into account.
For key phrase extraction, the GPT-3 large language model demonstrates better contextual
understanding than the traditional methods, which allows it to identify and extract relevant
key phrases more accurately. A comparative analysis on the Inspec benchmark dataset shows
significantly higher performance for GPT-3 in terms of Mean Average Precision (MAP@K).
However, despite their high accuracy and extraction quality, large language models may be of
limited use in real-time applications because their response time is longer than that of
classical statistical methods. The article therefore emphasizes the need for further
research on optimizing key phrase extraction algorithms with real-time requirements
and text context in mind.
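
For readers unfamiliar with the metric, the following minimal Python sketch shows one common way to compute MAP@K for key phrase extraction. It is an illustration under stated assumptions, not the evaluation code used in the article: the function names, the case-insensitive exact-match rule, and the normalization by min(K, number of reference phrases) are all choices made here, and other definitions of AP@K exist.

```python
from typing import Sequence, Set


def average_precision_at_k(predicted: Sequence[str], relevant: Set[str], k: int) -> float:
    """AP@K for a single document (hypothetical helper, not from the article).

    Assumption: a predicted phrase counts as a hit when it matches a
    reference phrase exactly after lowercasing and stripping whitespace.
    """
    gold = {g.lower().strip() for g in relevant}
    if not gold or k <= 0:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, phrase in enumerate(predicted[:k], start=1):
        key = phrase.lower().strip()
        if key in gold and key not in seen:
            seen.add(key)          # count each reference phrase only once
            hits += 1
            score += hits / rank   # precision at this rank
    # Assumption: normalize by the best achievable number of hits within K.
    return score / min(k, len(gold))


def map_at_k(all_predicted: Sequence[Sequence[str]],
             all_relevant: Sequence[Set[str]], k: int) -> float:
    """MAP@K: the mean of AP@K over all documents in the dataset."""
    scores = [average_precision_at_k(p, r, k)
              for p, r in zip(all_predicted, all_relevant)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data invented for illustration; not taken from Inspec.
    predictions = [["neural network", "data", "keyphrase extraction"], ["a", "b"]]
    references = [{"neural network", "keyphrase extraction"}, {"b"}]
    print(map_at_k(predictions, references, k=3))
```

Because the score rewards hits at early ranks, two extractors that return the same set of phrases in a different order can receive different MAP@K values.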