KEYPHRASE EXTRACTION BASED ON LARGE LANGUAGE MODELS

Authors

  • Mohammad Juman Hussain

Abstract

The article addresses the problem of extracting key phrases from natural language texts, a critical task in
natural language processing and text mining. It examines in detail the main approaches to key phrase
(keyword) extraction, covering both traditional methods and modern approaches based on artificial
intelligence. The paper discusses several widely used methods in this field, such as TF-IDF, RAKE, YAKE, and
methods based on linguistic parsers. These methods rely on statistical principles and/or graph structures,
but they often struggle to take the context of the text into account. The GPT-3 large language model
demonstrates superior contextual understanding compared to these traditional methods, which allows it to
identify and extract relevant key phrases more accurately. A comparative analysis on the Inspec benchmark
dataset shows that GPT-3 achieves significantly higher Mean Average Precision at K (MAP@K). However, despite
their high accuracy and extraction quality, large language models may be of limited use in real-time
applications because their response time is longer than that of classical statistical methods. The article
therefore emphasizes the need for further research to optimize key phrase extraction algorithms with respect
to both real-time requirements and text context.
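TF-IDF ranks a candidate phrase highly when it is frequent in the given document but rare across the corpus. The following is a minimal Python sketch of TF-IDF keyphrase ranking, not the configuration used in the article. It assumes that candidates are 1- and 2-word n-grams that neither start nor end with a stopword, and it uses a smoothed IDF of the form ln((1 + N) / (1 + df)) + 1; the stopword list and the function names `candidate_ngrams` and `tfidf_keyphrases` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "are", "as", "for", "from", "in", "is", "of", "on", "or", "the", "to", "with"}


def candidate_ngrams(text, max_n=2):
    """Return all 1..max_n-grams that neither start nor end with a stopword."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            grams.append(" ".join(gram))
    return grams


def tfidf_keyphrases(docs, doc_index, top_k=5, max_n=2):
    """Rank the candidate phrases of docs[doc_index] by TF-IDF against the whole corpus."""
    doc_grams = [candidate_ngrams(d, max_n) for d in docs]
    # Document frequency: the number of documents in which each phrase occurs.
    df = Counter()
    for grams in doc_grams:
        df.update(set(grams))
    n_docs = len(docs)
    tf = Counter(doc_grams[doc_index])
    total = sum(tf.values()) or 1
    scores = {}
    for phrase, count in tf.items():
        # Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1.
        idf = math.log((1 + n_docs) / (1 + df[phrase])) + 1
        scores[phrase] = (count / total) * idf
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_k]]
```

For the corpus `["solar energy solar energy storage", "wind energy storage", "energy markets"]`, `tfidf_keyphrases(docs, 0, top_k=3)` returns `['solar', 'solar energy', 'energy']`. Although "energy" is as frequent in the first document as "solar", it occurs in every document and therefore receives a lower IDF.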
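RAKE (Rapid Automatic Keyword Extraction) splits text into candidate phrases at stopwords and punctuation. It scores each word by the ratio of its degree (the total length of the candidate phrases in which it appears) to its frequency, and then scores each phrase as the sum of its word scores. The sketch below is a minimal illustration of that idea under stated assumptions: a small hypothetical stopword list and the plain degree/frequency word score. Practical RAKE implementations use much larger stopword lists and additional candidate filters.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "are", "as", "for", "from", "in", "is", "of",
             "on", "or", "over", "the", "to", "with"}


def rake_candidates(text):
    """Split text into candidate phrases (tuples of words) at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in STOPWORDS:
                # A stopword closes the phrase being built.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keyphrases(text, top_k=5):
    """Rank candidate phrases by the sum of their words' degree/frequency scores."""
    phrases = rake_candidates(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences inside the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    phrase_scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(phrase_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:top_k]]
```

On the sentence "Compatibility of systems of linear constraints over the set of natural numbers.", the multi-word candidates "linear constraints" and "natural numbers" each score 4.0, while the single-word candidates score 1.0. This is why `rake_keyphrases(..., top_k=2)` returns `['linear constraints', 'natural numbers']`. The method uses only local co-occurrence statistics, which illustrates the limited use of context that the abstract attributes to such methods.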
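The abstract reports results in terms of MAP@K. One common formulation first computes, for each document, the average precision over the top K predicted phrases, AP@K = (1 / min(|G|, K)) · Σ_{k=1..K} P@k · rel(k), where G is the set of gold phrases, P@k is the precision of the top k predictions, and rel(k) is 1 if the k-th prediction is a gold phrase. MAP@K is then the mean of AP@K across documents. The sketch below assumes exact string matching and the min(|G|, K) normalization. The abstract does not state the article's exact normalization or its phrase-matching rules (for example, stemming), so both are assumptions here.

```python
def average_precision_at_k(predicted, gold, k):
    """AP@K for one document, assuming exact string matching against the gold phrases."""
    gold_set = set(gold)
    if not gold_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    seen = set()
    for rank, phrase in enumerate(predicted[:k], start=1):
        # Count each gold phrase only once, even if it is predicted repeatedly.
        if phrase in gold_set and phrase not in seen:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
            seen.add(phrase)
    # Assumed normalization: the best achievable number of hits within the top K.
    return precision_sum / min(len(gold_set), k)


def map_at_k(all_predicted, all_gold, k):
    """Mean of AP@K over a collection of documents."""
    if len(all_predicted) != len(all_gold):
        raise ValueError("predictions and gold lists must have the same length")
    scores = [average_precision_at_k(p, g, k) for p, g in zip(all_predicted, all_gold)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, predictions `["a", "x", "b"]` against gold `["a", "b"]` give AP@3 = (1/1 + 2/3) / 2 = 5/6. Adding a second document with predictions `["y", "c"]` and gold `["c"]` (AP@3 = 1/2) gives MAP@3 = 2/3.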

Published

2024-11-10

Section

SECTION I. INFORMATION PROCESSING ALGORITHMS