ALGORITHM FOR SEARCHING AND ACQUISITION OF KNOWLEDGE BASED ON TECHNOLOGIES FOR PROCESSING AND ANALYZING TEXTS IN NATURAL LANGUAGE

Authors

  • E.M. Gerasimenko
  • Y.A. Kravchenko
  • D.A. Shanenko

Abstract

The article is devoted to the topical scientific problem of increasing the efficiency of processing and
analyzing text information when solving problems of searching and acquiring knowledge. The relevance of
this task is related to the need to create effective means of processing the accumulated huge amount of
poorly structured data containing important, sometimes hidden knowledge that is necessary for building
effective control systems for complex objects of different nature. The algorithm of search and knowledge
acquisition in processing and analyzing textual information proposed by the authors is characterized by the
use of low-level deterministic rules that allow for high-quality text simplification by excluding
words that do not affect the meaning of the text. The algorithm relies on domain elaboration, which
makes it possible to create lists of domain-specific words and thereby improves the quality of simplification. In this
task, the input data are streams of textual information (profile descriptions) extracted from online recruiting
platforms; the output information is represented by sentences formed as a "subject-verb-object"
triple, reflecting the granules of knowledge obtained during text processing. The use of this order of
units constituting a sentence is due to the fact that this order is the most widespread in the Russian language,
although other variations of the order are possible in the texts themselves without losing the general
meaning. The main idea of the algorithm is to split a large corpus of text into sentences, then filter the
resulting sentences based on the keywords entered by the user. Subsequently, the sentences are further
split into components and simplified depending on the type of received component (verbal, nominal).
The field of marketing was used as an example in this work, and the keywords were "social media".
The authors have developed an algorithm for knowledge search and acquisition based on natural language
text processing and analysis technologies, and a software implementation of the proposed algorithm
has been performed. Several metrics were used to evaluate efficiency: the Flesch-Kincaid
index, the Coleman-Liau index, and the automated readability index. The computational
experiments conducted confirmed the effectiveness of the proposed algorithm in comparison with analogues
that use neural networks to solve similar problems.
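
The first two stages described above (splitting a corpus into sentences and filtering them by user keywords) and two of the named readability metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the regular-expression sentence splitter, and the whitespace word tokenization are all assumptions made for illustration, and the simplification into "subject-verb-object" triples is not attempted. The automated readability index and Coleman-Liau index use their standard published coefficients; the Flesch-Kincaid index is omitted because it requires syllable counting.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def filter_by_keywords(sentences, keywords):
    """Keep sentences containing at least one keyword, case-insensitively."""
    lowered = [k.lower() for k in keywords]
    return [s for s in sentences if any(k in s.lower() for k in lowered)]


def _counts(text):
    """Return (alphanumeric characters, words, sentences) for a text."""
    words = text.split()
    chars = sum(1 for c in text if c.isalnum())
    return chars, len(words), len(split_sentences(text))


def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    chars, words, sentences = _counts(text)
    if words == 0 or sentences == 0:
        return 0.0
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def coleman_liau_index(text):
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S per 100 words."""
    chars, words, sentences = _counts(text)
    if words == 0:
        return 0.0
    letters_per_100 = chars / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


if __name__ == "__main__":
    # Hypothetical profile description, not data from the article.
    profile = (
        "I manage social media campaigns for retail brands. "
        "I enjoy hiking on weekends. "
        "Social media analytics is my main focus!"
    )
    kept = filter_by_keywords(split_sentences(profile), ["social media"])
    for sentence in kept:
        print(sentence, round(automated_readability_index(sentence), 2))
```

Lower index values correspond to text that is easier to read, so comparing the scores of a sentence before and after simplification gives a rough measure of how much a simplification step helped.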

Published

2024-11-10

Section

SECTION I. INFORMATION PROCESSING ALGORITHMS