• V.V. Kureichik, Southern Federal University
• P.S. Gerasimenko, Southern Federal University
Keywords: full-text search, B-trees, vector space model, inverted index, n-gram indexing, two-phase text search, indexes, information extraction, ranking, neural networks, fuzzy logic, bioinspired algorithms


This article reviews established and modern approaches, methods, and algorithms for
full-text search. It gives a brief history of the problem of searching unstructured text data, along with its development
and current relevance. The main task of search in text data is formulated, and
the database index is defined. The objective function of a search information system is defined in general
terms, and possible trade-offs among its parameters in various applied problems are
described. A generalized architecture of a modern search information system is presented, dividing
the search problem into two phases: the primary extraction of relevant records and their subsequent ranking
to form the final search results. The article provides basic descriptions of the main algorithms and
methods of full-text search: term-based (Boolean) search, tree-based search and its variants
(B-trees, UB-trees, tries), n-gram search (including search based on frequency representation),
the vector space model (VSM), search based on an inverted index, and search using fuzzy logic and bioinspired methods. The main advantages and disadvantages of these methods
are given, their applicability under various conditions is described, and possible ways of optimizing
text search to improve accuracy, speed, and resource efficiency are considered.
Promising directions for solving the problem of primary information extraction
are presented. Some fuzzy-logic-based methods for measuring the similarity of text records in the ranking
problem are given. The article also addresses improving
the relevance of primary extraction using artificial intelligence methods, neural networks, fuzzy logic, and
bioinspired methods, in particular methods for expanding the search query and/or the processed
text records. The effect of the constraints under which a search system is built on
its efficiency is described. In conclusion, the article summarizes the review and discusses the prospects
for further development of various full-text search methods.


