This article presents the main problems and principles of data clustering, with a focus on
the principles and tasks of clustering text collections of linguistic expert information.
The main difficulties that arise in designing such systems are identified, for example the need
to preprocess the data and to reduce the size of the initial sample. To perform these tasks
effectively, a solution must take an integrated approach: it has to account for the efficiency
of the methods that solve the individual subtasks and provide high efficiency at each stage of
the clustering process. Various groups of hierarchical clustering
algorithms are considered, in particular the subgroup of agglomerative algorithms as applied to
clustering linguistic expert information. A formal statement of the text clustering problem is
given, and the main group of solutions based on the principles of agglomerative clustering is
identified: ROCK, CURE, and CHAMELEON. Each of these algorithms is reviewed in detail, and the main
advantages and disadvantages of each are formulated. The value of this work lies in the
collected data on the algorithms and in the results of the comparative analysis, which make it
possible to further assess the feasibility of applying these agglomerative clustering algorithms.
Its novelty lies in an overview analysis of existing hierarchical clustering approaches to the
cluster analysis of linguistic expert information, together with the results of the comparative
analysis of the considered algorithms.
