Izvestiya SFedU
Engineering sciences
ISSN 1999-9429 print
ISSN 2311-3103 online
  • METHODS AND ALGORITHMS FOR TEXT DATA CLUSTERING (REVIEW)

    V.V. Bova, Y.A. Kravchenko, S.I. Rodzin
    2022-11-01
    Abstract

    The article deals with one of the important tasks of artificial intelligence – machine processing
    of natural language. The solution of this problem based on cluster analysis makes it possible
    to identify, formalize and integrate large amounts of linguistic expert information under conditions
    of information uncertainty and weak structure of the original text resources obtained from
    various subject areas. Cluster analysis is a powerful tool for exploratory analysis of text data,
    which allows for an objective classification of any objects that are characterized by a number of
    features and have hidden patterns. The paper reviews and analyzes modern modified algorithms for
    agglomerative clustering (CURE, ROCK, CHAMELEON), non-hierarchical clustering (PAM, CLARA),
    and the affine transformation algorithm, which are used at various stages of text data clustering
    and whose effectiveness is verified by experimental studies. The paper substantiates the
    requirements for choosing the most efficient clustering method for improving the efficiency of
    intelligent processing of linguistic expert information. It also considers methods for visualizing
    clustering results, which help interpret the cluster structure and the dependencies within a set
    of text data elements, and graphical means of presenting them: dendrograms, scatterplots, VOS
    similarity diagrams, and intensity maps. To compare the quality of the algorithms, internal and
    external performance metrics were used: "V-measure", "Adjusted Rand index", and "Silhouette".
    The experiments indicate that a hybrid approach is needed. When no hypothesis about the initial
    number of clusters can be put forward, a hierarchical approach, based on sequentially merging and
    averaging the characteristics of the closest data in a limited sample, is used first to select the
    number of clusters and the positions of their centers. Iterative clustering algorithms, which are
    highly stable with respect to noise features and outliers, are then applied. Hybridization
    increases the efficiency of clustering algorithms. The results also show that increasing
    computational efficiency and overcoming sensitivity to parameter initialization in clustering
    algorithms requires metaheuristic approaches that optimize the parameters of the learning model
    and search for a globally optimal solution.
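
The hybrid scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that documents have already been turned into numeric feature vectors (toy 2-D points stand in for them here), uses centroid-distance merging for the hierarchical stage, and uses plain k-means as the iterative stage, although the paper itself discusses PAM and CLARA. The function names, the `merge_threshold` parameter, and the way the limited sample is drawn are all hypothetical.

```python
import math
import random


def mean(points):
    """Coordinate-wise average of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def agglomerative_centers(sample, merge_threshold):
    """Hierarchical stage (assumed variant): repeatedly merge the two clusters
    whose centroids are closest, and stop once the closest pair is farther
    apart than merge_threshold. Returns one centroid per remaining cluster."""
    clusters = [[p] for p in sample]
    while len(clusters) > 1:
        cents = [mean(c) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > merge_threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean(c) for c in clusters]


def kmeans(points, centers, max_iter=100):
    """Iterative stage (k-means stand-in): refine the initial centers over
    the full data set until the assignment stops changing."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = mean(members)
    return labels, centers


# Toy data: three well-separated groups of 30 points each (illustrative only).
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]

# Limited sample: every sixth point, so each group contributes five points.
sample = points[::6]

# Stage 1 picks the number of clusters and their initial centers;
# stage 2 refines them on all points.
initial_centers = agglomerative_centers(sample, merge_threshold=5.0)
labels, centers = kmeans(points, initial_centers)
print(len(centers), "clusters found")
```

The number of clusters is never passed in: it falls out of the merge threshold applied to the sample, which is the role the abstract assigns to the hierarchical stage. The quadratic pair search is acceptable only because the sample is small.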

Address: 347900, Taganrog, Chekhov St., 22, A-211 Phone: +7 (8634) 37-19-80 E-mail: iborodyanskiy@sfedu.ru
Publication is free