ANALYTICAL REVIEW OF THE DECISION TREE ALGORITHM IN DATA INTELLIGENCE TECHNOLOGY
E.V. Kuliev, V.A. Semenov, A.V. Kotelva, S.V. Ignateva, 2022-05-26
Abstract. The decision tree algorithm is a preferred classification algorithm in data mining, and its results are usually expressed as "if-then" rules. C4.5 is a decision tree algorithm that is easy to interpret and of growing importance; it uses the information gain ratio, a refinement of the information gain criterion of its predecessor, ID3. After a theoretical analysis, C4.5 is selected to analyze performance appraisal results and to support enterprise performance appraisal decisions through data collection, data preprocessing, information gain calculation, and determination of the selection parameters. The system, an R&D project management platform, is built on a browser/server (B/S) architecture and performs evaluation analysis using decision analysis result evaluation tools and web access. It includes information storage, task management, reporting, receipt and presentation control, information visualization, and other management information system functions. These functions support project management tasks such as creating and managing projects, managing task flows, entering and managing functional information, building a performance evaluation system, generating reports of various sizes, and building management. With the decision tree algorithm as its core technology, the system acquires scientifically significant project management information with high data accuracy and visualizes it, helping the enterprise maintain an effective management system across large areas. Task management, reporting, audit control, information visualization, and other management reporting functions are included.
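The attribute-selection criterion that C4.5 inherits from ID3 and refines can be illustrated with a minimal sketch of the information gain ratio. It assumes categorical attributes and no missing values (C4.5 itself also handles continuous attributes and pruning, which are omitted here); the function names and the toy appraisal data are hypothetical, not taken from the article.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels or attribute values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """ID3 criterion: entropy reduction after splitting on a categorical attribute."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """C4.5 criterion: information gain normalised by the split information."""
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    if split_info == 0:
        return 0.0  # the attribute takes a single value, so it cannot split the data
    return information_gain(values, labels) / split_info

# Hypothetical appraisal data: an informative attribute and a unique-ID attribute.
rating = ["high", "high", "low", "low"]
employee_id = ["e1", "e2", "e3", "e4"]
promoted = ["yes", "yes", "no", "no"]

print(information_gain(rating, promoted), gain_ratio(rating, promoted))            # 1.0 1.0
print(information_gain(employee_id, promoted), gain_ratio(employee_id, promoted))  # 1.0 0.5
```

Both attributes have the same information gain, but the gain ratio halves the score of the identifier attribute, which is why the ratio avoids ID3's bias toward many-valued attributes.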
AGGLOMERATIVE CLUSTERIZATION ALGORITHMS FOR THE PROBLEMS OF ANALYSIS OF LINGUISTIC EXPERT INFORMATION
F.S. Bulyga, V.M. Kureichik, 2022-01-31
Abstract. This article discusses the main problems and principles of the data clustering process, in particular the principles and tasks of clustering text arrays of linguistic expert information. The work identifies the main difficulties that arise in designing such systems, for example the need for data preprocessing and for reducing the size of the initial sample. To perform these tasks effectively, a solution must take an integrated approach that accounts for the efficiency of the methods used for individual subtasks and ensures high efficiency at each stage of the clustering process. The work considers various groups of hierarchical clustering algorithms, in particular the subgroup of agglomerative clustering algorithms, as applied to clustering linguistic expert information. A formal statement of the text clustering problem is given, and the main agglomerative clustering solutions are identified: ROCK, CURE, and CHAMELEON. Each algorithm is reviewed in detail, and its main advantages and disadvantages are formulated. The contribution of the work is the combined body of data on these algorithms together with the results of a comparative analysis, which make it possible to assess the feasibility and potential of using solutions from this group of agglomerative clustering algorithms. The novelty of the work lies in an overview analysis of existing hierarchical clustering approaches to the cluster analysis of linguistic expert information, together with the results of a comparative analysis of the considered algorithms.
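ROCK, the first of the agglomerative algorithms named above, merges clusters by counting links (shared neighbours) rather than by distance, which suits categorical data such as sets of terms. The following is a simplified, quadratic-time sketch of that idea, assuming Jaccard similarity between term sets, a neighbour threshold `theta`, and the goodness measure of the original ROCK formulation with f(theta) = (1 - theta) / (1 + theta). It omits ROCK's sampling and outlier handling, and the function names and toy documents are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rock(docs, k, theta=0.5):
    """Simplified ROCK: merge the pair of clusters with the best goodness until k remain."""
    n = len(docs)
    # Two documents are neighbours if their similarity reaches theta
    # (each document is its own neighbour, since jaccard(a, a) == 1).
    nbr = [[jaccard(docs[i], docs[j]) >= theta for j in range(n)] for i in range(n)]
    # link(i, j) = number of common neighbours of documents i and j.
    link = [[sum(nbr[i][m] and nbr[j][m] for m in range(n)) for j in range(n)]
            for i in range(n)]
    e = 1 + 2 * (1 - theta) / (1 + theta)  # exponent 1 + 2 f(theta)

    def goodness(c1, c2):
        cross = sum(link[a][b] for a in c1 for b in c2)
        n1, n2 = len(c1), len(c2)
        return cross / ((n1 + n2) ** e - n1 ** e - n2 ** e)

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: goodness(clusters[p[0]], clusters[p[1]]))
        if goodness(clusters[i], clusters[j]) <= 0:
            break  # no remaining pair of clusters shares any links
        clusters[i] += clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

docs = [{"pump", "valve", "leak"}, {"pump", "valve", "seal"},
        {"pump", "valve", "leak", "seal"}, {"wire", "fuse", "short"},
        {"wire", "fuse", "relay"}, {"wire", "fuse", "short", "relay"}]
print(rock(docs, k=2))  # [[0, 1, 2], [3, 4, 5]]
```

The denominator normalises the link count by the expected number of links between clusters of the given sizes, so large clusters do not absorb everything simply because they have more points.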
SOLUTION OF THE PROBLEM OF INTELLECTUAL DATA ANALYSIS BASED ON BIOINSPIRED ALGORITHM
E.V. Kuliev, D.Y. Zaporozhets, Y.A. Kravchenko, M.M. Semenova, 2022-01-31
Abstract. The article discusses a bioinspired algorithm for solving problems of intelligent data analysis. Integrating bioinspired algorithms into data mining is a promising area of research. The bioinspired algorithm considered here is based on the adaptive behavior of an ant colony. The ant colony algorithm performs a high-quality search over promising solutions to obtain optimal and quasi-optimal solutions, and it can search for suitable logical conditions. It is modeled on the behavior of real ants, which find the shortest path by adapting to changes in their environment. The authors propose a modified ant colony algorithm for data mining, with clustering chosen as the data mining task. Clustering, the grouping of similar objects, is one of the fundamental tasks of data analysis and Data Mining, with a wide range of applications: image segmentation, marketing, anti-fraud, forecasting, text analysis, and many others. The problem is particularly relevant given the constantly growing volume of data that is generated, transmitted, and processed. Classical clustering methods are optimized by combining them with the proposed bioinspired optimization algorithm, the ant algorithm. In the proposed model, ants are agents that move randomly through the solution space subject to some restrictions (for example, obstacles in their path). To evaluate the effectiveness of the developed modified ant algorithm (ALA) combined with the clustering algorithm, the authors carried out a series of computational experiments, comparing it with the genetic algorithm, the monkey algorithm, and the wolf algorithm. The simulation results show that the clustering-based ant algorithm gives better results than the other algorithms considered.
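The general idea of combining an ant colony with clustering can be sketched as a pheromone-guided search over cluster assignments. This is a generic illustration, not the authors' modified algorithm (ALA), whose details the abstract does not give: each ant samples a cluster label for every object with probability proportional to a pheromone value, each assignment is scored by the within-cluster sum of squared errors (SSE), pheromone evaporates at rate `rho`, and the best assignment found so far is reinforced. All function names and parameter values are assumptions.

```python
import random

def sse(points, labels, k):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in members)
    return total

def aco_cluster(points, k, n_ants=50, n_iter=100, rho=0.1, seed=0):
    """Pheromone-guided search over label assignments (generic ACO-style sketch)."""
    rng = random.Random(seed)
    n = len(points)
    tau = [[1.0] * k for _ in range(n)]  # tau[i][c]: pheromone for putting object i in cluster c
    best_labels, best_cost = None, float("inf")
    for _ in range(n_iter):
        for _ in range(n_ants):
            # Each ant labels every object with probability proportional to the pheromone.
            labels = [rng.choices(range(k), weights=tau[i])[0] for i in range(n)]
            cost = sse(points, labels, k)
            if cost < best_cost:
                best_labels, best_cost = labels, cost
        # Evaporate all trails, then reinforce the best assignment found so far.
        for i in range(n):
            for c in range(k):
                tau[i][c] *= 1 - rho
            tau[i][best_labels[i]] += 1.0 / (1.0 + best_cost)
    return best_labels, best_cost

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, cost = aco_cluster(points, k=2)
print(labels, round(cost, 3))
```

Evaporation keeps early, poor assignments from dominating, while the deposit of 1 / (1 + SSE) gives compact clusterings more pheromone; the cluster numbering itself is arbitrary, so only the grouping is meaningful.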
TEXT VECTORIZATION USING DATA MINING METHODS
Ali Mahmoud Mansour, Juman Hussain Mohammad, Y. A. Kravchenko, 2021-07-18
Abstract. In text mining tasks, the textual representation should be not only efficient but also interpretable, as this makes it possible to understand the operational logic underlying data mining models. Traditional text vectorization methods such as TF-IDF and bag-of-words are effective and intuitively interpretable, but they suffer from the "curse of dimensionality" and cannot capture the meanings of words. Modern distributed methods, on the other hand, effectively capture hidden semantics, but they are computationally intensive, time-consuming, and uninterpretable. This article proposes a new text vectorization method called Bag of weighted Concepts (BoWC), which represents a document according to the concept information it contains. The method creates concepts by clustering word vectors (i.e., word embeddings) and then uses the frequencies of these concept clusters to represent document vectors. To enrich the resulting document representation, a new modified weighting function is proposed that weights concepts based on statistics extracted from word embedding information. The generated vectors are interpretable, low-dimensional, and highly accurate, and they have low computational cost when used in data mining tasks. The method was tested on five benchmark datasets in two data mining tasks, document clustering and classification, and compared with several baselines, including Bag-of-words, TF-IDF, Averaged GloVe, Bag-of-Concepts, and VLAC. The results indicate that BoWC outperforms most baselines, with 7% higher accuracy on average.
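The core BoWC pipeline (cluster word embeddings into concepts, then describe each document by its concept frequencies) can be sketched as follows. The article's modified weighting function is not given in the abstract, so this sketch substitutes a plain IDF-style weight over concepts; the toy 2-D embeddings, the k-means with farthest-first initialisation, and all function names are assumptions for illustration.

```python
import math

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, n_iter=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sqdist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sqdist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def bag_of_concepts(docs, embeddings, k):
    """Map words to concept clusters, then weight concept counts per document."""
    vocab = sorted(embeddings)
    concept_of = dict(zip(vocab, kmeans([embeddings[w] for w in vocab], k)))
    counts = [[0] * k for _ in docs]
    for row, doc in zip(counts, docs):
        for word in doc:
            if word in concept_of:  # out-of-vocabulary words are skipped
                row[concept_of[word]] += 1
    # Stand-in weighting (NOT the article's function): concept frequency x IDF.
    n = len(docs)
    df = [sum(1 for row in counts if row[c] > 0) for c in range(k)]
    idf = [math.log(n / d) if d else 0.0 for d in df]
    vectors = [[row[c] * idf[c] for c in range(k)] for row in counts]
    return vectors, concept_of

# Hypothetical 2-D "embeddings": two animal words and two vehicle words.
embeddings = {"cat": (1.0, 0.0), "dog": (0.9, 0.1), "car": (0.0, 1.0), "bus": (0.1, 0.9)}
docs = [["cat", "dog", "cat"], ["car", "bus"], ["cat", "car"]]
vectors, concept_of = bag_of_concepts(docs, embeddings, k=2)
print(concept_of)  # {'bus': 0, 'car': 0, 'cat': 1, 'dog': 1}
print(vectors)
```

Each document vector has one dimension per concept rather than per word, which is the source of the low dimensionality and interpretability claimed above: a dimension can be read by listing the words assigned to that concept.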
INTELLIGENT DATA ANALYSIS IN ENTERPRISE MANAGEMENT BASED ON THE ANNEALING SIMULATION ALGORITHM
E.V. Kuliev, A.V. Kotelva, M.M. Semenova, S.V. Ignateva, A.P. Kukharenko, 2022-11-01
Abstract. The article presents an analytical review of the simulated annealing algorithm for the problem of efficient enterprise management and optimizes the algorithm for this problem. As a case study, the work schedule of workers in an organization is optimized: a worker scheduling model with strong and weak constraints is established, and the simulated annealing algorithm is used to optimize the strategy for solving it. Simulated annealing is suitable for large-scale combinatorial optimization problems; here it is used to evaluate and obtain the optimal scheduling strategy. The simulated annealing algorithm performs well in data mining for human resource management. Big data mining can help companies conduct dynamic analysis in talent recruitment, so that the recruitment plan is carried out in a high-quality, standardized way, the characteristics of various candidates are analyzed from many angles, and the level of human resource management improves. An algorithm implementing simulated annealing has been developed. Simulated annealing accepts new solutions according to the Metropolis criterion: besides accepting improved solutions, it also accepts worse solutions with a limited probability. The Metropolis algorithm is a sampling algorithm used mainly for complex distribution functions. It is somewhat similar to rejection sampling, but here the auxiliary (proposal) distribution changes over time. Experimental studies show that a worker scheduling model based on strong and weak constraints is significantly better than manual scheduling, achieving an effective balance between controlling wage costs and increasing employee satisfaction. The successful application of a simulated-annealing-based workforce scheduling model brings new insights into solving large-scale worker scheduling problems. The results can serve as a starting point for studying personnel management systems based on data mining technology.
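Simulated annealing with the Metropolis acceptance rule can be sketched on a toy shift-assignment problem. The model is hypothetical and far smaller than the article's: each worker is assigned to exactly one shift, the strong (hard) constraint that every shift is staffed is enforced through a large penalty, and the weak (soft) constraint is each worker's preferred shift. A worse candidate is accepted with probability exp(-delta / T), and the temperature T is cooled geometrically; the function names, penalty weight, and cooling schedule are assumptions.

```python
import math
import random

def schedule_cost(assign, prefs, n_shifts, hard_penalty=10.0):
    """Penalised cost: hard_penalty per unstaffed shift plus 1 per unmet preference."""
    staffed = set(assign)
    hard = sum(1 for s in range(n_shifts) if s not in staffed)  # strong constraint
    soft = sum(1 for w, s in enumerate(assign) if s != prefs[w])  # weak constraint
    return hard_penalty * hard + soft

def anneal(prefs, n_shifts, t0=10.0, alpha=0.999, steps=20000, seed=1):
    rng = random.Random(seed)
    current = [0] * len(prefs)  # start with everyone on shift 0
    cur_cost = schedule_cost(current, prefs, n_shifts)
    best, best_cost = current[:], cur_cost
    t = t0
    for _ in range(steps):
        # Neighbour: move one random worker to a random shift.
        cand = current[:]
        cand[rng.randrange(len(cand))] = rng.randrange(n_shifts)
        delta = schedule_cost(cand, prefs, n_shifts) - cur_cost
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cur_cost = cand, cur_cost + delta
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        t *= alpha  # geometric cooling
    return best, best_cost

prefs = [0, 0, 1, 2]  # hypothetical preferred shift of each of four workers
best, cost = anneal(prefs, n_shifts=3)
print(best, cost)
```

Accepting worse moves matters here: the assignment [0, 0, 2, 1] is a local minimum for single-worker moves, because fixing either misplaced worker first leaves a shift unstaffed, so pure descent could stop there, while the hot phase of annealing can cross that barrier.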
MONITORING OF THE EDUCATION QUALITY AND IMPLEMENTING OF INDIVIDUAL LEARNING: DEMONSTRATION OF APPROACHES AND EDUCATIONAL DATA MINING ALGORITHMS
Yass Khudheir Salal, S. M. Abdullaev, 2020-10-11
Abstract. A quality monitoring system for traditional and distance education requires machine learning classification and quantification techniques capable of predicting individual and collective student performance. This article shows theoretically and experimentally that the most promising approach to solving both forecasting tasks simultaneously is to build heterogeneous ensembles consisting of an odd number of different base classifiers, such as decision trees, simple neural networks, the naive Bayes classifier, and others. By training and testing 11 different binary classifiers on six different samples of educational data, we show that the individual deterministic forecast of such ensembles exceeds in accuracy the forecasts of both the individual base classifiers and the homogeneous ensembles created by bagging and boosting. The advantage of heterogeneous ensembles is decisive when dealing with the class imbalance characteristic of educational data. In these cases, only forecasts whose accuracy exceeds the relative frequency of the dominant class in the data sample can be considered useful. The main advantage of the heterogeneous ensemble is its ability to turn a deterministic forecast into a probabilistic one: instead of assigning an object to a particular class, it gives the probability of the object belonging to each class. On this basis, we propose a new method of binary quantification in which the individual probabilities of belonging to each class are summed separately, and the resulting total probabilities are interpreted as the relative frequencies of the classes in the sample. Experiments show that such ensemble binary quantification is significantly superior to the traditional "classify and count" method.
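The difference between "classify and count" and the probability-summing quantification described above can be shown with a small sketch. It assumes that an object's class-1 probability is the share of base classifiers in an odd-sized ensemble that vote for class 1; the abstract does not say exactly how the article derives the probabilities, and the vote matrix below is hypothetical rather than produced by trained models.

```python
def ensemble_vote(votes):
    """Combine 0/1 votes from an odd number of base classifiers for one object.

    Returns the majority label and the share of votes for class 1, used here
    as the object's class-1 probability (an assumption of this sketch).
    """
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of base classifiers to avoid ties")
    p1 = sum(votes) / len(votes)
    return (1 if p1 > 0.5 else 0), p1

def classify_and_count(vote_matrix):
    """Traditional quantification: share of objects whose majority label is 1."""
    labels = [ensemble_vote(v)[0] for v in vote_matrix]
    return sum(labels) / len(labels)

def probabilistic_quantify(vote_matrix):
    """Sum each class's probabilities separately and normalise by the sample size."""
    p1 = [ensemble_vote(v)[1] for v in vote_matrix]
    n = len(p1)
    return {0: sum(1 - p for p in p1) / n, 1: sum(p1) / n}

# Hypothetical votes of three heterogeneous classifiers on four students.
votes = [[1, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0]]
print(classify_and_count(votes))  # 0.25
print(probabilistic_quantify(votes))
```

Classify and count discards the minority votes on every object, whereas summing probabilities keeps them, so the estimated share of class 1 here is 1/3 rather than 0.25; with imbalanced educational data this is what keeps the minority class from being systematically undercounted.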








