DATA CLUSTERING ALGORITHM FOR PROTECTING CONFIDENTIAL INFORMATION ON THE INTERNET

  • I.S. Bereshpolov, Southern Federal University
  • Y.A. Kravchenko, Southern Federal University
  • A.G. Sleptsov, Southern Federal University
Keywords: Information security, confidential information, clustering, cloud model, heuristic algorithm

Abstract

The article addresses the scientific problem of protecting confidential information on the
Internet by means of an algorithm for clustering large amounts of data. Protecting confidential
information in computer networks is an active research topic, driven by the growing use of
information technology and the growing volume of valuable information stored on the Internet.
As responsibility for information grows, the need for effective methods of securing information
in computer networks has become critical. The authors propose a solution to the problem of
protecting confidential information in computer networks based on a big data clustering
algorithm. Traditional intrusion detection methods have limitations: they can work only with
one- or two-dimensional data, and they rely heavily on prior knowledge. To overcome these
limitations, the authors propose a heuristic intrusion detection algorithm that uses clustering
based on a cloud model. The proposed algorithm uses both labeled and unlabeled samples for data
clustering, thereby reducing its reliance on a priori knowledge. The results of a computational
experiment with the proposed algorithm were compared with those of several canonical intrusion
detection algorithms. The comparison showed that the proposed algorithm improved the performance
of the intrusion detection system, increased detection accuracy, reduced the false alarm rate,
and enhanced the reliability of the system. The dynamic weighting method used in the algorithm
removed the complexity of high-level data processing and allowed the algorithm to train itself,
resulting in a relatively stable cloud model. Although the proposed algorithm performs
significantly better than the canonical clustering algorithms, the study also showed that it has
some limitations, such as a high false positive rate and sensitivity to data with certain types
of distribution. Further improvement of the algorithm is required to eliminate these
shortcomings. Overall, the proposed heuristic clustering intrusion detection algorithm based on
the cloud model is a promising solution for protecting confidential information in computer
networks.
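
The abstract does not give the algorithm's details, so the following is only a minimal sketch of how cloud-model clustering with labeled and unlabeled samples might look. It assumes the standard one-dimensional normal cloud model, described by expectation Ex, entropy En and hyper-entropy He, and a simple self-training pass. All function names, the equal weighting of features, and the toy data are hypothetical and are not taken from the article; in particular, the authors' dynamic weighting method is not reproduced here.

```python
import math
from statistics import fmean


def backward_cloud(samples):
    """Estimate (Ex, En, He) of a one-dimensional normal cloud from samples."""
    n = len(samples)
    ex = fmean(samples)
    # Entropy estimated from the first absolute central moment.
    en = math.sqrt(math.pi / 2) * fmean([abs(x - ex) for x in samples])
    # Unbiased sample variance.
    var = sum((x - ex) ** 2 for x in samples) / (n - 1)
    # Hyper-entropy; abs() guards against a small negative difference.
    he = math.sqrt(abs(var - en ** 2))
    return ex, en, he


def membership(x, ex, en):
    """Expected certainty degree of x in a cloud (He is ignored to stay deterministic)."""
    if en == 0:
        return 1.0 if x == ex else 0.0
    return math.exp(-((x - ex) ** 2) / (2 * en ** 2))


def fit_clouds(labeled):
    """Build one cloud per feature for every class. labeled: {label: [feature vectors]}."""
    return {label: [backward_cloud(list(col)) for col in zip(*rows)]
            for label, rows in labeled.items()}


def assign(sample, clouds):
    """Return the label whose clouds give the highest mean membership (equal weights assumed)."""
    scores = {
        label: fmean([membership(x, ex, en) for x, (ex, en, _) in zip(sample, params)])
        for label, params in clouds.items()
    }
    return max(scores, key=scores.get)


def self_train(labeled, unlabeled):
    """Absorb each unlabeled sample into its best-matching class, then refit the clouds."""
    grown = {label: list(rows) for label, rows in labeled.items()}
    clouds = fit_clouds(grown)
    for sample in unlabeled:
        grown[assign(sample, clouds)].append(sample)
        clouds = fit_clouds(grown)
    return clouds


# Hypothetical toy data: two features per connection record.
labeled = {
    "normal": [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0]],
    "attack": [[5.0, 50.0], [5.5, 52.0], [4.5, 48.0]],
}
unlabeled = [[1.1, 10.5], [5.2, 51.0]]

clouds = self_train(labeled, unlabeled)
print(assign([0.9, 9.5], clouds))
print(assign([4.8, 49.0], clouds))
```

In this sketch the labeled samples seed one cloud per class and feature, and each unlabeled sample refines the cloud of the class it matches best, which is one simple way to reduce dependence on prior labels.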

Published
2023-08-14
Section
SECTION II. INFORMATION PROCESSING ALGORITHMS