ESTIMATING THE EFFECTIVENESS OF THE METHOD FOR SEARCHING THE ASSOCIATIVE RULES FOR THE TASKS OF PROCESSING BIG DATA

V.V. Bova; E.V. Kuliev; S.N. Scheglov

V.V. Bova, Southern Federal University
E.V. Kuliev, Southern Federal University
S.N. Scheglov, Southern Federal University

Keywords: Associative rule extraction, unstructured data, genetic algorithm, associative rule base, big data

Abstract

Modern databases are large and hold vast amounts of information. Algorithms for mining
associative rules are among the most popular methods of knowledge discovery in the analysis and
processing of large data volumes. The paper addresses the problem of building associative rule
bases for the analysis of large volumes of unstructured data by searching for regularities while
taking into account the importance of the data characteristics. The authors propose a method for
synthesizing such rule bases: a transaction database is built, threshold values of support are
calculated, and criteria for evaluating implicit associations are applied. This makes it possible
to extract both recurring and implicit associative rules. To improve the computational efficiency
of rule extraction, a genetic algorithm is applied to optimize the input parameters of the
characteristic search space. The developed method shortens rule extraction time, reduces the
number of common rules generated, and avoids the resource-intensive pre-processing of the
synthesized rule base. The authors developed a software and algorithmic module for the
experimental study of the proposed method, which synthesizes associative rules by filtering the
input parameters of the search model when processing unstructured data. Experiments on test
transaction databases refined the theoretical estimates of the time complexity of the proposed
method, which uses a genetic algorithm to calculate the weighted support of a rule set, taking
into account an assessment of the a priori informativeness of the characteristics in the dataset.
The time complexity of the developed method is estimated as O(I²). A comparative analysis against
the Apriori and Frequent Pattern-Growth algorithms was performed on the Retail test dataset. The
results confirm the effectiveness of the search method on large transaction sets. The method
reduces the cardinality of the non-redundant set of extracted associative rules by more than 40%
compared with these algorithms. The experiments show that the method can be effective for
knowledge discovery tasks involving large volumes of data.
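
The notions of support, confidence, and weighted support used in the abstract can be illustrated with a minimal sketch. It is not the authors' formula: it assumes that the weighted support of an itemset is its ordinary support multiplied by the mean weight of its items, and the function names, the example transactions, and the weights are all hypothetical. The genetic optimization of thresholds is not modeled here; the thresholds are simply passed in.

```python
from itertools import permutations

# Hypothetical toy data: each transaction is a set of items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
# Hypothetical a priori importance of each item (1.0 = fully informative).
WEIGHTS = {"bread": 1.0, "milk": 0.5, "butter": 1.0}


def weighted_support(itemset, transactions, weights):
    """Ordinary support scaled by the mean item weight (an assumed scheme)."""
    itemset = frozenset(itemset)
    if not itemset or not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    mean_weight = sum(weights.get(i, 1.0) for i in itemset) / len(itemset)
    return mean_weight * hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    if n_antecedent == 0:
        return 0.0
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_antecedent


def pair_rules(transactions, weights, min_wsupport, min_confidence):
    """Return single-item rules a -> b that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        ws = weighted_support({a, b}, transactions, weights)
        conf = confidence({a}, {b}, transactions)
        if ws >= min_wsupport and conf >= min_confidence:
            rules.append((a, b, ws, conf))
    return rules
```

Calling `pair_rules(TRANSACTIONS, WEIGHTS, 0.3, 0.6)` on this toy data keeps the rules that involve bread and drops both rules between butter and milk, whose weighted support falls below the threshold. Lowering an item's weight shrinks the weighted support of every itemset that contains it, which is one simple way to let less informative characteristics generate fewer rules.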


ESTIMATING THE EFFECTIVENESS OF THE METHOD FOR SEARCHING THE ASSOCIATIVE RULES FOR THE TASKS OF PROCESSING BIG DATA
