METHOD FOR SEARCHING SEQUENTIAL PATTERNS OF USER'S BEHAVIOR ON THE INTERNET

  • V.V. Kureychik Southern Federal University
  • V.V. Bova Southern Federal University
  • Y.A. Kravchenko Southern Federal University
Keywords: Search for sequential patterns, sequential analysis, genetic algorithm, transactional database, information search

Abstract

An important task in data mining is to extract patterns and detect related events in
sequential data by analyzing sequential patterns. This article examines the use of
sequential patterns to analyze events of users' search and cognitive activity when they interact
with Internet resources of an open information and educational environment. Searching
for sequential patterns is a computationally complex task: the goal is to retrieve, from a
transactional database of search-activity event sequences, all frequent sequences that represent
potential relationships between elements and meet a given minimum support. To solve it, the
article proposes a method for searching event sequences for hidden patterns that indicate
possible levels of vulnerability when users perform information search tasks on the Internet.
A mathematical model of user behavior in a search session, based on the theory of sequential
patterns, is described. To improve the computational efficiency of the method, a modified
two-stage algorithm for generating sequential patterns has been developed. In the first stage,
AprioriAll forms frequent candidate sequences of all possible lengths; in the second stage,
a genetic algorithm (GA) optimizes the input parameters of the feature space of the generated
set to search for maximal patterns. A series of computational experiments was carried out on
test data from the MSNBC corpus of the SPMF open-source data mining library, with a comparative
analysis against the VMSP and GSP algorithms. The results confirmed the efficiency of the
proposed algorithm in searching for maximal sequential patterns in terms of
execution time and the number of extracted patterns. The experiments also showed that the
sample obtained from the GA stage improves the stability and accuracy of the method and
reduces the required number of scans of the pattern database. The resulting computational
costs are acceptable: they are comparable to those of the VMSP algorithm, and the search time
for sequential patterns is better than that of the GSP algorithm by more than 150% on average.
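
The notions of support, frequent sequence, and maximal pattern used in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it is a simplified level-wise generate-and-count search in the spirit of Apriori-style methods, it treats each event as a single item, and it omits the GA stage entirely. All function names and the toy sessions are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Pattern, sequence: Pattern) -> bool:
    """True if the events of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # `event in remaining` consumes the iterator, so order is enforced.
    return all(event in remaining for event in pattern)


def support(pattern: Pattern, database: List[Pattern]) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    if not database:
        raise ValueError("database must not be empty")
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


def frequent_patterns(database: List[Pattern], min_support: float) -> Dict[Pattern, float]:
    """Level-wise search: extend frequent k-patterns by one event and recount."""
    if min_support <= 0:
        raise ValueError("min_support must be positive so the search terminates")
    items = sorted({event for s in database for event in s})
    frequent: Dict[Pattern, float] = {}
    level = [(item,) for item in items]
    while level:
        counted = {p: support(p, database) for p in level}
        kept = {p: s for p, s in counted.items() if s >= min_support}
        frequent.update(kept)
        # Every frequent (k+1)-pattern has a frequent k-prefix (anti-monotonicity).
        level = [p + (item,) for p in kept for item in items]
    return frequent


def maximal_patterns(frequent: Dict[Pattern, float]) -> List[Pattern]:
    """Keep only frequent patterns not contained in another frequent pattern."""
    patterns = list(frequent)
    return [
        p for p in patterns
        if not any(p != q and is_subsequence(p, q) for q in patterns)
    ]


# Hypothetical search sessions; each tuple is one user's ordered event log.
sessions = [
    ("search", "open", "read"),
    ("search", "read"),
    ("search", "open", "read", "save"),
    ("open", "save"),
]

found = frequent_patterns(sessions, min_support=0.5)
for pattern in maximal_patterns(found):
    print(pattern, found[pattern])
```

Each level of this search rescans the whole database, which is the cost the abstract refers to when it speaks of reducing the number of scans.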

Published: 2020-11-22
Section: SECTION I. ARTIFICIAL INTELLIGENCE AND FUZZY SYSTEMS