FORMALIZATION OF RECOGNITION AND IDENTIFICATION OF SEMANTIC OBJECTS IN NATURAL LANGUAGE TEXT STREAMS

  • Y.M. Vishnyakov, Kuban State University
  • R.Y. Vishnyakov, Kuban State University
Keywords: Semantic object, linguistic trace, semantic recognizer, semantic comparison, text fragment, semantic proximity, semantic identification

Abstract

The growing number of crimes committed in cyberspace, particularly on social networks and in
messengers, calls for adequate and effective countermeasures. Cybercrime is rising so fast that it
threatens to cause irreparable harm to the state and society. Such crimes are hard to detect, however,
because offenders act virtually and communicate through language within social networks, exploiting
the features of these platforms to conceal their traces. Effective countermeasures could nonetheless be
built from tools that automatically process natural language, highlight the semantic features specific to
criminal activity, and recognize and identify that activity. Because neural network approaches are
impractical in these settings for several reasons, this study proposes a formal method for designing a
recognizer that identifies semantic objects in text streams from their linguistic traces. The paper
introduces formal concepts including the formal model of a semantic object, the behavior function, the
scenario, the linguistic trace, and the recognition function. The reasoning rests on the set-theoretic
principles of the computational theory of semantic interpretation and uses computational representations
of the meaning of text fragments to compare them by semantic similarity. The approach is general: it
allows a recognizer for semantic objects to be synthesized formally from their linguistic descriptions
and behavior. All arguments and constructions in the work are illustrated with concrete examples.
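To make the idea of a recognition function over a text stream more concrete, the following minimal Python sketch matches a hypothetical semantic object, described by a scenario (an ordered list of expected linguistic traces), against a stream of text fragments. It is not the authors' method: the computational semantic representation used in the paper is replaced here by plain bag-of-words cosine similarity, and the names `SemanticObject`, `similarity`, `recognize` and the 0.5 threshold are illustrative assumptions.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def tokens(text: str) -> Counter:
    """Lower-cased word counts: a crude stand-in (assumption) for a semantic representation."""
    return Counter(re.findall(r"\w+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Cosine similarity of the bag-of-words vectors of a and b, in [0, 1]."""
    va, vb = tokens(a), tokens(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


@dataclass
class SemanticObject:
    """Hypothetical model: an object described by an ordered scenario of linguistic traces."""
    name: str
    scenario: list[str]  # expected traces, in the order they should appear in the stream


def recognize(obj: SemanticObject, stream: list[str], threshold: float = 0.5) -> list[int]:
    """Return the indices of fragments that match the scenario steps in order,
    or an empty list if some step is never matched."""
    matched: list[int] = []
    start = 0
    for step in obj.scenario:
        for i in range(start, len(stream)):
            if similarity(step, stream[i]) >= threshold:
                matched.append(i)
                start = i + 1  # the next step must occur later in the stream
                break
        else:
            return []  # no fragment after the previous match resembles this step
    return matched


# Usage with invented data:
obj = SemanticObject("card fraud", ["send me your card number", "transfer the money today"])
stream = [
    "hi how are you",
    "please send me your card number",
    "thanks",
    "transfer the money to this account today",
]
print(recognize(obj, stream))  # -> [1, 3]
```

Requiring the steps to match in order is one simple way to reflect that a scenario describes behavior unfolding over time; a real recognizer would substitute a proper semantic proximity measure for the word-overlap score.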

Published: 2024-10-08
Section: II. Data Analysis and Modeling