SUBSYSTEM FOR AUTOMATIC TEXT ANNOTATION BASED ON MACHINE LEARNING METHODS

L.A. Gladkov; N.V. Gladkova; V.M. Kureichik

Abstract

This paper addresses the problem of automatic text annotation. The problem is formulated, and the relevance of developing effective methods and software systems for automatic text summarization in modern information systems is substantiated. The concepts of “data” and “knowledge” are defined, and the tasks that belong to the field of Data Mining are listed. The Text Mining problem and existing methods for solving it are described in detail. The text summarization problem is then considered, and the main stages of solving it are identified. The main methods of automatic text processing are described, along with their advantages and disadvantages. Abstractive and extractive summarization methods are discussed in detail. A comparative analysis of the effectiveness of various abstracting and quasi-abstracting methods is carried out, and their key advantages and disadvantages are highlighted. The encoder-decoder architecture is briefly described from the point of view of its use in the developed algorithm for automatic text summarization. The recurrent neural network model is described, its advantages and disadvantages are noted, and its architecture is considered as applied to automatic text summarization. A modified recurrent model, the long short-term memory (LSTM) network, is also described. The proposed automatic summarization algorithm and the settings of its main parameters are presented, followed by a description of the developed software subsystem. Computational experiments were performed and their results are presented. The quality of the solutions obtained was assessed, and the optimal parameters of the developed software system were determined. Directions for further research are formulated.
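
To make the notion of extractive summarization mentioned above concrete, the following is a minimal sketch of a frequency-based sentence extractor. It is not the algorithm developed in the paper (which relies on an encoder-decoder model with LSTM); the function names, the tiny stop-word list, and the scoring rule (average word frequency per sentence) are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept deliberately tiny for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "it", "on"}


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return lowercase word tokens with stop words removed."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def extractive_summary(text, num_sentences=2):
    """Pick the highest-scoring sentences and return them in original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document frequency of the sentence's words.
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    sample = ("Neural networks summarize text. Cats sleep. "
              "Recurrent neural networks process text sequences.")
    for line in extractive_summary(sample, num_sentences=2):
        print(line)
```

An extractor of this kind only copies existing sentences, whereas an abstractive method generates new wording; that difference is what motivates the neural models discussed in the abstract.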

Authors

L.A. Gladkov, Southern Federal University
N.V. Gladkova, Southern Federal University
V.M. Kureichik, Southern Federal University
