IDENTIFICATION OF KEY TECHNOLOGIES BASED ON COLLECTION AND ANALYSIS OF DATA FROM OPEN RUSSIAN-LANGUAGE SOURCES
Abstract
This article presents the development and validation of a new approach to collecting, processing, and analyzing open Russian-language data in order to identify key technological trends. To build and then analyze structured datasets, methods of web scraping, natural language processing, and time-series analysis were developed and implemented in software. The approach is applied for the first time to extract and structure information from Russian-language scientific articles, news resources, and patent documentation. Analysis of the resulting dataset of scientific publications identified the 30 most frequently mentioned bigrams and the 30 most frequently mentioned trigrams of technological terms. Frequency analysis of these bigrams and trigrams yielded key technological terms, which were then used for comprehensive filtering by key technology. This filtering made it possible to search for Russian-language patents and collect them for further analysis. Preprocessing of the collected patent data produced time series of patent activity. The software system for identifying key technologies was implemented in JavaScript and Python, using the Selenium and BeautifulSoup libraries for web scraping and NLTK and Scikit-learn for text processing and analysis. Studying how key technologies develop over time made it possible to identify periods of intensive patent activity and periods of declining interest in particular technologies. The results provide a basis for further development of machine learning methods for forecasting technological development and identifying promising areas of applied research.
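The pipeline summarized in the abstract (n-gram frequency analysis of publications, term-based filtering of patents, and construction of a patent-activity time series) can be illustrated with a minimal sketch. It is not the authors' implementation: the article names NLTK and Scikit-learn, whereas this sketch uses only the Python standard library so that it stays self-contained. The sample texts, the patent records, and the function names `tokenize`, `ngrams`, `top_ngrams`, and `patent_activity` are all hypothetical, and the substring filter is a deliberately naive stand-in for the filtering step.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for the scraped publication texts.
DOCS = [
    "machine learning methods for additive manufacturing",
    "additive manufacturing of metal parts",
    "machine learning for patent analysis",
]

# Hypothetical patent records: (filing year, title).
PATENTS = [
    (2019, "Device for additive manufacturing of metal parts"),
    (2020, "Method of additive manufacturing control"),
    (2020, "Machine learning system for defect detection"),
    (2021, "Hydraulic pump housing"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens (Latin or Cyrillic)."""
    return re.findall(r"[a-zа-яё]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(docs, n, k):
    """Count n-grams across all documents and return the k most frequent."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(tokenize(doc), n))
    return counts.most_common(k)


def patent_activity(patents, key_terms):
    """Count, per year, the patents whose title contains any key term.

    Naive assumption: a term matches if it occurs as a substring of the
    normalized title; a real filter would need stemming or lemmatization.
    """
    series = Counter()
    for year, title in patents:
        normalized = " ".join(tokenize(title))
        if any(term in normalized for term in key_terms):
            series[year] += 1
    return dict(sorted(series.items()))


# Step 1: most frequent bigrams in the publication corpus.
top_bigrams = top_ngrams(DOCS, n=2, k=2)
# Step 2: use them as key terms to filter patents.
key_terms = [term for term, _ in top_bigrams]
# Step 3: yearly time series of matching patents.
activity = patent_activity(PATENTS, key_terms)
```

On this toy data the two most frequent bigrams are "machine learning" and "additive manufacturing", and `activity` is `{2019: 1, 2020: 2}`. Calling `top_ngrams(DOCS, n=3, k=30)` would give the trigram counterpart of the same step.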