SCENE ANALYSIS IN INFORMATION SYSTEMS OF MOBILE ROBOTIC COMPLEXES

  • S.M. Sokolov, Keldysh Institute of Applied Mathematics RAS
Keywords: degree of autonomy, vision system, scene analysis, multilevel cognitive maps, configuration space, real-time vision system framework, robotics ontologies, signature

Abstract

Modern robots are capable of performing increasingly complex tasks that usually require a
high degree of interaction with the environment in which they operate. As a result, robotic
systems must possess deep and specific knowledge about their workspaces, going far beyond the
simple representation of features that a robotic system can build from visual data processing
techniques, for example, in the task of simultaneous localization and mapping (SLAM). Scene
analysis is the link between object recognition and knowledge about the surrounding world, and it is
present in one form or another in the process of extracting from visual data the information needed
to solve a specific task. The article presents a systematic approach to providing scene analysis
in an on-board vision system (STZ). Scene analysis technologies are considered an integral part of
increasing the degree of autonomy of mobile robotic complexes (RTCs). A number of these technologies
have yet to be mastered and implemented, but the overall structure makes it possible to gradually
deepen scene analysis on board the RTC, thereby increasing the degree of autonomy without radically
redesigning the on-board information management system and the STZ as a key part of its information
support. The information extracted from the visual data is integrated into a multi-layered map that
provides a high-level representation of the environment and embodies the knowledge a robotic
complex needs to actually perform complex tasks. The multi-layered map is a form of storing
knowledge about the environment and the objects in it: it combines a spatial hierarchy of
objects and places with a semantic hierarchy of concepts and relationships. The structures for
representing data in the various layers of this map and the mechanisms for their use are described. In
particular, to describe RTC routes, the principles of interpretive navigation are used, and
information about operating conditions and objects of interest is provided by signature structures.
The software implementation of the proposed mechanisms follows a unified approach
based on a real-time STZ software framework. Examples of applying the described technologies
to the information support of targeted movements of ground RTCs are given.
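The multi-layered map described in the abstract combines a spatial hierarchy of objects and places with a semantic hierarchy of concepts and relationships. A minimal sketch of such a structure, assuming a simple "is-a" concept hierarchy and spatial containment links (all class and node names here are hypothetical illustrations, not taken from the article):

```python
# Sketch of a multi-layered map: spatial nodes (places/objects) are
# anchored to a semantic hierarchy of concepts, so a place's meaning
# can be inferred by walking the concept layer upward.
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    parent: "Concept | None" = None   # semantic "is-a" hierarchy

@dataclass
class SpatialNode:
    name: str
    concept: Concept                  # anchor link into the semantic layer
    children: list = field(default_factory=list)  # spatial containment

def concept_chain(node: SpatialNode) -> list[str]:
    """List every concept a spatial node belongs to, most specific first."""
    chain, c = [], node.concept
    while c is not None:
        chain.append(c.name)
        c = c.parent
    return chain

# Tiny example: a building containing one office.
area = Concept("Area")
room = Concept("Room", parent=area)
office = Concept("Office", parent=room)

building = SpatialNode("building-1", area)
office_12 = SpatialNode("office-12", office)
building.children.append(office_12)

print(concept_chain(office_12))  # ['Office', 'Room', 'Area']
```

Separating the two hierarchies lets the spatial layer be rebuilt from sensor data while the semantic layer stays stable, which matches the paper's goal of deepening scene analysis without redesigning the on-board information system.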

References

1. Minsky M., Papert S. Perceptrons. Cambridge, Mass.: MIT Press, 1969.
2. Duda R., Hart P. Raspoznavanie obrazov i analiz stsen [Pattern recognition and scene analysis].
Moscow: Mir, 1976.
3. Chen L.-C., Zhu Y., Papandreou G., Schroff F., Adam H. Encoder-Decoder with Atrous
Separable Convolution for Semantic Image Segmentation, ECCV, 2018,
Corpus ID: 3638670.
4. Salas-Moreno R., Newcombe R., Strasdat H., Kelly P., and Davison A. SLAM++: Simultaneous
Localisation and Mapping at the Level of Objects, In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR). IEEE, 2013, pp. 1352-1359.
5. Newcombe R.A., Lovegrove S.J., and Davison A.J. DTAM: Dense Tracking and Mapping in
Real-Time, In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
IEEE, 2011, pp. 2320-2327.
6. Bao S.Y., Bagra M., Chao Y.W., and Savarese S. Semantic structure from motion with points,
regions, and objects, In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). IEEE, 2012, pp. 2703-2710.
7. Sokolov S.M. Sravnitel'nyy analiz stepeni avtonomnosti robototekhnicheskikh kompleksov
[Comparative analysis of the degree of autonomy of robotic complexes], Izvestiya YuFU.
Tekhnicheskie nauki [Izvestiya SFedU. Engineering Sciences], 2023, No. 1 (231), pp. 65-76.
Available at: http://izv-tn.tti.sfedu.ru.
8. Nüchter A. and Hertzberg J. Towards semantic maps for mobile robots, Robotics and Autonomous
Systems, 2008, Vol. 56, No. 11, pp. 915-926.
9. Galindo C., Gonzalez J., Fernandez-Madrigal J.A. Interactive In-Vehicle Guidance through a
Multihierarchical Representation of Urban Maps, International journal of intelligent systems,
Vol. 25, pp. 597-620.
10. Li R., Wei L., Gu D., Hu H., McDonald-Maier K.D. Multi-layered Map based Navigation and
Interaction for an Intelligent Wheelchair, Proceedings of the IEEE
International Conference on Robotics and Biomimetics (ROBIO), Shenzhen, China, December
2013, pp. 115-120.
11. Galindo C., Saffiotti A., Coradeschi S., Buschka P., Fernandez-Madrigal J., and Gonzalez J.
Multi-hierarchical semantic maps for mobile robotics, in Proceedings of the IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), Edmonton, CA, 2005,
pp. 3492-3497. Online at http://www.aass.oru.se/~asaffio/.
12. Martinez Mozos O. and Burgard W. Supervised learning of topological maps using semantic
information extracted from range data, In Proceedings of the IEEE/RSJ International Conference
on Intelligent Robots and Systems (IROS), Beijing, China, 2006, pp. 2772-2777.
13. Goerke N. and Braun S. Building semantic annotated maps by mobile robots, in Proceedings
of the Conference Towards Autonomous Robotic Systems, Londonderry, UK, 2009.
14. Brunskill E., Kollar T., and Roy N. Topological mapping using spectral clustering and classification,
in Proc. of IEEE/RSJ Conference on Robots and Systems (IROS), 2007.
15. Friedman S., Pasula H., and Fox D. Voronoi random fields: Extracting the topological structure
of indoor environments via place labeling, in Proc. of 19th International Joint Conference
on Artificial Intelligence (IJCAI), 2007.
16. Mozos O.M., Mizutani H., Kurazume R., and Hasegawa T. Categorization of indoor places
using the kinect sensor, Sensors, 2012, Vol. 12, No. 5, pp. 6695-6711.
17. Diosi A., Taylor G., and Kleeman L. Interactive SLAM using laser and advanced sonar, in Proceedings
of the IEEE International Conference on Robotics and Automation, Barcelona,
Spain, 2005, pp. 1103-1108.
18. Zender H., Martinez Mozos O., Jensfelt P., Kruijff G.-J., and Burgard W. Conceptual spatial
representations for indoor mobile robots, Robotics and Autonomous Systems, 2008, Vol. 56,
No. 6, pp. 493-502.
19. Pronobis A. and Jensfelt P. Multi-modal semantic mapping, in The RSS’11 Workshop on
Grounding Human-Robot Dialog for Spatial Tasks, Los Angeles, CA, USA, July 2011.
[Online]. Available: http://www.pronobis.pro/publications/pronobis2011rss-ghrdst.
20. Nieto-Granda C., Rogers J.G. III, Trevor A.J.B., and Christensen H.I. Semantic map partitioning in
indoor environments using regional analysis, in 2010 IEEE/RSJ International Conference on Intelligent
Robots and Systems, October 18-22, 2010, Taipei, Taiwan. IEEE, 2010, pp. 1451-1456.
21. Wilske S. and Kruijff G.-J. Service robots dealing with indirect speech acts. Language
Technology Lab, German Research Center for Artificial Intelligence (DFKI), Saarbrücken,
Germany.
22. Randell D., Cui Z., Cohn A. A spatial logic based on regions and connection, In: Proceedings
of the 3rd. International Conference on Knowledge Representation and Reasoning, San Mateo,
Morgan Kaufmann, 1992, pp. 165-176.
23. Cadena C., Carlone L., Carrillo H., Latif Y., Scaramuzza D., Neira J., Reid I., and Leonard J.J.
Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-
Perception Age, IEEE Transactions on Robotics, 2016, 32 (6), pp. 1309-1332.
24. Everingham M., Van Gool L., Williams C.K.I., Winn J., and Zisserman A. The PASCAL Visual
Object Classes (VOC) Challenge, International Journal of Computer Vision, 2010, 88 (2),
pp. 303-338.
25. Newcombe R.A., Lovegrove S.J., and Davison A.J. DTAM: Dense Tracking and Mapping in
Real-Time, In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
IEEE, 2011, pp. 2320-2327.
26. Flint A., Murray D., and Reid I.D. Manhattan Scene Understanding Using Monocular, Stereo,
and 3D Features, In Proceedings of the IEEE International Conference on Computer Vision
(ICCV). IEEE, 2011, pp. 2228-2235.
27. Bao S.Y., Bagra M., Chao Y.W., and Savarese S. Semantic structure from motion with points,
regions, and objects, In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). IEEE, 2012, pp. 2703-2710.
28. Krishnan R.G., Shalit U., and Sontag D. Deep Kalman Filters, In NIPS 2016 Workshop: Advances
in Approximate Bayesian Inference. NIPS, 2016, pp. 1-7.
29. Wray M., Larlus D., Csurka G., Damen D. Fine-Grained Action Retrieval Through Multiple
Parts-of-Speech Embeddings, ICCV 2019. Open Access version provided by the Computer Vision
Foundation; also available on IEEE Xplore. Available at:
https://openaccess.thecvf.com/content_ICCV_2019/papers/Wray_Fine-Grained_Action_Retrieval_
Through_Multiple_Parts-of-Speech_Embeddings_ICCV_2019_paper.pdf.
30. Boguslavskiy A.A., Borovin G.K., Kartashev V.A., Pavlovskiy V.E., Sokolov S.M. Modeli i
algoritmy dlya intellektual'nykh sistem upravleniya [Models and algorithms for intelligent control
systems]. Moscow: IPM im. M.V. Keldysha, 2019, 228 p.
31. Sokolov S.M., Boguslavskiy A.A., Beklemishev N.D. Programmnoe obespechenie sistem
tekhnicheskogo zreniya real'nogo vremeni dlya sistem upravleniya robototekhnicheskimi
kompleksami [Software for real-time vision systems for control systems of robotic complexes],
Mater. XIII Mezhdunarodnoy nauchno-tekhnicheskoy konferentsii «Zavalishinskie
chteniya – 2018» [Materials of the XIII International Scientific and Technical Conference
"Zavalishinsky Readings – 2018"]. St. Petersburg: GUAP, April 16-20, 2018, pp. 205-211.
32. Sokolov S.M., Boguslavskiy A.A., Beklemishev N.D. Realizatsiya interpretiruyushchey
navigatsii s pomoshch'yu moduley STZ [Implementation of interpretive navigation using STZ
modules], Mater. 30-y Mezhdunarodnoy nauchno-tekhnicheskoy konferentsii «Ekstremal'naya
robototekhnika», 13-15 iyunya 2019, Sankt-Peterburg: TSNII RTK [Proceedings of the 30th
International Scientific and Technical Conference "Extreme Robotics", June 13-15, 2019,
St. Petersburg: Central Research Institute of RTK], pp. 264-267.
33. Thrun S. and Bucken A. Integrating grid-based and topological maps for mobile robot navigation,
In Proceedings of the National Conference on Artificial Intelligence, 1996, No. 8.
34. Mozos O. and Burgard W. Supervised Learning of Topological Maps using Semantic Information
Extracted from Range Data, 2006 IEEE/RSJ International Conference on Intelligent
Robots and Systems, Oct. 2006, pp. 2772-2777.
35. Pronobis A. and Jensfelt P. Understanding the real world: Combining objects, appearance,
geometry and topology for semantic mapping. Royal Institute of Technology (KTH) SE-100,
Tech. Rep., 2011.
36. Dansereau D.G., Williams S.B., and Corke P. Simple Change Detection from Mobile Light
Field Cameras. Computer Vision and Image Understanding, 2016.
37. Davison A., Reid I., Molton N., and Stasse O. MonoSLAM: Real-Time Single Camera SLAM,
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2007, 29 (6),
pp. 1052-1067.
38. Sokolov S.M. Ontologicheskiy podkhod v sozdanii robototekhnicheskikh kompleksov s
povyshennoy stepen'yu avtonomnosti [An ontological approach to the creation of robotic complexes
with an increased degree of autonomy], Izvestiya YuFU. Tekhnicheskie nauki [Izvestiya
SFedU. Engineering Sciences], 2022, No. 1, pp. 42-59.
39. Vasil'ev S.N., Zherlov A.K., Fedosov E.A., Fedunov E.E. Intellektnoe upravlenie
dinamicheskimi sistemami [Intelligent control of dynamic systems]. Moscow: Fiziko-matematicheskaya
literatura, 2000, 352 p.
Published
2024-04-16
Section
SECTION IV. TECHNICAL VISION