IMPLEMENTATION OF CONVOLUTIONAL NEURAL NETWORKS ON EMBEDDED DEVICES WITH A LIMITED COMPUTING RESOURCE

V.V. Kovalev; N.E. Sergeev

V.V. Kovalev Southern Federal University
N.E. Sergeev Southern Federal University

Keywords: Convolutional neural networks, computing optimization, embedded computing devices, optimization methods, object detection

Abstract

The large volumes of video data captured by sensors in various spectral ranges, together with the
considerable size of convolutional neural network architectures, make it difficult to run neural
network algorithms on edge devices, because embedded computing devices have tightly limited
computing resources. The article discusses algorithms for automatic object search and pattern
recognition that are based on machine learning methods and implemented on embedded devices
equipped with a Graphics Processing Unit (GPU). The detection convolutional neural networks
«You Only Look Once V3» and «You Only Look Once V3-Tiny» serve as the search and pattern
recognition algorithms; they are implemented on embedded computing devices of the NVIDIA Jetson
line, which span different price ranges and offer different computing resources ... The work also
experimentally evaluates the algorithms on the embedded devices in terms of power consumption,
forward-pass time of the convolutional neural network, and detection accuracy.
The hardware and software solutions provided by NVIDIA make it possible to run deep neural
network algorithms based on the convolution operation in real time. The computational
optimization methods offered by NVIDIA are considered. Experiments were carried out to study how
reduced-precision computation affects the speed and accuracy of object detection in images for
the investigated convolutional neural network architectures, which were trained beforehand on a
set of images drawn from the PASCAL VOC 2007 and PASCAL VOC 2012 datasets.
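
The effect of reduced-precision computation mentioned above can be illustrated with a minimal sketch. It is not the authors' experimental code: the helper names `to_fp16`, `conv2d_valid`, and `max_abs_diff`, the 8x8 input, and the 3x3 kernel are all assumptions made for illustration. The sketch runs a naive convolution once in full precision (Python floats, i.e. FP64) and once with every value rounded to IEEE 754 half precision (FP16), then reports the largest difference between the two outputs. Accumulating in FP16 is a deliberately pessimistic assumption; GPU hardware may accumulate in higher precision.

```python
import random
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def conv2d_valid(image, kernel, quantize=None):
    """Naive 'valid' 2-D cross-correlation, the core operation of a CNN layer.

    If `quantize` is given, inputs, weights, products, and the running sum
    are rounded through it to emulate reduced-precision arithmetic.
    """
    q = quantize if quantize is not None else (lambda v: v)
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    prod = q(q(image[i + di][j + dj]) * q(kernel[di][dj]))
                    acc = q(acc + prod)
            row.append(acc)
        out.append(row)
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 2-D lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical test data: a random 8x8 "image" and a random 3x3 kernel.
random.seed(0)
image = [[random.uniform(0.0, 1.0) for _ in range(8)] for _ in range(8)]
kernel = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]

full = conv2d_valid(image, kernel)                    # full precision
half = conv2d_valid(image, kernel, quantize=to_fp16)  # emulated FP16
error = max_abs_diff(full, half)
print(f"max |full - FP16| over the 6x6 output: {error:.6f}")
```

The per-layer error in such a sketch is small, but it compounds across the many layers of a detection network, which is why the effect on detection accuracy has to be measured experimentally rather than assumed.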
