ACCELERATING THE FORWARD PASS OF A CNN IMPLEMENTED ON A LIMITED COMPUTING RESOURCE

A.E. Shchelkunov; V.V. Kovalev; I.V. Sidko; N.E. Sergeev

Authors

A.E. Shchelkunov, Joint Stock Company «Scientific Design Bureau of Computing Systems»
V.V. Kovalev, Joint Stock Company «Scientific Design Bureau of Computing Systems»
I.V. Sidko, Joint Stock Company «Scientific Design Bureau of Computing Systems»
N.E. Sergeev, Joint Stock Company «Scientific Design Bureau of Computing Systems»

Keywords:

Optimization of the CNN forward pass, tracking

Abstract

This work addresses the optimization of a neural network architecture so that it can run on
a limited computing resource. Several optimization approaches are considered, and estimates of
the computational complexity and execution time of the forward pass are given, together with a
comparison of network complexity under the different approaches. The selected network
architecture is analyzed, and the computational complexity of its individual components
(modules) is estimated. Possible optimization methods are analyzed for each module. The
parameters of the considered modules and the sizes of their input and output tensors are
described. Several architectures were tested as the feature extraction module: ResNet 50,
ResNet 18, MobileNet v3 small, and MobileNet v3 large. A comparative analysis of the
computational complexity and forward pass execution time is presented for each architecture.
Forward pass times were measured on Nvidia's Jetson AGX Xavier embedded computing device, and
execution time estimates are given for each module of the considered neural networks. The paper
also compares neural network accuracy estimates before and after architecture optimization. The
test data set consists of 100 video recordings. Five typical objects appear in the test videos,
and 10 different scenarios are recorded for each object class. Accuracy estimates were obtained
for each of the developed architectures, and a comparative analysis was made.
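
The complexity estimates mentioned in the abstract can be illustrated with a minimal sketch. It counts multiply-accumulate operations (MACs) for a single standard 3x3 convolution, the kind of layer used in ResNet, and for the depthwise-separable replacement used in MobileNet. The function names and the layer shape (56x56 output, 64 input and 64 output channels) are assumptions chosen for illustration only; they are not taken from the paper, and the authors' own estimation method may differ.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k, groups=1):
    """MACs for one 2D convolution with a k x k kernel (bias ignored)."""
    # Each output element needs (c_in / groups) * k * k multiply-accumulates.
    return h_out * w_out * c_out * (c_in // groups) * k * k


def depthwise_separable_macs(h_out, w_out, c_in, c_out, k):
    """MACs for a depthwise k x k convolution followed by a pointwise 1 x 1 one."""
    depthwise = conv2d_macs(h_out, w_out, c_in, c_in, k, groups=c_in)
    pointwise = conv2d_macs(h_out, w_out, c_in, c_out, 1)
    return depthwise + pointwise


# Hypothetical layer shape, used only to compare the two layer types.
standard = conv2d_macs(56, 56, 64, 64, 3)
separable = depthwise_separable_macs(56, 56, 64, 64, 3)

print(f"standard 3x3 conv:   {standard:,} MACs")
print(f"depthwise-separable: {separable:,} MACs")
print(f"reduction factor:    {standard / separable:.2f}x")
```

For this assumed shape the separable layer needs roughly one eighth of the MACs. Measured forward pass time on a real device need not scale by the same factor, since it also depends on memory traffic and on how well each layer type maps to the hardware.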
