ACCELERATION OF THE FORWARD PASS IN THE IMPLEMENTATION OF A CNN ON A LIMITED COMPUTING RESOURCE
Keywords:
Optimization of the execution of the forward pass of the CNN, tracking
Abstract
This work is devoted to optimizing a neural network architecture so that it can run on
a limited computing resource. Several optimization approaches are considered, and estimates of the
complexity and execution time of the forward pass of the neural network are given, together with
comparative estimates of the network complexity under the different optimization approaches.
The paper presents an analysis of the selected network architecture, and estimates of the computational
complexity of individual components (modules) of the architecture are obtained. An analysis
of possible optimization methods for each module was made. The parameters of the considered
modules and the sizes of their input and output tensors are described. Several architectures were tested
to optimize the feature extraction module: ResNet 50, ResNet 18, MobileNet v3 small, and MobileNet
v3 large. A comparative analysis of the computational complexity and execution time of the forward
pass for each architecture is presented. Forward pass times were measured on Nvidia's
Jetson AGX Xavier embedded computing device. Estimates of the execution time of the forward pass
for each module of the considered neural networks are presented. The paper presents the results of
comparing neural network accuracy estimates before and after architecture optimization. The test
data set consists of 100 video recordings. Five different typical objects appear in the test videos, and
10 different scenarios are recorded for each object class. For each of the developed architectures,
accuracy estimates were obtained, and a comparative analysis was made.
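
The per-module complexity estimates mentioned above are typically obtained by counting multiply-accumulate operations (MACs) layer by layer. The following is a minimal sketch of such a count for a single convolution layer, not the authors' actual procedure. The function name `conv2d_macs` is hypothetical, and the layer shapes (a 224x224 input to the ResNet stem, a 56x56 feature map with 64 to 128 channels) are illustrative assumptions rather than values taken from the paper.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, kernel, groups=1):
    """MACs of one 2D convolution layer (bias ignored).

    Each output element needs (c_in / groups) * kernel * kernel
    multiply-accumulates, and there are h_out * w_out * c_out outputs.
    """
    return h_out * w_out * c_out * (c_in // groups) * kernel * kernel


# ResNet stem: 7x7 convolution, 3 -> 64 channels, stride 2.
# Assumed 224x224 input, which gives a 112x112 output map.
stem_macs = conv2d_macs(112, 112, 3, 64, 7)

# Standard 3x3 convolution, 64 -> 128 channels, on an assumed 56x56 map.
standard_macs = conv2d_macs(56, 56, 64, 128, 3)

# Depthwise separable replacement, as used in MobileNet-style blocks:
# a 3x3 depthwise convolution (groups == channels) plus a 1x1 pointwise one.
depthwise_macs = conv2d_macs(56, 56, 64, 64, 3, groups=64)
pointwise_macs = conv2d_macs(56, 56, 64, 128, 1)
separable_macs = depthwise_macs + pointwise_macs

print(f"stem:      {stem_macs:,} MACs")
print(f"standard:  {standard_macs:,} MACs")
print(f"separable: {separable_macs:,} MACs")
```

Under these assumed shapes the separable layer needs roughly 8 times fewer MACs than the standard one, which illustrates why MobileNet-type feature extractors are candidates for a limited computing resource. MAC counts only approximate execution time, so measurements on the target device remain necessary.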








