ESTIMATION OF THE SPATIAL POSITION OF AN ON-BOARD CAMERA BY COMPARING AERIAL IMAGES AND SATELLITE IMAGE DATA
DOI: https://doi.org/10.18522/2311-3103-2026-1-%25p

Keywords: Camera pose estimation, robot localization, machine vision systems, visual navigation, artificial neural network, keypoints, Perspective-n-Point

Abstract
The article describes a method for estimating the spatial position of an aircraft's onboard camera. The method compares aerial photographs with georeferenced remote sensing (RS) data, using a neural network detector to find stable spatiotemporal reference points in both datasets. It then solves the well-known Perspective-n-Point (PnP) problem, estimating the rotation matrix and translation vector that minimize the reprojection error over the correspondences between 3D world points and their 2D projections onto the onboard camera sensor. This approach addresses the pressing problem of aircraft localization in the absence of global navigation satellite system signals. Road intersections are selected as stable spatiotemporal reference points because they are clearly visible both in RS data and in aerial photographs; other local semantic image patterns characteristic of a particular area may serve as an alternative. Since direct comparison of remote sensing and airborne images is difficult due to significant differences in shooting conditions, robust landmark detectors based on artificial neural network (ANN) algorithms are proposed. To train the robust detector, a mixed dataset was created from satellite and airborne imagery. The dataset was labeled using a 3D Gaussian function, normalized to unity, with its apex at the intersection center; the graph of this function is projected onto a 2D mask of the training set. The parameters of the Gaussian function are calculated from the radius of the circle enclosing the intersection. Using a normalized Gaussian with its apex at the geometric center of the intersection projection lets the network predict the probability that each image pixel belongs to an intersection, with the maximum at the intersection center; this increases positioning accuracy through more precise georeferencing of the landmark point in the global 3D dataset. A U-Net-type artificial neural network was trained as the intersection detector, with a differentiable analog of the Dice metric as the training loss. AdamW was used as the optimizer, coupled with a CosineAnnealingLR cosine learning rate scheduler. The final section of the paper presents the results of matching satellite data and airborne imagery with the proposed method.
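To make the pose-recovery step concrete, the following sketch solves the PnP stage on synthetic data: georeferenced 3D landmark coordinates are matched to their 2D detections in the aerial frame, and the pose minimizing the reprojection error is recovered with RANSAC outlier rejection. OpenCV's solvePnPRansac, the pinhole intrinsics K, and the synthetic landmark layout are illustrative assumptions only; the abstract does not prescribe a particular solver.

    # Minimal PnP sketch (assumed OpenCV solver, hypothetical intrinsics):
    # recover camera rotation and translation from 3D-2D correspondences.
    import numpy as np
    import cv2

    # Assumed pinhole intrinsics for a 1280x960 sensor; not from the article.
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 480.0],
                  [0.0, 0.0, 1.0]])
    dist = np.zeros(5)  # assume lens distortion already compensated

    # Synthetic ground-truth pose and 3D landmarks (road intersections)
    # expressed in a local metric frame derived from the georeferenced data.
    rng = np.random.default_rng(0)
    pts3d = rng.uniform([-200.0, -200.0, 0.0], [200.0, 200.0, 5.0], size=(30, 3))
    rvec_gt = np.array([0.05, -0.02, 0.01])
    tvec_gt = np.array([3.0, -2.0, 400.0])  # camera roughly 400 m above the scene

    # Project the landmarks into the image to stand in for the 2D detections
    # that the neural detector would supply in practice; add detector noise.
    pts2d, _ = cv2.projectPoints(pts3d, rvec_gt, tvec_gt, K, dist)
    pts2d = pts2d.reshape(-1, 2) + rng.normal(0.0, 0.5, size=(30, 2))

    # Robust PnP: RANSAC discards mismatched intersections, then the pose
    # minimizing the reprojection error over the inliers is refined.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, dist,
                                                 reprojectionError=3.0)
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix; camera centre = -R.T @ tvec
    print(ok, 0 if inliers is None else len(inliers), tvec.ravel())

RANSAC matters here because a fraction of detector-to-map matches will inevitably be wrong, and a single gross mismatch can ruin a least-squares pose fit.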
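The Gaussian labeling can likewise be illustrated with a short sketch that renders a unit-peak Gaussian heatmap centred on each intersection. The sigma = r/3 mapping from the enclosing-circle radius r is an assumed choice: the abstract states only that the Gaussian parameters are derived from that radius, without giving the formula.

    # Sketch of the Gaussian training-mask generation (assumed sigma = r/3).
    import numpy as np

    def gaussian_mask(h, w, centers_radii):
        """Render a float mask; centers_radii is an iterable of (cx, cy, r) in pixels."""
        ys, xs = np.mgrid[0:h, 0:w]
        mask = np.zeros((h, w), dtype=np.float32)
        for cx, cy, r in centers_radii:
            sigma = r / 3.0  # assumed: enclosing circle spans about 3 sigma
            g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
            mask = np.maximum(mask, g)  # value 1.0 exactly at each intersection centre
        return mask

    mask = gaussian_mask(512, 512, [(120, 200, 18), (400, 310, 25)])

Because the target peaks at exactly 1.0 at the geometric centre, the trained network's response surface also peaks there, which is what allows sub-region localization of the landmark rather than a coarse blob.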
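Finally, the named training setup (a differentiable Dice loss, AdamW, and a CosineAnnealingLR schedule) maps directly onto PyTorch, whose optimizer and scheduler carry those names; PyTorch is therefore assumed below. The tiny ConvNet is a stand-in for the actual U-Net detector (see ref. 17), and the random tensors stand in for batches from the mixed satellite/airborne dataset.

    # Sketch of the training loop under the assumptions stated above.
    import torch
    import torch.nn as nn

    def soft_dice_loss(pred, target, eps=1.0):
        """Differentiable analogue of the Dice metric on probability maps."""
        p = torch.sigmoid(pred).flatten(1)
        t = target.flatten(1)
        inter = (p * t).sum(dim=1)
        dice = (2.0 * inter + eps) / (p.sum(dim=1) + t.sum(dim=1) + eps)
        return 1.0 - dice.mean()

    # Stand-in for the U-Net; swap in any encoder-decoder segmentation net.
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 3, padding=1))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
    epochs = 50  # assumed; the abstract does not state the training length
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)

    for epoch in range(epochs):
        # A real loader would yield (image, Gaussian mask) pairs from the
        # mixed dataset; one random batch per epoch is used here.
        images = torch.rand(4, 3, 256, 256)
        masks = torch.rand(4, 1, 256, 256)
        opt.zero_grad()
        loss = soft_dice_loss(model(images), masks)
        loss.backward()
        opt.step()
        sched.step()  # cosine decay of the learning rate once per epoch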
References
1. Stepanov O.A., Nosov A.S. Algoritm korrektsii navigatsionnoy sistemy po dannym karty i izmeritelya, ne trebuyushchiy predvaritel'nogo otsenivaniya znacheniy polya vdol' proydennoy traektorii [Algorithm for correcting a navigation system based on map and meter data that does not require preliminary estimation of field values along the trajectory], Giroskopiya i navigatsiya [Gyroscopy and Navigation], 2020, Vol. 28, No. 2 (109), pp. 70-90.
2. Scaramuzza D., Fraundorfer F. Visual Odometry [Tutorial], IEEE Robotics and Automation Magazine, 2011, Vol. 18, No. 4, pp. 80-92.
3. Reichenbach M., Damker H., Federrath H., and Rannenberg K. Individual management of personal reachability in mobile communication, Proc. of the 13th International Information Security Conference, Copenhagen, Denmark, May 1997, pp. 164-174.
4. Lowe D.G. Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 2004, 60 (2), pp. 91-110.
5. Bay H., Ess A., Tuytelaars T., and Van Gool L. Speeded-up robust features (SURF), Computer Vision and Image Understanding, 2008, 110 (3), pp. 346-359.
6. Cadena C., Carlone L., Carrillo H., Latif Y., Scaramuzza D., Neira J., Reid I.D., Leonard J.J. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age, IEEE Transactions on Robotics, 2016, Vol. 32, No. 6, pp. 1309-1332.
7. Ghouaiel N. and Lefevre S. Coupling ground-level panoramas and aerial imagery for change detection, Geo-spatial Information Science, 2016, 19 (3), pp. 222-232.
8. Shukla P., Goel S., Singh P., and Lohani B. Automatic geolocation of targets tracked by aerial imaging platforms using satellite imagery, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2014, 40 (1), p. 381.
9. Dunegan E.D., Luecke T.E. Strapdown Inertial Navigation System Developments, Proceedings of the AIAA Guidance and Control Conference, 1965.
10. LeCun Y., Bottou L., Bengio Y., and Haffner P. Gradient-based learning applied to document recognition, Proceedings of the IEEE, 1998, Vol. 86, No. 11, pp. 2278-2324.
11. Krizhevsky A., Sutskever I., and Hinton G.E. ImageNet Classification with Deep Convolutional Neural Networks, Advances in Neural Information Processing Systems, 2012, Vol. 25.
12. Simonyan K. and Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv preprint arXiv:1409.1556, 2015.
13. He K., Zhang X., Ren S., and Sun J. Deep Residual Learning for Image Recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
14. Kendall A., Grimes M., Cipolla R. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
15. Dosovitskiy A., Beyer L., Kolesnikov A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, arXiv preprint arXiv:2010.11929, 2020.
16. Arandjelović R., Gronat P., Torii A., Pajdla T., and Sivic J. NetVLAD: CNN architecture for weakly supervised place recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5297-5307.
17. Ronneberger O., Fischer P., Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation, In: Navab N., Hornegger J., Wells W., Frangi A. (eds), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science, Vol. 9351, Springer, Cham, 2015. Available at: https://doi.org/10.1007/978-3-319-24574-4_28.
18. Ulmas P. and Liiv I. Segmentation of Satellite Imagery using U-Net Models for Land Cover Classification, arXiv preprint arXiv:2003.02899, 2020. Available at: https://doi.org/10.48550/arXiv.2003.02899.
19. Yadavendra S., Chand S. Semantic segmentation and detection of satellite objects using U-Net model of deep learning, Multimed Tools Appl., 2022, 81, pp. 44291-44310. Available at: https://doi.org/10.1007/s11042-022-12892-2.
20. Marchand E., Uchiyama H., and Spindler F. Pose Estimation for Augmented Reality: A Hands-On Survey, IEEE Transactions on Visualization and Computer Graphics, December 2016, 22 (12), pp. 2633-2651.
21. Fischler M.A. and Bolles R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography, Communications of the ACM, 1981, Vol. 24, No. 6, pp. 381-395. doi: 10.1145/358669.358692.