Geometry-Guided Domain Generalization for Monocular 3D Object Detection

Tsinghua University, BNRist, Hangzhou Zhuoxi Institute of Brain and Intelligence, HoloMatic Technology
AAAI 2024


Abstract

Monocular 3D object detection (M3OD) is important for autonomous driving. However, existing deep learning-based methods are prone to performance degradation in real-world scenarios due to the substantial domain gap between training and testing. The domain gaps in M3OD are complex, spanning camera intrinsic parameters, extrinsic parameters, image appearance, etc. Existing works primarily focus on the gap in camera intrinsic parameters, ignoring other key factors. Moreover, at the feature level, conventional domain invariant learning methods generally cause negative transfer, because they ignore the dependency between geometry tasks and domains. To tackle these issues, in this paper we propose MonoGDG, a Geometry-Guided Domain Generalization framework for M3OD, which effectively addresses the domain gap at both the camera and feature levels. Specifically, MonoGDG consists of two major components. One is geometry-based image reprojection, which mitigates the impact of camera discrepancies by unifying intrinsic parameters, randomizing camera orientations, and unifying the field-of-view range. The other is geometry-dependent feature disentanglement, which overcomes negative transfer by incorporating domain-shared and domain-specific features. Additionally, we leverage a depth-disentangled domain discriminator and a domain-aware geometry regression attention mechanism to account for the geometry-domain dependency. Extensive experiments on multiple autonomous driving benchmarks demonstrate that our method achieves state-of-the-art performance in domain generalization for M3OD.

The domain gaps in monocular 3D object detection are complex, including the focal length gap, the camera orientation gap, the image appearance gap, etc. (a) Two vehicles with the same 2D and 3D size are captured at different focal lengths, and their depths vary dramatically. (b) A higher camera pitch angle causes objects to appear lower in the image, leading the trained model to predict closer depths for them. (c) Variations in image appearance, such as adverse weather and simulation data, can considerably affect the contextual visual information perceived by the M3OD model.
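
To make the focal-length gap in panel (a) concrete: under the pinhole model, an object of 3D height H (meters) projected to a 2D height of h (pixels) by a camera with focal length f (pixels) lies at depth Z = f * H / h, so an identical-looking crop implies a depth proportional to f. A minimal sketch with hypothetical values (not taken from the paper):

        def depth_from_pinhole(f_px, height_3d_m, height_2d_px):
            """Depth under the pinhole model: Z = f * H / h."""
            return f_px * height_3d_m / height_2d_px

        H, h = 1.5, 100.0          # same 3D height (m) and projected 2D height (px)
        for f in (700.0, 1400.0):  # two hypothetical focal lengths (px)
            print(f"f = {f:6.1f} px -> Z = {depth_from_pinhole(f, H, h):.2f} m")
        # Doubling the focal length doubles the inferred depth for an identical
        # image crop: exactly the intrinsic-parameter domain gap shown in (a).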

(a) Conventional domain invariant learning techniques often lead to negative transfer in M3OD. The chart shows the DG performance of models trained on nuScenes and Lyft and tested on KITTI: as the extent of domain invariance increases, the accuracy of M3OD decreases significantly. (b-e) M3OD exhibits significant geometry-domain dependency, with notable disparities in the geometry distributions of different domains, such as objects' depth, dimensions, and rotation.
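
The domain invariant learning baseline in (a) is typically implemented with a gradient reversal layer (GRL), whose scaling coefficient controls the extent of domain invariance varied along the x-axis. Below is a generic PyTorch sketch of a GRL in the style of Ganin & Lempitsky (2015), not the authors' implementation:

        import torch

        class GradReverse(torch.autograd.Function):
            """Identity in the forward pass; negated, scaled gradient in the
            backward pass, so the feature extractor is trained to fool the
            domain discriminator."""

            @staticmethod
            def forward(ctx, x, lambda_):
                ctx.lambda_ = lambda_
                return x.view_as(x)

            @staticmethod
            def backward(ctx, grad_output):
                return -ctx.lambda_ * grad_output, None

        def grad_reverse(x, lambda_=1.0):
            # Larger lambda_ pushes features toward stronger domain invariance,
            # the quantity whose increase degrades M3OD accuracy in (a).
            return GradReverse.apply(x, lambda_)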

Overview of the proposed MonoGDG. At the camera level, Geometry-Based Image Reprojection is applied to the input images to address the camera domain gap; it comprises Intrinsic Parameter Unification (IPU), Spherical Reprojection (SR), Camera Orientation Randomization (COR), and FOV Range Unification (FOVRU). The features extracted from the reprojected images then undergo Geometry-Dependent Feature Disentanglement, which splits them into domain-shared and domain-specific branches. The Depth-Disentangled Domain Discriminator decouples depth from domain alignment, and Domain-Aware Geometry Regression Attention integrates the domain and geometry features. GRL denotes the gradient reversal layer.
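
As an illustration of the camera-level reprojection: unifying intrinsics amounts to warping each image so it appears to have been captured with one canonical intrinsic matrix. For a fixed camera center the mapping is the homography H = K_tgt R K_src^{-1}, with R = I for IPU alone and a randomized rotation R for COR (spherical reprojection is not a plain homography and is omitted here). A hedged OpenCV sketch, using example intrinsics loosely resembling nuScenes and KITTI cameras rather than the paper's exact procedure or values:

        import cv2
        import numpy as np

        def reproject_image(image, K_src, K_tgt, out_size, R=np.eye(3)):
            """Warp `image` from intrinsics K_src (and rotation R) to K_tgt."""
            H = K_tgt @ R @ np.linalg.inv(K_src)
            return cv2.warpPerspective(image, H, out_size)

        # Hypothetical source/target intrinsics (fx, fy, cx, cy in pixels).
        K_src = np.array([[1260.0,    0.0, 800.0],
                          [   0.0, 1260.0, 450.0],
                          [   0.0,    0.0,   1.0]])
        K_tgt = np.array([[ 721.5,    0.0, 609.6],
                          [   0.0,  721.5, 172.9],
                          [   0.0,    0.0,   1.0]])
        img = np.zeros((900, 1600, 3), dtype=np.uint8)   # placeholder frame
        warped = reproject_image(img, K_src, K_tgt, out_size=(1242, 375))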

3D bounding box and BEV predictions from DGMono3D (in red) and MonoGDG (in blue), with the ground truth (green, in BEV), under the PreSIL+nuScenes→KITTI setting. Zoom in for a clearer comparison.

BibTeX


        @inproceedings{yang2024geometry,
          title={Geometry-Guided Domain Generalization for Monocular 3D Object Detection},
          author={Yang, Fan and Chen, Hui and He, Yuwei and Zhao, Sicheng and Zhang, Chenghao and Ni, Kai and Ding, Guiguang},
          booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
          volume={38},
          number={6},
          pages={6467--6476},
          year={2024}
        }