UniVoxel: A Novel Framework for 3-D Object Detection in Autonomous Vehicles With Multimodal Voxel Representation

Kaiqi Liu, Yuanyuan Deng, Jiaxun Tong*, Wei Li

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Fusing camera and LiDAR information is an effective means of achieving robust 3-D object detection. However, current multimodal 3-D methods typically rely on independent branches to extract features from each sensor separately, which underutilizes complementary information. In this article, a multimodal detector named UniVoxel is proposed, built on a query-based detection paradigm. UniVoxel integrates inputs from the different modalities into a voxel representation for fusion. Specifically, a semantic-guided query generator (SQG) is proposed, in which low-level voxel features are used to adaptively sample multiscale image features, producing unified multimodal voxel features. These multimodal voxel features contain both the geometric and the semantic information of the voxels and can ensure that the model focuses on the regions of interest (RoIs). Meanwhile, to maximize the use of complementary information, a fusion voxel encoder (FVE) is introduced to update the multimodal voxels through interaction with the multiscale semantic information from the different cameras. Extensive experiments are conducted on the nuScenes dataset. With the proposed framework, object detection precision improves on both the validation set and the test set.
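
The idea of letting voxels sample multiscale image features can be illustrated with a small sketch. This is not the authors' SQG: the abstract gives no implementation details, so the sketch swaps in a fixed pinhole projection and plain bilinear sampling in place of learned, adaptive sampling. All names here (`project_points`, `bilinear_sample`, `fuse_voxels`, the 3x4 matrix `proj`) are hypothetical and exist only for illustration.

```python
import numpy as np

def project_points(points_xyz, proj):
    """Project (N, 3) points with a 3x4 matrix.

    Returns (N, 2) pixel coordinates and an (N,) mask of points in front of the camera.
    """
    homo = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)
    cam = homo @ proj.T                      # (N, 3): x*z, y*z, z
    depth = cam[:, 2]
    valid = depth > 1e-6
    uv = cam[:, :2] / np.where(valid, depth, 1.0)[:, None]
    return uv, valid

def bilinear_sample(feat, uv):
    """Sample a (C, H, W) feature map at (N, 2) coordinates (u = column, v = row)."""
    _, h, w = feat.shape
    u = np.clip(uv[:, 0], 0, w - 1)
    v = np.clip(uv[:, 1], 0, h - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, w - 1)
    v1 = np.minimum(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    top = feat[:, v0, u0] * (1 - du) + feat[:, v0, u1] * du
    bot = feat[:, v1, u0] * (1 - du) + feat[:, v1, u1] * du
    return (top * (1 - dv) + bot * dv).T     # (N, C)

def fuse_voxels(voxel_feats, voxel_centers, image_feats, proj, image_size):
    """Concatenate voxel features with image features sampled at each scale.

    Assumption: fusion by concatenation; voxels outside the image get zeros.
    """
    img_w, img_h = image_size
    uv, valid = project_points(voxel_centers, proj)
    inside = (valid
              & (uv[:, 0] >= 0) & (uv[:, 0] <= img_w - 1)
              & (uv[:, 1] >= 0) & (uv[:, 1] <= img_h - 1))
    sampled = []
    for feat in image_feats:
        _, h, w = feat.shape
        # Map full-resolution pixel coordinates onto this feature map's grid.
        scale = np.array([(w - 1) / (img_w - 1), (h - 1) / (img_h - 1)])
        sampled.append(bilinear_sample(feat, uv * scale) * inside[:, None])
    return np.concatenate([voxel_feats] + sampled, axis=1)
```

Each voxel ends up with its own geometric features followed by one block of image features per scale, which is one simple way to obtain a single multimodal voxel representation. A learned approach would replace the fixed sampling locations with predicted ones.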

Original language: English
Pages (from-to): 33142-33152
Number of pages: 11
Journal: IEEE Sensors Journal
Volume: 25
Issue number: 17
Publication status: Published - 2025
Externally published: Yes

Keywords

  • 3-D object detection
  • deformable attention
  • multimodal backbone
