Visual Computing | AutoAI Lab

From MIRAGE to CLEAR: Component-Level Explainable Anomaly Reasoning for Autonomous Vehicle Perception Systems

Abstract: Autonomous vehicle (AV) perception systems remain vulnerable to failures that current anomaly detectors can flag but cannot trace to a specific component, an attribution gap that impedes forensics and compliance with emerging transparency mandates like the EU AI Act.

Sim2Real Diffusion: Leveraging Foundation Vision Language Models for Adaptive Automated Driving

Abstract: Simulation-based design, optimization, and validation of autonomous vehicles have proven to be crucial for their improvement over the years. Nevertheless, the ultimate measure of effectiveness is their successful transition from simulation to reality (sim2real).

Point Cloud Self-supervised Learning via 3D to Multi-view Masked Learner

Abstract: Recently, multi-modal masked autoencoders (MAE) have been introduced in 3D self-supervised learning, offering enhanced feature learning by leveraging both 2D and 3D data to capture richer cross-modal representations. However, these approaches have two limitations: (1) they inefficiently require both 2D and 3D modalities as inputs, even though the inherent multi-view properties of 3D point clouds already contain the 2D modality.

Attention-Aware Temporal Adversarial Shadows on Traffic Sign Sequences

Abstract: We present a framework for black-box adversarial attacks on traffic signs using dynamic, temporally coherent shadows. Unlike prior work that focuses on single-image attacks or relies on conspicuous physical artifacts, our method operates over entire image sequences, mimicking realistic scenarios where a traffic sign is observed from varying distances.

SAM-Guided Masked Token Prediction for 3D Scene Understanding

Abstract: Foundation models have significantly enhanced 2D task performance, and recent works like Bridge3D have successfully applied these models to improve 3D scene understanding through knowledge distillation, marking considerable advancements. Nonetheless, challenges such as the misalignment between 2D and 3D representations and the persistent long-tail distribution in 3D datasets still restrict the effectiveness of knowledge distillation from 2D to 3D using foundation models.

DyConfidMatch: Dynamic Thresholding and Re-sampling for 3D Semi-supervised Learning

NARUTO: Neural Active Reconstruction from Uncertain Target Observations

Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models

Abstract: Foundation models have made significant strides in 2D and language tasks such as image segmentation, object detection, and visual-language understanding. Nevertheless, their potential to enhance 3D scene representation learning remains largely untapped due to the domain gap.

MuTrans: Multiple Transformers for Fusing Feature Pyramid on 2D and 3D Object Detection

Abstract: As one of the major components of a neural network, the feature pyramid plays a vital part in perception tasks such as object detection in autonomous driving. However, fusing multi-level and multi-sensor feature pyramids for object detection remains a challenge.

Rethinking 3D Geometric Feature Learning for Neural Reconstruction

Abstract: Neural reconstruction from posed image sequences has made remarkable progress in recent years. However, due to the lack of depth information, existing volumetric-based techniques simply duplicate 2D image features of the object surface along the entire camera ray.