Abstract: Foundation models have made significant strides in 2D and language tasks such as image segmentation, object detection, and visual-language understanding. Nevertheless, their potential to enhance 3D scene representation learning remains largely untapped due to the domain gap.
Abstract: Recent advances in neural reconstruction using posed image sequences have made remarkable progress. However, due to the lack of depth information, existing volumetric-based techniques simply duplicate 2D image features of the object surface along the entire camera ray.
Abstract: This research proposes a novel semi-supervised learning framework FourStr (Four-Stream formed by two two-stream models) that focuses on the improvement of fusion and labeling efficiency for 3D multi-sensor detector. FourStr adopts a multi-sensor single-stage detector named adaptive fusion network (AFNet) as the backbone and trains it through the semi-supervision learning (SSL) strategy Stereo Fusion.
Abstract: Recent state-of-the-art method FlexMatch firstly demonstrated that correctly estimating learning status is crucial for semi-supervised learning (SSL). However, the estimation method proposed by FlexMatch does not take into account imbalanced data, which is the common case for 3D semi-supervised learning.
Abstract: Conventional self-supervised monocular depth prediction methods are based on a static environment assumption, which leads to accuracy degradation in dynamic scenes due to the mismatch and occlusion problems introduced by object motions.
Abstract: The feature pyramid, which is a vital component of the convolutional neural networks, plays a significant role in several perception tasks, including object detection for autonomous driving. However, how to better fuse multi-level and multi-sensor feature pyramids is still a significant challenge, especially for object detection.
Abstract: Current learning-based 3-D object detection accuracy is heavily impacted by the annotation quality. It is still a challenge to expect an overall high detection accuracy for all classes under different scenarios given the dataset sparsity.
Abstract: Recently significant progress has been made in 3D detection. However, it is still challenging to detect small contour objects under complex scenes. This paper proposes a novel Attention-based Multi-phase Multi-task Fusion (AMMF) that uses point-level, RoI-level, and multi-task fusions to complement the disadvantages of LiDAR and camera, to solve this challenge.
Abstract: Human-made concrete structures require cutting-edge inspection tools to ensure the quality of the construction to meet the applicable building codes and to maintain the sustainability of the aging infrastructure. This paper introduces a wall-climbing robot for metric concrete inspection that can reach difficult-to-access locations with a close-up view for visual data collection and real-time flaws detection and localization.
Abstract: Self-supervised monocular depth prediction provides a cost-effective solution to obtain the 3D location of each pixel. However, the existing approaches usually lead to unsatisfactory accuracy, which is critical for autonomous robots.