Abstract: Foundation models have significantly enhanced 2D task performance, and recent works like Bridge3D have successfully applied these models to improve 3D scene understanding through knowledge distillation, marking considerable advancements. Nonetheless, challenges such as the misalignment between 2D and 3D representations and the persistent long-tail distribution in 3D datasets still restrict the effectiveness of knowledge distillation from 2D to 3D using foundation models.
Abstract: Disability knows no boarders, so the development of assistive technology is an international effort. This review is a follow up to our previous comprehensive review (Leaman 2017) and a recent mini-review (Sivakanthan 2022).
Abstract: Foundation models have made significant strides in 2D and language tasks such as image segmentation, object detection, and visual-language understanding. Nevertheless, their potential to enhance 3D scene representation learning remains largely untapped due to the domain gap.
Abstract: Recent advances in neural reconstruction using posed image sequences have made remarkable progress. However, due to the lack of depth information, existing volumetric-based techniques simply duplicate 2D image features of the object surface along the entire camera ray.
Abstract: Conventional self-supervised monocular depth prediction methods are based on a static environment assumption, which leads to accuracy degradation in dynamic scenes due to the mismatch and occlusion problems introduced by object motions.
Abstract: Self-supervised monocular depth prediction provides a cost-effective solution to obtain the 3D location of each pixel. However, the existing approaches usually lead to unsatisfactory accuracy, which is critical for autonomous robots.
Abstract: This research proposes a novel semi-supervised learning framework FourStr (Four-Stream formed by two two-stream models) that focuses on the improvement of fusion and labeling efficiency for 3D multi-sensor detector. FourStr adopts a multi-sensor single-stage detector named adaptive fusion network (AFNet) as the backbone and trains it through the semi-supervision learning (SSL) strategy Stereo Fusion.
Abstract: Human-made concrete structures require cutting-edge inspection tools to ensure the quality of the construction to meet the applicable building codes and to maintain the sustainability of the aging infrastructure. This paper introduces a wall-climbing robot for metric concrete inspection that can reach difficult-to-access locations with a close-up view for visual data collection and real-time flaws detection and localization.