Visual Computing

Disentangling Object Motion and Occlusion for Unsupervised Multi-frame Monocular Depth

Abstract: Conventional self-supervised monocular depth prediction methods assume a static environment, which degrades accuracy in dynamic scenes because object motion introduces mismatch and occlusion problems.

FocusTR: Focusing on Valuable Feature by Multiple Transformers for Fusing Feature Pyramid on Object Detection

Abstract: The feature pyramid, a vital component of convolutional neural networks, plays a significant role in several perception tasks, including object detection for autonomous driving. However, how to better fuse multi-level and multi-sensor feature pyramids remains a significant challenge, especially for object detection.

Class-Level Confidence Based 3D Semi-Supervised Learning

Abstract: Current pseudo-labeling strategies in 3D semi-supervised learning (SSL) fail to adaptively incorporate each class’s learning difficulty and the variance in its learning status. In this work, we demonstrate empirically that class-level confidence on unlabeled 3D data can represent the learning status.

Automated Wall-Climbing Robot for Concrete Construction Inspection

Abstract: Human-made concrete structures require cutting-edge inspection tools to ensure that construction quality meets the applicable building codes and to maintain the sustainability of aging infrastructure. This paper introduces a wall-climbing robot for metric concrete inspection that can reach difficult-to-access locations and obtain a close-up view for visual data collection and real-time flaw detection and localization.

SPD: Semi-Supervised Learning and Progressive Distillation for 3-D Detection

Abstract: The accuracy of current learning-based 3-D object detection is heavily affected by annotation quality. Achieving high overall detection accuracy for all classes across different scenarios remains a challenge given dataset sparsity.

Advancing Self-Supervised Monocular Depth Learning with Sparse LiDAR

Abstract: Self-supervised monocular depth prediction provides a cost-effective way to obtain the 3D location of each pixel. However, existing approaches usually yield unsatisfactory accuracy, even though accuracy is critical for autonomous robots.

Multimodal Semi-Supervised Learning for 3D Objects

Abstract: In recent years, semi-supervised learning has been widely explored and shows excellent data efficiency for 2D data. There is an emerging need to improve data efficiency for 3D tasks due to the scarcity of labeled 3D data.

PSE-Match: A Viewpoint-Free Place Recognition Method With Parallel Semantic Embedding

Abstract: Accurate localization for autonomous cars is essential for autonomy and driving safety, especially in complex urban streets and search-and-rescue subterranean environments where high-accuracy GPS is not available. However, without robust global localization, current odometry estimation may introduce drift in long-term navigation.

Multi-Scale Fusion With Matching Attention Model: A Novel Decoding Network Cooperated With NAS for Real-Time Semantic Segmentation

Abstract: This paper proposes a real-time multi-scale semantic segmentation network (MsNet). MsNet combines two parts: our novel multi-scale fusion with matching attention model (MFMA) as the decoding network, and either a network found by asymptotic neural architecture search (ANAS) or MobileNetV3 as the encoding network.

3D Mapping and Stability Prediction for Autonomous Wheelchairs

Abstract: Autonomous wheelchairs can address a very large need in many populations by serving as a gateway to much greater independence and mobility. The overall vision for integrating autonomous wheelchairs into the transportation chain is to let individuals use the intelligent wheelchair to reach the vehicle (regardless of terrain), mount into an autonomous wheelchair that navigates to the desired destination, and finally have the autonomous wheelchair dismount.