GroundSLAM: A Robust Visual SLAM System for Warehouse Robots Using Ground Textures

A robust visual localization and mapping system is essential for warehouse robot navigation, as cameras offer a more cost-effective alternative to LiDAR sensors. However, existing forward-facing camera systems often struggle in dynamic environments and open spaces, leading to significant performance degradation during deployment. To address these limitations, a localization system that uses a single downward-facing camera to capture ground textures presents a promising solution. Nevertheless, existing feature-based ground-texture localization methods have difficulty on surfaces with sparse features or repetitive patterns. To overcome this, we propose GroundSLAM, a novel feature-free, ground-texture-based simultaneous localization and mapping (SLAM) system. GroundSLAM consists of three components: feature-free visual odometry, ground-texture-based loop detection and map optimization, and map reuse. Specifically, we introduce a kernel cross-correlator (KCC) for image-level pose tracking, loop detection, and map reuse to improve localization accuracy and robustness, and we incorporate adaptive pruning strategies to enhance efficiency. Owing to these designs, GroundSLAM delivers efficient and stable localization across various ground surfaces, including those with sparse features and repetitive patterns. To advance research in this area, we introduce the first ground-texture dataset with precise ground-truth poses, consisting of 131k images collected from 10 types of indoor and outdoor ground surfaces. Extensive experimental results show that GroundSLAM outperforms state-of-the-art methods for both indoor and outdoor localization.

Motivation

When deploying visual SLAM on warehouse robots, numerous hurdles must be overcome. On one hand, in environments like the one shown below, where multiple robots collaborate to transport goods within the warehouse, the high dynamism often leads to visual localization errors. On the other hand, warehouses are typically vast, so static features are far from the camera, which reduces the accuracy of visual SLAM systems. To address this, we present GroundSLAM for warehouse robots using ground textures, which includes efficient visual odometry, loop closure detection, and map reuse. Our system provides robust pose estimation and localization in environments with many dynamic objects or open spaces using only a monocular downward-facing camera.

Warehouse and warehouse robots. In the warehouse, dynamic objects (robots and storage racks) and the large-scale environment make localization with a forward-facing camera or LiDAR very challenging.

System Design

The pipeline of GroundSLAM is shown below. The inputs are grayscale images taken by a downward-facing camera; the outputs are 3-DOF camera poses, specifically 2-DOF translation and rotation. The workflow is straightforward: the initial frame is designated as the first keyframe; when a subsequent frame arrives, the relative motion between it and the current keyframe is estimated; based on the estimated relative pose and its confidence, a new keyframe is selected and added to the map; we then search for its neighboring keyframes in the map to detect a loop closure; if a loop closure is found, we run a pose graph optimization to minimize drift error.

The structure of GroundSLAM. The rotation and translation are decoupled and estimated using the proposed KCC.
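
To make this pipeline concrete, below is a minimal Python sketch of the keyframe-based loop. All names (`Keyframe`, `register`, `find_neighbors`, `optimize`) are hypothetical placeholders rather than the authors' implementation; in GroundSLAM, `register` corresponds to the KCC-based image-level registration.

```python
import math
from dataclasses import dataclass

@dataclass
class Keyframe:
    pose: tuple    # (x, y, theta) in the map frame
    image: object  # grayscale ground-texture image

def compose(pose, rel):
    """SE(2) composition: apply the relative motion `rel`, measured in
    the frame of `pose`, to obtain the new absolute pose."""
    x, y, th = pose
    dx, dy, dth = rel
    c, s = math.cos(th), math.sin(th)
    return (x + c * dx - s * dy, y + s * dx + c * dy, th + dth)

def slam_loop(frames, register, find_neighbors, optimize, conf_threshold=0.5):
    """Keyframe-based tracking with loop detection and correction.

    `register(img_a, img_b)` -> ((dx, dy, dtheta), confidence) is the
    image-level registration; `find_neighbors` and `optimize` stand in
    for spatial keyframe search and pose graph optimization.
    """
    frames = iter(frames)
    keyframes = [Keyframe(pose=(0.0, 0.0, 0.0), image=next(frames))]
    edges = []  # (i, j, relative_pose) constraints of the pose graph

    for image in frames:
        ref = keyframes[-1]
        rel, conf = register(ref.image, image)  # track against keyframe
        pose = compose(ref.pose, rel)

        # A dropping confidence means the frame has moved too far from
        # the reference, so promote it to a new keyframe (simplified).
        if conf < conf_threshold:
            keyframes.append(Keyframe(pose, image))
            edges.append((len(keyframes) - 2, len(keyframes) - 1, rel))

            # Search spatially nearby keyframes for a loop closure.
            for j in find_neighbors(keyframes[:-1], pose):
                loop_rel, loop_conf = register(keyframes[j].image, image)
                if loop_conf > conf_threshold:
                    edges.append((j, len(keyframes) - 1, loop_rel))
                    optimize(keyframes, edges)  # reduce accumulated drift
                    break
    return keyframes, edges
```

Note how tracking, keyframe selection, and loop detection all reuse the same confidence-weighted registration, which is what allows the pipeline to remain feature-free.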

We also present a kernel cross-correlator (KCC) for robust image-level registration. Compared with feature-based methods, KCC achieves more robust and accurate estimation on grounds with few features or repetitive patterns.
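
The KCC formulation itself is given in the AAAI 2018 paper listed at the bottom of this page. To give a flavor of how image-level correlation recovers motion without features, here is a minimal NumPy sketch using classic FFT-based phase correlation: a plain linear correlator standing in for the kernelized version. The confidence heuristic and the `eps` regularizer are our illustrative choices, and the rotation estimation that GroundSLAM decouples from translation is omitted.

```python
import numpy as np

def correlate_translation(ref, cur, eps=1e-6):
    """Estimate the 2-D shift between two same-sized grayscale images
    from the peak of their normalized cross-power spectrum (classic
    phase correlation; a simplified, non-kernelized relative of KCC)."""
    F_ref = np.fft.fft2(ref.astype(float))
    F_cur = np.fft.fft2(cur.astype(float))
    cross = F_cur * np.conj(F_ref)
    response = np.fft.ifft2(cross / (np.abs(cross) + eps)).real

    # The peak location encodes the displacement of `cur` relative to
    # `ref`; its relative height serves as a matching confidence.
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    confidence = float(response[dy, dx] / (np.abs(response).sum() + eps))

    # Convert wrapped FFT indices to signed shifts.
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return (int(dx), int(dy)), confidence
```

Because the correlation response is computed over the whole image rather than at sparse corners, a sharp peak can still emerge on surfaces where a feature detector finds almost nothing.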

Experiments

The figure below compares the data association of ORB, SIFT, and KCC. Since KCC operates as an image-matching technique, its estimates are plotted with the normalized confidence on the vertical axis and the corresponding rotational and translational movements on the horizontal axis. KCC is consistently accurate and stable across diverse ground textures, as evidenced by its sharply pronounced peaks, which tower over all other positions. In contrast, the performance of feature-based methods fluctuates: they perform well on textures rich in distinctive corners, such as “parking place” and “workroom linoleum”, but struggle to find enough matches on “doormat”, “garage concrete”, and “terrace pavement”. On the “kitchen” texture, they fail to detect even an adequate number of features.

Comparison of data association with ORB, SIFT, and KCC on different ground textures.
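
For reference, the kind of feature-based baseline compared above can be sketched in a few lines with OpenCV. The detector choice and ratio-test parameters here are illustrative, not the exact settings used in the experiments:

```python
import cv2

def orb_association(img_a, img_b, ratio=0.75):
    """Feature-based data association: ORB keypoints, brute-force
    Hamming matching, and Lowe's ratio test. On sparse or repetitive
    ground textures the number of surviving matches collapses, which
    is the failure mode discussed above."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return []  # not even enough features to attempt matching

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des_a, des_b, k=2)
    return [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
```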

The VO comparison and loop detection experiments are shown below. Our system achieves better performance than the state-of-the-art (SOTA) ground-texture-based localization systems. Moreover, the pose errors decrease significantly after loop correction, which demonstrates the effectiveness of the proposed loop detection method.

The trajectories of visual odometry (left) and visual SLAM (right).
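
The loop correction itself is a standard pose graph optimization over SE(2). The paper does not tie it to a particular solver; the sketch below is our illustrative formulation with `scipy.optimize.least_squares`, where every odometry or loop-closure edge contributes the difference between the measured and predicted relative pose:

```python
import numpy as np
from scipy.optimize import least_squares

def wrap(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi

def residuals(flat, edges):
    poses = flat.reshape(-1, 3)  # rows of (x, y, theta)
    res = []
    for i, j, (dx, dy, dth) in edges:
        xi, yi, thi = poses[i]
        xj, yj, thj = poses[j]
        c, s = np.cos(thi), np.sin(thi)
        # Motion of pose j expressed in the frame of pose i.
        res.append( c * (xj - xi) + s * (yj - yi) - dx)
        res.append(-s * (xj - xi) + c * (yj - yi) - dy)
        res.append(wrap(thj - thi - dth))
    res.extend(poses[0])  # anchor the first pose to fix the gauge
    return np.asarray(res)

def optimize_pose_graph(poses, edges):
    """Optimize an SE(2) pose graph with nonlinear least squares.
    `poses`: (N, 3) initial guesses; `edges`: (i, j, (dx, dy, dtheta))
    odometry and loop-closure constraints measured in frame i."""
    x0 = np.asarray(poses, dtype=float).ravel()
    sol = least_squares(residuals, x0, args=(edges,))
    return sol.x.reshape(-1, 3)
```

Loop-closure edges pull the revisited poses together, and the optimizer redistributes the accumulated drift along the whole trajectory.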

Live Mapping Demo

We showcase a real-time mapping demonstration with loop detection and correction. The data was gathered in a warehouse by an Automated Guided Vehicle (AGV). We maneuvered the robot along a rectangular trajectory (highlighted by the green line) so that it started and stopped at the same location (indicated by the red square), effectively creating a loop. Images from this path are then composited using the pose estimates produced by GroundSLAM, both without (left) and with (right) loop correction. Due to drift error, the stitched map without loop correction exhibits noticeable blurring; on the right, the blurring is gone, underscoring the effective elimination of drift and the precision of the pose corrections.

A live mapping demo of GroundSLAM without (left) and with (right) loop correction. The drift error of our VO module is only 0.2% of the trajectory length, significantly lower than that of other SOTA systems (about 1%). The drift causes the blurring in the stitched map (left), which is eliminated by our loop correction (right).
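
For illustration, a stitched map like the one above can be produced by warping each frame onto a common canvas according to its estimated pose. The sketch below uses OpenCV; the meters-to-pixels scale `px_per_m`, canvas size, and origin are hypothetical values that in practice depend on the camera height and intrinsics:

```python
import cv2
import numpy as np

def stitch_ground_map(images, poses, canvas_hw=(2000, 2000),
                      origin=(1000.0, 1000.0), px_per_m=500.0):
    """Composite downward-facing frames into one map image using their
    estimated (x, y, theta) poses. With accurate poses the overlapping
    regions align; drift shows up as blur, as in the demo above."""
    canvas = np.zeros(canvas_hw, dtype=np.uint8)
    for img, (x, y, th) in zip(images, poses):
        h, w = img.shape[:2]
        # Rotate the frame about its center, then translate the center
        # to the pose's position on the canvas (meters -> pixels).
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), np.degrees(th), 1.0)
        M[0, 2] += origin[0] + x * px_per_m - w / 2.0
        M[1, 2] += origin[1] + y * px_per_m - h / 2.0
        warped = cv2.warpAffine(img, M, (canvas_hw[1], canvas_hw[0]))
        canvas = np.maximum(canvas, warped)  # simple max-overlay blending
    return canvas
```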

Publications

  1. GroundSLAM: A Robust Visual SLAM System for Warehouse Robots Using Ground Textures.
    Kuan Xu, Zheng Yang, Lihua Xie, Chen Wang.
    arXiv preprint arXiv:2302.14884, 2025.

This work is based on

  1. Kernel Cross-Correlator.
    Chen Wang, Le Zhang, Lihua Xie, Junsong Yuan.
    Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), pp. 4179–4186, 2018.