iSeries Articles Released

Self-supervised Neural-Symbolic Learning for Robot Autonomy


Imperative learning (IL) is a self-supervised neural-symbolic learning framework for robot autonomy.

A prototype of IL first appeared in the iSLAM paper, and it was later formally defined in the following article:

  1. Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy.
    Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao.
    arXiv preprint arXiv:2406.16087, 2024.
    SAIR Lab Recommended

This iSeries collects articles from the SAIR Lab, named after the leading character “i” in “imperative learning”. In the iSeries collection, IL has been applied to various tasks, including path planning, feature matching, and multi-robot routing.

The list of iSeries articles

  1. iWalker: Imperative Visual Planning for Walking Humanoid Robot.
    Xiao Lin, Yuhao Huang, Taimeng Fu, Xiaobin Xiong, Chen Wang.
    arXiv preprint arXiv:2409.18361, 2024.
    SAIR Lab Recommended
  2. iMatching: Imperative Correspondence Learning.
    Zitong Zhan, Dasong Gao, Yun-Jou Lin, Youjie Xia, Chen Wang.
    European Conference on Computer Vision (ECCV), 2024.
    SAIR Lab Recommended
  3. iMTSP: Solving Min-Max Multiple Traveling Salesman Problem with Imperative Learning.
    Yifan Guo, Zhongqiang Ren, Chen Wang.
    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024.
    SAIR Lab Recommended
  4. iSLAM: Imperative SLAM.
    Taimeng Fu, Shaoshu Su, Yiren Lu, Chen Wang.
    IEEE Robotics and Automation Letters (RA-L), 2024.
    SAIR Lab Recommended
  5. iA*: Imperative Learning-based A* Search for Pathfinding.
    Xiangyu Chen, Fan Yang, Chen Wang.
    arXiv preprint arXiv:2403.15870, 2024.
    SAIR Lab Recommended
  6. iPlanner: Imperative Path Planning.
    Fan Yang, Chen Wang, Cesar Cadena, Marco Hutter.
    Robotics: Science and Systems (RSS), 2023.
    SAIR Lab Recommended

This blog briefly explains IL from a high-level perspective; readers can find a more in-depth explanation in the paper.

Readers may also find a slide deck at this link, which provides a more interactive format.

IL is designed to alleviate the challenges of robot learning frameworks such as reinforcement learning and imitation learning.
Challenges of robot learning frameworks such as reinforcement learning and imitation learning.

Why do we need Neural-Symbolic AI?

  • To combine the advantages of both neural and symbolic methods.
  • To overcome the challenges of existing robot learning frameworks.
The advantages of neural and symbolic methods.

What is Neural-Symbolic AI?

  • There is still NO consensus on Neural-Symbolic (NeSy) AI.
  • We have a narrow and a broader definition, where the difference lies mainly in the scope of “symbols”.
The definition of Neural-Symbolic AI.

What are some examples of existing Neural-Symbolic AI?

  • Although many methods do not explicitly say so, they can be viewed as Neural-Symbolic AI.
The examples and methods of Neural-Symbolic AI.

Why do we need Imperative Learning?

  • Imperative learning is a self-supervised neural-symbolic learning framework.
  • It is designed to overcome four key challenges with a single design based on bilevel optimization:
    • Limited generalization ability, black-box nature, label intensiveness, and sub-optimality.

What is Imperative Learning?

  • The framework of imperative learning (IL) consists of three primary modules including a neural perceptual network, a symbolic reasoning engine, and a general memory system.
  • IL is formulated as a special bilevel optimization, enabling reciprocal learning and mutual correction among the three modules.
The framework of imperative learning.

Denote the neural system as \(\boldsymbol z = f({\boldsymbol{\theta}}, \boldsymbol{x})\), where \(\boldsymbol{x}\) represents the sensor measurements, \({\boldsymbol{\theta}}\) represents the perception-related learnable parameters, and \(\boldsymbol z\) represents the neural outputs such as semantic attributes; the reasoning engine as \(g(f, M, {\boldsymbol{\mu}})\) with reasoning-related parameters \({\boldsymbol{\mu}}\); and the memory system as \(M({\boldsymbol{\gamma}}, {\boldsymbol{\nu}})\), where \({\boldsymbol{\gamma}}\) denotes perception-related memory parameters and \({\boldsymbol{\nu}}\) denotes reasoning-related memory parameters. Therefore, imperative learning (IL) is formulated as a special BLO:

\[\begin{align} \min_{ \boldsymbol \psi \doteq [{\boldsymbol{\theta}}^\top,~{\boldsymbol{\gamma}}^\top]^\top} & U\left(f({\boldsymbol{\theta}}, \boldsymbol{x}), g({\boldsymbol{\mu}}^*), M({\boldsymbol{\gamma}}, {\boldsymbol{\nu}}^*)\right), \label{eq:high-il} \\ \textrm{s.t.} \quad & \boldsymbol \phi^* \in \arg\min_{ \boldsymbol \phi \doteq [{\boldsymbol{\mu}}^\top,~{\boldsymbol{\nu}}^\top]^\top} L(f({\boldsymbol{\theta}}, \boldsymbol{x}), g({\boldsymbol{\mu}}), M({\boldsymbol{\gamma}}, {\boldsymbol{\nu}})), \label{eq:low-il} \\ &\textrm{s.t.} \quad \xi(M({\boldsymbol{\gamma}}, {\boldsymbol{\nu}}), {\boldsymbol{\mu}}, f({\boldsymbol{\theta}}, \boldsymbol{x})) = \text{ or }\leq 0, \label{eq:il-constraint} \end{align}\]

where \(\xi\) is a general constraint (either equality or inequality); \(U\) and \(L\) are the upper-level (UL) and lower-level (LL) cost functions; and \(\boldsymbol \psi \doteq [{\boldsymbol{\theta}}^\top, {\boldsymbol{\gamma}}^\top]^\top\) are stacked UL variables and \(\boldsymbol \phi \doteq [{\boldsymbol{\mu}}^\top, {\boldsymbol{\nu}}^\top]^\top\) are stacked LL variables, respectively. Alternatively, \(U\) and \(L\) are also referred to as the neural cost and symbolic cost, respectively.
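To make the formulation concrete, below is a minimal PyTorch-style sketch of one IL training step under strong simplifications: the memory \(M\) and constraint \(\xi\) are omitted, the reasoning engine is reduced to a quadratic LL cost, and all names (`f`, `mu`, `lower_cost`, `upper_cost`) are illustrative stand-ins rather than the paper's implementation.

```python
import torch

# Illustrative stand-ins: f is the neural perception network (UL parameters
# theta); mu plays the role of the symbol-like LL parameters. The memory
# system M and the constraint xi are omitted for brevity.
f = torch.nn.Linear(16, 4)               # stands in for a perception network
mu = torch.zeros(4, requires_grad=True)  # symbol-like LL parameters

def lower_cost(z, mu):
    """LL cost L: a toy reasoning residual between neural output z and mu."""
    return ((z - mu) ** 2).sum()

def upper_cost(z, mu_star):
    """UL cost U: drives the neural output to align with the optimized reasoning."""
    return ((z - mu_star.detach()) ** 2).mean()

outer_opt = torch.optim.Adam(f.parameters(), lr=1e-3)
inner_opt = torch.optim.SGD([mu], lr=1e-1)
x = torch.randn(8, 16)                   # sensor measurements

for step in range(100):
    z = f(x).mean(dim=0)                 # neural output z = f(theta, x)
    # Lower level: solve mu* = argmin_mu L for the current theta (z detached).
    for _ in range(10):
        inner_opt.zero_grad()
        lower_cost(z.detach(), mu).backward()
        inner_opt.step()
    # Upper level: update theta with the UL cost U evaluated at mu*.
    outer_opt.zero_grad()
    upper_cost(f(x).mean(dim=0), mu).backward()
    outer_opt.step()
```

Note that this simple alternating scheme detaches the neural output in the inner loop, so the coupling term \(\frac{\partial {\boldsymbol{\mu}}^*}{\partial {\boldsymbol{\theta}}}\) is dropped; handling that term properly is exactly the optimization challenge discussed below.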

  • The term “imperative” is used to denote the passive nature of the learning process:
    • Once optimized, the neural system \(f\) in the UL cost will be driven to align with the LL reasoning engine \(g\),
      • e.g., a logical, physical, or geometrical reasoning process with constraint \(\xi\).
    • Therefore, IL can learn to generate logically, physically, or geometrically feasible semantic attributes or predicates.
  • In some applications, \(\boldsymbol \psi\) and \(\boldsymbol \phi\) are also referred to as neuron-like and symbol-like parameters, respectively.
Self-supervised Nature
  • Many symbolic reasoning engines, including geometric, physical, and logical ones, can be optimized or solved without labels.
    • For example, A\(^*\) search, geometrical reasoning such as bundle adjustment (BA), and physical reasoning like model predictive control (MPC) can all be optimized without ground-truth labels.
  • The IL framework leverages this phenomenon and jointly optimizes the three modules by bilevel optimization, which forces the three modules to mutually correct each other.
  • Consequently, all three modules can learn and evolve in a self-supervised manner by observing the world.
  • Although IL is designed for self-supervised learning, it can easily adapt to supervised or weakly supervised learning by involving labels either in UL or LL cost functions or both.
Overcoming the Other Challenges
  • The symbolic module offers better Interpretability and Generalization Ability due to its explainable design.
  • The Optimality is brought by bilevel optimization, compared to training the neural and symbolic modules separately.
Optimization Challenge
  • The solution to IL mainly involves solving the UL parameters \({\boldsymbol{\theta}}\) and \({\boldsymbol{\gamma}}\) and the LL parameters \({\boldsymbol{\mu}}\) and \({\boldsymbol{\nu}}\).
  • Intuitively, the UL parameters, which are often neuron-like weights, can be updated with the gradients of the UL cost \(U\):
\[\begin{aligned}\label{eq:solution} \nabla_{\boldsymbol{\theta}} U &= \frac{\partial U}{\partial f} \frac{\partial f}{\partial {\boldsymbol{\theta}}} + \frac{\partial U}{\partial g} \frac{\partial g}{\partial {\boldsymbol{\mu}}^*}{\color{blue}\frac{\partial {\boldsymbol{\mu}}^*}{\partial {\boldsymbol{\theta}}}} + \frac{\partial U}{\partial M}\frac{\partial M}{\partial {\boldsymbol{\nu}}^*}{\color{blue}\frac{\partial {\boldsymbol{\nu}}^*}{\partial {\boldsymbol{\theta}}}}, \\\nabla_{\boldsymbol{\gamma}} U& = \frac{\partial U}{\partial M} \frac{\partial M}{\partial {\boldsymbol{\gamma}}} + \frac{\partial U}{\partial g} \frac{\partial g}{\partial {\boldsymbol{\mu}}^*} {\color{blue}\frac{\partial {\boldsymbol{\mu}}^*}{\partial {\boldsymbol{\gamma}}}} +\frac{\partial U}{\partial M} \frac{\partial M}{\partial {\boldsymbol{\nu}}^*} {\color{blue}\frac{\partial {\boldsymbol{\nu}}^*}{\partial {\boldsymbol{\gamma}}}}. \end{aligned}\]
  • Since \(U\), \(L\), \(M\), \(g\), and \(f\) are often well defined, the challenge is to compute the derivative of the lower-level (symbol-like) parameters w.r.t. the upper-level (neuron-like) parameters, \(\color{blue}\frac{\partial \boldsymbol \phi^*}{\partial \boldsymbol \psi}\), which takes the form:
\[{\color{blue} \frac{\partial \boldsymbol \phi^*}{\partial \boldsymbol \psi}} = \begin{bmatrix} {\color{blue} \frac{\partial \boldsymbol \mu^*}{\partial \boldsymbol \theta}} & {\color{blue}\frac{\partial \boldsymbol \mu^*}{\partial \boldsymbol \gamma}} \\ {\color{blue}\frac{\partial \boldsymbol \nu^*}{\partial \boldsymbol \theta}} & {\color{blue}\frac{\partial \boldsymbol \nu^*}{\partial \boldsymbol \gamma}} \end{bmatrix}\]
  • There are generally two ways to compute it, i.e., unrolled differentiation and implicit differentiation; see the paper for more details.
  • Since \(\boldsymbol \phi \doteq [{\boldsymbol{\mu}}^\top,~{\boldsymbol{\nu}}^\top]^\top\) are LL parameters, the solution depends on the specific LL tasks; a sketch of the unrolled variant is given below.
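To illustrate the unrolled variant, the sketch below keeps a few differentiable inner updates on \({\boldsymbol{\mu}}\) in the autograd graph, so that \(\color{blue}\frac{\partial {\boldsymbol{\mu}}^*}{\partial {\boldsymbol{\theta}}}\) is obtained by back-propagating through the unrolled steps; the quadratic LL cost and all names are illustrative assumptions, not the paper's code.

```python
import torch

def unrolled_inner(z, mu0, lr=0.1, steps=10):
    """Unrolled LL optimization: every update stays in the autograd graph,
    so gradients of mu* w.r.t. theta (through z) become available."""
    mu = mu0
    for _ in range(steps):
        L = ((z - mu) ** 2).sum()          # toy LL cost L(f(theta, x), g(mu))
        g, = torch.autograd.grad(L, mu, create_graph=True)
        mu = mu - lr * g                   # differentiable update (no detach)
    return mu

f = torch.nn.Linear(16, 4)                 # stands in for the perception network
x = torch.randn(8, 16)
z = f(x).mean(dim=0)                       # z = f(theta, x)
mu0 = torch.zeros(4, requires_grad=True)

mu_star = unrolled_inner(z, mu0)
U = ((z - mu_star) ** 2).mean()            # UL cost evaluated at mu*
U.backward()                               # includes the blue term d mu*/d theta
print(f.weight.grad.norm())                # nonzero: gradient flowed through mu*
```

Unrolling is simple but stores every inner step; implicit differentiation instead solves a linear system at the optimum, trading memory for an extra solve.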

Applications and Examples

  • The paper provides five distinct examples covering the different cases of LL tasks.
The five distinct examples and their LL optimization methods.

Path Planning

In the case where the LL tasks have closed-form solutions, we provide examples in both global and local path planning.

Global Path Planning
  • A\(^*\) is widely used due to its optimality, but it often suffers from low efficiency due to its large search space.
  • Therefore, we could leverage a neural module to predict a confined search space, leading to overall improved efficiency.
  • We take A\(^*\) as the symbolic reasoning engine and train the neural module in a self-supervised way based on IL.
  • This results in a new framework, which is referred to as iA\(^*\).
The framework of iA\(^*\).
  • Due to the confined search space and the generalization ability inherited from A\(^*\), iA\(^*\) outperforms both classic and other learning-based methods; a minimal sketch of the mask-confined search is given after the figure below.
  • The following figure shows the qualitative results of path planning algorithms on datasets, including MP, Maze, and Matterport3D.
The qualitative results of path planning algorithms on three widely used datasets, including MP, Maze, and Matterport3D. The symbols S and G indicate the randomly selected start and goal positions. The optimal paths found by different path planning algorithms and their associated search space are indicated by red trajectories and green areas, respectively.
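To give a feel for why a confined search space helps, here is a runnable toy (not the iA\(^*\) implementation): a plain 4-connected grid A\(^*\) that can optionally restrict expansion to a mask. The hand-made corridor mask below stands in for the search region a trained neural module would predict.

```python
import heapq
import numpy as np

def astar(grid, start, goal, mask=None):
    """Plain A* on a 4-connected grid. If mask is given, the search is confined
    to cells where mask is True -- the role of the neural prediction in iA*-style
    methods (here the mask is a hand-made stub, not a trained network)."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set, came, g_cost = [(h(start), 0, start, None)], {}, {start: 0}
    while open_set:
        _, g, cur, parent = heapq.heappop(open_set)
        if cur in came:
            continue
        came[cur] = parent
        if cur == goal:                          # reconstruct the path
            path = [cur]
            while came[path[-1]] is not None:
                path.append(came[path[-1]])
            return path[::-1], len(came)         # path and expanded-cell count
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dx, cur[1] + dy)
            if not (0 <= nxt[0] < grid.shape[0] and 0 <= nxt[1] < grid.shape[1]):
                continue
            if grid[nxt] or (mask is not None and not mask[nxt]):
                continue                         # obstacle or outside the region
            ng = g + 1
            if ng < g_cost.get(nxt, np.inf):
                g_cost[nxt] = ng
                heapq.heappush(open_set, (ng + h(nxt), ng, nxt, cur))
    return None, len(came)

grid = np.zeros((64, 64), dtype=bool)            # empty map for simplicity
start, goal = (2, 2), (60, 60)
_, full_exp = astar(grid, start, goal)

# A stub "predicted" corridor around the straight line from start to goal:
mask = np.zeros_like(grid)
for t in np.linspace(0, 1, 200):
    r = c = int(2 + t * 58)
    mask[max(r - 4, 0):r + 5, max(c - 4, 0):c + 5] = True
_, masked_exp = astar(grid, start, goal, mask)
print(full_exp, "vs", masked_exp, "expanded cells")  # the mask expands far fewer
```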
Local Path Planning
  • End-to-end local path planning has recently attracted considerable interest, particularly for its potential to enable efficient inference.
  • Reinforcement learning-based methods often suffer from sample inefficiency and difficulties in directly processing depth images.
  • Imitation learning-based methods rely heavily on the availability and quality of labeled trajectories.
  • To address these problems, we leverage a neural module to predict sparse waypoints, leading to overall improved efficiency.
  • The waypoints are then interpolated by a trajectory optimization engine based on a cubic spline (a minimal sketch follows the figure below).
  • We use IL to train this new framework, which is referred to as iPlanner.
The framework of iPlanner.
  • The following figure shows a real-world experiment for local path planning using iPlanner with a legged robot.
Real-world experiment for local path planning using iPlanner with a legged robot. The red curve indicates the robot's trajectory from right to left, beginning inside a building and then navigating to the outdoors. The robot follows a series of waypoints (blue) and plans in different scenarios marked by green boxes including (A) passing through doorways, (B, D, E) circumventing both static and dynamic obstacles, and (B, F) ascending and descending stairs.
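The spline step itself is simple; the sketch below uses SciPy's `CubicSpline` to turn a handful of made-up waypoints (standing in for network predictions) into a smooth trajectory. The actual iPlanner keeps this step differentiable inside the training loop.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sparse waypoints, standing in for the output of the planning network.
waypoints = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.2], [3.0, 1.0]])

# Parameterize by cumulative chord length so spacing roughly follows arc length.
d = np.r_[0, np.cumsum(np.linalg.norm(np.diff(waypoints, axis=0), axis=1))]
spline = CubicSpline(d, waypoints, axis=0)

s = np.linspace(0, d[-1], 50)   # dense samples along the path
traj = spline(s)                # smooth (x, y) trajectory through the waypoints
vel = spline(s, 1)              # first derivative: tangent/velocity direction
print(traj.shape, vel.shape)    # (50, 2) (50, 2)
```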

Logical Reasoning

  • In the case where the LL task needs first-order optimization, we provide an example in inductive logical reasoning.
  • Existing works only focus on toy examples, such as Visual Sudoku, and binary vector representations in BlocksWorld.
  • They cannot simultaneously perform grounding (of high-dimensional data) and rule induction.
  • Based on IL, we use a neural network for concept and relationship prediction and a neural logic machine (NLM) for rule induction (a toy sketch of an NLM-style layer follows the figure below).
  • We denote this new framework as iLogic.
The framework of iLogic.
  • In the following figure, iLogic performs rule induction from the perceived groundings under the constraining rules shown on the right side, and finally produces the accurate action predictions shown on the left side.
The examples of learned rules using iLogic.
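To give a flavor of the rule-induction side, below is a toy sketch of a single NLM-style layer operating on soft predicate tensors: binary predicates are reduced to unary ones by existential quantification, unary ones are expanded to binary ones by broadcasting, and a small MLP mixes them per arity. This is a minimal reconstruction of the general idea, not the iLogic implementation.

```python
import torch
import torch.nn as nn

class MiniNLMLayer(nn.Module):
    """One toy NLM-style layer. Inputs are soft predicates in [0, 1]:
    unary of shape [B, n, Cu] and binary of shape [B, n, n, Cb]."""
    def __init__(self, cu, cb, hidden=16):
        super().__init__()
        self.unary_mlp = nn.Sequential(nn.Linear(cu + 2 * cb, hidden), nn.ReLU(),
                                       nn.Linear(hidden, cu), nn.Sigmoid())
        self.binary_mlp = nn.Sequential(nn.Linear(cb + 2 * cu, hidden), nn.ReLU(),
                                        nn.Linear(hidden, cb), nn.Sigmoid())

    def forward(self, unary, binary):
        B, n, _ = unary.shape
        # Reduce binary -> unary by existential quantification (max over one arg).
        red = torch.cat([binary.max(dim=2).values, binary.max(dim=1).values], dim=-1)
        # Expand unary -> binary by broadcasting over the second argument.
        exp = torch.cat([unary.unsqueeze(2).expand(B, n, n, -1),
                         unary.unsqueeze(1).expand(B, n, n, -1)], dim=-1)
        new_unary = self.unary_mlp(torch.cat([unary, red], dim=-1))
        new_binary = self.binary_mlp(torch.cat([binary, exp], dim=-1))
        return new_unary, new_binary

layer = MiniNLMLayer(cu=4, cb=3)
u = torch.rand(2, 5, 4)       # e.g., object concepts from the perception network
b = torch.rand(2, 5, 5, 3)    # e.g., pairwise relations between objects
u2, b2 = layer(u, b)
print(u2.shape, b2.shape)     # torch.Size([2, 5, 4]) torch.Size([2, 5, 5, 3])
```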

Optimal Control

  • In the case where the LL task needs constrained optimization, we provide an example of IMU-based UAV attitude control.
  • Differentiable model predictive control (MPC) combines physics-based modeling with data-driven methods, enabling dynamic models and control policies to be learned in an end-to-end manner.
  • However, many prior studies depend on expert demonstrations or labeled data for supervised learning.
  • They often suffer from challenging conditions such as unseen environments and external disturbances.
  • Based on IL, we use a neural network to denoise the IMU measurements and to predict the hyperparameters for MPC (a simplified sketch follows the figure below).
  • We denote this new framework as iMPC.
The framework of iMPC.
  • We evaluate the control performance under wind disturbance to validate the robustness of the proposed approach.
Control performance of iMPC under different levels of wind disturbance.
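To illustrate learning control hyperparameters through a differentiable controller, the toy below replaces MPC with a finite-horizon LQR (so the inner problem has a clean differentiable solution) and lets a small network predict the state-cost weights from a stub sensing feature. All names and the setup are illustrative simplifications, not the iMPC implementation.

```python
import torch
import torch.nn as nn

def lqr(A, B, Q, R, x0, T=20):
    """Finite-horizon discrete LQR via a backward Riccati recursion.
    Everything is torch, so gradients flow back into Q (and the network)."""
    P, Ks = Q, []
    for _ in range(T):
        K = torch.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        Ks.append(K)
    xs, x = [x0], x0
    for K in reversed(Ks):                   # apply gains forward in time
        x = (A - B @ K) @ x
        xs.append(x)
    return torch.stack(xs)

# Double-integrator dynamics; a tiny network predicts the state-cost weights
# from a stub feature (standing in for, e.g., denoised IMU measurements).
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
B = torch.tensor([[0.0], [0.1]])
R = torch.eye(1)
net = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 2), nn.Softplus())
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(200):
    feature = torch.randn(2)                 # stub sensing input
    q = net(feature) + 1e-3                  # positive diagonal cost weights
    xs = lqr(A, B, torch.diag(q), R, x0=torch.tensor([1.0, 0.0]))
    loss = (xs[-1] ** 2).sum() + 0.01 * (xs ** 2).mean()  # UL cost: reach origin
    opt.zero_grad()
    loss.backward()
    opt.step()
```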

Visual Odometry

  • In the case where the LL task needs second-order optimization, we provide an example of simultaneous localization and mapping (SLAM).
  • Existing SLAM systems only have a one-way connection between the front-end odometry and the back-end pose graph optimization.
  • This leads to sub-optimal solutions since there is no feedback from the back-end to the front-end.
  • We propose to optimize the entire SLAM system based on IL, leading to self-supervised reciprocal correction between the front-end and the back-end (a toy sketch of the “one-step” strategy follows the figure below).
  • We refer to this new framework as iSLAM.
The framework of iSLAM. On the forward path, the odometry module (front-end) predicts the robot trajectory. The pose graph optimization (back-end) minimizes the LL cost in several iterations to get optimal poses. On the backward path, the UL cost is back-propagated through the map with a "one-step" strategy to update the network.
  • As shown in the following figure, the front-end odometry keeps improving with more training iterations.
The predicted trajectories from the front-end improve as the number of imperative iterations increases in iSLAM.
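The “one-step” strategy from the figure above can be sketched in a toy setting (illustrative, not the iSLAM code): the inner optimizer runs without building an autograd graph, and only a single differentiable step at the solution lets the UL gradient flow back to the front-end, avoiding the cost of unrolling every iteration.

```python
import torch

def pose_graph_solve(z, p, lr=0.5, iters=50):
    """LL optimization, standing in for pose graph optimization. Most iterations
    are untracked; only one final step stays in the autograd graph -- a toy
    version of the "one-step" back-propagation strategy."""
    for _ in range(iters):                   # graph-free inner iterations
        L = ((z.detach() - p) ** 2).sum()    # LL cost with the front-end detached
        g, = torch.autograd.grad(L, p)
        p = (p - lr * g).detach().requires_grad_()
    # One differentiable step at (near) the optimum lets dU/dtheta flow back:
    L = ((z - p) ** 2).sum()
    g, = torch.autograd.grad(L, p, create_graph=True)
    return p - lr * g

f = torch.nn.Linear(8, 3)                    # stands in for the odometry front-end
x = torch.randn(4, 8)
z = f(x).mean(dim=0)                         # "poses" predicted by the front-end
p0 = torch.zeros(3, requires_grad=True)

p_star = pose_graph_solve(z, p0)
U = (p_star ** 2).sum()                      # UL cost on the optimized poses
U.backward()                                 # gradient reaches f through one step
print(f.weight.grad is not None)             # True
```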

Multi-agent Routing

  • In the case where the LL task needs discrete optimization, we provide an example of the min-max multiple traveling salesman problem (MTSP).
  • Traditional methods for MTSP need combinatorial optimization, i.e., discrete optimization over a very large space.
  • Classic MTSP solvers such as Google’s OR-Tools routing library have difficulties with large-scale problems (>500 cities).
  • We introduce IL and use a neural network to allocate cities to agents, and then use single-agent TSP solvers for the divided smaller problems.
  • To compute the differentiation in the discrete space, we introduce a surrogate network that estimates the gradient based on a control variate (sketched below).
  • We refer to this new framework as iMTSP.
The framework of iMTSP. A surrogate network is introduced as the memory in the IL framework, constructing a low-variance gradient for the allocation network through the non-differentiable and discrete TSP solvers.
  • Due to the generalization abilities of IL, iMTSP outperforms both classic solvers and RL-based methods.
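The control-variate idea can be sketched in a few lines (with illustrative stand-ins for the actual networks and TSP solver): the allocation network is updated with the score-function (REINFORCE) estimator, and the surrogate network's cost prediction is subtracted as a baseline to reduce the gradient variance.

```python
import torch
import torch.nn as nn

n_cities, n_agents = 20, 3
alloc_net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, n_agents))
surrogate = nn.Sequential(nn.Linear(n_cities * n_agents, 32), nn.ReLU(),
                          nn.Linear(32, 1))
opt = torch.optim.Adam(list(alloc_net.parameters()) +
                       list(surrogate.parameters()), lr=1e-3)

def tour_cost(cities, assign):
    """Stub for the non-differentiable single-agent TSP solvers: here simply
    each agent's path length in visiting order (a real solver would optimize)."""
    total = 0.0
    for a in range(n_agents):
        pts = cities[assign == a]
        if len(pts) > 1:
            total += (pts[1:] - pts[:-1]).norm(dim=1).sum().item()
    return total

for step in range(200):
    cities = torch.rand(n_cities, 2)
    probs = torch.softmax(alloc_net(cities), dim=-1)      # per-city agent probs
    assign = torch.multinomial(probs, 1).squeeze(1)       # sampled allocation
    cost = tour_cost(cities, assign)                      # from the discrete solver
    logp = torch.log(probs[torch.arange(n_cities), assign]).sum()
    baseline = surrogate(probs.flatten().detach()).squeeze()  # control variate
    policy_loss = (cost - baseline.detach()) * logp       # low-variance REINFORCE
    surrogate_loss = (baseline - cost) ** 2               # regress the solver cost
    opt.zero_grad()
    (policy_loss + surrogate_loss).backward()
    opt.step()
```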

Please refer to the iSeries articles for more technical details!

  1. iWalker: Imperative Visual Planning for Walking Humanoid Robot.
    Xiao Lin, Yuhao Huang, Taimeng Fu, Xiaobin Xiong, Chen Wang.
    arXiv preprint arXiv:2409.18361, 2024.
    SAIR Lab Recommended
  2. Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy.
    Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao.
    arXiv preprint arXiv:2406.16087, 2024.
    SAIR Lab Recommended
  3. iMatching: Imperative Correspondence Learning.
    Zitong Zhan, Dasong Gao, Yun-Jou Lin, Youjie Xia, Chen Wang.
    European Conference on Computer Vision (ECCV), 2024.
    SAIR Lab Recommended
  4. iMTSP: Solving Min-Max Multiple Traveling Salesman Problem with Imperative Learning.
    Yifan Guo, Zhongqiang Ren, Chen Wang.
    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024.
    SAIR Lab Recommended
  5. iSLAM: Imperative SLAM.
    Taimeng Fu, Shaoshu Su, Yiren Lu, Chen Wang.
    IEEE Robotics and Automation Letters (RA-L), 2024.
    SAIR Lab Recommended
  6. iA*: Imperative Learning-based A* Search for Pathfinding.
    Xiangyu Chen, Fan Yang, Chen Wang.
    arXiv preprint arXiv:2403.15870, 2024.
    SAIR Lab Recommended
  7. iPlanner: Imperative Path Planning.
    Fan Yang, Chen Wang, Cesar Cadena, Marco Hutter.
    Robotics: Science and Systems (RSS), 2023.
    SAIR Lab Recommended