**Imperative learning (IL)** is a self-supervised neural-symbolic learning framework for robot autonomy.

A prototype of **IL** was first introduced in the **iSLAM** paper and was later formally defined in this long article:

This **iSeries** collects articles from the SAIR lab and is named after the leading letter “i” in *“imperative learning”*. In the iSeries collection, IL has been applied to various tasks, including path planning, feature matching, and multi-robot routing.

### The list of iSeries articles

This blog briefly explains IL from a **high-level perspective**; readers can find a more in-depth explanation in the paper.

Readers may also find slides at this link, which provide a more interactive format.

###### IL aims to alleviate the challenges of existing robot learning frameworks such as **reinforcement learning** and **imitation learning**.

### Why do we need Neural-Symbolic AI?

- To combine the advantages of both neural and symbolic methods.
- To overcome the challenges of existing robot learning frameworks.

### What is Neural-Symbolic AI?

- There is still NO consensus on Neural-Symbolic (NeSy) AI.
- We have a narrow and a broader definition, where the difference is mainly on the scope of “symbols”.

### Examples of existing Neural-Symbolic AI?

- Although many methods do not state it explicitly, they can be viewed as Neural-Symbolic AI.

### Why do we need Imperative Learning?

- Imperative learning is a **self-supervised neural-symbolic learning framework**.
- It is designed to **overcome the four challenges by a single design** based on **bilevel optimization**.
  - Limited generalization ability, black-box nature, label intensiveness, sub-optimality.

### What is Imperative Learning?

- The framework of imperative learning (IL) consists of three primary modules including a **neural perceptual network**, a **symbolic reasoning engine**, and a **general memory system**.
- IL is formulated as a special bilevel optimization, enabling reciprocal learning and mutual correction among the three modules.

Denote the neural system as \(\boldsymbol z = f({\boldsymbol{\theta}}, \boldsymbol{x})\), where \(\boldsymbol{x}\) represents the sensor measurements, \({\boldsymbol{\theta}}\) represents the perception-related learnable parameters, and \(\boldsymbol z\) represents the neural outputs such as semantic attributes; the reasoning engine as \(g(f, M, {\boldsymbol{\mu}})\) with reasoning-related parameters \({\boldsymbol{\mu}}\); and the memory system as \(M({\boldsymbol{\gamma}}, {\boldsymbol{\nu}})\), where \({\boldsymbol{\gamma}}\) are perception-related memory parameters and \({\boldsymbol{\nu}}\) are reasoning-related memory parameters. IL is then formulated as a special bilevel optimization (BLO) problem:

\[\begin{align} \min_{ \boldsymbol \psi \doteq [{\boldsymbol{\theta}}^\top,~{\boldsymbol{\gamma}}^\top]^\top} & U\left(f({\boldsymbol{\theta}}, \boldsymbol{x}), g({\boldsymbol{\mu}}^*), M({\boldsymbol{\gamma}}, {\boldsymbol{\nu}}^*)\right), \label{eq:high-il} \\ \textrm{s.t.} \quad & \boldsymbol \phi^* \in \arg\min_{ \boldsymbol \phi \doteq [{\boldsymbol{\mu}}^\top,~{\boldsymbol{\nu}}^\top]^\top} L(f({\boldsymbol{\theta}}, \boldsymbol{x}), g({\boldsymbol{\mu}}), M({\boldsymbol{\gamma}}, {\boldsymbol{\nu}})), \label{eq:low-il} \\ &\textrm{s.t.} \quad \xi(M({\boldsymbol{\gamma}}, {\boldsymbol{\nu}}), {\boldsymbol{\mu}}, f({\boldsymbol{\theta}}, \boldsymbol{x})) = \text{ or }\leq 0, \label{eq:il-constraint} \end{align}\]

where \(\xi\) is a general constraint (either equality or inequality); \(U\) and \(L\) are the **upper-level** (**UL**) and **lower-level** (**LL**) cost functions; and \(\boldsymbol \psi \doteq [{\boldsymbol{\theta}}^\top, {\boldsymbol{\gamma}}^\top]^\top\) are stacked UL variables and \(\boldsymbol \phi \doteq [{\boldsymbol{\mu}}^\top, {\boldsymbol{\nu}}^\top]^\top\) are stacked LL variables, respectively.
Alternatively, \(U\) and \(L\) are also referred to as the **neural cost** and **symbolic cost**, respectively.

- The term *“imperative”* is used to denote the passive nature of the learning process:
  - Once optimized, the neural system \(f\) in the UL cost will be driven to align with the LL reasoning engine \(g\).
    - E.g., logical, physical, or geometrical reasoning process with constraint \(\xi\).
  - Therefore, IL can learn to generate logically, physically, or geometrically feasible semantic attributes or predicates.
- In some applications, \(\boldsymbol \psi\) and \(\boldsymbol \phi\) are also referred to as **neuron-like** and **symbol-like** parameters, respectively.

##### Self-supervised Nature

- Many symbolic reasoning engines, including geometric, physical, and logical reasoning, can be optimized or solved without labels.
  - For example, A\(^*\) search, geometrical reasoning such as bundle adjustment (BA), and physical reasoning such as model predictive control (MPC) can all be solved without labels.
- The IL framework leverages this property and jointly optimizes the three modules through bilevel optimization, which forces the three modules to mutually correct each other.
- Consequently, all three modules can learn and evolve in a **self-supervised manner** by observing the world.
- Although IL is designed for self-supervised learning, it can easily adapt to supervised or weakly supervised learning by involving labels in the UL cost, the LL cost, or both.

##### Overcoming the Other Challenges

- The symbolic module offers better **Interpretability** and **Generalization Ability** due to its explainable design.
- The **Optimality** comes from bilevel optimization, which trains the neural and symbolic modules jointly rather than separately.

##### Optimization Challenge

- Solving IL mainly involves solving for the UL parameters \({\boldsymbol{\theta}}\) and \({\boldsymbol{\gamma}}\) and the LL parameters \({\boldsymbol{\mu}}\) and \({\boldsymbol{\nu}}\).
- Intuitively, the UL parameters, which are often neuron-like weights, can be updated with the gradients of the UL cost \(U\) with respect to \({\boldsymbol{\theta}}\) and \({\boldsymbol{\gamma}}\).

- Since \(U\), \(L\), \(M\), \(g\), and \(f\) are often well defined, the challenge is to compute the derivative of the lower-level (symbol-like) parameters w.r.t. the upper-level (neuron-like) parameters, \(\color{blue}\frac{\partial \boldsymbol \phi^*}{\partial \boldsymbol \psi}\).

- There are generally two ways to compute it, *i.e.*, **unrolled differentiation** and **implicit differentiation**. See the paper for more details.
- Since \(\boldsymbol \phi \doteq [{\boldsymbol{\mu}}^\top, {\boldsymbol{\nu}}^\top]^\top\) are LL parameters, the solution depends on the specific LL tasks.
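
To make the role of \(\frac{\partial \boldsymbol \phi^*}{\partial \boldsymbol \psi}\) concrete, one can apply the chain rule to \(U\) with its three arguments as written in the formulation above. For \({\boldsymbol{\theta}}\) this gives the following (and analogously for \({\boldsymbol{\gamma}}\)); the expression is written out here from the definitions in this post, so the paper remains the reference for the exact form:

\[\frac{\partial U}{\partial {\boldsymbol{\theta}}} = \frac{\partial U}{\partial f}\frac{\partial f}{\partial {\boldsymbol{\theta}}} + \frac{\partial U}{\partial g}\frac{\partial g}{\partial {\boldsymbol{\mu}}^*}{\color{blue}\frac{\partial {\boldsymbol{\mu}}^*}{\partial {\boldsymbol{\theta}}}} + \frac{\partial U}{\partial M}\frac{\partial M}{\partial {\boldsymbol{\nu}}^*}{\color{blue}\frac{\partial {\boldsymbol{\nu}}^*}{\partial {\boldsymbol{\theta}}}}\]

The blue terms are blocks of \(\frac{\partial \boldsymbol \phi^*}{\partial \boldsymbol \psi}\). The sketch below compares the two ways of computing such a term on a scalar toy problem. The LL cost, the step size, and every function name are assumptions made for illustration only; none of them come from the paper or its code.

```python
# Toy lower-level (LL) cost, chosen only for illustration:
#   L(phi, psi) = 0.5 * (phi - psi)**2 + 0.25 * phi**4
# It is strictly convex in phi but has no simple closed-form minimizer.

def ll_grad(phi, psi):
    """dL/dphi for the toy LL cost."""
    return (phi - psi) + phi ** 3


def ll_hess(phi):
    """d2L/dphi2 for the toy LL cost (always >= 1)."""
    return 1.0 + 3.0 * phi ** 2


LL_CROSS = -1.0  # d2L/(dphi dpsi), constant for this cost


def unrolled(psi, steps=200, lr=0.1):
    """Unrolled differentiation: run gradient descent on L and
    differentiate through every update step."""
    phi, dphi_dpsi = 0.0, 0.0
    for _ in range(steps):
        # Derivative of the update phi <- phi - lr * ll_grad(phi, psi)
        # w.r.t. psi, evaluated at the current (pre-update) phi.
        dphi_dpsi = dphi_dpsi - lr * (ll_hess(phi) * dphi_dpsi + LL_CROSS)
        phi = phi - lr * ll_grad(phi, psi)
    return phi, dphi_dpsi


def implicit(phi_star):
    """Implicit differentiation: differentiate the optimality condition
    dL/dphi = 0 at the LL solution, giving -(L_phiphi)^-1 * L_phipsi."""
    return -LL_CROSS / ll_hess(phi_star)


def ul_grad(psi, target):
    """dU/dpsi for a toy UL cost U = 0.5 * (phi_star - target)**2."""
    phi_star, _ = unrolled(psi)
    return (phi_star - target) * implicit(phi_star)


phi_star, d_unrolled = unrolled(psi=2.0)
d_implicit = implicit(phi_star)
print(phi_star, d_unrolled, d_implicit)
```

For \(\psi = 2\) the LL optimality condition \(\phi^3 + \phi - \psi = 0\) is solved by \(\phi^* = 1\), so both estimates should approach \(1/(1 + 3) = 0.25\). Unrolling needs the whole optimization trajectory, whereas the implicit route only needs the converged solution and the second derivatives of \(L\) there.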

### Applications and Examples

- The paper provides five distinct examples covering the different cases of LL tasks.

#### Path Planning

For LL tasks that have **closed-form solutions**, we provide examples in both global and local path planning.

###### Global Path Planning

- A\(^*\) is widely used due to its optimality, but it often suffers from low efficiency due to its large search space.
- Therefore, we could leverage a neural module to predict a confined search space, leading to overall improved efficiency.
- We take A\(^*\) as the symbolic reasoning engine and train the neural module in a self-supervised way based on IL.
- This results in a new framework, which is referred to as **iA\(^*\)**.

- Due to the confined search space and the generalization ability inherited from A\(^*\), iA\(^*\) outperforms both classic and other learning-based methods.
- The following figure shows the qualitative results of path planning algorithms on datasets including MP, Maze, and Matterport3D.
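
As a rough illustration of why a confined search space helps, the sketch below runs a plain grid A\(^*\) twice: once over the whole map and once restricted to a set of allowed cells. The `allowed` set stands in for the region a neural module would predict; here it is hand-picked, and the grid, the function name `astar`, and the 4-connected unit-cost setting are all assumptions for illustration rather than the iA\(^*\) implementation.

```python
import heapq


def astar(grid, start, goal, allowed=None):
    """4-connected grid A* with a Manhattan-distance heuristic.

    grid[r][c] == 1 marks an obstacle. `allowed`, if given, is a set of
    cells the search may enter (a stand-in for a predicted search region).
    Returns (path, number_of_expanded_cells); path is None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f, g, cell)
    parent = {start: None}
    best_g = {start: 0}
    closed = set()
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue
        closed.add(cell)
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1], len(closed)
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            if allowed is not None and nxt not in allowed:
                continue  # outside the confined search space
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                parent[nxt] = cell
                heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt))
    return None, len(closed)


# A wall in column 2 leaves a single gap in the bottom row.
grid = [[0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0]]
start, goal = (0, 0), (0, 4)
# Hand-picked corridor playing the role of the predicted search region.
corridor = ({(r, 0) for r in range(5)} | {(4, c) for c in range(5)}
            | {(r, 4) for r in range(5)})

path_full, expanded_full = astar(grid, start, goal)
path_conf, expanded_conf = astar(grid, start, goal, allowed=corridor)
print(len(path_full), expanded_full, len(path_conf), expanded_conf)
```

In this example the corridor contains an optimal path, so the confined search returns a path of the same length while expanding fewer cells. If a predicted region excluded every optimal path, the confined search would return a longer path or none at all, which is why the quality of the prediction matters.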

###### Local Path Planning

- End-to-end local path planning has recently attracted considerable interest, particularly for its potential to enable efficient inference.
- Reinforcement learning-based methods often suffer from sample inefficiency and difficulties in directly processing depth images.
- Imitation learning-based methods rely heavily on the availability and quality of labeled trajectories.
- To address these problems, we leverage a neural module to predict sparse waypoints, leading to overall improved efficiency.
- The waypoints are then interpolated using a trajectory optimization engine based on a cubic spline.
- We use IL to train this new framework, which is referred to as **iPlanner**.

- The following figure shows a real-world experiment for local path planning using iPlanner with a legged robot.
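
The waypoint interpolation step can be sketched with a natural cubic spline fitted independently to the x and y coordinates. This is a minimal sketch under stated assumptions: uniform knot spacing, natural (zero second derivative) boundary conditions, and 2D waypoints are all choices made here for illustration, and the function names are hypothetical. The actual trajectory optimization engine in iPlanner may differ.

```python
def natural_spline_second_derivs(y):
    """Second derivatives of a natural cubic spline through values y,
    assuming unit spacing between knots (solved with the Thomas algorithm)."""
    n = len(y)
    m = [0.0] * n  # natural boundary: m[0] = m[n-1] = 0
    if n < 3:
        return m
    # Interior equations: m[i-1] + 4*m[i] + m[i+1] = 6*(y[i-1] - 2*y[i] + y[i+1])
    rhs = [6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1]) for i in range(1, n - 1)]
    k = len(rhs)
    c_prime = [0.0] * k
    d_prime = [0.0] * k
    c_prime[0] = 1.0 / 4.0
    d_prime[0] = rhs[0] / 4.0
    for i in range(1, k):  # forward sweep
        denom = 4.0 - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (rhs[i] - d_prime[i - 1]) / denom
    sol = [0.0] * k
    sol[-1] = d_prime[-1]
    for i in range(k - 2, -1, -1):  # back substitution
        sol[i] = d_prime[i] - c_prime[i] * sol[i + 1]
    m[1:n - 1] = sol
    return m


def spline_eval(y, m, t):
    """Evaluate the spline at parameter t in [0, len(y) - 1]."""
    i = min(int(t), len(y) - 2)  # segment index
    s = t - i                    # local coordinate in [0, 1]
    return (m[i] * (1 - s) ** 3 / 6.0 + m[i + 1] * s ** 3 / 6.0
            + (y[i] - m[i] / 6.0) * (1 - s) + (y[i + 1] - m[i + 1] / 6.0) * s)


def interpolate_waypoints(waypoints, samples_per_segment=10):
    """Densify sparse 2D waypoints (at least two) into a smooth trajectory."""
    xs = [p[0] for p in waypoints]
    ys = [p[1] for p in waypoints]
    mx = natural_spline_second_derivs(xs)
    my = natural_spline_second_derivs(ys)
    total = (len(waypoints) - 1) * samples_per_segment
    trajectory = []
    for k in range(total + 1):
        t = k / samples_per_segment
        trajectory.append((spline_eval(xs, mx, t), spline_eval(ys, my, t)))
    return trajectory


waypoints = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
trajectory = interpolate_waypoints(waypoints, samples_per_segment=10)
print(len(trajectory), trajectory[0], trajectory[-1])
```

Every operation in the interpolation is a smooth function of the waypoints, so a cost evaluated on the dense trajectory can in principle be differentiated back to the sparse waypoints, which fits the closed-form LL case described above.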

#### Logical Reasoning

- For LL tasks that need **first-order optimization**, we provide an example in inductive logical reasoning.
- Existing works only focus on toy examples, such as Visual Sudoku, and binary vector representations in BlocksWorld.
- They cannot simultaneously perform grounding (on high-dimensional data) and rule induction.
- Based on IL, we use a neural network for concept and relationship prediction, and a neural logic machine (NLM) for rule induction.
- We denote this new framework as **iLogic**.

- In the following figure, iLogic conducts rule induction with the perceived groundings and the constraining rules shown on the right side, and finally produces the accurate action prediction shown on the left side.

#### Optimal Control

- For LL tasks that need **constrained optimization**, we provide an example of UAV attitude control based on IMU.
- Differentiable model predictive control (MPC) combines physics-based modeling with data-driven methods, enabling dynamic models and control policies to be learned in an end-to-end manner.
- However, many prior studies depend on expert demonstrations or labeled data for supervised learning.
- They often suffer from challenging conditions such as unseen environments and external disturbances.
- Based on IL, we use a neural network for IMU denoising and predict the hyperparameters for MPC.
- We denote this new framework as **iMPC**.

- We evaluate the control performance under wind disturbance to validate the robustness of the proposed approach.

#### Visual Odometry

- For LL tasks that need **second-order optimization**, we provide an example of simultaneous localization and mapping (SLAM).
- Existing SLAM systems have only a single, one-directional connection from the front-end odometry to the back-end pose graph optimization.
- This leads to sub-optimal solutions since there is no feedback from the back-end to the front-end.
- We propose to optimize the entire SLAM system based on IL, leading to self-supervised reciprocal correction between the front-end and the back-end.
- We refer to this new framework as **iSLAM**.

- As the following figure shows, the front-end odometry keeps improving with more training iterations.

#### Multi-agent Routing

- For LL tasks that need **discrete optimization**, we provide an example of the multiple traveling salesman problem (MTSP).
- Traditional methods for MTSP need combinatorial optimization, which is discrete optimization in a very large space.
- Classic MTSP solvers such as Google’s OR-Tools routing library struggle with large-scale problems (>500 cities).
- We introduce IL and use a neural network to allocate cities to agents, and then use single-agent TSP solvers for the divided smaller problems.
- To compute the differentiation in discrete space, we introduce a surrogate network to estimate the gradient based on a **control variate**.
- We refer to this new framework as **iMTSP**.

- Due to the generalization abilities of IL, iMTSP outperforms both classic solvers and RL-based methods.
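
The allocate-then-solve structure described above can be sketched as follows. Several parts are assumptions made only for illustration: the allocation is a hand-written dictionary standing in for the neural network's output, a greedy nearest-neighbor heuristic is swapped in for a real single-agent TSP solver, all agents share one depot, and the longest tour is used as the score. The control-variate gradient estimator is not covered by this sketch.

```python
import math


def tour_length(points, order):
    """Length of the closed tour visiting `points` in the given index order."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))


def nearest_neighbor_tour(points, depot, cities):
    """Greedy single-agent tour (a stand-in for a real TSP solver): start at
    the depot and repeatedly visit the closest unvisited city."""
    tour, remaining = [depot], set(cities)
    while remaining:
        last = points[tour[-1]]
        # Ties are broken by city index to keep the result deterministic.
        nxt = min(remaining, key=lambda c: (math.dist(last, points[c]), c))
        tour.append(nxt)
        remaining.remove(nxt)
    return tour


def solve_allocated_mtsp(points, depot, allocation, num_agents):
    """Split the MTSP by a city -> agent allocation and solve each part alone."""
    tours = []
    for agent in range(num_agents):
        cities = [c for c, a in allocation.items() if a == agent]
        tours.append(nearest_neighbor_tour(points, depot, cities))
    lengths = [tour_length(points, t) for t in tours]
    return tours, lengths


# Index 0 is the shared depot; the rest are cities.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
good_allocation = {1: 0, 2: 0, 3: 1, 4: 1}  # each agent serves one axis
bad_allocation = {1: 0, 3: 0, 2: 1, 4: 1}   # agent 1 gets both far cities

_, good_lengths = solve_allocated_mtsp(points, 0, good_allocation, 2)
_, bad_lengths = solve_allocated_mtsp(points, 0, bad_allocation, 2)
print(max(good_lengths), max(bad_lengths))
```

The allocation alone changes the longest tour considerably in this example. Because the allocation is a discrete choice, ordinary backpropagation does not apply to it directly, which is the gap the surrogate-network gradient estimate mentioned above is meant to fill.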