Introduction
Long-horizon task planning becomes expensive when robots must reason over many objects and complex logical constraints, such as object affordances, spatial relationships, and sequential dependencies. Previous neuro-symbolic planners reduce this search space by predicting which objects matter, but they train the scorer using full-space plans while deploying it inside its own pruned search spaces. This mismatch means a scoring error can remove critical objects or keep irrelevant ones, making the simplified planning problem unsolvable or unnecessarily large.
To address this mismatch, our method trains the scorer online from planner feedback, improving both planning efficiency and robustness on benchmark tasks. We refer to our method as iFlax to highlight our contribution: stabilizing the imperative learning process for task planning and addressing Flax’s exposure bias under complex logical constraints. We also validate the deployability of iFlax on a quadruped-based mobile manipulator (Spot) in both simulation and the real world.
Method Overview
iFlax formulates object-importance learning as a bilevel optimization problem. Given a PDDL task and its relational graph, a neural network predicts object-importance scores, which are used to prune the search space. The lower-level planner then operates in this score-pruned search space and returns a feasible plan, which provides adaptive pseudo-supervision for updating the neural scorer.
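The loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the iFlax implementation: the threshold-relaxation schedule, the function names (`prune_objects`, `plan_with_pruning`, `pseudo_labels`), the rule that labels an object as important when the plan mentions it, and the toy planner that stands in for a real PDDL planner.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Plan = List[str]
Planner = Callable[[Set[str]], Optional[Plan]]


def prune_objects(scores: Dict[str, float], goal_objects: Set[str],
                  threshold: float) -> Set[str]:
    """Keep goal objects plus every object whose score reaches the threshold."""
    return set(goal_objects) | {obj for obj, s in scores.items() if s >= threshold}


def plan_with_pruning(scores: Dict[str, float], goal_objects: Set[str],
                      planner: Planner, threshold: float = 0.5,
                      decay: float = 0.5,
                      max_rounds: int = 5) -> Tuple[Optional[Plan], Set[str]]:
    """Plan on the pruned object set, relaxing the threshold after each failure.

    The geometric relaxation schedule is a hypothetical choice for this sketch.
    """
    active = prune_objects(scores, goal_objects, threshold)
    for _ in range(max_rounds):
        plan = planner(active)
        if plan is not None:
            return plan, active
        threshold *= decay
        active = prune_objects(scores, goal_objects, threshold)
    return None, active


def pseudo_labels(plan: Plan, all_objects: Set[str]) -> Dict[str, float]:
    """Label an object 1.0 if any plan step mentions it, else 0.0 (assumed rule)."""
    used = {token for step in plan for token in step.split()}
    return {obj: 1.0 if obj in used else 0.0 for obj in all_objects}


def toy_planner(active: Set[str]) -> Optional[Plan]:
    """Stand-in for a PDDL planner: solvable only if box1 is kept."""
    if {"robot", "goal", "box1"} <= active:
        return ["push robot box1", "move robot goal"]
    return None


# Hypothetical scores: the scorer underrates box1, so the first attempt fails.
scores = {"robot": 0.9, "goal": 0.9, "box1": 0.3, "box2": 0.1}
plan, active = plan_with_pruning(scores, {"robot", "goal"}, toy_planner)
labels = pseudo_labels(plan, set(scores))
```

In this toy run the first pruned set omits `box1`, the planner fails, and the relaxed threshold recovers it. The resulting labels mark `box1` as important and `box2` as irrelevant, which is the kind of planner-derived signal a scorer update could consume.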
To stabilize this loop, iFlax uses a parallel 3R strategy: Repair recovers missing critical objects, Restart rebuilds a cleaner active set, and Rollback re-expands cautiously after an overly large expansion step.
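One way to picture the three operations is as set transformations on the active object set, evaluated in parallel. The sketch below is a hypothetical reading of the description above, not the iFlax algorithm: the source does not specify how missing objects are identified, how large a restart set is, or how candidates are selected, so the `missing` argument, the top-k restart, the single-step rollback, and the "smallest solvable set wins" rule are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Set


def ranked(scores: Dict[str, float], exclude: Set[str]) -> List[str]:
    """Objects sorted by descending score (ties by name), skipping `exclude`."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [obj for obj, _ in ordered if obj not in exclude]


def repair(active: Set[str], missing: Set[str]) -> Set[str]:
    """Repair: add back objects believed to be critical but pruned (assumed signal)."""
    return set(active) | set(missing)


def restart(scores: Dict[str, float], goal_objects: Set[str], k: int) -> Set[str]:
    """Restart: rebuild a clean active set from goal objects and the top-k scores."""
    return set(goal_objects) | set(ranked(scores, goal_objects)[:k])


def rollback(previous_active: Set[str], scores: Dict[str, float], step: int) -> Set[str]:
    """Rollback: return to the earlier set and re-expand by a small step."""
    return set(previous_active) | set(ranked(scores, previous_active)[:step])


def try_candidates(candidates: Dict[str, Set[str]],
                   planner: Callable[[Set[str]], Optional[List[str]]]) -> Optional[str]:
    """Run the planner on all candidates in parallel; prefer the smallest solvable set."""
    names = list(candidates)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        plans = list(pool.map(planner, [candidates[n] for n in names]))
    solved = [(len(candidates[n]), n) for n, p in zip(names, plans) if p is not None]
    return min(solved)[1] if solved else None


# Toy setup: object "c" is critical but has a low score.
scores = {"a": 0.9, "b": 0.6, "c": 0.4, "d": 0.1}
previous = {"g", "a"}
candidates = {
    "repair": repair(previous, {"c"}),
    "restart": restart(scores, {"g"}, k=2),
    "rollback": rollback(previous, scores, step=1),
}
winner = try_candidates(candidates, lambda s: ["plan"] if {"g", "c"} <= s else None)
```

Here only the repaired set contains the critical object, so `winner` is `"repair"`. In a real system the choice among the three would depend on the planner's actual failure mode.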
Experiments
iFlax is evaluated on three challenging benchmarks: MazeNamo, SokoMindPlus, and LogisticsPlus, which test dense obstacle rearrangement, irreversible push dependencies, and transport dependencies with resource constraints, respectively. On MazeNamo, iFlax reduces the average failure rate by 80.04% and weighted planning time by 57.14% compared with Flax, the prior state-of-the-art method.
Simulation
In Isaac Sim, iFlax is deployed on a quadruped-based mobile manipulator in MazeNamo-style tasks with additional robot-specific logical constraints. Movable obstacles include heavy boxes, tall containers, and short containers. Containers can only be placed on the ground. The robot must stand to pick up tall containers and sit to pick up short containers.
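The robot-specific constraints above can be written as simple precondition checks. This is only an illustrative sketch: the type and function names (`Obstacle`, `can_pick`, `can_place`) and the string labels are hypothetical, and treating heavy boxes as not pickable is an assumption, not something stated for the simulation setup.

```python
from dataclasses import dataclass

# Posture the robot must hold to pick up each container kind (from the text above).
REQUIRED_POSTURE = {
    "tall_container": "standing",
    "short_container": "sitting",
}


@dataclass(frozen=True)
class Obstacle:
    name: str
    kind: str  # "heavy_box", "tall_container", or "short_container"


def can_pick(obstacle: Obstacle, posture: str) -> bool:
    """True if the robot's posture matches what this container kind requires.

    Assumption: heavy boxes have no pick posture, so they are never pickable here.
    """
    required = REQUIRED_POSTURE.get(obstacle.kind)
    return required is not None and posture == required


def can_place(obstacle: Obstacle, surface: str) -> bool:
    """Containers can only be placed on the ground."""
    return obstacle.kind in REQUIRED_POSTURE and surface == "ground"


tall = Obstacle("c1", "tall_container")
short = Obstacle("c2", "short_container")
box = Obstacle("b1", "heavy_box")
```

In a PDDL domain, checks like these would typically appear as action preconditions, so the planner must insert a posture change before picking up a container of the other height.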
Real-World Experiments
In the real world, iFlax runs on a Spot quadruped equipped with a Jetson AGX Orin computing board, an AgileX Piper manipulator, and a wrist-mounted RealSense D435i. The system builds a symbolic planning problem from sensed geometry, solves it with iFlax, and executes the returned high-level plan with grounded skills for navigation, container pickup and placement, bottle pickup and placement, and box pushing.
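Executing a high-level plan with grounded skills can be sketched as a dispatch table from action names to skill callables. The action names, argument layout, and logging stubs below are hypothetical placeholders chosen to mirror the skill list above. They do not reflect the actual Spot or Piper interfaces.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, ...]          # (action_name, arg1, arg2, ...)
Skill = Callable[..., bool]       # returns True on success


def make_logging_skills(log: List[str]) -> Dict[str, Skill]:
    """Build stub skills that only record their calls (stand-ins for real controllers)."""
    def make(label: str) -> Skill:
        def run(*args: str) -> bool:
            log.append(f"{label}({', '.join(args)})")
            return True
        return run

    names = ["navigate", "pick_container", "place_container",
             "pick_bottle", "place_bottle", "push_box"]
    return {name: make(name) for name in names}


def execute_plan(plan: List[Action], skills: Dict[str, Skill]) -> int:
    """Run plan steps in order; return the number of steps completed.

    Stops at the first failing skill so a caller could re-sense and replan.
    """
    for index, (name, *args) in enumerate(plan):
        if name not in skills:
            raise KeyError(f"no grounded skill for action '{name}'")
        if not skills[name](*args):
            return index
    return len(plan)


log: List[str] = []
skills = make_logging_skills(log)
plan = [("navigate", "loc_a"), ("push_box", "box1", "loc_b"), ("navigate", "goal")]
completed = execute_plan(plan, skills)
```

Returning the index of the first failed step, rather than raising, is a design choice of this sketch that leaves recovery to the caller.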
Warehouse Navigation
The first three videos show navigation tasks with the high-level goal "reach location". The robot must identify which containers, bottles, and heavy boxes determine access to the goal, then alternate between obstacle rearrangement and navigation.
Warehouse Mobile Manipulation
The remaining six videos show mobile manipulation tasks. The goals include "move bottle to location", "move bottle to location and bottle on the ground", "move container to location", and "put bottle upon box". These tasks are harder than pure "reach location" goals because the final plan may need to satisfy both where an object should be and how it should be placed.