Recent efforts in data-driven methods face challenges of the need for hard-to-obtain supervision and issues with high variance in gradient estimations, leading to slow convergence and highly sub-optimal solutions. We address these issues by reformulating MTSP as a bilevel optimization problem, using the concept of imperative learning (IL). This involves introducing an allocation network that decomposes the MTSP into multiple single-agent traveling salesman problems (TSPs). The longest tour from these TSP solutions is then used to self-supervise the allocation network, resulting in a new self-supervised, bilevel, end-to-end learning framework, which we refer to as imperative MTSP (iMTSP). Additionally, to tackle the high-variance gradient issues during the optimization, we introduce a control variate-based gradient estimation algorithm. Our experiments showed that these innovative designs enable our gradient estimator to converge
Imperative Learning
The framework of our self-supervised MTSP network. The allocation network uses supervision from the TSP solver, and the surrogate network is supervised by the single sample variance estimator, reducing the gradient variance.

To solve the NP-hard MTSP efficiently, we reformulate MTSP with Imperative Learning, a bi-level optimization-based learning framework. We define
where
Optimization
The major challenges for the back-propagation of the proposed framework are the non-differentiability of the TSP solver and the fact that city allocations
where
where
Thus, the gradient for the parameters in the allocation network is:
Experiments
The animation below presents the trajectories in two instances with 5 agents and 500 cities. From the left to the right, are the results corresponding to the proposed iMTSP, the RL baseline, and Google’s ORTools






Since all the agents move with constant speed, a shorter running time indicates better performance.
The gradient variance is significantly reduced with the surrogate network serving as a control variate. We explicitly record the mini-batch gradient variance of two training processes with 50 and 100 cities respectively.

Publications
-
iMTSP: Solving Min-Max Multiple Traveling Salesman Problem with Imperative Learning.IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024.