Unsupervised Online Learning for Visual Interestingness

Updated: May 1, 2020 by Chen Wang

This project explores visual interestingness detection, which is crucial for many practical applications such as search and rescue. Although prior research can detect significant objects or scenes, it cannot adapt in real time or lose interest over time after repeatedly observing the same objects or exploring the same scenes. To enable such behaviours for robots, we argue that a learning system should have both lifetime, human-like experience learned from a large amount of unlabeled data and a short-term learning capability for a limited amount of negatively labeled data. This is because robots normally know only the uninteresting objects before a mission and have to change their interests during it. To this end, we introduce an unsupervised learning model with a memory mechanism that can be trained in real time without back-propagation, resulting in much faster learning. Our experiments show that, although implemented on a single machine, our approach is still able to learn online and find meaningful objects in a practical search task in mine tunnels.

Approaches

Human beings have a great capacity to direct visual attention and judge the interestingness of a scene. We propose to establish an online learning scheme that searches for interesting scenes during robot exploration. However, existing algorithms depend heavily on the back-propagation algorithm for learning, which is computationally expensive. To solve this problem, we introduce a novel translation-invariant 4-D visual memory to identify and recall visually interesting scenes.
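To make the idea of translation-invariant recall concrete, the following is a minimal sketch and not the project's implementation. It assumes the 4-D memory can be simplified to a list of 2-D feature maps, that similarity is plain cosine similarity, and that cyclic shifts stand in for however the actual model handles translation. All function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equally sized 2-D feature maps."""
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    na = math.sqrt(sum(x * x for r in a for x in r))
    nb = math.sqrt(sum(x * x for r in b for x in r))
    return dot / (na * nb) if na and nb else 0.0

def shift(fmap, dy, dx):
    """Cyclically shift a 2-D feature map by (dy, dx)."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[(i - dy) % h][(j - dx) % w] for j in range(w)] for i in range(h)]

def best_match(fmap, memory):
    """Highest similarity between fmap and any memory slot over all cyclic shifts."""
    h, w = len(fmap), len(fmap[0])
    return max(
        (cosine(shift(fmap, dy, dx), slot)
         for slot in memory for dy in range(h) for dx in range(w)),
        default=0.0,
    )

def interestingness(fmap, memory):
    """Assumed scoring rule: a scene is interesting when nothing in memory resembles it."""
    return 1.0 - best_match(fmap, memory)

# Toy data: one remembered pattern, the same pattern shifted, and a different pattern.
memory = [[[1, 0, 0], [0, 0, 0], [0, 0, 0]]]
seen_shifted = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
novel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

print(interestingness(seen_shifted, memory))
print(interestingness(novel, memory))
```

In this toy setup the shifted copy of a remembered pattern scores as uninteresting, while the unfamiliar pattern scores higher. Reading from the memory needs no gradient computation, which is the property the paragraph above relies on.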

For mobile robots, we find the following properties are necessary to establish a sense of visual interestingness:

Unsupervised: Visual interestingness is a psychological process. Its definition is subjective and can change with one’s experience and environment, so labels are difficult to obtain. However, prior research mainly focuses on supervised methods, whose performance suffers in previously unseen environments. We hypothesize that a sense of interestingness can be established for autonomous robots in an unsupervised manner.
Task-dependent: In many tasks, we might know only the uninteresting objects before the task starts. For example, in a mine rescue search task, deployment is easier and more efficient if the robots can be taught within several minutes what is not interesting in the specific scene. In this sense, we argue that a visual interestingness detection system should be able to learn from negatively labeled samples quickly, so an incremental learning method is necessary. Note that we expect the model to be capable of learning from negative samples, but this is not required in every task.

Therefore, to construct a practical interestingness detection system and achieve the above properties, we introduce an unsupervised online learning model with a novel memory mechanism and expect the following outcomes:

Long-term learning: In this stage, we expect a model to be trained off-line on a large amount of data in an unsupervised manner, much as human beings acquire common knowledge from experience. We also expect the training time on a single machine to be no more than on the order of days.
Short-term learning: For task-dependent knowledge, the model should then be able to learn from hundreds of uninteresting images in minutes. This can be done before a mission starts and supports quick robot deployment.
Online learning: During mission execution, the system should express its top interests in real time, and detected interests should be lost online when they appear frequently, regardless of whether they exist in the uninteresting images. Another important aspect of online learning is the absence of data leakage, i.e., each frame is processed without using information from subsequent frames.
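The short-term and online stages above can be sketched as follows. This is an illustrative toy, not the released code: it assumes frames have already been encoded into feature vectors by the long-term model (omitted here), and `ToyMemory`, its `score` and `write` methods, and the 0.5 thresholds are all hypothetical choices.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class ToyMemory:
    """Fixed-capacity store of feature vectors (hypothetical, for illustration)."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = []

    def score(self, feat):
        """Interestingness = 1 - similarity to the closest stored vector."""
        if not self.slots:
            return 1.0
        return 1.0 - max(cosine(feat, s) for s in self.slots)

    def write(self, feat, rate=0.5):
        """Append a sufficiently novel vector while capacity remains, else blend it in."""
        if len(self.slots) < self.capacity and self.score(feat) > 0.5:
            self.slots.append(list(feat))
            return
        best = max(self.slots, key=lambda s: cosine(feat, s))
        for i, x in enumerate(feat):
            best[i] = (1 - rate) * best[i] + rate * x

def run_online(memory, frames):
    """Score each frame before writing it, so no later frame influences its score."""
    scores = []
    for feat in frames:
        scores.append(memory.score(feat))
        memory.write(feat)
    return scores

memory = ToyMemory()
# Short-term stage: write known-uninteresting samples before the mission.
for negative in [[1.0, 0.0, 0.0]]:
    memory.write(negative)

# Online stage: a known scene, then a novel scene that appears three times in a row.
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
scores = run_online(memory, frames)
print(scores)
```

In this sketch the pre-taught negative scores low from the start, the novel scene scores high on first sight, and its score drops once it has been written to memory. Scoring before writing is what keeps the loop free of data leakage.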

Detected interesting scenes

Key Results

In the DARPA Subterranean (SubT) Challenge, each team deploys multiple robots into several mine tunnels (GPS and wireless communication denied) to search for objects. The tunnels have a cumulative linear distance in the range of 4-8 km. The SubT front camera (SubTF) dataset contains seven long videos (1h) recorded by two fully autonomous unmanned ground vehicles (UGVs) during their complete exploration of two tunnels in the tunnel circuit. Some of the video shots are presented in Figure 1. The SubTF dataset is very challenging because the human annotations vary considerably: only 3.6% of the frames are labeled as interesting by at least 2 subjects, although 15% of the frames are labeled as interesting by at least 1 subject (Interest-1).
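As a small illustration of how such agreement percentages can be computed, the sketch below counts the fraction of frames marked interesting by at least a given number of annotators. The votes are made up, the three annotators are hypothetical, and the function name is an assumption. None of this reflects the actual SubTF annotations.

```python
def interest_level_fraction(labels, min_subjects):
    """Fraction of frames marked interesting by at least min_subjects annotators.

    labels: one list per frame holding a 0/1 vote from each annotator.
    """
    if not labels:
        return 0.0
    hits = sum(1 for votes in labels if sum(votes) >= min_subjects)
    return hits / len(labels)

# Made-up votes from three hypothetical annotators on ten frames.
toy_labels = [
    [0, 0, 0], [1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 0, 0],
    [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]
at_least_one = interest_level_fraction(toy_labels, 1)  # analogous to Interest-1
at_least_two = interest_level_fraction(toy_labels, 2)  # stricter agreement level
print(at_least_one, at_least_two)
```

The gap between the two fractions is a rough indicator of how much annotators disagree about what counts as interesting.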

Fig. 1. Examples of both uninteresting and interesting scenes in the SubTF dataset, recorded by Team Explorer, which won first place in the DARPA SubT Challenge tunnel circuit. The height of the green strip at the right of each image indicates the interestingness level predicted by our unsupervised online learning algorithm when it sees the scene for the first time.

Measured against human-annotated interestingness, the algorithm achieves on average 20% higher accuracy than the approach without online learning. The results indicate that our three-stage architecture of long-term, short-term, and online learning shows promise in representing interestingness for robots.

The map created by Lidar during fully autonomous exploration

Source Code

Plain Python Package: interestingness
- TRO Branch
- ECCV Branch
ROS Package: interestingness_ros

Publications

Visual Memorability for Robotic Interestingness via Unsupervised Online Learning.
Chen Wang, Wenshan Wang, Yuheng Qiu, Yafei Hu, Sebastian Scherer.
European Conference on Computer Vision (ECCV), pp. 52–68, 2020.

Selected as Oral presentation (2%)
```
@inproceedings{wang2020visual,
  title = {Visual Memorability for Robotic Interestingness via Unsupervised Online Learning},
  author = {Wang, Chen and Wang, Wenshan and Qiu, Yuheng and Hu, Yafei and Scherer, Sebastian},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020},
  pages = {52--68},
  url = {https://arxiv.org/abs/2005.08829},
  code = {https://github.com/sair-lab/interestingness/tree/eccv},
  video = {https://youtu.be/o9LrDlemerE},
  addendum = {Selected as Oral presentation (2\%)},
  cover = {/img/posts/2020-05-01-interestingness/interestingness_video_short.mp4},
  website = {https://sairlab.org/interestingness/}
}
```
```
Wang, Chen and Wang, Wenshan and Qiu, Yuheng and Hu, Yafei and Scherer, Sebastian, "Visual Memorability for Robotic Interestingness via Unsupervised Online Learning," European Conference on Computer Vision (ECCV), 2020.
```
Unsupervised Online Learning for Robotic Interestingness with Visual Memory.
Chen Wang, Wenshan Wang, Yuheng Qiu, Yafei Hu, Seungchan Kim, Sebastian Scherer.
IEEE Transactions on Robotics (T-RO), vol. 38, no. 4, pp. 2446–2461, 2021.
```
@article{wang2021unsupervised,
  title = {Unsupervised Online Learning for Robotic Interestingness with Visual Memory},
  author = {Wang, Chen and Wang, Wenshan and Qiu, Yuheng and Hu, Yafei and Kim, Seungchan and Scherer, Sebastian},
  journal = {IEEE Transactions on Robotics (T-RO)},
  year = {2021},
  volume = {38},
  number = {4},
  pages = {2446--2461},
  url = {https://arxiv.org/abs/2111.09793},
  code = {https://github.com/sair-lab/interestingness},
  website = {https://sairlab.org/interestingness/},
  cover = {/img/posts/2020-05-01-interestingness/interestingness-tro.jpg}
}
```
```
Wang, Chen and Wang, Wenshan and Qiu, Yuheng and Hu, Yafei and Kim, Seungchan and Scherer, Sebastian, "Unsupervised Online Learning for Robotic Interestingness with Visual Memory," IEEE Transactions on Robotics (T-RO), 2021.
```
