IRIDIA - Supplementary Information (ISSN: 2684-2041)

Supplementary material for the paper:

Automatic design of robot swarms that perform composite missions: an approach based on inverse reinforcement learning

Jeanne Szpirer, David Garzón Ramos, and Mauro Birattari (March 2024)


Table of Contents
  1. Abstract
  2. Expert demonstrations
  3. Detailed information per mission
  4. Control software
  5. Code
  6. Video material

Abstract

Typically, automatic design of robot swarms generates control software by maximizing a performance measure that specifies the mission of interest. Defining an appropriate performance measure is a non-trivial task that requires attention from an expert. Recently, inverse reinforcement learning was used to automatically design robot swarms starting from specifications provided via demonstrations of the desired behavior, rather than via a performance measure. In this paper, we propose a framework based on inverse reinforcement learning and multi-criteria optimization to enable the automatic design of robot swarms that can perform sequences of missions specified via demonstrations.

Expert demonstrations

Below are the five demonstrations per mission that were provided to Demo-Fruit to design the control software.

Mission A

[Figures: five demonstrations for Mission A]

Mission B

[Figures: five demonstrations for Mission B]

Mission C

[Figures: five demonstrations for Mission C]

Mission D

[Figures: five demonstrations for Mission D]

Mission E

[Figures: five demonstrations for Mission E]

Detailed information per mission

Below is the detailed information needed to fully understand the learning and optimization process for each mission.

Mission C·B

Baseline control software: [Figure: baseline PFSM for C·B]
Quantitative results: [Figure: quantitative results for C·B]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for C·B]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for C·B]

Mission A·B

Baseline control software: [Figure: baseline PFSM for A·B]
Quantitative results: [Figure: quantitative results for A·B]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for A·B]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for A·B]
The videos show that, in sub-mission A of mission A·B with DTF-MO, the robots do not aggregate exactly in the center of the arena as the demonstration indicates. Instead, they remain near the borders of the white circle to perceive the walls more easily and react promptly to the cue.
With DTF-2SO, robots get trapped in the circle because they do not perceive the walls.

Mission B·C

Baseline control software: [Figure: baseline PFSM for B·C]
Quantitative results: [Figure: quantitative results for B·C]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for B·C]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for B·C]

Mission C·D

Baseline control software: [Figure: baseline PFSM for C·D]
Quantitative results: [Figure: quantitative results for C·D]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for C·D]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for C·D]
The videos show that, in sub-mission D of mission C·D, the robots do not only head towards the left landmark: sometimes they also trigger their own cue to signal other robots to follow them, a communication behavior previously observed in AutoMoDe-TuttiFrutti.
With DTF-2SO, some robots do not see the black left area.

Mission B·A

Baseline control software: [Figure: baseline PFSM for B·A]
Quantitative results: [Figure: quantitative results for B·A]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for B·A]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for B·A]

Mission E·A

Baseline control software: [Figure: baseline PFSM for E·A]
Quantitative results: [Figure: quantitative results for E·A]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for E·A]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for E·A]

Mission A·E

Baseline control software: [Figure: baseline PFSM for A·E]
Quantitative results: [Figure: quantitative results for A·E]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for A·E]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for A·E]

Mission D·C

Baseline control software: [Figure: baseline PFSM for D·C]
Quantitative results: [Figure: quantitative results for D·C]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for D·C]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for D·C]

Mission B·E

Baseline control software: [Figure: baseline PFSM for B·E]
Quantitative results: [Figure: quantitative results for B·E]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for B·E]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for B·E]

Mission E·B

Baseline control software: [Figure: baseline PFSM for E·B]
Quantitative results: [Figure: quantitative results for E·B]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for E·B]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for E·B]

Mission E·D

Baseline control software: [Figure: baseline PFSM for E·D]
Quantitative results: [Figure: quantitative results for E·D]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for E·D]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for E·D]

Mission D·E

Baseline control software: [Figure: baseline PFSM for D·E]
Quantitative results: [Figure: quantitative results for D·E]
t-plots, DTF-MO (top), DTF-2SO (bottom): [Figure: t-plots for D·E]
Heatmaps, DTF-MO (top), DTF-2SO (bottom): [Figure: heatmaps for D·E]

Control software

The PFSMs of the instances of each sequence are available for download.

Code

The code and installation instructions are available for download.

Video material

Videos of some generated behaviors are available online.