Automatic design of robot swarms that perform composite missions: an approach based on inverse reinforcement learning

Jeanne Szpirer, David Garzón Ramos, and Mauro Birattari (March 2024)

Table of Contents

Abstract
Expert demonstrations
Detailed information per mission
Control software
Code
Video material

Abstract

Typically, automatic design of robot swarms generates control software by maximizing a performance measure that specifies the mission of interest. Defining an appropriate performance measure is a non-trivial task that requires attention from an expert. Recently, inverse reinforcement learning was used to automatically design robot swarms starting from specifications provided via demonstrations of the desired behavior, rather than via a performance measure. In this paper, we propose a framework based on inverse reinforcement learning and multi-criteria optimization to enable the automatic design of robot swarms that can perform sequences of missions specified via demonstrations.

Expert demonstrations

Below are the five demonstrations per mission that were provided to Demo-Fruit to design the control software.

Mission A

Mission B

Mission C

Mission D

Mission E

Detailed information per mission

Below is the detailed information needed to fully understand the learning and optimization process for each mission.

Mission C·B
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)

Mission A·B
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)

Videos showing that, in SUB-MISSION A of mA·B with DTF-MO, the robots do not aggregate exactly in the center of the arena as the demonstration indicates. Instead, they remain near the border of the white circle, where they can perceive the walls more easily and react promptly to the cue.	With DTF-2SO, robots get trapped in the circle because they do not perceive the walls.

Mission B·C
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)

Mission C·D
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)

Videos showing that, in SUB-MISSION D of mC·D, the robots do not only head towards the left landmark: sometimes, they also trigger their own cue to signal other robots to follow them, a communication behavior previously observed in AutoMoDe-TuttiFrutti.	With DTF-2SO, some robots do not see the black area on the left.

Mission B·A
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)

Mission E·A
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)

Mission A·E
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)

Mission D·C
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)

Mission B·E
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)

Mission E·B
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)

Mission E·D
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)

Mission D·E
Baseline control software	Quantitative results

t-plots: DTF-MO (top), DTF-2SO (bottom)	Heatmaps: DTF-MO (top), DTF-2SO (bottom)