Gameplay Filters: Safe Robot Walking through Adversarial Imagination
Duy Phuong Nguyen, Kai-Chieh Hsu, Jaime Fernández Fisac
This paper presents the gameplay filter, a general approach that leverages offline game-theoretic reinforcement learning to synthesize a highly robust safety filter for high-order nonlinear dynamics. At runtime, the filter maintains safety by continually simulating adversarial futures and precluding task-driven actions that would cause the robot to lose a future game (and thereby violate safety).
Offline gameplay learning
We employ a game-theoretic reach–avoid reinforcement learning scheme that iteratively pits the robot’s controller against a simulated adversarial environment. The algorithm updates a safety value network (critic) and keeps a leaderboard of the most effective player policies (actors). The video below shows the co-training process:
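As a rough illustration of this co-training loop, the sketch below pairs a discounted safety-game Bellman backup with a simple policy leaderboard. The function names, the sign convention (safety margin g(s) > 0 when safe), and the specific backup form are our assumptions, not the released implementation.

```python
# Illustrative sketch, not the paper's code: the backup form, the sign
# convention (margin > 0 means safe), and all names below are assumptions.
import torch


def safety_game_target(margin, v_next, gamma=0.99):
    """Assumed discounted safety-game backup:
    V(s) = (1 - gamma) * g(s) + gamma * min(g(s), V(s')),
    where g(s) is the safety margin and V(s') is the target critic's estimate
    at the next state reached under the controller's action and the adversary's
    response.
    """
    return (1.0 - gamma) * margin + gamma * torch.minimum(margin, v_next)


def update_leaderboard(leaderboard, policy_snapshot, score, max_size=5):
    """Keep only the top-scoring policy snapshots (e.g. scored by win rate
    against the current opponent)."""
    leaderboard.append((score, policy_snapshot))
    leaderboard.sort(key=lambda entry: entry[0], reverse=True)
    del leaderboard[max_size:]
    return leaderboard


# Toy usage with random tensors standing in for a sampled batch.
margin = torch.randn(256)   # safety margin g(s) for each sampled state
v_next = torch.randn(256)   # target-critic value at the corresponding next state
target = safety_game_target(margin, v_next)
print(target.shape)         # torch.Size([256])
```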
Online gameplay safety filter
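At runtime, the filter accepts a task-driven action only if, in an imagined game against the learned adversary, the robot's fallback safety policy can keep the system safe from the state that action would produce. The sketch below is a minimal rendition under assumed interfaces (dynamics model, adversary, failure check); it is not the paper's implementation.

```python
# Illustrative sketch of a rollout-based ("gameplay") safety filter, under our
# own interface assumptions; not the paper's implementation.
def gameplay_filter(state, task_action, dynamics, safety_policy, adversary,
                    is_failure, horizon=100):
    """Return the task action if an imagined adversarial game starting from its
    outcome stays safe for `horizon` steps; otherwise fall back to the safety policy.
    """
    # Imagine taking the task-driven action first...
    s = dynamics(state, task_action, adversary(state, task_action))
    # ...then let the safety policy play out the rest of the game against the adversary.
    for _ in range(horizon):
        if is_failure(s):
            return safety_policy(state)   # task action would lose the game: override
        u = safety_policy(s)
        s = dynamics(s, u, adversary(s, u))
    return task_action                    # imagined game stays safe: let the task act


# Toy usage: keep a 1-D point inside [-1, 1] under a bounded disturbance.
dyn = lambda x, u, d: x + 0.1 * (u + d)
adversary = lambda x, u: 0.5 if x >= 0 else -0.5   # worst-case push toward the boundary
safety_policy = lambda x: -1.0 if x > 0 else 1.0   # push back toward the origin
failure = lambda x: abs(x) > 1.0

x, task_u = 0.95, 1.0                              # task wants to keep moving right
u = gameplay_filter(x, task_u, dyn, safety_policy, adversary, failure)
print(u)   # -1.0: the imagined game is lost, so the filter overrides the task action
```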
Experiments
Gameplay rollout - bumpy terrain experiment
Gameplay rollout - tugging experiment
Value shielding and task policy
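Value shielding, referenced above for comparison, replaces the explicit rollout with a single query of the learned safety critic: the task action is overridden whenever its predicted safety value falls below a threshold. The snippet below is our illustrative rendition under assumed interfaces; the threshold, sign convention, and names are assumptions rather than the paper's code.

```python
# Illustrative sketch of value shielding, under assumed interfaces.
def value_shield(state, task_action, safety_critic, safety_policy, threshold=0.0):
    """Override the task action whenever the learned safety value
    (assumed positive = safe) predicted for it falls below the threshold."""
    if safety_critic(state, task_action) < threshold:
        return safety_policy(state)   # critic predicts the game would be lost: fall back
    return task_action                # critic deems the task action safe enough
```

A critic-threshold check like this is cheaper than simulating a full imagined game, but it inherits any inaccuracy in the learned value function, whereas the gameplay rollout re-verifies safety by explicit simulation.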