Gameplay Filters: Safe Robot Walking through Adversarial Imagination
Duy Phuong Nguyen, Kai-Chieh Hsu, Jaime Fernández Fisac
This paper presents the gameplay filter, a general approach that leverages offline game-theoretic reinforcement learning to synthesize a highly robust safety filter for high-order nonlinear dynamics. At runtime, the filter maintains safety by continually simulating adversarial futures and precluding task-driven actions that would cause the robot to lose a future game (and thereby violate safety).
Offline gameplay learning
We employ a game-theoretic reach–avoid reinforcement learning scheme that iteratively pits the robot’s controller against a simulated adversarial environment. The algorithm updates a safety value network (critic) and keeps a leaderboard of the most effective player policies (actors). The video below shows the co-training process:
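As a rough illustration of this co-training loop, the sketch below pairs a discounted safety-game Bellman backup with a simple policy leaderboard. The function names, the sign convention (safety margin g(s) > 0 when safe), and the specific backup form are our assumptions, not the released implementation.

```python
# Illustrative sketch, not the paper's code: the backup form, the sign
# convention (margin > 0 means safe), and all names below are assumptions.
import torch


def safety_game_target(margin, v_next, gamma=0.99):
    """Assumed discounted safety-game backup:
    V(s) = (1 - gamma) * g(s) + gamma * min(g(s), V(s')),
    where g(s) is the safety margin and V(s') is the target critic's estimate
    at the next state reached under the controller's action and the adversary's
    response.
    """
    return (1.0 - gamma) * margin + gamma * torch.minimum(margin, v_next)


def update_leaderboard(leaderboard, policy_snapshot, score, max_size=5):
    """Keep only the top-scoring policy snapshots (e.g. scored by win rate
    against the current opponent)."""
    leaderboard.append((score, policy_snapshot))
    leaderboard.sort(key=lambda entry: entry[0], reverse=True)
    del leaderboard[max_size:]
    return leaderboard


# Toy usage with random tensors standing in for a sampled batch.
margin = torch.randn(256)   # safety margin g(s) for each sampled state
v_next = torch.randn(256)   # target-critic value at the corresponding next state
target = safety_game_target(margin, v_next)
print(target.shape)         # torch.Size([256])
```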
Online gameplay safety filter
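At runtime, the filter accepts a task-driven action only if, in an imagined game against the learned adversary, the robot's fallback safety policy can keep the system safe from the state that action would produce. The sketch below is a minimal rendition under assumed interfaces (dynamics model, adversary, failure check); it is not the paper's implementation.

```python
# Illustrative sketch of a rollout-based ("gameplay") safety filter, under our
# own interface assumptions; not the paper's implementation.
def gameplay_filter(state, task_action, dynamics, safety_policy, adversary,
                    is_failure, horizon=100):
    """Return the task action if an imagined adversarial game starting from its
    outcome stays safe for `horizon` steps; otherwise fall back to the safety policy.
    """
    # Imagine taking the task-driven action first...
    s = dynamics(state, task_action, adversary(state, task_action))
    # ...then let the safety policy play out the rest of the game against the adversary.
    for _ in range(horizon):
        if is_failure(s):
            return safety_policy(state)   # task action would lose the game: override
        u = safety_policy(s)
        s = dynamics(s, u, adversary(s, u))
    return task_action                    # imagined game stays safe: let the task act


# Toy usage: keep a 1-D point inside [-1, 1] under a bounded disturbance.
dyn = lambda x, u, d: x + 0.1 * (u + d)
adversary = lambda x, u: 0.5 if x >= 0 else -0.5   # worst-case push toward the boundary
safety_policy = lambda x: -1.0 if x > 0 else 1.0   # push back toward the origin
failure = lambda x: abs(x) > 1.0

x, task_u = 0.95, 1.0                              # task wants to keep moving right
u = gameplay_filter(x, task_u, dyn, safety_policy, adversary, failure)
print(u)   # -1.0: the imagined game is lost, so the filter overrides the task action
```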
Experiments
Gameplay rollout - bumpy terrain experiment
Gameplay rollout - tugging experiment
Value shielding and task policy
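Value shielding, referenced above for comparison, replaces the explicit rollout with a single query of the learned safety critic: the task action is overridden whenever its predicted safety value falls below a threshold. The snippet below is our illustrative rendition under assumed interfaces; the threshold, sign convention, and names are assumptions rather than the paper's code.

```python
# Illustrative sketch of value shielding, under assumed interfaces.
def value_shield(state, task_action, safety_critic, safety_policy, threshold=0.0):
    """Override the task action whenever the learned safety value
    (assumed positive = safe) predicted for it falls below the threshold."""
    if safety_critic(state, task_action) < threshold:
        return safety_policy(state)   # critic predicts the game would be lost: fall back
    return task_action                # critic deems the task action safe enough
```

A critic-threshold check like this is cheaper than simulating a full imagined game, but it inherits any inaccuracy in the learned value function, whereas the gameplay rollout re-verifies safety by explicit simulation.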