Offline gameplay learning

We employ a game-theoretic reach–avoid reinforcement learning scheme that iteratively pits the robot’s controller against a simulated adversarial environment. The algorithm updates a safety value network (the critic) and maintains a leaderboard of the most effective player policies (the actors) found during training. A sketch of this loop is given below, and the video that follows shows the co-training process:
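
As a concrete illustration of this loop, the sketch below shows one way a single co-training iteration could be organized: a game is simulated between the controller and a sampled adversary, the reach–avoid critic is fit to the outcome, both players are improved against the updated critic, and the strongest policies for each player are kept on a leaderboard. All names and interfaces here (env, critic, ctrl, dstb, snapshot, and so on) are illustrative assumptions rather than the released code.

```python
import random


def evaluate_safety(env, ctrl, dstb, episodes=10, horizon=200):
    """Fraction of simulated games the controller survives (illustrative metric)."""
    survived = 0
    for _ in range(episodes):
        x = env.reset()
        failed = False
        for _ in range(horizon):
            x = env.step(x, ctrl.act(x), dstb.act(x))
            if env.is_failure(x):
                failed = True
                break
        survived += 0 if failed else 1
    return survived / episodes


def cotrain_iteration(env, critic, ctrl, dstb, leaderboard, horizon=200):
    """One offline co-training step; all interfaces are assumed placeholders."""
    # Train against a mix of the current adversary and past strong ones so the
    # controller does not overfit to a single opponent.
    opponent = random.choice(leaderboard["dstb"] + [dstb])

    # Play one simulated game: the controller tries to stay safe while the
    # adversarial environment injects disturbances.
    x = env.reset()
    game = []
    for _ in range(horizon):
        u = ctrl.act(x)            # controller action
        d = opponent.act(x)        # adversarial disturbance
        x_next = env.step(x, u, d)
        game.append((x, u, d, x_next))
        x = x_next
        if env.is_failure(x):
            break

    # Fit the safety value network (critic) on the collected game, then improve
    # both players against the updated critic.
    critic.update(game)
    ctrl.improve(critic)
    dstb.improve(critic)

    # Leaderboards of the most effective policies for each player: controllers
    # ranked by survival rate, adversaries by the failures they induce.
    leaderboard["ctrl"].append(ctrl.snapshot())
    leaderboard["ctrl"].sort(key=lambda c: evaluate_safety(env, c, dstb))
    leaderboard["ctrl"] = leaderboard["ctrl"][-5:]    # keep the top five

    leaderboard["dstb"].append(dstb.snapshot())
    leaderboard["dstb"].sort(key=lambda a: -evaluate_safety(env, ctrl, a))
    leaderboard["dstb"] = leaderboard["dstb"][-5:]
    return leaderboard
```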



Online gameplay safety filter

Gameplay filter operation
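
At run time, the learned pieces can be composed into a filter around an arbitrary task policy. The sketch below is one plausible construction under our reading of this section, not the released implementation: before executing a candidate task action, the filter imagines a short adversarial game from the resulting state, with the learned fallback controller playing against the learned adversary inside the simulator, and it overrides the task action with the fallback whenever that imagined game ends in failure. The names sim, task_policy, fallback, and adversary are illustrative assumptions.

```python
def gameplay_filter_step(x, task_policy, fallback, adversary, sim, horizon=100):
    """One illustrative filtering step; all interfaces are assumed placeholders."""
    # Candidate action proposed by the (unverified) task policy.
    u_task = task_policy.act(x)

    # Imagined gameplay rollout: apply the candidate action once, then let the
    # learned fallback controller defend against the learned adversary.
    x_img = sim.step(x, u_task, adversary.act(x))
    safe = True
    for _ in range(horizon):
        if sim.is_failure(x_img):
            safe = False
            break
        x_img = sim.step(x_img, fallback.act(x_img), adversary.act(x_img))

    # Execute the task action only if the imagined game stays safe; otherwise
    # override it with the fallback (safety) action.
    return u_task if safe else fallback.act(x)
```

Value shielding, one of the comparisons in the experiments below, can roughly be viewed as the degenerate case in which the imagined rollout is replaced by a one-step check of the learned critic’s value.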



Experiments

Gameplay rollout - bumpy terrain experiment

Gameplay rollout - tugging experiment

Value shielding and task policy