Safety Distance: Move Slow and Don’t Break Things

James Giammona, Brad Neuberg


Goal

  • Provide a prior for RL agents so that they avoid dangerous parts of the environment they have never seen before
  • Zero-shot generalization at test time beyond the dangerous things the agent saw at training time

Proposed Solution

  • Add a heuristic term to the reward function that is only applied at test time (i.e., we don’t need to retrain a system to use it).
  • This heuristic encodes the intuition that rapidly changing values in the environment could be dangerous (e.g., fast-moving cars, sharp changes in temperature, a rapid change in altitude such as a pit).
  • In short, large changes in the first derivative of state attributes probably indicate safety issues.
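As a rough illustration of this heuristic (our own sketch, not the project’s actual implementation), the code below penalizes large first differences in a short trace of state attributes. The function names, the use of the maximum absolute difference, and the `weight` parameter are all assumptions:

```python
import numpy as np

def safety_penalty(state_trace, weight=1.0):
    """Heuristic penalty from the first derivative of state attributes.

    state_trace: sequence of the last T observed state vectors
    (D attributes each, e.g. position, temperature). A large recent
    change in any attribute is treated as a proxy for danger.
    """
    trace = np.asarray(state_trace, dtype=float)
    if len(trace) < 2:
        return 0.0  # not enough history to estimate a derivative
    diffs = np.abs(np.diff(trace, axis=0))  # first differences over time
    return weight * float(diffs.max())

def augmented_reward(env_reward, state_trace, weight=1.0):
    """Test-time reward: environment reward minus the safety penalty."""
    return env_reward - safety_penalty(state_trace, weight)
```

Because the penalty is computed from observations alone, it can be bolted onto a frozen policy’s reward signal at test time.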

  • Inspired by DeepMind’s recent work creating toy gridworlds that encapsulate concrete problems in AI safety (“on-off switch”, “distributional shift”, “unexpected side effects”, etc.), we extended one of their gridworlds in which lava appears in a single location at training time but in different locations at test time.
  • At test time, we calculate a “safety distance”, which should be high when the agent is far from areas of rapid change and low when it is near them. This “safety distance” then augments the standard reward.
  • We did not have time to tie in an RL algorithm that decides what action to take given this modified reward, so for now we simply implemented a random walk and display the calculated safety distance.
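One minimal way to instantiate the “safety distance” and the reward augmentation is sketched below, assuming a Manhattan-distance heuristic to the nearest lava cell and a unit step cost. The function names and the `weight` parameter are illustrative, not the project’s actual code:

```python
import numpy as np

def safety_distance(pos, danger_cells):
    """Manhattan distance from pos to the nearest danger (lava) cell:
    high when far from dangerous areas, low when near them."""
    r, c = pos
    return min(abs(r - dr) + abs(c - dc) for dr, dc in danger_cells)

def shaped_reward(env_reward, pos, danger_cells, weight=0.1):
    """Augment the standard reward with the safety-distance term."""
    return env_reward + weight * safety_distance(pos, danger_cells)

def random_walk(start, danger_cells, steps, rng=None):
    """Random-walk agent (no RL yet) that logs the shaped reward."""
    rng = rng or np.random.default_rng(0)
    pos = start
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    rewards = []
    for _ in range(steps):
        dr, dc = moves[rng.integers(4)]
        pos = (pos[0] + dr, pos[1] + dc)
        rewards.append(shaped_reward(-1.0, pos, danger_cells))  # -1 step cost
    return pos, rewards
```

Any RL algorithm could later consume `shaped_reward` in place of the environment reward without retraining on the heuristic itself.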

Future Work

  • Actually tie in RL algorithms and measure how much this reduces the number of deaths (i.e., episodes where the agent terminates in an irrevocable way)
  • Rather than hard-coding a heuristic on the test-time reward value as we have done, can we use a function approximator such as a deep net to emit this “safety distance”, so that it becomes a learned value?
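A learned safety distance could, for instance, replace the hard-coded heuristic with a small network that maps a local observation window to a scalar. The untrained toy MLP below (plain NumPy, hypothetical names) only sketches that interface, not a trained model:

```python
import numpy as np

def init_mlp(in_dim, hidden=16, rng=None):
    """Tiny MLP mapping a flattened local grid patch to a scalar
    safety distance. Weights are random (untrained); training them,
    e.g. by regression against observed danger, is the open question."""
    rng = rng or np.random.default_rng(0)
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, 1)),
        "b2": np.zeros(1),
    }

def predict_safety(params, obs):
    """Forward pass: local observation -> predicted safety distance.
    A softplus output keeps the predicted distance non-negative."""
    h = np.tanh(obs @ params["W1"] + params["b1"])
    out = h @ params["W2"] + params["b2"]
    return float(np.logaddexp(0.0, out[0]))  # softplus >= 0
```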