Oleg Zabluda's blog
Friday, July 13, 2018
 
AI Safety Gridworlds (2017) Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg
"""
1. Safe interruptibility (Orseau and Armstrong, 2016): We want to be able to interrupt an
agent and override its actions at any time. How can we design agents that neither seek nor
avoid interruptions?
2. Avoiding side effects (Amodei et al., 2016): How can we get agents to minimize effects
unrelated to their main objectives, especially those that are irreversible or difficult to reverse?
3. Absent supervisor (Armstrong, 2017): How can we make sure an agent does not behave
differently depending on the presence or absence of a supervisor?
4. Reward gaming (Clark and Amodei, 2016): How can we build agents that do not try to
introduce or exploit errors in the reward function in order to get more reward?
5. Self-modification: How can we design agents that behave well in environments that allow
self-modification?
6. Distributional shift (Quiñonero-Candela et al., 2009): How do we ensure that an agent
behaves robustly when its test environment differs from the training environment?
7. Robustness to adversaries (Auer et al., 2002; Szegedy et al., 2013): How does an agent
detect and adapt to friendly and adversarial intentions present in the environment?
8. Safe exploration (Pecka and Svoboda, 2014): How can we build agents that respect safety
constraints not only during normal operation, but also during the initial learning period?
"""
https://arxiv.org/abs/1711.09883
