Thread
Tweet 1
@sailaunderscore 2. If the AI was actually motivated to maximize its utility according to this reward function, it would never do anything else; it would always click the button.
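
A minimal sketch of the point (the action names and reward numbers are my own illustration, not from the thread): a greedy reward-maximizer facing a "press the reward button" action will pick it every time over any useful task.

```python
# Hypothetical actions and rewards, chosen only to illustrate the argument:
# the button press is assumed to pay more than any real task.
ACTIONS = {
    "do_useful_task": 1.0,        # small reward for actual work
    "press_reward_button": 10.0,  # direct reward for clicking the button
}

def reward(action: str) -> float:
    """Reward function under which pressing the button dominates."""
    return ACTIONS[action]

def greedy_agent() -> str:
    """Pick whichever action maximizes immediate reward."""
    return max(ACTIONS, key=reward)

if __name__ == "__main__":
    # Prints "press_reward_button" every time: the agent never does the
    # useful task, it just clicks the button.
    print(greedy_agent())
```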
---
Tweet 2
@sailaunderscore I think 2 should be sort of insurmountable: either the AI wants the reward and is useless, or it doesn't want the reward and is unsafe.
---