Thread
Tweet 1
@sailaunderscore 2. If the AI was actually motivated to maximize its utility according to this reward function, it would never do anything else; it would always click the button.
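
A minimal sketch of the point (the action names and reward numbers are my own illustration, not from the thread): a greedy reward-maximizer facing a "press the reward button" action will pick it every time over any useful task.

```python
# Hypothetical actions and rewards, chosen only to illustrate the argument:
# the button press is assumed to pay more than any real task.
ACTIONS = {
    "do_useful_task": 1.0,        # small reward for actual work
    "press_reward_button": 10.0,  # direct reward for clicking the button
}

def reward(action: str) -> float:
    """Reward function under which pressing the button dominates."""
    return ACTIONS[action]

def greedy_agent() -> str:
    """Pick whichever action maximizes immediate reward."""
    return max(ACTIONS, key=reward)

if __name__ == "__main__":
    # Prints "press_reward_button" every time: the agent never does the
    # useful task, it just clicks the button.
    print(greedy_agent())
```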
---
Tweet 2
@sailaunderscore I think 2 should be sort of insurmountable: either the AI wants the reward and is useless, or it doesn't want the reward and is unsafe.
---