> Thus we EAs should vigorously investigate whether this concern is well-founded
I am not an EA, but I am an alignment researcher. I see only a small sliver of alignment research for which this concern would be well-founded.
To give an example that is less complicated than my own research: suppose I design a reward function component, say some penalty term, that can be added to an AI/AGI reward function to make the AI/AGI more aligned. Why would I not publish this? I would want to publish it widely, so that more people actually design/train their AI/AGI using this penalty term.
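To make this concrete, here is a minimal sketch of what I mean by adding a penalty term to a reward function. The specific functions (`base_reward`, `side_effect_penalty`) and the side-effect framing are just placeholders for illustration, not the actual research in question:

```python
def base_reward(state, action):
    # Task reward defined by whoever builds the AI/AGI (placeholder).
    return float(state.get("task_progress", 0.0))

def side_effect_penalty(state, action):
    # Hypothetical published penalty term, e.g. penalizing side effects.
    return float(state.get("side_effect_magnitude", 0.0))

def shaped_reward(state, action, penalty_weight=1.0):
    # The builder keeps their own base reward and simply subtracts the
    # published penalty term, weighted by a tunable coefficient.
    return base_reward(state, action) - penalty_weight * side_effect_penalty(state, action)

# Example: a step that makes task progress but causes some side effects.
state = {"task_progress": 1.0, "side_effect_magnitude": 0.3}
print(shaped_reward(state, action=None))  # 0.7 with penalty_weight=1.0
```

The point is that the penalty term is only useful if the people actually training AI/AGI systems know about it and add it to their reward functions, which is exactly why I would want it published widely.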
Your argument rests on a built-in assumption that it will be hard to build AGIs which lack the instrumental drive to protect themselves from being switched off, or from having their goals changed. I do not share this assumption, but even if I did, I would see no downside to publishing alignment research on writing better reward functions.