I personally think that this is yet more evidence that formal control is a path which is more promising than others. If you can formally prove that your code, when properly executed, has certain properties then that gives you some hope that those properties will be durable during and after a hard left turn.
Things like, if you had a magic wand, formally proving that any AI designed by a formally controlled AI will also be formally controlled. That way even if it whooshes and completely redesigns itself there is still some hope.
I would love to see the amount of resources going into formal methods be multiplied by 10x or 100x, I think if we built a really solid field, where all of modern mathematics and computer science is formalised and people write formally verified code by default because it’s safer and there are good libraries to do that, then in that environment the control problem becomes easier, if still extremely hard.
This is a nice writeup and summary.
I personally think that this is yet more evidence that formal control is a path which is more promising than others. If you can formally prove that your code, when properly executed, has certain properties then that gives you some hope that those properties will be durable during and after a hard left turn.
Things like, if you had a magic wand, formally proving that any AI designed by a formally controlled AI will also be formally controlled. That way even if it whooshes and completely redesigns itself there is still some hope.
I would love to see the amount of resources going into formal methods be multiplied by 10x or 100x, I think if we built a really solid field, where all of modern mathematics and computer science is formalised and people write formally verified code by default because it’s safer and there are good libraries to do that, then in that environment the control problem becomes easier, if still extremely hard.