Thanks for your comment!
I think a sufficiently intelligent ASI is as likely to outsmart human efforts at goal-directedness as it is to outsmart guardrails.
I think number 2 is a good point.
There are many people who actively want to create an aligned ASI as soon as possible to reap its benefits, for whom my suggestion is not useful.
But there are others who primarily want to prevent the creation of a misaligned ASI, and are willing to forgo the creation of an ASI if necessary.
There are also others who want to create an aligned ASI, but are willing to considerably delay this to improve the chances that the ASI is aligned.
I think my suggestion is mainly useful for these second and third groups.