Yep, thanks for pointing that out! Fixed it.
...I haven’t seen much discussion about the downsides of delaying
I’m not sure how your first point relates to what I was saying in this post, but I’ll take a guess. I said that investing in capabilities work at Anthropic could be good. The upside is that it increases the probability that EAs end up controlling a superintelligent AGI in the future. The downside is that it could shorten timelines, though hopefully that can be mitigated by keeping the research under wraps (which is what they are doing). This is a controversial issue, though. I haven’t thought very much about whether the upsides outweigh the downsides, but the argument in this post updated me toward thinking the upsides are larger than I previously believed.
Also I’m not sure about outcome 1 having zero utility...
It doesn’t matter which outcome you assign zero value to, as long as the relative values are the same: if one utility function is a positive affine transformation of another, the two produce equivalent decisions.
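To spell that out (writing U′ = aU + b for the transformed utility, with a > 0, and X, Y for any two lotteries; these symbols are just for illustration):

$$E[U'(X)] > E[U'(Y)] \;\iff\; a\,E[U(X)] + b \;>\; a\,E[U(Y)] + b \;\iff\; E[U(X)] > E[U(Y)],$$

so both utility functions rank every pair of options identically, and changing which outcome sits at zero is just a different choice of b.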
Yep, I didn’t initially understand you. That’s a great point!
This means the framework I presented in this post is wrong. I now agree with your statement:
I think the framework in this post can be modified to incorporate this, and the conclusions are similar. The quantity that dominates the utility calculation is now the expected representation of utilitarianism in the AGI’s values (rough sketch below).
The two handles become:
(1) The probability of misalignment.
(2) The expected representation of utilitarianism in the moral parliament conditional on alignment.
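One rough way to write down the modified calculation (the symbols here are mine, and it assumes the misaligned outcome contributes roughly zero value and that value scales linearly in utilitarianism’s share of the parliament): let p be the probability of misalignment, f utilitarianism’s share of the moral parliament given alignment, and V the value of a fully utilitarian AGI. Then

$$E[\text{value}] \;\approx\; (1 - p)\cdot E[f \mid \text{aligned}]\cdot V,$$

so an intervention can pay off either by lowering p (handle 1) or by raising the conditional expectation of f (handle 2).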
The conclusion of the post, then, should be something like “interventions that increase (2) might be underrated” instead of “interventions that increase the probability of fully utilitarian AGI are underrated.”