I am mostly active on LessWrong. See my profile and my self-introduction there for more.
Max H
There are a lot more than two decision theories. Most are designed to do at least as well as both causal and evidential decision theory on Newcomb-like problems and even more exotic setups.
The basic idea in all of them is that, instead of choosing the best decision at any particular decision point, they choose the best decision-making algorithm across possible world states.
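As a concrete illustration of that distinction, here is a minimal sketch of the Newcomb's-problem arithmetic (my own illustration, with assumed payoffs and an assumed 99%-accurate predictor): evaluating a single act with the box contents held fixed favors two-boxing, while evaluating whole decision-making policies favors one-boxing.

```python
# Illustrative only: payoffs and predictor accuracy are assumed, chosen to
# show the gap between evaluating single acts and evaluating whole policies.

SMALL = 1_000        # transparent box: always contains $1,000
BIG = 1_000_000      # opaque box: contains $1M iff the predictor foresaw one-boxing
ACCURACY = 0.99      # assumed predictor accuracy

def policy_value(one_box: bool) -> float:
    """Expected payoff of committing to a policy, when the predictor
    responds to the policy itself (the algorithm-level view)."""
    p_big = ACCURACY if one_box else 1 - ACCURACY
    return p_big * BIG + (0 if one_box else SMALL)

def act_value(one_box: bool, big_already_there: bool) -> int:
    """Payoff of a single act with the box contents held fixed
    (the decision-point view that leads CDT to two-box)."""
    return (BIG if big_already_there else 0) + (0 if one_box else SMALL)

if __name__ == "__main__":
    # Policy-level comparison: one-boxing wins by a wide margin.
    print("one-box policy:", policy_value(True))    # 990000.0
    print("two-box policy:", policy_value(False))   # 11000.0
    # Act-level comparison: with contents fixed, two-boxing always adds $1,000,
    # even though the two-boxing *policy* does far worse overall.
    for big in (True, False):
        print(f"big box filled={big}:",
              "one-box", act_value(True, big),
              "two-box", act_value(False, big))
```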
I think the original CEV paper from 2003 addresses (or at least discusses) a lot of these concerns. Basically, a group attempting to build an aligned AI should try to align it with the collective CEV of humanity, not with the CEV of any individual human.
On anti-natalism, religious extremism, voluntary extinction, etc.: if those values end up being stable under reflection and under faster, more coherent thinking, and aren't dominated by the other values of the people who hold them, then the Future may indeed include things which satisfy or maximize those values.
(Though those values, and the people who hold them, don't necessarily get more say than the people who believe the opposite. If some interests and values are truly irreconcilable, a compromise might look like dividing up chunks of the lightcone.)
Of course, the first group that attempts to build a superintelligence might try to align it with something else: their own personal CEV (which may or may not have a component for the collective CEV of humanity), or some kind of equal or unequal split between the individual CEVs of every human, or every sentient being, etc., or something else entirely. This would be inadvisable for various reasons discussed in the paper, and I agree it is a real danger / problem. (Mostly, though, I think anyone who tries to build any kind of CEV sovereign right now just fails, and we end up with tiny molecular squiggles.)
Yep. I think in my ideal world, there would be exactly one operationally adequate organization permitted to build AGI. Membership in that organization would require a credible pledge to altruism and a test of oath-keeping ability.
This organization's monopoly on building AGI would be enforced by a global majority of nation states, with monitoring and deterrence against defection.
I think a stable equilibrium of that kind is possible in principle, though obviously it's nowhere near the current Overton window. (For good reason: it's a scary idea, and probably ends up looking pretty dystopian when implemented by existing Earth governments. Alas! Sometimes draconian measures really are necessary; reality is not always nice.)
In the absence of such a radically different global political order, we might have to take our chances on the hope that the decision-makers at OpenAI, DeepMind, Anthropic, etc. will all be reasonably nice and altruistic, and not power- or profit-seeking. Not great!
There might be worlds in between the most radical one sketched above and our current trajectory, but I worry that any “half measures” end up being ineffective, costly, and worse than nothing, mirroring many countries’ approach to COVID lockdowns.