I do want to write something along the lines of “Alignment is a Political Philosophy Problem”
My takes on AI, and the problem of x-risk, have been in flux over the last 1.5 years, but they have become more and more focused on the idea of power and politics, as opposed to finding a mythical ‘correct’ utility function for a hypothesised superintelligence. Making TAI/AGI/ASI go well therefore falls in the reference class of ‘principal-agent problem’/‘public choice theory’/‘social contract theory’ rather than ‘timeless decision theory’/‘coherent extrapolated volition’. The latter two are poor answers to an incorrect framing of the question.
Writing that influenced me on this journey:
Tan Zhi Xuan’s whole work, especially Beyond Preferences in AI Alignment
Joe Carlsmith’s Otherness and control in the age of AGI sequence
Matthew Barnett’s various posts on AI recently, especially viewing it as an ‘institutional design’ problem
Nora Belrose’s various posts expressing scepticism of the case for AI Safety, and even her posts on Safety policy proposals conditional on that case being right.[1]
The recent Gradual Disempowerment post is something along the lines I’m thinking of too
I also think this view helps explain the huge backlash that AI Safety received over SB1047 and after the awfully botched OpenAI board coup. Both were attempted exercises in political power, and the pushback often criticised that exercise of power rather than engaging with the ‘object level’ risk arguments. I increasingly think that this is not an ‘irrational’ response but a perfectly reasonable one, and that “AI Safety” needs to pursue more co-operative strategies that credibly signal legitimacy.
I think the downvotes these got are, in retrospect, a poor sign for epistemic health
My own take is that while I don’t want to defend the “find a correct utility function” approach to alignment as sufficient at this time, I do think it is actually necessary, and that the modern era is an anomaly in how much we can get away with misalignment being checked by institutions that go beyond any individual.
The basic reason we can get away with not solving the alignment problem is that humans depend on other humans; in particular, you cannot replace human workers with much cheaper workers whose preferences can be controlled arbitrarily.
AI threatens to remove the need to depend on other humans, and that dependence is a critical part of how we get away with not needing the correct utility function.
I like the Intelligence Curse series because it points out that when an elite doesn’t need the commoners for anything, and the commoners hold no selfish value to the elite, the default outcome is that the elites let the commoners starve to death unless the elites are value aligned with them.
The Intelligence Curse series is below:
https://intelligence-curse.ai/defining/
In this analogy, the AIs are the elites and the rest of humanity plays the role of the commoners.