I do want to write something along the lines of “Alignment is a Political Philosophy Problem”
My takes on AI, and the problem of x-risk, have been in flux over the last 1.5 years, but they have become more and more focused on the idea of power and politics, as opposed to finding a mythical ‘correct’ utility function for a hypothesised superintelligence. Making TAI/AGI/ASI go well therefore falls in the reference class of ‘principal–agent problem’/‘public choice theory’/‘social contract theory’ rather than ‘timeless decision theory’/‘coherent extrapolated volition’. The latter two are poor answers to an incorrect framing of the question.
Writing that influenced me on this journey:
Tan Zhi Xuan’s whole work, especially Beyond Preferences in AI Alignment
Joe Carlsmith’s Otherness and control in the age of AGI sequence
Matthew Barnett’s various posts on AI recently, especially viewing it as an ‘institutional design’ problem
Nora Belrose’s various posts on scepticism of the case for AI Safety, and even on Safety policy proposals conditional on the first case being right.[1]
The recent Gradual Disempowerment post is something along the lines I’m thinking of too
I also think this view helps explain the breadth of the backlash that AI Safety received over SB1047 and after the awfully botched OpenAI board coup. Both were attempted exercises of political power, and the pushback often criticised that exercise of power rather than engaging with the ‘object level’ of the risk arguments. I increasingly think that this is not an ‘irrational’ response but a perfectly reasonable one, and that “AI Safety” needs to pursue more co-operative strategies that credibly signal legitimacy.
I think the downvotes some of these pieces received are, in retrospect, a poor sign for epistemic health.