So I said two different things, which made my argument unclear. First I said “assuming superintelligence comes aligned with human values” and then I said “AI could lead to major catastrophes, a global totalitarian regime, or human extinction.”
If we knew for sure that AGI is imminent and will eradicate all diseases, then I agree with you that it’s worth it to donate to malaria charities. Right now, though, we don’t know what the outcome will be. So, not knowing the outcome of alignment, do you still choose to donate to malaria charities, or do you allocate that money toward, say, a nonprofit actively working on the alignment problem?
Shameless plug: I have an idea for a nonprofit that aims to help solve the alignment problem—https://forum.effectivealtruism.org/posts/GGxZhEdxndsyhFnGG/an-international-collaborative-hub-for-advancing-ai-safety?utm_campaign=post_share&utm_source=link
Let’s imagine you solve the “alignment problem” tomorrow. So? Exactly who did you solve alignment for? AI aligned to the interests of Elon Musk, Donald Trump, or Vladimir Putin? Or AI aligned with Peter Singer? Or AI aligned to the interests of Google, Meta, TikTok, or Netflix? Or is it alignment with the democratically determined interests and moral values of the public?
We haven’t even solved the “alignment problem” among humans. The interests of Google might be opposed to your interests. The interests of Vladimir Putin might be opposed to your interests.
But of course, seeing who is funding AI alignment research, I’ll readily bet that the goal is for AI to be aligned with the interests of tech companies and tech billionaires. That’s the goal, after all: make AI safe enough that it can be profitable.
I agree that AI could end up aligned to certain people or groups, but the dialogue around alignment centers on aligning it with humanity. Even so, wouldn’t pushing for alignment with all of humanity be a more worthwhile effort than funding malaria charities? Especially since, if AI is aligned only with elites, it could devolve into a totalitarian regime, and overthrowing that regime would then become a more pressing focus than fighting malaria.
I’m not naive enough to assume alignment will be achieved with all of humanity, but that should be the goal, and it is something many companies at least openly advocate for, whether or not it comes to pass. There is also the possibility of other superintelligences being built after the first one that could be aligned with all of humanity (unless, of course, the first superintelligence subverts those projects).
We’re not even capable of aligning governments and corporations with humanity. How aligned is the US federal government? How aligned is the EU? How aligned is China?
We’re not capable of aligning the most powerful entities.
Moreover, EA seems uninterested in aligning any of these powerful entities with humanity. EA funds little to nothing in, for example, improving democratic decision-making, which IMO is the only viable alignment strategy. The obvious first step toward alignment with “humanity” is to bother to find out what humanity actually wants. That demands collective preference evaluation, and there are already existing techniques for doing it, few of which interest either EA or AI advocates.
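To make “collective preference evaluation” concrete, here is a minimal, purely illustrative sketch of one long-standing technique, a Borda count over ranked preferences. The respondents and cause areas are hypothetical; the point is only that aggregating stated preferences is a well-studied exercise, not a mystery.

```python
# Purely illustrative: a Borda count, one of many existing
# preference-aggregation methods. Ballots are hypothetical,
# listed most-preferred option first.
from collections import defaultdict

ballots = [
    ["global_health", "ai_safety", "climate"],
    ["ai_safety", "climate", "global_health"],
    ["global_health", "climate", "ai_safety"],
]

def borda(ballots):
    """Give each option k-1 points for first place down to 0 for last, then rank."""
    scores = defaultdict(int)
    for ballot in ballots:
        k = len(ballot)
        for rank, option in enumerate(ballot):
            scores[option] += (k - 1) - rank
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(borda(ballots))
# [('global_health', 4), ('ai_safety', 3), ('climate', 2)]
```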
IMO, if you were serious about alignment with humanity, you would be spending exorbitant amounts on alignment research for the lower-hanging fruit, nation states and corporations, which are presumably less powerful than a superintelligence. But you can’t even align the merely human, so good luck with the superhuman. AI alignment will be impossible as long as these groups align AI with their own interests.
But please prove me wrong. Please show me a stronger commitment to democracy, one that ensures any such entity can be aligned with “humanity”.
I don’t claim you can align human groups with individual humans. If I’m reading you correctly, I think you’re committing a category error by assigning alignment properties to groups of people like nation states or companies. Alignment, as I’m using the term, is the alignment of an AI’s goals or values with those of a person or group of people. We expect this, I think, in part because we’re accustomed to telling computers what to do and having them do exactly what we say (though not always exactly what we mean).
Alignment is extremely tricky for the unenhanced human, but theoretically possible. My best guess at solving it would be to automate alignment research and development with AI itself: we’ll soon reach a sufficiently advanced AI capable of reasoning beyond anything anyone on Earth can come up with; we just have to ensure that this AI is aligned, that the AI which trained it was aligned, and so on. My second-best guess would be through BCIs, and my third through whole-brain-emulation interpretability.
Assuming we do develop alignment techniques, I’d argue that exclusive alignment (that is, alignment with one person or a small group of people) is more difficult than alignment with humanity at large, for the following reasons (I realize some of these cut both ways, like value drift, but I include them because I see them as more serious for exclusive alignment):
Value drift.
Impossible specification (e.g., in exploring the inherent contradictions in expressed human values, the AGI expands moral consideration beyond initial human constraints, discovering some form of moral universalism or a morality beyond all human reasoning).
Emergent properties appear, producing unexpected behavior, and we cannot align systems to exhibit properties we cannot anticipate.
Exclusive alignment’s instrumental goals may broaden AGI’s moral scope to include more humans (i.e., it may be that broader alignment makes for a more robust AI system).
Competing AGIs may be successfully created that are designed to align with all of humanity.
Exclusively aligned AGI may still satisfy many, if not all, of the preferences that the rest of humanity possesses.
Exclusive alignment requires perfect internal coordination of values within organizations, but divergent interests inevitably emerge as they scale; these coordination failures multiply when AGI systems interpret instructions literally and optimize against specified metrics.
Alignment requires resolving disagreements over value prioritization, a meta-preference problem. Yet resolving these conflicts necessitates assumptions about how they should be resolved, creating an infinite regress that defies a technical solution.
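To illustrate that regress, here is a minimal sketch with hypothetical groups and values: simple majority aggregation of value priorities can cycle, so any way of breaking the cycle already encodes a meta-preference about how disagreements should be resolved.

```python
# Purely illustrative: three hypothetical constituencies rank three value
# priorities, and pairwise majority voting produces a Condorcet cycle.
from itertools import combinations

rankings = {
    "group_a": ["liberty", "welfare", "security"],
    "group_b": ["welfare", "security", "liberty"],
    "group_c": ["security", "liberty", "welfare"],
}

def majority_prefers(x, y):
    """Return True if a strict majority of groups rank x above y."""
    votes = sum(r.index(x) < r.index(y) for r in rankings.values())
    return votes > len(rankings) / 2

for x, y in combinations(["liberty", "welfare", "security"], 2):
    if majority_prefers(x, y):
        print(f"majority prefers {x} over {y}")
    elif majority_prefers(y, x):
        print(f"majority prefers {y} over {x}")

# The three results (liberty > welfare, welfare > security, security > liberty)
# form a cycle; any rule that breaks it (Borda, runoffs, deferring to one group)
# is itself a contested meta-preference.
```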