Thanks for explaining! To summarize, I think there are crucial disanalogies between the "make their own life go well" case and the "make the future of humanity go well" case:

Should they follow those instincts even when they don't know why evolution instilled them? … (Note that this doesn't rely on them understanding evolution; even 1000 years ago they could have trusted that some process designed their body and mind to function well, despite radical uncertainty about what that process was.)
In this case, the reasons to trust "that some process designed their body and mind to function well" are relatively strong, because of how we're defining "well": an individual's survival in a not-too-extreme environment. Even if they don't understand evolution, they can think about how their instincts plausibly would've been honed on feedback relevant to this objective. And/or they can look at how other individuals (or their past self) have tended to survive when they trusted these kinds of instincts.[1]

Now consider someone who wants to make the future of humanity go well. Similarly, they have certain cooperative instincts ingrained in them, e.g. the instinct towards honesty. All else equal, it seems pretty reasonable to think that following them will help humanity cooperate better, and that this will allow humanity to avoid internal conflict.
Here, the reasons to trust that the instincts track the objective seem way weaker to me,[2] for all the reasons I discuss here: no feedback loops, radically unfamiliar circumstances due to the advent of ASI and the like, and a track record of sign-flipping considerations. All else is really not equal.

You yourself have written about how the AIS movement arguably backfired hard, which I really appreciate! I expect that ex ante, people founding this movement told themselves: "All else equal, it seems pretty reasonable to think that trying to warn people of a source of x-risk, and encouraging research on how to prevent it, will help humanity avoid that x-risk."
(I think analogous problems apply to your subagent and forward-chaining framings. They're justified when the larger system provides feedback, or when the forward steps have been validated in similar contexts, both of which we're missing here.)

How does this relate to cluelessness? Mostly I don't really know what the term means.
The way I use the term, you're clueless about how to compare A vs. B, relative to your values, if it seems arbitrary (upon reflection) to say A is better than, worse than, or exactly as good as B, and instead it seems we should consider A's and B's goodness incomparable.
What if someone has always been totally solitary, doesn't understand evolution or feedback loops, and hasn't made many decisions based on similar instincts? It seems like such a person wouldn't have reasons to trust their instincts! They'd just be getting lucky.

See here for my reply to: "Sure, 'way weaker', but they're still slightly better than chance, right?" Tl;dr: this doesn't work because the problem isn't just noise that weakens the signal; it's that it's ambiguous what the direction of the signal even is.
Hmm, I think you're overstating the disanalogy here. In the case of the individual life, you say "the reasons to trust 'that some process designed their body and mind to function well' are relatively strong". But we also have strong reasons to trust that some process designed our cooperative instincts to allow groups of humans to cooperate effectively.

I also think that many individuals need to decide how to make their lives go well in pretty confusing circumstances. Imagine deciding whether to immigrate to America in the 1700s, or how to live in the shadow of the Cold War, or whether to genetically engineer your children. There's a lot of stuff that's unprecedented, and which you only get one shot at.

Re the experience of AI safety so far: I certainly do think that a bunch of actions taken by the AI safety movement have backfired. I also think that a bunch have succeeded. If you think of the AI safety movement as a young organism going through its adolescence, it's gaining a huge amount of experience from all of these interactions with the world, and the net value of the things that it's done so far may well be dominated by what updates it makes based on that experience.

I guess that's where we disagree. It seems like you think that we should update towards being clueless, whereas I think we can extract a bunch of generalizable lessons that make us less clueless than we used to be, and one of those lessons is that many of these mistakes could have been prevented by using the right kinds of non-model-based decision-making.

EDIT: another way of trying to get at the crux: do you think that, if we had a theory of sociopolitics that was about as good as 20th-century economics, then we wouldn't be clueless about how to do sociopolitical interventions (like founding AI safety movements) effectively?
But we also have strong reasons to trust that some process designed our cooperative instincts to allow groups of humans to cooperate effectively.

"[A]llowing [small] groups of humans to cooperate effectively" is very far from "making the far future better, impartially speaking". I'd be interested in your responses to the arguments here.

I also think that many individuals need to decide how to make their lives go well in pretty confusing circumstances. Imagine deciding whether to immigrate to America in the 1700s, or how to live in the shadow of the Cold War, or whether to genetically engineer your children.
First, it's not clear to me that these people weren't clueless (i.e., that they really had more reason to choose whatever they chose than the alternatives), depending on how long a time horizon they were aiming to make go well.
Second, insofar as we think these people's choices were justified, I don't see why you think their instincts gave them such justification. Why would these instincts track unprecedented consequences so well?

and the net value of the things that it's done so far may well be dominated by what updates it makes based on that experience
I don't think "may well" gets us very far. Can you say more about why this hypothesis is so much more likely than, say, "the dominant impacts are the damage that's already been done", or "the dominant impacts will come from near-future decisions, made by actors who are still too ignorant about the extremely complex system they're intervening in"?
do you think that, if we had a theory of sociopolitics that was about as good as 20th-century economics, then we wouldn't be clueless about how to do sociopolitical interventions (like founding AI safety movements) effectively?

No, because I think "founding AI safety movements that succeed at making the far future go better" is a pretty out-of-distribution kind of sociopolitical intervention.