The stuff on cluelessness feels like it's conceding a little too much to the EA/bayesian frame. It's implying that you should have a model of the entire future in order to make decisions. But what I think you actually want to claim is that it's sensible and even "rational" to make non-model-based decisions (e.g. via heuristics, intuitions, etc).
I'd be interested in hearing more on what exactly you mean by this. Insofar as someone wants to make decisions based on impartially altruistic values, I think cluelessness is their problem, even if they don't make decisions by explicitly optimizing w.r.t. a model of the entire future. If such a person appeals to some heuristics or intuitions as justification for their decisions, then (as argued here) they need to say why those heuristics or intuitions reliably track impact on the impartial good. And the case for that looks pretty dubious to me.
(If you're rejecting the "make decisions based on impartially altruistic values" step, fair enough, though I think we'd do well to be explicit about that.)
Let's limit ourselves for a moment to someone who wants to make their own life go well. They have various instinctive responses that were ingrained in them by evolution, e.g. a disgust response to certain foods, or a sense of anger at certain behavior. Should they follow those instincts even when they don't know why evolution instilled them? Personally I think they often should, and that this is an example of rational non-model-based decision-making. (Note that this doesn't rely on them understanding evolution: even 1000 years ago they could have trusted that some process designed their body and mind to function well, despite radical uncertainty about what that process was.)
Now consider someone who wants to make the future of humanity go well. Similarly, they have certain cooperative instincts ingrained in them, e.g. the instinct towards honesty. All else equal, it seems pretty reasonable to think that following them will help humanity to cooperate better, and that this will allow humanity to avoid internal conflict.
How does this relate to cluelessness? Mostly I don't really know what the term means; it's not something salient in my ontology. I don't feel clueless about how to have a good life, and I don't feel clueless about how to make the long-term future better, and these two things do in fact seem analogous. In both cases the idea of avoiding pointless internal conflict (and more generally, becoming wiser and more virtuous in other ways) seems pretty solidly good. (Also, the evolution thing isn't central; you have similar dynamics with e.g. intuitions you've learned at a subconscious level, or behaviors that you've learned via reinforcement.)
Another way of thinking about it is that, when you're a subagent within a larger agent, you can believe that "playing your role" within the larger agent is good even when the larger agent is too complex for you to model well (i.e. you have Knightian uncertainty about it).
And yet another way of thinking about it is "forward-chaining often works, even when you don't have a back-chained reason why you think it should work".
Thanks for explaining! To summarize, I think there are crucial disanalogies between the "make their own life go well" case and the "make the future of humanity go well" case:
Should they follow those instincts even when they don't know why evolution instilled them? … (Note that this doesn't rely on them understanding evolution: even 1000 years ago they could have trusted that some process designed their body and mind to function well, despite radical uncertainty about what that process was.)
In this case, the reasons to trust "that some process designed their body and mind to function well" are relatively strong, because of how we're defining "well": an individual's survival in a not-too-extreme environment. Even if they don't understand evolution, they can think about how their instincts plausibly would've been honed on feedback relevant to this objective, and/or look at how other individuals (or their past self) have tended to survive when they trusted these kinds of instincts.[1]
Now consider someone who wants to make the future of humanity go well. Similarly, they have certain cooperative instincts ingrained in them, e.g. the instinct towards honesty. All else equal, it seems pretty reasonable to think that following them will help humanity to cooperate better, and that this will allow humanity to avoid internal conflict.
Here, the reasons to trust that the instincts track the objective seem way weaker to me,[2] for all the reasons I discuss here: no feedback loops, radically unfamiliar circumstances due to the advent of ASI and the like, and a track record of sign-flipping considerations. All else is really not equal.
You yourself have written about how the AIS movement arguably backfired hard, which I really appreciate! I expect that ex ante, people founding this movement told themselves: "All else equal, it seems pretty reasonable to think that trying to warn people of a source of x-risk, and encouraging research on how to prevent it, will help humanity avoid that x-risk."
(I think analogous problems apply to your subagent and forward-chaining framings. They're justified when the larger system provides feedback, or the forward steps have been validated in similar contexts, which we're missing here.)
How does this relate to cluelessness? Mostly I don't really know what the term means
The way I use the term, you're clueless about how to compare A vs. B, relative to your values, if it seems arbitrary (upon reflection) to say A is better than, worse than, or exactly as good as B, and instead it seems we should consider A's and B's goodness incomparable.
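To make that comparison structure explicit, here is one possible formalization (my own gloss on the definition above, not the commenter's notation), assuming a reflective "at least as good as" relation over options:

```latex
% A gloss on the definition above (my notation, not the commenter's).
% Let $\succsim$ be the agent's reflective "at least as good as" relation.
\[
  \text{Clueless}(A, B) \;\iff\; \neg(A \succsim B) \,\wedge\, \neg(B \succsim A)
\]
% Here "A is better than B" is $A \succsim B \wedge \neg(B \succsim A)$,
% "worse" is the mirror image, and "exactly as good" is
% $A \succsim B \wedge B \succsim A$. Cluelessness is the remaining case:
% $\succsim$ is silent in both directions, so $A$ and $B$ are incomparable
% and $\succsim$ is only a partial, not total, relation over options.
```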
[1] What if someone has always been totally solitary, doesn't understand evolution or feedback loops, and hasn't made many decisions based on similar instincts? It seems like such a person wouldn't have reasons to trust their instincts! They'd just be getting lucky.
[2] See here for my reply to: "Sure, 'way weaker', but they're still slightly better than chance, right?" Tl;dr: this doesn't work because the problem isn't just noise that weakens the signal; it's that it's ambiguous what the direction of the signal even is.
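To illustrate that last distinction concretely, here is a minimal sketch (my own toy construction with made-up effect sizes, not anything from the linked post). Under mere noise, more evidence recovers the direction of the effect; under sign ambiguity, the inferred direction just echoes an arbitrary prior over world-models:

```python
# Toy contrast between a noisy signal and a sign-ambiguous signal.
# (Illustrative numbers only; not a model of any real intervention.)
import random

random.seed(0)

def noisy_case(true_effect=0.1, noise=1.0, n=10_000):
    """Weak but well-defined signal: the sample mean converges to the
    true effect, so guessing 'slightly better than chance' is possible."""
    draws = [true_effect + random.gauss(0, noise) for _ in range(n)]
    return sum(draws) / n  # approaches +0.1 as n grows

def ambiguous_case(prior_on_model_a):
    """Sign ambiguity: two equally defensible world-models disagree about
    the sign of the effect. No amount of within-model evidence arbitrates
    between them, so the conclusion just echoes the (arbitrary) prior."""
    effect_a, effect_b = +0.1, -0.1
    return prior_on_model_a * effect_a + (1 - prior_on_model_a) * effect_b

print(f"noisy case, estimated effect:   {noisy_case():+.3f}")        # ~ +0.1
print(f"ambiguous case, prior 0.6 on A: {ambiguous_case(0.6):+.3f}")  # +0.02
print(f"ambiguous case, prior 0.4 on A: {ambiguous_case(0.4):+.3f}")  # -0.02
```

The point of the sketch: extra evidence shrinks the noise term but never moves the prior over models, so the sign of the estimate is fixed by a choice that (on the footnote's view) is arbitrary.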
Hmm, I think you're overstating the disanalogy here. In the case of the individual life, you say "the reasons to trust 'that some process designed their body and mind to function well' are relatively strong". But we also have strong reasons to trust that some process designed our cooperative instincts to allow groups of humans to cooperate effectively.
I also think that many individuals need to decide how to make their lives go well in pretty confusing circumstances. Imagine deciding whether to immigrate to America in the 1700s, or how to live in the shadow of the Cold War, or whether to genetically engineer your children. There's a lot of stuff that's unprecedented, and which you only get one shot at.
Re the experience of AI safety so far: I certainly do think that a bunch of actions taken by the AI safety movement have backfired. I also think that a bunch have succeeded. If you think of the AI safety movement as a young organism going through its adolescence, it's gaining a huge amount of experience from all of these interactions with the world, and the net value of the things that it's done so far may well be dominated by what updates it makes based on that experience.
I guess that's where we disagree. It seems like you think that we should update towards being clueless. Whereas I think we can extract a bunch of generalizable lessons that make us less clueless than we used to be, and one of those lessons is that many of these mistakes could have been prevented by using the right kinds of non-model-based decision-making.
EDIT: another way of trying to get at the crux: do you think that, if we had a theory of sociopolitics that was about as good as 20th-century economics, then we wouldn't be clueless about how to do sociopolitical interventions (like founding AI safety movements) effectively?
But we also have strong reasons to trust that some process designed our cooperative instincts to allow groups of humans to cooperate effectively.
"[A]llowing [small] groups of humans to cooperate effectively" is very far from "making the far future better, impartially speaking". I'd be interested in your responses to the arguments here.
I also think that many individuals need to decide how to make their lives go well in pretty confusing circumstances. Imagine deciding whether to immigrate to America in the 1700s, or how to live in the shadow of the Cold War, or whether to genetically engineer your children.
First, it's not clear to me that these people weren't clueless (i.e. really had more reason to choose whatever they chose than the alternatives), depending on how long a time horizon they were aiming to make go well.
Second, insofar as we think these people's choices were justified, I don't see why you think their instincts gave them such justification. Why would these instincts track unprecedented consequences so well?
and the net value of the things that it's done so far may well be dominated by what updates it makes based on that experience
I don't think "may well" gets us very far. Can you say more about why this hypothesis is so much more likely than, say, "the dominant impacts are the damage that's already been done", or "the dominant impacts will come from near-future decisions, made by actors who are still too ignorant about the extremely complex system they're intervening in"?
do you think that, if we had a theory of sociopolitics that was about as good as 20th-century economics, then we wouldn't be clueless about how to do sociopolitical interventions (like founding AI safety movements) effectively?
No, because I think "founding AI safety movements that succeed at making the far future go better" is a pretty out-of-distribution kind of sociopolitical intervention.
Richard's "Why I'm not a Bayesian" seems like a good starting point, as does this.
Thanks; I've read both but neither seems to answer my objection.