Let’s limit ourselves for a moment to someone who wants to make their own life go well. They have various instinctive responses that were ingrained in them by evolution—e.g. a disgust response to certain foods, or a sense of anger at certain behavior. Should they follow those instincts even when they don’t know why evolution instilled them? Personally I think they often should, and that this is an example of rational non-model-based decision-making. (Note that this doesn’t rely on them understanding evolution—even 1000 years ago they could have trusted that some process designed their body and mind to function well, despite radical uncertainty about what that process was.)
Now consider someone who wants to make the future of humanity go well. Similarly, they have certain cooperative instincts ingrained in them—e.g. the instinct towards honesty. All else equal, it seems pretty reasonable to think that following them will help humanity to cooperate better, and that this will allow humanity to avoid internal conflict.
How does this relate to cluelessness? Mostly I don’t really know what the term means; it’s not something salient in my ontology. I don’t feel clueless about how to have a good life, and I don’t feel clueless about how to make the long-term future go better, and these two things do in fact seem analogous. In both cases the idea of avoiding pointless internal conflict (and more generally, becoming wiser and more virtuous in other ways) seems pretty solidly good. (Also, the evolution thing isn’t central—you have similar dynamics with e.g. intuitions you’ve learned at a subconscious level, or behaviors that you’ve learned via reinforcement.)
Another way of thinking about it is that, when you’re a subagent within a larger agent, you can believe that “playing your role” within the larger agent is good even when the larger agent is too complex for you to model well (i.e. you have Knightian uncertainty about it).
And yet another way of thinking about it is “forward-chaining often works, even when you don’t have a back-chained reason why you think it should work”.
Thanks for explaining! To summarize, I think there are crucial disanalogies between the “make their own life go well” case and the “make the future of humanity go well” case:
Should they follow those instincts even when they don’t know why evolution instilled them? … (Note that this doesn’t rely on them understanding evolution—even 1000 years ago they could have trusted that some process designed their body and mind to function well, despite radical uncertainty about what that process was.)
In this case, the reasons to trust “that some process designed their body and mind to function well” are relatively strong, because of how we’re defining “well”: an individual’s survival in a not-too-extreme environment. Even if they don’t understand evolution, they can think about how their instincts plausibly would’ve been honed on feedback relevant to this objective. And/or they can look at how other individuals (or their past self) have tended to survive when they trusted these kinds of instincts.[1]
Now consider someone who wants to make the future of humanity go well. Similarly, they have certain cooperative instincts ingrained in them—e.g. the instinct towards honesty. All else equal, it seems pretty reasonable to think that following them will help humanity to cooperate better, and that this will allow humanity to avoid internal conflict.
Here, the reasons to trust that the instincts track the objective seem way weaker to me,[2] for all the reasons I discuss here: No feedback loops, radically unfamiliar circumstances due to the advent of ASI and the like, a track record of sign-flipping considerations. All else is really not equal.
You yourself have written about how the AIS movement arguably backfired hard, which I really appreciate! I expect that ex ante, people founding this movement told themselves: “All else equal, it seems pretty reasonable to think that trying to warn people of a source of x-risk, and encouraging research on how to prevent it, will help humanity avoid that x-risk.”
(I think analogous problems apply to your subagent and forward-chaining framings. They’re justified when the larger system provides feedback, or the forward steps have been validated in similar contexts — which we’re missing here.)
How does this relate to cluelessness? Mostly I don’t really know what the term means
The way I use the term, you’re clueless about how to compare A vs. B, relative to your values, if: It seems arbitrary (upon reflection) to say A is better than, worse than, or exactly as good as B. And instead it seems we should consider A’s and B’s goodness incomparable.
What if someone has always been totally solitary, doesn’t understand evolution or feedback loops, and hasn’t made many decisions based on similar instincts? Seems like such a person wouldn’t have reasons to trust their instincts! They’d just be getting lucky.
See here for my reply to: “Sure, ‘way weaker’, but they’re still slightly better than chance right?” Tl;dr: This doesn’t work because the problem isn’t just noise that weakens the signal, it’s “it’s ambiguous what the direction of the signal even is”.
Hmm, I think you’re overstating the disanalogy here. In the case of the individual life, you say “the reasons to trust ‘that some process designed their body and mind to function well’ are relatively strong”. But we also have strong reasons to trust that some process designed our cooperative instincts to allow groups of humans to cooperate effectively.
I also think that many individuals need to decide how to make their lives go well in pretty confusing circumstances. Imagine deciding whether to immigrate to America in the 1700s, or how to live in the shadow of the Cold War, or whether to genetically engineer your children. There’s a lot of stuff that’s unprecedented, and which you only get one shot at.
Re the experience of AI safety so far: I certainly do think that a bunch of actions taken by the AI safety movement have backfired. I also think that a bunch have succeeded. If you think of the AI safety movement as a young organism going through its adolescence, it’s gaining a huge amount of experience from all of these interactions with the world—and the net value of the things that it’s done so far may well be dominated by what updates it makes based on that experience.
I guess that’s where we disagree. It seems like you think that we should update towards being clueless. Whereas I think we can extract a bunch of generalizable lessons that make us less clueless than we used to be—and one of those lessons is that many of these mistakes could have been prevented by using the right kinds of non-model-based decision-making.
EDIT: another way of trying to get at the crux: do you think that, if we had a theory of sociopolitics that was about as good as 20th-century economics, then we wouldn’t be clueless about how to do sociopolitical interventions (like founding AI safety movements) effectively?
But we also have strong reasons to trust that some process designed our cooperative instincts to allow groups of humans to cooperate effectively.
“[A]llowing [small] groups of humans to cooperate effectively” is very far from “making the far future better, impartially speaking”. I’d be interested in your responses to the arguments here.
I also think that many individuals need to decide how to make their lives go well in pretty confusing circumstances. Imagine deciding whether to immigrate to America in the 1700s, or how to live in the shadow of the Cold War, or whether to genetically engineer your children.
First, it’s not clear to me these people weren’t clueless — i.e. really had more reason to choose whatever they chose than the alternatives — depending on how long a time horizon they were aiming to make go well.
Second, insofar as we think these people’s choices were justified, I don’t see why you think their instincts gave them such justification. Why would these instincts track unprecedented consequences so well?
and the net value of the things that it’s done so far may well be dominated by what updates it makes based on that experience
I don’t think “may well” gets us very far. Can you say more why this hypothesis is so much more likely than, say, “the dominant impacts are the damage that’s already been done”, or “the dominant impacts will come from near-future decisions, made by actors who are still too ignorant about the extremely complex system they’re intervening in”?
do you think that, if we had a theory of sociopolitics that was about as good as 20th-century economics, then we wouldn’t be clueless about how to do sociopolitical interventions (like founding AI safety movements) effectively?
No, because I think “founding AI safety movements that succeed at making the far future go better” is a pretty out-of-distribution kind of sociopolitical intervention.