Cool project :) There’s definitely something very important in the rough direction you’re pointing. Some thoughts on how to gain more clarity on it:
I suspect that it’d be worth your time to think a bunch about the relationship between altruism and ethics. In some sense, I think of ethics (and particularly virtue ethics) as already a kind of “integral altruism”—i.e. ethics as a set of principles and heuristics by which we can remain in integrity with ourselves and others, thereby allowing our compassion to actually make the world better.
I think that the hippie/metamodern/etc communities are very good at some aspects of ethics, but quite bad at others. In general they tend to err on the side of agreeableness, rather than e.g. being honest about unpleasant truths. It feels valuable to take this broad worldview and then try to add a bunch of moral courage that it’s currently missing (analogous to how you can think of EA as adding moral courage to econ-brained thinking).
However, I feel pretty confused about how to actually help people aim their moral courage towards being ethical, since IMO neither EA nor most inner work helps much with this. One litmus test that I use to evaluate whether inner work is actually making people braver is whether they’re more willing to break political taboos afterwards (e.g. for people in the UK, by making a fuss about the Pakistani rape gangs); however, this seldom comes up positive. Another litmus test is whether they’re more willing to face the possibility of physical violence when appropriate (e.g. when a crazy person is being a bit menacing in public, do they still just look away?). These are just illustrative examples but hopefully they point at what I think is missing by default.
The stuff on cluelessness feels like it’s conceding a little too much to the EA/bayesian frame. It’s implying that you should have a model of the entire future in order to make decisions. But what I think you actually want to claim is that it’s sensible and even “rational” to make non-model-based decisions (e.g. via heuristics, intuitions, etc). Some other terms that might be better: bounded rationality, group agency, Knightian uncertainty. I sometimes use “distributed agency” or “coalitional agency”, but I think they won’t make sense to most of your readers.
The problem with stuff like systems thinking & complexity science is that it’s not really aiming to make the same kind of scientific progress as sciences like physics or evolutionary biology have made. More generally, it seems easy for movements like integral altruism to fall into the trap of not pinning down core ideas and claims. But insofar as integral altruism is true, it suggests that something important about the expected utility maximization paradigm is false, which someone should pin down. In other words, imagine that someone from the 22nd century comes back and tells you that something like integral altruism was actually scientifically/mathematically correct. What’s the version of integral altruism that actually leads to you figuring that out?
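(For concreteness, the paradigm in question says, roughly: model the world with a single credence function and pick the action that maximizes expected utility under it,

$$a^* \;=\; \arg\max_{a \in A} \;\sum_{s \in S} P(s)\, U(a, s),$$

where $A$ is your set of options, $S$ the possible states of the world, $P$ your credences, and $U$ your utility function. A pinned-down integral altruism would presumably say which ingredient of that picture fails for the decisions we actually face: the single $P$, the single $U$, or the maximization step itself. I'm only sketching the shape of the claim here, not asserting which ingredient it is.)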
I filled in your form, and am excited to see where you take this!
The stuff on cluelessness feels like it’s conceding a little too much to the EA/bayesian frame. It’s implying that you should have a model of the entire future in order to make decisions. But what I think you actually want to claim is that it’s sensible and even “rational” to make non-model-based decisions (e.g. via heuristics, intuitions, etc).
I’d be interested in hearing more on what exactly you mean by this. Insofar as someone wants to make decisions based on impartially altruistic values, I think cluelessness is their problem, even if they don’t make decisions by explicitly optimizing w.r.t. a model of the entire future. If such a person appeals to some heuristics or intuitions as justification for their decisions, then (as argued here) they need to say why those heuristics or intuitions reliably track impact on the impartial good. And the case for that looks pretty dubious to me.
(If you’re rejecting the “make decisions based on impartially altruistic values” step, fair enough, though I think we’d do well to be explicit about that.)
Let’s limit ourselves for a moment to someone who wants to make their own life go well. They have various instinctive responses that were ingrained in them by evolution—e.g. a disgust response to certain foods, or a sense of anger at certain behavior. Should they follow those instincts even when they don’t know why evolution instilled them? Personally I think they often should, and that this is an example of rational non-model-based decision-making. (Note that this doesn’t rely on them understanding evolution—even 1000 years ago they could have trusted that some process designed their body and mind to function well, despite radical uncertainty about what that process was.)
Now consider someone who wants to make the future of humanity go well. Similarly, they have certain cooperative instincts ingrained in them—e.g. the instinct towards honesty. All else equal, it seems pretty reasonable to think that following them will help humanity to cooperate better, and that this will allow humanity to avoid internal conflict.
How does this relate to cluelessness? Mostly I don't really know what the term means; it's not something salient in my ontology. I don't feel clueless about how to have a good life, and I don't feel clueless about how to make the long-term future go better, and these two things do in fact seem analogous. In both cases the idea of avoiding pointless internal conflict (and more generally, becoming wiser and more virtuous in other ways) seems pretty solidly good. (Also, the evolution thing isn't central; you have similar dynamics with e.g. intuitions you've learned at a subconscious level, or behaviors that you've learned via reinforcement.)
Another way of thinking about it is that, when you’re a subagent within a larger agent, you can believe that “playing your role” within the larger agent is good even when the larger agent is too complex for you to model well (i.e. you have Knightian uncertainty about it).
And yet another way of thinking about it is “forward-chaining often works, even when you don’t have a back-chained reason why you think it should work”.
Thanks for explaining! To summarize, I think there are crucial disanalogies between the “make their own life go well” case and the “make the future of humanity go well” case:
Should they follow those instincts even when they don’t know why evolution instilled them? … (Note that this doesn’t rely on them understanding evolution—even 1000 years ago they could have trusted that some process designed their body and mind to function well, despite radical uncertainty about what that process was.)
In this case, the reasons to trust "that some process designed their body and mind to function well" are relatively strong, because of how we're defining "well": an individual's survival in a not-too-extreme environment. Even if they don't understand evolution, they can think about how their instincts plausibly would've been honed on feedback relevant to this objective. And/or they can look at how other individuals (or their past self) have tended to survive when they trusted these kinds of instincts.[1]
Now consider someone who wants to make the future of humanity go well. Similarly, they have certain cooperative instincts ingrained in them—e.g. the instinct towards honesty. All else equal, it seems pretty reasonable to think that following them will help humanity to cooperate better, and that this will allow humanity to avoid internal conflict.
Here, the reasons to trust that the instincts track the objective seem way weaker to me,[2] for all the reasons I discuss here: no feedback loops, radically unfamiliar circumstances due to the advent of ASI and the like, and a track record of sign-flipping considerations. All else is really not equal.
You yourself have written about how the AIS movement arguably backfired hard, which I really appreciate! I expect that ex ante, people founding this movement told themselves: “All else equal, it seems pretty reasonable to think that trying to warn people of a source of x-risk, and encouraging research on how to prevent it, will help humanity avoid that x-risk.”
(I think analogous problems apply to your subagent and forward-chaining framings. They’re justified when the larger system provides feedback, or the forward steps have been validated in similar contexts — which we’re missing here.)
How does this relate to cluelessness? Mostly I don’t really know what the term means
The way I use the term, you're clueless about how to compare A vs. B, relative to your values, if it seems arbitrary (upon reflection) to say A is better than, worse than, or exactly as good as B, and instead it seems we should consider A's and B's goodness incomparable.
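(Semi-formally, just as a gloss: writing $\succ$ for "better than, relative to my values" and $\sim$ for "exactly as good as", I'm clueless about A vs. B when, upon reflection, it seems arbitrary to assert any of

$$A \succ B, \qquad B \succ A, \qquad A \sim B,$$

and the natural verdict is instead that A and B are incomparable, rather than that I merely have noisy evidence about which relation holds.)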
[1] What if someone has always been totally solitary, doesn't understand evolution or feedback loops, and hasn't made many decisions based on similar instincts? Seems like such a person wouldn't have reasons to trust their instincts! They'd just be getting lucky.
[2] See here for my reply to: "Sure, 'way weaker', but they're still slightly better than chance right?" Tl;dr: This doesn't work because the problem isn't just noise that weakens the signal, it's "it's ambiguous what the direction of the signal even is".
Hmm, I think you're overstating the disanalogy here. In the case of the individual life, you say "the reasons to trust 'that some process designed their body and mind to function well' are relatively strong". But we also have strong reasons to trust that some process designed our cooperative instincts to allow groups of humans to cooperate effectively.
I also think that many individuals need to decide how to make their lives go well in pretty confusing circumstances. Imagine deciding whether to immigrate to America in the 1700s, or how to live in the shadow of the Cold War, or whether to genetically engineer your children. There’s a lot of stuff that’s unprecedented, and which you only get one shot at.
Re the experience of AI safety so far: I certainly do think that a bunch of actions taken by the AI safety movement have backfired. I also think that a bunch have succeeded. If you think of the AI safety movement as a young organism going through its adolescence, it’s gaining a huge amount of experience from all of these interactions with the world—and the net value of the things that it’s done so far may well be dominated by what updates it makes based on that experience.
I guess that’s where we disagree. It seems like you think that we should update towards being clueless. Whereas I think we can extract a bunch of generalizable lessons that make us less clueless than we used to be—and one of those lessons is that many of these mistakes could have been prevented by using the right kinds of non-model-based decision-making.
EDIT: another way of trying to get at the crux: do you think that, if we had a theory of sociopolitics that was about as good as 20th-century economics, then we wouldn’t be clueless about how to do sociopolitical interventions (like founding AI safety movements) effectively?
But we also have strong reasons to trust that some process designed our cooperative instincts to allow groups of humans to cooperate effectively.
“[A]llowing [small] groups of humans to cooperate effectively” is very far from “making the far future better, impartially speaking”. I’d be interested in your responses to the arguments here.
I also think that many individuals need to decide how to make their lives go well in pretty confusing circumstances. Imagine deciding whether to immigrate to America in the 1700s, or how to live in the shadow of the Cold War, or whether to genetically engineer your children.
First, it's not clear to me that these people weren't clueless — i.e. that they really had more reason to choose whatever they chose than the alternatives — depending on how long a time horizon they were aiming to make go well.
Second, insofar as we think these people’s choices were justified, I don’t see why you think their instincts gave them such justification. Why would these instincts track unprecedented consequences so well?
and the net value of the things that it’s done so far may well be dominated by what updates it makes based on that experience
I don’t think “may well” gets us very far. Can you say more why this hypothesis is so much more likely than, say, “the dominant impacts are the damage that’s already been done”, or “the dominant impacts will come from near-future decisions, made by actors who are still too ignorant about the extremely complex system they’re intervening in”?
do you think that, if we had a theory of sociopolitics that was about as good as 20th-century economics, then we wouldn’t be clueless about how to do sociopolitical interventions (like founding AI safety movements) effectively?
No, because I think “founding AI safety movements that succeed at making the far future go better” is a pretty out-of-distribution kind of sociopolitical intervention.
Richard's "Why I'm not a Bayesian" seems like a good starting point, as does this.
Thanks — I’ve read both but neither seems to answer my objection.
Thanks Richard!
On the ‘hippies have too much agreeableness’ point—yes, you are totally right!!
On the 'pinning down core int/a claims' point. I agree that in general getting more precise about claims is good. But I have some caution around pushing to generate precise object-level claims that "define int/a", in the sense that you'd have to believe those claims in order to count as part of it. One thing I feel towards EA is that it used to be about "the question" (how to do the most good), and created room for people to generate new answers to that question, but more recently it has become about "the answer" (this short list of career paths is how to do the most good). But I don't think the cultural/structural locking-in of those answers is good, because we might be missing crucial considerations that will only become clear in the future.
Yeah, I phrased it badly when I said that the movement should be pinning down claims. I’m not suggesting that you use these claims to define membership. Indeed, even the framing of your original post feels too “we are a group defined by believing the same things” for my taste (as compared with, say, “we’re some collaborators with similar intellectual/emotional/ethical stances”).
But I'm excited about you (and the others you mention in this post) writing about the things you personally think the EA worldview gets wrong, ideally engaging not just with how the movement turned out in practice but also with the broken philosophical assumptions that led to practical mistakes.
As one example, EAs constantly use "value-aligned" as a metric of who to ally with. But it seems pretty plausible to me that SBF was extremely value-aligned with most of the stated philosophical principles of EA. The problem was that he wasn't value-aligned with the background ethics of society that EA mostly takes for granted. Understanding this deeply enough would, I think, lead you to reconceptualize the whole concept of "value-aligned" towards things more reminiscent of int/a (in a way that would then have implications for e.g. what moral theories to believe, what alignment targets to aim AIs at, etc).