Do you i) think that AI safety is at least a bit good in expectation (but with a determinate credence only barely higher than 50%, because of high risk/uncertainty), or ii) do you lack determinate credences and feel clueless/agnostic about this? I feel like your post implicitly keeps jumping back and forth between these two positions, and only i) could support your conclusions. If we assume ii), everything falls apart: there’s no reason to support a cause X (or the exact opposite of X) to any degree if one is totally clueless about whether it is good.
Thanks for writing this :)
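To make the difference between i) and ii) concrete, here is a minimal sketch; the symmetric stakes are a simplifying assumption for illustration only. Under i), suppose a determinate credence $p$ that AI safety work is good, with value $+v$ if it is and $-v$ if it is not. Then $EV = pv - (1-p)v = (2p-1)v$, which is positive for any $p > 0.5$, however slim the margin. Under ii), if one’s credence is imprecise, say spread over the interval $[0.4, 0.6]$, then $(2p-1)v$ ranges over $[-0.2v, +0.2v]$: the sign of the expected value is indeterminate, and neither supporting X nor supporting its opposite follows.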
One of the reasons I wrote this post was to reflect on excellent comments like yours. Thank you for posting and spotting this inconsistency!
You rightly point out that I jump between i) and ii). The short answer is that, at least for AI safety, I feel clueless or agnostic about whether this cause is positive in expectation. @Mo Putera summarised this nicely in their comment. I’m happy to expand on my reasons for thinking that.
What is your perspective here? If you do have a determinate credence above 50% for AI safety work, how do you arrive at this conclusion? I know you have also been doing some in-depth thinking on the topic of cluelessness.
Next, I want to push back on your claim that if ii) is correct, everything collapses. I agree that this would lead to the conclusion that we are probably entirely clueless about longtermist causes, and perhaps about the vast majority of causes in the world. However, it would make me lean toward near-term areas with much shorter causal chains, where there is less room for error: for example, caring for your family or for local animals, which carries a low risk of backfiring.
Although, to be fair, even this is unclear if one is also clueless across moral frameworks. For example, helping a young child who fell off their skateboard might seem altruistic but could inadvertently increase their ambition, leading them to become the next Hitler or a power-seeking tech CEO. And to take this to the next level: not taking an action also has downsides (e.g., not addressing the ongoing suffering in the world). Yaay!
If conclusion ii) is correct for all causes, altruism would indeed seem impossible from a consequentialist perspective. I don’t have a counterargument at the moment. I would love to hear your thoughts on this!
Thank you for engaging :)
If you do have a determinate credence above 50% for AI safety work, how do you arrive at this conclusion?
It happens that I do not. But I would if I believed there was evidence robust to unknown unknowns in favor of assuming “AI safety work” is good, factoring in all the possible consequences from now until the end of time. This would require robust reasons to believe that current AI safety work actually increases rather than decreases safety overall AND that increased safety is actually good all things considered (e.g., that human disempowerment is actually bad overall). (See Guillaume’s comment on the distinction.) I won’t elaborate on what would count as “evidence robust to unknown unknowns” in such a context, but this is a topic for a future post/paper, hopefully.
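As a toy illustration of why this conjunction is demanding (the numbers are invented and independence is assumed purely for the arithmetic): even with $P(\text{work increases safety}) = 0.7$ and $P(\text{increased safety is good overall}) = 0.7$, the conjunction only gets $0.7 \times 0.7 = 0.49 < 0.5$, so fairly confident answers to each question separately still need not deliver a determinate credence above 50% that the work is good.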
Next, I want to push back on your claim that if ii) is correct, everything collapses. I agree that this would lead to the conclusion that we are probably entirely clueless about longtermist causes, and perhaps about the vast majority of causes in the world. However, it would make me lean toward near-term areas with much shorter causal chains, where there is less room for error: for example, caring for your family or for local animals, which carries a low risk of backfiring.
Sorry, I didn’t mean to argue against that. I just meant that work you are clueless about (e.g., maybe AI safety work in your case?) shouldn’t be given any weight in your diversified portfolio. I didn’t mean to make any claim about what I personally think we should or shouldn’t be clueless about. The “everything falls apart” was unclear and probably unwarranted.
I agree with your reasoning, and the way you’ve articulated it is very compelling to me! It seems that the bar this evidence would need to clear is, quite literally, impossible to reach.
I would even take this further and argue that your chain of reasoning could be applied to most causes (perhaps even all?), which seems valid.
Would you disagree with this?
Your reply also raises a broader question for me: what criteria must an intervention meet for our determinate credence that its expected value is positive to exceed 50%, thereby justifying work on it?
I would even take this further and argue that your chain of reasoning could be applied to most causes (perhaps even all?), which seems valid.
Would you disagree with this?
I mean, I didn’t actually give any argument for why I don’t believe AI safety is good overall (assuming pure longtermism, i.e., taking into account everything from now until the end of time). I just said that I would believe it if there was evidence robust to unknown unknowns. (I haven’t argued that there isn’t such evidence already, although the burden of proof is very much on the opposite claim, to be fair.) But I think this criterion applies to all causes where unknown unknowns are substantial, and I believe this is all of them as long as we’re evaluating them from a pure longtermist perspective, yes. And whether there is any cause that meets this criterion depends on one’s values, I think. From a classical utilitarian perspective (and assuming the trade-offs between suffering and pleasure that most longtermists endorse), for example, I think there’s very plausibly none that meets it.