Jim Buhler comments on Rethinking the Value of Working on AI Safety

Jim Buhler 10 Jan 2025 9:26 UTC
7 points
1 ∶ 0
Do you think that AI safety is i) at least a bit good in expectation (but like with a determinate credence barely higher than 50% because high risk/uncertainty) or ii) you don’t have determinate credences and feel clueless/agnostic about this? I feel like your post implicitly keeps jumping back and forth between these two positions, and only (i) could support your conclusions. If we assume (ii), everything falls apart. There’s no reason to support a cause X (or the exact opposite of X) to any degree if one is totally clueless about whether it is good.

Thanks for writing this :)
- Johan de Kock 10 Jan 2025 15:19 UTC
  3 points
  0 ∶ 0
  Parent
  One of the reasons I wrote this post was to reflect on excellent comments like yours. Thank you for posting and spotting this inconsistency!
  You rightly point out that I jump between i) and ii). The short answer is that, at least for AI safety, I feel clueless or agnostic about whether this cause is positive in expectation. @Mo Putera summarised this nicely in their comment. I am happy to expand on the reasons as to why I think that.
  What is your perspective here? If you do have a determinate credence above 50% for AI safety work, how do you arrive at this conclusion? I know you have been also doing some in-depth thinking on the topic of cluelessness.
  Next, I want to push back on your claim that if ii) is correct, everything collapses. I agree that this would lead to the conclusion that we are probably entirely clueless about longtermist causes, probably the vast majority of causes in the world. However, it would make me lean toward near-term areas with much shorter causal chains, where there is a smaller margin of error—for example, caring for your family or local animals, which carry a low risk of backfiring.
  Although, to be fair, this is unclear as well if one is also clueless about different moral frameworks. For example, helping a young child who fell off their skateboard might seem altruistic but could inadvertently increase their ambition, leading them to become the next Hitler or a power-seeking tech CEO. And to take this to the next level: not taking an action also has downsides (e.g not addressing the ongoing suffering in the world). Yaay!
  If conclusion ii) is correct for all causes, altruism would indeed seem not possible from a consequentialist perspective. I don’t have a counterargument at the moment.
  I would love to hear your thoughts on this!
  Thank you for engaging :)
  - Jim Buhler 10 Jan 2025 21:17 UTC
    3 points
    0 ∶ 0
    Parent
    If you do have a determinate credence above 50% for AI safety work, how do you arrive at this conclusion?
    It happens that I do not. But I would if I believed there was evidence robust to unknown unknowns in favor of assuming “AI Safety work” is good, factoring in all the possible consequences from now until the end of time. This would require robust reasons to believe that current AI safety work actually increases rather than decreases safety overall AND that increased safety is actually good all things considered (e.g., that human disempowerment is actually bad overall). (See Guillaume’s comment on the distinction). I won’t elaborate on what would count as “evidence robust to unknown unknowns” in such a context but this is a topic for a future post/paper, hopefully.
    Next, I want to push back on your claim that if ii) is correct, everything collapses. I agree that this would lead to the conclusion that we are probably entirely clueless about longtermist causes, probably the vast majority of causes in the world. However, it would make me lean toward near-term areas with much shorter causal chains, where there is a smaller margin of error—for example, caring for your family or local animals, which carry a low risk of backfiring.
    Sorry, I didn’t mean to argue against that. I just meant that work you are clueless about (e.g. maybe AI safety work in your case?) shouldn’t be given any weight in your diversified portfolio. I didn’t mean to make any claim about what I personnally think we should or shouldn’t be clueless about. The “everything falls apart” was unclear and probably unwarranted.