Hi Jan, I appreciate the kind words and the engagement!
You correctly summarized the two main arguments. I will start my answer by making sure we are on the same page regarding what expected value is.
Here is the formula I am using:
EV = p × V + (1 − p) × (−W)
p = probability of the good outcome
V = magnitude of the good outcome
W = magnitude of the bad outcome (it enters the formula as −W)
As EAs, we are trying to maximize EV.
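To make the formula concrete, here is a minimal sketch with made-up numbers (the specific values are purely illustrative, not estimates I am defending):

```python
# Minimal sketch of the EV formula with made-up, purely illustrative numbers.
p = 0.6   # probability of the good outcome (illustrative)
V = 100   # magnitude of the good outcome (in arbitrary units of value)
W = 80    # magnitude of the bad outcome (same units)

EV = p * V + (1 - p) * (-W)
print(EV)  # 0.6 * 100 + 0.4 * (-80) = 28.0
```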
Given that I believe we are extremely morally uncertain about most causes, here is the problem I have encountered: even if we could reliably estimate the probability of the good or bad outcome occurring, and even how large +V and −W are, we still wouldn’t know how to evaluate the overall intervention, because of moral uncertainty.
For example, while the Risk-Averse Welfarist Consequentialism worldview would “aim to increase the welfare of individuals, human or otherwise, without taking long-shot bets or risking causing bad outcomes,” the Total Welfarist Consequentialism worldview would aim to maximize the welfare of all individuals, present or future, human or otherwise.
In other words, the two worldviews would interpret the formula (even if we perfectly knew the value of each variable) quite differently.
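To sketch what I mean (the risk adjustment below is just one stylized way a risk-averse view might be formalized, not a claim about how it must be done):

```python
# The same made-up numbers as above, evaluated under two stylized worldviews.
p, V, W = 0.6, 100, 80

# Total Welfarist Consequentialism (stylized): plain expected value.
total_welfarist_score = p * V + (1 - p) * (-W)   # 28.0

# Risk-Averse Welfarist Consequentialism (stylized): weight the bad outcome
# more heavily, reflecting a reluctance to take long-shot bets or risk causing
# bad outcomes. The factor of 2 is an arbitrary illustration, nothing more.
risk_averse_score = p * V + (1 - p) * (-2 * W)   # -4.0

print(total_welfarist_score, risk_averse_score)
# The same inputs look worthwhile under one worldview and bad under the other.
```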
Which one is true? I don’t know. And this does not even take into account the many other worldviews that exist, such as Egalitarianism, Nietzscheanism, or Kantianism.
To make matters worse, we are not only morally uncertain; I am also saying that we can’t reliably estimate how likely a certain +V or −W is to come about through the objectives of AI safety. This is linked to my discussion with Jim about determinate credences (I didn’t initially understand the concept well, and ChatGPT gave me a useful explanation).
I think complex cluelessness causes the expected value formula to break down (for AI safety and most other longtermist causes) because we simply don’t know what p is, and I, at least, don’t have a determinate credence higher than 50%.
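Here is a minimal sketch of what I mean by “breaking down”: if all I can say is that p lies somewhere in a wide range, the sign of the EV can flip within that range, so the formula gives no verdict (the range and magnitudes are made up):

```python
# With only an imprecise credence ("p is somewhere between 0.3 and 0.7"),
# the expected value can come out negative or positive depending on where in
# that range the true p sits, so the formula does not tell me what to do.
V, W = 100, 80  # made-up magnitudes, as before

for p in (0.3, 0.5, 0.7):
    print(p, p * V + (1 - p) * (-W))
# 0.3 -> -26.0, 0.5 -> 10.0, 0.7 -> 46.0: the sign flips within the range.
```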
Even if we were to place a bet on a certain worldview (like Total Welfarist Consequentialism), this wouldn’t solve the problem of complex cluelessness or our inability to determine p in the context of AI safety (in my opinion).
This suggests that AI safety shouldn’t be given any weight in my portfolio, and that even specializing in it at the community level doesn’t make sense.
Having said that, there are quite likely some causes where we can have a determinate credence for p well above 50% in the EV formula. In those cases, a worldview diversification strategy (WDS) seems to be the best option. This still requires betting on one worldview or a set of worldviews, though; otherwise, we can’t interpret the EV formula, and doing good might not be possible.
Here is an (imperfect) example of how this might look and why a WDS could be the best strategy to pursue. The short answer to why a WDS is preferable is that, given moral uncertainty, we don’t need to choose one out of 10 different worldviews and hope that it is correct; instead, we can diversify across several of them. This seems like a much more robust approach to doing good.
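As a minimal sketch of the kind of allocation I have in mind (the worldviews, credences, and budget below are made up purely for illustration):

```python
# A stylized worldview-diversification allocation: split a giving or effort
# budget across worldviews in proportion to one's credence in each, rather
# than betting everything on a single worldview.
credences = {  # made-up credences that sum to 1.0
    "Total Welfarist Consequentialism": 0.4,
    "Risk-Averse Welfarist Consequentialism": 0.3,
    "Egalitarianism": 0.2,
    "Kantianism": 0.1,
}

budget = 10_000  # illustrative budget (e.g. donations or hours of work)
allocation = {view: share * budget for view, share in credences.items()}
print(allocation)
# Each worldview then "spends" its share on the causes it ranks highest.
```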
What do you make of this?