Great post, Johan! It got me thinking more deeply about the value of working on x-risk reduction and how we ought to act under uncertainty. I think people (including me) doing direct work on x-risk reduction would do well to reflect on the possibility of their work having (large) negative effects.
I read the post as making 2 main arguments:
1) The value of working on x-risk reduction is highly uncertain
2) Given uncertainty about the value of cause areas, we should use worldview diversification as a decision strategy
I agree with 1), but am not convinced by 2). A worldview diversification strategy (WDS) might make sense for a large funder like OpenPhil, but not for individuals seeking to maximize their positive impact on the world. Diversification makes sense to reduce downside risk or because of diminishing returns to investing in one option. I think neither of those applies to individuals interested in maximizing the EV of their positive impact on the world. First, while protecting against downside risks (e.g. losing all your money) makes sense in investing, it is not important if you're only maximizing expected value. Second, each individual's contributions to a cause area won't change things massively, so it seems implausible that there are strong diminishing returns to their contributions.
However, I am in favor of your suggestion of doing worldview diversification at the community level rather than at the level of individuals. To an extent, EA already does that by emphasising neglectedness. Perhaps, in practice, EAs should put more weight on neglectedness rather than working on the single most important problem they see.
I think you're not quite engaging with Johan's argument for the necessity of worldview diversification if you assume it's primarily about risk reduction or diminishing returns. My reading of their key point is that we don't just have uncertainty about outcomes (risk); we also have uncertainty about the moral frameworks by which we evaluate those outcomes, combined with deep uncertainty about long-term consequences (complex cluelessness). Together these lead to fundamental uncertainty in our ability to calculate expected value at all (even if we hypothetically want to as EV-maximisers, itself a perilous strategy), and it's these factors that make them think worldview diversification can be the right approach even at the individual level.
Mo, thank you for chiming in. Yes, you understood the key point, and you summarised it very well! In my reply to Jan, I expanded on your point about why I think calculating the expected value is not possible for AI safety. Feel free to check it out.
I am curious, though: do you disagree with the idea that a worldview diversification approach at an individual level is the preferred strategy? You understood my point, but how true do you think it is?
Hi Jan, I appreciate the kind words and the engagement!
You correctly summarized the two main arguments. I will start my answer by making sure we are on the same page regarding what expected value is.
Here is the formula I am using:
EV = p × (+V) + (1 − p) × (−W)
p = probability of the good outcome
+V = magnitude of the good outcome
−W = magnitude of the bad outcome
As EAs, we are trying to maximize EV.
Given that I believe we are extremely morally uncertain about most causes, here is the problem I have encountered: even if we could reliably estimate the probability of a good or bad outcome, and even how large +V and −W are, we still don't know how to evaluate the overall intervention, because of moral uncertainty.
For example, while the Risk-Averse Welfarist Consequentialism worldview would "aim to increase the welfare of individuals, human or otherwise, without taking long-shot bets or risking causing bad outcomes," the Total Welfarist Consequentialism worldview would aim to maximize the welfare of all individuals, present or future, human or otherwise.
In other words, the two worldviews would interpret the formula (even if we perfectly knew the value of each variable) quite differently.
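To make this concrete, here is a minimal sketch in Python, with made-up numbers and a deliberately crude down-weighting of the bad outcome (my own illustrative stand-in for risk aversion, not how either worldview is formally defined):

```python
# Minimal sketch: the same made-up (p, V, W) triple scored in two ways.
# The downside_weight below is an illustrative assumption, not a standard
# definition of risk-averse welfarism.

p, V, W = 0.75, 100, 200  # hypothetical probability and outcome magnitudes


def total_welfarist_score(p, V, W):
    # Plain expected value: EV = p * (+V) + (1 - p) * (-W)
    return p * V + (1 - p) * (-W)


def risk_averse_score(p, V, W, downside_weight=2.0):
    # Same inputs, but the bad outcome counts extra, reflecting an
    # aversion to risking bad outcomes.
    return p * V + (1 - p) * (-W) * downside_weight


print(total_welfarist_score(p, V, W))  # 25.0  -> looks worth doing
print(risk_averse_score(p, V, W))      # -25.0 -> looks not worth doing
```

Same variables, same values, opposite verdicts, depending on which worldview is doing the evaluating.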
Which one is true? I don't know. And this does not even take into account other worldviews that exist, such as Egalitarianism, Nietzscheanism, or Kantianism.
To make matters worse, we are not only morally uncertain; I am also saying that we can't reliably estimate how likely a certain +V or −W is to come into existence through the objectives of AI safety. This is linked to my discussion with Jim about determinate credences (since I didn't initially understand this concept well, ChatGPT gave me a useful explanation).
I think complex cluelessness leads to the expected value formula breaking down (for AI safety and most other longtermist causes) because we simply don't know what p is, and I, at least, don't have a determinate credence higher than 50%.
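To illustrate, reusing the made-up magnitudes from the sketch above: sweeping p over a range of credences that might all seem equally defensible flips the sign of the EV, so the formula gives no verdict either way.

```python
# Sketch of why an indeterminate p is a problem: with the same made-up
# magnitudes as above, plausible-looking credences straddle the break-even
# point, so the EV can come out either negative or positive.

V, W = 100, 200  # hypothetical magnitudes, as in the sketch above

for p in (0.55, 0.60, 0.65, 0.70, 0.75):
    ev = p * V + (1 - p) * (-W)
    print(f"p = {p:.2f} -> EV = {ev:+.1f}")

# p = 0.55 -> EV = -35.0
# p = 0.60 -> EV = -20.0
# p = 0.65 -> EV = -5.0
# p = 0.70 -> EV = +10.0
# p = 0.75 -> EV = +25.0
```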
Even if we were to place a bet on a certain worldview (like Total Welfarist Consequentialism), this wouldn't solve the problem of complex cluelessness or our inability to determine p in the context of AI safety (in my opinion).
This suggests that this cause shouldn't be given any weight in my portfolio and implies that even specializing in AI safety on a community level doesn't make sense.
Having said that, there are quite likely some causes where we can have a determinate credence well above 50% for p in the EV formula. In these cases, a worldview diversification strategy seems to be the best option. This still requires making a bet on one worldview or a set of worldviews, though; otherwise, we can't interpret the EV formula, and doing good might not be possible.
Here is an (imperfect) example of how this might look and why a WDS could be the best strategy to pursue. The short answer to why a WDS is preferred is that, given moral uncertainty, we don't need to choose one out of ten different worldviews and hope that it is correct; instead, we can diversify across several of them. Hence, this seems to be a much more robust approach to doing good.
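To show the mechanics, here is a toy version of what such a diversified portfolio could look like; the credences, budget, and cause picks are all made-up placeholders, not recommendations:

```python
# Toy worldview-diversified portfolio, purely to show the mechanics: resources
# are split in proportion to made-up credences in each worldview, and each
# share goes to whatever cause that worldview favours.

credences = {
    "Total Welfarist Consequentialism": 0.4,
    "Risk-Averse Welfarist Consequentialism": 0.4,
    "Egalitarianism": 0.2,
}
favoured_cause = {
    "Total Welfarist Consequentialism": "a long-term-focused cause",
    "Risk-Averse Welfarist Consequentialism": "a robustly good near-term cause",
    "Egalitarianism": "a poverty-focused cause",
}

budget = 10_000  # hypothetical yearly budget of money or effort
for worldview, credence in credences.items():
    print(f"{credence * budget:>6.0f} -> {favoured_cause[worldview]} (per {worldview})")
```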
What do you make of this?