Thanks for writing this post! It is important to think about the implications of cluelessness and moral uncertainty for AI safety. To clarify the value of working on AI safety, it helps to decompose the problem into two subquestions:
1. Is the outcome that we are aiming for robust to cluelessness and moral uncertainty?
2. Do we know of an intervention that is robustly good for achieving that outcome (i.e. an intervention that is at least better than doing nothing at achieving that outcome)?
An outcome could be reducing X-risks from AI, which could arise in at least two different ways: value lock-in from a human-aligned AI, or a future controlled by non-human AI. Reducing value lock-in seems robustly good, and I won't argue for that here.
If the outcome we are thinking about is reducing extinction from AI, then the near-term case for reducing extinction from AI seems more robust to cluelessness, and I feel the post could have emphasised it a bit more. Indeed, reducing the risk of extinction from AI for all the people alive today and in the next few generations looks good from a range of moral perspectives (it is at least determinately good for humans), even though it is indeterminate in the long term. But then one has to compare near-term AI X-risk reduction with other interventions that make things determinately good in the short term from an impartial perspective, like working on animal welfare or reducing extreme poverty.
AI seems high stakes even though we don't know which way it will go in the short or long term, which might suggest focusing more on capacity building than on more direct interventions (I would fold some of the paths you suggested, such as earning to give, into capacity building as a more general category). This holds as long as the capacity building (e.g. putting ourselves in a position to make things go well with respect to AI once we have more information) has a low risk of backfiring, i.e. of making things worse.
If we grant that reducing X-risks from AI is robustly good, and better than alternative short-term causes (which is a higher bar than "better than doing nothing"), then we still need to identify interventions that robustly reduce X-risks from AI (i.e. that don't make things worse). I already mentioned capacity building with low backfire risk (if we can find such capacity building). Beyond capacity building, it's not completely clear to me that there are robustly good interventions in AI safety, and I think more work is needed to prioritize interventions.
It seems useful to think of one's career as part of a portfolio, and to work on things where one could plausibly be in a position to do excellent work, as long as the intervention one is working on is determinately better than doing nothing.
> Beyond capacity building, it's not completely clear to me that there are robustly good interventions in AI safety, and I think more work is needed to prioritize interventions.
I think it's pretty clear[1] that stopping further AI development (or Pausing) is a robustly good intervention in AI Safety (reducing AI x-risk).

[1] But see this post for some detailed reasoning.
I enjoyed reading your insightful reply! Thanks for sharing, Guillaume. You don’t make any arguments I strongly disagree with, and you’ve added many thoughtful suggestions with caveats. The distinction you make between the two sub-questions is useful.
I am curious, though, about what makes you view capacity building (CB) in a more positive light compared to other interventions within AI safety. As you point out, CB also has the potential to backfire. I would even argue that the downside risk of CB might be higher than that of other interventions because it increases the number of people taking the issue seriously and taking proactive action—often with limited information.
For example, while I admire many of the people working at PauseAI, I believe there are quite a few worlds in which those initially involved in setting up the group have had a net-negative impact in expectation. Even early on, there were indications that some people were okay with using violence or radical methods to stop AI (which the organizers then banned). However, what happens if these tendencies resurface when "shit hits the fan"? To push back on my own thinking, it still might be a good idea to work on PauseAI due to the community diversification argument within AI safety (footnote two).
I agree that other forms of CB, such as MATS, seem more robust. But even here, I can always find compelling arguments for why I should be clueless about the expected value. For instance, an increased number of AI safety researchers working on solving an alignment problem that might ultimately be unsolvable could create a false sense of security.
> However, what happens if these tendencies resurface when "shit hits the fan"?
I don't think this could be pinned on PauseAI, given that at no point has PauseAI advocated or condoned violence. Many (basically all?) political campaigns attract radical fringes. Non-violent moderates aren't responsible for them.
> I am curious, though, about what makes you view capacity building (CB) in a more positive light compared to other interventions within AI safety. As you point out, CB also has the potential to backfire. I would even argue that the downside risk of CB might be higher than that of other interventions because it increases the number of people taking the issue seriously and taking proactive action—often with limited information.
Yeah, just to clarify, CB is not necessarily better than other interventions. However, CB with low backfire risk could be promising. This does not necessarily mean doing community building, since community building could backfire depending on how it is done (for example, if it is done in a very expansive, non-careful way it could more easily backfire). I think the PauseAI example you gave is a good example of a potentially non-robust intervention, or at least I would not count it as a low-backfire-risk capacity-building intervention.
One of the motivations for CB would be to put ourselves in a better position to pursue some intervention if we end up less clueless. It might be that we don't in fact end up less clueless, and that even after doing CB there are still no robust interventions we can pursue. In that case, it would be better to pursue determinately good short-term interventions even after doing CB (but then we have to pay the opportunity cost of the resources spent on CB rather than on those short-term interventions directly).
I am still uncertain about which CB interventions have low backfire risk and are better than doing something determinately good directly; perhaps some ways of increasing capital, or well-targeted community building, could be good examples, but it seems like an open question to me.