I enjoyed reading your insightful reply! Thanks for sharing, Guillaume. You don’t make any arguments I strongly disagree with, and you’ve added many thoughtful suggestions with caveats. The distinction you make between the two sub-questions is useful.
I am curious, though, about what makes you view capacity building (CB) in a more positive light compared to other interventions within AI safety. As you point out, CB also has the potential to backfire. I would even argue that the downside risk of CB might be higher than that of other interventions, because it increases the number of people who take the issue seriously and act proactively, often with limited information.
For example, while I admire many of the people working at PauseAI, I believe there are quite a few worlds in which those initially involved in setting up the group turn out to have had a net-negative impact. Even early on, there were indications that some people were okay with using violence or other radical methods to stop AI (tactics the organizers then banned). But what happens if these tendencies resurface when “shit hits the fan”? To push back on my own thinking, it might still be a good idea to work on PauseAI because of the community diversification argument within AI safety (see footnote two).
I agree that other forms of CB, such as MATS, seem more robust. But even here, I can always find compelling arguments for why I should be clueless about the expected value. For instance, an increased number of AI safety researchers working on solving an alignment problem that might ultimately be unsolvable could create a false sense of security.
I agree with your reasoning, and the way you’ve articulated it is very compelling to me! It seems that the bar this evidence would need to clear is, quite literally, impossible to meet.
I would even take this further and argue that your chain of reasoning could be applied to most causes (perhaps even all?), and it seems just as valid there.
Would you disagree with this?
Your reply also raises a broader question for me: What criteria must an intervention meet for our determinate credence that its expected value is positive to exceed 50%, thereby justifying work on it?