simeon_c

Karma: 823

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management

simeon_c Mar 13, 2025, 6:29 PM

6 points

0 comments EA link

(arxiv.org)

Responsible Scaling Policies Are Risk Management Done Wrong

simeon_c Oct 25, 2023, 11:46 PM

42 points

1 comment EA link

(www.navigatingrisks.ai)

simeon_c Oct 12, 2023, 1:08 PM
11 points
4 ∶ 0
in reply to: Linch’s comment on: Linch’s Shortform
I agree with the general underlying point.

I also think that another important issue is that reasoning on counterfactuals makes people more prone to doing unusual things AND is more prone to errors (e.g. by not taking into account some other effects).

Both combined make counterfactual reasoning without empirical data pretty perilous on average IMO.
In the case of Ali in your example above for instance, Ali could neglect that the performance he’ll have will determine the opportunities & impact he has 5y down the line and so that being excited/liking the job is a major variable. Without counterfactual reasoning, Ali would have intuitively relied much more on excitement to pick the job but by doing counterfactual reasoning which seemed convincing, he neglected this important variable and made a bad choice.
I think that counterfactual reasoning makes people very prone to ignoring Chesterton’s fence.

simeon_c Sep 29, 2023, 12:17 PM
4 points
0 ∶ 1
in reply to: richard_ngo’s comment on: Protest against Meta’s irreversible proliferation (Sept 29, San Francisco)
I think using “unsafe” in a very broad way like this is misleading overall and generally makes the AI safety community look like miscalibrated alarmists.
I agree that when there’s no memetic fitness/calibration trade-off, it’s always better to be calibrated. But here there is a trade-off. How should we take it?
1. My sense is that there’s never been any epistemically calibrated social movement, so imposing that constraint would be playing against the odds. Even someone like Henry Spira, who was personally very thoughtful, used very unnuanced communication to achieve social change.
2. Richard, do you think that being miscalibrated has hurt or benefited the ability of past movements to cause social change? E.g. climate change and animal welfare.
  
  My impression is that it probably hasn’t hurt them? They caused entire chunks of society to be miscalibrated on climate change (maybe less in the US, but in Europe it’s pretty big), and that’s not good, but I would guess that the alarmism helped them succeed.
  As long as there also exists a moderate faction and there still exist background debates on the object level, I feel like having a standard social activism movement would be overall very welcome.
Curious if anyone here knows the relevant literature on the topic, e.g. details in the radical flank literature.

simeon_c Jul 8, 2023, 12:01 PM
8 points
3 ∶ 1
on: Announcing Manifund Regrants
Very glad to see that happening. Regranting solves a bunch of unsolved problems with centralized grantmaking.

simeon_c Jun 28, 2023, 9:49 PM
5 points
1 ∶ 0
in reply to: Yellow (Daryl)’s comment on: AGI x Animal Welfare: A High-EV Outreach Opportunity?
I mean, I agree that it has nuance, but it’s still trained on a set of values that are pretty much current Western values, so it will probably put more or less emphasis on various values according to the weight Western people give to each of those.

AGI x Animal Welfare: A High-EV Outreach Opportunity?

simeon_c Jun 28, 2023, 8:44 PM

79 points

16 comments 1 min read EA link

simeon_c Apr 22, 2023, 8:25 PM
5 points
1 ∶ 0
in reply to: Sanjay’s comment on: The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns
I may try to write something on that in the future. I’m personally more worried about accidents and think that solving accidents causes one to solve misuse pre-AGI. Post aligned AGI, misuse becomes a major worry again.

The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns

simeon_c Apr 22, 2023, 1:49 PM

27 points

17 comments EA link

AI Takeover Scenario with Scaled LLMs

simeon_c Apr 16, 2023, 11:28 PM

29 points

1 comment EA link

Navigating AI Risks (NAIR) #1: Slowing Down AI

simeon_c Apr 14, 2023, 2:35 PM

12 points

1 comment EA link

Announcing the European Network for AI Safety (ENAIS)

Esben Kran Mar 22, 2023, 5:57 PM

124 points

3 comments 3 min read EA link

simeon_c Feb 13, 2023, 11:25 PM
11 points
7 ∶ 1
on: New EA Podcast: Critiques of EA
Note that saying “this isn’t my intention” doesn’t prevent net negative effects of a theory of change from applying. Otherwise, doing good would be a lot easier.
I also highly recommend clarifying what exactly you’re criticizing, i.e. the philosophy, the movement norms or some institutions that are core to the movement.
Finally, I usually find the criticism from people who are a) at the core of the movement and b) highly truth-seeking most relevant for improving the movement, so I would expect that if you’re trying to improve the movement, you may want to focus on these people. There exist relevant criticisms external to the movement, but usually they lack context and thus fail to address some key trade-offs that the movement cares about.
Here’s a small list of people I would be excited to hear on EA flaws and their recommendations for change:
- Rob Bensinger
- Eli Lifland
- Ozzie Gooen
- Nuno Sempere
- Oliver Habryka

simeon_c Feb 11, 2023, 12:57 AM
2 points
0 ∶ 0
on: There can be highly neglected solutions to less-neglected problems
Thanks for publishing that, I also had a draft lying somewhere on that!

simeon_c Jan 2, 2023, 7:48 AM
3 points
1 ∶ 0
on: How many hours is your standard workweek? Why?
I work every day from about 9:30am to 1am, with about 3h off on average and a 30-minute walk, which helps me brainstorm. Technically this is ~12*7=84h. The main reasons are that 1) I want us not to die and 2) I think there are increasing marginal returns on working hours in a lot of situations, mostly because in a lot of domains the winner takes all even if they’re only 10% better than others, and because a single person accumulates more expertise/knowledge, which gives access to more and more rare skills.

Of that, I would say that I lose about 15h to unproductive or barely productive work (e.g. Twitter or working on stuff which is not on my to-do list). I also spend about 10 to 15h a week in calls.
The rest of it (from 50 to 60h) is productive work.

My productivity (& thus productive work time) has been hugely increasing over the past 3 months (the first three months in which I could fully decide the allocation of my time). The total amount of hours I work increased a bit (like +1h/day) in the last 3 months, mostly thanks to optimizations of my sleep & schedule.

simeon_c Jan 2, 2023, 7:05 AM
1 point
0 ∶ 0
in reply to: calebp’s comment on: How many hours is your standard workweek? Why?
Can you share the passive time tracking tools you’re using?

simeon_c Dec 27, 2022, 11:53 PM
3 points
0 ∶ 0
in reply to: Karl von Wendt’s comment on: AGI Timelines in Governance: Different Strategies for Different Timeframes
“Nobody cared about” LLMs is certainly not true—I’m pretty sure the relevant people watched them closely.
What do you mean by “the relevant people”? I would love for us to talk about specifics here and operationalize what we mean. I’m pretty sure E. Macron hasn’t thought deeply about AGI (i.e. has never thought for more than 1h about timelines), and I’m at 50% that if he had any deep understanding of what changes it will bring, he would already be racing. Likewise for Israel, which is a country with a strong track record of becoming a leader in technologies that are crucial for defense.
That many people aren’t concerned about AGI or doubting its feasibility by now only means that THOSE people will not pursue it, and any public discussion will probably not change their minds.
I think here you wrongly assume that people have even understood what the implications of AGI are, and that they can’t update at all once the first systems start being deployed. The situation where what you say could be true is if you think that most of your arguments hold because of ChatGPT. I think it’s quite plausible that since ChatGPT, and probably even more in 2023, there will be deployments that may make almost everyone who matters aware of AGI. I don’t have a good sense yet of how policymakers have updated.
Already there are many alarming posts and articles out there, as well as books like Stuart Russell’s “Human Compatible” (which I think is very good and helpful), so keeping the lid on the possibility of AGI and its profound impacts is way too late
Yeah, I realize thanks to this part that a lot of the debate should happen on specifics rather than at a high level as we’re doing here. Thus, chatting about your book in particular will be helpful for that.
I’m currently in the process of translating it to English so I can do just that. I’ll send you a link as soon as I’m finished. I’ll also invite everyone else in the AI safety community (I’m probably going to post an invite on LessWrong).
Great! Thanks for doing that!
while discussing the great potential of AGI for humanity should not.
FYI I don’t think that it’s true.
Regarding all our discussion, I realized I didn’t mention a fairly important argument: a major failure mode specifically regarding risks is the following reaction from ~any country: “Omg, China is developing bad AGIs, so let’s develop safe AGIs first!”.
This can happen in two ways:
- Misuse as the mainline scenario that people are envisioning. Basically, if you’re mostly concerned about misuse, racing to be the first to have the AGI makes sense. And because misuse is way easier to understand than accidental risk, I expect this to be ~the default.
- Overestimating one’s competence. Even if you believed in AGI accidental X-risks, you could still race thinking that you’re better than the others and that could increase the chances of X-risk.
Thanks a lot for engaging with my arguments. I still think that you’re substantially overconfident about the positive aspects of communicating AGI X-risks to the general public, but I appreciate the fact that you took the time to consider and respond to my arguments.

simeon_c Dec 20, 2022, 8:24 PM
1 point
0 ∶ 0
in reply to: Misha_Yagudin’s comment on: AGI Timelines in Governance: Different Strategies for Different Timeframes
Hey Misha! Thanks for the comment!
I am quite confused about what probabilities here mean, especially with prescriptive sentences like “Build the AI safety community in China” and “Beware of large-scale coordination efforts.”
As I wrote in note 2, I’m claiming here that this claim is more likely to be true under these timelines than under the other timelines. But how could I make it clearer without bothering too much? Maybe putting note 2 under the table in italics?
I also disagree with the “vibes” of probability assignment to a bunch of these, and the lack of clarity on what these probabilities entail makes it hard to verbalize these.
I see. I hesitated over the trade-off between (1) “put no probabilities” and (2) “put vague probabilities”, because I feel like the second gives a lot more signal on how confident I am in what I say and allows people to disagree more fruitfully, but at the same time it gives a “seriousness” signal, which is not good when the predictions are not actual predictions.
Do you think that putting no probabilities would have been better?
By “I also disagree with the vibes of probability assignment to a bunch of these”, do you mean that it seems over/underconfident in a bunch of ways when you try to do a similar exercise?

simeon_c Dec 20, 2022, 8:12 PM
6 points
3 ∶ 0
in reply to: NickLaing’s comment on: AGI Timelines in Governance: Different Strategies for Different Timeframes
Haha, you probably don’t realize it, but “you” is actually 4 people: Amber Dawn for the first draft of the post, me (Simeon) for the ideas, the table and the structure of the post, and me, Nicole Nohemi & Felicity Riddel for the partial rewriting of the draft to make it clearer.
So the credits are highly distributed! And thanks a lot, it’s great to hear that!

simeon_c Dec 20, 2022, 8:08 PM
5 points
2 ∶ 1
in reply to: HaydnBelfield’s comment on: AGI Timelines in Governance: Different Strategies for Different Timeframes
I think that our disagreement comes from what we mean by “regulating and directing it.”
My rough model of what usually happens in national governments (and not the EU, which is a lot more independent from its citizens than the typical national government) is that there are two scenarios:
1. Scenario 1, in which national governments regulate or act on something nobody cares about (in particular, not the media). That gives rise to a lot of degrees of freedom and the possibility of doing fairly ambitious things (cf. Secret Congress).
2. Scenario 2, in which national governments regulate things that many people care about, which brings attention, and then nothing gets done, most measures are fairly weak, etc. In this scenario my rough model is that national governments do the smallest thing that satisfies their electorate + key stakeholders.
I feel like we’re extremely likely to be in scenario 2 regarding AI, and thus that no significant measure will be taken, which is why I put the emphasis on “no strong [positive] effect” on AI safety. So basically I feel like the best you can probably do in national policy is something like “avoid that they do bad things” (which is really good if it’s a big risk) or “do mildly good things”. But to me, it’s quite unlikely that we go from a world where we die to a world where we don’t die thanks to a theory of change which is focused on national policy.
The EU AI Act is a bit different in that, as I said above, the EU is much less tied to the daily worries of citizens and thus operates under fewer constraints. Thus I think that it’s indeed plausible that the EU does something ambitious on GPAIS, but unfortunately I think it’s unlikely that the US will replicate something similar locally, and the EU compliance mechanisms are not very likely to cut the worst risks for UK and US companies.
Regulating the training of these models is different and harder, but even that seems plausible to me at some point
I think that it’s plausible but not likely, and given that it would be the intervention that would cut the most risks, I tend to prefer corporate governance which seems significantly more tractable and neglected to me.
Out of curiosity, could you refer to a specific event you’d expect to see “if we get closer to substantial leaps in capabilities”? I think that it’s a useful exercise to disagree fruitfully on timelines and I’d be happy to bet on some events if we disagree on one.
