Holden Karnofsky

Karma: 8,649

Holden Karnofsky Jun 21, 2024, 12:49 AM
3 points
0 ∶ 0
in reply to: Pablo’s comment on: Case studies on social-welfare-based standards in various industries
Thanks for pointing this out! Should be resolved now.

Case studies on social-welfare-based standards in various industries

Holden Karnofsky Jun 20, 2024, 1:33 PM

73 points

2 comments, 1 min read

Holden Karnofsky May 13, 2024, 9:03 PM
42 points
4 ∶ 0
in reply to: Akash’s comment on: Joining the Carnegie Endowment for International Peace
> Besides RSPs, can you give any additional examples of approaches that you’re excited about from the perspective of building a bigger tent & appealing beyond AI risk communities? This balancing act of “find ideas that resonate with broader audiences” and “find ideas that actually reduce risk and don’t merely serve as applause lights or safety washing” seems quite important. I’d be interested in hearing if you have any concrete ideas that you think strike a good balance of this, as well as any high-level advice for how to navigate this.
I’m pretty focused on red lines, and I don’t think I necessarily have big insights on other ways to build a bigger tent, but one thing I have been pretty enthused about for a while is putting more effort into investigating potentially concerning AI incidents in the wild. Based on case studies, I believe that exposing and helping the public understand any concerning incidents could easily be the most effective way to galvanize more interest in safety standards, including regulation. I’m not sure how many concerning incidents there are to be found in the wild today, but I suspect there are some, and I expect there to be more over time as AI capabilities advance.
> Additionally, how are you feeling about voluntary commitments from labs (RSPs included) relative to alternatives like mandatory regulation by governments (you can’t do X or you can’t do X unless Y), preparedness from governments (you can keep doing X but if we see Y then we’re going to do Z), or other governance mechanisms?
The work as I describe it above is not specifically focused on companies. My focus is on hammering out (a) what AI capabilities might increase the risk of a global catastrophe; (b) how we can try to catch early warning signs of these capabilities (and what challenges this involves); and (c) what protective measures (for example, strong information security and alignment guarantees) are important for safely handling such capabilities. I hope that by doing analysis on these topics, I can create useful resources for companies, governments and other parties.
I suspect that companies are likely to move faster and more iteratively on things like this than governments at this stage, and so I often pay special attention to them. But I’ve made clear that I don’t think voluntary commitments alone are sufficient, and that I think regulation will be necessary to contain AI risks. (Quote from earlier piece: “And to be explicit: I think regulation will be necessary to contain AI risks (RSPs alone are not enough), and should almost certainly end up stricter than what companies impose on themselves.”)

Holden Karnofsky May 13, 2024, 8:59 PM
51 points
11 ∶ 1
in reply to: Adam_Scholl’s comment on: Joining the Carnegie Endowment for International Peace
My spouse isn’t currently planning to divest the full amount of her equity. Some factors here: (a) It’s her decision, not mine. (b) The equity has important voting rights, such that divesting or donating it in full could have governance implications. (c) It doesn’t seem like this would have a significant marginal effect on my real or perceived conflict of interest: I could still not claim impartiality when married to the President of a company, equity or no. With these points in mind, full divestment or donation could happen in the future, but there’s no immediate plan to do it.
The bottom line is that I have a significant conflict of interest that isn’t going away, and I am trying to help reduce AI risk despite that. My new role will not have authority over grants or other significant resources besides my time and my ability to do analysis and make arguments. People encountering any analysis and arguments will have to decide how to weigh my conflict of interest for themselves, while considering arguments and analysis on the merits.
For whatever it’s worth, I have publicly said that the world would pause AI development if it were all up to me, and I make persistent efforts to ensure people I’m interacting with know this. I also believe the things I advocate for would almost universally have a negative expected effect (if any effect) on the value of the equity I’m exposed to. But I don’t expect everyone to agree with this or to be reassured by it.

Joining the Carnegie Endowment for International Peace

Holden Karnofsky Apr 29, 2024, 3:45 PM

228 points

14 comments, 2 min read

Good job opportunities for helping with the most important century

Holden Karnofsky Jan 18, 2024, 7:21 PM

46 points

1 comment, 4 min read

(www.cold-takes.com)

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

Holden Karnofsky Oct 27, 2023, 3:19 PM

150 points

23 comments, 1 min read

Holden Karnofsky Oct 10, 2023, 2:38 PM
5 points
0 ∶ 0
in reply to: Zeek’s comment on: This Can’t Go On
Digital content requires physical space too, just relatively small amounts. E.g., physical resources/atoms are needed to make the calculations associated with digital interactions. At some point the number of digital interactions will be capped, and the question will be how much better they can be made. More on the latter here: https://www.cold-takes.com/more-on-multiple-world-size-economies-per-atom/

3 levels of threat obfuscation

Holden Karnofsky Aug 2, 2023, 5:09 PM

31 points

0 comments, 6 min read

(www.alignmentforum.org)

Holden Karnofsky Jun 10, 2023, 6:21 AM
8 points
1 ∶ 2
in reply to: Sol3:2’s comment on: A Playbook for AI Risk Reduction (focused on misaligned AI)
I expected readers to assume that my wife owned significant equity in Anthropic; I’ve now edited the post to state this explicitly (and also added a mention of her OpenAI equity, which I should’ve included before and have included in the past). I don’t plan to disclose the exact amount and don’t think this is needed for readers to have sufficient context on my statements here.

Holden Karnofsky Jun 10, 2023, 6:16 AM
12 points
2 ∶ 0
in reply to: finnhambly’s comment on: A Playbook for AI Risk Reduction (focused on misaligned AI)
Sorry, I didn’t mean to dismiss the importance of the conflict of interest or say it isn’t affecting my views.

I’ve sometimes seen people reason along the lines of “Since Holden is married to Daniela, this must mean he agrees with Anthropic on specific issue X,” or “Since Holden is married to Daniela, this must mean that he endorses taking a job at Anthropic in specific case Y.” I think this kind of reasoning is unreliable and has been incorrect in more than one specific case. That’s what I intended to push back against.

A Playbook for AI Risk Reduction (focused on misaligned AI)

Holden Karnofsky Jun 6, 2023, 6:05 PM

81 points

17 comments, 1 min read

Holden Karnofsky Jun 4, 2023, 3:21 PM
3 points
0 ∶ 0
in reply to: Koen Holtman’s comment on: Seeking (Paid) Case Studies on Standards
Thanks! I’m looking for case studies that will be public; I’m agnostic about where they’re posted beyond that. We might consider requests to fund confidential case studies, but this project is meant to inform broader efforts, so confidential case studies would still need to be cleared for sharing with a reasonable set of people, and the funding bar would be higher.

Holden Karnofsky Jun 2, 2023, 10:51 PM
2 points
0 ∶ 0
in reply to: DanielFilan’s comment on: Taking a leave of absence from Open Philanthropy to work on AI safety
I think this was a goof due to there being a separate hardcover version, which has now been removed—try again?

Seeking (Paid) Case Studies on Standards

Holden Karnofsky May 26, 2023, 5:58 PM

99 points

14 comments, 1 min read

Holden Karnofsky Mar 23, 2023, 9:44 PM
2 points
0 ∶ 0
in reply to: Arden Koehler’s comment on: My takes on the FTX situation will (mostly) be cold, not hot
To give a rough idea, I basically mean anyone who is likely to harm those around them (using a common-sense idea of doing harm) and/or “pollute the commons” by having an outsized and non-consultative negative impact on community dynamics. It’s debatable what the best warning signs are and how reliable they are.

Holden Karnofsky Mar 22, 2023, 10:44 PM
83 points
7 ∶ 3
on: Time Article Discussion—“Effective Altruist Leaders Were Repeatedly Warned About Sam Bankman-Fried Years Before FTX Collapsed”
Re: “In the weeks leading up to that April 2018 confrontation with Bankman-Fried and in the months that followed, Mac Aulay and others warned MacAskill, Beckstead and Karnofsky about her co-founder’s alleged duplicity and unscrupulous business ethics” -

I don’t remember Tara reaching out about this, and I just searched my email for signs of this and didn’t see any. I’m not confident this didn’t happen, just noting that I can’t remember or easily find signs of it.

In terms of what I knew/learned in 2018 more generally, I discuss that here.
What links here?
- Holden Karnofsky’s recent comments on FTX by Lizka (Mar 24, 2023, 11:44 AM; 149 points)

Holden Karnofsky Mar 22, 2023, 10:40 PM
14 points
4 ∶ 0
in reply to: CuriousEA’s comment on: Taking a leave of absence from Open Philanthropy to work on AI safety
For context, my wife is the President and co-founder of Anthropic, and formerly worked at OpenAI.

80% of her equity in Anthropic is (not legally bindingly) pledged for donation. None of her equity in OpenAI is. She may pledge more in the future if there is a tangible compelling reason to do so.

I plan to be highly transparent about my conflict of interest, e.g. I regularly open meetings by disclosing it if I’m not sure the other person already knows about it, and I’ve often mentioned it when discussing related topics on Cold Takes.

I also plan to discuss the implications of my conflict of interest for any formal role I might take. It’s possible that my role in helping with safety standards will be limited to advising with no formal powers (it’s even possible that I’ll decide I simply can’t work in this area due to the conflict of interest, and will pursue one of the other interventions I’ve thought about).

But right now I’m just exploring options and giving non-authoritative advice, and that seems appropriate. (I’ll also note that I expect a lot of advice and opinions on standards to come from people who are directly employed by AI companies; while this does present a conflict of interest, and a more direct one than mine, I think it doesn’t and can’t mean they are excluded from relevant conversations.)
What links here?
- Zach Stein-Perlman's comment on Anthropic leadership conversation by Zach Stein-Perlman (LessWrong; Dec 22, 2024, 3:30 PM; 9 points)

Holden Karnofsky Mar 22, 2023, 10:37 PM
32 points
6 ∶ 0
in reply to: Ozzie Gooen’s comment on: Some comments on recent FTX-related events
There was no one with official responsibility for the relationship between FTX and the EA community. I think the main reason the two were associated was via FTX/Sam having a high profile and talking a lot about EA; that’s not something anyone else was able to control. (Some folks did ask him to do less of this.)

It’s also worth noting that we generally try to be cautious about power dynamics as a funder, which means we are hesitant to be pushy about most matters. In particular, I think one of two major funders in this space attacking the other, nudging grantees to avoid association and funding from it, etc. would’ve been seen as strangely territorial behavior absent very strong evidence of misconduct.

That said: as mentioned in another comment, with the benefit of hindsight, I wish I’d reasoned more like this: “This person is becoming very associated with effective altruism, so whether or not that’s due to anything I’ve done, it’s important to figure out whether that’s a bad thing and whether proactive distancing is needed.”
What links here?
- Holden Karnofsky’s recent comments on FTX by Lizka (Mar 24, 2023, 11:44 AM; 149 points)

Holden Karnofsky Mar 22, 2023, 10:35 PM
28 points
4 ∶ 1
in reply to: Sharmake’s comment on: Some comments on recent FTX-related events
In 2018, I heard accusations that Sam had communicated in ways that left people confused or misled, though often with some ambiguity about whether Sam had been confused himself, had been inadvertently misleading while factually accurate, etc. I put some effort into understanding these concerns (but didn’t spend a ton of time on it; Open Phil didn’t have a relationship with Sam or Alameda).

I didn’t hear anything that sounded anywhere near as bad as what has since come out about his behavior at FTX. At the time I didn’t feel my concerns rose to the level where it would be appropriate or fair to publicly attack or condemn him. The whole situation did make me vaguely nervous, and I spoke with some people about it privately, but I never came to a conclusion that there was a clearly warranted (public) action.
What links here?
- Holden Karnofsky’s recent comments on FTX by Lizka (Mar 24, 2023, 11:44 AM; 149 points)
- Holden Karnofsky's comment on Time Article Discussion—“Effective Altruist Leaders Were Repeatedly Warned About Sam Bankman-Fried Years Before FTX Collapsed” by Nathan Young (Mar 22, 2023, 10:44 PM; 83 points)
