TW123

Karma: 3,724

An Overview of Catastrophic AI Risks

Center for AI Safety 15 Aug 2023 21:52 UTC

37 points

1 comment · 13 min read · EA link

(www.safe.ai)

TW123 3 Aug 2023 22:47 UTC
65 points
16 ∶ 0
on: University EA Groups Need Fixing
People, myself included, have been having thoughts similar to yours for many years. Navigating through EA epistemic currents is treacherous. To be sure, so is navigating epistemic currents in lots of other environments, including the “default” environment for most people. But EA is sometimes presented as being “neutral” in certain ways, so it feels jarring to see that it is clearly not.
Nearly everyone I know who has been around EA long enough to do things like run a university group eventually confronts the fact that their beliefs have been shaped socially by the community in ways that are hard to understand, including by people paid to shape their beliefs. It’s challenging to know what to do in light of that. Some people reject EA. Others, like you, take breaks to figure things out more for themselves. And others press on, while trying to course-correct somewhat. Many try to create more emotional distance, regardless of what they do. There’s not really an obvious answer, and I don’t feel I’ve fully figured it out myself. All this is to just say: you’re not alone. If you or anyone else reading this wants to talk, I’m here.
Finally, I really like this related post, as well as this comment on it. When I ran the Yale EA in depth fellowship, I assigned it as a reading.
Sorry not to weigh in on the object-level parts about university groups and what you think they should do differently, but as I’ve graduated I’m no longer a community builder so I’m somewhat less interested in weighing in on that.

TW123 21 Jul 2023 22:44 UTC
10 points
3 ∶ 0
in reply to: MHR🔸’s comment on: Linkpost: 7 A.I. Companies Agree to Safeguards After Pressure From the White House
It’s also very much worth reading the linked pdf, which goes into more detail than the fact sheet.

TW123 22 Jun 2023 16:33 UTC
2 points
1 ∶ 1
on: Longtermists are perceived as power-seeking
Except, perhaps, dictators and other ne’er-do-wells.
I would guess that a significant number of power-seeking people in history and the present are power-seeking precisely because they think that those they are vying for power with are some form of “ne’er-do-wells.” So the original statement:
Importantly, when longtermists say “we should try and influence the long-term future”, I think they/we really mean everyone.
with the footnote doesn’t seem to mean very much. “Everyone, except those viewed as irresponsible,” historically, at least, has certainly not meant everyone, and to some people means very few people.

TW123 15 May 2023 11:38 UTC
3 points
0 ∶ 0
in reply to: impermanent_tao’s comment on: Introducing the ML Safety Scholars Program
There is for the ML safety component only. It’s very different from this program in time commitment (much lower), stipend (much lower), and prerequisites (much higher, requires prior ML knowledge) though. There are a lot of online courses that just teach ML, so you could take one of those on your own and then this.

https://forum.effectivealtruism.org/posts/uB8BgEvvu5YXerFbw/intro-to-ml-safety-virtual-program-12-june-14-august-1

TW123 4 May 2023 3:33 UTC
4 points
1 ∶ 0
in reply to: Chris Leong’s comment on: [$20K In Prizes] AI Safety Arguments Competition
Sure, here they are! Also linked at the top now.

TW123 1 May 2023 15:56 UTC
2 points
1 ∶ 0
in reply to: Youssef Okeil ’s comment on: Introducing the ML Safety Scholars Program
No, it is not being run again this year, sorry!

TW123 12 Apr 2023 12:45 UTC
25 points
3 ∶ 0
on: AIs accelerating AI research
I have collected existing examples of this broad class of things on ai-improving-ai.safe.ai.

[MLSN #9] Verifying large training runs, security risks from LLM access to APIs, why natural selection may favor AIs over humans

TW123 11 Apr 2023 16:05 UTC

18 points

0 comments · 6 min read · EA link

(newsletter.mlsafety.org)

TW123 30 Mar 2023 2:22 UTC
15 points
8 ∶ 2
on: Pausing AI Developments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky
(More of a meta point somewhat responding to some other comments.)
It currently seems unlikely there will be a unified AI risk public communication strategy. AI risk is an issue that affects everyone, and many people are going to weigh in on it. That includes both people who are regulars on this forum and people who have never heard of it.
I imagine many people will not be moved by Yudkowsky’s op-ed, and others will be. People who think AI x-risk is an important issue but who still disagree with Yudkowsky will have their own public writing that may be partially contradictory. Of course people should continue to talk to each other about their views, in public and in private, but I don’t expect that to produce “message discipline” (nor should it).
The number of people concerned about AI x-risk is going to get large enough (and arguably already is) that credibility will become highly unevenly distributed among those concerned about AI risk. Some people may think that Yudkowsky lacks credibility, or that his op-ed damages it, but that needn’t damage the credibility of everyone who is concerned about the risks. Back when there were only a few major news articles on the subject, that might have been more true, but it’s not anymore. Now everyone from Geoffrey Hinton to Gary Marcus (somehow) to Elon Musk to Yuval Noah Harari is talking about the risks. While it’s possible everyone could be lumped together as “the AI x-risk people,” at this point, I think that’s a diminishing possibility.

TW123 28 Mar 2023 12:17 UTC
6 points
3 ∶ 0
on: ThomasW’s Shortform
There is often a clash between “alignment” and “capabilities,” with some saying AI labs are pretending to do alignment while doing capabilities and others saying the two are so closely tied that it’s impossible to do good alignment research without producing capability gains.

I’m not sure this discussion will be resolved anytime soon. But I think it’s often misdirected.

I think often what people are wondering is roughly “is x a good person for doing this research?” Should it count as beneficial EA-flavored research, or is it just you being an employee at a corporate AI lab? The alignment and capabilities discussions often seem secondary to this.

Instead, I think we should stick to a different notion: something is “pro-social” (not attached to the term) AI x-risk research if it’s research that (1) has a shot of reducing x-risk from AI (rather than increasing it or doing nothing) and (2) is not incentivized enough by factors external to the lab, to pro-social motivation, and to EA (for example: the market, the government, the public, social status in Silicon Valley, etc.).

Note (1) should include risks that the intervention changes timelines in some negative way, and (2) does not mean the intervention isn’t incentivized at all, just that it isn’t incentivized enough.

This is actually similar enough to the scale/tractability/neglectedness framework but it (1) incorporates downside risk and (2) doesn’t run into the problem of having EAs want to do things “nobody else is doing” (including other EAs). EAs should simply do things that are underincentivized and good.

So, instead of asking things like, “is OpenAI’s alignment research real alignment?” ask “how likely is it to reduce x-risk?” and “is it incentivized enough by external factors?” That should be how we assess whether to praise the people there or tell people they should go work there.

Thoughts?

Note: edited “external to EA” to “external to pro-social motivation and to EA”

ThomasW’s Quick takes

TW123 28 Mar 2023 12:17 UTC

7 points

1 comment · EA link

TW123 12 Mar 2023 23:28 UTC
11 points
8 ∶ 1
on: If EAs won’t go vegan what chance do animals have?
I expect that if plant-based alternatives ever were to become as available, tasty, and cheap as animal products, a large proportion of people and likely nearly all EAs would become vegan. Cultural effects do matter, but in the end I expect them to be mostly downstream of technology in this particular case. Moral appeals have unfortunately had limited success on this issue.

TW123 4 Mar 2023 16:38 UTC
7 points
5 ∶ 0
in reply to: Sean_o_h’s comment on: Nick Bostrom should step down as Director of FHI
Thanks for sharing your perspective, it’s useful to hear!

TW123 4 Mar 2023 15:49 UTC
6 points
7 ∶ 3
in reply to: Søren Elverlin’s comment on: Misalignment Museum opens in San Francisco: ‘Sorry for killing most of humanity’
I don’t think the orthogonality thesis is correct in practice, and moral antirealism certainly isn’t an agreed-upon position among moral philosophers, but I agree that point 17 seems far-fetched.

TW123 4 Mar 2023 15:40 UTC
56 points
27 ∶ 8
on: Nick Bostrom should step down as Director of FHI
This post lines up with my outsider perspective on FHI, and it seems to be quite measured. I encourage anyone who thinks that Bostrom is really the best leader for FHI to defend that view here (anonymously, if necessary).

TW123 22 Feb 2023 2:44 UTC
24 points
9 ∶ 0
in reply to: Ben_West🔸’s comment on: EU Food Agency Recommends Banning Cages
We should also celebrate the politicians and civil servants at the European Commission and EU Food Agency for doing the right thing. Regardless of who may have talked to them, it was ultimately up to them, and so far they’ve made the right choices.

TW123 21 Feb 2023 0:55 UTC
19 points
9 ∶ 10
in reply to: Lizka’s comment on: EV UK board statement on Owen’s resignation
A suggestion I’m throwing out just for consideration: maybe create a specific section on the frontpage for statements from organizations. I don’t think there are that many organizations that want to make statements on the EA forum, but they usually seem pretty worth reading for people here. (Often: something bad happened, and here’s our official stance/explanation).

A downside could be that this means organizations can be more visible than individuals about community matters. That seems possibly bad (though also how it usually works in the broader world). But it seems worse for the forum moderators to arbitrarily decide if something is important enough to be displayed somehow.

[MLSN #8]: Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming

TW123 20 Feb 2023 16:06 UTC

25 points

0 comments · 4 min read · EA link

(newsletter.mlsafety.org)

TW123 9 Feb 2023 21:45 UTC
8 points
2 ∶ 0
in reply to: Buck’s comment on: Could 80,000 Hours’ messaging be discouraging for people not working on x-risk / longtermism?
Agreed. My sense is that much of the discomfort comes from the tendency that people have to want to have their career paths validated by a central authority. But that isn’t the point of 80k. The point of 80k is to direct people towards whatever they think is most impactful. Currently that appears to be mostly x-risk.
If you meet some of the people at places like 80k and so forth, I think it’s easier to realize that they are just people who have opinions and failings like anyone else. They put a lot of work into making career advising materials, and they might put out materials that say that what you are doing is “suboptimal.” If they are right and what you’re doing really is clearly suboptimal, then maybe you should feel bad (or not; depends on how much you want to feel bad about not maximizing your altruistic impact). But maybe 80k is wrong! If so, you shouldn’t feel bad just because some people who happen to work at 80k made the wrong recommendation.
