(“AI can have bad consequences” as a motivation for AI safety --> Yes, but AI can have bad consequences in meaningfully different ways!)
Here’s some frame confusion I see a lot, which I think leads to confused intuitions (especially when trying to reason about existential risk from advanced AI, as opposed to today’s systems):
1. There’s (weapons) -- tech like nuclear weapons or autonomous weapons that, when used as intended, involves people dying. (Tech like this exists)
2. There’s (misuse) -- tech that was intended to be anywhere from beneficial <> neutral <> offense-favoring on the offense-defense balance, but that wasn’t designed for harm and gets used harmfully anyway. Examples here include social media, identification systems, language models, and surveillance systems. (Tech like this exists)
3. There’s (advanced AI pursuing instrumental incentives --> causing existential risk), which is not about misuse; it’s about the *system itself* being an optimizer and seeking power (humans are not the problem here -- the AI itself is, once it’s sufficiently advanced). (Tech like this does not exist)
You can say “AI is bad” for all of them, and they’re all problems, but they’re different problems and should be thought of separately. (1) is a problem (autonomous weapons is the AI version of it) but is pretty independent of (3). Technical AI safety discussion is mostly about the power-seeking agent issue (3). (2) is a problem all the time, for all tech (though some tech lends itself to it more than others). They’re all going to need to get solved, but at least (1) and (2) are problems humanity has some experience with (and so we have at least some structures in place to deal with them, and people are aware these are problems).
Vael Gates
^ Yeah, endorsed! This is work in (3) -- if you’ve got the skills and interests, going to work with Josh and Lucius seems like an excellent opportunity, and they’ve got lots of interesting projects lined up.
I think my data has insights about (3), and not about (1) and (2)! You can take a look at https://www.lesswrong.com/posts/LfHWhcfK92qh2nwku/transcripts-of-interviews-with-ai-researchers to see what 11 of the interviews look like; I think the study would have to be designed differently to get info on (1) or (2).
Sounds great; thanks Sawyer! “Reaching out to BERI” was definitely listed in my planning docs for this post; if there’s anything that seems obvious to communicate about, happy to take a call, otherwise I’ll reach out if anything seems overlapping.
Announcing the AI Safety Field Building Hub, a new effort to provide AISFB projects, mentorship, and funding
Thanks levin! I realized before I published that I hadn’t gotten nearly enough governance people to review this, and indeed was hoping I’d get help in the comment section.
I’d thus be excited to hear more. Do you have specific questions / subareas of governance that would appreciably benefit from a background in “economics, political science, legal studies, anthropology, sociology, psychology, and history”, rather than a more generic “generalist”-type background (which can include any of the previous, but doesn’t depend on any of them)?
I view the core of this post as trying to push back a bit on inclusive “social scientists are useful!” framings, and instead diving into more specific instances of which jobs and roles available today demand specific skills, or alternatively pointing out where I think background isn’t actually key and excellent generalist skills are what’s sought.
Social scientists interested in AI safety should consider doing direct technical AI safety research (possibly meta-research), governance, support roles, or community building instead
“Preventing Human Extinction” at Stanford (first-year undergraduate course)
Syllabus (2022)
Additional subject-specific reading lists (AI, bio, nuclear, climate) (2022)
@Pablo Could you also update your longtermism list with the syllabus, and with the edit that the class is taught jointly by Steve Luby and Paul Edwards? Thanks, and thanks for keeping this list :).
Vael Gates: Risks from Advanced AI (June 2022)
Great idea, thank you Vaidehi! I’m pulling this from the Forum and will repost once I get that done. (Update: Was reposted)
I haven’t received much feedback on this video yet, so I’m very curious to know how it’s received! I’m interested in critiques and things that it does well, so I can refine future descriptions and know who to send this to.
I’ve been finding “A Bird’s Eye View of the ML Field [Pragmatic AI Safety #2]” to have a lot of content that would likely be interesting to the audience reading these transcripts. For example, the incentives section rhymes with the kinds of things interviewees would sometimes say. I think the post generally captures and analyzes a lot of the flavor of these conversations / contextualizes what it was like to talk to researchers.
This isn’t particularly helpful since it’s not sorted, but some transcripts with ML researchers: https://www.lesswrong.com/posts/LfHWhcfK92qh2nwku/transcripts-of-interviews-with-ai-researchers
My argument structure within these interviews was basically to ask these three questions in order, then respond from there. I chose the questions at the outset, but the details of the spiels accumulated as I talked to researchers and started trying to respond to their comments before they made them.
1. “When do you think we’ll get AGI / capable / generalizable AI / have the cognitive capacities to have a CEO AI, if we do?”
Example dialogue: “All right, now I’m going to give a spiel. So, people talk about the promise of AI, which can mean many things, but one of them is getting very general, capable systems, perhaps with the cognitive capabilities to replace all current human jobs so you could have a CEO AI or a scientist AI, etcetera. And I usually think about this in the frame of 2012: we have the deep learning revolution, we’ve got AlexNet, GPUs. 10 years later, here we are, and we’ve got systems like GPT-3 which have kind of weirdly emergent capabilities. They can do some text generation and some language translation and some code and some math. And one could imagine that, if we continue pouring in all the human investment that we’re pouring into this -- money, competition between nations, human talent, so much talent and training all the young people up -- and if we continue to have algorithmic improvements at the rate we’ve seen and continue to have hardware improvements, so maybe we get optical computing or quantum computing, then one could imagine that eventually this scales to quite general systems, or maybe we hit a limit and we have to do a paradigm shift in order to get to the highly capable AI stage. Regardless of how we get there, my question is: do you think this will ever happen, and if so, when?”
2. “What do you think of the argument ‘highly intelligent systems will fail to optimize exactly what their designers intended them to, and this is dangerous’?”
Example dialogue: “Alright, so these next questions are about these highly intelligent systems. So imagine we have a CEO AI, and I’m like, “Alright, CEO AI, I wish for you to maximize profit, and try not to exploit people, and don’t run out of money, and try to avoid side effects.” And this might be problematic, because currently we’re finding it technically challenging to translate human values, preferences, and intentions into mathematical formulations that can be optimized by systems, and this might continue to be a problem in the future. So what do you think of the argument “Highly intelligent systems will fail to optimize exactly what their designers intended them to, and this is dangerous”?”
3. “What do you think about the argument: ‘highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals, and this is dangerous’?”
Example dialogue: “Alright, next question is, so we have a CEO AI and it’s like optimizing for whatever I told it to, and it notices that at some point some of its plans are failing and it’s like, “Well, hmm, I noticed my plans are failing because I’m getting shut down. How about I make sure I don’t get shut down? So if my loss function is something that needs human approval and then the humans want a one-page memo, then I can just give them a memo that doesn’t have all the information, and that way I’m going to be better able to achieve my goal.” So not positing that the AI has a survival function in it, but as an instrumental incentive to being an agent that is optimizing for goals that are maybe not perfectly aligned, it would develop these instrumental incentives. So what do you think of the argument, “Highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals and this is dangerous”?”
Indeed! I’ve actually found that in most of my interviews people haven’t thought about the 50+ year future much or heard of AI alignment, given that my large sample is researchers who had papers at NeurIPS or ICML. (The five researchers who were individually selected here had thought about AI alignment uncommonly much, which didn’t particularly surprise me given how they were selected.)
A nice followup direction to take this would be to get a list of common arguments used by AI researchers to be less worried about AI safety (or about working on capabilities, which is separate), counterarguments, and possible counter-counter arguments. Do you plan to touch on this kind of thing in your further work with the 86 researchers?
Yes, with the note that the arguments brought forth are generally less carefully thought through than the ones shown in the individually selected population, due to the larger sample. But you can get a sense of some of the types of arguments in the six transcripts from NeurIPS / ICML researchers, though I wouldn’t say they’re fully representative.
Transcripts of interviews with AI researchers
I just did a fast-and-dirty version of this study with some of the students I’m TAing for, in a freshman class at Stanford called “Preventing Human Extinction”. No promises I got all the details right, in either the survey or the analysis.
—————————————————————————————————
QUICK SUMMARY OF DATA FROM https://forum.effectivealtruism.org/posts/7f3sq7ZHcRsaBBeMD/what-psychological-traits-predict-interest-in-effective
MTurkers (n=~250; having a hard time extracting the exact number from what may be 1-3 different samples):
- expansive altruism (M = 4.4, SD = 1.1)
- effectiveness-focus scale (M = 4.4, SD = 1.1)
- 49% of MTurkers had a mean score of 4+ on both scales
- 14% had a mean score of 5+ on both scales
- 3% had a mean score of 6+ on both scales
NYU students (n=96):
- expansive altruism (M = 4.1, SD = 1.1)
- effectiveness-focus (M = 4.3, SD = 1.1)
- 39% of NYU students had a mean score of 4+ on both scales
- 6% had a mean score of 5+ on both scales
- 2% had a mean score of 6+ on both scales
EAs (n=226):
- expansive altruism (M = 5.6, SD = 0.9)
- effectiveness-focus (M = 6.0, SD = 0.8)
- 95% of effective altruist participants had a mean score of 4+ on both scales
- 81% had a mean score of 5+ on both scales
- 33% had a mean score of 6+ on both scales
——————————————————————————————————
VAEL RESULTS:
Vael personally:
- Expansive altruism: 4.2
- Effectiveness-focus: 6.3
Vael sample (Stanford freshmen taking a class called “Preventing Human Extinction” in 2022; n=27 included, removed one for lack of engagement):
- expansive altruism (M = 4.2, SD = 1.0)
- effectiveness-focus (M = 4.3, SD = 1.0)
- 48% of Vael sample participants had a mean score of 4+ on both scales
- 4% had a mean score of 5+ on both scales
- 0% had a mean score of 6+ on both scales
——————————————————————————————————
Survey link is here: https://docs.google.com/forms/d/e/1FAIpQLSeY-cFioo7SLMDuHx1w4Rll6pwuRnenvjJOfi1z8WCNNwCBiA/viewform?usp=sf_link
Data is here: https://drive.google.com/file/d/1SFLH4bGC-j0nGuy315z_HH4LwdNAiusa/view?usp=sharing
And Excel apparently decided not to save the formulas, gah. The formulas at the bottom are: =AVERAGE(K3:K29), =STDEV(K3:K29), =AVERAGE(R3:R29), =STDEV(R3:R29), =COUNTIF(V3:V29, TRUE)/COUNTA(V3:V29), =COUNTIF(W3:W29, TRUE)/COUNTA(W3:W29), and =COUNTIF(X3:X29, TRUE)/COUNTA(X3:X29). The other formulas are =AND(K3>4,R3>4), =AND(K3>5,R3>5), and =AND(K3>6,R3>6), dragged down through the rest of the columns.
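If it’s easier to reproduce these numbers outside of Excel, here’s a minimal pandas sketch of the same calculations. It assumes the spreadsheet has been exported to CSV with hypothetical column names expansive_altruism and effectiveness_focus holding each participant’s mean scale score (columns K and R in the sheet), and it uses >= to match the “4+” wording, whereas the AND formulas above use a strict >.

```python
import pandas as pd

# Hypothetical filename; export the spreadsheet data to CSV first.
df = pd.read_csv("preventing_human_extinction_survey.csv")

# Scale means and sample SDs (pandas' default std() matches Excel's STDEV).
for scale in ["expansive_altruism", "effectiveness_focus"]:
    print(f"{scale}: M = {df[scale].mean():.1f}, SD = {df[scale].std():.1f}")

# Share of participants at or above each threshold on *both* scales.
for threshold in (4, 5, 6):
    both = (df["expansive_altruism"] >= threshold) & (df["effectiveness_focus"] >= threshold)
    print(f"{threshold}+ on both scales: {both.mean():.0%}")
```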
It’s super cool :). I think SERI’s funded by a bunch of places (including some university funding, and for sure OpenPhil), but it definitely feels incredible!
Apply for Stanford Existential Risks Initiative (SERI) Postdoc
Just wanted to mention that if you were planning on standardizing an accelerated fellowship retreat, it seems definitely worth reaching out to CFAR folks (as mentioned), since they spent a lot of time testing models, including for post-workshop engagement, afaik! Happy to provide names / introductions if desired.
Suggestion for a project from Jonathan Yan: