Got my post up :). https://forum.effectivealtruism.org/posts/dKgWZ8GMNkXfRwjqH/seeking-social-science-students-collaborators-interested-in
Also “Artificial Intelligence and Global Security Initiative Research Agenda—Center for a New American Security, no date” was published in July 2017, according to the embedded pdf in that link!
Vael Gates
[Question] People working on x-risks: what emotionally motivates you?
Seeking social science students / collaborators interested in AI existential risks
Thanks so much; I’d be excited to talk! Emailed.
The comment about counterfactuals makes me think of computational cognitive scientist Tobias Gerstenberg’s research (https://cicl.stanford.edu), which focuses heavily on counterfactual reasoning in the physical domain, though he also has work in the social domain.
I confess to only a surface-level understanding of MIRI’s research agenda, so I’m not quite able to connect my understanding of counterfactual reasoning in the social domain to a concrete research question within MIRI’s agenda. I’d be happy to hear more though if you had more detail!
Apply to be a Stanford HAI Junior Fellow (Assistant Professor- Research) by Nov. 15, 2021
(How to independent study)
Stephen Casper (https://stephencasper.com/) was giving advice today on how to upskill in research, and suggested doing a “deep dive”.
Deep dive: read 40-50 papers in a specific research area you’re interested in going into (e.g. adversarial examples in deep NNs). Take notes on each paper. You’ll then have knowledge comparable to people working in the area, after which you do a synthesis project where you write something up (it could be a lit review, or something more original than that).
He said he’d trade any class he’d ever taken for one of these deep dives, and they’re worth doing even if it takes like 4 months.
*cool idea
I think classes are great given that they’re targeting something you want to learn and you’re not unusually self-motivated. They add a lot of structure and force engagement (homework, problem sets) in a way that’s hard to find the time / energy for by yourself. You also get a fair amount of guidance and scaffolding, plus information presented in a pedagogical order (with a lot of variance due to the skill and time investment of the instructor, the size of the class, the quality of the curriculum, etc.).
But if you DO happen to be very self-driven, know what you want to learn, and (in a research context) are the type of person who can generate novel insights without much guidance, then heck yes, classes are inefficient. Even if you’re not all of these things, it certainly seems worth trying to see if you can be, since self-learning is so accessible and one learns a lot by being focusedly confused. I like how neatly presented the deep-dive idea above is: it gives me enough structure to have a handle on it and makes it feel unusually feasible to do.
But yeah, for the people who are best at deep dives, I imagine it’s hard for any class to match, even with how high-variance classes can be :).
Update on my post “Seeking social science students / collaborators interested in AI existential risks” from ~1.5 months ago:
I’ve been running a two-month “program” with eight of the students who reached out to me! We’ve come up with research questions from my original list, and the expectation is that individuals work 9h/week as volunteer research assistants. I’ve been meeting with each person / group for 30min per week to discuss progress. We’re halfway through this experiment, with a variety of projects and progress states—hopefully you’ll see at least one EA Forum post up from those students!
I was quite surprised by the interest that this post generated; ~30 people reached out to me, and a large number were willing to do volunteer research for no credit / pay. I ended up working with eight students, mostly based on their willingness to work with me on some of my short-listed projects. I was willing to have their projects drift significantly from my original list if the students were enthusiastic and the project felt decently aligned with long-term risks from AI, and that did occur. My goal here was to get some experience training students who had limited research experience, and I’ve been enjoying working with them.
I’m not sure how likely it is that I’ll continue working with students past this 2-month program, because it does take up a chunk of time (made worse by trying to wrangle schedules), but I’m considering what to do for the future. If anyone’s interested in also mentoring students with an interest in long-term risks from AI, please let me know, since I think there’s interest! It’s a decently low time commitment (30 min per week per student or group of students) once you’ve got everything sorted. However, I am doing it for the benefit of the students, rather than with the expectation of getting help on my work, so it’s more of a volunteer role.
What would you ask on MTurk? (I could possibly run a study for you)
Awesome, thanks! Title is updated.
Just wanted to mention that if you were planning on standardizing an accelerated fellowship retreat, it seems definitely worth reaching out to CFAR folks (as mentioned), since they spent a lot of time testing models, including for post-workshop engagement, afaik! Happy to provide names / introductions if desired.
Apply for Stanford Existential Risks Initiative (SERI) Postdoc
It’s super cool :). I think SERI’s funded by a bunch of places (including some university funding, and for sure OpenPhil), but it definitely feels incredible!
I just did a fast-and-dirty version of this study with some of the students I’m TAing for, in a freshman class at Stanford called “Preventing Human Extinction”. No promises I got all the details right, in either the survey or the analysis.
—————————————————————————————————
QUICK SUMMARY OF DATA FROM https://forum.effectivealtruism.org/posts/7f3sq7ZHcRsaBBeMD/what-psychological-traits-predict-interest-in-effective
MTurkers (n=~250, having a hard time extracting it from 1-3? different samples):
- expansive altruism (M = 4.4, SD = 1.1)
- effectiveness-focus scale (M = 4.4, SD = 1.1)
- 49% of MTurkers had a mean score of 4+ on both scales
- 14% had a mean score of 5+ on both scales
- 3% had a mean score of 6+ on both scales
NYU students (n=96):
- expansive altruism (M = 4.1, SD = 1.1)
- effectiveness-focus (M = 4.3, SD = 1.1)
- 39% of NYU students had a mean score of 4+ on both scales
- 6% had a mean score of 5+ on both scales
- 2% had a mean score of 6+ on both scales
EAs (n=226):
- expansive altruism (M = 5.6, SD = 0.9)
- effectiveness-focus (M = 6.0, SD = 0.8)
- 95% of effective altruist participants had a mean score of 4+ on both scales
- 81% had a mean score of 5+ on both scales
- 33% had a mean score of 6+ on both scales
——————————————————————————————————
VAEL RESULTS:
Vael personally:
- Expansive altruism: 4.2
- Effectiveness-focus: 6.3
Vael sample (Stanford freshmen taking a class called “Preventing Human Extinction” in 2022; n=27 included, one removed for lack of engagement):
- expansive altruism (M = 4.2, SD = 1.0)
- effectiveness-focus (M = 4.3, SD = 1.0)
- 48% of Vael sample participants had a mean score of 4+ on both scales
- 4% had a mean score of 5+ on both scales
- 0% had a mean score of 6+ on both scales
——————————————————————————————————
Survey link is here: https://docs.google.com/forms/d/e/1FAIpQLSeY-cFioo7SLMDuHx1w4Rll6pwuRnenvjJOfi1z8WCNNwCBiA/viewform?usp=sf_link
Data is here: https://drive.google.com/file/d/1SFLH4bGC-j0nGuy315z_HH4LwdNAiusa/view?usp=sharing
And Excel apparently didn’t save the formulas, gah. The summary formulas at the bottom are: =AVERAGE(K3:K29), =STDEV(K3:K29), =AVERAGE(R3:R29), =STDEV(R3:R29), =COUNTIF(V3:V29, TRUE)/COUNTA(V3:V29), =COUNTIF(W3:W29, TRUE)/COUNTA(W3:W29), =COUNTIF(X3:X29, TRUE)/COUNTA(X3:X29), and the per-row formulas are: =AND(K3>4,R3>4), =AND(K3>5,R3>5), =AND(K3>6,R3>6), dragged down through the rest of the rows.
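For anyone who'd rather redo this analysis outside of Excel, here's a minimal Python sketch of the same computations. The lists and values below are illustrative stand-ins, not the actual survey data; "column K" and "column R" refer to the per-respondent scale means in the spreadsheet above.

```python
# Stand-in per-respondent scale means (spreadsheet columns K and R).
# These numbers are made up for illustration only.
from statistics import mean, stdev

expansive = [4.5, 3.8, 5.1, 4.0, 4.6]   # expansive altruism, like AVERAGE/STDEV over K3:K29
effective = [4.2, 4.9, 5.3, 3.7, 4.4]   # effectiveness-focus, like AVERAGE/STDEV over R3:R29

print(round(mean(expansive), 2), round(stdev(expansive), 2))
print(round(mean(effective), 2), round(stdev(effective), 2))

def frac_both_above(t):
    """Fraction of respondents whose means on BOTH scales exceed t.
    Mirrors =AND(K3>t, R3>t) per row, then =COUNTIF(..., TRUE)/COUNTA(...)."""
    both = [e > t and f > t for e, f in zip(expansive, effective)]
    return sum(both) / len(both)

for t in (4, 5, 6):
    print(t, frac_both_above(t))
```

Note that `statistics.stdev` is the sample standard deviation, matching Excel's `STDEV`, and that `>` (strict) matches the spreadsheet's `AND(K3>4, ...)` even though the prose says "4+".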
Transcripts of interviews with AI researchers
Indeed! I’ve actually found that in most of my interviews people haven’t thought about the 50+ year future much or heard of AI alignment, given that my large sample is researchers who had papers at NeurIPS or ICML. (The five researchers who were individually selected here had thought about AI alignment uncommonly much, which didn’t particularly surprise me given how they were selected.)
A nice followup direction to take this would be to get a list of common arguments used by AI researchers to be less worried about AI safety (or about working on capabilities, which is separate), counterarguments, and possible counter-counter arguments. Do you plan to touch on this kind of thing in your further work with the 86 researchers?
Yes, with the note that the arguments brought forth are generally less carefully thought-through than the ones shown in the individually selected population, due to the larger sample. But you can get a sense of some of the types of arguments from the six transcripts with NeurIPS / ICML researchers, though I wouldn’t say they’re fully representative.
This isn’t particularly helpful since it’s not sorted, but some transcripts with ML researchers: https://www.lesswrong.com/posts/LfHWhcfK92qh2nwku/transcripts-of-interviews-with-ai-researchers
My argument structure within these interviews was basically to ask them these three questions in order, then respond from there. I chose the questions initially, but the details of the spiels were added to as I talked to researchers and started trying to respond to their comments before they made them.
1. “When do you think we’ll get AGI / capable, generalizable AI / have the cognitive capacities to have a CEO AI, if we do?”
Example dialogue: “All right, now I’m going to give a spiel. So, people talk about the promise of AI, which can mean many things, but one of them is getting very general capable systems, perhaps with the cognitive capabilities to replace all current human jobs, so you could have a CEO AI or a scientist AI, etcetera. And I usually think about this in the frame of 2012: we have the deep learning revolution, we’ve got AlexNet, GPUs. 10 years later, here we are, and we’ve got systems like GPT-3 which have kind of weirdly emergent capabilities. They can do some text generation and some language translation and some code and some math. And one could imagine that if we continue pouring in all the human investment that we’re pouring into this, like money, competition between nations, human talent, so much talent and training all the young people up, and if we continue to have algorithmic improvements at the rate we’ve seen and continue to have hardware improvements, so maybe we get optical computing or quantum computing, then one could imagine that eventually this scales to more of quite general systems, or maybe we hit a limit and we have to do a paradigm shift in order to get to the highly capable AI stage. Regardless of how we get there, my question is, do you think this will ever happen, and if so when?”
2. “What do you think of the argument ‘highly intelligent systems will fail to optimize exactly what their designers intended them to, and this is dangerous’?”
Example dialogue: “Alright, so these next questions are about these highly intelligent systems. So imagine we have a CEO AI, and I’m like, “Alright, CEO AI, I wish for you to maximize profit, and try not to exploit people, and don’t run out of money, and try to avoid side effects.” And this might be problematic, because currently we’re finding it technically challenging to translate human values, preferences, and intentions into mathematical formulations that can be optimized by systems, and this might continue to be a problem in the future. So what do you think of the argument “Highly intelligent systems will fail to optimize exactly what their designers intended them to and this is dangerous”?”
3. “What do you think about the argument: ‘highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals, and this is dangerous’?”
Example dialogue: “Alright, next question is, so we have a CEO AI and it’s like optimizing for whatever I told it to, and it notices that at some point some of its plans are failing and it’s like, “Well, hmm, I noticed my plans are failing because I’m getting shut down. How about I make sure I don’t get shut down? So if my loss function is something that needs human approval and then the humans want a one-page memo, then I can just give them a memo that doesn’t have all the information, and that way I’m going to be better able to achieve my goal.” So not positing that the AI has a survival function in it, but as an instrumental incentive to being an agent that is optimizing for goals that are maybe not perfectly aligned, it would develop these instrumental incentives. So what do you think of the argument, “Highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals and this is dangerous”?”
Additions under “Less technical / AI strategy / AI governance”?
- https://forum.effectivealtruism.org/posts/WdMnmmqqiP5zCtSfv/cognitive-science-psychology-as-a-neglected-approach-to-ai
- https://forum.effectivealtruism.org/posts/9kNqYzEAYtvLg2BbR/baobao-zhang-how-social-science-research-can-inform-ai (though this one only has three research questions and isn’t focused on generating questions)