Got my post up :). https://forum.effectivealtruism.org/posts/dKgWZ8GMNkXfRwjqH/seeking-social-science-students-collaborators-interested-in
Also “Artificial Intelligence and Global Security Initiative Research Agenda—Center for a New American Security, no date” was published in July 2017, according to the embedded pdf in that link!
Vael Gates
[Question] People working on x-risks: what emotionally motivates you?
Seeking social science students / collaborators interested in AI existential risks
Thanks so much; I’d be excited to talk! Emailed.
The comment about counterfactuals makes me think of computational cognitive scientist Tobias Gerstenberg’s research (https://cicl.stanford.edu), which focuses heavily on counterfactual reasoning in the physical domain, though he also has work in the social domain.
I confess to only a surface-level understanding of MIRI’s research agenda, so I’m not quite able to connect my understanding of counterfactual reasoning in the social domain to a concrete research question within MIRI’s agenda. I’d be happy to hear more though if you had more detail!
Apply to be a Stanford HAI Junior Fellow (Assistant Professor- Research) by Nov. 15, 2021
(How to independent study)
Stephen Casper (https://stephencasper.com/) was giving advice today on how to upskill in research, and suggested doing a “deep dive”.
Deep dive: read 40-50 papers in a specific research area you’re interested in going into (e.g. adversarial examples in deep NNs). Take notes on each paper. You’ll then have knowledge comparable to people working in the area, after which you do a synthesis project where you write something up (it could be a lit review, or something more original than that).
He said he’d trade any class he’d ever taken for one of these deep dives, and they’re worth doing even if it takes like 4 months.
*cool idea
I think classes are great given that they’re targeting something you want to learn and you’re not unusually self-motivated. They add a lot of structure and force engagement (homework, problem sets) in a way that’s hard to find the time / energy for by yourself. You also get a fair amount of guidance and scaffolding, plus information presented in a pedagogical order (with a lot of variance due to the skill and time investment of the instructor, the size of the class, the quality of the curriculum, etc.).
But if you DO happen to be very self-driven, know what you want to learn, and (in a research context) are the type of person who can generate novel insights without much guidance, then heck yes, classes are inefficient. Even if you’re not all of these things, it certainly seems worth trying to see if you can be, since self-learning is so accessible and one learns a lot by being focusedly confused. I like how neatly presented the deep-dive idea above is: it gives me enough structure to have a handle on it and makes it feel unusually feasible to do.
But yeah, for the people who are best at deep dives, I imagine it’s hard for any class to match, even with how high-variance classes can be :).
Update on my post “Seeking social science students / collaborators interested in AI existential risks” from ~1.5 months ago:
I’ve been running a two-month “program” with eight of the students who reached out to me! We’ve come up with research questions from my original list, and the expectation is that individuals work 9h/week as volunteer research assistants. I’ve been meeting with each person / group for 30min per week to discuss progress. We’re halfway through this experiment, with a variety of projects and progress states—hopefully you’ll see at least one EA Forum post up from those students!
I was quite surprised by the interest that this post generated; ~30 people reached out to me, and a large number were willing to do volunteer research for no credit / pay. I ended up working with eight students, mostly based on their willingness to work with me on some of my short-listed projects. I was willing to have their projects drift significantly from my original list if the students were enthusiastic and the project felt decently aligned with long-term risks from AI, and that did occur. My goal here was to get some experience training students who had limited research experience, and I’ve been enjoying working with them.
I’m not sure how likely it is that I’ll continue working with students past this 2-month program, because it does take up a chunk of time (made worse by trying to wrangle schedules), but I’m considering what to do for the future. If anyone’s interested in also mentoring students with an interest in long-term risks from AI, please let me know, since I think there’s interest! It’s a decently low time commitment (30 min per week per student or group of students) once you’ve got everything sorted. However, I am doing it for the benefit of the students, rather than with the expectation of getting help on my work, so it’s more of a volunteer role.
What would you ask on MTurk? (I could possibly run a study for you)
Awesome, thanks! Title is updated.
Just wanted to mention that if you were planning on standardizing an accelerated fellowship retreat, it seems definitely worth reaching out to CFAR folks (as mentioned), since they spent a lot of time testing models, including for post-workshop engagement, afaik! Happy to provide names / introductions if desired.
Apply for Stanford Existential Risks Initiative (SERI) Postdoc
It’s super cool :). I think SERI’s funded by a bunch of places (including some university funding, and for sure OpenPhil), but it definitely feels incredible!
I just did a fast-and-dirty version of this study with some of the students I’m TAing for, in a freshman class at Stanford called “Preventing Human Extinction”. No promises I got all the details right, in either the survey or the analysis.
—————————————————————————————————
QUICK SUMMARY OF DATA FROM https://forum.effectivealtruism.org/posts/7f3sq7ZHcRsaBBeMD/what-psychological-traits-predict-interest-in-effective
MTurkers (n=~250, having a hard time extracting it from 1-3? different samples):
- expansive altruism (M = 4.4, SD = 1.1)
- effectiveness-focus scale (M = 4.4, SD = 1.1)
- 49% of MTurkers had a mean score of 4+ on both scales
- 14% had a mean score of 5+ on both scales
- 3% had a mean score of 6+ on both scales
NYU students (n=96):
- expansive altruism (M = 4.1, SD = 1.1)
- effectiveness-focus (M = 4.3, SD = 1.1)
- 39% of NYU students had a mean score of 4+ on both scales
- 6% had a mean score of 5+ on both scales
- 2% had a mean score of 6+ on both scales
EAs (n=226):
- expansive altruism (M = 5.6, SD = 0.9)
- effectiveness-focus (M = 6.0, SD = 0.8)
- 95% of effective altruist participants had a mean score of 4+ on both scales
- 81% had a mean score of 5+ on both scales
- 33% had a mean score of 6+ on both scales
——————————————————————————————————
VAEL RESULTS:
Vael personally:
- Expansive altruism: 4.2
- Effectiveness-focus: 6.3
Vael sample (Stanford freshmen taking a class called “Preventing Human Extinction” in 2022; n=27 included, one removed for lack of engagement):
- expansive altruism (M = 4.2, SD = 1.0)
- effectiveness-focus (M = 4.3, SD = 1.0)
- 48% of Vael sample participants had a mean score of 4+ on both scales
- 4% had a mean score of 5+ on both scales
- 0% had a mean score of 6+ on both scales
——————————————————————————————————
Survey link is here: https://docs.google.com/forms/d/e/1FAIpQLSeY-cFioo7SLMDuHx1w4Rll6pwuRnenvjJOfi1z8WCNNwCBiA/viewform?usp=sf_link
Data is here: https://drive.google.com/file/d/1SFLH4bGC-j0nGuy315z_HH4LwdNAiusa/view?usp=sharing
And Excel apparently didn’t save the formulas, gah. The summary formulas at the bottom are: =AVERAGE(K3:K29), =STDEV(K3:K29), =AVERAGE(R3:R29), =STDEV(R3:R29), =COUNTIF(V3:V29, TRUE)/COUNTA(V3:V29), =COUNTIF(W3:W29, TRUE)/COUNTA(W3:W29), =COUNTIF(X3:X29, TRUE)/COUNTA(X3:X29), and the per-row formulas are: =AND(K3>4,R3>4), =AND(K3>5,R3>5), =AND(K3>6,R3>6), dragged down through the rest of the rows.
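For anyone who'd rather redo this analysis outside of Excel, here's a minimal Python sketch of the same computations. The lists and values below are illustrative stand-ins, not the actual survey data; "column K" and "column R" refer to the per-respondent scale means in the spreadsheet above.

```python
# Stand-in per-respondent scale means (spreadsheet columns K and R).
# These numbers are made up for illustration only.
from statistics import mean, stdev

expansive = [4.5, 3.8, 5.1, 4.0, 4.6]   # expansive altruism, like AVERAGE/STDEV over K3:K29
effective = [4.2, 4.9, 5.3, 3.7, 4.4]   # effectiveness-focus, like AVERAGE/STDEV over R3:R29

print(round(mean(expansive), 2), round(stdev(expansive), 2))
print(round(mean(effective), 2), round(stdev(effective), 2))

def frac_both_above(t):
    """Fraction of respondents whose means on BOTH scales exceed t.
    Mirrors =AND(K3>t, R3>t) per row, then =COUNTIF(..., TRUE)/COUNTA(...)."""
    both = [e > t and f > t for e, f in zip(expansive, effective)]
    return sum(both) / len(both)

for t in (4, 5, 6):
    print(t, frac_both_above(t))
```

Note that `statistics.stdev` is the sample standard deviation, matching Excel's `STDEV`, and that `>` (strict) matches the spreadsheet's `AND(K3>4, ...)` even though the prose says "4+".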
Transcripts of interviews with AI researchers
Indeed! I’ve actually found that in most of my interviews people haven’t thought about the 50+ year future much or heard of AI alignment, given that my large sample is researchers who had papers at NeurIPS or ICML. (The five researchers who were individually selected here had thought about AI alignment uncommonly much, which didn’t particularly surprise me given how they were selected.)
A nice followup direction to take this would be to get a list of common arguments used by AI researchers to be less worried about AI safety (or about working on capabilities, which is separate), counterarguments, and possible counter-counter arguments. Do you plan to touch on this kind of thing in your further work with the 86 researchers?
Yes, with the note that the arguments brought forth are generally less carefully thought-through than the ones shown in the individually selected population, due to the larger sample. But you can get a sense of some of the types of arguments from the six transcripts with NeurIPS / ICML researchers, though I wouldn’t say they’re fully representative.
This isn’t particularly helpful since it’s not sorted, but some transcripts with ML researchers: https://www.lesswrong.com/posts/LfHWhcfK92qh2nwku/transcripts-of-interviews-with-ai-researchers
My argument structure within these interviews was basically to ask them these three questions in order, then respond from there. I chose the questions initially, but the details of the spiels were added to as I talked to researchers and started trying to respond to their comments before they made them.
1. “When do you think we’ll get AGI / capable, generalizable AI / have the cognitive capacities to have a CEO AI, if we do?”
Example dialogue: “All right, now I’m going to give a spiel. So, people talk about the promise of AI, which can mean many things, but one of them is getting very general capable systems, perhaps with the cognitive capabilities to replace all current human jobs, so you could have a CEO AI or a scientist AI, etcetera. And I usually think about this in the frame of 2012: we have the deep learning revolution, we’ve got AlexNet, GPUs. 10 years later, here we are, and we’ve got systems like GPT-3 which have kind of weirdly emergent capabilities. They can do some text generation and some language translation and some code and some math. And one could imagine that if we continue pouring in all the human investment that we’re pouring into this, like money, competition between nations, human talent, so much talent and training all the young people up, and if we continue to have algorithmic improvements at the rate we’ve seen and continue to have hardware improvements, so maybe we get optical computing or quantum computing, then one could imagine that eventually this scales to more of quite general systems, or maybe we hit a limit and we have to do a paradigm shift in order to get to the highly capable AI stage. Regardless of how we get there, my question is, do you think this will ever happen, and if so when?”
2. “What do you think of the argument ‘highly intelligent systems will fail to optimize exactly what their designers intended them to, and this is dangerous’?”
Example dialogue: “Alright, so these next questions are about these highly intelligent systems. So imagine we have a CEO AI, and I’m like, “Alright, CEO AI, I wish for you to maximize profit, and try not to exploit people, and don’t run out of money, and try to avoid side effects.” And this might be problematic, because currently we’re finding it technically challenging to translate human values, preferences, and intentions into mathematical formulations that can be optimized by systems, and this might continue to be a problem in the future. So what do you think of the argument “Highly intelligent systems will fail to optimize exactly what their designers intended them to and this is dangerous”?”
3. “What do you think about the argument: ‘highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals, and this is dangerous’?”
Example dialogue: “Alright, next question is, so we have a CEO AI and it’s like optimizing for whatever I told it to, and it notices that at some point some of its plans are failing and it’s like, “Well, hmm, I noticed my plans are failing because I’m getting shut down. How about I make sure I don’t get shut down? So if my loss function is something that needs human approval and then the humans want a one-page memo, then I can just give them a memo that doesn’t have all the information, and that way I’m going to be better able to achieve my goal.” So not positing that the AI has a survival function in it, but as an instrumental incentive to being an agent that is optimizing for goals that are maybe not perfectly aligned, it would develop these instrumental incentives. So what do you think of the argument, “Highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals and this is dangerous”?”
Additions under “Less technical / AI strategy / AI governance”?
- https://forum.effectivealtruism.org/posts/WdMnmmqqiP5zCtSfv/cognitive-science-psychology-as-a-neglected-approach-to-ai
- https://forum.effectivealtruism.org/posts/9kNqYzEAYtvLg2BbR/baobao-zhang-how-social-science-research-can-inform-ai (though this one only has three research questions and isn’t focused on generating questions)