Assessing SERI/CHERI/CERI summer program impact by surveying fellows

L Rudolf L, Sep 26, 2022, 3:29 PM



The three largest existential risk initiatives (ERIs) are SERI (Stanford), CHERI (Switzerland), and CERI (Cambridge). All three organized a paid summer research fellowship/program where fellows/participants are matched with mentors and do x-risk relevant research for 8 (CHERI) or 10 (SERI/CERI) weeks.

ERI summer programs are among the most-publicized and resource-intensive projects aimed at helping people get started on x-risk careers, so information about the impact they have is valuable. The existence of three organizations running a similar program but with some variation in strategy and implementation also creates an opportunity to run a natural experiment on what works and what doesn’t for this type of program.

All ERIs do their own impact assessments (and are assessed by Open Philanthropy when they apply for funding), and ran their own feedback surveys for their fellows in addition to the Joint ERI Survey (JERIS). The purpose of JERIS is restricted to assessing fellows’ experience and own reflections, but it does so by asking fellows in all programs the same questions for comparability.

There are three categories of people this post is most useful for:

ERI organizers who want to understand and improve their programs.
Potential future ERI fellows who want to get an idea of what past ERI fellows thought of the program.
Grant-makers and entrepreneurs trying to get a sense of what types of projects are valuable and in what ways.

Highlights

If ERI fellows had not been accepted into ERI programs, the most likely things they would’ve done instead are: a non-EA/x-risk internship, tried to do research and/or upskilling on their own, done nothing career-related during the summer, or done a non-x-risk EA job.
The top next career options ERI fellows are considering are research roles (including many specifically planning to do a PhD, but many also not having decided on a specific area of research), technical AI alignment, and EA/x-risk community-building.
Fellows’ estimates of the probability they pursue an x-risk career start out high (~0.8) and do not measurably increase during the program.
Fellows’ estimate of how comfortable they would be pursuing a research project remains effectively constant. Many start out very comfortable with research. A few decline.
Fellows think they’ve gotten roughly as much research skill gain as they would have from a more established research internship in academia.
Fellows found their mentors to be friendly and generally useful, though the latter may have a two-humped distribution.
There was wide variation in where project ideas came from (mentors, fellows, or other sources).
Networking, learning to do research, and becoming a stronger candidate for academic (but not industry) jobs top the list of what participants found most valuable about the programs.
Fellows generally really enjoyed the program.
Fully remote fellows felt significantly less part of the same community/team as the other fellows. Partly and fully in-person fellows had comparable (high) feelings of belonging to the same community.
Remote fellows are at a significant disadvantage in finding people who might help them with their career (on average, they leave their ERI program comfortable asking 4 people for a career-related favor, compared to ~10 for fully or partly in-person fellows).
Remote fellows plan to maintain contact with fewer other fellows than partly or fully in-person fellows (2 vs ~5).
Being partly in-person provides most of the benefits of being fully in-person.
Women are underrepresented, but seem to enjoy the program, feel part of it, and feel comfortable asking others for career favors to the same extent as men.
Many fellows want to work in teams, or have neutral/mixed feelings about team work. A lower number actively think working individually is ideal. No fellow who worked in a team thinks they would’ve been more productive or had more fun if they worked alone.

Program features

The information below for each ERI was provided by someone involved in running that ERI’s summer program. Some of it might be out of date, as it reflected plans at the start of the summer.

	SERI	CHERI	CERI
Start	2022-06-13	2022-07-04	2022-07-04
End	2022-08-19	2022-08-28	2022-09-09
Duration	10 weeks	8 weeks	10 weeks
In-person component?	11 in-person, 17 remote; all in-person: 2022-07-25 to 2022-07-31	In-person at least: 1 week (2022-07-04 to 2022-07-08), 1 weekend (2022-08-26 to 2022-08-28); optional 2 coworkathons; a small group is organizing an in-person stay together	Fellows live in the same place (Emmanuel College, Cambridge) and work in-person at the same office.
Applications	~325	~80	~650
Advertising methods	EA Forum, EAG London, EAGx Boston, individual outreach	EA Slacks, personal messages, FB, SERI conference (Feb.)	EA Forum, LinkedIn/FB advertising (~£500), EA Slack groups, etc.
Participants	33 (12 in-person, 21 remote)	21	24
Application process outline	Written short-answer application, no project proposal; assessment by 1 program coordinator and cause area manager; interview with cause area manager	First round: a) one-page project or b) five potential questions; second round: a) interview b) survey about reasoning quality, incl. small essay	Initial application including long (~2h) essay component, blindly assessed by two people each (cause area lead + another person), followed by interview round
Mentor matching / general on-boarding outline	High variance between participants, done by cause area managers, sometimes based on project preferences of fellows	Mentor matching done by team, based on preferences of the students & our network	Mentor matching happened before the start of the programme. It was led by our cause area leads, and was a very personalised process for each fellow, depending on what project they wanted to pursue and what type of mentorship would have been most useful for them in the long run.
Participant salary/stipend + other perks	$7,500; also travel and accommodation for in-person fellows.	CHF 6000, ~$6000; also travel/accommodation for in-person event	£5600, ~$7060; also travel, accommodation, and food

Survey method

Representatives from each ERI brainstormed hypotheses to test and questions to ask, and then had a meeting where we finalized the set of survey questions. There were three distinct surveys, to be completed by fellows at the beginning, middle, and end of the summer program respectively.

Anonymization and resulting issues

We thought fellows would be concerned about being de-anonymized and therefore hesitant to share candid feedback. In many cases the combination of organization, cause area, and gender was sufficient to pin down a fellow, especially if the cause area was a less popular one or the gender was female. The solution we ended up with was a question asking fellows to generate a unique anonymous identifier for themselves by combining personal information in a hard-to-reverse way, or by generating a random number and keeping it for later (which a few of them actually took us up on). This generation method turned out to have insufficient entropy: there were two collisions, resulting in 4 sets of answers being discarded. Each survey asked for this unique identifier, but only the last survey asked for organization, cause area, and gender.

Therefore, the answers of a fellow (in any survey) could only be linked to an organization if that same fellow had also completed the last survey (and also correctly re-generated their unique identifier, or, in a few cases, correctly stored and retrieved the one they made previously).
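To make the linking logic concrete, here is a minimal sketch of how identifiers could be generated, de-duplicated, and joined across surveys. The exact recipe fellows used is not reproduced in this post, so the hashing step, the field names (`id`, `org`), and the function names are all assumptions for illustration, not the actual JERIS pipeline.

```python
import hashlib
from collections import Counter


def make_identifier(*answers: str) -> str:
    """Combine personal answers into a short, hard-to-reverse token.

    Hashing is an assumption here; the survey only asked fellows to combine
    personal information in some hard-to-reverse way.
    """
    normalized = "|".join(a.strip().lower() for a in answers)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:10]


def drop_collisions(responses):
    """Discard every response whose identifier appears more than once in a survey."""
    counts = Counter(r["id"] for r in responses)
    return [r for r in responses if counts[r["id"]] == 1]


def link_to_org(earlier, last):
    """Attach the organization (asked only in the last survey) to earlier answers."""
    org_by_id = {r["id"]: r["org"] for r in drop_collisions(last)}
    return [
        {**r, "org": org_by_id[r["id"]]}
        for r in drop_collisions(earlier)
        if r["id"] in org_by_id
    ]


# Hypothetical data: one linkable fellow, one collision, one fellow who skipped the last survey.
first_survey = [
    {"id": "x1", "p": 0.8},
    {"id": "dup", "p": 0.5},
    {"id": "dup", "p": 0.9},
    {"id": "x2", "p": 0.7},
]
last_survey = [{"id": "x1", "org": "CERI"}, {"id": "x3", "org": "SERI"}]
print(link_to_org(first_survey, last_survey))
```

Note that a collision removes both colliding responses, since there is no way to tell which one belongs to which fellow. This is why two collisions cost four sets of answers.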

Unfortunately, most fellows did not complete all three surveys. 50 people completed the last survey, but of them only 29 could be linked with a unique identifier in the middle survey, and 25 with an answer in the first survey.

We were not optimistic about fellows filling in feedback surveys; each program made time for them, and in some cases there were multiple reminders. It seems we were not quite pessimistic enough, however.

Results

First, it is helpful to keep in mind the breakdown of (survey-answering) fellows by organization and whether they were in-person or not:

Counterfactual options and career plans at start

The first survey included a question “What would you have done if you had not been accepted into the programme?”. I went through the answers and identified the categories of thing listed. In the graph below, if a respondent gave only one answer (e.g. something that fell under the category of “other EA job”), that was counted as +1 to that category. If they listed $n$ categories, a score of $+1/n$ was added to each of the categories they mentioned.

Some notes on categories:

“other internship” excludes things that fall under other categories, like other ERI, other EA job, and other x-risk job
“similar but independently” means the fellow planned to conduct x-risk-relevant research on their own, without an organization to do it under and with any mentorship (rare) being self-organized. Several people giving answers under this category mentioned FTX regrantor grants.
“studying” encompasses x-risk-relevant up-skilling as well as studying for e.g. university courses, and specifically implies no mention of intent to do original research.
“nothing” means “nothing related to careers or x-risk”. It includes holidays, travel, rest, and hobbies.
“other EA job” excludes the more specific cases of “other ERI” and “other x-risk job”.
“existing research” means the fellow was planning to continue a research project they were already working on.
“tech job” and “internship” exclude x-risk-relevant or within-EA versions.
“other research” excludes EA or x-risk relevant research.
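
The vote-splitting rule for the counterfactual question (a respondent who mentions $n$ categories contributes $1/n$ to each) can be sketched as follows. The function name and the example responses are hypothetical; the real categorization was done by hand from free-form text.

```python
from collections import defaultdict


def split_vote_scores(responses):
    """Give each respondent one vote, split evenly over the categories they mention."""
    scores = defaultdict(float)
    for categories in responses:
        unique = set(categories)  # mentioning a category twice should not count double
        if not unique:
            continue  # skip respondents whose answer matched no category
        for category in unique:
            scores[category] += 1 / len(unique)
    return dict(scores)


# Hypothetical categorized answers from three respondents.
example = [
    ["other EA job"],
    ["studying", "nothing"],
    ["studying", "other internship", "nothing", "tech job"],
]
print(split_vote_scores(example))
```

A useful sanity check on this scheme is that the scores always sum to the number of respondents who gave at least one categorizable answer, so verbose respondents do not dominate the graph.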

Another “question” on the first survey said “Briefly outline the career plans/options that you are considering.” Again, the categorization was based on my classification of fellow answers into (potentially multiple) categories rather than asking fellows to select from options, and fellows who gave many answers had their “vote” split evenly as described above.

The top next career options ERI fellows are considering are research roles (including many specifically planning to do a PhD, but many also not having decided on a specific area of research), technical AI alignment, and EA/x-risk community-building.

SERI fellows seem less likely to note general interest in some unspecified research career, and more likely to note more specific things instead.

Some notes on categories:

“broad research” and “broad policy” were used when fellows mentioned research or policy careers, and in some cases indicated interest in x-risk-relevant things but without mentioning anything more specific than “x-risk”.
“technical alignment” means technical AI alignment.
“PhD” was included whenever the fellow specifically indicated doing a PhD as a next step.
“community-building”, “grant-making”, and “operations” refer to x-risk / EA versions of those things.

This question was not repeated at the end to see change. It probably should’ve been. However, the next section hints change might be surprisingly low.

Self-estimated probability of pursuing an x-risk career

The exact phrasing of the question was:

What do you think is the probability (as a number between 0 and 1) that you will spend at least 5 years of your life working in a job that is closely related to x-risk mitigation?

The “5 years of your life” part was to anchor people to thinking about what they might concretely spend their working time on rather than the more abstract and less-defined concept of having a “career”.

The graphs below show the distribution of answers in the beginning and end survey:

Linking people using the anonymous unique identifier, we can also see the trends for individual participants (averages for each program shown as dots):

The average ERI summer fellow is already fairly set on an x-risk career (assuming their probability estimates are at least somewhat well-calibrated). Somewhat strikingly, the probability of pursuing an x-risk career does not increase throughout the program. Perhaps this is because of ceiling effects; probabilities can’t go much higher than the starting 0.8 after all.

The CHERI fellows seem to be both less committed to x-risk careers overall (just about one standard deviation below CERI/SERI fellows), and to see larger changes over the course of the program. Note how four CHERI fellows saw significant drops in their assigned probability, while the greatest increase also came from a CHERI fellow.

One potentially relevant feature of the CHERI fellowship was a significantly smaller number of applicants (~80, compared to ~325 for SERI and ~650 for CERI). Note that, apart from Facebook advertising, CHERI did use fairly targeted advertising methods, like EA Slacks, personal messages, and the 2022 SERI conference. Therefore it is possible that a large pool is needed to get x-risk-committed applicants even when using targeted advertising channels, that CHERI generally interfaces with less-committed parts of EA space, or that SERI and CERI weighted (proxies for) commitment to x-risk careers more heavily.

The relatively low changes in x-risk career commitment suggest that ERI programs (with the possible exception of CHERI) do not generally cause fellows to update very much on their commitment to an x-risk career.

Cause areas

Cause area ranking

In the beginning and end surveys, we asked fellows to “Please rank what you think are the greatest existential risks to human civilisation (top few are enough).” I went through the free-form lists, randomizing in case of ties and sometimes merging very similar categories (in hindsight, this should not have been a free-form text question).

The top-ranked cause area at the beginning and end was:

AI clearly dominates, and climate falls over the course of the program.

We can get a sense of overall prioritization by scoring the $n$th-ranked cause area as $1/n$ points. For the beginning and end surveys respectively, this suggests the following ranking:

The same pattern is present as in the top-only graph, though less extreme.

The dominance of the “big four” (AI, bio, nuclear, climate) might be influenced by the way in which these were the main cause areas identified by the programs.

Participants by cause area

The number of participants working on each cause area, split by organization, was as follows:

Note that since the survey was not taken by everyone, this is missing some people.

Perceived skills and skill changes

We asked fellows to rate on a 1-10 scale “How comfortable would you feel undertaking a significant research project?” at both the beginning and the end of the program. The definition of “significant research project” given was

“Significant research project” means something of at least one year duration, where you have access to a mentor but a need to generate significant ideas/research on your own. Examples include research-oriented Master’s programs, PhDs, think-tank roles, and long research internships.

Comfort with pursuing research projects remains effectively constant. Interestingly, there seem to be two contrasting patterns: many fellows increasing slightly, and a few decreasing significantly. It seems that a fair number of participants start and stay confident, most improve a bit, and some perhaps have a rude awakening to the realities of research. Major declines are causes for concern, and may suggest ERI programs should be mindful about the possibility of fellows being demotivated or burned out by struggling.

At the end, to see if fellows think they missed out on research up-skilling by not doing a more traditional research internship, we also asked “How much do you think the programme helped you develop your research skills, on a scale of 1-10 where 5 is the level of research skill gain you’d expect from a comparable-length research internship in academia?”.

This gives a nicely shaped distribution that confirms fellows in all programs think they’ve gotten roughly as much value as they would from a more established research internship in academia. Of course, it is unclear if fellow perceptions are accurate—it would be interesting to see this broken down by whether or not the fellows have prior research experience.

We also asked the comfort question but for entrepreneurial projects. ERIs are research programs, so asking about another thing as well serves as a control. It also provides some information about how strongly ERI participants think their skills lean to research specifically, and how comfortable they’d be with running projects. The definition given was

“Significant entrepreneurial project” includes things like starting an organisation in the Effective Altruism space or another startup/non-profit or running a major event/camp (e.g. conference / retreat / educational program)

As expected, fellows admitted for interest and skill at research are more comfortable with research than entrepreneurship. Interestingly, the increase in perceived comfort with entrepreneurial projects is larger for every org than that for research. Perhaps the (mostly young) fellows generally just get slightly more comfortable with every type of thing as they gain experience.

However, this is additional evidence that ERI programs are not increasing fellows’ self-perceived comfort with research any more than they increase fellows’ comfort with anything. It would be interesting to see if mentors of fellows think they have improved overall; it may be that changes in self-perception and actual skill don’t correlate very much.

We also asked fellows in the final survey the following question:

Do you think you are better at critical thinking than at the start (e.g. more likely to notice fallacies/biases in yourself, better calibrated at estimating probabilities, more able to think quantitatively, more able to judge whether a claim has substance)? Feel free to comment

I categorized the responses into three categories:

Some cheeky fellows commented something like “no, but mostly because they were already at a high starting point”.

Mentors and project ideas

We asked fellows “How useful do you think your mentor was?”, where 1 = “not very useful” (phrasing this euphemistically was a mistake and might have influenced the results) and 10 = “extremely useful”, and “How friendly/pleasant did you find your interactions with your mentor?”, where 1 = “very unfriendly and unpleasant” and 10 = “very friendly and pleasant”:

Mentor usefulness shows an interesting two-humped pattern. This may correspond to an actual two-humped distribution, or to whether people read the misleadingly-euphemistic label for what “1” means or not. However, in general mentors do seem to be useful. Practically everyone also found their mentor interactions pleasant and friendly, though outliers exist.

SERI and CERI are over-represented among fellows giving a ¹⁰⁄₁₀ for their mentor on both categories. The average for CHERI also ends up being slightly weaker in both categories, though the variance is also high.

Project idea source

In the second survey, fellows were asked:

What percentage of the project idea would you attribute to yourself, your mentor, and others? Answer as e.g. the ratio 30:60:10 if you think you contributed 30%, your mentor 60%, and others 10%

The following graph shows fellows’ self-perceived contribution to the idea on the x-axis, mentor contribution on the y-axis, and the contribution of others as distance from the point to the grey line:

Essentially every SERI fellow attributes less than 50% of the idea to themselves, whereas most CERI fellows attribute more than 50% to themselves. There also exists a clear minority that got their project idea mostly from a source other than themselves or their mentor.
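
As a sketch of how answers like `30:60:10` could be turned into the coordinates of such a plot: the snippet below normalizes each ratio to fractions and computes the distance to the line where self and mentor shares sum to 100%. That this is the grey line in the graph is my assumption, and the function names are hypothetical.

```python
import math


def parse_attribution(answer: str):
    """Parse 'self:mentor:others' (e.g. '30:60:10') into fractions summing to 1."""
    parts = [float(p) for p in answer.strip().split(":")]
    if len(parts) != 3 or any(p < 0 for p in parts) or sum(parts) == 0:
        raise ValueError(f"expected three non-negative numbers, got {answer!r}")
    total = sum(parts)  # normalize, in case the parts do not add up to 100
    return tuple(p / total for p in parts)


def distance_to_full_attribution_line(self_share: float, mentor_share: float) -> float:
    """Perpendicular distance from (self, mentor) to the line self + mentor = 1.

    Assumes this is the grey line in the plot; the distance is then
    proportional to the share attributed to others.
    """
    return abs(1 - self_share - mentor_share) / math.sqrt(2)


self_share, mentor_share, others_share = parse_attribution("30:60:10")
print(self_share, mentor_share, others_share)
print(distance_to_full_attribution_line(self_share, mentor_share))
```

Normalizing by the total rather than by 100 means that an answer such as `3:6:1` is treated the same as `30:60:10`.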

Greatest value adds

Fellows were asked “What do you think were the most valuable parts of the programme?” and given a series of options to select from (with an “add other” button available). Answers were split (as above) so that a fellow ticking $n$ boxes was counted as having given $+ 1 / n$ score to each of them. The scores for each answer were as follows:

Networking, learning to do research, and becoming a stronger candidate for academic (but not industry) jobs top the list of what participants found most valuable.

Enjoyment of program

Recall that CERI was almost entirely fully in-person, CHERI almost entirely partly in-person, and SERI had an almost even split between all levels.

The big results from the above are:

Fellows in general enjoyed the programs a lot; 8-9 out of 10 is a high level.
CERI was just about one standard deviation and 1 point on the scale higher than the other programs.
Fully in-person was similarly better compared to the other options, especially fully remote. This may have been the driving factor behind the above point.

The graph below shows mostly the same information, but allows us to look at individual trajectories (note that many of the lines from survey 2 to survey 3 are overlapping; you can roughly guess the number from the opacity level):

This mostly reveals that there may be a slight trend of enjoyment being higher at the end of the program, and confirms that the (small) differences between the ERIs also existed in the middle of the program.

Connections and sense of community

Fellows were asked “How much did you feel part of the same team/community/program with the other fellows in your cohort?” where 1 = “Not at all; felt a complete outsider” and 10 = “Extremely; felt like I had found my people”. Below are the results, split both by org and by whether or not the fellow worked in-person.

Fully remote fellows felt significantly less part of the same community/team as the other fellows. Partly and fully in-person fellows had comparable (high) averages and distributions.

One of the theories CHERI wanted to test with their program this summer was whether being partly in-person gets most of the advantage of being fully in-person. This seems like moderate evidence in favor of this being true for fellow sense of belonging.

Two other questions ran:

How many of the participants in your programme would you feel comfortable asking for a career-related favor? (e.g. an introduction to one of their contacts, their advice on applying to an organisation, proofreading an EA Forum post)

With how many fellows from the programme do you expect to maintain contact with after the end of the fellowship?

The disadvantage of fully remote fellows extended to finding contacts whom they might ask for help with their careers, and to the number of fellows they expect to stay in contact with. Partly and fully in-person fellows had similar experiences on both fronts.

Gender

The graphs below give the gender breakdown by organization and cause area:

Women are underrepresented in ERI programs (they make up 14-30% depending on ERI).

This seems especially acute in technical AI safety. CERI seems to have done the best in recruiting women and SERI the worst. However the numbers are low and not everyone filled in the survey, so the last two should be interpreted cautiously.

Thankfully, there do not seem to be gender-based differences in enjoyment, feeling of belonging, or number of fellows they could ask for a career-related favor. The latter two are graphed below:

Teams

A few fellows worked in teams:

Quite a few would like to: (the colors are by answer to whether or not the fellow’s project was a team project)

The exact questions asked in the above graph were:

How do you feel about the effect of working individually vs in a team on your [research output / enjoyment of the program]? (compare with what you think the counterfactual scenario where the opposite was the case held)

Many fellows think working in a team would increase their research output and enjoyment of the program. Many had mixed or no strong preferences on the matter. No one who worked on a team thought they would have been more productive or had more fun had they worked individually.

It seems very worthwhile to have more fellows working in teams.

Organizational problems

One way ERIs could fail is if fellows had to spend hours sorting out organizational messes because their ERI was in some way incompetent at operations. To see if this was the case, we asked fellows:

How many hours do you estimate you lost to organisational problems (i.e. instances where your work was affected because of issues coming from the organisation, rather than your mentor / being stuck yourself on ideas / etc.)?

The results are plotted here (note the log scale):

We see that there are a few extreme outliers. One CERI fellow claimed they had lost 200 hours to such organisational problems (interestingly, though they thought their research skill gain was ¹⁄₁₀, they rated their overall enjoyment of the program as ⁷⁄₁₀). The three CHERI fellows who gave answers in the 30-100 hour range all also seemed to otherwise enjoy the program. It is possible that the question was not clear enough and some read it as “how many hours could you have saved if the organisation and circumstances were perfect”, or missed the specification of “organisational” as “issues coming from the organisation” (this was admittedly badly phrased).

Salary

All fellows except two were happy with the salary, and many commented that it was more than sufficient, or more than what they expected, or “outstanding”.

Fellows were paid the equivalent of $6-7k. CHERI greatly increased their initial salary plan to match SERI and CERI. CERI paid fellows much more than last year.

Planning fallacy?

The first survey asked fellows “What is the probability, as a number between 0 and 1, that you would assign to you finishing your project during the programme?”, where we defined “Finishing your project” as “[...] completing the success criteria specified at the start if you have any, and otherwise completing what feels to you like a complete piece of work that you are happy with.” The last survey asked fellows whether they had finished, and I classified each free-form text answer as either “yes”, “basically” (mostly finished but with some non-trivial dangling threads), or “no”.

Conflict of interest note

I’ve been involved with CERI since September 2021, including helping design the CERI SRF application process. However, I was not involved in day-to-day running of the SRF program.

Acknowledgements

The idea for JERIS was born during a 1-on-1 meeting with CHERI founder Naomi Nederlof during EAG London 2022.

Thanks to Tobias Häberli, Sage Andrus Bergerson, and Herbie Bradley for help with choosing questions and creating the surveys. Thanks to Tobias Häberli, Sage Andrus Bergerson, and Hannah Erlebach for organizing time in their respective programs for fellows to fill in the surveys, and suffering through my multiple requests to further prod fellows to fill in the survey.

Thanks to the 68 distinct ERI fellows who filled in at least one survey, and especially to the 20 feedback form warriors who completed all three surveys. We know you’ve had to fill in a lot of forms, and we appreciate you doing your part to appease Azagorg the Ravenous, Patron Deity of Feedback Forms.

*LONG LIVE FEEDBACK FORMS! LONG LIVE AZAGORG!*
