AI governance/grantmaking. Formerly at the Center for AI Safety and Yale EA organizer.
ThomasW
There are definitely a lot of selection effects prior to us making our selection. I think what we are trying to say is that our selections based on interview scores were not very helpful. Perhaps they would be helpful if our system worked very differently (for instance, if we just interviewed anyone who put down their email). But it seems like with the selection effects we had (applicants have to make an effort to fill out the application, do a small amount of background reading, and schedule and show up to an interview), we arrived at a place where our interview scoring system didn’t do a good job of further narrowing down the applicants.
We definitely do not mean to say that other groups definitively shouldn’t be selective, or even shouldn’t be selective using our criteria. We just don’t have the evidence to suggest that our criteria were particularly helpful in our case, so we can’t really recommend it for others.
Broadly, I agree with your points. You’re right that we don’t care about the relationship in the subpopulation, but rather about the relationship in the broader population. However, there are a couple of things I think are important to note here:
As mentioned in my response on range restrictions, in some cases we did not reject many people at all. In those cases, our subpopulation was almost the entire population. This is not the case for the NBA or GRE examples.
Lastly, and possibly more importantly: we only know of maybe 3 cases of people being rejected from the fellowship but becoming involved in the group in any way at all. All of these were people who were rejected and later reapplied and completed the fellowship. We suspect this is both because the fellowship itself causes people to become engaged and because people who are rejected may be less likely to want to get involved. As a result, it wouldn’t really make sense to try to measure engagement in this group.
In general, we believe that in order to use a selection method based on subjective interview rankings—which are very time-consuming and open us up to the possibility of implicit bias—we need to have some degree of evidence that our selection method actually works. After two years, we have found none using the best available data.
That being said—this fall, we ended up admitting everyone who we interviewed. Once we know more about how engaged these fellows end up being, we can follow up with an analysis that is truly of the entire population.
I agree, I do not think I would say that “we have evidence that there is not a strong relation”. But I do feel comfortable saying that we do not have evidence that there is any relation at all.
The confidence intervals are extremely wide, given our small sample sizes (95% interval first, then 75%):
Spring 2019: −0.75 to 0.5 and −0.55 to 0.16
Fall 2019: −0.37 to 0.69 and −0.19 to 0.43
Spring 2020: −0.67 to 0.66 and −0.37 to 0.37
Summer 2020: −0.60 to 0.51 and −0.38 to 0.26
The upper ends are very high, and there is certainly a possibility that our interview scoring process is actually good. But, of the observed effects, two are negative, and two are positive. The highest positive observed correlation is only 0.10.
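To give a sense of why the intervals come out this wide, here is a minimal sketch of a Fisher z-transform confidence interval for a correlation. (This is illustrative only, not the exact method behind the numbers above; the cohort size of 15 and r = 0.10 are hypothetical values chosen to mirror our situation.)

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate (lo, hi) CI for a Pearson correlation via the
    Fisher z-transform: z = atanh(r), with standard error 1/sqrt(n-3)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical cohort: 15 fellows, observed correlation of 0.10.
lo, hi = correlation_ci(0.10, 15)
# The interval spans roughly -0.43 to 0.58: consistent with anything
# from a moderate negative to a strong positive relationship.
```

With cohorts of this size, even an observed correlation of 0.10 is compatible with a wide range of true relationships, which is why we don’t read much into the point estimates.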
To somebody who has never been to San Francisco in the summer, it seems reasonable to expect it to rain. It’s cloudy, it’s dark, and it’s humid. You might even bring an umbrella! But after four days, you’ve noticed that it hasn’t rained on any of them, despite continuing to be gloomy. You also notice that almost nobody else is carrying an umbrella; many of those who are carrying one are only doing so because you told them you were! In this situation, it seems unlikely that you would need to see historical weather charts to conclude that the cloudy weather probably doesn’t imply what you thought it did.
This is analogous to our situation. We thought our interview scores would be helpful. But it’s been several years, and we haven’t seen any evidence that they have been. It’s costly to use this process, and we would like to see some benefit if we are going to use it. We have not seen that benefit in any of our four cohorts. So, it makes sense to leave the umbrella at home, for now.
Hi Mauricio! More details are in the post linked at the top: https://forum.effectivealtruism.org/posts/N6cXCLDPKzoGiuDET/yale-ea-virtual-fellowship-retrospective-summer-2020#Selectiveness
It doesn’t seem (unlike some other places) that Redwood is directly trying to create AGI, so its value will have to come from its techniques being used by other labs. Assuming Redwood finds some promising techniques, how does Redwood plan to influence the biggest research labs that are working towards AGI? Do you hope for your techniques to be useful enough to AGI research that labs adopt them anyway? Do you want to heavily evangelize your techniques in publications/the press/etc.? Or do you expect the work of persuading the biggest players to be better done by somebody else?
This is great! Do you have a breakdown of the total number of FTEs for each focus university (rather than just those that were approved recently)? I think this would be useful for people to understand how much the groups are staffed.
I definitely don’t mean to say that classes shouldn’t have secondary sources; they should, and these sources are important (I am less excited about tertiary sources). I think a key benefit of primary sources is developing the ability to read current sources as primary sources. If you develop the skill of understanding primary sources in their historical context, you become better able to evaluate the primary sources of today. I see history as a good way to learn how to evaluate the world at present, and the world at present has more primary than secondary sources about it.
Classes are often not the most efficient way to learn things. History is certainly no different, and I think the idea of history modules sounds very interesting. That being said, I wrote this post mainly for undergrads who have to operate within the boundaries of classes to some extent.
I appreciate this point a lot! I think the counterfactual value of taking history classes is pretty hard to generalize across university students because everyone has different tradeoffs. Some students might get more value from taking other kinds of classes, even other kinds of “padding” classes. Good candidates might be CS, economics, philosophy, math, and maybe a writing class. My general sense is that the value of those classes is better known in EA (e.g., I see many people majoring in the first four) and probably doesn’t need an explanation. I think history might need more of an explanation, which is why I offered one here. In general, I do agree that people should be thinking about this counterfactually, but the outcome would be very dependent on the individual student.
I’ve always thought the name “fellowship” was misleading. But that seems like an argument to change the name, not really to pay people.
You do lay out some plausible arguments for how paying fellows could be good. As Michael mentions in another thread, Penn EA paid their fellows last term. I think the most useful evidence for or against this idea would be a writeup from them about how well it worked and what kind of people it attracted.
Also, this is definitely the kind of thing that you should preclear with funders prior to trying; it is not included in CEA’s list of common expenses.
They are so commonly called fellowships that calling them something else in this post would complicate the message.
I haven’t changed my mind, and probably somebody should write a post about why we shouldn’t call them that so we can discuss this further. This unfortunately is not currently high on my priority list, but I may do it at some point.
Glad you enjoyed the post!
This idea makes sense: disruptions to one’s everyday routine are novel and so, perhaps, more likely to make one open-minded (e.g., to ~weird~ ideas like ‘maybe we should work on making sure we don’t die because of something that doesn’t exist yet’)
This definitely seems true to me, and is representative of my experience.
Anecdote about house culture as examples of non-EA GITV:
These examples are great, and I think illustrate the point really well! Common sets of unusual/quirky activities/traditions are often great for group bonding.
I think this comment is really good and that it better articulates one idea I was trying to get at with the original post. Thank you!
LOL, Simon, I love this.
In the section about why people don’t (and shouldn’t) get in the van, I forgot to think about car sickness!
Hope you’re well too, we’ll have an alumni event later this semester, you should come!
Ethics Education
Values and Reflective Processes
Over the next century, leaders will likely have to make increasingly high-stakes ethical decisions. In democratic societies, large numbers of people may play a role in making those decisions. And yet, ethics is seldom thoroughly taught in most educational curricula. While it may be covered briefly in secondary school and is covered in detail at university for those who attend and choose to study it, many accomplished people do not have even a superficial understanding of the most important ethical theories and arguments for and against them. We think that better knowledge of ethics might enable people to behave more ethically, and better understand the limitations of commonsense morality: for instance, that it typically neglects people in the far future. We’d like to see projects that aim to increase high fidelity knowledge of ethics in high-leverage ways, such as the creation of high-quality standardized curricula or promotion of existing ethics courses to large audiences online.
Some of this has been said in threads above, but I don’t think that upvotes are a very good way of knowing what the forum thinks. People are definitely not reading this whole thread and the first posts they see will likely get all of their attention.
On top of that, I do not expect forum karma to be a good indicator of much even in the best case. People tend to upvote what they can understand and what is interesting and useful to them. I suspect that what the average EA Forum user finds useful and interesting is only loosely related to what a large EA grantmaker should fund. For instance, good writing is generally a very good way to get upvotes, but it doesn’t correlate much with the strength of the ideas presented.
I like this post, thanks Thomas!
I want to make a comment for maybe newer people especially with some of the uses of the word “EA” here. I’ll take an example to illustrate: “People who are not totally dedicated to EA will...”
I actually think this means (or if it doesn’t, it should mean), “people who are not totally dedicated to impartially maximizing impact as defined under a plausible moral theory [not the point of this to debate which are plausible] will...” or something like that. In other words, “people who are not totally dedicated to the basic principles of EA”.
It doesn’t (or shouldn’t) mean “people who are not totally dedicated to the EA community” or something else that might imply only working at an EA-branded org, only having EA friends, or only working on a cause area that some proportion of EAs think is worthwhile. The EA community is probably a good way to find multipliers and a useful signal for what is valuable, but it is not the final goal at all and doesn’t have all the answers.
I could imagine some case in which it makes sense to do something “less EA” (in the sense that fewer people in EA think it’s valuable) because it’s actually “more EA” (in the sense that it’s actually more valuable for maximizing impact). The point of this example isn’t to establish how likely this is, just to point out that the final goal is maximizing impact, not EA the community, and that “more EA/less EA” is a bit ambiguous.
This might be totally obvious to most readers of this comment, but I wanted to write it anyway just in case there are people who don’t find it obvious (or it isn’t at all obvious, or not what Thomas meant).
One thing you should consider is that most of the impact is likely to be at the tails. For instance, the distribution of impact for people is probably power-law distributed (this is true in ML in terms of first author citations; I suspect it could be true for safety specifically). From your description, it seems like you might be more likely to end up in the tail of ability for quantum computing, if one of the best quantum computing startups is trying to hire you. You don’t say that some of the top AI safety orgs are trying to hire you.
Then you have to consider how useful quantum algorithms are to existential risk. Just because people don’t talk about that subject doesn’t mean it’s useless. How many quantum computing PhDs have you seen on the EA forum or met at an EA conference? You are the only one I’ve met. As somebody with unique knowledge, it’s probably worth a pretty significant chunk of time thinking about how it could possibly fit in, getting feedback on your ideas, sharing thoughts with the community, etc.
Then you have to think about how likely quantum computing is to make you really rich (probably through equity, not salary) in a period of time when it will matter (e.g., being rich in 5 years is very different from being rich in 50 years).
I think if it’s completely useless for existential risk and extremely unlikely to make you rich, it’s probably worth pivoting. But consider those questions first, before you give up the chance to be one of the (presumably) very few professional quantum computing researchers in the world.
Also, have you considered 80k advising?
If the community has so much money, and we believe this is such an important problem, why can’t we just hire/fund world experts in AI/ML to work on it?
Food for thought: LeCun and Hinton both hold academic positions in addition to their industry positions at Meta and Google, respectively. Yoshua Bengio is still in academia entirely. Do you think that tech companies haven’t tried to buy every minute of their attention? Why are the three pioneers of deep learning not all in the highest-paying industry job? Clearly, they care about something more than this.
Yes, this is definitely a concern, for some cohorts more than others. Here are the numbers of people we interviewed in each cohort:
Fall 2018: 37
Spring 2019: 22
Fall 2019: 30
Spring 2020: 22
Summer 2020: 40
So for Fall 2018 and Summer 2020, the case can be made that the range restriction effects might be high (given that we admitted only ~15 fellows). For the Spring fellowships, we admitted the majority of applicants, so there should be more differentiation in the predictor variable.
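To make the range-restriction point concrete, here is a quick simulation sketch. It is illustrative only (the true correlation of 0.5, the latent-factor setup, and the 40% admission rate are made-up numbers, loosely echoing 15 admits out of 37 interviews): selecting only the top scorers shrinks the variance of the predictor, which attenuates the correlation you can observe among admits.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)
# Interview score and later engagement share a latent factor,
# giving a true population correlation of about 0.5.
pop = []
for _ in range(100_000):
    latent = random.gauss(0, 1)
    score = latent + random.gauss(0, 1)
    engagement = latent + random.gauss(0, 1)
    pop.append((score, engagement))

full_r = pearson([s for s, _ in pop], [e for _, e in pop])

# Admit only the top ~40% by interview score.
pop.sort(key=lambda p: p[0], reverse=True)
admitted = pop[: int(0.4 * len(pop))]
restricted_r = pearson([s for s, _ in admitted], [e for _, e in admitted])
# restricted_r comes out noticeably smaller than full_r, even though
# the score genuinely predicts engagement in the full population.
```

This is why the cohorts where we admitted most applicants give us a cleaner read than the ones where we cut the pool roughly in half.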