Center for AI Safety and Yale EA organizer
There is often a clash between “alignment” and “capabilities,” with some saying AI labs are pretending to do alignment while doing capabilities, and others saying the two are so closely tied that it’s impossible to do good alignment research without producing capability gains.
I’m not sure this discussion will be resolved anytime soon. But I think it’s often misdirected.
I think often what people are wondering is roughly “is x a good person for doing this research?” Should it count as beneficial EA-flavored research, or is it just you being an employee at a corporate AI lab? The alignment and capabilities discussions often seem secondary to this.
Instead, I think we should stick to a different notion: something is “pro-social” (not attached to the term) AI x-risk research if it’s research that (1) has a shot of reducing x-risk from AI (rather than increasing it or doing nothing) and (2) is not incentivized enough by factors external to the lab, to pro-social motivation, and to EA (for example: the market, the government, the public, social status in Silicon Valley, etc.).
Note (1) should include risks that the intervention changes timelines in some negative way, and (2) does not mean the intervention isn’t incentivized at all, just that it isn’t incentivized enough.
This is actually similar enough to the scale/tractability/neglectedness framework but it (1) incorporates downside risk and (2) doesn’t run into the problem of having EAs want to do things “nobody else is doing” (including other EAs). EAs should simply do things that are underincentivized and good.
So, instead of asking things like, “is OpenAI’s alignment research real alignment?” ask “how likely is it to reduce x-risk?” and “is it incentivized enough by external factors?” That should be how we assess whether to praise the people there or tell people they should go work there.
Note: edited “external to EA” to “external to pro-social motivation and to EA”
I expect that if plant-based alternatives ever were to become as available, tasty, and cheap as animal products, a large proportion of people and likely nearly all EAs would become vegan. Cultural effects do matter, but in the end I expect them to be mostly downstream of technology in this particular case. Moral appeals have unfortunately had limited success on this issue.
Thanks for sharing your perspective, it’s useful to hear!
I don’t think the orthogonality thesis is correct in practice, and moral antirealism certainly isn’t an agreed-upon position among moral philosophers, but I agree that point 17 seems far-fetched.
This post lines up with my outsider perspective on FHI, and it seems to be quite measured. I encourage anyone who thinks that Bostrom is really the best leader for FHI to defend that view here (anonymously, if necessary).
We should also celebrate the politicians and civil servants at the European Commission and EU Food Agency for doing the right thing. Regardless of who may have talked to them, it was ultimately up to them, and so far they’ve made the right choices.
A suggestion I’m throwing out just for consideration: maybe create a specific section on the frontpage for statements from organizations. I don’t think there are that many organizations that want to make statements on the EA forum, but they usually seem pretty worth reading for people here. (Often: something bad happened, and here’s our official stance/explanation).
A downside could be that this means organizations can be more visible than individuals about community matters. That seems possibly bad (though also how it usually works in the broader world). But it seems worse for the forum moderators to arbitrarily decide if something is important enough to be displayed somehow.
[MLSN #8]: Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming
Agreed. My sense is that much of the discomfort comes from people’s tendency to want their career paths validated by a central authority. But that isn’t the point of 80k. The point of 80k is to direct people towards whatever they think is most impactful. Currently that appears to be mostly x-risk.
If you meet some of the people at places like 80k and so forth, I think it’s easier to realize that they are just people who have opinions and failings like anyone else. They put a lot of work into making career advising materials, and they might put out materials that say that what you are doing is “suboptimal.” If they are right and what you’re doing really is clearly suboptimal, then maybe you should feel bad (or not; it depends on how much you want to feel bad about not maximizing your altruistic impact). But maybe 80k is wrong! If so, you shouldn’t feel bad just because some people who happen to work at 80k made the wrong recommendation.
Yes, I think it’s impossible not to have norms about personal relationships (or really, anything socially important). I should perhaps have provided an example of this. Here is one:
If you move to a new place with a lot of EAs, you will likely at some point be asked if you want to live in a large group house with other EAs. These group houses are a norm, and a lot of people live in them. This is not a norm outside of EA (though it is in maybe some other communities), so it’s certainly a positive norm that has been created.
Even if EAs tended to live overwhelmingly in smaller houses, or lived with people who weren’t EAs, then that would just be another norm. So I really don’t think there is a way to escape norms.
I appreciate this post. But one mistake this post makes, which I think is an extremely common mistake, is assuming that there can exist a community without (soft) norms.
Every community has norms. It is impossible to get out of having norms. And so I don’t think we should be averse to trying to consciously choose them.
For example, in American society it is a norm to eat meat. Sometimes this is in fact because people actively are trying to get you to eat meat. But mostly, nobody is telling other people what to eat -- people are allowed to exercise their free choice (though animals aren’t). But this norm, while freeing for some, is constricting for others. If I go to a restaurant in many places, there won’t be much good vegetarian food. In some places, there is a norm to have vegetarian food. But there is no place where there is no norm: in some places, there is a norm to have it, and in others, there is a norm not to have it. The norms can be stronger or weaker but there is no place without norms.
Currently, there is the non-coercive but “soft” norm in EA that young people interested in AI safety research will go to Berkeley. The post you link is an example of that. People are being actively encouraged to go to Berkeley. They are being paid specifically to go to Berkeley in some cases. For the reasons you give, this could potentially be really good, but the comments on that post also give reasons why it might not be!
You gave the following reason why norms are often not so good:
they prevent people from doing harmless things that they want to do.
This is true. But one could just as easily say of other norms:
they encourage people to do slightly harmful things they wouldn’t otherwise want to do.
they fail to discourage people from doing harmful things that they want to do.
The “default norm” is what the community happened to settle on. But it is a norm as much as any other. And it isn’t necessarily the best one.
I certainly didn’t mean to imply that if you don’t have one of those bullet points, you are going to be “blacklisted” or negatively affected as a result of speaking your mind. They just seemed like contributing factors for me, based on my experience. And yeah, I agree different people evaluate differently.
Thanks for sharing your perspective.
There is this, but I agree it would be good if there were one that was substantially more detailed in describing the process.
(You are probably getting downvotes because you brought up polyamory without being very specific about describing exactly how you think it relates to why Open Phil should have a public COI policy. People are sensitive about the topic, because it personally relates to them and is sometimes conflated with things it shouldn’t be conflated with. Regardless, it doesn’t seem relevant to your actual point, which is just that there should be a public document.)
I have not (yet) known myself to ever be negatively affected for speaking my mind in EA. However, I know others who have. Some possible reasons for the difference:
My fundamental ethical beliefs are pretty similar to the most senior people.
On the EA Forum, I make an almost extreme effort to make tight claims and avoid overclaiming (though I don’t always succeed). If I have vibes-based criticisms (I have plenty), I tend to keep them to people I trust.
I “know my audience”: I am good at determining how to say things such that they won’t be received poorly. This doesn’t mean “rhetoric”; it means being aware of the most common ways my audience might misinterpret my words or the intent behind them, and making a conscious effort to clearly avoid those misinterpretations.
Related to the above, I tend to “listen before I speak” in new environments. I avoid making sweeping claims before I know my audience and understand their perspective inside and out.
I’m a techy white man working in AI safety and I’m not a leftist, so I’m less likely to be typed by people as an “outsider.” I suspect this is mostly subconscious, except for the leftist part, where I think there are some community members who will consciously think you are harmful to the epistemic environment if they think you’re a leftist and don’t know much else about you. Sometimes this is in a fair way, and sometimes it’s not.
I’m very junior, but in comparison to even more junior people I have more “f*** you social capital” and “f*** you concrete achievements you cannot ignore”.
To be very clear: I am not saying “this can never be changed.” I am saying that it would require changing the EA social scene—that is, to somehow decentralize it. I am not sure how to do that well (rather than doing it poorly, or doing it in name only). But I increasingly believe it is likely to be necessary.
The problem goes beyond guardrails. Any attempts to reduce these conflicts of interest would have to contend with the extremely insular social scene in Berkeley. Since grantmakers frequently do not interact with many people outside of EA, and everyone in EA might end up applying for a grant from Open Phil, guardrails would significantly disrupt the social lives of grantmakers.
Let’s not forget that you can improperly favor not just romantic partners, but also friends. The idea of stopping Open Phil from making grants to organizations where employees are close friends with (other) grantmakers is almost laughable because of how insular the social scene is, but that’s not at all normal for a grantmaking organization.
Even if Open Phil grantmakers separated themselves from the rest of the community, anyone who ever wanted to potentially become a grantmaker would have to do so as well because the community is so small. What if you become a grantmaker and your friend or romantic partner ends up applying for a grant?
In addition, many grants are socially evaluated at least partially, in my experience. Grantmakers have sometimes asked me what I think of people applying for grants, for example. This process will obviously favor friends of friends.
As such, the only way to fully remove conflicts of interest is likely to entail significant disruptions to the entire EA social scene (the one that involves everyone living/partying/working with the same very small group of people). I think that would be warranted, but that’s another post and I recognize I haven’t justified it fully here.
These dynamics are one reason (certainly not the only one) why I turned down an offer to be a part-time grantmaker, choose not to live in Berkeley, and generally avoid dating within EA. Even if I cannot unilaterally remove these problems, I can avoid being part of them.
I’ve never felt comfortable in EA broadly construed, not since I encountered it about three years ago. And yet I continue to be involved to a certain extent. Why? Because I think that doing so is useful for doing good, and many of the issues that EA focuses on are sadly still far too neglected elsewhere. Many of the people who come closest to sharing my values are in EA, so even if I didn’t want to be “in EA,” it would be pretty difficult to remove myself entirely.
I also love my university EA group, which is (intentionally, in part by my design, in part by the design of others) different from many other groups I’ve encountered.
I work in AI safety, and so the benefit of staying plugged into EA for me is probably higher than it would be for somebody who wants to work in global health and development. But I could still be making a (potentially massive) miscalculation.
If you think that EA is not serving your aims of doing good (the whole point of EA), then remember to look out the window. And even if you run an “EA” group, you don’t need to feel tied to the brand. Do what you think will actually be good for the world. Best of luck.
“Living expenses while doing some of my early research” is one of the main purposes of the LTFF; to me Atlas feels like a roundabout way of getting that. LTFF asks you to have a specific high-impact project or educational opportunity for you to pursue, but as far as I know that wasn’t true of Atlas.
I think The Century Fellowship would make a better comparison to the Thiel Fellowship than Atlas would. It seems aimed at similar types to the Thiel Fellowship (college-age people who are prepared to start projects and need to be financially independent to do so), while Atlas targets a slightly younger demographic and gives scholarships.
Atlas is posed as a talent search and development program, so I think any evaluation of Atlas should focus on how well it is searching for and developing talent that would not otherwise exist. I personally don’t know anything about how that has been turning out, or what the graduates have done/are doing with the money, so I don’t feel very qualified to evaluate it myself.
(More of a meta point somewhat responding to some other comments.)
It currently seems unlikely there will be a unified AI risk public communication strategy. AI risk is an issue that affects everyone, and many people are going to weigh in on it. That includes both people who are regulars on this forum and people who have never heard of it.
I imagine many people will not be moved by Yudkowsky’s op-ed, and others will be. People who think AI x-risk is an important issue but who still disagree with Yudkowsky will have their own public writing that may be partially contradictory. Of course people should continue to talk to each other about their views, in public and in private, but I don’t expect that to produce “message discipline” (nor should it).
The number of people concerned about AI x-risk is going to get large enough (and arguably already is) that credibility will become highly unevenly distributed among those concerned about AI risk. Some people may think that Yudkowsky lacks credibility, or that his op-ed damages it, but that needn’t damage the credibility of everyone who is concerned about the risks. Back when there were only a few major news articles on the subject, that might have been more true, but it’s not anymore. Now everyone from Geoffrey Hinton to Gary Marcus (somehow) to Elon Musk to Yuval Noah Harari is talking about the risks. While it’s possible everyone could be lumped together as “the AI x-risk people,” at this point, I think that’s a diminishing possibility.