(Posting in a personal capacity unless stated otherwise.) I help allocate Open Phil’s resources to improve the governance of AI with a focus on avoiding catastrophic outcomes. Formerly co-founder of the Cambridge Boston Alignment Initiative, which supports AI alignment/safety research and outreach programs at Harvard, MIT, and beyond, co-president of Harvard EA, Director of Governance Programs at the Harvard AI Safety Team and MIT AI Alignment, and occasional AI governance researcher. I’m also a proud GWWC pledger and vegan.
tlevin
EU policymakers reach an agreement on the AI Act
Common-sense cases where “hypothetical future people” matter
(Even) More Early-Career EAs Should Try AI Safety Technical Research
University Groups Should Do More Retreats
With the caveat that this is obviously flawed data because the sample is “people who came to an all-expenses-paid retreat,” I think it’s useful to provide some actual data Harvard EA collected at our spring retreat. I was slightly concerned that the spending would rub people the wrong way, so I included as one of our anonymous feedback questions, “How much did the spending of money at this retreat make you feel uncomfortable [on a scale of 1 to 10]?” All 18 survey respondents answered this question. Mean: 3.1. Median: 3. Mode: 1. High: 9.
I think it’s also worth noting that in response to the first question, “What did you think of the retreat overall?”, nobody mentioned money, including the person who answered 9 (who said “Excellent arrangements, well thought out, meticulous planning”). On the question “Imagine you’re on the team planning the next retreat, and it’s the first meeting. Fill in the blank: ‘One thing I think we could improve from the last retreat is ____,’” nobody volunteered spending less money; several suggestions involved adding things that would cost more money, including from the person who answered 9, who suggested adding daily rapid tests. The question “Did participating in this retreat make you feel more or less like you want to be part of the EA community?” received a mean of 8.3 and a median of 9, including a 9 from the person who felt most uncomfortable about the spending.
I concluded from this survey that, again, with the caveats for selection bias, the spending was not alienating people at the retreat, and especially not alienating enough to significantly affect their engagement with EA.
I just want to say I really like this style of non-judgmental anthropology and think it gives an accurate-in-my-experience range of what people are thinking and feeling in the Bay, for better and for worse.
Also: one thing that I sort of expected to come up and didn’t see, except indirectly in a few vignettes, is just how much of one’s life in the Bay Area rationalist/EA scene consists of work, of AI, and/or of EA. Part of this is just that I’ve only ever lived in the Bay for up to ~6 weeks at a time and was brought there by work, and if I lived there permanently I’d probably try to carve out some non-EA/AI time, but I think it’s a fairly common experience for people who move to the Bay to do AI safety-related things to find that it absorbs everything else unless they make a conscious effort not to let it. At basically all the social events I attended, >25% of the attendees worked in the same office I did, and >25% of the people at any given time were talking about AI or EA. This has not been my experience even while doing related full-time work in Boston, Oxford, and DC.
Again, part of this is that I’ve been in Berkeley for shorter stints that were more work-focused. But yeah, I think it’s not just my experience that the scene is very intense in this way, and this amplifies everything in this post in terms of how much it affects your day-to-day experience.
Fwiw, seems like the positive performance is more censored in expectation than the negative performance: while a case that CH handled poorly could either be widely discussed or never heard about again, I’m struggling to think of how we’d all hear about a case that they handled well, since part of handling it well likely involves the thing not escalating into a big deal and respecting people’s requests for anonymity and privacy.
It does seem like a big drawback that the accused don’t know the details of the accusations, but it also seems like there are obvious tradeoffs here, and it would make sense for this to be very different from the criminal justice system given the difference in punishments (loss of professional and financial opportunities and social status vs. actual prison time).
Agreed that a survey seems really good.
I agree that we’re now in a third wave, but I think this post is missing an essential aspect of the new wave, which is that EA’s reputation has taken a massive hit. EA doesn’t just have less money because of SBF; it has less trust and prestige, less optimism about becoming a mass movement (or even a mass-elite movement), and fewer potential allies because of SBF, Bostrom’s email/apology, and the Time article.
For that reason, I’d put the start of the third wave around the 10th of November 2022, when it became clear that FTX was not only experiencing a “liquidity crisis” but had misled customers, investors, and the EA community and likely committed massive fraud, and when the Future Fund team resigned. The other features of the third wave (the additional scandals and the rise in public interest in AI safety due to ChatGPT, GPT-4, the FLI letter, the CAIS statement, and so on) took a few months to emerge, but that week seems like the turning point.
We’re working on making Boston a much better hub—stay tuned!
In addition to the biosecurity hub, advantages for Boston not listed in the Boston section include immediate proximity to two of the top 2/5/5 global universities (the only place on earth where two are within a mile of each other), an advantage both for outreach/community-building and for the “culture fit” aspects discussed in this post.
It’s also nearly ideally positioned between other EA hubs and mini-hubs:
Non-horrific distance in both time zone and flight to London (5 hours apart/6.5-hour flight) and San Francisco (3 hours apart/7-hour flight). Decent flight connectivity to Central Europe as well (though NYC is better for this).
Easy train ride to NYC (on which I am typing this comment!) and quick flights to NYC/DC.
Same time zone and a 3.5-hour flight to the Bahamas.
As your fellow Cantabrigian I have some sympathies for this argument. But I’m confused about some parts of it and disagree with others:
“The EA hub should be on the East Coast” is one kind of claim. “People starting new EA projects, orgs, and events should do so on the East Coast” is a different one. They’d be giving up the very valuable benefits of living near the densest concentrations of other orgs, especially funders. You’re right that the reasons for Oxford and the Bay being the two hubs are largely historical rather than practical, but that’s the nature of Schelling points; it might have been better to have started on the East Coast (or somewhere temperate, cheap, cosmopolitan, and globally centrally located like Barcelona), but how are we going to all coordinate to move there? The options that come to mind (Open Phil, FTX, CEA, and/or others move there, or coordinate to do so together?) seem very costly — on the order of weeks or months of the entire organization’s time.
On the commonly held view that AI is by far the most important cause area, it’s fine that the Bay is an EA hub despite the tech industry being its only non-Schelling-point reason to be a hub.
For better or worse, Berkeley is also a hub for community-building now; tons of student organizers spent this summer there. Again, they go there for the recursive common-knowledge reason that other people will also be going there, so there’d have to be some (costly?) coordinated shift probably driven by a major org.
Seems slightly like cheating to count all those universities (or indeed all those cities) as part of the same hub. Oxford and London are way closer than any of Boston, DC, and NYC are to each other. It seems like a place can be a hub if it would be physically easy for any two people living in it to meet every week. Boston, NYC, and DC are not close enough to qualify. Pointing out the cause area networks that each of these cities has, and cumulatively counting them against the Bay “merely” having the AI industry, makes it seem more likely than it is that the entire East Coast could achieve the kind of Schelling status that Berkeley has. (Indeed, notably the Bay Area EA community is overwhelmingly located specifically in Berkeley, supporting the idea that physical proximity is very important.)
Generally I really like the East Coast lifestyle (insofar as it differs from the Bay’s) and am figuring out how to articulate it. Maybe it’s that people are a little more ironic. Maybe it’s that having to Uber basically everywhere in the Bay is dystopian. That being said, lots of EAs like the outdoors, and the East Coast is much worse than the Bay Area for hiking etc.
One thing that I like about Boston relative to the Bay is the relatively horizontal social/professional structure: it feels like, in the Bay, there’s a pretty clear status pyramid and a pretty clear line of who’s in the elite circle (access to the top workspaces), while it’s looser and chiller in Boston. But it seems like this results from the Bay being a major hub and Boston being less of a hub. E.g., once a certain office space opens in Cambridge, I expect some of these dynamics to reappear, and if Boston became as booming as Berkeley, I think a pyramid would likely start to become more apparent as well. (Sad.)
Suggestion for how people might go about developing this expertise from ~scratch, in a way that should be pretty adaptable to, e.g., an undergraduate or grad-level course or independent research (a much better/stronger version of things I’ve done in the past, which involved lots of talking and take-developing but not a lot of detail or publication, both of which I think are really important):
Figure out who, both within the EA world and outside it, would know at least a fair amount about this topic—maybe they’d just be able to explain why it’s useful with more context than you have, maybe they know what papers you should read or acronyms you should familiarize yourself with—and talk to them, roughly in increasing order of scariness/value of their time, such that you’ve at least had a few conversations by the time you’re talking to the scariest/highest-time-value people. Maybe this is like a list of 5-10 people?
During these conversations, take note of what’s confusing you, ideas that you have, connections you or your interlocutors draw between topics, takes you find yourself repeating, etc.; you’re on the hunt for a first project.
Use the “learning by writing” method and just try to write “what you think should happen” in this area, as in, a specific person (maybe a government agency, maybe a funder in EA) should take a specific action, with as much detail as you can, noting a bunch of ways it could go wrong and how you propose to overcome these obstacles.
Treat this proposal as a hypothesis that you then test (meaning you have some sense of what could convince you it’s wrong): seek out tests for it, e.g. by talking to more experts about it (or asking them to read your draft and give feedback) and finding academic or non-academic literature that bears on the important cruxes, and revise your proposal (including scrapping it) as the evidence implies.
Try to publish something from this exercise—maybe it’s the proposal, maybe it’s “hey, it turns out lots of proposals in this domain hinge on this empirical question,” maybe it’s “here’s why I now think [topic] is a dead end.” This gathers more feedback and importantly circulates the information that you’ve thought about it a nonzero amount.
Curious what other approaches people recommend!
Hmm, interesting. My first draft said “under 1,000” and I got lots of feedback that this was way too high. Taking a look at your count, I think many of these numbers are way too high. For example:
FHI AIS is listed at 34, when the entire FHI staff by my count is 59 and includes lots of philosophers and biosecurity people, while the actual AI safety research group is 4, and that’s counting GovAI (where I work this summer [though my opinions are of course my own]), which is definitely not AI safety technical research.
MIRI is listed at 40, when their “research staff” page has 9 people.
CSET is listed at 5.8. Who at CSET does alignment technical research? CSET is a national security think-tank that focuses on AI risks, but is not explicitly longtermist, let alone a hub for technical alignment research!
CHAI is listed at 41, but their entire staff is 24, including visiting fellows and assistants.
Should I be persuaded by the Google Scholar label “AI Safety”? What percentage of their time do the listed researchers spend on alignment research, on average?
Agree with Xavier’s comment that people should consider reversing the advice, but generally confused/worried that this post is getting downvoted (13 karma on 18 votes as of this writing). In general, I want the forum to be a place where bold, truth-seeking claims about how to do more good get attention. My model of people downvoting this is that they are worried that this will make people work harder despite this being suboptimal. I think that people can make these evaluations well for themselves, and that it’s good to present people with information and arguments that might change their mind. Just as “donate lots of your money to global charities” and “stop buying animal products” are unwelcome to hear and might be bad for you if you take them to extremes in your context but are probably good for the world, “consider working more hours” could be bad in some cases but also might help people learn faster and become excellent at impactful work, and we should at least be comfortable debating whether we’re at the right point on the curve.
Setting aside the questions of the impacts of working at these companies, it seems to me like this post prioritizes the warmth and collegiality of the EA community over the effects that our actions could have on the entire rest of the planet in a way that makes me feel pretty nervous. If we’re trying in good faith to do the most good, and someone takes a job we think is harmful, it seems like the question should be “how can I express my beliefs in a way that is likely to be heard, to find truth, and not to alienate the person?” rather than “is it polite to express these beliefs at all?” It seems like at least the first two reasons listed would also imply that we shouldn’t criticize people in really obviously harmful jobs like cigarette advertising.
It also seems quite dangerous to avoid passing judgment on individuals within the EA community based on our impressions of their work, which, unless I’m missing something, is what this post implies we should do. Saying we should “be kind and cooperative toward everyone who is trying in good faith to reduce AI risk” kind of misses the point, because a lot of the evidence for them “trying in good faith” comes from our observations of their actions. And, if it seems to me that someone’s actions make the world worse, the obvious next step is “see what happens if they’re presented with an argument that their actions are making the world worse.” If they have responses that make sense to me, they’re more likely to be acting in good faith. If they don’t, this is a significant red flag that they’re not trustworthy, regardless of their inner motivations: either factors besides the social impact of their actions are dominating in a way that makes it hard to trust them, or their judgment is bad in a way that makes it hard to trust them. I don’t get this information just by asking them open-ended questions; I get it by telling them what I think, in a polite and safe-feeling way.
I think the norms proposed in this post result in people not passing judgment on the individuals working at FTX, which in turn leads to trusting these individuals and trusting the institution that they run. (Indeed, I’m confused at the post’s separation between criticizing the decisions/strategies made by institutions and those made by the individuals who make the decisions and choose to further the strategies.) If people had suspicions that FTX was committing fraud or otherwise acting unethically, confronting individuals at FTX with these suspicions—and forming judgments of the individuals and of FTX—could have been incredibly valuable.
Weaving these points together: if you think leading AGI labs are acting recklessly, telling this to individuals who work at these labs (in a socially competent way) and critically evaluating their responses seems like a very important thing to do. Preserving a norm of non-criticism also denies these people the information that (1) you think their actions are net-negative and (2) you and others might be forming judgments of them in light of this. If they are acting in good faith, it seems extremely important that they have this information—worth the risk of an awkward conversation or hurt feelings, both of which are mitigable with social skills.
Notes on nukes, IR, and AI from “Arsenals of Folly” (and other books)
This comment co-written with Jake McKinnon:
The post seems obviously true when the lifeguards are the general experts and authorities, who just tend not to see or care about the drowning children at all. It’s more ambiguous when the lifeguards are highly-regarded EAs.
It’s super important to try to get EAs to be more agentic and skeptical that more established people “have things under control.” In my model, the median EA is probably too deferential and should be nudged in the direction of “go save the children even though the lifeguards are ignoring them.” People need to be building their own models (even if they start by copying someone else’s model, which is better than copying their outputs!) so they can identify the cases where the lifeguards are messing up.
However, sometimes the lifeguards aren’t saving the children because the water is full of alligators or something. Like, lots of the initial ideas that very new EAs have about how to save the child in fact reflect ignorance about the nature of the problem (a common one is a version of “let’s just build the aligned AI first”). If people overcorrect to “the lifeguards aren’t doing anything,” then when the lifeguards tell them why their idea is dangerous, they’ll ignore them.
The synthesis here is something like: it’s very important that you understand why the lifeguards aren’t saving the children. Sometimes it’s because they’re missing key information, not personally well-suited to the task, exhausted from saving other children, or making a prioritization/judgment error in an area where you have some reason to think your judgment is better. But sometimes it’s the alligators! Most ideas for solving problems are bad, so your prior should be that if you have an idea, and it’s not being tried, probably the idea is bad; if you have inside-view reasons to think that it’s good, you should talk to the lifeguards to see if they’ve already considered it or think you will do harm.
Finally, it’s worth noting that even when the lifeguards are competent and correctly prioritizing, sometimes the job is just too hard for them to succeed with their current capabilities. Lots of top EAs are already working on AI alignment in not-obviously-misguided ways, but it turns out that it’s a very very very hard problem, and we need more great lifeguards! (This is not saying that you need to go to “lifeguard school,” i.e. getting the standard credentials and experiences before you start actually helping, but probably the way to start helping the lifeguards involves learning what the lifeguards think by reading them or talking to them so you can better understand how to help.)
The posts linked in support of “prominent longtermists have declared the view that longtermism basically boils down to x-risk” do not actually advocate this view. In fact, they argue that longtermism is unnecessary in order to justify worrying about x-risk, which is evidence for the proposition you’re arguing against, i.e. you cannot conclude someone is a longtermist because they’re worried about x-risk.
Possibly important or useful points:
In my version, the introducer of Doom Circles said something like: “You might do this with some people you know well and some people you don’t know well. The people you don’t know well will offer first or second impressions, and this can be useful too. But you are not obligated to accept any of the Dooms. In my experience, the people you know will have a “hit rate” of around 60% and the people you don’t will be around 30%.” It seems important that you should expect some of the Doom to just not feel particularly accurate, or to result from the Doom-Sayer’s own emotional state or weird takes or something. You’re there to gather information about what the particular people in the room would say under these exact conditions, not to discover the absolute truth about what everyone thinks of you.
In my case, the people I didn’t know that well said some things that I attributed to them just not knowing my field and disregarded. But their first impressions also included surprising and overlapping negative aspects that were really helpful, that hadn’t been on my radar, and that I then put effort into fixing. I think I benefited a lot from this.
Sometimes, it will be really obvious to me that someone has a much better path to impact than the path they seem to be on, or that they’re equivocating between a few paths without choosing the one that’s their clear (to me) comparative advantage, but there is no social circumstance where it would be polite or normal to say this kind of thing to them. This is the kind of problem that Doom Circles are, in part, meant to solve. But while you could in theory frame “you’re not doing the clearly best thing for you” as a Doom, somehow Doom Circles don’t tend to produce this kind of thing. I would be interested in supplementing (or sometimes replacing) Doom Circles with “Victory Circles,” where everyone goes in a circle and says “This is the path by which I think, if in 30 years [X] has accomplished all their goals, they will have done it.”
In the settings where Doom Circles have been available to me, they were very much an opt-in process. To join, one would have to leave the default activity to go do it, and then organizers made efforts (I think successfully) to enable people to frictionlessly leave after hearing the description if they decided not to do it. I don’t think this is a foolproof way to remove all social pressure and agree that it sounds (partly because of the name) somewhat culty, but I think it’s not nearly as bad on these metrics as some of the other commenters say.
Thanks for writing this up!
I hope to write a post about this at some point, but since you raise some of these arguments, I think the most important cruxes for a pause are:
It seems like in many people’s models, the reason the “snap back” is problematic is that the productivity of safety research is much higher when capabilities are close to the danger zone, both because the AIs that we’re using to do safety research are better and because the AIs that we’re doing the safety research on are more similar to the ones in the danger zone. If the “snap back” reduces the amount of calendar time during which we think AI safety research will be most productive in exchange for giving us more time overall, this could easily be net negative. On the other hand, a pause might just “snap back” to somewhere on the capabilities graph that’s still outside the danger zone, and lower than it would’ve been without the pause for the reasons you describe.
A huge empirical uncertainty I have is: how elastic is the long-term supply curve of compute? If, on one extreme end, the production of computing hardware for the next 20 years is set in stone, then at the end of the pause there would be a huge jump in how much compute a developer could use to train a model, which seems pretty likely to produce a destabilizing/costly jump. At the other end, if compute supply were very responsive to expected AI progress and a pause would mean a big cut to e.g. Nvidia’s R&D budget and TSMC shelving plans for a leading-node fab or two as a result, the jump would be much less worrying in expectation. I’ve heard that the industry plans pretty far in advance because of how much time and money it takes to build a fab (and how much coordination is required between the different parts of the supply chain), but it seems like at this point a lot of the future expected revenue to be won from designing the next generations of GPUs comes from their usefulness for training huge AI systems, so it seems like there should at least be some marginal reduction in long-term capacity if there were a big regulatory response.
I’ve seen the time-money tradeoff reach some pretty extreme, scope-insensitive conclusions. People correctly recognize that it’s not worth 30 minutes of time at a multi-organizer meeting to try to shave $10 off a food order, but they extrapolate this to it not being worth a few hours of solo organizer time to save thousands of dollars. I think people should probably adopt some kind of heuristic about how many EA dollars their EA time is worth and stick to it, even when it produces the unpleasant/unflattering conclusion that you should spend time to save money.
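To make the scope sensitivity concrete with purely illustrative numbers (my own assumptions, not figures from any actual budget): five organizers spending 30 minutes to shave $10 off a food order is 2.5 person-hours for $10, or about $4 saved per hour of organizer time, while three hours of one organizer’s time to save $3,000 is $1,000 saved per hour. If your heuristic values organizer time at, say, $100 per hour, the first is clearly not worth doing and the second clearly is.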
Also want to highlight “For example, we should avoid the framing of ‘people with money want to pay for you to do X’ and replace this with an explanation of why X matters a lot and why we don’t want anyone to be deterred from doing X if the costs are prohibitive” as what I think is the most clearly correct and actionable suggestion here.