AI safety governance/strategy research & field-building.
Formerly a PhD student in clinical psychology @ UPenn, college student at Harvard, and summer research fellow at the Happier Lives Institute.
I would be interested in seeing your takes about why building runway might be more cost-effective than donating.
Separately, if you decide not to go with 10% because you want to think about what is actually best for you, I suggest you give yourself a deadline. Like, suppose you currently think that donating 10% would be better than status quo. I suggest doing something like “if I have not figured out a better solution by Jan 1 2024, I will just do the community-endorsed default of 10%.”
I think this protects against some sort of indefinite procrastination. (Obviously less relevant if you never indefinitely procrastinate on things like this, but my sense is that most people do at least sometimes).
I think it’s good for proponents of RSPs to be open about the sorts of topics I’ve written about above, so they don’t get confused with e.g. proposing RSPs as a superior alternative to regulation. This post attempts to do that on my part. And to be explicit: I think regulation will be necessary to contain AI risks (RSPs alone are not enough), and should almost certainly end up stricter than what companies impose on themselves.
Strong agree. I wish ARC and Anthropic had been more clear about this, and I would be less critical of their RSP posts if they were upfront and clear about this stance. I think your post is strong and clear (you state multiple times, unambiguously, that you think regulation is necessary and that you wish the world had more political will to regulate). I appreciate this, and I’m glad you wrote this post.
I think it’d be unfortunate to try to manage the above risk by resisting attempts to build consensus around conditional pauses, if one does in fact think conditional pauses are better than the status quo. Actively fighting improvements on the status quo because they might be confused for sufficient progress feels icky to me in a way that’s hard to articulate.
A few thoughts:
One reason I’m critical of the Anthropic RSP is that it does not make it clear under what conditions it would actually pause, or for how long, or under what safeguards it would determine it’s OK to keep going. It is nice that they said they would run some evals at least once every 4x increase in effective compute, and that they don’t want to train catastrophe-capable models until their infosec makes it more expensive for actors to steal their models. It is nice that they said that once they get systems that are capable of producing biological weapons, they will at least write something up about what to do with AGI before they decide to just go ahead and scale to AGI. But I mostly look at the RSP and say “wow, these are some of the most bare-minimum commitments I could’ve expected, and they don’t even really tell me what a pause would look like and how they would end it.”
Meanwhile, we have OpenAI (which plans to release an RSP at some point), DeepMind (rumor has it they’re working on one, but also that it might be very hard to get Google to endorse one), and Meta (oof). So I guess I’m sort of left thinking something like “If Anthropic’s RSP is the best RSP we’re going to get, then yikes, this RSP plan is not doing so well.” Of course, this is just a first version, but the substance of the RSP and the way it was communicated don’t inspire much hope in me that future versions will be better.
I think the RSP frame is wrong, and I don’t want regulators to use it as a building block. My understanding is that labs are refusing to adopt an evals regime in which the burden of proof is on labs to show that scaling is safe. Given this lack of buy-in, the RSP folks concluded that the only thing left to do was to say “OK, fine, but at least please check to see if the system will imminently kill you. And if we find proof that the system is pretty clearly dangerous or about to be dangerous, then will you at least consider stopping?” It seems plausible to me that governments would be willing to start with something stricter and more sensible than this “just keep going until we can prove that the model has highly dangerous capabilities” regime.
I think some improvements on the status quo can be net negative because they either (a) cement in an incorrect frame or (b) take a limited window of political will/attention and steer it toward something weaker than what would’ve happened if people had pushed for something stronger. For example, I think the UK government is currently looking around for substantive stuff to show their constituents (and themselves) that they are doing something serious about AI. If companies give them a milquetoast solution that allows them to say “look, we did the responsible thing!”, it seems quite plausible to me that we actually end up in a worse world than if the AIS community had rallied behind something stronger.
If everyone communicating about RSPs was clear that they don’t want RSPs to be seen as sufficient, that would be great. In practice, that’s not what I see happening. Anthropic’s RSP largely seems devoted to signaling that Anthropic is great, safe, credible, and trustworthy. Paul’s recent post is nuanced, but I don’t think the “RSPs are not sufficient” frame was sufficiently emphasized (perhaps partly because he thinks RSPs could lead to a 10x reduction in risk, which seems crazy to me, and if he goes around saying that to policymakers, I expect them to hear something like “this is a good plan that would sufficiently reduce risks”). ARC’s post tries to sell RSPs as a pragmatic middle ground and IMO pretty clearly does not emphasize (or even mention?) some sort of “these are not sufficient” message. Finally, the name itself sounds like it came out of a propaganda department: “hey, governments, look, we can scale responsibly.”
At minimum, I hope that RSPs get renamed, and that those communicating about RSPs are more careful to avoid giving off the impression that RSPs are sufficient.
More ambitiously, I hope that folks working on RSPs seriously consider whether or not this is the best thing to be working on or advocating for. My impression is that this plan made more sense when it was less clear that the Overton Window was going to blow open, Bengio/Hinton would enter the fray, journalists and the public would be fairly sympathetic, Rishi Sunak would host an xrisk summit, Blumenthal would run hearings about xrisk, etc. I think everyone working on RSPs should spend at least a few hours taking seriously the possibility that the AIS community could be advocating for stronger policy proposals and getting out of the “we can’t do anything until we literally have proof that the model is imminently dangerous” frame. To be clear, I think some people who do this reflection will conclude that they ought to keep making marginal progress on RSPs. I would be surprised if the current allocation of community talent/resources was correct, though, and I think on the margin more people should be doing things like CAIP & Conjecture, and fewer people should be doing things like RSPs. (Note that CAIP & Conjecture both have important flaws/limitations, and I think this partly has to do with the fact that so much top community talent has been funneled into RSPs/labs relative to advocacy/outreach/outside game).
Excited to see this team expand! A few [optional] questions:
What do you think were some of your best and worst grants in the last 6 months?
What are your views on the value of “prosaic alignment” relative to “non-prosaic alignment?” To what extent do you think the most valuable technical research will look fairly similar to “standard ML research”, “pure theory research”, or other kinds of research?
What kinds of technical research proposals do you think are most difficult to evaluate, and why?
What are your favorite examples of technical alignment research from the past 6-12 months?
What, if anything, do you think you’ve learned in the last year? What advice would you have for a Young Ajeya who was about to start in your role?
When should someone who cares a lot about GCRs decide not to work at OP?
I agree that there are several advantages of working at Open Phil, but I also think there are some good answers to “why wouldn’t someone want to work at OP?”
Culture, worldview, and relationship with labs
Many people have an (IMO fairly accurate) impression that OpenPhil is conservative, biased toward inaction, generally prefers maintaining the status quo, and is generally in favor of maintaining positive relationships with labs.
As I’ve gotten more involved in AI policy, I’ve updated more strongly toward this position. While simple statements always involve a bit of gloss/imprecision, I think characterizations like “OpenPhil has taken a bet on the scaling labs”, “OpenPhil is concerned about disrupting relationships with labs”, and even “OpenPhil sometimes uses its influence to put pressure on orgs to not do things that would disrupt the status quo” are fairly accurate.
The most extreme version of this critique is that perhaps OpenPhil has been net negative through its explicit funding for labs and implicit contributions to a culture that funnels money and talent toward labs and other organizations that entrench a lab-friendly status quo.
This might change as OpenPhil hires new people and plans to spend more money, but by default, I expect that OpenPhil will continue to play the “be nice to labs / don’t disrupt the status quo” role in the space. (In contrast to organizations like MIRI, Conjecture, FLI, the Center for AI Policy, and perhaps CAIS).
Lots of people want to work there; replaceability
Given OP’s high status, lots of folks want to work there. Some people think the difference between the “best applicant” and the “2nd best applicant” is often pretty large, but this certainly doesn’t seem true in all cases.
I think if someone e.g. had an opportunity to work at OP vs. start their own organization or do something that requires more agency/entrepreneurship, there might be a strong case for them to do the latter, since it’s much less likely to happen by default.
What does the world need?
I think this is somewhat related to the first point, but I’ll flesh it out in a different way.
Some people think that we need more “rowing”: like, OP’s impact is clearly good, and if we just add some more capacity to the grantmakers and make more grants that look pretty similar to previous grants, we’re pushing the world in a considerably better direction.
Some people think that the default trajectory is not going so well, and that this is (partially or largely) caused or maintained by the OP ecosystem. Under this worldview, one might think that adding some additional capacity to OP is not actually all that helpful in expectation.
Instead, people with this worldview believe that projects that aim to (for example) advocate for strong regulations, engage with the media, make the public more aware about AI risk, and do other forms of direct work more focused on folks outside of the core EA community might be more impactful.
Of course, part of this depends on how open OP will be to people “steering” from within. My expectation is that it would be pretty hard to steer OP from within (my impression is that lots of smart people have tried, and folks like Ajeya and Luke have clearly been thinking about things for a long time, and the culture has already been shaped by many core EAs, and there’s a lot of inertia, so a random new junior person is pretty unlikely to substantially shift their worldview, though I of course could be wrong).
Adding this comment over from the LessWrong version. Note Evan and others have responded to it here.
Thanks for writing this, Evan! I think it’s the clearest writeup of RSPs & their theory of change so far. However, I remain pretty disappointed in the RSP approach and the comms/advocacy around it.
I plan to write up more opinions about RSPs, but one I’ll express for now is that I’m pretty worried that the RSP dialogue is suffering from motte-and-bailey dynamics. One of my core fears is that policymakers will walk away with a misleadingly positive impression of RSPs. I’ll detail this below:
What would a good RSP look like?
Clear commitments along the lines of “we promise to run these 5 specific tests to evaluate these 10 specific dangerous capabilities.”
Clear commitments regarding what happens if the evals go off (e.g., “if a model scores above a 20 on the Hubinger Deception Screener, we will stop scaling until it has scored below a 10 on the relatively conservative Smith Deception Test.”)
Clear commitments regarding the safeguards that will be used once evals go off (e.g., “if a model scores above a 20 on the Cotra Situational Awareness Screener, we will use XYZ methods and we believe they will be successful for ABC reasons.”)
Clear evidence that these evals will exist, will likely work, and will be conservative enough to prevent catastrophe
Some way of handling race dynamics (such that Bad Guy can’t just be like “haha, cute that you guys are doing RSPs. We’re either not going to engage with your silly RSPs at all, or we’re gonna publish our own RSP but it’s gonna be super watered down and vague”).
What do RSPs actually look like right now?
Fairly vague commitments, more along the lines of “we will improve our information security and we promise to have good safety techniques. But we don’t really know what those look like.”
Unclear commitments regarding what happens if evals go off (let alone what evals will even be developed and what they’ll look like). Very much a “trust us; we promise we will be safe. For misuse, we’ll figure out some way of making sure there are no jailbreaks, even though we haven’t been able to do that before.”
Also, for accident risks/AI takeover risks… well, we’re going to call those “ASL-4 systems”. Our current plan for ASL-4 is “we don’t really know what to do… please trust us to figure it out later. Maybe we’ll figure it out in time, maybe not. But in the meantime, please let us keep scaling.”
Extremely high uncertainty about what safeguards will be sufficient. The plan essentially seems to be “as we get closer to highly dangerous systems, we will hopefully figure something out.”
No strong evidence that these evals will exist in time or work well. The science of evaluations is extremely young, and the current evals are more like “let’s play around and see what things can do” than “we have solid tests and some consensus around how to interpret them.”
No way of handling race dynamics absent government intervention. In fact, companies are allowed to break their voluntary commitments if they’re afraid that they’re going to lose the race to a less safety-conscious competitor. (This is explicitly endorsed in ARC’s post and Anthropic includes such a clause.)
Important note: I think several of these limitations are inherent to the current gameboard. Like, I’m not saying “I think it’s a bad move for Anthropic to admit that they’ll have to break their RSP if some Bad Actor is about to cause a catastrophe.” That seems like the right call. I’m also not saying that dangerous capability evals are bad—I think it’s a good bet for some people to be developing them.
Why I’m disappointed with current comms around RSPs
Instead, my central disappointment comes from how RSPs are being communicated. It seems to me like the main three RSP posts (ARC’s, Anthropic’s, and yours) are (perhaps unintentionally?) painting an overly optimistic portrayal of RSPs. I don’t expect policymakers who engage with the public comms to walk away with an appreciation for the limitations of RSPs, their current level of vagueness + “we’ll figure things out later”ness, etc.
On top of that, the posts seem to have this “don’t listen to the people who are pushing for stronger asks like moratoriums—instead please let us keep scaling and trust industry to find the pragmatic middle ground” vibe. To me, this seems not only counterproductive but also unnecessarily adversarial. I would be more sympathetic to the RSP approach if it was more like “well yes, we totally think it’d be great to have a moratorium or a global compute cap or a kill switch or a federal agency monitoring risks or a licensing regime, and we also think this RSP thing might be kinda nice in the meantime.” Instead, ARC explicitly tries to paint the moratorium folks as “extreme”.
(There’s also an underlying thing here where I’m like “the odds of achieving a moratorium, or a licensing regime, or hardware monitoring, or an agency that monitors risks and has emergency powers— the odds of meaningful policy getting implemented are not independent of our actions.” The more that groups like Anthropic and ARC claim “oh, that’s not realistic”, the less realistic those proposals are. I think people are also wildly underestimating the degree to which Overton Windows can change and the amount of uncertainty there currently is among policymakers, but this is a post for another day, perhaps.)
I’ll conclude by noting that some people have gone as far as to say that RSPs are intentionally trying to dilute the policy conversation. I’m not yet convinced this is the case, and I really hope it’s not. But I’d really like to see more coming out of ARC, Anthropic, and other RSP-supporters to earn the trust of people who are (IMO reasonably) suspicious when scaling labs come out and say “hey, you know what the policy response should be? Let us keep scaling, and trust us to figure it out over time, but we’ll brand it as this nice catchy thing called Responsible Scaling.”
Thanks! A few quick responses/questions:
I think presumably the pause would just be for that company’s scaling—presumably other organizations that were still in compliance would still be fine.
I think this makes sense for certain types of dangerous capabilities (e.g., a company develops a system that has strong cyberoffensive capabilities. That company has to stop but other companies can keep going).
But what about dangerous capabilities that have more to do with AI takeover (e.g., a company develops a system that shows signs of autonomous replication, manipulation, power-seeking, deception) or scientific capabilities (e.g., the ability to develop better AI systems)?
Supposing that 3-10 other companies are within a few months of these systems, do you think at this point we need a coordinated pause, or would it be fine to just force company 1 to pause?
That’s definitely my position, yeah—and I think it’s also ARC’s and Anthropic’s position.
Do you know if ARC or Anthropic have publicly endorsed this position anywhere? (And if not, I’d be curious for your take on why, although that’s more speculative so feel free to pass).
@evhub can you say more about what you envision a governmentally-enforced RSP world would look like? Is it similar to licensing? What happens when a dangerous capability eval goes off— does the government have the ability to implement a national pause?
Aside: IMO it’s pretty clear that the voluntary-commitment RSP regime is insufficient, since some companies simply won’t develop RSPs, and even if lots of folks adopted RSPs, the competitive pressures in favor of racing seem like they’d make it hard for anyone to pause for >a few months. I was surprised/disappointed that neither ARC nor Anthropic mentioned this. ARC says some stuff about how maybe in the future one day we might have some stuff from RSPs that could maybe inform government standards, but (in my opinion) their discussion of government involvement was quite weak, perhaps even to the point of being misleading (by making it seem like the voluntary commitments will be sufficient).
I think some of the negative reaction to responsible scaling, at least among some people I know, is that it seems like an attempt for companies to say “trust us— we can scale responsibly, so we don’t need actual government regulation.” If the narrative is “hey, we agree that the government should force everyone to scale responsibly, and this means that the government would have the ability to tell people that they have to stop scaling if the government decides it’s too risky”, then I’d still probably prefer stopping right now, but I’d be much more sympathetic to the RSP position.
@tlevin I would be interested in you writing up this post, though I’d be even more interested in hearing your thoughts on the regulatory approach Thomas is proposing.
Note that both of your points seem to be arguing against a pause, whereas my impression is that Thomas’s post focuses more on implementing a national regulatory body.
(I read Thomas’s post as basically saying like “eh, I know there’s an AI pause debate going on, but actually this pause stuff is not as important as getting good policies. Specifically, we should have a federal agency that does licensing for frontier AI systems, hardware monitoring for advanced chips, and tracking of risks. If there’s an AI-related emergency or evidence of imminent danger, then the agency can activate emergency powers to swiftly respond.”
I think the “snap-back” point and the “long-term supply curve of compute” point seem most relevant to a “should we pause?” debate, but they seem less relevant to Thomas’s regulatory body proposal. Let me know if you think I’m missing something, though!)
One thing I appreciate about both of these tests is that they seem to (at least partially) tap into something like “can you think for yourself & reason about problems in a critical way?” I think this is one of the most important skills to train, particularly in policy, where it’s very easy to get carried away with narratives that seem popular or trendy or high-status.
I think the current zeitgeist has gotten a lot of folks interested in AI policy. My sense is that there’s a lot of potential for good here, but there are also some pretty easy ways for things to go wrong.
Examples of some questions that I hear folks often ask/say:
What do the experts think about X?
How do I get a job at X org?
“I think the work of X is great”--> “What about their work do you like?” --> “Oh, idk, just like in general they seem to be doing great things and lots of others seem to support X.”
What would ARC evals think about this plan?
Examples of some questions that I often encourage people to ask/say:
What do you think about X?
What do you think X is getting wrong?
If the community is wrong about X, what do you think it’s getting wrong? Do you think we could be doing better than X?
What do I think about this plan?
So far, my experience engaging with AI governance/policy folks is that these questions are not being asked very often. It feels more like a field where people are respected for “looking legitimate” as opposed to “having takes”. Obviously, there are exceptions, and there are a few people whose work I admire & appreciate.
But I think a lot of junior people (and some senior people) are pretty comfortable with taking positions like “I’m just going to defer to people who other people think are smart/legitimate, without really asking myself or others to explain why they think those people are smart/legitimate”, and this is very concerning.
As a caveat, it is of course important to have people who can play support roles and move things forward, and there’s a failure mode of spending too much time in “inside view” mode. My thesis here is simply that, on the current margin, I think the world would be better off if more people shifted toward “my job is to understand what is right and evaluate plans/people for myself” and fewer people adopted the “my job is to find a credible EA leader and row in the direction that they’re currently rowing.”
And as a final point, I think this is especially important in a context where there is a major resource/power/status imbalance between various perspectives. In the absence of critical thinking & strong epistemics, we should not be surprised if the people with the most money & influence end up shaping the narrative. (This doesn’t necessarily mean that they’re wrong, but it does tell us something like “you might expect to see a lot of EAs rally around narratives that are sympathetic toward major AGI labs, even if these narratives are wrong. And it would take a particularly strong epistemic environment to converge to the truth when one ‘side’ has billions of dollars and is offering a bunch of the jobs and is generally considered cooler/higher-status.”)
I agree with Zach here (and I’m also a fan of Holly). I think it’s great to spotlight people whose applications you’re excited about, and even reasonable for the tone to be mostly positive. But I think it’s fair for people to scrutinize the exact claims you make and the evidence supporting those claims, especially if the target audience consists of potential donors.
My impression is that the crux is less about “should Holly be funded?” and more about “were the claims presented precisely?”, and more broadly some feeling of “how careful should future posts be when advertising possible candidates?”
Congratulations on launching!
On the governance side, one question I’d be excited to see Apollo (and ARC evals & any other similar groups) think/write about is: what happens after a dangerous capability eval goes off?
Of course, the actual answer will be shaped by the particular climate/culture/zeitgeist/policy window/lab factors that are impossible to fully predict in advance.
But my impression is that this question is relatively neglected, and I wouldn’t be surprised if sharp newcomers were able to meaningfully improve the community’s thinking on this.
Excited to see this! I’d be most excited about case studies of standards in fields where people didn’t already have clear ideas about how to verify safety.
In some areas, it’s pretty clear what you’re supposed to do to verify safety. Everyone (more-or-less) agrees on what counts as safe.
One of the biggest challenges with AI safety standards will be the fact that no one really knows how to verify that a (sufficiently-powerful) system is safe. And a lot of experts disagree on the type of evidence that would be sufficient.
Are there examples of standards in other industries where people were quite confused about what “safety” would require? Are there examples of standards that are specific enough to be useful but flexible enough to deal with unexpected failure modes or threats? Are there examples where the standards-setters acknowledged that they wouldn’t be able to make a simple checklist, so they requested that companies provide proactive evidence of safety?
Glad to see this write-up & excited for more posts.
I think these are three areas that MATS has handled well. I’d be especially excited to hear more about areas where MATS thinks it’s struggling, MATS is uncertain, or where MATS feels like it has a lot of room to grow. Potential candidates include:
How is MATS going about talent selection and advertising for the next cohort, especially given the recent wave of interest in AI/AI safety?
How does MATS intend to foster (or recruit) the kinds of qualities that strong researchers often possess?
How does MATS define “good” alignment research?
Other things I’d be curious about:
Which work from previous MATS scholars is the MATS team most excited about? What are MATS’s biggest wins? Which individuals or research outputs is MATS most proud of?
Most people’s timelines have shortened a lot since MATS was established. Does this substantially reduce the value of MATS (relative to worlds with longer timelines)?
Does MATS plan to try to attract senior researchers who are becoming interested in AI Safety (e.g., professors, people with 10+ years of experience in industry)? Or will MATS continue to recruit primarily from the (largely younger and less experienced) EA/LW communities?
Clarification: I think we’re bottlenecked by both, and I’d love to see the proposals become more concrete.
Nonetheless, I think proposals like “Get a federal agency to regulate frontier AI labs like the FDA/FAA” or even “push for an international treaty that regulates AI the way the IAEA regulates atomic energy” are “concrete enough” to start building political will behind them. Other (more specific) examples include export controls, compute monitoring, licensing for frontier AI models, and some others on Luke’s list.
I don’t think any of these are concrete enough for me to say “here’s exactly how the regulatory process should be operationalized”, and I’m glad we’re trying to get more people to concretize these.
At the same time, I expect that a lot of the concretization happens after you’ve developed political will. If the USG really wanted to figure out how to implement compute monitoring, I’m confident they’d be able to figure it out.
More broadly, my guess is that we might disagree on how concrete a proposal needs to be before you can actually muster political will behind it, though. Here’s a rough attempt at sketching out four possible “levels of concreteness”. (First attempt; feel free to point out flaws).
Level 1, No concreteness: You have a goal but no particular ideas for how to get there. (e.g., “we need to make sure we don’t build unaligned AGI”)
Level 2, Low concreteness: You have a goal with some vagueish ideas for how to get there (e.g., “we need to make sure we don’t build unaligned AGI, and this should involve evals/compute monitoring, or maybe a domestic ban on AGI projects and a single international project”).
Level 3, Medium concreteness: You have a goal with high-level ideas for how to get there. (e.g., “We would like to see licensing requirements for models trained above a certain threshold. Still ironing out whether or not that threshold should be X FLOP, Y FLOP, or $Z, but we’ve got some initial research and some models for how this would work.”)
Level 4, High concreteness: You have concrete proposals that can be debated. (e.g., “We should require licenses for anything above X FLOP, and we have some drafts of the forms that labs would need to fill out.”)
I get the sense that some people feel like we need to be at “medium concreteness” or “high concreteness” before we can start having conversations about implementation. I don’t think this is true.
Many laws, executive orders, and regulatory procedures have vague language (often at Level 2 or in between Level 2 and Level 3). My (loosely held, mostly based on talking to experts and reading things) sense is that it’s quite common for regulators to be like “we’re going to establish regulations for X, and we’re not yet exactly sure what they look like. Part of this regulatory agency’s job is going to be to figure out exactly how to operationalize XYZ.”
I also think that recent events have been strong evidence in favor of my position: we got a huge amount of political will “for free” from AI capabilities advances, and the best we could do with it was to push a deeply flawed “let’s all just pause for 6 months” proposal.
I don’t think this is clear evidence in favor of the “we are more bottlenecked by concrete proposals” position. My current sense is that we were bottlenecked both by “not having concrete proposals” and by “not having relationships with relevant stakeholders.”
I also expect that the process of concretizing these proposals will likely involve a lot of back-and-forth with people (outside the EA/LW/AIS community) who have lots of experience crafting policy proposals. Part of the benefit of “building political will” is “finding people who have more experience turning ideas into concrete proposals.”
I don’t actually think the implementation of governance ideas is mainly bottlenecked by public support; I think it’s bottlenecked by good concrete proposals. And to the extent that it is bottlenecked by public support, that will change by default as more powerful AI systems are released.
I appreciate Richard stating this explicitly. I think this is (and has been) a pretty big crux in the AI governance space right now.
Some folks (like Richard) believe that we’re mainly bottlenecked by good concrete proposals. Other folks believe that we have concrete proposals, but we need to raise awareness and political support in order to implement them.
I’d like to see more work going into both of these areas. On the margin, though, I’m currently more excited about efforts to raise awareness [well], acquire political support, and channel that support into achieving useful policies.
I think this is largely due to (a) my perception that this work is largely neglected, (b) the fact that a few AI governance professionals I trust have also stated that they see this as the higher priority thing at the moment, and (c) worldview beliefs around what kind of regulation is warranted (e.g., being more sympathetic to proposals that require a lot of political will).
Lots of awesome stuff requires AGI or superintelligence. People think LLMs (or stuff LLMs invent) will lead to AGI or superintelligence.
So wouldn’t slowing down LLM progress slow down the awesome stuff?
I think more powerful (aligned) LLMs would lead to more awesome stuff, so caution on LLMs does delay other awesome stuff.
I agree with the point that “there’s value that can be gained from figuring out how to apply systems at current capabilities levels” (AI summer harvest), but I wouldn’t go as far as “you can almost have the best of both worlds.” It seems more like “we can probably do a lot of good with existing AI, so even though there are costs of caution, those costs are worth paying, and at least we can make some progress applying AI to pressing world problems while we figure out alignment/governance.” (My version isn’t catchy though, oops).
I appreciate that this post acknowledges that there are costs to caution. I think it could’ve gone a bit further in emphasizing how these costs, while large in an absolute sense, are small relative to the risks.
The formal way to do this would be a cost-benefit analysis on longtermist grounds (perhaps with various discount rates for future lives). But I think there’s also a way to do this in less formal/wonky language, without requiring any longtermist assumptions.
If you have a technology where half of experts believe there’s a ~10% chance of extinction, the benefits need to be enormous for them to outweigh the costs of caution. I like Tristan Harris’s airplane analogy:
Imagine: would you board an airplane if 50% of airplane engineers who built it said there was a 10% chance that everybody on board dies?
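To spell out the rough arithmetic behind that intuition (a naive back-of-envelope that takes the survey figure at face value and treats the other half of experts as assigning roughly 0% risk):

\[
P(\text{extinction}) \approx 0.5 \times 0.10 + 0.5 \times 0 = 0.05
\]

Even on this crude aggregation, that’s something like a 5% chance of losing everything, and the benefits of moving a few years faster have to be weighed against that.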
Here’s another frame (that I’ve been finding useful with folks who don’t follow the technical AI risk scene much): History is full of examples of people saying that they are going to solve everyone’s problems. There are many failed messiah stories. In the case of AGI, it’s true that aligned and responsibly developed AI could do a lot of good. But when you have people saying “the risks are overblown—we’re smart and responsible enough to solve everything”, I think it’s pretty reasonable to be skeptical (on priors alone).
Finally, one thing that sometimes gets missed in this discussion is that most advocates of pause still want to get to AGI eventually. Slowing down for a few years or decades is costly, and advocates of slowdown should recognize this. But the costs are substantially lower than the risks. I think both of these messages get missed in discussions about slowdown.
The impression I get is that lots of people are like “yeah, I’d like to see more work on this & this could be very important” but there aren’t that many people who want to work on this & have ideas.
Is there evidence that funding isn’t available for this work? My loose impression is that mainstream funders would be interested in this. I suppose it’s an area where it’s especially hard to evaluate the promisingness of a proposal, though.
Reasons people might not be interested in doing this work:
— Tractability
— Poor feedback loops
— Not many others in the community to get feedback from
— Has to deal with thorny and hard-to-concretize theoretical questions
Reasons people might want to work on this:
— Importance and neglectedness
— Seems plausible that one could become one of the most knowledgeable EAs on this topic in not much time
— Interdisciplinary; might involve interacting a lot with the non-EA world, academia, etc.
— Intellectually stimulating
See also: https://80000hours.org/podcast/episodes/robert-long-artificial-sentience/
Personally, I still think there is a lot of uncertainty around how governments will act. There are at least some promising signs (e.g., UK AI Safety Summit) that governments could intervene to end or substantially limit the race toward AGI. Relatedly, I think there’s a lot to be done in terms of communicating AI risks to the public & policymakers, drafting concrete policy proposals, and forming coalitions to get meaningful regulation through.
Some folks also have hope that internal governance (lab governance) could still be useful. I am not as optimistic here, but I don’t want to rule it out entirely.
There’s also some chance that we end up getting more concrete demonstrations of risks. I do not think we should wait for these, and I think there’s a sizable chance we do not get them in time, but I think “have good plans ready to go in case we get a sudden uptick in political will & global understanding of AI risks” is still important.