Implications of the Whitehouse meeting with AI CEOs for AI superintelligence risk—a first-step towards evals?

Introducion

On Wednesday 4th May, Sam Altman (Open AI) and Dario Amodei (Anthropic) - amongst others—met with US Vice President Kamala Harris (with a drop-in from President Joe Biden), to discuss the dangers of AI.

Announcement | Fact sheet | EA Forum linkpost

I spent about 2 hours trying to understand what happened, who was involved, and what its possible implications for superintelligence risk.

I decided to make this post for two reasons:

  1. I am practising writing and developing my opinions on AI strategy (so feedback is very welcome, and you should treat my epistemic status as ‘new to this’!)

  2. I think demystifying the facts of the announcement and offering some tentative conclusions will positively contribute to the community’s understanding of AI-related political developments.

My main conclusions

Three announcements were made, but the announcement on public model evaluations involving major AI labs seemed most relevant and actionable to me[1].

My two actionable conclusions are:

  1. I think folks with technical alignment expertise[2] should consider attending DEF CON 31[3] if it’s convenient, to help shape the conclusions from the event.

  2. My main speculative concern is that this evaluation event could positively associate advanced AI and the open source community. As far as those that feel the downside of model proliferation outweighs the benefits of open sourcing, spreading this message in a more focused way now may be valuable.

Summary of the model evaluations announcement

This is mostly factual, and I’ve flagged where I’m offering my interpretation. Primary source: AI village announcement.

There’s going to be an evaluation platform made available during a conference called DEF CON 31. DEF CON 31 is the 31st iteration of DEF CON, “the world’s largest security conference”, taking place in Los Angeles on 10th August 2023. The platform is being organised by a subcommunity at that conference called the AI village.

The evaluation platform will be provided by Scale AI. The platform will provide “timed access to LLMs” via laptops available at the conference, and attendees will red-team various models by injecting prompts. I expect that the humans will then rate the output of the model as good or bad, much like on the ChatGPT platform. There’s a points-based system to encourage participation, and the winner will win a “high-end Nvidia GPU”.

The intent of this whole event appears to be to collect adversarial data that the AI organisations in question can use and ‘learn from’ (and presumably do more RLHF on). The orgs that signed up include: Anthropic, Google, Hugging Face, Microsoft, NVIDIA, OpenAI, and Stability AI.

It seems that there won’t be any direct implications for the AI organisations. They will, by default, be allowed to carry on as normal no matter what is learned at the event.

I’ll provide more details on what has happened after the takeaways section.

Takeaways from the Whitehouse announcement on model evaluations

I prioritised communicating my takeaways in this section. If you want more factual context to understand exactly what happened and who’s involved- see the section below this one.

For the avoidance of doubt, the Whitehouse announcement on the model evaluation event doesn’t come with any regulatory teeth.

I don’t mean that as a criticism necessarily; I’m not sure anyone has a concrete proposal for what the evaluation criteria should even be, or how they should be enforced, etc, so it’d be too soon to see an announcement like that.

That does mean I’m left with the slightly odd conclusion that all that’s happened is the Whitehouse has endorsed a community red-teaming event at a conference.

Nonetheless I’m cautiously optimistic about this announcement for a few reasons.

First off, it strikes me as a test to gain more information about how to proceed with regulation. Either by accident or design, I think this is just the beginning; it’s now well within the overton window to “evaluate all AI orgs’ models before deployment”. This seems like a potential precedent on which to attach further requirements.

It also seems encouraging to me that the US Government was able to coordinate AI companies to agree to commit to a public evaluation before deployment. It’s great to see the governments playing a key role in cutting through competition and gaining consensus. There may have been a stronger proposal on the table that wasn’t agreed to, but at least this was agreed to by everyone in the room.

The AI governance community could acknowledge these two facts and treat this implementation of evaluations as a lever. Now might be a good time to find out more about this lever and figure out how to pull it hard. (Reminder—it seems likely to me that folks more involved in governance than me already are doing this, or have good reason to pursue other avenues that I’ve not considered.)

Unfortunately, it’s not mentioned anywhere that the red-teaming will explicitly be monitoring for misalignment, or ‘model ill-intent’. However I think there will be a significant amount of overlap between the ‘harmful outputs’ red-teamers will look for by default to the sorts of things the alignment community would look for. If I’m right that there will be overlap, I’m excited about the ability for the event to raise some amount of credible awareness of misalignment (from it being at a reasonably official venue, in the ‘hacker community’).

Given the above, I tentatively think folks with technical alignment expertise[2] should consider attending DEF CON[3]. I don’t know how possible that is because I’ve never been to DEF CON—but I expect that means other members of the alignment community won’t be there either, by default. Again to be clear, I think useful things will be evaluated by competent people at DEF CON, but the results of that workshop appear to be being treated as being used as input for “what happens next” so I think it’s important to be in the room.

Some people raised concerns about Scale AI not being interested in alignment, yet taking a central role in this announcement. It seems true that Scale AI doesn’t tout alignment strongly, but I think their involvement in this is a red herring and is basically unimportant who provides the platform. It sounds like someone told the press-release people that the platform is made by Scale AI, so Scale AI got mentioned.

I think the platform will just be something like “takes in text, saves human ratings on responses”. For example, see their RLHF platform. It’s just a way to monetise getting human data into AI companies’ hands—they’re following profit incentives, not engineering the future. Regulation will be what changes the incentive landscape, not middle-orgs like Scale AI.

That said, I don’t know the details of how the Scale AI platform is built, and there might be differing implementation details if it had been built with alignment in mind. Here I’d refer back to “alignment people should get involved with this”, to have the opportunity to raise their voice if anything is wrong. Even more speculatively: there’s probably also opportunity for another evaluation org to apply competitive pressure and prove themselves as the better eval company.

My main speculative concern is that the evaluation event could produce more memes that positively associate advanced AI and open sourcing, which I believe could lead to more model proliferation. My reasoning is loose, but goes like: given this is quite a light-touch evaluation, the DEF CON community may pat themselves on the back for finding lots of harmful outputs, and the AI companies involved are likely to be positive about the outcome because, by default, they’ll get useful data without any enforcement or further scrutiny. Could that lead people to conclude that open sourcing is the way to go, as ‘the community knows best’? There’s a bunch of other stories you could tell, so I am not sure about this outcome.

More factual details

The most relevant of the 3 announcements[1] from the fact sheet was:

The Administration is announcing an independent commitment from leading AI developers, including Anthropic, Google, Hugging Face, Microsoft, NVIDIA, OpenAI, and Stability AI, to participate in a public evaluation of AI systems, consistent with responsible disclosure principles—on an evaluation platform developed by Scale AI—at the AI Village at DEF CON 31.

Let’s unpack everything in that statement one by one.

What’s DEF CON 31 and the AI village?

Primary source: Wikipedia and the AI village blog.

DEF CON is a conference

DEF CON is a conference founded by members of the internet security community. It’s not typically an AI /​ ML conference (I don’t think I’d heard about it as a UK based ML engineer in 2021) and it’s not an academic conference. More like a coming together of hackers to convene, do cool projects, and discuss industry best practices. Typically in security, though the scope has grown as the conference became more popular.

I was a bit shocked at the presentation of the website at first, given its presidential endorsement. However, I think it needs to be understood in the context of it being founded by the internet security community: they historically have a culture of anonymity and using aliases.

It seems in fact to be the most respected conference in the computer security world, with some high-ranking attendees:

Federal law enforcement agents from the FBI, DoD, United States Postal Inspection Service, DHS (via CISA) and other agencies regularly attend DEF CON.[3][4]

It also has precedent interacting with the US government:

In April 2017, a DEF CON Black Badge was featured in an exhibit[15] in the Smithsonian Institution’s National Museum of American History entitled “Innovations in Defense: Artificial Intelligence and the Challenge of Cybersecurity”. The badge belongs to ForAllSecure’s Mayhem Cyber Reasoning System,[16] the winner of the DARPA 2016 Cyber Grand Challenge at DEF CON 24 and the first non-human entity ever to earn a Black Badge.

The AI Village is a subcommunity

The AI village is a subcommunity at DEF CON. It says it’s “a community of hackers and data scientists working to educate the world on the use and abuse of artificial intelligence in security and privacy”.

It has a few writings on its blog about adversarial robustness, which look to be sharing reasonable industry best-practice to me at a glance. I found its article on ML security a little bit limited in scope, though. It mostly based its conclusions on a framing of system accuracy, neglecting to incorporate general and generative AI (and thus alignment). I found this surprising given it will be hosting the red teaming event for generative, general-ish AI systems.

Finally, it has surprisingly few people involved in it as far as I can tell from the website and no weighty credentials thrown around, given the weighty endorsement it’s just received from the Whitehouse.

What’s Scale AI?

Scale AI is a Machine Learning operations platform founded in 2016 in San Francisco, with 600 employees. They have various products that help ML companies scale and deploy their models.

Its mission is “to accelerate the development of AI applications”.

Notably they have a platform for helping ML companies do Reinforcement Learning from Human Feedback (RLHF), claiming it helps with alignment. There’s no particular nuance about whether it fully aligns models. As I said above, I get the sense it’s just following profit incentives and supporting other orgs, rather than building anything itself.

Which other actors are involved?

For those interested, I noticed a couple of other actors who aren’t mentioned in the Whitehouse briefings were co-authors on the AI village announcement post. I haven’t got a lot to say about them, but I’m including this information for completeness.

Rumman Chowdhury (Humane Intelligence)

We focus on safety, ethics, and subject specific expertise (e.g. medical). We are suited for any company creating consumer-facing AI products, but in particular generative AI products.

Austin Carson, Political Science at University of Chicago.

  1. ^

    I largely ignored 2 of the 3 announcments the Whitehouse made:

    1. Further policies on AI development are in the works. Whilst those policies might end up being important, no details were given in the announcement, so an outsider like me can’t really comment on that yet.

    2. More funding was announced for American AI research and development (R&D). I don’t expect these to be important for alignment reserarch, on the margin, but some funds may be leveragable depending on the exact details.

  2. ^

    E.g. PhD students who’ve worked on alignment, and others who have published.

  3. ^

    I note it’s a costly recommendation for alignment researchers to visit a conference in LA.

    Some more input on whether DEF CON is likely to be an important event from someone who understands the US policy world better than me would be useful before you book a flight.

    To be transparent, my reasoning for recommending this is: the Whitehouse endorsed this event and are well networked with the workshop’s stakeholders, so probably the reported outcome of the workshop will be important for the shape of policies that come next.