Seeking (Paid) Case Studies on Standards

Holden Karnofsky May 26, 2023, 5:58 PM

99 points

AI evaluations and standards AI safety Opportunities to take action Research agendas, questions, and project lists

Update: the case studies collected⁰ via this project that authors have agreed to make publicly available are here (including some that already have public links). We are not currently taking applications for new projects. I will likely post some reflections from reading the case studies at a later date.

We are not taking more applications for now.

I’m looking for concise, informative case studies on social-welfare-based standards¹ for companies and products (including standards imposed by regulation).

I think case studies could help a lot with making AI safety standards work.

This post outlines:

Some quick background on the state of (and hopes for) AI safety standards.
Why I think case studies can be useful.
Some specific standards that seem especially relevant, and guidance on the case studies I’m seeking.
How to apply for funding to do case studies. I’m running the request for proposals, with funding from Open Philanthropy.

Some quick background on AI safety standards

The basic idea of AI safety standards would be:

Someone publishes an AI safety standard: a set of guidelines that, if followed by AI companies, could reduce risks of harm from AI. “Someone” could be a government agency (like NIST), an independent nonprofit, an industry association, etc. The standard might have conditions along the lines of: “Test to see whether your AI system can do dangerous thing __, using method __. If so, don’t deploy it until safety measures __ and __ (for example, alignment techniques or security measures) have been taken.” (Alternatively, it might only have conditions around assessment and disclosure of risks.)
If the standard is compelling, e.g., providing both protectiveness (following the standards would significantly reduce important risks) and practicality (an AI company could follow the standards without needing to compromise its business more than clearly necessary to reduce the risks), then AI companies might choose to announce their intention to abide by it, for a number of reasons. These reasons could include:
- They think the guidelines are worth following on the merits and intend to do so, and they want this to be clear to everyone (customers, investors, employees, etc.)
- They want to get good PR, avoid bad PR, and/or strengthen their defenses against potential lawsuits.
- Important employees, customers, investors, etc. prefer that they follow the standard.
- They want to create momentum for the standard in the hopes that their competitors will also follow it, causing them to face fewer tough tradeoffs in terms of “Deploying system X might be risky to society, but if we don’t do it, our competitors might.”
- They view regulation as inevitable, and hope it will be based on protective+practical standards.
Once AI companies have announced their intention to abide by standards, choosing not to do so could become more costly (e.g., bad PR).
Over time, standards could be revised to remain protective and practical (or become more so), while also becoming more rigorously enforced—laying the groundwork for national and even international regulation (a bit more here).

If something like this happened, it could (a) make dangerous AI deployments more costly and less likely; (b) reduce “race dynamics” in which companies have to choose between releasing dangerous models and fearing that their competitors will do so; (c) increase incentives for alignment research and other danger-reducing measures (since these things, if done well, might allow companies to release powerful systems while staying in compliance with standards).

One of the things that appeals to me about this general model is that there is plenty of precedent for similar models in other industries. It’s common for companies to voluntarily follow social-welfare-oriented standards established by third parties, aiming—through compliance with standards—to increase confidence in the social responsibility of their work. Sometimes these standards are quite detailed and take a lot of work and/or expense both to create and follow. And there’s also precedent for initially voluntary standards to end up codified in regulation.

Some relevant examples include farm animal welfare standards (governing how animals are treated on farms), environmental standards (governing companies’ environmental impacts), security standards (governing e.g. how customer data is protected), safety standards (for airplanes, wetlabs and more), and financial standards aimed at e.g. preventing a bank collapse. More below.

Some ways case studies can be useful

There’s a lot of interest in AI safety standards right now, and I’m encountering a lot of differences of opinion on questions like:

Who should be in charge of drafting and revising standards? Industry associations? Independent nonprofits?
What sorts of people should and shouldn’t be looped in heavily for input?
Generally, how complex, onerous and/or expensive can a standard be and still command wide adoption?
Should standards require outside evaluations to check particular claims made by companies (for example, “Our model doesn’t have dangerous capability X”)?
- If so, what sorts of organizations can do these evaluations, and what measures can be taken to make it more likely that these organizations (a) are truly neutral and arm’s-length; (b) are able to understand what’s going on at the companies well enough to do accurate evaluations?
What are major factors in whether standards become widely adopted or not?
Is there much hope (much precedent?) for voluntary industry-adopted standards having an impact on later regulatory frameworks?
For those hoping to create standards, what are things they should be making sure to do or not do? What aren’t they thinking of by default, in terms of potential challenges and potential solutions?

I think that studying cases of existing widely-adopted standards can shed a lot of light on how these questions have been answered in other cases (both successful and unsuccessful).

They can thus inform the strategies taken by people looking to write or help shape safety standards that are both highly protective and widely adopted.

So far I’ve done one mini-case-study: a case study on farm animal welfare standards based on a conversation with Lewis Bollard. I’ve picked up a number of things from this that may be useful to people working on standards, such as:

There are many competing standards, though the first draft of each standard may be disproportionately important.
- There are a number of different animal welfare standards. Some are essentially about codifying what’s already common in industry, and have near-universal adoption; others are maintained by independent animal-welfare-oriented nonprofits, have a higher bar, and have lower participation.
- Each standard is arguably somewhat “anchored” in terms of how high a bar it’s setting and how much participation it’s seeking. (That is, when the standard is being revised, the people doing the revisions are likely aiming to maintain roughly the level of participation they already have.) With this in mind, the first version of a standard—and decisions about how high to set the bar—could be crucially important. (But it’s not the case that once one standard exists, it’s too late to create a competing one.)
Activism and advocacy can be important for standards adoption. Lewis believes pressure from activists—including “outside game” activists that hold protests and don’t participate in standards creation—has been important in a large wave of companies agreeing to higher-welfare standards over the last decade.
There are a number of pressures toward compliance once a company has signed on—even if there are no formal audits or external checks. For example, if a company says it’s complying with a standard, but isn’t, this might be revealed by a whistleblower or undercover investigation, and could constitute consumer fraud and/or securities fraud.
Lewis suggested holding “listening meetings” with companies and other potentially affected parties before standards development gets too far along, so that (a) they feel included in the process and (b) their concerns can be considered from the beginning. I’ve passed this suggestion on to people working on standards.

I think case studies can also help us a lot with the general problem that we don’t know what we don’t know.

As a general matter, I think it’s very hard to design something like standards from first principles alone. I expect there will be lots of difficult-to-anticipate challenges.
If we study standards from other industries, we get the opportunity to learn about challenges we might not have thought of—and solutions that might have taken decades to iron out.
We’ll need to apply judgment to using this information, since no other case is a perfect analogy for AI safety, but I think having the information would be a big plus.

My impression is that standards often take a very long time to take shape and gain wide adoption. If we want to “speedrun” this process due to the possibility of transformative AI being developed soon, learning as quickly and thoroughly as possible how things have worked elsewhere seems important.

Narrowing down standards to learn about

There are an enormous number of standards out there (ISO alone maintains almost 25,000). I’m especially interested in cases that share some key properties with potential AI safety standards. In particular:

I’m interested in intense standards for high-stakes applications. Some standards are relatively lightweight (e.g., international food standards); higher-stakes standards tend to be more intense, and I think the latter will be most appropriate for potentially transformative AI systems.

Example high-stakes standards: biosafety standards (see BMBL as well as the Federal Select Agent Program standards); nuclear safety standards (e.g., IAEA’s); safety standards for chemical producers; airline safety standards; and standards and regulations that the FDA imposes on drugs.

I’m interested in standards that involve complex, sometimes creative risk assessment and/or intense, even adversarial auditing. Some standards seem straightforward to observe and verify (farm animal welfare standards are an example); I don’t think we can count on this being the case for AI, where it can take a lot of knowledge and creativity to answer questions like “What dangerous activities is this AI system really capable of?”

I think financial regulation and financial standards (e.g., FINRA) are a promising place to look for this sort of thing, since financial risks are often hard to understand and assess. (~~I’m told that in some cases, regulators are embedded within a financial company, going to work every day in the company’s office;~~ [I was prodded about this and couldn’t confirm it, and think it’s probably not right] also see this interesting Twitter thread arguing that the bank supervision model is promising for AI.)

Some other promising categories:

Cybersecurity standards such as NERC CIP, SOC2 and ISO 27034.
Some environmental standards have some of this quality. For example, down sourcing standards such as Downpass sometimes emphasize the thoroughness and intensity of audits; SA8000 instructs auditors to “conduct off-site interviews with trade union organisations, NGOs and dismissed workers to assess worker treatment.”
The Fair Labor Association’s Workplace Code of Conduct has been cited to me as an example of a standard with intense monitoring (more here).

I’m interested in standards that are more complex than just “checklists.” Most standards are something like: “You meet the standard if and only if the following things are all true of your company/product.” But I think AI safety standards might have to involve more complex conditions, like: “If an AI strongly demonstrates dangerous property X, then mitigation measures ___ are required; if it only weakly demonstrates dangerous property X, then lesser mitigation measures ____ are required.”

Here again financial regulations might be useful, for example the Large Financial Institution Rating System.

Institutional Review Boards might be useful as well, and have some other parallels (e.g., their approval is required before research is performed).
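
As a rough illustration of the contrast above between a "checklist" and a more complex, tiered condition, here is a minimal Python sketch. It is not drawn from any existing standard: the evidence scale (`Evidence`), the mitigation names (`lesser_mitigation`, `full_mitigation`), and both functions are hypothetical placeholders standing in for the blanks in the example.

```python
from enum import Enum


class Evidence(Enum):
    """How strongly an evaluation found dangerous property X (hypothetical scale)."""
    NONE = 0
    WEAK = 1
    STRONG = 2


def checklist_compliant(requirements_met: dict[str, bool]) -> bool:
    """Checklist-style standard: compliant if and only if every item is true."""
    return all(requirements_met.values())


# Hypothetical mapping from evidence level to the mitigations a standard might require.
REQUIRED_MITIGATIONS = {
    Evidence.NONE: set(),
    Evidence.WEAK: {"lesser_mitigation"},
    Evidence.STRONG: {"full_mitigation"},
}


def tiered_compliant(evidence: Evidence, mitigations_in_place: set[str]) -> bool:
    """Tiered standard: what is required depends on the evaluation result."""
    # Compliant when every mitigation required at this evidence level is in place.
    return REQUIRED_MITIGATIONS[evidence] <= mitigations_in_place
```

In this sketch, the checklist has a single fixed bar, while the tiered version makes the bar a function of what the risk assessment found, which is why the quality of that assessment matters so much.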

I’m interested in standards that are motivated by non-monetized social welfare.

Many standards are about quality assurance (e.g., helping customers know what they’re getting) or interoperability (e.g., making sure that different products are compatible with each other). There’s a straightforward profit incentive to work on such standards.
I don’t think those kinds of motives will be enough to drive AI safety standards that protect against global catastrophic risks. I’m particularly interested in social-welfare-based standards that companies adopt (sometimes under pressure) in order to show social responsibility.
Examples include farm animal welfare standards (see my case study); the SA8000 social certification program; Fair Trade; the Eco-Management and Audit Scheme; and many other environmental standards.

All else equal, standards for things that are more similar to AI are better. (E.g., software is probably better to examine than food, although other factors here could outweigh this.)

I’m interested in failure stories, not just success stories. A good example might be bond credit ratings: third-party certifiers of creditworthiness came to play an important role in the economy, but they failed to correctly assess creditworthiness (when accounting for e.g. systemic risk), leading some institutions that were supposed to be conservative to take on too much risk (more).

I’m especially interested in private/voluntary standards, and even more especially in cases where private/voluntary standards helped shape later regulation, though I’m not exclusively interested in these (some of the examples above are regulation-backed standards).

What I’m looking for in case studies

I’m looking for case studies that:

Explore a standard, or other case of regulation or self-regulation, that is interestingly analogous to AI safety standards. Most of the standards I linked to in the previous section are standards I’d probably be interested in case studies of (though there are probably lots of interesting ones I didn’t list!)
Start with a very clear description of exactly how the standard works today (or worked in its heyday), with links to detailed documents laying out what the standard is and how it’s enforced.
Answer questions such as these (though it’s not necessary to cover all of these comprehensively):
- What’s the history of the standard? How did it get started?
- How is the standard implemented today? Who writes it and revises it, and what does that process look like?
- How did we get from the beginnings to where we are today?
- If a standard aims to reduce risks, to what extent did the standard get out ahead of/prevent risks, as opposed to being developed after relevant problems had already happened?
- How involved are/were activists/advocates/people who are explicitly focused on public benefit rather than profits in setting standards? How involved are companies? How involved are people with reputations for neutrality?
- Are there audits required to meet a standard?
  - If so, who does the audits, and how do they avoid being gamed?
  - How much access do they get to the companies they’re auditing?
  - How good are the audits? How do we know?
  - What other measures are taken to avoid standards being “gamed” and ensure that whatever risks they’re meant to protect against are in fact protected against?
- What sorts of companies (and how many/what percentage of relevant companies) comply with what standards, and what are the major reasons they do so?
- How costly and difficult is it to comply with the standards?
- What happens if a company stops complying?
- Does the standard currently seem to achieve its intended purpose? To the extent it seeks to reduce risks, is there a case that it’s done so?
- Was there any influence of early voluntary standards on later government regulation?
Are very strong on reasoning transparency, providing citations and key quotes for all key claims.
Are very clear and easy to navigate. It should be possible to pick up the key takeaways from a case study in 1-3 pages, easily find more detail on any particular key takeaway, and easily find answers to the key questions above. I expect good case studies to be 10-50 pages in general; for longer case studies it’s especially important to meet this criterion.

Other projects I might be interested in

I’m also interested in writeups that look for patterns across a large number of standards. Example topics include:

What would be some interesting standards to do case studies on (that weren’t already listed in this post), and why would they be interesting?
How widespread are safety standards generally?
Are there any generalizations we can make about the answers to the questions above for a wide range of standards?
What are the most interesting points from a particular extensive book or other writeup on standards? (Examples in footnote.²)

In general, feel free to use the form below to pitch me on any analysis you think could be useful, although I expect to be most likely to support analysis that is heavily about learning from past/existing cases (rather than about making abstract arguments).

Who can do case studies, and how can they find the relevant information?

I don’t think you need to be a subject-matter expert to do a good case study. You just need to be able to find the relevant information about how a standard works, how the process for maintaining it works, etc. This could be by:

Googling around, reading books and papers, etc. (I am unsure of whether all the relevant questions can be answered in this way.)
Interviewing knowledgeable people. My case study on farm animal welfare is based on a conversation with Lewis Bollard; I personally knew very little about the topic before we spoke about it.
I’ve found large language models quite helpful for this work as well. This may be because there’s often a lot of information about standards somewhere online, but it’s not always easy to find. I’d suggest trying them out as research tools, though I’d prefer that any citations are to more reliable sources. (Sometimes, when directly prompted to do so, language models can produce good citations for their claims; sometimes they can’t, and sometimes their claims appear incorrect.³)

How to participate

Please use this form to (a) let me know about your interest in doing a case study or other writeup; (b) apply for funding to support the work. My basic default is to offer funding for up to 50 hours per case study, with room for negotiation in special circumstances. The rate of pay will be at least $75/hour for all approved cases, and could be higher (the form submission asks for information on this).

In any cases where I favor providing funding, I’ll make the recommendation to Open Philanthropy to do so.

If I get multiple proposals to study the same thing, I will probably do something to avoid redundancy (e.g., email the parties in question so they’re aware of overlapping efforts). This is a reason to use the form even if you’re not seeking funding.

I may also occasionally update this post to note whether some topics seem likely to already be well-covered.

Got ideas for more case studies?

Please share them in the comments! I’ve found that a lot of people happen to know of standards that are interestingly analogous to AI safety standards. Some guidance on how to look for such analogies is above.

For this post I talked to a number of people to get ideas on what good case studies might be, and on how some particular standards work. I’m grateful to Daniela Amodei, Sam Bell, Alexander Berger, Lewis Bollard, Alexis Carlier, Rocco Casagrande, Ben Garfinkel, Jonathan Gleklen, Mindy James, Richard Korzekwa, Jade Leung and Piers Millett for help and/or suggesting good example standards to learn about. These folks shouldn’t be seen as responsible for the content of the post.

Notes

⁰ Most of these case studies were directly paid for via this project, but in some cases the work was pro bono, or someone adapted or sent a copy of work that had been done for another project, etc.
¹ For a nice definition of standards, see ISO’s definition.
³ One language model claimed that standards such as BSL-4 originated with the Asilomar conference on recombinant DNA, but I haven’t been able to find any source supporting this, and one biorisk expert I talked to was pretty sure it was false.

Crossposted from LessWrong (69 points, 9 comments)

Akash May 26, 2023, 7:39 PM
7 points
0 ∶ 0

Excited to see this! I’d be most excited about case studies of standards in fields where people didn’t already have clear ideas about how to verify safety.
In some areas, it’s pretty clear what you’re supposed to do to verify safety. Everyone (more-or-less) agrees on what counts as safe.
One of the biggest challenges with AI safety standards will be the fact that no one really knows how to verify that a (sufficiently-powerful) system is safe. And a lot of experts disagree on the type of evidence that would be sufficient.
Are there examples of standards in other industries where people were quite confused about what “safety” would require? Are there examples of standards that are specific enough to be useful but flexible enough to deal with unexpected failure modes or threats? Are there examples where the standards-setters acknowledged that they wouldn’t be able to make a simple checklist, so they requested that companies provide proactive evidence of safety?
- Koen Holtman May 28, 2023, 7:53 PM
  7 points
  1 ∶ 0
  
  
  “One of the biggest challenges with AI safety standards will be the fact that no one really knows how to verify that a (sufficiently-powerful) system is safe. And a lot of experts disagree on the type of evidence that would be sufficient.”
  
  While overcoming expert disagreement is a challenge, it is not one that is as big as you think. TL;DR: Deciding not to agree is always an option.
  
  To expand on this: the fallback option in a safety standards creation process, for standards that aim to define a certain level of safe-enough, is as follows. If the experts involved cannot agree on any evidence-based method for verifying that a system X is safe enough according to the level of safety required by the standard, then the standard being created will simply, and usually implicitly, declare that there is no route by which system X can comply with the safety standard. If you are required by law, say by EU law, to comply with the safety standard before shipping a system into the EU market, then your only legal option will be to never ship that system X into the EU market.
  
  For AI systems you interact with over the Internet, this ‘never ship’ translates to ‘never allow it to interact over the Internet with EU residents’.
  
  I am currently in the JTC21 committee which is running the above standards creation process to write the AI safety standards in support of the EU AI Act, the Act that will regulate certain parts of the AI industry, in case they want to ship legally into the EU market. ((Legal detail: if you cannot comply with the standards, the Act will give you several other options that may still allow you to ship legally, but I won’t get into explaining all those here. These other options will not give you a loophole to evade all expert scrutiny.))
  
  Back to the mechanics of a standards committee: if a certain AI technology, when applied in a system X, is well known to make that system radioactively unpredictable, it will not usually take long for the technical experts in a standards committee to come to an agreement that there is no way that they can define any method in the standard for verifying that X will be safe according to the standard. The radioactively unsafe cases are the easiest cases to handle.
  
  That being said, in all but the most trivial of safety engineering fields, there are complicated epistemics involved in deciding when something is safe enough to ship, and this is true whether you use standards or not. I have written about this topic, in the context of AGI, in section 14 of this paper.
- Ben Stewart May 27, 2023, 5:37 PM
  3 points
  0 ∶ 0
  
  Maybe there’s something in early cybersecurity? I.e. we’re not really sure precisely how people could be harmed through these systems (like the nascent internet), but there’s plenty of potential in the future?
- Ariel May 29, 2023, 8:34 AM
  2 points
  0 ∶ 0
  
  
  “Are there examples of standards in other industries where people were quite confused about what ‘safety’ would require?”
  
  Yes, medical robotics is one I was involved in. Though there, the answer is often just to wait for the first product to hit the market (there is nothing quite there yet, doing full autonomous surgery), and then copy their approach. As is, the medical standards don’t cover much ML, and so the companies have to come up with the reasoning themselves for convincing the FDA in the audit. Which in practice means many companies just don’t risk it, and do something robotic but surgeon-controlled, or use classical algorithms instead of deep learning.
Ben Stewart May 26, 2023, 7:45 PM
6 points
0 ∶ 0

One interesting case may be the Health Insurance Portability and Accountability Act, the law governing the collection, storage, and use of healthcare information in the U.S. Though it’s an actual regulation, not a standard, it should be a case of a complex, multi-stakeholder landscape involving a variety of risks, some of which arise from adversaries, and governing sensitive electronic information. Its quality seems mixed, and it appeared to be inadequate for subsequent developments in ‘big data’. Also, it looks like there’s been a decent amount written about it: there are 94 review articles with HIPAA in the title (results mentioning HIPAA look inflated due to articles mentioning HIPAA compliance in their methods).
mxschons Jun 15, 2023, 2:28 PM
5 points
2 ∶ 0

@Holden: We submitted two weeks ago and have not heard back yet?
johnjnay May 27, 2023, 2:21 PM
5 points
1 ∶ 0

Related paper: https://law.stanford.edu/publications/large-language-models-as-fiduciaries-a-case-study-toward-robustly-communicating-with-artificial-intelligence-through-legal-standards/
And related post: https://forum.effectivealtruism.org/posts/cWeioTmbs73iZjs25/large-language-models-as-fiduciaries-to-humans
Naevia Jun 15, 2023, 5:18 PM
4 points
0 ∶ 0

re: “One language model claimed that standards such as BSL-4 originated with the Asilomar conference on recombinant DNA”, I found some evidence for this.

The summary statement of the Asilomar conference recommends 4 levels of containment depending on the risk of the experiment. They sound similar to today’s BSL 1-4.
A history of the NIH guidelines for working with recombinant DNA makes it clear that they were heavily influenced by the Asilomar conference.
I haven’t found sources for:
- the fact that the BMBL’s biosafety levels were influenced by the NIH levels P1-4 (but it seems pretty obvious since they’re quite similar and both developed at least partly by the NIH)
- whether the idea of 4 biosafety levels existed before Asilomar (but I haven’t found any evidence that it did, and looking through the history of ABSA conferences the first references to biosafety levels I could find were post-Asilomar. This source claims without citation that the idea of 4 levels originated in the mid-1970s, which was when Asilomar happened)
Vishakha Agrawal Aug 1, 2023, 10:59 AM
3 points
0 ∶ 0

Hey Holden! Are you looking for more case studies at this time?
Jamie_Harris Jun 6, 2023, 7:49 PM
3 points
0 ∶ 0

“I’m interested in standards that are motivated by non-monetized social welfare… [e.g.] Fair Trade”
I wrote a case study of the Fair Trade movement. The focus was on the movement rather than the standards themselves, but I think it might be helpful for at least some of what you refer to in “What I’m looking for in case studies”. You can easily skim through the bolded headings in the “Strategic implications” section and see if any of the points highlighted seem relevant.
If someone else ends up doing a more standards-focused case study, it could be helpful for context.
“I’m interested in intense standards for high-stakes applications… [e.g.] nuclear safety standards (e.g., IAEA’s)”
Relatedly, my former colleague wrote a case study on the social and regulatory context of nuclear power. The report is quite short, but there’s not a single clear section I’d recommend to check for this purpose.
Koen Holtman Jun 4, 2023, 10:56 AM
3 points
0 ∶ 0

Hi Holden! I may be able to get some people in my network interested in submitting a funding request to you for writing a case study.

There are two important questions they would have, that I could not find answers for in your post or form:
1. Are you inviting case studies that you will be able to post on the web when you get them, or that the authors are also allowed to publish themselves as a blog post or academic journal article?
2. Are you inviting case studies for which the authors can request that they are kept confidential?
- Holden Karnofsky Jun 4, 2023, 3:21 PM
  3 points
  0 ∶ 0
  
  Thanks! I’m looking for case studies that will be public; I’m agnostic about where they’re posted beyond that. We might consider requests to fund confidential case studies, but this project is meant to inform broader efforts, so confidential case studies would still need to be cleared for sharing with a reasonable set of people, and the funding bar would be higher.
Michael_Wiebe May 27, 2023, 7:06 AM
3 points
0 ∶ 0

Potential topic: state governments enforcing housing plans on municipalities.
tamgent Jul 29, 2023, 7:30 PM
2 points
0 ∶ 0

Come across this? https://aistandardshub.org/ai-standards-search/