Executive Summary
Plenty of attention rests on artificial intelligence developers’ non-technical contributions to ensuring safe development of advanced AI: Their corporate structure, their internal guidelines (‘RSPs’), and their work on policy. We argue that strong profitability incentives increasingly force these efforts into ineffectiveness. As a result, less hope should be placed on AI corporations’ internal governance, and more scrutiny should be afforded to their policy contributions.
TL;DR
Only Profit-Maximizers Stay At The Frontier
Investors and compute providers have extensive leverage over labs and need to justify enormous spending
As a result, leading AI corporations are forced to maximize profits
This leads them to advocate against external regulatory constraints or shape them in their favor
Constraints from Corporate Structure Are Dangerously Ineffective
Ostensibly binding corporate structures are easily evaded or abandoned
Political and public will cannot be enforced or ensured via corporate structure
Public pressure can lead to ineffective and economically harmful non-profit signaling
Hope In RSPs Is Misguided
RSPs on their own can and will easily be discarded once they become inconvenient
Public or political pressure is unlikely to enforce RSPs against business interests
RSP codification is likely to yield worse results than independent legislative initiative
Therefore, much less attention should be afforded to RSPs.
For-Profit Policy Work Is Called Corporate Lobbying
For-profit work on policy and governance is usually called corporate lobbying
In many other industries, corporate lobbying and public-interest advocacy are understood as opposing forces
Corporate lobbying output should be understood as constrained by business interests
Talent allocation and policy attention should be more skeptical of corporate lobbying.
Introduction
Advocates for safety-focused AI policy often portray today’s leading AI corporations as caught between two worlds: Product-focused, profit-oriented commercial enterprise on the one hand, and public-minded providers of measured advice on transformative AI and its regulation on the other. AI corporations frequently present themselves in the latter way, invoking the risks, harms, and transformative potential of their technology in hushed tones, while at the same time heralding the profits and economic transformations ushered in by their incoming top-shelf products. When these notions clash and profit maximization prevails, surprise and indignation frequently follow: The failed ouster of OpenAI CEO Sam Altman revealed that profit-driven Microsoft was a much more powerful voice than OpenAI’s non-profit board, and the deprioritization of its superalignment initiative, reportedly in favor of commercial products, reinforced that impression. Anthropic’s decision to arguably push the capability frontier with its latest class of models revealed that its reported private commitments to the contrary did not constrain it, and DeepMind’s full integration into the Google corporate structure has curtailed hope in its responsible independence.
Those concerned about safe AI might deal with that tension in two ways: Put pressure on and engage with the AI corporations to make sure that their better angels have a greater chance of prevailing; or take a more cynical view and treat large AI developers as just another private-sector profit maximizer—not ‘labs’, but corporations. This piece argues for the latter. We examine the nature and force of profit incentives and argue they are likely to lead to a misallocation of political and public attention to company structure; a misallocation of policy attention to AI corporations’ internal policies; and a misallocation of political attention and safety-motivated talent to lobbying work for AI corporations.
Only Profit-Maximizers Stay At The Frontier
Investors and compute providers have extensive leverage over labs and need to justify enormous spending
As a result, leading AI corporations are forced to maximize profits
This leads them to advocate against external regulatory constraints or shape them in their favor
Economic realities of frontier AI development make profit orientation a foregone conclusion. Even if an AI corporation starts out not primarily motivated by profit, it might still have no choice but to chase maximal profitability: As the track record of Anthropic, the seemingly most safety-minded lab, clearly demonstrates, even a maximally safetyist lab has to remain at the frontier of AI development in order to understand the technical realities at the actual model frontier, to attract the relevant talent, and to motivate the necessary compute investments. The investments required to scale up computing power and remain at this frontier are enormous, and they are only provided by profit-driven major technology corporations or, to a much lesser extent, large-scale investors. And a profit-driven tech corporation seems exceedingly unlikely to stake astronomical capex on an AI corporation that does not give off the unmistakable impression of pursuing maximal profits. If a publicly traded company like Microsoft had serious reason to believe that a model it spent billions of dollars on might simply not be released because of altruistically motivated safety considerations that go beyond liability risks, its compute expenditure would be strategically imprudent at best and a breach of fiduciary duty at worst. This is especially true compared to the counterfactual of funding model development at another, less safety-minded corporation or at an in-house lab. And these external pressures might well be mounting in light of rising doubts around the short-term profitability of frontier AI development.
So, even a safety-motivated lab has no choice but to act as if it were an uncompromising profit-maximizing entity—lest its compute providers take their business elsewhere and the lab can no longer compete. The effects are clearly visible: When OpenAI board members briefly tried to change course, Microsoft came within hours of taking away most of the company’s compute and talent; in a compute crunch, OpenAI’s superalignment team reportedly got the short end of the stick; and OpenAI might well be gearing up for an IPO that does away with what’s left of its safety-focused structure. The recent, presumably antitrust-motivated surrender of Apple’s and Microsoft’s respective board seats at OpenAI does cast some doubt on the institutional side of this mechanism, but it is important to note that Microsoft did not have a board seat before it managed to strong-arm OpenAI into rehiring Altman—if anything, that episode showed how little the board matters. It would be exceedingly surprising if Microsoft’s main competitors did not have, or at least wrestle for, a similarly iron grip on their respective AI corporations. If they succeed, these labs become full-on profit maximizers; if they don’t, profit-maximizing labs backed by investors and compute providers might take their place. This determines the role of AI corporations in the policy and governance debate. With AI attracting rapid technical innovation, financial investment, and media attention, it seems destined to be a key technology of the 21st century. This strategic importance will inevitably draw public attention, political pressure, and regulatory intervention in favor of making AI safe, controlled, and beneficial.
Profit maximization is not necessarily at odds with that goal—often, safe and reliable AI is the best possible product to release. Much safety-relevant technical progress, such as advances in making AI more predictable, more responsive to instructions, or more malleable to feedback, has also made for meaningful product improvements. This incentive has its limitations—in race situations that offer great economic benefit from being first to reach meaningful capability thresholds, larger and larger risks of unsafe or unreliable releases might become economically tenable.
But more importantly, business interest is often at grave odds with external constraints. It is near-universally accepted that geostrategically and infrastructurally critical industries ought to be externally constrained through regulation and oversight—even where market pressures favor safe products. Such oversight is usually thought to ensure strategic sovereignty, reduce critical vulnerabilities, and prevent large-scale failures. Profit maximizers will not willingly accept such constraints, even where they publicly endorse them in principle—because they believe they might know best, because compliance is expensive, and because unconstrained behavior might sometimes be very profitable. Behind closed doors, the largest AI corporations have been some of the fiercest opponents of early legislative action that might constrain them, whether that is the EU’s AI Act, California’s SB-1047, or early initiatives in D.C. and elsewhere. Where constraints seem unavoidable, business interest suggests shaping them to be minimally restrictive or even advantageous: Companies might advocate for legal mechanisms that best fit their own technical abilities or that prevent smaller competitors from catching up. This understanding of AI corporations’ profitability motives and their resulting role in policy dynamics should lead us to reassess their activity in three areas: Corporate structure, Responsible Scaling Policies, and corporate lobbying.
Constraints from Corporate Structure Are Dangerously Ineffective
Ostensibly binding corporate structures are easily evaded or abandoned
Political and public will cannot be enforced or ensured via corporate structure
Public pressure can lead to ineffective and economically harmful non-profit signaling
AI corporations, in asserting their binding commitment to responsible development, often point to internal governance structures. But because of the primacy of profitability even in safety-minded labs, discussed above, arcane corporate structures might do very little to counteract it. This is true for three reasons: Firstly, profit maximization can easily be argued to be a necessary condition for a valuable contribution to safety, as it secures the talent and compute required to begin with. So, even under an ostensibly prohibitive pro-safety structure, a single lab might justify pushing for profits and capabilities—look at Anthropic and its recent decision to release arguably the most advanced suite of LLMs to date. Secondly, corporate structures are often not very robust to change driven by executive leadership and major investors—again, it is easy to point at OpenAI as a recent example of a quick de facto change in governance setup that could not even be halted by one of its founders. Thirdly, in many cases, existing companies might be easily exchangeable shells for what really matters under the hood: Compute that could be reassigned by the compute providers, and research teams that can be poached and will follow the compute. That kind of move might be legally contentious, but given the massive asymmetries in legal resources and potential contractual provisions around model progress in compute agreements, it seems like a highly believable threat. Present corporate structures that suggest a stronger focus on safety or the common good might hence be best understood more cynically: They are only there because they have not interfered with profitability just yet, and can readily be evaded or dismissed one way or another. Any other understanding is a set-up for miscalibrated expectations and disappointment, as in the case of the internal changes at OpenAI. Drawing the right lessons from that episode is important: Alleging that Sam Altman turned out to be a uniquely deceptive or Machiavellian figure, or that OpenAI has undergone some surprising hostile takeover, misses the point and sets up future misconceptions. The lesson should be that there was a failure to understand the real distribution of power in corporate governance.
Expecting corporate governance to enforce the public will to make AI safe might not only be mistaken; it could also soften pressure on political institutions to craft meaningful policies that address the societal challenges posed by AI. Insofar as the ineffectiveness of safety-minded corporate structures is not immediately obvious, the public and policymakers could be inclined to believe that AI corporations are already sufficiently aligned with the public interest. This misunderstanding should be avoided: Public attention should pivot towards policymakers and the adequacy of their regulatory frameworks rather than dwelling on corporate board reshufflings. This shift might even benefit AI corporations—by reducing the resources they spend on charades like complex corporate structures designed to give an impression of safety focus. Treating them like any other private company might in turn relieve them of the burden of costly non-profit signaling.
Hope In RSPs Is Misguided
RSPs on their own can and will easily be discarded once they become inconvenient
Public or political pressure is unlikely to enforce RSPs against business interests
RSP codification is likely to yield worse results than independent legislative initiative
Therefore, much less attention should be afforded to RSPs.
Secondly, a rosy view of AI corporations’ incentives leads to overrating the relevance of corporate governance guidelines set by the labs themselves. Responsible scaling policies (RSPs) are documents that outline the safety precautions taken around advanced AI—e.g. which models should undergo which evaluations, the necessary conditions for model deployment, or development red lines—see those from OpenAI, Google DeepMind, and Anthropic. These RSPs are often met with interest, attention, sometimes praise, and sometimes disappointment from safety advocates, and they frequently feature in policy proposals and political discussion.
Unfortunately, there is very little reason to believe that RSPs deserve this attention. Firstly, of course, RSPs by themselves lack an external enforcement mechanism. No one can compel an AI corporation to comply with its RSP, nor to keep it once leadership feels it has become inconvenient. RSPs are simply a public write-up of internal corporate governance, valid exactly as long as company leadership decides they are. An optimistic view of RSPs might be that they are a good way to hold AI corporations accountable—that public and political attention would somehow be able to sanction labs once they diverged from their RSPs. Not only is this a fairly convoluted mechanism of efficacy, it also seems empirically shaky: Meta is a leading AI corporation with industry-topping amounts of compute and talent, and it does not publish RSPs. This seems neither to have garnered impactful public and political scrutiny nor to have hurt Meta’s AI business.
This lack of enforceability is sometimes thought to be answerable through RSP codification: RSPs might be codified, i.e. implemented in the form of binding law. This describes, in effect, a legislative process: For RSPs to be binding or externally enforceable in a meaningful sense, someone would have to be empowered to carry out an external, neutral evaluation of compliance and to enforce the measures often stipulated in the RSPs—for instance, whether a model ought to be shut down, a planned deployment ought to be canceled, or training ought to be stopped. At present, it is difficult to conceive how this evaluation and enforcement could happen other than through executive action empowered by legislative mandate. So RSP codification is in effect simply a safe AI law—with one notable difference: We do not start from a blank piece of paper, but from an outline of what AI corporations might like the regulation to entail. The advantages of that approach might come from AI corporations’ relevant expertise—which we discuss later—or from their increased buy-in. But it seems unclear why exactly buy-in is required: There is substantial political and public appetite for sensible AI regulation, and by and large, whether private companies want to be regulated is usually not a factor in our democratic decision to regulate them. The downside of choosing an RSP-based legislative process should be obvious—it limits, or at least frames, the option space to the concepts and mechanisms provided by the AI corporations themselves. But this might be a harmful limitation: As we have argued above, these companies are incentivized to mainly provide mechanisms that they might be able to evade, that fit their idiosyncratic technical advantages, that strengthen their market position, and so on. RSP codification hence seems like a worse route to safe AI legislation than standard regulatory and legislative processes.
Additionally, earnest public discussion and advocacy for the codification of RSPs may give policymakers the superficial impression that corporate governance is adequately addressing safety concerns. All of these are reasons why even strictly profit-maximizing AI corporations might publish RSPs—doing so quells regulatory pressure and shifts and frames the policy debate in their favor. Hence, affording outsized attention to RSPs and conceiving of RSP codification as a promising legislative approach is unlikely to lead to particularly safe regulation. It might, however, instill a false sense of confidence, reduce the political will to regulate, or tilt the regulatory process in favor of industry. We should care much less about RSPs.
For-Profit Policy Work Is Called Corporate Lobbying
For-profit work on policy and governance is usually called corporate lobbying
In many other industries, corporate lobbying and public-interest advocacy are understood as opposing forces
Corporate lobbying output should be understood as constrained by business interests
Talent allocation and policy attention should be more skeptical of corporate lobbying.
Thirdly, AI corporations’ lobbying efforts currently have a strangely mixed standing in safety policy debates. Leading AI corporations employ large teams dedicated to policy (similar, by all accounts, to government affairs teams at other corporations) and to governance (with less of a direct equivalent). On one hand, extensive lobbying efforts by AI corporations are often viewed critically, especially as they attempt to weaken regulatory constraints. On the other hand, these teams sometimes appear as allies of safety policy advocacy. Labs’ lobbying and governance teams frequently recruit from safety advocates and entertain close relationships with the safety coalition. In fact, working for an AI corporation is often considered an advisable, desirable career step for non-profit safety advocates. This integration is highly unusual. In other industries, transfers from non-profit work to corporate lobbying happen, but they are usually not considered to be in service of the non-profit’s goals, and might at best be called green- (or safety-, or clean-, ...) washing and at worst a betrayal of the cause.
Governance teams concerned with developing ostensibly impartial ideas on how to govern their own technology are also not very common in other industries. This phenomenon may be best understood in light of its historical context. Because AI only recently emerged from a long prehistory of largely theoretical speculation about its capabilities, research labs became generalist hubs on the technology, as no one—beyond the labs and academia—had the technical knowledge of how to regulate it safely, or much political interest in thinking about it at all. But at least since the release of ChatGPT and the ensuing surge of AI interest and investment, political institutions and civil society organizations have been catching up. Dedicated bodies, such as the EU AI Office and the US & UK AI Safety Institutes, have been established and equipped with leading talent, and there is a thriving new ecosystem of AI think tanks and advocacy groups.
It would be a mistake to burden the discussion around frontier AI with the tribalistic rhetoric often present in regulatory debates in other industries. But its prevalence in virtually all of these other sectors shows how the respective roles and relationships of corporate lobbyists, non-profit and academic advisors, and policymakers have historically been defined. Forfeiting this adversarial dynamic points to a perhaps naive understanding of what profit-maximizing AI corporations will allow their policy and governance teams to do. It is simply implausible that labs would, in due time and following their increased commercialization, pay a substantial department to create policy-related outputs that do not directly further their policy goals. Again, the cynical perspective might be most informative: Profit-maximizing companies pay policy and governance teams to shape policy and governance in a way that maximizes their profits, while avoiding researching and publishing any policy proposal that could hinder their AI products and thereby their future financial success. This can be enforced at the executive level by directly vetoing any potent safety policy or—more indirectly—by cutting financial, organizational, and compute resources for safety research. On a personal level, employees might be inclined to practice anticipatory obedience and avoid going head-to-head with their employers, in turn protecting their salary, shares, reputation, and future impact in steering AI progress from the inside. Besides, working for a big AI corporation building a transformative technology like no other is of course exciting—even a cautious individual could be at least somewhat captivated by the thought of shaping rapid, utopian technological progress and therefore choose to remain on board. No one in these teams needs to be ill-intentioned or self-serving for this mechanism to take hold. In fact, there might be a lot of value in creating policy that ensures progress and profit in an industry that promises as much economic and societal value as AI. We just claim that this value does not lie in first-order progress on making regulation safer.
This leads to a dynamic wherein leading governance talent with high standing in non-profit and government AI policy institutions makes suggestions that serve business interests, but faces much less scrutiny than corporate lobbying would in other industries. This shapes the policy debate toward the interests of the few AI corporations that currently entertain ostensibly safety-focused governance and policy teams. Ultimately, it also risks a misallocation of governance talent. Right now, working for the governance or policy team of a major lab remains a dream job for many ambitious, safety-minded individuals, further fueled by the social and cultural proximity between AI corporation employees and safety researchers and activists. This human capital could be more effectively deployed in less constrained positions, such as at governmental institutions or in research and advocacy roles.
This does not apply to safety-minded individuals conducting technical safety work at AI corporations. Much research progress on the technical level is incredibly beneficial to safety, and where it is, incentives of labs and safety advocates align—labs also want to build safe products. The misalignment only exists where policy, i.e. an external force that compels the labs, is concerned. So technical work remains valuable on any cynical understanding of lab incentives—and is also presumably sufficient to cash in on some of the benefits of having safety-minded employees at AI corporations, such as whistleblowing options and input on overall corporate culture.
Conclusion
The days of AI developers as twilight institutions between start-ups and research labs are at best numbered and at worst over. The economics of frontier AI development will render them profit-maximizing corporate entities. We believe this means that we should treat them as such: No matter their corporate structure, we should expect them to choose profits over long-term safety; no matter their governance guidelines, ensuring safety should be the realm of policy; and no matter the intentions of their governance teams, we should understand their policy work as corporate lobbying. If we do not, we might face some rough awakenings.
This is particularly relevant given the recent letter from Anthropic on SB-1047.
I would like to see a steelman of the letter since it appears to me to significantly undermine Anthropic’s entire raison d’etre (which I understood to be: “have a seat at the table by being one of the big players—use this power to advocate for safer AI policies”). And I haven’t yet heard anyone in the AI Safety community defending it.
I believe that Anthropic’s policy advocacy is (1) bad and (2) worse in private than in public.
But Dario and Jack Clark do publicly oppose strong regulation. See https://ailabwatch.org/resources/company-advocacy/#dario-on-in-good-company-podcast and https://ailabwatch.org/resources/company-advocacy/#jack-clark. So this letter isn’t surprising or a new betrayal — the issue is the preexisting antiregulation position, insofar as it’s unreasonable.
Can you say a bit more about:
“worse in private than in public”
?
A few DC and EU people tell me that in private, Anthropic (and others) are more unequivocally antiregulation than their public statements would suggest.
I’ve tried to get this on the record—person X says that Anthropic said Y at meeting Z, or just Y and Z—but my sources have declined.
I’ve heard similar things, as well as Anthropic throwing their weight as a “safety” company to try to unduly influence other safety-concerned actors.
Thank you for writing this! It’s probably the most clear and rigorous way I’ve seen these arguments presented, and I think a lot of the specific claims here are true and important to notice.
That being said, I want to offer some counterarguments, both for their own sake and to prompt discussion in case I’m missing something. I should probably add the disclaimer that I’m currently working at an organization advocating for stronger self-governance among AI companies, so I may have some pre-existing biases toward defending this strategy. But it also makes this question very relevant to me and I hope to learn something here.
Addressing particular sections:
This section is interesting and reminds me of some metaphors I’ve heard comparing the mechanism of free markets to Darwinism… i.e. you have to profit-maximize, and if you don’t, someone else will and they’ll take your place. It’s survival of the fittest, like it or not. Take this naïve metaphor seriously enough and you would expect most market ecosystems to be “red in tooth and claw,” with bare-minimum wages, rampant corner-cutting, nothing remotely resembling CSR/ESG, etc.
One problem is: I’m not sure how true this is to begin with. Plenty of large companies act in non-profit-maximizing ways simply out of human error, or passivity, or because the market isn’t perfectly competitive (maybe they and their nearest rivals are benefitting from entrenchment and economies of scale that mean they no longer have to), or perhaps most importantly, because they are all responding to non-financial incentives (such as the personal values of the people at the company) that their competitors are equally subject to.
But more convincingly, I think social good / avoiding dangerous accidents really are just more aligned with profit incentives than the metaphor would naively suggest. I know your piece acknowledges this, but you also write it off as having limitations, especially under race conditions aiming toward a particular capabilities threshold.
But that doesn’t totally follow to me — under such conditions, while you might be more open to high-variance, high-risk strategies to reach that threshold, you might also be more averse to those strategies since the costs (direct or reputational or otherwise) imposed by accidents before that threshold is reached become so much more salient. In the case of AI, the costs of a major misuse incident from an AI product (threatening investment/employee retention/regulatory scrutiny/etc.) might outweigh the benefits of moving quickly or without regard to safety — even when racing to a critical threshold. A lot of this probably depends on how far off you think such a capability threshold is, and where relative to the frontier you currently are. This is all to say that race dynamics might make high-variance high-risk strategies more attractive, but they also might make them less attractive, and the devil is probably in the details. I haven’t heard a good argument for how the AI case shakes out (and I’ve been thinking about it for a while).
Also, correct me if I’m wrong, but one thing the worldview you write about here would suggest is that we shouldn’t trust companies to fulfill their commitments to carbon neutrality, or that if they do, they will soon no longer be on the forefront of their industry — doing so is expensive, nobody is requiring it of them (at least not on the timeline they are committing to), the commitment is easy to abandon, and even if they do it, someone who chooses not to will outcompete them and take their place at the forefront of the market. But I just don’t really expect that to happen. I think in 2030 there’s a good chance Apple’s supply chain will be carbon-neutral, and that they’ll still be in the lead for consumer electronics (either because the reputational benefits of the choice, and the downstream effects it has on revenue and employee retention and whatnot, made it the profit-maximizing thing to do, and/or because they were sufficiently large/entrenched that they can just make choices like that due to non-financial personal/corporate values without damaging their competitive position, even when doing so isn’t maximally efficient.)
Early in the piece, you write:
“And a profit-driven tech corporation seems exceedingly unlikely to stake astronomical capex on an AI corporation that does not give off the unmistakable impression of pursuing maximal profits.”
But we can already prove this isn’t true given that OpenAI has a profit cap, their deal with Microsoft had a built-in expiration, and Anthropic is a B-corp. Even if you don’t trust that some of these measures will be adhered to (e.g. I believe the details of OpenAI’s profit cap quietly changed over time), they certainly do not give off the unmistakable impression of maximal profit seeking. But I think these facts exist because either (1) many of the people at these companies are thinking about social impact in addition to profit, (2) social responsibility is an important intermediate step to being profitable, or (3) the companies are so entrenched that there simply are no alternative, more profit-maximizing firms who can compete, i.e. they have the headroom to make concessions like this, much as Apple can make climate commitments. I’m not sure what the balance between these three explanations is, but #1 and #3 challenge the strong view that only seemingly hard-nosed profit-maximizers are going to win here, and #2 challenges the view that profit-maximizing is mutually exclusive with long-term safety efforts.
All this considered, my take here is instead something like “We should expect frontier AI companies to generally act in profit-maximizing ways, but we shouldn’t expect them to always be perfectly profit-maximizing across all dimensions, nor should we expect that profit-maximizing is always opposed to safety.”
I don’t have a major counterargument here, aside from noting that well-documented and legally recognized corporate structures can often be pretty effective, thanks in part to judges/regulators getting input on when and how they can be changed, and while I’m no expert, my understanding is that there are ways to optimize for this.
But your idea that companies are exchangeable shells for what really matters under the hood — compute, data, algorithms, employees — seems very true and very underrated to me. I think of this as something like “realpolitik” for AI safety. What really matters, above ideology and figureheads and voluntary commitments, is where the actual power lies (which is also where the actual bottlenecks for developing AI are) and where that power wants to go.
The claim that “RSPs on their own can and will easily be discarded once they become inconvenient” seems far too strong to me — and again, if it were true, we should expect to see this with all costly voluntary safety/CSR measures that are made in other industries (which often isn’t the case).
A few things that may make non-binding voluntary commitments like RSPs hard to discard:
It’s really hard to abandon them without looking hypocritical and untrustworthy (to the public, to regulators, to employees, to corporate partners, etc.)
In large bureaucracies, lock-in effects make it easy to create new teams/procedures/practices/cultures and much harder to change them.
Abandoning these commitments can open companies up to liability for deceptive advertising, misleading shareholders, or even fraud if the safety practices were used to promote e.g. an AI product, convince investors that they are a good company to support, convince regulators or the public that they can be trusted, etc. I’m not an expert on this by any means, nor do I have specific examples to point to right now, so take this one with a grain of salt.
There’s also the fact that RSPs aren’t strictly an invention of the AI labs. Plenty of independent experts have been involved in developing and advocating for either RSPs or risk evaluation procedures that look like them.
Here, I think a more defensible claim would be “The fact that RSPs may be easily discarded when inconvenient should be a point in favor of binding solutions like legislation, or at least indicate that they should be considered one of many potentially fallible safeguards for a defense-in-depth strategy”
Minor factual point: probably worth noting that Meta, as well as most leading AI labs, has now committed to publishing an RSP. Time will tell what its policy ends up looking like.
It’s true that the presence of, and quality of, RSPs at individual companies doesn’t seem to have translated to any public/political scrutiny yet. I’m optimistic this can change (it’s what I’m working on), or perhaps even will change by default once models reach a new level of capabilities that make catastrophic risks from AI an ever-more-salient issue among the public.
This is a question: my understanding is that the RSP model was specifically inspired by regulatory pathways from other industries, where voluntary measures like this got codified into what is now seen (in retrospect) as sensible policy. Is this true? I can’t remember where I heard it, and can’t find mention of it now, but if so, it seems like those past cases might be informative in terms of how successful we can expect the RSP codification strategy to be today.
That actually brings me to one last meta point that I want to make, which is that I am tempted to think that we are just in a weird situation where there are psychological facts about the people at leading profit-driven AI labs that make the heuristic of profit maximization a poor predictor of their behavior, and a lot of this comes down to genuine, non-financial concern about long-term safety.
Earlier I mentioned how even in a competitive market, you might see multiple corporations collectively acting in non-profit-maximizing ways due to non-financial incentives acting on the decision-makers at each of those companies. Companies are full of humans who make choices for non-financial reasons, like wanting to feel like a good person, wanting to have a peaceful home life where their loved ones accept and admire them, and genuinely wanting to fix problems in the world. I think the current psychological profile of AI lab leaders (and, indeed, the AI lab employees that hold the “real power” under the hood) is surprisingly biased toward genuine concern about the risks of AI. Many of them correctly recognized, way before anyone else, how important this technology would be.
Sorry for the long comment. I do think AI labs need fierce scrutiny and binding constraints, and their incentives are largely not pointing in the right place and might bias them toward putting profit over safety — again, this is my main focus right now — but I’m also not ready to totally write off their ability to adopt genuinely valuable and productive voluntary measures to reduce AI risk.