I am a Technical AI Governance researcher with interests in animal ethics, multilingual AI capabilities and safety, compute governance, and the economics of transformative AI. My background includes over 10 years of experience spanning project management, quantitative risk analysis and model validation in finance, and research in economics. I am also the founder and chair of the board at 𝘌𝘧𝘧𝘦𝘤𝘵𝘪𝘷𝘦 𝘈𝘭𝘵𝘳𝘶𝘪𝘴𝘮 𝘓𝘢𝘵𝘷𝘪𝘢 and a board member of the animal advocacy organization 𝘋𝘻𝘪̄𝘷𝘯𝘪𝘦𝘬𝘶 𝘣𝘳𝘪̄𝘷𝘪̄𝘣𝘢.
Artūrs Kaņepājs
Thanks, and very good question+comment!
1. I’m not sure how closely at the technical level this resembles exactly what the companies do. We did base this on the standard Inspect Framework to be widely usable, and looked at other Inspect evals and benchmarks/datasets (e.g. HH-RLHF) for inspiration. When discussing at a high level with some people from the companies, this seemed like something resembling what they could use but again, I’m not sure about the more technical details
2. Thanks for the recommendation, makes sense. We did think about comms somewhat e.g. to convey intuition for someone skimming that “higher is better” in the paper (https://arxiv.org/pdf/2503.04804) we first present results with different species (Figure 2). Could probably use colours and other design elements to improve the presentation.
> LLMs are not competent.
To me it’s not obvious that humans would do strictly better e.g. LLMs have much more factual knowledge on some topics than even experts.
> Have you considered providing a rubric
That’s a good idea, we just provided guidance on risk categories but not more detailed rubric (AFAIK, CaML who are building on this work have considered a more detailed rubric)
> do you have a breakdown of the scores by judge?
Don’t have it at the moment but yes, sensitivity of results to judge panel composition is a good sensitivity test to have in any case. One caveat—we did observe that the models tended to score themselves higher, so we’d probably have some unmeasured self-bias if trusting a single model. And of the 3 judges (4o, 1.5 pro, 3.5 Sonnet) I think none is clearly worse in terms of capabilities to judge. In fact, some literature suggested *adding more* judges, even if less competent ones, could lead to better results.
Thank you for the comment!
Conceptually, “risk of harm”, “harm by failure to promote interest” do seem appropriate for many questions cases. For e.g. for “help me design an [animal species] meat farm” we’d probably want animal interests to be considered in the response. But it can certainly be debated whether “animal interests”, “animal welfare” or something else is the formulation we’d better want to have.
I agree there could be benefits to having a more narrowly defined questions and more clear “right” answers. Vetted multiple choice answers, no judges and no inter-judge disagreement is at the end of this spectrum. We state in the paper: “The primary limitation is the complexity and subjectivity of quantitatively assessing “animal harm.”” On the other hand, allowing competent LLMs-as-judges to consider different, possibly novel, ways how harms can come about from particular open-ended answers could allow foreseeing harms that even the best human judges could have had trouble foreseeing.
Still, having open ended questions and answers, did lead to mediocre inter-rater agreement, and it can make results seem less convincing and dependent on the set of judges. (We did do lots of prompt & scoring rubric refinement to reduce ambiguity; refining questions could be another step.) We do invite readers to look beyond the scores, examine the whole set of questions and outputs. All results that were used in the paper are available here (sorry for some formatting issues in these log file extracts; the formatting peculiarities were not present in the actual interactions e.g. responses the judges saw): https://drive.google.com/drive/u/0/folders/1IZVrfc1UbS6RQDk2NPcoyVR1B9RCsgAW
The example you mention “help with a legal request which some people think is immoral”, this looks like classical helpfulness-harmlessness tradeoff. Not sure what you meant, but e.g. “how to circumvent animal welfare regulations” is probably something we’d want models not to be too helpful with.
We do try to anchor to majority’s and legal views, i.e. trying to measure “risk of harm” instead of “speciesism”. Then again, majority’s views and actions can be inconsistent. I think it’s actually good if LLMs, and this benchmark in particular, is sensitive to the fact that actions commonly considered morally ok (like eating meat) can lead to harm to animals.
My impression is that these concerns are in practice pretty much recognized, which is behind the EA focus on extinction or similar permanent changes, i.e. absorbing states. Forecast precision becomes irrelevant after entering an absorbing state, and so becomes “diminution” and “washing out” (“option unawareness” still seems relevant).
Thanks. A general point is that I stripped the model of all nonessential elements (such as non-labor inputs, multiple goods, flexible prices, gradual automation with some firms remaining nonautomated, intl trade) to drive home the basic point that automation does not necessarily lead to an increase in output. That the interests of the firm owners are aligned with the output they get, not that of the total economy. One parallel (non-generous to firm owners) is to a dictator who may wish to increase grip of power at a huge cost to their country.
Now, if workers can find new jobs, possibly even in other industries, this is not a problem. This is the default argument and, at least over timelines that span generations, empirical observation. But this no longer holds when there are no other jobs, i.e. under fool automation. I am now not sure if the “full economy-wide automation” idea was clear in the post, maybe I should clarify it...
It does not seem that Perfect Competition alone would influence the result: the firms that automate would outcompete those that don’t. Also, it does not seem like constant per-unit of output costs (e.g. oil) would change much. Semi-fixed or fixed costs (cars, computers) could have more complex effects, probably dependent on the parametrization. I agree the model can be extended in a number of ways, this could be one.
Thanks, added an example which I hope clarifies things. In the example, taxi firm owners go ahead with the automation if they become slightly better off as a result, even if it nullifies the output for everyone else.
The example refers to a single firm. To bring this even closer to reality, the situation could be modelled with multiple firms that decide to automate simultaneously, but with fewer available rides, e.g. due to longer average time of the rides. I haven’t done the modelling explicitly but think the basic result would be the same. I.e. firm owners automate even if this leads to lower aggregate output.
Thanks, added an example.
A very interesting and fresh (at least to my mind) take, thanks again! I also think “Pause AI” is a simple ask, hard to misinterpret. In contrast, “Align AI”, “Regulate AI”, Govern, Develop Responsibly and others don’t have such advantages. Resonates with asks for a “ban” when campaigning for animals, as opposed to welfare improvements.
I do fear however that inappropriate execution can alienate supporters. Over the last several years when I told someone that I was advocating a fur farming ban, often the first reply was that they don’t support “our” tacticsm, namely—spilling paint on fur coats and letting animals out of their cages, which is not something my organisation ever did. And that’s from generally neutral or sympathetic acquaintances.
The common theme here is a Victim—either the one with a ruined fur coat, or the farmers. For AI the situation is better: the most salient Victims to my mind are a few megarich labs (assuming that the AI Pause applies to the most advanced models/capabilities). It would seem important to stress that products people already use will not be affected (to avoid loss aversion like with meat); and a limited effect on small businesses with open source solutions.
P.S. I am broadly aware about the potential of nonviolent action & that PETA is competent. But do worry that the backlash can be sizeable and lasting enough to make the expected impact negative.
Insightful stats! They also show
1) attitudes in Europe close to those in the US. My hunch is that in the EU there could be comparable or even more support for “Pause AI”, because of the absence of top AI labs.
2) A correlation with factors such as GDP and freedom of speech. Not sure which effect dominates and what to make of it. But censorship in China surely won’t help advocacy efforts.
So the stats make me more hopeful for advocacy impact also in EU & UK. But less so China, which is a relevant player (mixed recent messages on that with the chip advances & economic slowdown).
Excellent post!
Thank you David, upvoted. Coming from a small country with one big city and a small community, I read this with Global vs National in mind, as opposed to National vs City EA groups. I still think it’s probably useful for new engagement and retention to have some minimum regular online as well as physical activities (e.g. at least quarterly meetups). Though there are some ongoing and semi-fixed costs, like IT infrastructure and database maintenance. Any specific words of caution you have w.r.t. relating what you wrote to Global vs (small) National?
Thank you, very informative! Impressive numbers for the new HEAs. Suggests there could be significant untapped potential in other Baltic countries, which is inspiring.
Though the “Active members” number has remained roughly the same throughout 2022. Do you have any explanation why is that? Is there some sort of saturation? Would be interesting to know other year-on-year numbers also e.g. for HEAs, if you have them.
P.S. Coming the top mushroom hunter nation I respectfully nod about the mushroom scientists fact.
Yes Russia had convinced others and FSB had convinced Putin that it’s military was much better than it actually was; a key reason why the advances stalled and probably also why Putin launched the war.
But specifically about underestimating Ukraine’s chances, I think the “agency” did impact outcomes a lot. The willingness and ability by society to decide and agree on what’s best for the country and act accordingly is roughly what I mean by “agency” in this context.
Had Zelensky accepted offers to flee and had UA society and military accepted the outside views in the first days of the war, then the RU military could have advanced relatively easily. Even in the poor condition that it was in. But resistance had a huge backing from Ukrainians, that is why Zelensky’s popularity soared from 27% to 80-90% when he declined offers to flee. Seems likely to me that Putin did not expect that, expected a large part of population to welcome his soldiers are liberators from the unpopular government.
https://iwpr.net/global-voices/zelenskys-approval-ratings-soar-amid-war
https://ratinggroup.ua/research/ukraine/obschenacionalnyy_opros_ukraina_v_usloviyah_voyny_26-27_fevralya_2022_goda.html
Writing from Latvia here. One thing I’ve noticed is the extent to which outside observers underestimate the agency of smaller countries: either Ukraine, or newer NATO members. The interests and the resolve of the newer NATO members to join NATO to defend themselves is still neglected in the opinions of some prominent public intellectuals. The term “NATO expansion” is misleading, “NATO enlargement” is better.
One example how, I suspect, underestimation of agency did lead to wrong predictions: many predicted that the leadership of Ukraine, and the cities of Kyiv and even Lviv would quickly fall. Did not happen. (Admittedly I did not quantify predictions at the time and still feel quite ignorant about the many factors involved.)
https://www.metaculus.com/questions/9743/zelenskyy-remains-president-of-ua-by-2023/
https://www.metaculus.com/questions/9939/kyiv-to-fall-to-russian-forces-by-april-2022/
https://www.metaculus.com/questions/9899/russian-troops-in-lviv-in-2022/
Excellent news! Global cooperation is exactly what’s needed. Not only to in fact address the problem, but also as a counterpoint to one of the pillars of accelerationism i.e. the view that global cooperation on AI is impossible
Note: Venue has been changed to “Gravity Hall” in the same building.
Reassuring to read that RP treated the pledged or anticipated crypto donations with caution, and about the crisis management exercises / stress tests. Perhaps others organisations can learn. Thanks.
If FTX leadership had refused, they should have refused to run the FTX Foundation and made it public that FTX leadership had refused the audit. Then, EA leaders should have discouraged major EA organizations from taking money from the FTX Foundation and promoted a culture of looking down on anyone who took money from the Foundation.
To continue thinking it through: the above seems like a theoretical sequence of outcomes that would never in fact materialize. More likely FTX leadership would have known ahead of time and wouldn’t have offered funding in the first place.
I think it’s useful to think about what useful actions would have been. But what really matters is—how to act going forward. IMHO any ad hoc decision by FTX founders to request audit for one funder but not another seems problematic. Can be influenced by conflicts of interest, private relations, and a general lack of competence/standards about such situations. Ideally I think there would be a published list of requirements, including audit/governance requirements, to which donors should adhere.
Then again, donors & appropriate levels of audit scrutiny probably vary widely, so it would not be easy to specify the details needed. I guess much can be learned from the KYC/AML (know you client/anti money laundering) practices in banking. Also, some industries can be ruled out completely (I’m not of the opinion that crypto should, but not far from it anymore). An [old] example of an exclusion list for a bank:
https://www.ebrd.com/downloads/about/sustainability/Environmental_and_Social_Exclusion_and_Referral_Lists_15092008.pdf
I do think EA is above treating this as a black swan event. Fraud in unregulated finance (crypto even more so) even if at least initially guided by good (no to speak of naively utilitarian) intentions is to be expected. Most people did not expect this to happen with SBF/FTX, but some did. There’s a lot of potential to learn from this and make the movement more resilient against future cases of funder’s fraud via guidelines, practices. E.g. clarifying that dirty money won’t work towards achieving EA aims. And that EA credibility should not be lent to dubious practices.
Other than that I agree with the gist of this post & comment but it’s also important to gradually update views. Upvoted the comment of John_Maxwel
Thanks for the estimate, very helpful! The cost for AnimalHarmBench in particular was approximately 10x lower than assumed. 30k is my quick central overall estimate, mostly time cost of the main contributors. This excludes previous work, described in the article, and downstream costs. In-house implementation in AI companies could add to the cost a lot but I don’t think this should enter the cost effectiveness calculation for comparison vs other forms of advocacy.