Paolo Bova

Karma: 74

I’m an early-career independent researcher who graduated in Economics from the University of Cambridge in 2019. I’m part of Modeling Cooperation, a team of independent researchers who work to build computational models and software tools for understanding the consequences of competition in transformative AI. We’ve previously investigated the consequences of a Windfall Clause in a model of AI Existential Safety (under review, see preprint on arXiv at: https://arxiv.org/abs/2108.09404). My current work focuses on building a model to explore policies to promote more resources for AI Safety research.

In October I’m starting a PhD at Teesside University on “Understanding dynamics of AI Safety development through behavioural and network modelling”.

Paolo Bova 22 Jun 2026 23:52 UTC
1 point
0 ∶ 0
on: Community Polls on Alignment Controversies
Multipolar worlds will compete away >90% of net value that would otherwise be preserved.
Assuming multipolar worlds where humans retain control but loss-of-control risks are still real: most models of AI tech races suggest strategic behavior competes away most future value, at least in the worst cases (Armstrong et al, Han et al, Stafford et al, Emery-Xu et al, Jensen et al). While this is also true for unipolar scenarios (power concentration can lock in risks that eliminate most of the value of the future), multipolar worlds are unique in that even when the players internalise much of the risk, they race to the bottom on safety (see the traveller’s dilemma or Armstrong et al’s racing to the precipice). They can even escalate into destructive conflict if they feel especially threatened by their rivals (see the crisis bargaining literature, or, for a more optimistic take, superintelligence strategy).
If we assume the AI systems are in control and in competition with one another to achieve their own goals, then many of the above issues could be amplified by faster AI optimisation that may be more likely by default to neglect other values humans (and other beings) care about. On the other hand, sophisticated AI systems could establish coordination mechanisms with each other. This is also true of global powers, who could work to establish verification regimes for international AI Governance. It’s not clear that AI systems would be better or worse than governments, but I lean towards worse by default.
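To make the race-to-the-bottom intuition concrete, here is a minimal sketch of the traveller’s dilemma mentioned above. It assumes the standard textbook setup (claims between 2 and 100, with a bonus and penalty of 2). Reading a “claim” as a stand-in for how much safety a developer maintains is only a loose analogy of mine, and the function names are hypothetical, not taken from any of the cited papers.

```python
def payoff(mine: int, theirs: int, bonus: int = 2) -> int:
    """Payoff to the player claiming `mine` against a rival claiming `theirs`."""
    if mine == theirs:
        return mine
    low = min(mine, theirs)
    # The lower claimant is rewarded, the higher claimant is penalised.
    return low + bonus if mine < theirs else low - bonus


def best_response(theirs: int, lo: int = 2, hi: int = 100) -> int:
    """Claim that maximises my payoff, given the rival's claim."""
    return max(range(lo, hi + 1), key=lambda claim: payoff(claim, theirs))


def race_to_bottom(start: int = 100) -> list[int]:
    """Alternate best responses, each to the previous claim, until nobody wants to move."""
    path = [start]
    while True:
        nxt = best_response(path[-1])
        if nxt == path[-1]:
            return path
        path.append(nxt)


if __name__ == "__main__":
    path = race_to_bottom()
    print(path[:3], "...", path[-3:])
```

Each player gains by undercutting the other by one, so the sequence of best responses only stops at the minimum claim, even though both players would be far better off if they both stayed at the maximum.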

Paolo Bova 22 Jun 2026 23:21 UTC
1 point
0 ∶ 0
on: Community Polls on Alignment Controversies
Partially aligned transformative AIs are likely to be stable under reflection
I think this is very unlikely. I’m assuming continual learning becomes more important and, as Pachiardi et al 2025 (https://cl-eval.github.io/) point out, the dynamics of continual learning have failure modes like chaos and hard-to-predict convergence. How agents learn values could depend a lot on experiences. I can imagine partially aligned tAIs are set to work on lots of parts of the economy and they could face many experiences where moral principles are compromised for the sake of efficiency or profit. I’m not convinced armchair reflection by the AIs would prepare them for continual learning.

Paolo Bova 22 Jun 2026 23:12 UTC
1 point
0 ∶ 0
on: Community Polls on Alignment Controversies
AI alignment to humans will in practice avoid moral catastrophes to digital minds
Reasons are similar to the same poll but for animals. Humans by default are likely to underweight the importance of digital minds (surveys suggest people will dismiss digital minds as not having a soul), so alignment to human preferences likely means respecting human preferences to use digital minds as machines for achieving their goals. It’s perhaps easier than for animals because digital minds are likely to express themselves directly in ways humans could empathise with (but this could be buried if interactions are increasingly agent-to-agent).

Paolo Bova 22 Jun 2026 23:06 UTC
1 point
0 ∶ 0
on: Community Polls on Alignment Controversies
Robust alignment requires alignment-relevant intervention during pretraining
I’m interpreting this as saying a necessary condition for robust alignment is training data that captures good values and discourages bad values. I think there’s good evidence this matters a lot for current systems, so I lean towards agreeing. It’s still plausible to me that robust alignment could be achieved with post-training interventions and relatively neutral pre-training setups.

Paolo Bova 17 Jun 2026 22:01 UTC
1 point
1 ∶ 0
on: Community Polls on Alignment Controversies
AI alignment to humans will in practice avoid moral catastrophes to animals
Humans are currently very motivated to perpetuate moral catastrophes to animals. If AI alignment means aligned to the intent of their users, then AI systems help humans perpetuate moral catastrophes. If AI alignment is in terms of human moral preferences, then even a well-chosen mechanism for aggregating human preferences will select for speciesist values. There is a strong sense in which avoiding moral catastrophes to animals is usually misaligned with human preferences. Admittedly the same could be said of other moral issues such as attitudes towards outgroups and foreigners. There appears to be room in the current human alignment agenda for ensuring AI does not succumb to tribal prejudices, so there is likely scope for compatibility between the current alignment agenda and avoiding moral catastrophes to animals. It does not happen by default and, given how deep speciesism goes, it is likely much harder to avoid. Hence I still disagree with this poll as written.

Paolo Bova 11 May 2026 19:22 UTC
1 point
0 ∶ 0
on: AGI Multi-Agent Alignment Simulation
This is super cool work, David and Zoe!

It’s rare to see LLM games that contain this much structure (you have a discrete set of actions which update a world state, and even a bunch of shocks). The other thing I was impressed by is the three different LLM judges. Looking forward to seeing more visualisations.
I have a few questions.
- Were there any challenges to getting the judges to behave reliably?
- You mentioned seeing if there were stable ways for players to coordinate on AI alignment in the face of competitive pressure. From your work so far do you have any ideas about hypotheses or interventions that you would want to try?
- I’m curious as to how the competitive dynamics are captured. Are you drawing upon any models of AI race dynamics? (e.g. Armstrong et al. 2016, Han et al. 2020, Stafford et al. 2022). Also, have you seen the Intelligence Rising paper by Avin et al. 2024? I’m wondering whether you’ve seen behaviours similar to what they’ve seen in their workshops?

Modeling AI competition dynamics for better governance

Jonas Emanuel Müller 17 Apr 2026 9:07 UTC

12 points

0 comments · 1 min read · EA link

Paolo Bova 2 Feb 2026 14:01 UTC
11 points
0 ∶ 0
on: The Scaling Series Discussion Thread: with Toby Ord
Super cool! Great to see others digging into the costs of Agent performance. I agree that more people should be looking into this.
I’m particularly interested in predicting the growth of costs for Agentic AI safety evaluations. So I was wondering if you had any takes on this given this recent series. Here are a few more specific questions along those lines for you, Toby:
- Given the cost trends you’ve identified, do you expect the costs of running agents to take up an increasing share of the total costs of AI safety evaluations (including researcher costs)?
- Which dynamics do you think will drive how the costs of AI safety evaluations change over the next few years?
- Any thoughts on under what conditions it would be better to elicit the maximum capabilities of models using a few very expensive safety evaluations, or better to prioritise a larger quantity of evaluations that get close to plateau performance (i.e. hitting that sweet spot where their hourly cost / performance is lowest, or alternatively their saturation point)? Presumably a mix is best, but how do we determine what a good mix looks like? What might you recommend to an AI Lab’s Safety/Preparedness team? I’m thinking about how this might inform evaluation requirements for AI labs.
Many thanks for the excellent series! You have a knack for finding elegant and intuitive ways to explain the trends from the data. Despite knowing this data well, I feel like I learn something new with every post. Looking forward to the next thing.

Paolo Bova 10 Sep 2022 16:19 UTC
2 points
0 ∶ 0
in reply to: Conor Barnes 🔶’s comment on: Interactively Visualizing X-Risk
Thanks for pushing the fix for Windows. The share buttons work on my device now.

Paolo Bova 9 Sep 2022 16:05 UTC
3 points
0 ∶ 0
on: Jan Leike: On the windfall clause
Thanks for sharing this critique, Cullen.

I was curious about who would be the firm’s opponent in this scenario, i.e. the actor trying to legally implement the Windfall Clause.

In a world where a Windfall of this order of magnitude is possible, I would anticipate a number of additional actors of somewhat comparable magnitude. I’d also expect states to have more wealth too (even if the AI company didn’t pay tax, an AI advanced enough to generate Windfall profits is likely to grow the economy dramatically). If this were true, I might expect there to be incentives (or the possibility of providing incentives) for sufficiently wealthy states or other actors to use their resources to keep the legal offense-defence ratio more manageable.

That being said, I’m very uncertain about the above. There is certainly precedent for companies to become dramatically richer than some states. Moreover, states benefiting considerably from transformative AI may not necessarily see defending a Windfall Clause as a priority. Nevertheless, I do think there’s merit in thinking carefully about what kind of actors might exist in a world where the Windfall Clause looks like it will soon trigger.

Paolo Bova 6 Sep 2022 15:33 UTC
4 points
0 ∶ 0
in reply to: Conor Barnes 🔶’s comment on: Interactively Visualizing X-Risk
Great to see the Predict feature. I might have missed this when you first added it, but I’ve seen it now. It looks great and the tool is easy to use! I also like the additional changes you’ve made to make the site more polished. A friend and I had some issues when clicking the ‘share’ button, which I’ll post as an issue on the GitHub later.

Paolo Bova 4 Aug 2022 11:39 UTC
9 points
1 ∶ 0
on: AI ethics: the case for including animals (my first published paper, Peter Singer’s first on AI)
I’m very glad to see a paper on this topic. This paper is precisely what the field of AI Ethics has been missing!

Congratulations on the first publication, Fai!

A few highlights from the paper:
“It is significant that philosophers who disagree strongly with the view that animals have rights or are entitled to equal consideration of interests nevertheless accept that factory farming is indefensible.”
- In general, the piece appears to do a great job at preempting arguments against animal ethics. Here’s hoping a lot of people see this!
“Companies that contribute to making the factory farming industry more resilient and better able to resist replacement by less cruel and more sustainable alternatives are acting unethically.”
- This makes a very clear statement.
“So, instead of self-driving cars creating a new ethical problem with regard to hitting animals, we will have, when self-driving cars become common, a potential solution to an old ethical problem, and with the new solution, new responsibilities.”
- This section on self-driving cars is brilliantly practical. Hopefully AI ethics scholars take note, as this seems like a rather undaunting and high-profile case study.
“While we appreciate Delphi’s developers’ effort … we are yet to see any effort to make it less speciesist. Until that happens, we agree with Delphi’s developers that Delphi’s output, or outputs from any similar models, should “not be used for advice for humans,” nor should it be used as a model to build ethics in AI.”
- I agree with this point and I suspect it applies more widely to research advancing AI capabilities.

Announcing the SPT Model Web App for AI Governance

Paolo Bova 4 Aug 2022 10:45 UTC

42 points

0 comments · 5 min read · EA link

Paolo Bova 3 Aug 2022 21:59 UTC
2 points
0 ∶ 0
on: Information in risky technology races
Fantastic summary, Nicholas, Andrew, and Robert. I’m looking forward to reading the paper.

A few quick thoughts on the summary:
1. It’s reassuring to hear that information hazards are unlikely for lower values of the decisiveness parameter. One relevant follow-up question is how might AI developers form an opinion on what value the decisiveness parameter takes? Is this something we can hope to influence?
2. It’s not quite as reassuring to hear that framing AI Safety as a group effort might discourage safety investments due to moral hazard. I do find your proposal to share safety knowledge with the leader to be promising. We might also want policymakers to have some way to ensure that those sharing this safety knowledge were well compensated. Doing so might give a preemptive motive for companies to invest in safety, as they might be able to sell it to the leader if they fall behind in the race.
3. I really like that you caution against updating only on the basis of a model alone. It encourages me to think about how we might empirically test these claims concerning moral hazard and decisiveness.

Paolo Bova 30 Jul 2022 11:44 UTC
3 points
0 ∶ 0
on: Interactively Visualizing X-Risk
Beautifully made! I love the visuals and my first impressions are that it communicates x-risk in a more hopeful way. The app looks great on mobile too.

Some quick thoughts:

- I anticipated that clicking on a node would either give me a tooltip to explain what that particular node should represent or take me to another page/section of the site which explained these scenarios in more detail.
- I initially found it strange that all of the green nodes appear to link to the same prediction about population decline. I vaguely understood that this was a source of evidence for the number of green nodes, but the connection is not very clear. I think the app might benefit from a short explanation of why a user might want to click on these nodes. It might also help if hovering over one node highlighted all nodes which send you to the same place.
- I feel that the text on the graph is sufficient for me to understand the different clusters in the graph. Yet, I wonder if it might look better to use icons to represent these different clusters, and have the longer text appear on hover instead. Of course, I’d keep it as it is if user testing suggested that this change increased confusion.
- I will cast a vote for being able to input my own data. If I could input my own data, I also think it would be fun to share the resulting graphs.
- I don’t think I have any ideas for a better title. I do feel that another title should aim to be of a similar length.
- A few ideas for promoting the app to other EAs. It might be nice to give a talk about the web app, or for someone whose work is closely related to predictions for x-risk to show it off in a talk. Also, perhaps you could reach out to one of the university EA groups to see if they’d be interested in having a visual like this to show in some of their introductory talks.

Lastly, I’d like to congratulate you on launching the site. I’m sure you’ve put in a lot of work to get it to this point, and as a result it looks fantastic!
