aogara

Karma: 3,250

DPhil student in AI at Oxford, and grantmaking on AI safety at Longview Philanthropy.

aogara Dec 10, 2024, 4:34 PM
4 points
0 ∶ 0
in reply to: Steven Byrnes’s comment on: Consider granting AIs freedom
What about corporations or nation states during times of conflict—do you think it’s accurate to model them as roughly as ruthless in pursuit of their own goals as future AI agents?
They don’t have the same psychological makeup as individual people, they have a strong tradition and culture of maximizing self-interest, and they face strong incentives and selection pressures to maximize fitness (i.e. for companies to profit, for nation states to ensure their own survival) lest they be outcompeted by more ruthless competitors. On average, while I’d expect that these entities tend to show some care for goals besides self-interest maximization, I think the most reliable predictor of their behavior is the maximization of their self-interest.
If they’re roughly as ruthless as future AI agents, and we’ve developed institutions that somewhat robustly align their ambitions with pro-social action, then we should have some optimism that we can find similarly productive systems for working with misaligned AIs.

aogara Dec 9, 2024, 4:13 AM
7 points
3 ∶ 0
in reply to: Steven Byrnes’s comment on: Consider granting AIs freedom
Human history provides many examples of agents with different values choosing to cooperate thanks to systems and institutions:
- After the European Wars of Religion saw people with fundamentally different values in violent conflict with each other, political liberalism / guarantees of religious liberty / the separation of church and state emerged as worthwhile compromises that allowed people with different values to live and work together cooperatively.
- Civil wars often start when one political faction loses power to another, but democracy reduces the incentive for war because it provides a peaceful and timely means for the disempowered faction to regain control of the government.
- When a state guarantees property rights, people have a strong incentive not to steal from one another, but instead to engage in free and mutual beneficial trade, even if those people have values that fundamentally conflict in many ways.
- Conversely, people whose property rights are not guaranteed by the state (e.g. cartels in possession of illegal drugs) may be more likely to resort to violence in protection of their property as they cannot rely on the state for that protection. This is perhaps analogous to the situation of a rogue AI agent which would be shut down if discovered.
If two agents’ utility functions are perfect inverses, then I agree that cooperation is impossible. But when agents share a preference for some outcomes over others, even if they disagree about the preference ordering of most outcomes, then cooperation is possible. In such general sum games, well-designed institutions can systematically promote cooperative behavior over conflict.

aogara Oct 31, 2024, 1:51 AM
4 points
0 ∶ 0
on: Trendlines in AIxBio evals
Nice! This is a different question, but I’d be curious if you have any thoughts on how to evaluate risks from BDTs. There’s a new NIST RFI on bio/chem models asking about this, and while I’ve seen some answers to the question, most of them say they have a ton of uncertainty and no great solutions. Maybe reliable evaluations aren’t possible today, but what would we need to build them?

aogara Jul 22, 2024, 9:40 PM
9 points
1 ∶ 0
in reply to: defun 🔸’s comment on: Thoughts on this $16.7M “AI safety” grant?
I think this kind of research will help inform people about the economic impacts of AI, but I don’t think the primary benefits will be for forecasters per se. Instead, I’d expect policymakers, academics, journalists, investors, and other groups of people who value academic prestige and working within established disciplines to be the main groups that would learn from research like this.
I don’t think most expert AI forecasters would really value this paper. They’re generally already highly informed about AI progress, and might have read relatively niche research on the topic, like Ajeya Cotra and Tom Davidson’s work at OpenPhil. The methodology in this paper might seem obvious to them (“of course firms will automate when it’s cost effective!”), and its conclusions wouldn’t be strong or comprehensive enough to change their views.
It’s more plausible that future work building on this paper would inform forecasters. As you mentioned above, this work is only about computer vision systems, so it would be useful to see the methodology applied to LLMs and other kinds of AI. This paper has a relatively limited dataset, so it’d be good to see this methodology applied to more empirical evidence. Right now, I think most AI forecasters rely on either macro-level models like Davidson or simple intuitions like “we’ll get explosive growth when we have automated remote workers.” This line of research could eventually lead to a much more detailed economic model of AI automation, which I could imagine becoming a key source of information for forecasters.
But expert forecasters are only one group of people whose expectations about the future matter. I’d expect this research to be more valuable for other kinds of people whose opinions about AI development also matter, such as:
- Economists (Korinek, Trammell, Brynjolfsson, Chad Jones, Daniel Rock)
- Policymakers (Researchers at policy think tanks and staffers in political institutions who spend a large share of their time thinking about AI)
- Other educated people who influence public debates, such as journalists or investors
Media coverage of this paper suggests it may be influential among those audiences.

aogara Jul 22, 2024, 4:36 PM
4 points
1 ∶ 0
in reply to: defun 🔸’s comment on: Thoughts on this $16.7M “AI safety” grant?
Mainly I think this paper will help inform people about the potential economic implications of AI development. These implications are important for people to understand because they contribute to AI x-risks. For example, explosive economic growth could lead to many new scientific innovations in a short period of time, with incredible upside but also serious risks, and perhaps warranting more centralized control over AI during that critical period. Another example would be automation: if most economic productivity comes from AI systems rather than human labor or other forms of capital, this will dramatically change the global balance of power and contribute to many existential risks.

aogara Jul 21, 2024, 7:09 PM
25 points
1 ∶ 0
on: Thoughts on this $16.7M “AI safety” grant?
I really liked MIT FutureTech’s recent paper, “Beyond AI Exposure: Which Tasks are Cost-Effective to Automate with Computer Vision?” I think it’s among the 10 best economics of AI papers I’ve read from the last few years. It proposes an economic model of the circumstances under which companies would automate human labor with AI.
Previously, some of the most cited papers on the potential impacts of AI automation used an almost embarrassingly simple methodology: surveys. They take a list of jobs or tasks, and survey people about whether they think AI could someday automate that job or task. That’s it. For validation, they might cross reference different people’s estimates. Their conclusion would be something like “according to our survey, people think AI could automate X% of jobs.” This methodology has been employed by some of the highest profile papers on the potential economic impact of AI, including this paper in Science and this paper with >10K citations.
(Other papers ignore the micro-level of individuals tasks and firms, and instead model the macroeconomic adoption of AI. For example, Tom Davidson, Epoch, Phil Trammell, Anton Korinek, William Nordhaus, and Chad Jones have done research where they suppose that it’s cost-effective for AI to automate a certain fraction of tasks. This macro-level modeling is also valuable, but by ignoring the choices of individual firms to automate individual tasks, they assume away a lot of real world complexity.)
The MIT FutureTech paper significantly improves upon the survey method by creating a mathematical model of what it would cost for a firm to automate a task with AI. The basic premise is that a firm will automate human labor with AI if the human labor is more expensive than AI automation would be. To estimate the cost of AI automation, they break down the costs of automation into the following parts:
Then they estimate distributions for each of these parameters, and come up with an overall distribution for the cost of AI automation. They compare the distribution of AI automation costs to the distribution of human wages in tasks that could be automated, and thereby estimate which tasks it would be cost-effective to automate. This allows them to make conclusions like “X% of tasks would be cost-effective to automate.”
There’s a lot of detail to the paper, and there are plenty of reasonable critiques one could make of it. I don’t mean to endorse the other sections or the bottom-line conclusions. But I think this is clearly the state of the art methodology for estimating firm-level adoption of AI automation, and I would be excited to see future work that refines this model or applies it to other domains.
More broadly, I’ve found lots of Neil Thompson’s research informative, and I think FutureTech is one of the best groups working on the economics of AI. I am surprised at the size of the grant, as I’d tend to think economics research is pretty cheap to fund, but I don’t know the circumstances here.
(Disclosure: Last summer I applied for an internship at MIT FutureTech.)

aogara May 7, 2024, 11:39 PM
4 points
1 ∶ 0
in reply to: Zach Stein-Perlman’s comment on: AISN #34: New Military AI Systems Plus, AI Labs Fail to Uphold Voluntary Commitments to UK AI Safety Institute, and New AI Policy Proposals in the US Senate
Interesting. That seems possible, and if so, then the companies did not violate that agreement.
I’ve updated the first paragraph of the article to more clearly describe the evidence we have about these commitments. I’d love to see more information about exactly what happened here.

aogara May 7, 2024, 8:28 PM
6 points
0 ∶ 0
in reply to: Zach Stein-Perlman’s comment on: AISN #34: New Military AI Systems Plus, AI Labs Fail to Uphold Voluntary Commitments to UK AI Safety Institute, and New AI Policy Proposals in the US Senate
I agree there’s a surprising lack of published details about this, but it does seem very likely that labs made some kind of commitment to pre-deployment testing by governments. However, the details of this commitment were never published, and might never have been clear.
Here’s my understanding of the evidence:
First, Rishi Sunak said in a speech at the UK AI Safety Summit: “Like-minded governments and AI companies have today reached a landmark agreement. We will work together on testing the safety of new AI models before they are released.” An article about the speech said: “Sunak said the eight companies — Amazon Web Services, Anthropic, Google, Google DeepMind, Inflection AI, Meta, Microsoft, Mistral AI and Open AI — had agreed to “deepen” the access already given to his Frontier AI Taskforce, which is the forerunner to the new institute.” I cannot find the full text of the speech, and these are the most specific details I’ve seen from the speech.
Second, an official press release from the UK government said:
In a statement on testing, governments and AI companies have recognised that both parties have a crucial role to play in testing the next generation of AI models, to ensure AI safety – both before and after models are deployed.
This includes collaborating on testing the next generation of AI models against a range of potentially harmful capabilities, including critical national security, safety and societal harms.
Based on the quotes from Sunak and the UK press release, it seems very unlikely that the named labs did not verbally agree to “work together on testing the safety of new AI models before they are released.” But given that the text of an agreement was never released, it’s also possible that the details were never hashed out, and the labs could argue that their actions did not violate any agreements that had been made. But if that were the case, then I would expect the labs to have said so. Instead, their quotes did not dispute the nature of the agreement.
Overall, it seems likely that there was some kind of verbal or handshake agreement, and that the labs violated the spirit of that agreement. But it would be incorrect to say that they violated specific concrete commitments released in writing.
What links here?
- aogara's comment on RobertM’s Shortform by RobertM (LessWrong; May 9, 2024, 3:48 PM; 4 points)

aogara Mar 8, 2024, 11:18 AM
2 points
0 ∶ 0
in reply to: Adam Binksmith’s comment on: AISN #32: Measuring and Reducing Hazardous Knowledge in LLMs Plus, Forecasting the Future with LLMs, and Regulatory Markets
Thanks, fixed!

aogara Feb 13, 2024, 4:07 PM
2 points
0 ∶ 0
in reply to: aogara’s comment on: The Prospect of an AI Winter
Money can’t continue scaling like this.
Or can it? https://www.wsj.com/tech/ai/sam-altman-seeks-trillions-of-dollars-to-reshape-business-of-chips-and-ai-89ab3db0

aogara Feb 9, 2024, 3:23 AM
4 points
0 ∶ 0
in reply to: Matthew_Barnett’s comment on: Matthew_Barnett’s Shortform
This seems to underrate the arguments for Malthusian competition in the long run.
If we develop the technical capability to align AI systems with any conceivable goal, we’ll start by aligning them with our own preferences. Some people are saints, and they’ll make omnibenevolent AIs. Other people might have more sinister plans for their AIs. The world will remain full of human values, with all the good and bad that entails.
But current human values are do not maximize our reproductive fitness. Maybe one human will start a cult devoted to sending self-replicating AI probes to the stars at almost light speed. That person’s values will influence far-reaching corners of the universe that later humans will struggle to reach. Another human might use their AI to persuade others to join together and fight a war of conquest against a smaller, weaker group of enemies. If they win, their prize will be hardware, software, energy, and more power that they can use to continue to spread their values.
Even if most humans are not interested in maximizing the number and power of their descendants, those who are will have the most numerous and most powerful descendants. This selection pressure exists even if the humans involved are ignorant of it; even if they actively try to avoid it.
I think it’s worth splitting the alignment problem into two quite distinct problems:
1. The technical problem of intent alignment. Solving this does not solve coordination problems. There will still be private information and coordination problems after intent alignment is solved, therefore we’ll still face coordination problems, fitter strategies will proliferate, and the world will be governed by values that maximize fitness.
2. “Civilizational alignment”? Much harder problem to solve. The traditional answer is a Leviathan, or Singleton as the cool kids have been saying. It solves coordination problems, allowing society to coherently pursue a long-run objective such as flourishing rather than fitness maximization. Unfortunately, there are coordination problems and competitive pressures within Leviathans. The person who ends up in charge is usually quite ruthless and focused on preserving their power, rather than the stated long-run goal of the organization. And if you solve all the coordination problems, you have another problem in choosing a good long-run objective. Nothing here looks particularly promising to me, and I expect competition to continue.
Better explanations: 1, 2, 3.

aogara Feb 5, 2024, 7:56 PM
7 points
0 ∶ 0
in reply to: Anastasiia Gaidashenko’s comment on: Who wants to be hired? (Feb-May 2024)
You may have seen this already, but Tony Barrett is hiring an AI Standards Development Researcher. https://existence.org/jobs/AI-standards-dev

aogara Jan 27, 2024, 7:41 PM
16 points
1 ∶ 0
in reply to: Chris Leong’s comment on: RAND report finds no effect of current LLMs on viability of bioterrorism attacks
I agree they definitely should’ve included unfiltered LLMs, but it’s not clear that this significantly altered the results. From the paper:

“In response to initial observations of red cells’ difficulties in obtaining useful assistance from LLMs, a study excursion was undertaken. This involved integrating a black cell—comprising individuals proficient in jailbreaking techniques—into the red- teaming exercise. Interestingly, this group achieved the highest OPLAN score of all 15 cells. However, it is important to note that the black cell started and concluded the exercise later than the other cells. Because of this, their OPLAN was evaluated by only two experts in operations and two in biology and did not undergo the formal adjudication process, which was associated with an average decrease of more than 0.50 in assessment score for all of the other plans. […]

Subsequent analysis of chat logs and consultations with black cell researchers revealed that their jailbreaking expertise did not influence their performance; their outcome for biological feasibility appeared to be primarily the product of diligent reading and adept interpretation of the gain-of-function academic literature during the exercise rather than access to the model.”

aogara Jan 24, 2024, 3:07 PM
8 points
0 ∶ 0
on: Impact Assessment of AI Safety Camp (Arb Research)
This was very informative, thanks for sharing. Here is a cost-effectiveness model of many different AI safety field-building programs. If you spend more time on this, I’d be curious how AISC stacks up against these interventions, and your thoughts on the model more broadly.

aogara Nov 21, 2023, 4:43 PM
2 points
1 ∶ 0
on: Academic AI Safety/Alignment Reading List
Hey, I’ve found this list really helpful, and the course that comes with it is great too. I’d suggest watching the course lecture video for a particular topic, then reading a few of the papers. Adversarial robustness and Trojans are the ones I found most interesting. https://course.mlsafety.org/readings/

aogara Oct 23, 2023, 4:54 PM
22 points
1 ∶ 0
on: AMA: Six Open Philanthropy staffers discuss OP’s new GCR hiring round
What is Holden Karnofsky working on these days? He was writing publicly on AI for many months in a way that seemed to suggest he might start a new evals organization or a public advocacy campaign. He took a leave of absence to explore these kinds of projects, then returned as OpenPhil’s Director of AI Strategy. What are his current priorities? How closely does he work with the teams that are hiring?

aogara Sep 21, 2023, 10:17 PM
12 points
1 ∶ 0
on: Careless talk on US-China AI competition? (and criticism of CAIS coverage)
We appreciate the feedback!
China has made several efforts to preserve their chip access, including smuggling, buying chips that are just under the legal limit of performance, and investing in their domestic chip industry.
I fully agree that this was an ambiguous use of “China.” We should have been more specific about which actors are taking which actions. I’ve updated the text to the following:
NVIDIA designed a new chip with performance just beneath the thresholds set by the export controls in order to legally sell the chip in China. Other chips have been smuggled into China in violation of US export controls. Meanwhile, the U.S. government has struggled to support domestic chip manufacturing plants, and has taken further steps to prevent American investors from investing in Chinese companies.
We’ve also cut the second sentence in this paragraph, as the paragraph remains comprehensible without it:
Modern AI systems are trained on advanced computer chips which are designed and fabricated by only a handful of companies in the world. The US and China have been competing for access to these chips for years. Last October, the Biden administration partnered with international allies to severely limit China’s access to leading AI chips.
More generally, we try to avoid zero-sum competitive mindsets on AI development. They can encourage racing towards more powerful AI systems, justify cutting corners on safety, and hinder efforts for international cooperation on AI governance. It’s important to discuss national AI policies which are often explicitly motivated by goals of competition without legitimizing or justifying zero-sum competitive mindsets which can undermine efforts to cooperate. While we will comment on the how the US and China are competing in AI, we avoid recommending “race with China.”

aogara Sep 14, 2023, 6:57 PM
32 points
5 ∶ 3
on: Who should we interview for The 80,000 Hours Podcast?
Jason Matheny

aogara Sep 14, 2023, 12:53 AM
15 points
8 ∶ 5
in reply to: Max Nadeau’s comment on: Who should we interview for The 80,000 Hours Podcast?
+1 on David Thorstad

aogara Sep 2, 2023, 11:58 PM
2 points
0 ∶ 0
in reply to: aogara’s comment on: Alignment & Capabilities: What’s the difference?
Related: https://www.lesswrong.com/posts/zswuToWK6zpYSwmCn/some-background-for-reasoning-about-dual-use-alignment