I’m Aaron. I’ve done university group organizing at the Claremont Colleges for a bit. My current cause prioritization is AI Alignment.
Language models have been growing more capable even faster. But with them there is something very special about the human range of abilities, because that is the level of all the text they are trained on.
This sounds like a hypothesis that makes predictions we can go check. Did you have any particular evidence in mind? This and this come to mind, but there is plenty of other relevant stuff, and many experiments that could be quickly done for specific domains/settings.
Note that you say “something very special” whereas my comment is actually about a stronger claim like “AI performance is likely to plateau around human level because that’s where the data is”. I don’t dispute that there’s something special here, but I think the empirical evidence about plateauing — that I’m aware of — does not strongly support that hypothesis.
We estimate that
Point of clarification, it seems like FutureSearch is largely powered by calls to AI models. When you say “we”, what do you mean? Has a human checked the entire reasoning process that led to the results you present here?
My understanding of your main claim: If AGI is not a magic problem-solving oracle and is instead limited by needing to be unhobbled and integrated with complex infrastructure, it will be relatively safe for model weights to be available to foreign adversaries. Or at least key national security decision makers will believe that’s the case.
Please correct me if I’m wrong. My thoughts on the above:
Where is this relative safety coming from? Is it from expecting that adversaries aren’t going to be able to figure out how to do unhobbling, or to steal the necessary secrets to do unhobbling? Is it from expecting that unhobbling and building infrastructure around AIs will be a really hard endeavor?
The way I’m viewing this picture, AI that can integrate all across the economy, even if that takes substantial effort, is a major threat to global stability and US dominance.
I guess you can think about the AI-for-productive-purposes supply chain as having two components: developing the powerful AI model (Initial development), and unhobbling it / integrating it into workflows / etc. (Unhobbling/Integration). You’re arguing that the second of these will be an acceptable place to focus restrictions. My intuition says we will want restrictions on both, but more on whichever part is most expensive or excludable (e.g., AI chips being concentrated is a point for initial development). It’s not clear to me how the costs of the two supply-chain steps compare: currently, pre-training costs are higher than fine-tuning costs (a point for initial development), but actually integrating AIs across the economy seems very expensive to do, because the economy is really big (a point for unhobbling/integration). This depends a lot on the systems at the time and how easy they are to work with.
Are you all interested in making content or doing altruism-focused work about AI or AI Safety?
I’ll toss out that a lot of folks in the Effective Altruism-adjacent sphere are involved in efforts to make future AI systems safe and beneficial for humanity. If you all are interested in producing content or making a difference around artificial intelligence or AI Safety, there are plenty of people who would be happy to help you, e.g., better understand the key ideas, figure out how to convey them, or understand funding gaps in the ecosystem. I, for one, would be happy to help with this: I think mitigating extinction risks from advanced AI systems is one of the best opportunities to improve the world, although it’s quite different from standard philanthropy. PS: I was subscribed to Jimmy back at ~10k :)
Poor people are generally bad at managing their own affairs and need external guidance
That seems like a particularly cynical way of describing this argument. Another description might be: individuals are on average fine at identifying ways to improve their lives, and if you think life improvements are heavy-tailed, this implies that individuals will perform much less well than experts who aim to find the positive-tail interventions.
Here’s a similar situation: a high school student is given 2 hours with no distractions and told they should study for a test. How do you think their study method of choice would compare to a studying curriculum a professional tutor designs for them to follow? My guess is that the tutor-designed curriculum is somewhere between 20% and 200% better, depending on the student. Now that’s still really far from 719x, but I think it’s fine for building the intuition. I wouldn’t necessarily say the student is “bad at managing their own affairs”; in fact, they might be solidly average for students. But I would say they’re not an expert at studying, and like other domains, studying benefits from expertise.
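To make the heavy-tail intuition concrete, here’s a minimal sketch (all numbers hypothetical; the distribution and its parameters are made up, not taken from the post or from any charity data) of how the best option found by someone searching the positive tail compares to the average option:

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: 10,000 possible "life improvements", with values drawn from a
# heavy-tailed (lognormal) distribution. None of these numbers come from the post.
values = [random.lognormvariate(0, 2) for _ in range(10_000)]

average_pick = statistics.mean(values)  # roughly what a typical self-chosen option delivers
expert_pick = max(values)               # what a searcher who finds the positive tail gets

print(f"average option: {average_pick:.1f}")
print(f"best option:    {expert_pick:.1f}")
print(f"ratio:          {expert_pick / average_pick:.0f}x")
```

With a thinner tail (smaller sigma) the gap shrinks a lot, which is the crux: how much experts beat individuals depends on just how heavy-tailed life improvements actually are.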
Thanks for writing this. I agree that this makes me nervous. Various thoughts:
I think I’ve slowly come to believe something like ‘sufficiently smart people can convince themselves that arbitrary morally bad things are actually good’. See, e.g., the gymnastics meme, but there’s also something deeper here, like ‘many of the evil people throughout history believed that what they were doing was actually good’. I think the response to this should be deep humility and moral risk aversion. Having a big-brain argument that sounds good to you about why what you’re doing is good is actually extremely weak evidence about the goodness of the thing. I think it would probably be better if EAs took this more seriously and didn’t do things like starting an AGI company or an AGI hedge fund. An AGI hedge fund seems even worse than Anthropic (where I think the argument for doing cutting-edge research is medium-brained and at least somewhat true empirically). The reasons Chana lists for why a hedge fund could be a good idea all seem fairly weak; they would be stronger if Leopold were saying these were part of the plan.
The unilateralist nature and the relationship to race dynamics also worry me. Maybe there would have been AGI hedge funds anyway, and maybe there would have been lengthy blog posts telling the USG and China that they should be in a massive race on AI, but those things sure weren’t being done before Leopold did them.
I don’t think I have strong reasons to actively trust Leopold. I don’t know him, and I think my baseline trust isn’t super high nowadays. By “trust” I mean some combination of being of good character, having correct judgment, and having good epistemic practices to make up for poor judgment. Choosing to lose OpenAI equity is a positive sign, but I’m not sure how big. So this cashes out in not making much of an update on the value of an AGI hedge fund, something that seems initially medium bad.
I think it’s sus to write up a blog post telling people AGI is coming soon while starting an investment firm that will benefit from people thinking AGI is coming soon. This is a clear conflict of interest. It’s not necessarily a bad thing (there are good arguments around putting your money where your mouth is and taking actions based on big-if-true ideas), but it is a warning flag.
I could imagine a normal person reading Situational Awareness, including the part about Superalignment, and then hearing that the author is starting an AGI hedge fund, and their response being “WTF?! You believe all this about the intelligence explosion and how there are critical safety problems we’re not on track to solve, and you’re starting a hedge fund?” This response makes a lot of sense to me (and I do think I’ve heard it somewhere, though I’m not sure where). I think ‘starting an AGI hedge fund’ is really low on the list of things somebody who cares a lot about superintelligence safety should be doing. So either I’m misunderstanding something, or this is an update that Leopold isn’t as serious about ASI safety as I thought.
I have yet to see any replies from Leopold to people commenting on or responding to Situational Awareness. This seems like bad form for truth-seeking and for getting buy-in from EAs, but it may be the norm for general intellectual content.
The paper that introduces the test is probably what you’re looking for. Based on a skim, it seems to me that it spends a lot of words laying out the conceptual background that would make this test valuable. Obviously it’s heavily selected for making the overall argument that the test is good.
Elaborating on point 1 and the “misinformation is only a small part of why the system is broken” idea:
The current system could be broken in many ways but still sit at an equilibrium of sorts. Upsetting this equilibrium could have substantial effects because, for instance, people’s built-up immune response to current misinformation is not as well trained as their immune response to traditionally biased media.
Additionally, intervening on misinformation could be far more tractable than other methods of improving things. I don’t have a solid grasp of what the problem is and what makes it worse, but a number of potential causes do seem much harder to intervene on than misinformation: general ignorance, poor education, political apathy. It can be the case that misinformation makes the situation merely 5% worse but is substantially easier to fix than these other issues.
I appreciate you writing this, it seems like a good and important post. I’m not sure how compelling I find it, however. Some scattered thoughts:
In point 1, it seems like the takeaway is “democracy is broken because most voters don’t care about factual accuracy, don’t follow the news, and elections are not a good system for deciding things; because so little about elections depends on voters getting reliable information, misinformation can’t make things much worse”. You don’t actually say this, but this appears to me to be the central thrust of your argument — to the extent modern political systems are broken, it is not in ways that are easily exacerbated by misinformation.
Point 3 seems to be mainly relevant to mainstream media, but I think the worries about misinformation typically focus on non-mainstream media. In particular, when people say they “saw X on Facebook”, they’re not basing their information diet on trustworthiness and reputation. You write, “As noted above (#1), the overwhelming majority of citizens get their political information from establishment sources (if they bother to get such information at all).” I’m not sure what exactly you’re referencing here, but it looks to me like people are getting news from social media about ⅔ as much as from news websites/apps (see “News consumption across digital platforms”). This is still a lot of social media news, which should not be discounted.
I don’t think I find point 4 compelling. I expect the establishment to have access to slightly, but not massively, better AI. But more importantly, I don’t see how this helps? If it’s easy to make pro-vaccine propaganda but hard to make anti-vax propaganda, I don’t see how this is a good situation? It’s not clear that propaganda counteracts other propaganda efficiently enough that those with better AI propaganda will win out (e.g., insularity and people mostly seeing content that aligns with their existing beliefs might imply little effect from counter-propaganda existing). You write “Anything that anti-establishment propagandists can do with AI, the establishment can do better”, but propaganda is probably not a zero-sum, symmetric weapon.
Overall, it feels to me like these are decent arguments for why AI-based disinformation is likely to be less of a big deal than I might have previously thought, but they don’t feel super strong. They feel largely handwavy in the sense of “here is an argument which points in that direction”, where it’s really hard to know how hard it pushes in that direction. There is ample opportunity for quantitative and detailed analysis (which I generally would find more convincing), but that analysis isn’t made here, and is instead obfuscated in links to other work. It’s possible that the argument I would actually find super convincing here is just way too long to be worth writing.
Again, thanks for writing this, I think it’s a service to the commons.
Because the outsourcing currently happening is of data labeling, I think one of the issues you express in the post is very unlikely:
My general worry is that in future, the global south shall become the training ground for more harmful AI projects that would be prohibited within the Global North. Is this something that I and other people should be concerned about?
Maybe there’s an argument about how:
current practices are evidence that AI companies are trying to avoid following the laws (note I mostly don’t believe this),
and this is why they’re outsourcing parts of development,
so then we should be worried they’ll do the same to get around other (safety-oriented) laws.
This is possible, but my best guess is that low wages are the primary reason for current outsourcing.
Additionally, as noted by Larks, outsourcing data centers is going to be much more difficult, or at least take a long time, compared to outsourcing data labeling, so we should be less worried that companies could effectively get around laws this way.
This line of argument suggests that slow takeoff is inherently harder to steer. Because pretty much any version of slow takeoff means that the world will change a ton before we get strongly superhuman AI.
I’m not sure I agree that the argument suggests that. I’m also not sure slow takeoff is harder to steer than other forms of takeoff — they all seem hard to steer. I think I messed up the phrasing because I wasn’t thinking about it the right way. Here’s another shot:
Widespread AI deployment is pretty wild. If timelines are short, we might get attempts at AI takeover before we have widespread AI deployment. I think attempts like this are less likely to work than attempts in a world with widespread AI deployment. This is thinking about takeoff in the sense of deployment impact on the world (e.g., economic growth), rather than in terms of cognitive abilities.
On a related note, slow takeoff worlds are harder to steer in the sense that the proportion of influence on AI coming from x-risk-oriented people probably goes down as the rest of the world gets involved, and the neglectedness of AI safety research probably drops; this is why some folks have considered conditioning their work on, e.g., high p(doom).
Thanks for your comments! I probably won’t reply to the others as I don’t think I have much to add, they seem reasonable, though I don’t fully agree.
I think these don’t bite nearly as hard for conditional pauses, since they occur in the future when progress will be slower
Your footnote is about compute scaling, so presumably you think that’s a major factor for AI progress, and why future progress will be slower. The main consideration pointing the other direction (imo) is automated researchers speeding things up a lot. I guess you think we don’t get huge speedups here until after the conditional pause triggers are hit (in terms of when various capabilities emerge)? If we do have the capabilities for automated researchers, and a pause locks these up, that’s still pretty massive (capability) overhang territory.
While I’m very uncertain, on balance I think it provides more serial time to do alignment research. As model capabilities improve and we get more legible evidence of AI risk, the will to pause should increase, and so the expected length of a pause should also increase [footnote explaining that the mechanism here is that the dangers of GPT-5 galvanize more support than GPT-4]
I appreciate flagging the uncertainty; this argument doesn’t seem right to me.
One factor affecting the length of a pause would be the (opportunity cost from pausing) / (risk of catastrophe from unpausing) ratio for marginal pause days, i.e., the ratio of costs to benefits. I expect both the costs and the benefits of AI pause days to go up in the future, because the risks of misalignment/misuse will be greater and because AIs will be deployed in ways that add a bunch of value to society (whether the marginal improvements are huge remains unclear; GPT-6 might add tons of value, but it’s hard to tell how much more GPT-6.5 adds on top of that). I don’t know how the ratio will change, which is probably what actually matters. But I wouldn’t be surprised if that numerator (opportunity cost) shot up a ton.
I think it’s reasonable to expect that marginal improvements to AI systems in the future (e.g., scaling up 5x) could map onto automating an additional 1-7% of a nation’s economy. Delaying this by a month would be a huge loss (or a benefit, depending on how the transition is going).
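As a toy illustration of the ratio point (all numbers made up purely for the example), whether a marginal pause day looks more or less attractive later depends entirely on which side grows faster:

```python
# Toy, purely hypothetical numbers: the cost/benefit ratio of a marginal pause day
# can move either way, depending on whether opportunity cost or risk reduction grows faster.
scenarios = {
    "now":                      (1.0, 1.0),    # (opportunity cost, risk reduced) per pause day
    "later, benefit-dominated": (10.0, 30.0),  # risk reduction grows 30x, cost only 10x
    "later, cost-dominated":    (30.0, 10.0),  # cost grows 30x, risk reduction only 10x
}
for name, (cost, benefit) in scenarios.items():
    print(f"{name:>26}: cost/benefit ratio = {cost / benefit:.2f}")
```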
What relevant decision makers think the costs and benefits are is what actually matters, not the true values. So even if right now I can look ahead and see that an immediate pause pushes back future tremendous economic growth, this feature may not become apparent to others until later.
To put what I’m getting at a different way: you’re suggesting that we get a longer pause if we pause later than if we pause now. I think that “races” around AI are going to get ~monotonically worse and that the perceived cost of pausing will shoot up a bunch. If we’re early on an exponential of AI creating value in the world, it just seems way easier to pause for longer now than it will be later on. If this doesn’t make sense I can try to explain more.
Sorry, I agree my previous comment was a bit intense. I think I wouldn’t get triggered if you instead asked “I wonder if a crux is that we disagree on the likelihood of existential catastrophe from AGI. I think it’s very likely (>50%), what do you think?”
P(doom) is not why I disagree with you. It feels a little like if I’m arguing with an environmentalist about recycling and they go “wow do you even care about the environment?” Sure, that could be a crux, but in this case it isn’t and the question is asked in a way that is trying to force me to agree with them. I think asking about AGI beliefs is much less bad, but it feels similar.
I think it’s pretty unclear if extra time now positively impacts existential risk. I wrote about a little bit of this here, and many others have discussed similar things. I expect this is the source of our disagreement, but I’m not sure.
I don’t think you read my comment:
I don’t think extra time pre-transformative-AI is particularly valuable except its impact on existential risk
I also think it’s bad how you (and a bunch of other people on the internet) ask this p(doom) question in a way that (in my read of things) is trying to force somebody into a corner of agreeing with you. It doesn’t feel like good faith so much as bullying people into agreeing with you. But that’s just my read of things without much thought. At a gut level I expect we die, my from-the-arguments / inside view is something like 60%, and my “all things considered” view is more like 40% doom.
Yep, seems reasonable, I don’t really have any clue here. One consideration is that this AI is probably way better than all the human scientists and can design particularly high-value experiments, also biological simulations will likely be much better in the future. Maybe the bio-security community gets a bunch of useful stuff done by then which makes the AI’s job even harder.
there will be governance mechanisms put in place after a failure
Yep, seems reasonably likely, and we sure don’t know how to do this now.
I’m not sure where I’m assuming we can’t “pause dangerous AI development long enough to build aligned AI that would be more capable of ensuring safety”? This is a large part of what I mean by the underlying end-game plan in this post (which I didn’t state super explicitly, sorry), e.g., the centralization point:
centralization is good because it gives this project more time for safety work and securing the world
I’m curious why you don’t include intellectually aggressive culture in the summary? It seems like this was a notable part of a few of the case studies. Did the others just not mention it, or is there information indicating they didn’t have this culture? I’m curious how widespread this feature is. E.g.:
The intellectual atmosphere seems to have been fairly aggressive. For instance, it was common (and accepted) that some researchers would shout “bullshit” and lecture the speaker on why they were wrong.
we need capabilities to increase so that we can stay up to date with alignment research
I think one of the better write-ups about this perspective is Anthropic’s Core Views on AI Safety.
From its main text, under the heading The Role of Frontier Models in Empirical Safety, a couple relevant arguments are:
Many safety concerns arise with powerful systems, so we need to have powerful systems to experiment with
Many safety methods require large/powerful models
Need to understand how both problems and our fixes change with model scale (if the model gets bigger, does the safety technique still look like it’s working?)
To get evidence of powerful models being dangerous (which is important for many reasons), you need the powerful models.
FWIW, I find this somewhat convincing. I think the collaborating-on-papers part could be downstream of higher expectations for the number of papers produced. My sense is that grad students are expected to write more papers now than they used to. One way to accomplish this is to collaborate more.
I expect that if you compared data on the total number of researchers in the AI field to the number of papers, you would see the second rising a little faster than the first (I think I’ve seen this trend, but I don’t have the numbers in front of me). If these were rising at the same rate, I think it would basically indicate no change in the difficulty of ideas, because research hours would be scaling with the number of papers. Again, I expect the trend is actually papers rising faster than people, which would make it seem like ideas are getting easier to find.
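A toy version of that reasoning (the growth rates and starting counts below are hypothetical, not real bibliometric data): if papers grow faster than researchers, papers per researcher rises, which on this methodology would read as ideas getting easier, not harder, to find.

```python
# Hypothetical growth rates and starting counts, purely to illustrate the argument above.
researchers, papers = 10_000, 20_000          # hypothetical starting counts
researcher_growth, paper_growth = 0.20, 0.25  # hypothetical annual growth rates

for year in range(5):
    print(f"year {year}: papers per researcher = {papers / researchers:.2f}")
    researchers *= 1 + researcher_growth
    papers *= 1 + paper_growth
```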
I think other explanations, like the norms and culture around research output expectations, collaboration, and how many references you’re expected to include, are more to blame.
Overall, I don’t find the methodology presented here, of just looking at the number of authors and the number of references, to be particularly useful for figuring out whether ideas are getting harder to find. It’s definitely some evidence, but I think there are quite a few plausible alternative explanations.