I’m Aaron. I’ve done university group organizing at the Claremont Colleges for a bit. My current cause prioritization is AI alignment.
I appreciate you writing this, it seems like a good and important post. I’m not sure how compelling I find it, however. Some scattered thoughts:
In point 1, it seems like the takeaway is “democracy is broken because most voters don’t care about factual accuracy, don’t follow the news, and elections are not a good system for deciding things; because so little about elections depends on voters getting reliable information, misinformation can’t make things much worse”. You don’t actually say this, but this appears to me to be the central thrust of your argument — to the extent modern political systems are broken, it is not in ways that are easily exacerbated by misinformation.
Point 3 seems to be mainly relevant to mainstream media, but I think the worries about misinformation typically focus on non-mainstream media. In particular, when people say they “saw X on Facebook”, they’re not basing their information diet on trustworthiness and reputation. You write, “As noted above (#1), the overwhelming majority of citizens get their political information from establishment sources (if they bother to get such information at all).” I’m not sure what exactly you’re referencing here, but it looks to me like people are getting news from social media about ⅔ as much as from news websites/apps (see “News consumption across digital platforms”). This is still a lot of social media news, which should not be discounted.
I don’t think I find point 4 compelling. I expect the establishment to have access to slightly, but not massively, better AI. More importantly, I don’t see how this helps: even if it’s easier to make pro-vaccine propaganda than anti-vax propaganda, I don’t see why that leads to a good outcome. It’s not clear that propaganda counteracts other propaganda efficiently enough that those with better AI propaganda win out (e.g., insularity and people mostly seeing content that aligns with their existing beliefs might mean counter-propaganda has little effect). You write “Anything that anti-establishment propagandists can do with AI, the establishment can do better”, but propaganda is probably not a zero-sum, symmetric weapon.
Overall, these feel to me like decent arguments for why AI-based disinformation is likely to be less of a big deal than I might have previously thought, but they don’t feel super strong. They are largely handwavy in the sense of “here is an argument which points in that direction”, and it’s really hard to know how hard they push in that direction. There is ample opportunity for quantitative and detailed analysis (which I would generally find more convincing), but that analysis isn’t made here, and is instead buried in links to other work. It’s possible that the argument I would actually find super convincing is just way too long to be worth writing.
Again, thanks for writing this, I think it’s a service to the commons.
Because the current outsourcing is of data labeling, I think one of the concerns you express in the post is very unlikely:
My general worry is that in future, the global south shall become the training ground for more harmful AI projects that would be prohibited within the Global North. Is this something that I and other people should be concerned about?
Maybe there’s an argument about how:
current practices are evidence that AI companies are trying to avoid following the laws (note I mostly don’t believe this),
and this is why they’re outsourcing parts of development,
so then we should be worried they’ll do the same to get around other (safety-oriented) laws.
This is possible, but my best guess is that low wages are the primary reason for current outsourcing.
Additionally, as noted by Larks, outsourcing data centers is going to be much more difficult, or at least take much longer, than outsourcing data labeling, so we should be less worried that companies could effectively get around laws this way.
This line of argument suggests that slow takeoff is inherently harder to steer. Because pretty much any version of slow takeoff means that the world will change a ton before we get strongly superhuman AI.
I’m not sure I agree that the argument suggests that. I’m also not sure slow takeoff is harder to steer than other forms of takeoff — they all seem hard to steer. I think I messed up the phrasing because I wasn’t thinking about it the right way. Here’s another shot:
Widespread AI deployment is pretty wild. If timelines are short, we might get attempts at AI takeover before we have widespread AI deployment. I think attempts like this are less likely to work than attempts in a world with widespread AI deployment. This is thinking about takeoff in the sense of deployment impact on the world (e.g., economic growth), rather than in terms of cognitive abilities.
On a related note, slow takeoff worlds are harder to steer in the sense that the proportion of influence on AI coming from x-risk-oriented people probably goes down as the rest of the world gets involved, and the neglectedness of AI safety research probably drops; this is why some folks have considered conditioning their work on, e.g., high p(doom).
Thanks for your comments! I probably won’t reply to the others as I don’t think I have much to add, they seem reasonable, though I don’t fully agree.
I think these don’t bite nearly as hard for conditional pauses, since they occur in the future when progress will be slower
Your footnote is about compute scaling, so presumably you think that’s a major factor in AI progress, and why future progress will be slower. The main consideration pointing in the other direction (imo) is automated researchers speeding things up a lot. I guess you think we don’t get huge speedups here until after the conditional pause triggers are hit (in terms of when various capabilities emerge)? If we do have the capabilities for automated researchers, and a pause locks these up, that’s still pretty massive (capability) overhang territory.
While I’m very uncertain, on balance I think it provides more serial time to do alignment research. As model capabilities improve and we get more legible evidence of AI risk, the will to pause should increase, and so the expected length of a pause should also increase [footnote explaining that the mechanism here is that the dangers of GPT-5 galvanize more support than GPT-4]
I appreciate flagging the uncertainty; this argument doesn’t seem right to me.
One factor affecting the length of a pause is the ratio of (opportunity cost from pausing) to (risk of catastrophe from not pausing) for marginal pause days, i.e., the ratio of the costs to the benefits. I expect both the costs and the benefits of AI pause days to go up in the future, because the risks of misalignment/misuse will be greater, and because AIs will be deployed in ways that add a lot of value to society (whether the marginal improvements are huge remains unclear; GPT-6 might add tons of value, but it’s hard to tell how much more GPT-6.5 adds on top of that). I don’t know how the ratio will change, which is probably what actually matters, but I wouldn’t be surprised if the numerator (opportunity cost) shot up a ton.
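To put the same point in symbols (this notation is mine, not from the original post): let $C(t)$ be the opportunity cost of a marginal pause day at time $t$ and $B(t)$ the reduction in catastrophe risk that day buys. What matters is roughly the ratio

$$R(t) = \frac{C(t)}{B(t)},$$

and my claim is that both $C(t)$ and $B(t)$ grow over time, with no obvious answer about which grows faster.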
I think it’s reasonable to expect that marginal improvements to AI systems in the future (e.g., scaling up 5x) could map onto automating an additional 1-7% of a nation’s economy. Delaying this by a month would be a huge loss (or a benefit, depending on how the transition is going).
What relevant decision makers think the costs and benefits are is what actually matters, not the true values. So even if right now I can look ahead and see that an immediate pause pushes back future tremendous economic growth, this feature may not become apparent to others until later.
To try to say what I’m getting at in a different way: you’re suggesting that we get a longer pause if we pause later than if we pause now. I think that “races” around AI are going to get ~monotonically worse and that the perceived cost of pausing will shoot up a bunch. If we’re early on an exponential of AI creating value in the world, it just seems way easier to pause for longer now than it will be later on. If this doesn’t make sense I can try to explain more.
Sorry, I agree my previous comment was a bit intense. I think I wouldn’t get triggered if you instead asked “I wonder if a crux is that we disagree on the likelihood of existential catastrophe from AGI. I think it’s very likely (>50%), what do you think?”
P(doom) is not why I disagree with you. It feels a little like if I’m arguing with an environmentalist about recycling and they go “wow do you even care about the environment?” Sure, that could be a crux, but in this case it isn’t and the question is asked in a way that is trying to force me to agree with them. I think asking about AGI beliefs is much less bad, but it feels similar.
I think it’s pretty unclear whether extra time now positively impacts existential risk. I wrote a bit about this here, and many others have discussed similar things. I expect this is the source of our disagreement, but I’m not sure.
I don’t think you read my comment:
I don’t think extra time pre-transformative-AI is particularly valuable except its impact on existential risk
I also think it’s bad how you (and a bunch of other people on the internet) ask this p(doom) question in a way that (in my read of things) is trying to force somebody into a corner of agreeing with you. It doesn’t feel like good faith so much as bullying people into agreeing with you. But that’s just my read of things without much thought. At a gut level I expect we die, my from-the-arguments / inside view is something like 60%, and my “all things considered” view is more like 40% doom.
Yep, seems reasonable, I don’t really have any clue here. One consideration is that this AI is probably way better than all the human scientists and can design particularly high-value experiments, also biological simulations will likely be much better in the future. Maybe the bio-security community gets a bunch of useful stuff done by then which makes the AI’s job even harder.
there will be governance mechanisms put in place after a failure
Yep, seems reasonably likely, and we sure don’t know how to do this now.
I’m not sure where I’m assuming we can’t pause dangerous AI “development long enough to build aligned AI that would be more capable of ensuring safety”? This is a large part of what I mean by the underlying end-game plan in this post (which I didn’t state super explicitly, sorry), e.g., the centralization point:
centralization is good because it gives this project more time for safety work and securing the world
I’m curious why you don’t include an intellectually aggressive culture in the summary? It seems like this was a notable part of a few of the case studies. Did the others just not mention this, or is there information indicating they didn’t have this culture? I’m curious how widespread this feature is. For example:
The intellectual atmosphere seems to have been fairly aggressive. For instance, it was common (and accepted) that some researchers would shout “bullshit” and lecture the speaker on why they were wrong.
we need capabilities to increase so that we can stay up to date with alignment research
I think one of the better write-ups about this perspective is Anthropic’s Core Views on AI Safety.
From its main text, under the heading The Role of Frontier Models in Empirical Safety, a couple relevant arguments are:
Many safety concerns arise with powerful systems, so we need to have powerful systems to experiment with
Many safety methods require large/powerful models
We need to understand how both the problems and our fixes change with model scale (if the model gets bigger, does the safety technique still appear to work?)
To get evidence of powerful models being dangerous (which is important for many reasons), you need the powerful models.
Not responding to your main question:
Second in a theoretical situation where capabilities research globally stopped overnight, isn’t this just free-extra-time for the human race where we aren’t moving towards doom? That feels pretty valuable and high EV in and of itself.
I’m interpreting this as saying that buying humanity more time, in and of itself, is good.
I don’t think extra time pre-transformative-AI is particularly valuable except for its impact on existential risk. A few reasons why I think this:
Astronomical waste argument. Time post-transformative-AI is way more valuable than time now, assuming some form of aggregating/total utilitarianism (a strong version isn’t necessary). If I were trading clock-time seconds now for seconds a thousand years from now, assuming no difference in existential risk, I would probably be willing to trade every historical second of humans living good lives for something like a minute a thousand years from now, because it seems like we could have a ton of (morally relevant) people in the future, and the moral value derived from their experience could be significantly greater than that of current humans.
The moral value of the current world seems plausibly negative due to large amounts of suffering. Factory farming, wild animal suffering, human suffering, and more seem like they make the sign of the total unclear. Under moral views that weigh suffering more heavily than happiness, there’s an even stronger case for the current world being net negative. This is one of those arguments that I think is pretty weird and almost never affects my actions, but it is relevant to the question of whether extra time for the human race is positive EV.
A third argument is that getting AI sooner could help reduce other existential risks: a mundane example is AI speeding up vaccine research; a weirder example is AI enabling space colonization, with being spread across many planets lowering x-risk. I don’t personally put very much weight on this argument, but it’s worth mentioning.
I’m glad you wrote this post. Mostly before reading this post, I wrote a draft for what I want my personal conflict of interest policy to be, especially with regard to personal and professional relationships. Changing community norms can be hard, but changing my norms might be as easy as leaving a persuasive comment! I’m open to feedback and suggestions here for anybody interested.
I think Ryan is probably right overall that it would be better to fund people for longer at a time. One counter-consideration that hasn’t been mentioned yet: longer contracts implicitly and explicitly push people to keep doing something (which may be sub-optimal), because they drive up switching costs.
If you have to apply for funding once a year no matter what you’re working on, the cost of continuing what you’ve been doing is similar to the cost of switching (of course they aren’t similar in general, but with regard to funding they might be). I think it’s unlikely but not crazy that status quo bias is severe enough that funders artificially imposing switching costs on continuing/non-switching leads to net better outcomes. I expect that in a world where grants usually last 1 year, people switch what they’re doing more than in a world where grants last 3 years, and it’s plausible these changes are good for impact.
Some factors that seem cruxy here:
How much can be gained through realistic switching (how bad is the current allocation of people, how much better are the things people would do if switching costs were roughly zero, as they sort of are now, and how much worse would the things people do be if continuing costs were low).
It seems very likely that this consideration should affect grants to junior and early-career people who are bouncing around more, but it probably doesn’t apply much to more senior folks who have been working on a project for a while (on the other hand, maybe you want the junior people investing more in long-term career plans, and thus switching less; it depends on the person).
Could roughly-zero switching costs actually hurt things because people over-correct and switch too much (e.g., Holden Karnofsky gets excited about AI evaluations so a bunch of people work on that, then governance gets big so people switch to that, and so on)?
How much does it help to have people with a lot of experience on particular things (narrow experts vs. generalists)?
If grants were more flexible (or grant-makers communicated better and the social norms were clearer; in fact, people sometimes do return grants or change what they’re working on mid-grant), maybe you could fund people for 3 years while giving them the affordance to switch, so you still capture the value from people switching to more impactful things.
Personally, I’ve found that being funded by a grant at all makes me less likely to switch what I’m doing. I expect that the amount I “want” to switch what I’m doing (in the moment, not upon reflection/hindsight) is too much, so for me this effect might be net positive, but there are probably also times where it gets in the way of me making impactful switches. If grant-makers were more accessible to talk about this, i.e., not significantly time constrained, they could probably cause a better allocation of resources. Overall, I’m not sure how compelling this counter-consideration is, but it seems worth mulling over.
What does FRO stand for?
How is the super-alignment team going to interface with the rest of the AI alignment community, and specifically what kind of work from others would be helpful to them (e.g., evaluations they would want to exist in 2 years, specific problems in interpretability that seem important to solve early, curricula for AIs to learn about the alignment problem while avoiding content we may not want them reading)?
To provide more context on the thinking that leads to this question: I’m pretty worried that OpenAI is making themselves a single point of failure in existential security. Their plan seems to be a less-disingenuous version of “we are going to build superintelligence in the next 10 years, and we’re optimistic that our alignment team will solve catastrophic safety problems, but if they can’t then humanity is screwed anyway, because as mentioned, we’re going to build the god machine. We might try to pause if we can’t solve alignment, but we don’t expect that to help much.” Insofar as a unilateralist is taking existentially risky actions like this and they can’t be stopped, other folks might want to support their work to increase the chance of the super-alignment team’s success. Insofar as I want to support their work, I currently don’t know what they need.
Another framing behind this question is just: “many people in the AI alignment community are also interested in solving this problem; how can they indirectly collaborate with you?” (Some people will want to collaborate directly, but that runs into corporate closed-ness limitations.)
I am not aware of modeling here, but I have thought about this a bit. Besides what you mention, some other ways I think this story may not pan out (very speculative):
At the critical time, the cost of compute for automated researchers may be so high that it’s actually not cost-effective to buy labor this way. This would mainly be because many people want to use the best hardware for AI training or productive work, and this demand overwhelms suppliers and prices skyrocket. This is like the labs and governments paying a lot more, except that they’re buying things other than altruistically-motivated research. Because autonomous labor is really expensive, it isn’t a much better deal than 2023 human labor.
A similar problem is that there may not be a market for buying autonomous labor because somebody is restricting this. Perhaps a government implements compute controls including on inference to slow AI progress (because they think that rapid progress would lead to catastrophe from misalignment). Perhaps the lab that develops the first of these capable-of-autonomous-research models restricts who can use it. To spell this out more, say GPT-6 is capable of massively accelerating research, then OpenAI may only make it available to alignment researchers for 3 months. Alternatively, they may only make it available to cancer researchers. In the first case, it’s probably relatively cheap to get autonomous alignment research (I’m assuming OpenAI is subsidizing this, though this may not be a good assumption). In the second case you can’t get useful alignment research with your money because you’re not allowed to.
It might be that the intellectual labor we can get out of AI systems at the critical time is bottlenecked by human labor (i.e., humans are needed to: review the output of AI debates, give instructions to autonomous software engineers, or construct high quality datasets). In this situation, you can’t buy very much autonomous labor with your money because autonomous labor isn’t the limiting factor on progress. This is pretty much the state of things in 2023; AI systems help speed up human researchers, but the compute cost of them doing so is still far below the human costs, and you probably didn’t need to save significant money 5 years ago to make this happen.
My current thinking is that there’s a >20% chance that EA-oriented funders should be saving significant money to spend on compute for autonomous researchers, and it is an important thing for them to gain clarity on. I want to point out that there is probably a partial-automation phase (like point 3 above) before a full-automation phase. The partial-automation phase has less opportunity to usefully spend money on compute (plausibly still in the tens of millions of dollars), but our actions are more likely to matter. After that comes the full-automation phase where money can be scalably spent to e.g., differentially speed up alignment vs. AI capabilities research by hundreds of millions of dollars, but there’s a decent chance our actions don’t matter then.
As you mention, perhaps our actions don’t matter then because humans don’t control the future. I would emphasize that if we have fully autonomous, no-humans-in-the-loop research happening without already having good alignment of those systems, it’s highly likely that we get disempowered. That is, it might not make sense to aim to do alignment research at that point, because either the crucial alignment work was already done, or we lose. Conditional on having aligned systems at that point, having saved money to spend on altruistically motivated cognitive work probably isn’t very important, because economic growth gets going really fast and there’s plenty of money to be spent on non-alignment altruistic causes. On the other hand, something something at that point it’s the last train on its way to the dragon, and it sure would be sad not to have money saved to buy those bed-nets.
A few weeks ago I did a quick calculation for the amount of digital suffering I expect in the short term, which probably gets at your question about these relative sizes. tldr of my thinking on the topic:
There is currently a global compute stock of ~1.4e21 FLOP/s (each second, we can do about that many floating point operations).
It seems reasonable to expect this to grow ~40x in the next 10 years based on naively extrapolating current trends in spending and compute efficiency per dollar. That brings us to 1.6e23 FLOP/s in 2033.
Human brains do about 1e15 FLOP/s (each second, a human brain does about 1e15 floating point operations worth of computation)
We might naively assume that future AIs will have similar consciousness-compute efficiency to humans. We’ll also assume that 63% of the 2033 compute stock is being used to run such AIs (makes the numbers easier).
Then the number of human-consciousness-second-equivalent AIs that can be run each second in 2033 is 1e23 / 1e15 = 1e8, or 100 million.
For reference, there are probably around 31 billion land animals in factory farms at any given time. I make a few adjustments based on brain size and guesses about the experience of suffering AIs, and get that digital suffering in 2033 seems to be similar in scale to factory farming.
Overall my analysis is extremely uncertain, and I’m unsurprised if it’s off by 3 orders of magnitude in either direction. Also note that I am only looking at the short term.
You can read the slightly more thorough, but still extremely rough and likely wrong BOTEC here
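For anyone who wants to rerun the arithmetic, here is a minimal sketch of the tldr above in Python. All the inputs (the extrapolated compute stock, the 63% allocation, the brain-FLOP estimate, the animal count) are the rough assumptions stated in the comment, not established facts:

```python
# Rough BOTEC: how many human-equivalent AI "minds" could run at once in 2033?
# Every input below is a speculative assumption from the comment above.

compute_stock_2033 = 1.6e23    # FLOP/s, naively extrapolated global compute stock in 2033
fraction_running_ais = 0.63    # assumed share of compute running morally relevant AIs
human_brain_flops = 1e15       # FLOP/s, rough estimate of a human brain's computation

ai_compute = compute_stock_2033 * fraction_running_ais    # ~1e23 FLOP/s
human_equivalent_ais = ai_compute / human_brain_flops     # ~1e8, i.e. 100 million

factory_farmed_animals = 31e9  # land animals alive in factory farms at any given time

print(f"Human-equivalent AIs runnable at once in 2033: {human_equivalent_ais:.1e}")
print(f"Factory-farmed land animals at any given time: {factory_farmed_animals:.1e}")

# Further adjustments for brain size and for the likelihood/intensity of AI suffering
# (not modeled here) are what make the two scales look roughly comparable.
```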
Thanks for your response. I’ll just respond to a couple things.
Re Constitutional AI: I agree normatively that it seems bad to hand over judging AI debates to AIs[1]. I also think this will happen. To quote from the original AI Safety via Debate paper,
Human time is expensive: We may lack enough human time to judge every debate, which we can address by training ML models to predict human reward as in Christiano et al. [2017]. Most debates can be judged by the reward predictor rather than by the humans themselves. Critically, the reward predictors do not need to be as smart as the agents by our assumption that judging debates is easier than debating, so they can be trained with less data. We can measure how closely a reward predictor matches a human by showing the same debate to both.
Re
We’d also really contest the ‘perform very similarly to human raters’ is enough—it’d be surprising if we already have a free lunch, no information lost, way to simulate humans well enough to make better AI.
I also find this surprising, or at least I did the first 3 times I came across medium-quality evidence pointing in this direction. I don’t find it as surprising any more because I’ve updated my understanding of the world to “welp, I guess 2023 AIs actually are that good on some tasks.” Rather than making arguments to try to convince you, I’ll just link some of the evidence that I have found compelling; maybe you will too, maybe not: Model Written Evals, MACHIAVELLI benchmark, Alpaca (maybe the most significant for my thinking), this database, Constitutional AI.
I’m far from certain that this trend, of LLMs being useful for making better LLMs and for replacing human feedback, continues rather than hitting a wall in the next 2 years, but it does seem more likely than not to me, based on my read of the evidence. Some important decisions in my life depend on how soon this AI stuff is happening (for instance, if we have 20+ years I should probably aim to do policy work), so I’m pretty interested in having correct views. Currently, LLMs improving the next generation of AIs via more and better training data is one of the key factors in how I’m thinking about this. If you don’t find these particular pieces of evidence compelling and are able to explain why, that would be useful to me!
[1] I’m actually unsure here. I expect there are some times where it’s fine to have no humans in the loop and other times where it’s critical. It generally gives me the ick to take humans out of the loop, but I expect there are some times where I would think it’s correct.
Elaborating on point 1 and the “misinformation is only a small part of why the system is broken” idea:
The current system could be broken in many ways but still be at an equilibrium of sorts. Upsetting this equilibrium could have substantial effects because, for instance, people’s built-up immune response to current misinformation is not as well trained as their built-up immune response to traditionally biased media.
Additionally, intervening on misinformation could be far more tractable than other ways of improving things. I don’t have a solid grasp of what the problem is and what makes it worse, but a number of potential causes do seem much harder to intervene on than misinformation: general ignorance, poor education, political apathy. It could be the case that misinformation makes the situation merely 5% worse but is substantially easier to fix than these other issues.