(COI note: I work at OpenAI. These are my personal views, though.)
My quick take on the “AI pause debate”, framed in terms of two scenarios for how the AI safety community might evolve over the coming years:
AI safety becomes the single community that’s the most knowledgeable about cutting-edge ML systems. The smartest up-and-coming ML researchers find themselves constantly coming to AI safety spaces, because that’s the place to go if you want to nerd out about the models. It feels like the early days of hacker culture. There’s a constant flow of ideas and brainstorming in those spaces; the core alignment ideas are standard background knowledge for everyone there. There are hackathons where people build fun demos, and people figuring out ways of using AI to augment their research. Constant interactions with the models allows people to gain really good hands-on intuitions about how they work, which they leverage into doing great research that helps us actually understand them better. When the public ends up demanding regulation, there’s a large pool of competent people who are broadly reasonable about the risks, and can slot into the relevant institutions and make them work well.
AI safety becomes much more similar to the environmentalist movement. It has broader reach, but alienates a lot of the most competent people in the relevant fields. ML researchers who find themselves in AI safety spaces are told they’re “worse than Hitler” (which happened to a friend of mine). People get deontological about AI progress; some hesitate to pay for ChatGPT because it feels like they’re contributing to the problem (another true story); others overemphasize the risks of existing models in order to whip up popular support. People are sucked into psychological doom spirals similar to how many environmentalists think about climate change: if you’re not depressed then you obviously don’t take it seriously enough. Just like environmentalists often block some of the most valuable work on fixing climate change (e.g. nuclear energy, geoengineering, land use reform), safety advocates block some of the most valuable work on alignment (e.g. scalable oversight, interpretability, adversarial training) due to acceleration or misuse concerns. Of course, nobody will say they want to dramatically slow down alignment research, but there will be such high barriers to researchers getting and studying the relevant models that it has similar effects. The regulations that end up being implemented are messy and full of holes, because the movement is more focused on making a big statement than figuring out the details.
Obviously I’ve exaggerated and caricatured these scenarios, but I think there’s an important point here. One really good thing about the AI safety movement, until recently, is that the focus on the problem of technical alignment has nudged it away from the second scenario (although it wasn’t particularly close to the first scenario either, because the “nerding out” was typically more about decision theory or agent foundations than ML itself). That’s changed a bit lately, in part because a bunch of people seem to think that making technical progress on alignment is hopeless. I think this is just not an epistemically reasonable position to take: history is full of cases where people dramatically underestimated the growth of scientific knowledge, and its ability to solve big problems. Either way, I do think public advocacy for strong governance measures can be valuable, but I also think that “pause AI” advocacy runs the risk of pushing us towards scenario 2. Even if you think that’s a cost worth paying, I’d urge you to think about ways to get the benefits of the advocacy while reducing that cost and keeping the door open for scenario 1.
I think it would be helpful for you to mention and highlight your conflict-of-interest here.
I remember becoming much more positive about ads after starting work at Google. After I left, I slowly became more cynical about them again, and now I’m back down to ~2018 levels.
EDIT: I don’t think this comment should get more than say 10-20 karma. I think it was a quick suggestion/correction that Richard ended up following, not too insightful or useful.
good call, will edit in
Cool that you pointed this out! I have the impression comments like yours just above often get lots of karma on EA Forum, particularly when coming from people who already have lots of karma. I wonder whether that is good.
Yeah, I think it’s suboptimal. It makes sense that the comment had a lot of agree-votes. It’d also make more sense to upvote if Richard hadn’t added in his COI after my comment, because then making the comment more visible would have the practical value of a) making sure almost everybody who reads Richard’s comment notices the COI, and b) making it more likely for Richard to change his mind.
But given that Richard updated very quickly (in <1 hour), I think additional upvotes after his edit were superfluous.
I agree there’s a bias where the points more popular people make are evaluated more generously, but in this case I think the karma is well deserved. The COI point is important, and Linch highlights its importance with a relevant yet brief personal story. And while the comment was quick for Linch to make, some people in the EA community would hesitate to point out a conflict of interest in public for fear of being seen as a troublemaker, so the counterfactual impact is higher than it might seem. I strongly upvoted the comment.
I appreciate you drawing attention to the downside risks of public advocacy, and I broadly agree that they exist, but I also think the (admittedly) exaggerated framings here are doing a lot of work (basically just intuition pumping, for better or worse). The argument would be just as strong in the opposite direction if we swap the valence and optimism/pessimism of the passages: what if, in scenario one, the AI safety community continues making incremental progress on specific topics in interpretability and scalable oversight but achieves too little too slowly and fails to avert the risk of unforeseen emergent capabilities in large models driven by race dynamics, or even worse, accelerates those dynamics by drawing more talent to capabilities work? Whereas in scenario two, what if the AI safety movement becomes similar to the environmental movement by using public advocacy to build coalitions among diverse interest groups, becoming a major focus of national legislation and international cooperation, moving hundreds of billions of $ into clean tech research, etc.
Don’t get me wrong — there’s a place for intuition pumps like this, and I use them often. But I also think that both technical and advocacy approaches could be productive or counterproductive, and so it’s best for us to cautiously approach both and evaluate the risks and merits of specific proposals on their own. In terms of the things you mention driving bad outcomes for advocacy, I’m not sure if I agree — feeling uncertain about paying for ChatGPT seems like a natural response for someone worried about OpenAI’s use of capital, and I haven’t seen evidence that Holly (in the post you link) is exaggerating any risks to whip up support. We could disagree about these things, but my main point is that actually getting into the details of those disagreements is probably more useful in service of avoiding the second scenario than just describing it in pessimistic terms.
Yepp, I agree that I am doing an intuition pump to convey my point. I think this is a reasonable approach to take because I actually think there’s much more disagreement on vibes and culture than there is on substance (I too would like AI development to go more slowly). E.g. AI safety researchers paying for ChatGPT obviously brings in a negligible amount of money for OpenAI, and so when people think about that stuff the actual cognitive process is more like “what will my purchase signal and how will it influence norms?” But that’s precisely the sort of thing that has an effect on AI safety culture independent of whether people agree or disagree on specific policies—can you imagine hacker culture developing amongst people who were boycotting computers? Hence why my takeaway at the end of the post is not “stop advocating for pauses” but rather “please consider how to have positive effects on community culture and epistemics, which might not happen by default”.
I would be keen to hear more fleshed-out versions of the passages with the valences swapped! I like the one you’ve done; although I’d note that you’re focusing on the outcomes achieved by those groups, whereas I’m focusing also on the psychologies of the people in those groups. I think the psychological part is important because, as they say, culture eats strategy for breakfast. I do think that climate activists have done a good job at getting funding into renewables; but I think alignment research is much harder to accelerate (e.g. because the metrics are much less clear, funding is less of a bottleneck, and the target is moving much faster) and so trading off a culture focused on understanding the situation clearly for more success at activism may not be the right call here even if it was there.
This kind of reads as saying that 1 would be good because it’s fun (it’s also kind of your job, right?) and 2 would be bad because it’s depressing.
Huh, it really doesn’t read that way to me. Both are pretty clear causal paths to “the policy and general coordination we get are better/worse as a result.”
That too, but there was a clear indication that 1 would be fun and invigorating and 2 would be depressing.
I don’t think this is a coincidence—in general I think it’s much easier for people to do great research and actually figure stuff out when they’re viscerally interested in the problems they’re tackling, and excited about the process of doing that work.
Like, all else equal, work being fun and invigorating is obviously a good thing? I’m open to people arguing that the benefits of creating a depressing environment are greater (even if just in the form of vignettes like I did above), e.g. because it spurs people to do better policy work. But falling into unsustainable depressing environments which cause harmful side effects seems like a common trap, so I’m pretty cautious about it.
in general I think it’s much easier for people to do great research and actually figure stuff out when they’re viscerally interested in the problems they’re tackling, and excited about the process of doing that work.
Totally. But OP kinda made it sound like the fact that you found 2 depressing was evidence it was the wrong direction. I think advocacy could be fun and full of its own fascinating logistical and intellectual questions as well as lots of satisfying hands-on work.
“hesitate to pay for ChatGPT because it feels like they’re contributing to the problem”
Yep that’s me right now and I would hardly call myself a Luddite (maybe I am tho?)
Can you explain why you frame this as an obviously bad thing to do? Refusing to help fund the most cutting-edge AI company, which has been credited by multiple people with spurring on the AI race and attracting billions of dollars to AI capabilities, seems not-unreasonable at the very least, even if that approach does happen to be wrong.
Sure, there are decent arguments against not paying for ChatGPT, like the LLM not being dangerous in and of itself, and the small amount of money we pay not making a significant difference, but it doesn’t seem to be prima-facie-obviously-net-bad Luddite behavior, which is what you seem to paint it as in the post.
Obviously if individual people want to use or not use a given product, that’s their business. I’m calling it out not as a criticism of individuals, but in the context of setting the broader AI safety culture, for two broad reasons:
In a few years’ time, the ability to use AIs will be one of the strongest drivers of productivity, and not using them will be… actually, less Luddite, and more Amish. It’s fine for some people to be Amish, but for AI safety people (whose work particularly depends on understanding AI well) not using cutting-edge AI is like trying to be part of the original hacker culture while not using computers.
I think that the idea of actually trying to do good effectively is a pretty radical one, and scope-sensitivity is a key component of that. Without it, people very easily slide into focusing on virtue signalling or ingroup/outgroup signalling (e.g. climate activists refusing to take flights/use plastic bags/etc), which then has knock-on effects in who is attracted to the movement, etc. On twitter I recently criticized a UK campaign to ban a specific dog breed for not being very scope-sensitive; you can think of this as similar to that.
I’m a bit concerned that both of your arguments here are a bit strawmannish, but again I might be missing something.
Indeed, my comment was regarding the 99.999 percent of people (including myself) who are not AI researchers. I completely agree that researchers should be working on the latest models and paying for GPT-4, but that wasn’t my point.
I think it’s borderline offensive to call people “Amish” who boycott potentially dangerous tech which can increase productivity. First, it could be offensive to the Amish, as you seem to be using the term as a pejorative; and second, boycotting any one technology for harm-minimisation reasons while using all other technology can’t be compared to the Amish way of life. I’m not saying boycott all AI, which would be impossible anyway. Just perhaps not contributing financially to the company making the most cutting-edge models.
This is a big discussion, but I think dismissing not paying for ChatGPT under the banner of poor scope sensitivity and virtue signaling is weak at best and strawmanning at worst. The environmentalists I know who don’t fly don’t do it to virtue signal at all; they are doing it to help the world a little and to show integrity with their lifestyles. This may or may not be helpful to their cause, but the little evidence we have also seems to show that more radical actions like this do not alienate regular people but instead pull them towards the argument you are trying to make, in this case that an AI frontier arms race might be harmful.
I actually changed my mind on this after seeing the forum posts here a few months ago; I used to think that radical life decisions and activism were likely to be net harmful too. What research we have on the topic shows that more radical actions attract more people to mainstream climate/animal activist ideals, so I think your comment that it “has knock-on effects in who is attracted to the movement, etc.” is more likely to be wrong than right.
Indeed, my comment was regarding the 99.999 percent of people (including myself) who are not AI researchers. I completely agree that researchers should be working on the latest models and paying for GPT-4, but that wasn’t my point.
I’d extend this not just to include AI researchers, but people who are involved in AI safety more generally. But on the question of the wider population, we agree.
The environmentalists I know who don’t fly don’t do it to virtue signal at all; they are doing it to help the world a little and to show integrity with their lifestyles, which is admirable whether you agree it’s helpful or not.
“show integrity with their lifestyles” is a nicer way of saying “virtue signalling”; it just happens to be signalling a virtue that you agree with. I do think it’s an admirable display of non-selfishness (and far better than vice signalling, for example), but so too are plenty of other types of costly signalling like asceticism. A common failure mode for groups of people trying to do good is “pick a virtue that’s somewhat correlated with good things and signal the hell out of it until it stops being correlated”. I’d like this not to happen in AI safety (more than it already has: I think this has already happened with pessimism-signalling, and conversely happens with optimism-signalling in accelerationist circles).
“show integrity with their lifestyles” is a nicer way of saying “virtue signalling”,
I would describe it more as a spectrum. On the more pure “virtue signaling” end, you might choose one relatively unimportant thing like signing a petition, then blast it all over the internet while not taking other, more important actions for the cause.
Whereas on the other end of the spectrum, “showing integrity with lifestyle” to me means something like making a range of lifestyle choices which might make only a small difference to your cause, while making you feel like you are doing what you can on a personal level. You might not talk about these very much at all.
Obviously there are a lot of blurry lines in between.
Maybe my friends are different from yours, but the climate activists I know often don’t fly, don’t drive and don’t eat meat. And they don’t talk about it much or “signal” this either. But when they are asked about it, they explain why. This means that when they get challenged in the public sphere, both neutral people and their detractors lack personal ammunition to cast aspersions on their arguments, so their position becomes more convincing.
I don’t call that virtue signaling, but I suppose it’s partly semantics.
One exchange that makes me feel particularly worried about Scenario 2 is this one here, which focuses on the concern that there’s:
No rigorous basis for claiming that the use of mechanistic interpretability would “open up possibilities” for long-term safety. And plenty of possibilities for corporate marketers to chime in on mechint’s hypothetical big breakthroughs. In practice, we may help AI labs again – accidentally – to safety-wash their AI products.
I would like to point to this as a central example of the type of thing I’m worried about in scenario 2: the sort of doom spiral where people end up actively opposed to the most productive lines of research we have, because they’re conceiving of the problem as being arbitrarily hard. This feels very reminiscent of the environmentalists who oppose carbon capture or nuclear energy because it might make people feel better without solving the “real problem”.
It looks like, on net, people disagree with my take in the original post. So I’d like to ask the people who disagree: do you have reasons to think that the sort of position I’ve quoted here won’t become much more common as AI safety becomes much more activism-focused? Or do you think it would be good if it did?
It looks like, on net, people disagree with my take in the original post.
I just disagreed with the OP because it’s a false dichotomy; we could just agree with the true things that activists believe, and not the false ones, and not go based on vibes. We desire to believe that mech-interp is mere safety-washing iff it is, and so on.
The problem here is that doing insufficient safety R&D at AI labs enables those labs to market themselves as seriously caring about safety, and thus to present their ML products as fit for release.
You need to consider that, especially since you work at an AI lab.
Slightly conflicted agree-vote: your model here offloads so much to judgment calls that fall on people who are vulnerable to perverse incentives (like, alignment/capabilities as a binary distinction is a bad frame, but it seems like anyone who’d be unusually well suited to thinking clearly about its alternatives makes more money and has a less stressful life if their beliefs fall some ways vs. others).
Other than that, I’m aware that no one’s really happy about the way they trade off “you could Copenhagen-ethics your way out of literally any action in the limit” against “saying that the counterfactual a-hole would do it worse if I didn’t is not a good argument”. It seems like a law-of-opposite-advice situation, maybe? As in, some people in the blasé / unilateral / power-hungry camp could stand to be nudged one way, and some people in the scrupulous camp could stand to be nudged another.
It also matters that the environmentalists who “oppose carbon capture or nuclear energy because it might make people feel better without solving the ‘real problem’” have very low standards even when you condition on them being environmentalists. That doesn’t mean they can’t be memetically adaptive and then influential, but it might be tactically important (i.e. you have a messaging problem instead of a more virtuous actually-trying-to-think-clearly problem).
history is full of cases where people dramatically underestimated the growth of scientific knowledge, and its ability to solve big problems.
There are 2 concurrent research programs, and if one program (capability) completes before the other one (alignment), we all die, but the capability program is an easier technical problem than the alignment program. Do you disagree with that framing? If not, then how does “research might proceed faster than we expect” give you hope rather than dread?
Also, I’m guessing you would oppose a worldwide ban starting today on all “experimental” AI research (i.e., all use of computing resources to run AIs) till the scholars of the world settle on how to keep an AI aligned through the transition to superintelligence. That’s my guess, but please confirm. In your answer, please imagine that the ban is feasible and in fact can be effective (“leak-proof”?) enough to give the AI theorists all the time they need to settle on a plan even if that takes many decades. In other words, please indulge me this hypothetical question because I suspect it is a crux.
“Settled” here means that a majority of non-senile scholars / researchers who’ve worked full-time on the program for at least 15 years of their lives agree that it is safe for experiments to start as long as they adhere to a particular plan (which this majority agree on). Kinda like the way scholars have settled on the conclusion that anthropogenic emissions of carbon dioxide are causing the Earth to warm up.
I realize that there are safe experiments that would help the scholars of the world a lot in their theorizing about alignment, but I also expect that the scholars / theorists could probably settle on a safe, effective plan without the benefit of experiments beyond the experiments that have been run up to now, even though it might take them longer than it would with that benefit. Do you agree?
There are 2 concurrent research programs, and if one program (capability) completes before the other one (alignment), we all die, but the capability program is an easier technical problem than the alignment program. Do you disagree with that framing?
Yepp, I disagree on a bunch of counts.
a) I dislike the phrase “we all die”, nobody has justifiable confidence high enough to make that claim, even if ASI is misaligned enough to seize power there’s a pretty wide range of options for the future of humans, including some really good ones (just like there’s a pretty wide range of options for the future of gorillas, if humans remain in charge).
b) Same for “the capability program is an easier technical problem than the alignment program”. You don’t know that; nobody knows that; Lord Kelvin/Einstein/Ehrlich/etc would all have said “X is an easier technical problem than flight/nuclear energy/feeding the world/etc” for a wide range of X, a few years before each of those actually happened.
c) The distinction between capabilities and alignment is a useful concept when choosing research on an individual level; but it’s far from robust enough to be a good organizing principle on a societal level. There is a lot of disagreement about what qualifies as which, and to which extent, even within the safety community; I think there are a whole bunch of predictable failure modes of the political position that “here is the bad thing that must be prevented at all costs, and here is the good thing we’re crucially trying to promote, and also everyone disagrees on where the line between them is and they’re done by many of the same people”. This feels like a recipe for unproductive or counterproductive advocacy, corrupt institutions, etc. If alignment researchers had to demonstrate that their work had no capabilities externalities, they’d never get anything done (just as, if renewables researchers had to demonstrate that their research didn’t involve emitting any carbon, they’d never get anything done). I will write about possible alternative framings in an upcoming post.
I’m guessing you would oppose a worldwide ban starting today on all “experimental” AI research (i.e., all use of computing resources to run AIs) till the scholars of the world settle on how to keep an AI aligned through the transition to superintelligence.
As written, I would oppose this. I doubt the world as a whole could solve alignment with zero AI experiments; feels like asking medieval theologians to figure out the correct theory of physics without ever doing experiments.
b) Same for “the capability program is an easier technical problem than the alignment program”. You don’t know that; nobody knows that; Lord Kelvin/Einstein/Ehrlich/etc would all have said “X is an easier technical problem than flight/nuclear energy/feeding the world/etc” for a wide range of X, a few years before each of those actually happened.
Even if we should be undecided here, there’s an asymmetry where, if you get alignment too early, that’s okay, but getting capabilities before alignment is bad. Unless we know that alignment is going to be easier, pushing forward on capabilities without an outsized alignment benefit seems needlessly risky.
On the object level, if we think the scaling hypothesis is roughly correct (or “close enough”) or if we consider it telling that evolution probably didn’t have the sophistication to install much specialized brain circuitry between humans and other great apes, then it seems like getting capabilities past some universality and self-improvement/self-rearrangement (“learning how to become better at learning/learning how to become better at thinking”) threshold cannot be that difficult? Especially considering that we arguably already have “weak AGI.” (But maybe you have an inside view that says we still have huge capability obstacles to overcome?)
At the same time, alignment research seems to be in a fairly underdeveloped state (at least my impression as a curious outsider), so I’d say “alignment is harder than capabilities” seems almost certainly true. Factoring in lots of caveats about how they aren’t always cleanly separable, and so on, doesn’t seem to change that.
Unless we know that alignment is going to be easier, pushing forward on capabilities without an outsized alignment benefit seems needlessly risky.
I am not disputing this :) I am just disputing the factual claim that we know which is easier.
I’d say “alignment is harder than capabilities” seems almost certainly true
Are you making the claim that we’re almost certainly not in a world where alignment is easy? (E.g. only requires something like Debate/IA and maybe some rudimentary interpretability techniques.) I don’t see how you could know that.
Are you making the claim that we’re almost certainly not in a world where alignment is easy? (E.g. only requires something like Debate/IA and maybe some rudimentary interpretability techniques.) I don’t see how you could know that.
I’m not sure if I’m claiming quite that, but maybe I am. It depends on operationalizations.
Most importantly, I want to flag that even the people who are optimistic about “alignment might turn out to be easy” probably lose their optimism if we assume that timelines are sufficiently short. Like, would you/they still be optimistic if we for sure had <2years? It seems to me that more people are confident that AI timelines are very short than people are confident that we’ll solve alignment really soon. In fact, no one seems confident that we’ll solve alignment really soon. So, the situation already feels asymmetric.
On assessing alignment difficulty, I sympathize most with Eliezer’s claims that it’s important to get things right on the first try and that engineering progress among humans almost never happened to be smoother than initially expected (and so is a reason for pessimism in combination with the “we need to get it right on the first try” argument). I’m less sure how much I buy Eliezer’s confidence that “niceness/helpfulness” isn’t easy to train/isn’t a basin of attraction. He has some story about how prosocial instincts evolved in humans for super-contingent reasons so that it’s highly unlikely to re-play in ML training. And there I’m more like “Hm, hard to know.” So, I’m not pessimistic for inherent technical reasons. It’s more that I’m pessimistic because I think we’ll fumble the ball even if we’re in the lucky world where the technical stuff is surprisingly easy.
That said, I still think “alignment difficulty?” isn’t the sort of question where the ignorance prior is 50-50. It feels like there are more possibilities for it to be hard than easy.
the capability program is an easier technical problem than the alignment program.
You don’t know that; nobody knows that
Do you concede that frontier AI research is intrinsically dangerous?
That it is among the handful of the most dangerous research programs ever pursued by our civilization?
If not, I hope you can see why those who do consider it intrinsically dangerous are not particularly mollified or reassured by “well, who knows? maybe it will turn out OK in the end!”
The distinction between capabilities and alignment is a useful concept when choosing research on an individual level; but it’s far from robust enough to be a good organizing principle on a societal level.
When I wrote “the alignment program” above, I meant something specific, which I believe you will agree is robust enough to organize society (if only we could get society to go along with it): namely, I meant thinking hard together about alignment without doing anything dangerous like training up models with billions of parameters till we have at least a rough design that most of the professional researchers agree is more likely to help us than to kill us even if it turns out to have super-human capabilities—even if our settling on that design takes us many decades. E.g., what MIRI has been doing the last 20 years.
I dislike the phrase “we all die”, nobody has justifiable confidence high enough to make that claim, even if ASI is misaligned enough to seize power there’s a pretty wide range of options for the future of humans
It makes me sad that you do not see that “we all die” is the default outcome that naturally happens unless a lot of correct optimization pressure is applied by the researchers to the design of the first sufficiently-capable AI before the AI is given computing resources. It would have been nice to have someone with your capacity for clear thinking working on the problem. Are you sure you’re not overly attached (e.g., for intrapersonal motivational reasons) to an optimistic vision in which AI research “feels like the early days of hacker culture” and “there are hackathons where people build fun demos”?
Interesting and insightful framing! I think the main concern I have is that your scenario 1 doesn’t engage much with the idea of capability info hazards and the point that some of the people who nerd out about technical research lack the moral seriousness or big-picture awareness to not always push ahead.
Yepp, that seems right. I do think this is a risk, but I also think it’s often overplayed in EA spaces. E.g. I’ve recently heard a bunch of people talking about the capability infohazards that might arise from interpretability research. To me, it seems pretty unlikely that this concern should prevent people from doing or sharing interpretability research.
What’s the disagreement here? One part of it is just that some people are much more pessimistic about alignment research than I am. But it’s not actually clear that this by itself should make a difference, because even if they’re pessimistic they should “play to their outs”, and “interpretability becomes much better” seems like one of the main ways that pessimists could be wrong.
The main case I see for being so concerned about capability infohazards as to stop interpretability research is if you’re pessimistic about alignment but optimistic about governance. But I think that governance will still rely on e.g. a deep understanding of the systems involved. I’m pretty skeptical about strategies which only work if everything is shut down (and Scenario 2 is one attempt to gesture at why).
AI safety becomes the single community that’s the most knowledgeable about cutting-edge ML systems. The smartest up-and-coming ML researchers find themselves constantly coming to AI safety spaces, because that’s the place to go if you want to nerd out about the models. It feels like the early days of hacker culture.
I’d like to constructively push back on this: The research and open-source communities outside AI Safety that I’m embedded in are arguably just as hands-on, if not more so, since their attitude towards deployment is usually more … unrestricted. For context, I mess around with generative agents and learning agents.
I broadly agree that the AI Safety community is very smart people working on very challenging and impactful problems. I’m just skeptical that what you’ve described is particularly unique to AI Safety, and think that description would apply to many ML-related spaces. Then again, I could be extremely inexperienced and unaware of the knowledge gap between top AI Safety researchers and everyone else.
Re: Environmentalism
much more similar to the environmentalist movement. It has broader reach, but alienates a lot of the most competent people in the relevant fields. ML researchers who find themselves in AI safety spaces are told they’re “worse than Hitler”
I was a climate activist organising FridaysForFuture (FFF) protests, and I don’t recall this was ever the prevailing perception/attitude. Mainstream activist movements and scientists put up a united front, and they still mutually support each other today. Even if it was superficial, FFF always emphasised “listen to the science”.
From a survey of FFF activists:
Our data show that activists overwhelmingly derive their goals from scientific knowledge and reject the idea that science could be used imprecisely just as an instrument to attain their goals.[1]
I’m also fairly certain the environmentalist movement was a counterfactual net positive, with Will MacAskill himself commenting on the role of climate advocacy in funding solar energy research and accelerating climate commitments in What We Owe The Future. However, I will admit that the anti-nuclear stance was exactly as dumb as you’ve implied, and it embarrasses me how many activists expressed it.
Re: Enemy of my Enemy
Personally, I draw a meaningful distinction between being anti-AI capabilities and pro-AI Safety. Both are strongly and openly concerned about rapid AI progress, but the two groups have very different motivations, proposed solutions and degrees of epistemic rigour. Being anti-AI does not mean being pro-AI Safety; the former is a much larger umbrella movement of people expressing strong opinions on a disruptive, often misunderstood field.
I’d like to constructively push back on this: The research and open-source communities outside AI Safety that I’m embedded in are arguably just as hands-on, if not more so, since their attitude towards deployment is usually more … unrestricted.
I think we agree: I’m describing a possible future for AI safety, not making the claim that it’s anything like this now.
I was a climate activist organising FridaysForFuture (FFF) protests, and I don’t recall this was ever the prevailing perception/attitude.
Not sure what you mean by this but in some AI safety spaces ML capabilities researchers are seen as opponents. I think the relevant analogy here would be, e.g. an oil executive who’s interested in learning more about how to reduce the emissions their company produces, who I expect would get a pretty cold reception.
Re “alienation”, I’m also thinking of stuff like the climate activists who are blocking highways, blocking offices, etc.
I’m also fairly certain the environmentalist movement was a counterfactual net positive, with Will MacAskill himself commenting on the role of climate advocacy in funding solar energy research and accelerating climate commitments in What We Owe The Future. However, I will admit that the anti-nuclear stance was exactly as dumb as you’ve implied, and it embarrasses me how many activists expressed it.
Makes sense! Yeah, I agree that a lot has been done to accelerate research into renewables; I just feel less confident than you about how this balances out compared with nuclear.
Personally, I draw a meaningful distinction between being anti-AI capabilities and pro-AI Safety.
I like this distinction, feels like a useful one. Thanks for the comment!
I think that the 2-scenario model described here is very important, and should be a foundation for thinking about the future of AI safety.
However, I think that both scenarios will also be compromised to hell. The attack surface for the AI safety community will be massive in both scenarios, ludicrously massive in scenario #2, but nonetheless still nightmarishly large in scenario #1.
Assessment of both scenarios revolves around how inevitable you think slow takeoff is- I think that some aspects of slow takeoff, such as intelligence agencies, already started around 10 years ago and at this point just involve a lot of finger crossing and hoping for the best.
Just like environmentalists often block some of the most valuable work on fixing climate change (e.g. nuclear energy, geoengineering, land use reform)
Something else you may note here. The reason environmentalists are wrong is they focus on the local issue and ignore the larger picture.
Nuclear energy: they focus on the local risk of a meltdown or waste disposal, and ignore the carbon emitting power plants that must be there somewhere else for each nuclear plant they successfully block. Carbon emissions are global, even the worst nuclear disaster is local.
Geoengineering: they simply won’t engage on actually discussing the cost benefit ratios. Their reasoning shuts down or they argue “we can’t know the consequences” as an argument to do nothing. This ignores the bigger picture that temperatures are rising and will continue to rise in all scenarios.
Land use reform: they focus on the local habitat loss from converting a house or an empty lot to apartments, and ignore that the number of humans is conserved. Each human who can’t live in the apartment will live somewhere, and probably at lower density with more total environmental damage.
Demanding AI Pauses: This locally stops model training, if approved, in the USA and EU, the places activists can see when they bring out the signs in San Francisco. It means that top AI lab employees will be laid off, bringing any “secret sauce” with them to work for foreign labs that are not restricted. It also frees up wafer production for foreign labs to order compute on the same wafers. If Nvidia is blocked from manufacturing H100s, it frees up a share of the market for a foreign chip vendor.
It has minimal, possibly zero effect on the development of AGI if you think wafer production is the rate limiting factor.
AI Pause generally means a global, indefinite pause on frontier development. I’m not talking about a unilateral pause and I don’t think any country would consider that feasible.
That’s a reasonable position, but if a global pause on nuclear weapons could not be agreed on, what’s different about AI?
If AI works to even a fraction of its potential, it’s a more useful tool than a nuclear weapon, which is mostly an expensive threat you can’t actually use most of the time, right?
Why would a multilateral agreement on this ever happen?
Assuming you agree AI is more tempting than nukes, what would lead to an agreement being possible?
(COI note: I work at OpenAI. These are my personal views, though.)
My quick take on the “AI pause debate”, framed in terms of two scenarios for how the AI safety community might evolve over the coming years:
AI safety becomes the single community that’s the most knowledgeable about cutting-edge ML systems. The smartest up-and-coming ML researchers find themselves constantly coming to AI safety spaces, because that’s the place to go if you want to nerd out about the models. It feels like the early days of hacker culture. There’s a constant flow of ideas and brainstorming in those spaces; the core alignment ideas are standard background knowledge for everyone there. There are hackathons where people build fun demos, and people figuring out ways of using AI to augment their research. Constant interactions with the models allows people to gain really good hands-on intuitions about how they work, which they leverage into doing great research that helps us actually understand them better. When the public ends up demanding regulation, there’s a large pool of competent people who are broadly reasonable about the risks, and can slot into the relevant institutions and make them work well.
AI safety becomes much more similar to the environmentalist movement. It has broader reach, but alienates a lot of the most competent people in the relevant fields. ML researchers who find themselves in AI safety spaces are told they’re “worse than Hitler” (which happened to a friend of mine). People get deontological about AI progress; some hesitate to pay for ChatGPT because it feels like they’re contributing to the problem (another true story); others overemphasize the risks of existing models in order to whip up popular support. People are sucked into psychological doom spirals similar to how many environmentalists think about climate change: if you’re not depressed then you obviously don’t take it seriously enough. Just like environmentalists often block some of the most valuable work on fixing climate change (e.g. nuclear energy, geoengineering, land use reform), safety advocates block some of the most valuable work on alignment (e.g. scalable oversight, interpretability, adversarial training) due to acceleration or misuse concerns. Of course, nobody will say they want to dramatically slow down alignment research, but there will be such high barriers to researchers getting and studying the relevant models that it has similar effects. The regulations that end up being implemented are messy and full of holes, because the movement is more focused on making a big statement than figuring out the details.
Obviously I’ve exaggerated and caricatured these scenarios, but I think there’s an important point here. One really good thing about the AI safety movement, until recently, is that the focus on the problem of technical alignment has nudged it away from the second scenario (although it wasn’t particularly close to the first scenario either, because the “nerding out” was typically more about decision theory or agent foundations than ML itself). That’s changed a bit lately, in part because a bunch of people seem to think that making technical progress on alignment is hopeless. I think this is just not an epistemically reasonable position to take: history is full of cases where people dramatically underestimated the growth of scientific knowledge, and its ability to solve big problems. Either way, I do think public advocacy for strong governance measures can be valuable, but I also think that “pause AI” advocacy runs the risk of pushing us towards scenario 2. Even if you think that’s a cost worth paying, I’d urge you to think about ways to get the benefits of the advocacy while reducing that cost and keeping the door open for scenario 1.
I think it would be helpful for you to mention and highlight your conflict-of-interest here.
I remember becoming much more positive about ads after starting work at Google. After I left, I slowly became more cynical about them again, and now I’m back down to ~2018 levels.
EDIT: I don’t think this comment should get more than say 10-20 karma. I think it was a quick suggestion/correction that Richard ended up following, not too insightful or useful.
good call, will edit in
Hi Linch,
Cool that you pointed this out! I have the impression comments like yours just above often get lots of karma on EA Forum, particularly when coming from people who already have lots of karma. I wonder whether that is good.
Yeah I think it’s suboptimal. It makes sense that the comment had a lot of agree-votes. It’d also make more sense to upvote if Richard didn’t add in his COI after my comment, because then making the comment go up in visibility had a practical value of a) making sure almost everybody who reads Richard’s comment notices the COI and b) making it more likely for Richard to change his mind.
But given that Richard updated very quickly (in <1 hour), I think additional upvotes after his edit were superfluous.
I agree there’s a bias where the points more popular people make are evaluated more generously, but in this case I think the karma is well deserved. The COI point is important, and Linch highlights its importance with a relevant yet brief personal story. And while the comment was quick for Linch to make, some people in the EA community would hesitate to point out a conflict of interest in public for fear of being seen as a troublemaker, so the counterfactual impact is higher than it might seem. I strongly upvoted the comment.
I appreciate you drawing attention to the downside risks of public advocacy, and I broadly agree that they exist, but I also think the (admittedly) exaggerated framings here are doing a lot of work (basically just intuition pumping, for better or worse). The argument would be just as strong in the opposite direction if we swap the valence and optimism/pessimism of the passages: what if, in scenario one, the AI safety community continues making incremental progress on specific topics in interpretability and scalable oversight but achieves too little too slowly and fails to avert the risk of unforeseen emergent capabilities in large models driven by race dynamics, or even worse, accelerates those dynamics by drawing more talent to capabilities work? Whereas in scenario two, what if the AI safety movement becomes similar to the environmental movement by using public advocacy to build coalitions among diverse interest groups, becoming a major focus of national legislation and international cooperation, moving hundreds of billions of $ into clean tech research, etc.
Don’t get me wrong — there’s a place for intuition pumps like this, and I use them often. But I also think that both technical and advocacy approaches could be productive or counterproductive, and so it’s best for us to cautiously approach both and evaluate the risks and merits of specific proposals on their own. In terms of the things you mention driving bad outcomes for advocacy, I’m not sure if I agree — feeling uncertain about paying for ChatGPT seems like a natural response for someone worried about OpenAI’s use of capital, and I haven’t seen evidence that Holly (in the post you link) is exaggerating any risks to whip up support. We could disagree about these things, but my main point is that actually getting into the details of those disagreements is probably more useful in service of avoiding the second scenario than just describing it in pessimistic terms.
Yepp, I agree that I am doing an intuition pump to convey my point. I think this is a reasonable approach to take because I actually think there’s much more disagreement on vibes and culture than there is on substance (I too would like AI development to go more slowly). E.g. AI safety researchers paying for ChatGPT obviously brings in a negligible amount of money for OpenAI, and so when people think about that stuff the actual cognitive process is more like “what will my purchase signal and how will it influence norms?” But that’s precisely the sort of thing that has an effect on AI safety culture independent of whether people agree or disagree on specific policies—can you imagine hacker culture developing amongst people who were boycotting computers? Hence why my takeaway at the end of the post is not “stop advocating for pauses” but rather “please consider how to have positive effects on community culture and epistemics, which might not happen by default”.
I would be keen to hear more fleshed-out versions of the passages with the valences swapped! I like the one you’ve done; although I’d note that you’re focusing on the outcomes achieved by those groups, whereas I’m focusing also on the psychologies of the people in those groups. I think the psychological part is important because, as they say, culture eats strategy for breakfast. I do think that climate activists have done a good job at getting funding into renewables; but I think alignment research is much harder to accelerate (e.g. because the metrics are much less clear, funding is less of a bottleneck, and the target is moving much faster) and so trading off a culture focused on understanding the situation clearly for more success at activism may not be the right call here even if it was there.
This kind of reads as saying that 1 would be good because it’s fun (it’s also kind of your job, right?) and 2 would be bad because it’s depressing.
Huh, it really doesn’t read that way to me. Both are pretty clear causal paths to “the policy and general coordination we get are better/worse as a result.”
That too, but there was a clear indication that 1 would be fun and invigorating and 2 would be depressing.
I don’t think this is a coincidence—in general I think it’s much easier for people to do great research and actually figure stuff out when they’re viscerally interested in the problems they’re tackling, and excited about the process of doing that work.
Like, all else equal, work being fun and invigorating is obviously a good thing? I’m open to people arguing that the benefits of creating a depressing environment are greater (even if just in the form of vignettes like I did above), e.g. because it spurs people to do better policy work. But falling into unsustainable depressing environments which cause harmful side effects seems like a common trap, so I’m pretty cautious about it.
Totally. But OP kinda made it sound like the fact that you found 2 depressing was evidence it was the wrong direction. I think advocacy could be fun and full of its own fascinating logistical and intellectual questions as well as lots of satisfying hands-on work.
“hesitate to pay for ChatGPT because it feels like they’re contributing to the problem”
Yep that’s me right now and I would hardly call myself a Luddite (maybe I am tho?)
Can you explain why you frame this as an obviously bad thing to do? Refusing to help fund the most cutting edge AI company, which has been credited by multiple people with spurring on the AI race and attracting billions of dollars to AI capabilities seems not-unreasonable at the very least, even if that approach does happen to be wrong.
Sure there are decent arguments against not paying for chat GPT, like the LLM not being dangerous in and of itself, and the small amount of money we pay not making a significant difference, but it doesn’t seem to be prima-facie-obviously-net-bad-luddite behavior, which is what you seem to paint it as in the post.
Obviously if individual people want to use or not use a given product, that’s their business. I’m calling it out not as a criticism of individuals, but in the context of setting the broader AI safety culture, for two broad reasons:
In a few years’ time, the ability to use AIs will be one of the strongest drivers of productivity, and not using them will be… actually, less Luddite, and more Amish. It’s fine for some people to be Amish, but for AI safety people (whose work particularly depends on understanding AI well) not using cutting-edge AI is like trying to be part of the original hacker culture while not using computers.
I think that the idea of actually trying to do good effectively is a pretty radical one, and scope-sensitivity is a key component of that. Without it, people very easily slide into focusing on virtue signalling or ingroup/outgroup signalling (e.g. climate activists refusing to take flights/use plastic bags/etc), which then has knock-on effects in who is attracted to the movement, etc. On twitter I recently criticized a UK campaign to ban a specific dog breed for not being very scope-sensitive; you can think of this as similar to that.
I’m a bit concerned that both of your arguments here are a bit strawmannish, but again I might be missing something
Indeed ,my comment was regarding the 99.999 percent of people ( including myself) who are not AI researchers. I completely agree that researchers should be working on the latest models and paying for chat GPT 4, but that wasn’t my point.
I think it’s borderline offensive to call people “amish” who boycott potentially dangerous tech which can increase productivity. First it could be offensive to the Amish, as you seem to be using it as a perogative, and second boycotting any 1 technology for harm minimisation reasons while using all other technology can’t get compared to the Amish way of life. I’m not saying boycott all AI, that would be impossible anyway. Just perhaps not contributing financially to the company making the most cutting edge models.
This is a big discussion, but I think discarding not paying for chat GPT under the banner of poor scope sensitivity and virtue signaling is weak at best and straw Manning at worst. The environmentalists I know who don’t fly, don’t use it to virtue signal at all, they are doing it to help the world a little and show integrity with their lifestyles. This may or may not be helpful to their cause, but the little evidence we have also seems to show that more radical actions like this do not alienate regular people but instead pull people towards the argument your are trying to make, in this case that an AI frontier arms race might be harmful.
I actually changed my mind on this on seeing the forum posts here a few months ago, I used to think that radical life decisions and activism was likely to be net harmful too. what research we have on the topic shows that more radical actions attract more people to mainstream climate/animal activist ideals, so I think your comment “has knock-on effects in who is attracted to the movement, etc.” It’s more likely to be wrong than right.
I’d extend this not just to include AI researchers, but people who are involved in AI safety more generally. But on the question of the wider population, we agree.
“show integrity with their lifestyles” is a nicer way of saying “virtue signalling”, it just happens to be signalling a virtue that you agree with. I do think it’s an admirable display of non-selfishness (and far better than vice signalling, for example), but so too are plenty of other types of costly signalling like asceticism. A common failure mode for groups people trying to do good is “pick a virtue that’s somewhat correlated with good things and signal the hell out of it until it stops being correlated”. I’d like this not to happen in AI safety (more than it already has: I think this has already happened with pessimism-signalling, and conversely happens with optimism-signalling in accelerationist circles).
“show integrity with their lifestyles” is a nicer way of saying “virtue signalling”,
I would describe it more as a spectrum. On the more pure “virtue signaling” end, you might choose one relatively unimportant thing like signing a petition, then blast it all over the internet while not doing other more important actions that’s the cause.
Whereas on the other end of the spectrum, “showing integrity with lifestyle” to me means something like making a range of lifestyle choices which might make only s small difference to your cause, while making you feel like you are doing what you can on a personal level. You might not talk about these very much at all.
Obviously there are a lot of blurry lines in between.
Maybe my friends are different from yours, but climate activists I know often don’t fly, don’t drive and don’t eat meat. And they don’t talk about it much or “signal” this either. But when they are asked about it, they explain why. This means when they get challenged in the public sphere, both neutral people and their detractors lack personal ammunition to car dispersion on their arguments, so their position becomes more convincing.
I don’t call that virtue signaling, but I suppose it’s partly semantics.
One exchange that makes me feel particularly worried about Scenario 2 is this one here, which focuses on the concern that there’s:
I would like to point to this as a central example of the type of thing I’m worried about in scenario 2: the sort of doom spiral where people end up actively opposed to the most productive lines of research we have, because they’re conceiving of the problem as being arbitrarily hard. This feels very reminiscent of the environmentalists who oppose carbon capture or nuclear energy because it might make people feel better without solving the “real problem”.
It looks like, on net, people disagree with my take in the original post. So I’d like to ask the people who disagree: do you have reasons to think that the sort of position I’ve quoted here won’t become much more common as AI safety becomes much more activism-focused? Or do you think it would be good if it did?
I just disagreed with the OP because it’s a false dichotomy; we could just agree with the true things that activists believe, and not the false ones, and not go based on vibes. We desire to believe that mech-interp is mere safety-washing iff it is, and so on.
The problem here is AI labs doing insufficient safety R&D that nonetheless enables them to market themselves as seriously caring about safety, and thus to present their ML products as fit for release.
You need to consider that, especially since you work at an AI lab.
Slightly conflicted agree vote: your model here offloads so much to judgment calls that fall on people who are vulnerable to perverse incentives (like, alignment/capabilities as a binary distinction is a bad frame, but it seems like anyone who’d be unusually well suited to thinking clearly about its alternatives would make more money and have a less stressful life if their beliefs fall some ways rather than others).
Other than that, I’m aware that no one’s really happy about how they trade off “you could Copenhagen-ethics your way out of literally any action in the limit” against “saying that the counterfactual a-hole would do it worse if I didn’t is not a good argument”. It seems like a law-of-opposite-advice situation, maybe? As in, some people in the blasé / unilateral / power-hungry camp could stand to be nudged one way, and some people in the scrupulous camp could stand to be nudged the other.
It also matters that the “oppose carbon capture or nuclear energy because it might make people feel better without solving the ‘real problem’” environmentalists have very low standards, even when you condition on them being environmentalists. That doesn’t mean they can’t be memetically adaptive and then influential, but it might be tactically important (i.e. you may have a messaging problem instead of a more virtuous actually-trying-to-think-clearly problem).
There are 2 concurrent research programs, and if one program (capability) completes before the other one (alignment), we all die, but the capability program is an easier technical problem than the alignment program. Do you disagree with that framing? If not, then how does “research might proceed faster than we expect” give you hope rather than dread?
Also, I’m guessing you would oppose a worldwide ban starting today on all “experimental” AI research (i.e., all use of computing resources to run AIs) till the scholars of the world settle on how to keep an AI aligned through the transition to superintelligence. That’s my guess, but please confirm. In your answer, please imagine that the ban is feasible and can in fact be effective (“leak-proof”?) enough to give the AI theorists all the time they need to settle on a plan, even if that takes many decades. In other words, please indulge me in this hypothetical question, because I suspect it is a crux.
“Settled” here means that a majority of non-senile scholars / researchers who’ve worked full-time on the program for at least 15 years of their lives agree that it is safe for experiments to start as long as they adhere to a particular plan (which this majority agree on). Kinda like the way scholars have settled on the conclusion that anthropogenic emissions of carbon dioxide are causing the Earth to warm up.
I realize that there are safe experiments that would help the scholars of the world a lot in their theorizing about alignment, but I also expect that the scholars / theorists could probably settle on a safe, effective plan without the benefit of any experiments beyond those that have been run up to now, even though it might take them longer than it would with that benefit. Do you agree?
Yepp, I disagree on a bunch of counts.
a) I dislike the phrase “we all die”, nobody has justifiable confidence high enough to make that claim, even if ASI is misaligned enough to seize power there’s a pretty wide range of options for the future of humans, including some really good ones (just like there’s a pretty wide range of options for the future of gorillas, if humans remain in charge).
b) Same for “the capability program is an easier technical problem than the alignment program”. You don’t know that; nobody knows that; Lord Kelvin/Einstein/Ehrlich/etc would all have said “X is an easier technical problem than flight/nuclear energy/feeding the world/etc” for a wide range of X, a few years before each of those actually happened.
c) The distinction between capabilities and alignment is a useful concept when choosing research on an individual level; but it’s far from robust enough to be a good organizing principle on a societal level. There is a lot of disagreement about what qualifies as which, and to what extent, even within the safety community; I think there are a whole bunch of predictable failure modes of the political position that “here is the bad thing that must be prevented at all costs, and here is the good thing we’re crucially trying to promote, and also everyone disagrees on where the line between them is and they’re done by many of the same people”. This feels like a recipe for unproductive or counterproductive advocacy, corrupt institutions, etc. If alignment researchers had to demonstrate that their work had no capabilities externalities, they’d never get anything done (just as, if renewables researchers had to demonstrate that their research didn’t involve emitting any carbon, they’d never get anything done). I will write about possible alternative framings in an upcoming post.
As written, I would oppose this. I doubt the world as a whole could solve alignment with zero AI experiments; feels like asking medieval theologians to figure out the correct theory of physics without ever doing experiments.
Even if we should be undecided here, there’s an asymmetry where, if you get alignment too early, that’s okay, but getting capabilities before alignment is bad. Unless we know that alignment is going to be easier, pushing forward on capabilities without an outsized alignment benefit seems needlessly risky.
On the object level, if we think the scaling hypothesis is roughly correct (or “close enough”) or if we consider it telling that evolution probably didn’t have the sophistication to install much specialized brain circuitry between humans and other great apes, then it seems like getting capabilities past some universality and self-improvement/self-rearrangement (“learning how to become better at learning/learning how to become better at thinking”) threshold cannot be that difficult? Especially considering that we arguably already have “weak AGI.” (But maybe you have an inside view that says we still have huge capability obstacles to overcome?)
At the same time, alignment research seems to be in a fairly underdeveloped state (at least my impression as a curious outsider), so I’d say “alignment is harder than capabilities” seems almost certainly true. Factoring in lots of caveats about how they aren’t always cleanly separable, and so on, doesn’t seem to change that.
I am not disputing this :) I am just disputing the factual claim that we know which is easier.
Are you making the claim that we’re almost certainly not in a world where alignment is easy? (E.g. only requires something like Debate/IA and maybe some rudimentary interpretability techniques.) I don’t see how you could know that.
I’m not sure if I’m claiming quite that, but maybe I am. It depends on operationalizations.
Most importantly, I want to flag that even the people who are optimistic about “alignment might turn out to be easy” probably lose their optimism if we assume that timelines are sufficiently short. Like, would you/they still be optimistic if we for sure had < 2 years? It seems to me that more people are confident that AI timelines are very short than are confident that we’ll solve alignment really soon. In fact, no one seems confident that we’ll solve alignment really soon. So, the situation already feels asymmetric.
On assessing alignment difficulty, I sympathize most with Eliezer’s claims that it’s important to get things right on the first try and that engineering progress among humans almost never happened to be smoother than initially expected (and so is a reason for pessimism in combination with the “we need to get it right on the first try” argument). I’m less sure how much I buy Eliezer’s confidence that “niceness/helpfulness” isn’t easy to train/isn’t a basin of attraction. He has some story about how prosocial instincts evolved in humans for super-contingent reasons so that it’s highly unlikely to re-play in ML training. And there I’m more like “Hm, hard to know.” So, I’m not pessimistic for inherent technical reasons. It’s more that I’m pessimistic because I think we’ll fumble the ball even if we’re in the lucky world where the technical stuff is surprisingly easy.
That said, I still think “alignment difficulty?” isn’t the sort of question where the ignorance prior is 50-50. It feels like there are more possibilities for it to be hard than easy.
Do you concede that frontier AI research is intrinsically dangerous?
That it is among the handful of the most dangerous research programs ever pursued by our civilization?
If not, I hope you can see why those who do consider it intrinsically dangerous are not particularly mollified or reassured by “well, who knows? maybe it will turn out OK in the end!”
When I wrote “the alignment program” above, I meant something specific, which I believe you will agree is robust enough to organize society (if only we could get society to go along with it): namely, I meant thinking hard together about alignment without doing anything dangerous like training up models with billions of parameters till we have at least a rough design that most of the professional researchers agree is more likely to help us than to kill us even if it turns out to have super-human capabilities—even if our settling on that design takes us many decades. E.g., what MIRI has been doing the last 20 years.
It makes me sad that you do not see that “we all die” is the default outcome that naturally happens unless a lot of correct optimization pressure is applied by the researchers to the design of the first sufficiently-capable AI before the AI is given computing resources. It would have been nice to have someone with your capacity for clear thinking working on the problem. Are you sure you’re not overly attached (e.g., for intrapersonal motivational reasons) to an optimistic vision in which AI research “feels like the early days of hacker culture” and “there are hackathons where people build fun demos”?
Interesting and insightful framing! I think the main concern I have is that your scenario 1 doesn’t engage much with the idea of capability info hazards, or with the point that some of the people who nerd out about technical research lack the moral seriousness or big-picture awareness to not always push ahead.
Yepp, that seems right. I do think this is a risk, but I also think it’s often overplayed in EA spaces. E.g. I’ve recently heard a bunch of people talking about the capability infohazards that might arise from interpretability research. To me, it seems pretty unlikely that this concern should prevent people from doing or sharing interpretability research.
What’s the disagreement here? One part of it is just that some people are much more pessimistic about alignment research than I am. But it’s not actually clear that this by itself should make a difference, because even if they’re pessimistic they should “play to their outs”, and “interpretability becomes much better” seems like one of the main ways that pessimists could be wrong.
The main case I see for being so concerned about capability infohazards as to stop interpretability research is if you’re pessimistic about alignment but optimistic about governance. But I think that governance will still rely on e.g. a deep understanding of the systems involved. I’m pretty skeptical about strategies which only work if everything is shut down (and Scenario 2 is one attempt to gesture at why).
Re: Hacker culture
I’d like to constructively push back on this: the research and open-source communities outside AI Safety that I’m embedded in are arguably just as hands-on, if not more so, since their attitude towards deployment is usually more … unrestricted. For context, I mess around with generative agents and learning agents.
I broadly agree that the AI Safety community is very smart people working on very challenging and impactful problems. I’m just skeptical that what you’ve described is particularly unique to AI Safety, and think that description would apply to many ML-related spaces. Then again, I could be extremely inexperienced and unaware of the knowledge gap between top AI Safety researchers and everyone else.
Re: Environmentalism
I was a climate activist organising FridaysForFuture (FFF) protests, and I don’t recall that this was ever the prevailing perception/attitude. Mainstream activist movements and scientists put up a united front, and they still mutually support each other today. Even if it was superficial, FFF always emphasised “listen to the science”.
From a survey of FFF activists:
I’m also fairly certain the environmentalist movement was a counterfactual net positive, with Will MacAskill himself commenting on the role of climate advocacy in funding solar energy research and accelerating climate commitments in What We Owe the Future. However, I will admit that the anti-nuclear stance was exactly as dumb as you’ve implied, and it embarrasses me how many activists expressed it.
Re: Enemy of my Enemy
Personally, I draw a meaningful distinction between being anti-AI-capabilities and pro-AI-Safety. Both groups are strongly and openly concerned about rapid AI progress, but they have very different motivations, proposed solutions and degrees of epistemic rigour. Being anti-AI does not mean being pro AI Safety; the former is a much larger umbrella movement of people expressing strong opinions on a disruptive, often misunderstood field.
Frontiers | “Listen to the science!”—The role of scientific knowledge for the Fridays for Future movement (frontiersin.org)
I think we agree: I’m describing a possible future for AI safety, not making the claim that it’s anything like this now.
Not sure what you mean by this, but in some AI safety spaces ML capabilities researchers are seen as opponents. I think the relevant analogy here would be, e.g., an oil executive who’s interested in learning more about how to reduce the emissions their company produces, who I expect would get a pretty cold reception.
Re “alienation”, I’m also thinking of stuff like the climate activists who are blocking highways, blocking offices, etc.
Makes sense! Yeah, I agree that a lot has been done to accelerate research into renewables; I just feel less confident than you about how this balances out compared with nuclear.
I like this distinction, feels like a useful one. Thanks for the comment!
I think that the 2-scenario model described here is very important, and should be a foundation for thinking about the future of AI safety.
However, I think that both scenarios will also be compromised to hell. The attack surface for the AI safety community will be massive in both scenarios, ludicrously massive in scenario #2, but nonetheless still nightmarishly large in scenario #1.
Assessment of both scenarios revolves around how inevitable you think slow takeoff is. I think that some aspects of slow takeoff, such as intelligence agencies, already started around 10 years ago, and at this point they just involve a lot of finger-crossing and hoping for the best.
Something else you may note here: the reason these environmentalists are wrong is that they focus on the local issue and ignore the larger picture.
Nuclear energy: they focus on the local risk of a meltdown or of waste disposal, and ignore the carbon-emitting power plants that must exist somewhere else for each nuclear plant they successfully block. Carbon emissions are global; even the worst nuclear disaster is local.
Geoengineering: they simply won’t engage in actually discussing the cost-benefit ratios. Their reasoning shuts down, or they argue “we can’t know the consequences” as a reason to do nothing. This ignores the bigger picture that temperatures are rising and will continue to rise in all scenarios.
Land use reform: they focus on the local habitat loss from converting a house or an empty lot to apartments, and ignore that the number of humans is conserved. Each human who can’t live in the apartment will live somewhere else, probably at lower density and with more total environmental damage.
Demanding AI Pauses: this locally stops model training, if approved, in the USA and EU, i.e. the places the protesters can see if they bring out the signs in San Francisco. It means that top AI lab employees will be laid off, bringing any “secret sauce” with them to foreign labs that are not restricted. It also frees up wafer production for foreign labs to order compute on the same wafers. If Nvidia is blocked from manufacturing H100s, it frees up a share of the market for a foreign chip vendor.
It has minimal, possibly zero, effect on the development of AGI if you think wafer production is the rate-limiting factor.
AI Pause generally means a global, indefinite pause on frontier development. I’m not talking about a unilateral pause and I don’t think any country would consider that feasible.
That’s a reasonable position, but if a global pause on nuclear weapons could not be agreed on, what’s different about AI?
If AI works to even a fraction of its potential, it’s a more useful tool than a nuclear weapon, which is mostly an expensive threat you can’t actually use most of the time, right?
Why would a multilateral agreement on this ever happen?
Assuming you agree AI is more tempting than nukes, what would lead to an agreement being possible?