What is the purpose of publicly deploying Claude? It seems like this will only have the effect of increasing arms race dynamics. If the reason is just to fund further safety research, then I think this is worth saying explicitly.
Joseph Miller
The International PauseAI Protest: Activism under uncertainty
Why I’m doing PauseAI
EA promoted earning to give. When the movement largely moved away from it, not enough work was done to make that distance
Why would we want to do that? Earning to give is a good way to help the world. Maybe not the best, but still good.
Dear Anthropic people, please don’t release Claude
What is the risk level below which you’d be OK with unpausing AI?
I think an approximately 1 in 10,000 chance of extinction from each new GPT would be acceptable given the benefits of AI. This is roughly my guess for GPT-5, so if we could release that model and then pause, I’d be okay with that.
A major consideration here is the use of AI to mitigate other x-risks. Some of Toby Ord’s estimates of existential risk over the next century (from The Precipice):
AI − 1 in 10
Engineered pandemics − 1 in 30
Unforeseen anthropogenic risks (eg. dystopian regime, nanotech) − 1 in 30
Other anthropogenic risks − 1 in 50
Nuclear war − 1 in 1000
Climate change − 1 in 1000
Other environmental damage − 1 in 1000
Supervolcano − 1 in 10,000
If there were a concrete plan under which AI could be used to mitigate pandemics and anthropogenic risks, then I would be OK with a higher probability of extinction from AI, but it seems more likely that AI progress would increase these risks before it decreased them.
AI could help with climate change and, eventually, nuclear war, so maybe I should be willing to accept a little more risk. But we might need a few more GPTs to fix these problems, and if each new GPT carries a 1 in 10,000 risk, the numbers start to even out.
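A rough sketch of the "starts to even out" arithmetic. It assumes, hypothetically, that each GPT release is an independent gamble with the same 1 in 10,000 extinction risk, and that AI would fully remove the climate change and nuclear war risks from Ord's list. Ord's figures are per-century estimates, so comparing them with per-model risks is only a loose illustration. The helper names are made up for this example.

```python
# Hypothetical helper names; the numbers are the ones quoted in the text above.

def cumulative_risk(per_model_risk: float, n_models: int) -> float:
    # Probability that at least one of n releases causes extinction,
    # assuming each release is an independent gamble with the same risk.
    return 1 - (1 - per_model_risk) ** n_models


def combined_risk(risks: list[float]) -> float:
    # Probability that at least one of several independent risks occurs.
    survival = 1.0
    for p in risks:
        survival *= 1 - p
    return 1 - survival


def breakeven_models(per_model_risk: float, risk_removed: float) -> int:
    # Smallest number of releases whose cumulative risk reaches the risk
    # that AI is hoped to remove.
    n = 0
    while cumulative_risk(per_model_risk, n) < risk_removed:
        n += 1
    return n


PER_GPT = 1 / 10_000  # the per-GPT risk described as acceptable above
# Assumption: AI fully removes Ord's climate change and nuclear war risks.
removed = combined_risk([1 / 1000, 1 / 1000])

print(f"Risk AI is hoped to remove: {removed:.5f}")
for n in (1, 3, 5, 10):
    print(f"{n} GPTs at 1 in 10,000 each: {cumulative_risk(PER_GPT, n):.5f}")
print("GPTs needed to offset it:", breakeven_models(PER_GPT, removed))
```

Changing `PER_GPT` or the list of risks AI is assumed to remove shows how sensitive the break-even point is to these guesses.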
What do you think about the potential benefits from AI?
I’m very bullish about the benefits of an aligned AGI. Besides mitigating x-risk, I think curing aging should be a top priority and is worth taking some risks to obtain.
How do you interpret models of AI pause, such as this one from Chad Jones?
I’ve read the post quickly, but I don’t have a background in economics, so it would take me a while to fully absorb it. My first impression is that it is interesting but not that useful for making decisions right now; the simplifications the model requires offset its gains in rigor. What do you think? Is it something I should take the time to understand?
My guess would be that the discount rate is pretty cruxy. Intuitively I would expect reductions in x-risk to outweigh almost any gains over the next 1000 years, since we could have zillions of years to reap the benefits. (On a meta-level I believe moral questions are not “truthy”, so this is just according to my vaguely total utilitarian preferences, not some deeper truth.)
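A toy illustration of why the discount rate is cruxy; this is not Chad Jones's model. It assumes, hypothetically, one unit of benefit per year, a 1,000,000-year stand-in for "zillions of years", and a one-percentage-point reduction in extinction risk. All of these numbers are made up for illustration.

```python
def discounted_value(years: int, rate: float) -> float:
    """Present value of 1 unit of benefit per year for `years` years.

    Year t (starting at t = 0) is weighted by (1 + rate) ** -t, so the
    total is a finite geometric series.
    """
    if rate == 0:
        return float(years)
    x = 1 / (1 + rate)
    return (1 - x ** years) / (1 - x)


# Hypothetical numbers chosen only for illustration.
NEAR_TERM_YEARS = 1_000        # gains arriving over the next 1000 years
LONG_FUTURE_YEARS = 1_000_000  # stand-in for "zillions of years"
RISK_REDUCTION = 0.01          # hypothetical cut in extinction risk

for rate in (0.0, 0.001, 0.01):
    near = discounted_value(NEAR_TERM_YEARS, rate)
    # Expected value protected by lowering extinction risk: the reduction
    # times the (discounted) value of the whole future it protects.
    saved = RISK_REDUCTION * discounted_value(LONG_FUTURE_YEARS, rate)
    print(f"rate={rate}: near-term gains ~ {near:,.0f}, x-risk reduction ~ {saved:,.0f}")
```

With no discounting, the risk reduction dominates because it protects the whole long future; with even a small positive rate, distant years count for so little that the near-term gains come out ahead.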
Is there a risk that Mustafa’s company could speed up the race towards dangerous capabilities?
Disheartening to hear a pretty weak answer to this critical question. Analysis of his answer:
First, I think the primary threat to the stability of the nation-state is not the existence of these models themselves, or indeed the existence of these models with the capabilities that I mentioned. The primary threat to the nation-state is the proliferation of power.
I’m really not sure what this means, and I’m surprised Rob didn’t follow up on it. I think he must mean that they won’t be open-sourcing the weights, which is certainly good. However, it’s unclear how much this matters if the model is available to call from an API. The argument may be that other actors can’t fine-tune the model to remove the guardrails they have put in place to make the model completely safe. I was impressed to hear his claim about jailbreaks later on:
It isn’t susceptible to any of the jailbreaks or prompt hacks, any of them. If anybody gets one, send it to me on Twitter.
Although strangely he also said:
it doesn’t generate code;
This is trivial to disprove, so I’m not sure what he meant by that. Regardless, I think that providing API access to a model distributes a lot of the “power” of the model to everyone in the world.
I’m not in the AGI intelligence explosion camp that thinks that just by developing models with these capabilities, suddenly it gets out of the box, deceives us, persuades us to go and get access to more resources, gets to inadvertently update its own goals.
There has never been a very solid rebuttal of the intelligence explosion argument. It mostly gets dismissed on the basis of sounding like sci-fi. You can make a good argument that dangerous capabilities will emerge before we reach this point, and we may have a “slow take-off” in that sense. However, it seems to me that we should expect recursive self-improvement to happen eventually, because there is no fundamental reason why it isn’t possible and it would clearly be useful for achieving any task. So the question is whether it will start before or after TAI. It’s pretty clear that no one knows the answer to this question, so it’s absurd to be gambling the future of humanity on this point.
Me not participating certainly doesn’t reduce the likelihood that these models get developed.
The AI race currently consists of a small handful of companies. A CEO who was actually trying to minimize the risk of extinction would at least attempt to coordinate a deceleration between these 4 or 5 actors before dismissing this as a hopeless tragedy of the commons.
First, predicting the values of our successors – what John Danaher (2021) calls axiological futurism – in worlds where these are meaningfully different from ours doesn’t seem intractable at all. Significant progress has already been made in this research area and there seems to be room for much more (see the next section and the Appendix).
Could you point more specifically to what progress you think has been made? As this research area seems to have only existed since 2021 we can’t have yet made successful predictions about future values so I’m curious what has been achieved.
I also don’t like this post and I’ve deleted most of it. But I do feel like this is quite important and someone needs to say it.
Anthropic Announces new S.O.T.A. Claude 3
People are clearly using agree / disagree voting wrong. What does it mean to agree vote a question?
Where in Cambridge will this take place (accommodation / venue)?
Is compensation for both students and mentors?
Will you provide/subsidize access to GPUs?
Thanks, Rudolf. I think this is a very important point, and probably the best argument against PauseAI. It’s true in general that The Ends Do Not Justify the Means (Among Humans).
My primary response is that you are falling for status-quo bias. Yes, this path might be risky, but the default path is riskier. My perception is that the current governance of AI is on track to let us run some terrible gambles with the fate of humanity.
Consider environmentalism. It seems quite uncertain whether the environmentalist movement has been net positive (!).
We can play reference class tennis all day but I can counter with the example of the Abolitionists, the Suffragettes, the Civil Rights movement, Gay Pride or the American XL Bully.
It seems to me that people overstate the track record of populist activism at solving complicated problems
...
the science is fairly straightforward, environmentalism is clearly necessary, and the movement has had huge wins
As I argue in the post, I think this is an easier problem than climate change. Just as most people don’t need a detailed understanding of the greenhouse effect, most people don’t need a detailed understanding of the alignment problem (“creating something smarter than yourself is dangerous”).
The advantage with AI is that there is a simple solution that doesn’t require anyone to make big sacrifices, unlike with climate change. With PauseAI, the policy proposal is right there in the name, so it is harder to distort than vaguer goals like “environmental justice”.
fighting Moloch rather than sacrificing our epistemics to him for +30% social clout
I think to a significant extent it is possible for PauseAI leadership to remain honest while still having broad appeal. Most people are fine if you say, “I in particular care mostly about x-risk, but I would like to form a coalition with artists who have lost work to AI.”
There is a spirit here, of truth-seeking and liberalism and building things, of fighting Moloch rather than sacrificing our epistemics to him for +30% social clout. I admit that this is partly an aesthetic preference on my part. But I do believe in it strongly.
I’m less certain about this, but I think the evidence is much weaker than rationalists would like to believe. Consider: why has no successful political campaign ever run on actually good, nuanced policy arguments? Why don’t advertising campaigns make rational arguments for why you should prefer their product, instead of appealing to your emotions? Why did it take until 2010 for people to have the idea of actually trying to figure out which charities are effective? The evidence is overwhelming that emotional appeals are the only way to persuade large numbers of people.
If we make the conversation about AIS more thoughtful, reasonable, and rational, it increases the chances that the right thing (whatever that ends up being—I think we should have a lot of intellectual humility here!) ends up winning.
Again, this seems like it would be good, but the evidence is mixed. People were making thoughtful arguments for why pandemics are a big risk long before Covid, but the world’s institutions were sufficiently irrational that they failed to actually do anything. If there had been an emotional, epistemically questionable mass movement calling for pandemic preparedness, that would have probably been very helpful.
Most economists seem to agree that European monetary policy is pretty bad and significantly harms Europe, but our civilization is too inadequate to fix the problem. Many people make great arguments about why aging sucks and why fixing it should be a top priority, but it’s left to Silicon Valley to actually do something. The same goes for shipping policy, human challenge trials and starting school later. There is a long list of preventable, disastrous policies which society has failed to fix due to a lack of political will, not a lack of sensible arguments.
There’s a crux which is very important. If you only want to attend protests where the protesters are reasonable and well informed and agree with you, then you implicitly only want to attend small protests.
It seems pretty clear to me that most people are much less concerned about x-risk than about job loss and other issues. So we have to make a decision: do we stick to our guns, have the most epistemically virtuous protest movement in history, and make it 10x harder to recruit new people and grow the movement? Or do we compromise, welcome people with many concerns, and form alliances with groups we don’t agree with in order to have a large and impactful movement?
It would be a failure of instrumental rationality to demand the former. This is just a basic reality about solving coordination problems.
[To provide a counter argument: having a big movement that doesn’t understand the problem is not useful. At some point the misalignment between the movement and the true objective will be catastrophic.
I don’t really buy this because I think that pausing is a big and stable enough target and it is a good solution for most concerns.]
This is something I am actually quite uncertain about so I would like to hear your opinion.
I mean something like “the scenario where there is no pause and also no other development that currently seems very unlikely and changes the level of risk dramatically (eg. a massive breakthrough in human brain emulation next year).”
Notably, I doubt we’ll discover the difference between GPT4 and superhuman to be small and I doubt GPT5 will be extremely good at interpretability.
I also doubt it, but I am not 1 in 10,000 confident.
It’s also worth remembering that this is advertising. Claiming to be a little bit better on some cherry picked metrics a year after GPT-4 was released is hardly a major accelerant in the overall AI race.
Fair point. On the other hand, the perception is in many ways more important than the actual capability in terms of incentivizing competitors to race faster.
Also, based on early user reports, it seems to actually be noticeably better than GPT-4.
The main message of this post is that the primary purpose of the current PauseAI protests is to build momentum for a later point.
This post is just my view. As with Effective Altruism, PauseAI does not have a homogeneous point of view or a specific set of beliefs required to participate. I expect that the main organizers of PauseAI agree that GPT-5 is very unlikely to end the world. Whether they think it poses an acceptable risk, I’m not sure.
This paragraph seems too weak for how important it is in the argument. Notably, I doubt we’ll discover the difference between GPT4 and superhuman to be small and I doubt GPT5 will be extremely good at interpretability.
The important question for the argument is whether GPT-6 will pose an unacceptable risk.
Was there some blocker that caused this to happen now, rather than 6 months / 1 year ago?