Greg_Colbourn
Global moratorium on AGI, now (Twitter). Founder of CEEALAR (née the EA Hotel; ceealar.org)
Re “role-playing”, that is moot when it’s the end result that matters—what actions the AI takes in the world. See also: Frontier AI systems have surpassed the self-replicating red line.
Shutdown Avoidance (“do self-replication before being killed”), combined with the recent Apollo o1 research on propensity to attempt self-exfiltration, pretty much closes the loop on misaligned AIs escaping when given sufficient scaffolding to do so.
One can hope that the damage is limited and that it serves as an appropriate wake-up call to governments. I guess we'll see...
Surely it’s just a matter of time—now that the method has been published—before AI models are spreading like viruses?
This is brilliant. I agree with almost all of it[1] - it’s a good articulation of how my own thinking on this has evolved over the last couple of years[2]. My timelines might be shorter, and my p(doom) higher, but it’s good to see an exposition for how one need not have such short timelines or high p(doom) to still draw the same conclusions. I recently donated significant amounts to PauseAI Global and PauseAI US. Your $30k to PauseAI US will get them to 5⁄6 of their current fundraising target—thank you!
[1] Some points of disagreement, additional information, and emphasis in other comments I made as I read through.
[2] Actually to be fair, it’s more detailed!
You should talk to David Pearce. His view of physicalism (phenomenal binding) precludes consciousness in digital minds[1].
> I know of at least one potential counterexample: OpenAI’s RLHF was developed by AI safety people who joined OpenAI to promote safety. But it’s not clear that RLHF helps with x-risk.
I’d go further and say that it’s not actually a counterexample. RLHF allowed OpenAI to be hugely profitable—without it they wouldn’t’ve been able to publicly release their models and get their massive userbase.
> (I can see an argument for blocking entrances to AI company offices, but I think the argument for blocking traffic is much weaker.)
I think Stop AI have taken this criticism onboard (having encountered it from a number of places). Their plan for the last couple of months has been to keep blocking OpenAI’s gates until they have their day in court[1] where they can make a “necessity” case for breaking the law to prevent a (much much) larger harm from occurring (or to prevent OpenAI from recklessly endangering everyone). Winning such a case would be huge.
[1] They feature heavily in this recent documentary that is well worth a watch.
> I don’t understand what’s going on here psychologically—according to the expressed beliefs of people like Dario Amodei and Shane Legg, they’re massively endangering their own lives in exchange for profit. It’s not even that they disagree with me about key facts, they’re just doing things that make no sense according to their own (expressed) beliefs.
Does anyone know what’s going on here? Dan Faggella says it’s a “Sardanapalus urge”: a desire to be destroyed by their own sand god (not anyone else’s). But I suspect it’s something more like extreme hubris[1], i.e. irrational overconfidence. This is a very Silicon Valley / entrepreneurial trait: you pretty much have to go against the grain, and against all the odds, to win really big. But it’s one thing to gamble like that with money, and quite another to gamble with your life (and yet another to gamble with the lives of everyone else on the planet too!).
I strongly believe that if Amodei, Altman, Legg and Hassabis were sat round a table with Omega and a six-shooter with even 1 bullet in the chamber, they wouldn’t play a game of actual Russian Roulette with a prize of utopia/the glorious transhumanist future, let alone such a game with a prize of a mere trillion dollars.
[1] The biggest cases of hubris in the history of the known universe.
> (I believe a lot of people get this wrong because they’re not thinking probabilistically. Someone has (say) a 10% P(doom) and a 10% chance of AGI within five years, and they round that off to “it’s not going to happen so we don’t need to worry yet.” A 10% chance is still really really bad.)
Yes! I’ve been saying this for a while, but most EAs still seem to be acting as if the median forecast is what is salient. If your regulation/alignment isn’t likely to be ready until the median forecasted date for AGI/TAI/ASI, then in half of all worlds you (we) don’t make it! When put like that, you can see that what seems like the “moderate” position is anything but—instead it is reckless in the extreme.
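To make the “half of all worlds” point concrete, here is a toy Monte Carlo sketch. The forecast distribution and every number in it are hypothetical, chosen only to illustrate the logic:

```python
import numpy as np

# Toy illustration only: the forecast distribution and numbers are made up.
rng = np.random.default_rng(0)

# Hypothetical forecast of years until AGI (lognormal, median ~5 years).
years_to_agi = rng.lognormal(mean=np.log(5), sigma=0.6, size=100_000)

# If regulation/alignment is only ready at the median forecasted date,
# AGI arrives first in roughly half of all sampled worlds.
median_date = np.median(years_to_agi)
print(f"P(AGI before median-date readiness) ~ {np.mean(years_to_agi < median_date):.0%}")

# Even aiming well before the median leaves a sizeable chance of being too late.
early_date = 3.0  # hypothetical "ambitious" readiness date, in years
print(f"P(AGI before year {early_date:.0f}) ~ {np.mean(years_to_agi < early_date):.0%}")
```

The point is not the specific numbers, but that planning to the median forecast accepts something like a coin flip on being too late.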
> I think PauseAI US is less competent than some hypothetical alternative protest org that wouldn’t have made this mistake, but I also think it’s more competent than most protest orgs that could exist (or protest orgs in other cause areas).
Yes. In a short-timelines, high p(doom), world, we absolutely cannot let perfect be the enemy of the good. Being typical hyper-critical EAs might have lethal consequences[1]. We need many more people in advocacy if we are going to move the needle, so we shouldn’t be so discouraging of the people who are actually doing things. We should just accept that they won’t get everything right all the time.
In a short-timelines world, where inaction means very high p(doom), the bar for being counterfactually net-negative[2] is actually pretty high. PauseAI is very far from reaching it.
[1] Or maybe I should say, “might actually be net-negative in and of itself”(!)
[2] This term is over-used in EA/LW spaces, to the point where I think people often don’t fully think through what they are actually saying when they use it. Is it actually net negative, integrating over all expected future consequences in worlds where it both does and doesn’t happen? Or is it just negative?
> We need to develop AI as soon as possible because it will greatly improve people’s lives and we’re losing out on a huge opportunity cost.
> This argument only makes sense if you have a very low P(doom) (like <0.1%) or if you place minimal value on future generations. Otherwise, it’s not worth recklessly endangering the future of humanity to bring utopia a few years (or maybe decades) sooner. The math on this is really simple—bringing AI sooner only benefits the current generation, but extinction harms all future generations. You don’t need to be a strong longtermist, you just need to accord significant value to people who aren’t born yet.
> I’ve heard a related argument that the size of the accessible lightcone is rapidly shrinking, so we need to build AI ASAP even if the risk is high. If you do the math, this argument doesn’t make any sense (credence: 95%). The value of the outer edge of the lightcone is extremely small compared to its total volume.[17]
Accelerationists seem to not get to this part of Bostrom’s Astronomical Waste[1], which is in fact the most salient part [my emphasis in bold]:
> III. The Chief Goal for Utilitarians Should Be to Reduce Existential Risk
> In light of the above discussion, it may seem as if a utilitarian ought to focus her efforts on accelerating technological development. The payoff from even a very slight success in this endeavor is so enormous that it dwarfs that of almost any other activity. We appear to have a utilitarian argument for the greatest possible urgency of technological development.
> However, the true lesson is a different one. If what we are concerned with is (something like) maximizing the expected number of worthwhile lives that we will create, then in addition to the opportunity cost of delayed colonization, we have to take into account the risk of failure to colonize at all. We might fall victim to an existential risk, one where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential. Because the lifespan of galaxies is measured in billions of years, whereas the time-scale of any delays that we could realistically affect would rather be measured in years or decades, the consideration of risk trumps the consideration of opportunity cost. For example, a single percentage point of reduction of existential risks would be worth (from a utilitarian expected utility point-of-view) a delay of over 10 million years.
> Therefore, if our actions have even the slightest effect on the probability of eventual colonization, this will outweigh their effect on when colonization takes place. For standard utilitarians, priority number one, two, three and four should consequently be to reduce existential risk. The utilitarian imperative “Maximize expected aggregate utility!” can be simplified to the maxim “Minimize existential risk!”.
[1] TIL that highlighting a word and pasting (cmd-V) a URL makes it a link.
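To spell out the arithmetic behind Bostrom’s “10 million years” claim, here is a rough back-of-the-envelope sketch. The length of the resource window below is an illustrative assumption (the paper only says galaxies last billions of years):

```python
# Rough, order-of-magnitude sketch of the comparison in the quoted passage.
# The window length is an illustrative assumption, not a figure from the paper.

TOTAL_WINDOW_YEARS = 1e10   # assume the accessible resources last ~10 billion years
DELAY_YEARS = 1e7           # a 10-million-year delay to colonization
RISK_REDUCTION = 0.01       # a one-percentage-point cut in existential risk

# Treat the total attainable value as 1 unit.
# Cost of delay: roughly the fraction of the resource window lost while waiting.
cost_of_delay = DELAY_YEARS / TOTAL_WINDOW_YEARS      # ~0.001 of total value

# Benefit of risk reduction: the extra probability of realising any of the value at all.
benefit_of_risk_reduction = RISK_REDUCTION            # 0.01 of total value

print(f"Loss from a 10 Myr delay:       ~{cost_of_delay:.3f} of total value")
print(f"Gain from a 1pp risk reduction: ~{benefit_of_risk_reduction:.3f} of total value")
print(f"Risk reduction wins by ~{benefit_of_risk_reduction / cost_of_delay:.0f}x")
```

On these assumptions the risk reduction is worth roughly ten times the delay, which is why delaying AI by years or even decades to cut existential risk comes out ahead on Bostrom’s own utilitarian accounting.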
Some excerpts from the Apollo Research paper:
The alarmist rhetoric is kind of intentional. I hope it’s persuasive to at least some people. I’ve been quite frustrated, post-GPT-4, by the lack of urgency in EA/LW about AI x-risk (as well as by the continued cooperation with AGI accelerationists such as Anthropic). Actually, it’s got to the point where I think of myself more as an “AI notkilleveryoneist” than an EA these days.
Thanks. I’m wondering now whether it’s mostly because I’m quoting Shakeel, and there’s been some (mostly unreasonable imo) pushback on his post on X.
Why is this being downvoted!?
Note that the protestors say[1] that they are going to use the “necessity defence” here.
[1] Well worth watching this documentary by award-winning journalist John Sherman.
“to influence the policy of a government by intimidation” might fit, given that they may well end up more powerful than governments if they succeed in their mission to build AGI (and they already have a lot of money, power and influence).
We are fast running out of time to avoid ASI-induced extinction. How long until a model (that is intrinsically unaligned, given no solution yet to alignment) self-exfiltrates and initiates recursive self-improvement? We need a global moratorium on further AGI/ASI development asap. Please do what you can to help with this—talk to people you know, and your representatives. Support groups like PauseAI.
AIs are already getting money via crypto memecoins. Wondering if there might be some kind of unholy mix of AI-generated memecoins, crypto ransomware and self-replicating AI viruses unleashed in the near future.