I tend to put P(doom) around 80%, so I think I’m on the pessimistic side, and I tend to think short timelines are at least a real and serious possibility that we should be planning for. Nevertheless, I disagree with a global stop or a pause being the “only reasonable hope”—global stops and pauses seem basically unworkable to me. I’m much more excited about governmentally enforced Responsible Scaling Policies, which seem like the “better option” that you’re missing here.
@evhub can you say more about what you envision a governmentally-enforced RSP world would look like? Is it similar to licensing? What happens when a dangerous capability eval goes off— does the government have the ability to implement a national pause?
Aside: IMO it’s pretty clear that the voluntary-commitment RSP regime is insufficient, since some companies simply won’t develop RSPs, and even if lots of folks adopted RSPs, the competitive pressures in favor of racing seem like they’d make it hard for anyone to pause for >a few months. I was surprised/disappointed that neither ARC nor Anthropic mentioned this. ARC says some stuff about how maybe in the future one day we might have some stuff from RSPs that could maybe inform government standards, but (in my opinion) their discussion of government involvement was quite weak, perhaps even to the point of being misleading (by making it seem like the voluntary commitments will be sufficient.)
I think some of the negative reaction to responsible scaling, at least among some people I know, is that it seems like an attempt for companies to say “trust us— we can scale responsibly, so we don’t need actual government regulation.” If the narrative is “hey, we agree that the government should force everyone to scale responsibly, and this means that the government would have the ability to tell people that they have to stop scaling if the government decides it’s too risky”, then I’d still probably prefer stopping right now, but I’d be much more sympathetic to the RSP position.
I think presumably the pause would just be for that company’s scaling—presumably other organizations that were still in compliance would still be fine.
That’s definitely my position, yeah—and I think it’s also ARC’s and Anthropic’s position. I think the key thing with the current advocacy around companies doing this is that one of the best ways to get a governmentally-enforced RSP regime is for companies to first voluntarily commit to the sort of RSPs that you want the government to later enforce.
Thanks! A few quick responses/questions:
I think this makes sense for certain types of dangerous capabilities (e.g., a company develops a system that has strong cyberoffensive capabilities. That company has to stop but other companies can keep going).
But what about dangerous capabilities that have more to do with AI takeover (e.g., a company develops a system that shows signs of autonomous replication, manipulation, power-seeking, deception) or scientific capabilities (e.g., the ability to develop better AI systems)?
Supposing that 3-10 other companies are within a few months of these systems, do you think at this point we need a coordinated pause, or would it be fine to just force company 1 to pause?
Do you know if ARC or Anthropic have publicly endorsed this position anywhere? (And if not, I’d be curious for your take on why, although that’s more speculative so feel free to pass).
I wrote up a bunch of my thoughts on this in more detail here.
What should happen there is that the leading lab is forced to stop and try to demonstrate that e.g. they understand their model sufficiently such that they can keep scaling. Then:
If they can’t do that, then the other labs catch up and they’re all blocked on the same spot, which if you’ve put your capabilities bars at the right spots, shouldn’t be dangerous.
If they can do that, then they get to keep going, ahead of other labs, until they hit another blocker and need to demonstrate safety/understanding/alignment to an even greater degree.
Hi Evan,
What is your median time from now until human extinction? If it is only a few years, I would be happy to set up a bet like this one.
I mention Responsible Scaling!
EDIT to add: I’m interested in a response from evhub (or anyone else) to the points raised against Responsible Scaling (see links for more details).
I guess I’m not really sure what your objection is to Responsible Scaling Policies? I see that there’s a bunch of links, but I don’t really see a consistent position being staked out by the various sources you’ve linked to. Do you want to describe what your objection is?
I guess the closest there is “the danger is already apparent enough” which, while true, doesn’t really seem like an objection. I agree that the danger is apparent, but I don’t think that advocating for a pause is a very good way to address that danger.
The consistent position is that further scaling is reckless at this stage; it can’t be done in a “responsible” way, unless you think subjecting the world to a 10-25% risk of extinction is a responsible thing to be doing!
What is a better way of addressing the danger? Waiting for it to get more intense and more apparent by scaling further!? Waiting until a disaster actually happens? Actually pausing, or stopping (and setting an example), rather than just advocating for a pause?
Perhaps the crux is related to how dangerous you think current models are? I’m quite confident that we have at least a couple additional orders of magnitude of scaling before the world ends, so I’m not too worried about stopping training of current models, or even next-generation models. But I do start to get worried with next-next-generation models.
So, in my view, the key is to make sure that we have a well-enforced Responsible Scaling Policy (RSP) regime that is capable of preventing scaling unless hard safety metrics are met (I favor understanding-based evals for this) before the next two scaling generations. That means we need to get good RSPs into law with solid enforcement behind them and—at least in very short timeline worlds—that needs to happen in the next few years. By far the best way to make that happen, in my opinion, is to pressure labs to put out good RSPs now that governments can build on.
I don’t think the current models are dangerous, but perhaps they could be if used for long enough on improving AI. A couple of orders of magnitude (or a couple of generations) is only a couple of years! This is soon enough to be pushing as hard as we can for a pause right now!
Why try and take it right down to the wire with RSPs? Why over-complicate things? The stakes couldn’t be bigger (extinction). It’s super reckless to not just be saying “It seems quite likely we’re getting to world-ending models in 2-5 years. Let’s not keep going any longer. Let’s just stop now.” The tradeoff [edit: for Anthropic] for a few tens of $Bs of extra profit really doesn’t seem worth it!
I mean, yes, obviously we should be doing everything we can right now. I just think that an RSP-gated pause is the right way to do a pause. I’m not even sure what it would mean to do a pause without an RSP-like resumption condition.
Because it’s more likely to succeed. RSPs provide very clear and legible risk-based criteria that are much more plausibly things that you could actually get a government to agree to.
This seems extremely disingenuous and bad faith. That’s obviously not the tradeoff and it confuses me why you would even claim that. Surely you know that I am not Sam Altman or Dario Amodei or whatever.
The actual tradeoff is the probability of success. If I thought e.g. just advocating for a six month pause right now was more effective at reducing existential risk, I would do it.
Have the resumption condition be a global consensus on an x-safety solution or a global democratic mandate for restarting (and remember there are more components of x-safety than just alignment—also misuse and multi-agent coordination).
I think if governments actually properly appreciated the risks, they could agree to an unconditional pause.
Sorry. I’m looking at it at the company level. Please don’t take my critiques as being directed at you personally. What is in it for Anthropic and OpenAI and DeepMind to keep going with scaling? Money and power, right? I think it’s pushing it a bit at this stage to say that they, as companies, are primarily concerned with reducing x-risk. If they were they would’ve stopped scaling already. Forget the (suicide) race. Set an example to everyone and just stop!
This seems basically unachievable and even if it was achievable it doesn’t even seem like the right thing to do—I don’t actually trust the global median voter to judge whether additional scaling is safe or not. I’d much rather have rigorous technical standards than nebulous democratic standards.
That’s why we should be pushing them to have good RSPs! I just think you should be pushing on the RSP angle rather than the pause angle.
Fair. And where I say “global consensus on an x-safety”, I mean expert opinion (as I say in the OP). I expect the public to remain generally a lot more conservative than the technical experts though, in terms of risk they are willing to tolerate.
The RSP angle is part of the corporate “big AI” “business as usual” agenda. To those of us playing the outside game it seems very close to safetywashing.
I’ve written up more about why I think this is not true here.
Thanks. I’m not convinced.
Why are people downvoting my reply without comment, and upvoting evhub’s comment? It’s the most upvoted comment, even though he clearly didn’t even ctrl-F for “Responsible Scaling” / notice that I’d addressed it in the OP!
Thanks for sharing your thoughts, Greg!
Assuming you believe there is a 75 % chance of AGI within the next 5 years, the above suggests your median time from now until doom is 3.70 years (= 0.5*(5 − 0)/0.75/0.9). Is your median time from now until human extinction also close to 3.70 years? If so, we can set up a bet similar to the one between Bryan Caplan and Eliezer Yudkowsky:
I send you 10 k 2023-$ in the next few days.
If humans do not go extinct in 3.70 years, or until the end of 2027 for simplicity, you send me 19 k 2023-$.
The expected profit is:
For you, 500 2023-$ (= 10*10^3 − 0.5*19*10^3).
For me, 9.00 k 2023-$ (= −10*10^3 + 19*10^3), as I think the chance of humans going extinct until the end of 2027 is basically negligible. I would guess around 10^-7 per year.
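As a rough check on the arithmetic above, here is a minimal Python sketch. It assumes, as the formula for the median seems to, that cumulative doom probability grows linearly over the 5 years. The function names are illustrative only, and the 4-year window used for the sender's extinction probability is an approximation of "until the end of 2027".

```python
def median_years_to_doom(p_agi, p_doom_given_agi, horizon_years):
    """Years until cumulative P(doom) reaches 50%.

    Assumption (a reading of the formula above): cumulative doom probability
    grows linearly as p_agi * p_doom_given_agi * t / horizon_years.
    """
    return 0.5 * horizon_years / p_agi / p_doom_given_agi


def recipient_expected_profit(received_now, repaid_later, p_extinction):
    """Expected profit for the side paid up front, who repays only if humans survive."""
    return received_now - (1 - p_extinction) * repaid_later


def sender_expected_profit(sent_now, repaid_later, p_extinction):
    """Expected profit for the side paying up front, repaid only if humans survive."""
    return -sent_now + (1 - p_extinction) * repaid_later


median = median_years_to_doom(p_agi=0.75, p_doom_given_agi=0.9, horizon_years=5)
greg = recipient_expected_profit(10_000, 19_000, p_extinction=0.5)
# Roughly 1e-7 per year over about 4 years (approximate window, see lead-in).
vasco = sender_expected_profit(10_000, 19_000, p_extinction=4 * 1e-7)

print(f"median years to doom: {median:.2f}")
print(f"expected profit, recipient: {greg:.0f} 2023-$")
print(f"expected profit, sender: {vasco:.0f} 2023-$")
```

Each side's expected profit is evaluated under that side's own probability of extinction, which is why both figures come out positive.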
The expected profit is quite positive for both of us, so we would agree on the bet as long as my (your) marginal earnings after losing 10 k 2023-$ (19 k 2023-$) would still go towards donations, which arguably do not have much diminishing returns.
I guess my marginal earnings after losing 10 k 2023-$ would still go towards donations, so I am happy to take the bet.
Hi Vasco, sorry for the delay getting back to you. I have actually had a similar bet offer up on X for nearly a year (offering to go up to $250k) with only one taker for ~$30 so far! My one is you give x now and I give 2x in 5 years, which is pretty similar. Anyway, happy to go ahead with what you’ve suggested.
I would donate the $10k to PauseAI (I would say $10k to PauseAI in 2024 is much greater EV than $19k to PauseAI at end of 2027).
[BTW, I have tried to get Bryan Caplan interested too, to no avail—if anyone is in contact with him, please ask him about it.]
As much as I may appreciate a good wager, I would feel remiss not to ask if you could get a better result for amount of home equity at risk by getting a HELOC and having a bank be the counterparty? Maybe not at lower dollar amounts due to fixed costs/fees, but likely so nearer the $250K point—especially with the expectation that interest rates will go down later in the year.
I don’t have a stable income so I can’t get bank loans. I have tried to get a mortgage for the property before and failed: they don’t care if you have millions in assets, all they care about is your income[1], and I just have a relatively small, irregular rental income (Airbnb). But I can get crypto-backed smart contract loans, and do have one out already on Aave, which I could extend.
Also, the signalling value of the wager is pretty important too imo. I want people to put their money where their mouth is if they are so sure that AI x-risk isn’t a near term problem. And I want to put my money where my mouth is too, to show how serious I am about this.
I think this is probably because they don’t want to go through the hassle of actually having to repossess your house, so if this seems at all likely they won’t bother with the loan in the first place.
Thanks for following up, Greg! Strongly upvoted. I will try to understand how I can set up a contract describing the bet with your house as collateral.
Could you link to the post on X you mentioned?
I will send you a private message with Bryan’s email.
Definitely seek legal advice in the country and subdivision (e.g., US state) where Greg lives!
You may think of this as a bet, but I’ll propose an alternative possible paradigm: it may be a plain old promissory note backed by a mortgage. That is, a home-equity loan with an unconditional balloon payment in five years. Don’t all contracts in which one party must perform in the future include a necessarily implied clause that performance is not necessary in the event that the human race goes extinct by that time? At least, I don’t plan on performing any of my future contractual obligations if that happens . . . .
So even assuming this wouldn’t be unenforceable as gambling, it might run afoul of the rules for mortgage lending (e.g., because the implied interest rate [~14.4%?] is seen as usurious, or because it didn’t comply with local or national laws regulating mortgage lending). That is a pretty regulated industry in general. It would definitely need to follow all the formalities for secured lending against real property: we require those formalities to make sure the borrower knows what he is getting into, and to give notice to other would-be lenders that they would be further back in line on repayment.
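For a sense of scale, the implied rate can be sketched in Python. This assumes the “x now, 2x in 5 years” version of the bet with annual compounding. The guess that the ~14.4% figure comes from the rule-of-72 shortcut is an assumption, not something stated above, and the function names are illustrative.

```python
def implied_annual_rate(principal, repayment, years):
    """Annually compounded rate that turns principal into repayment over `years`."""
    return (repayment / principal) ** (1 / years) - 1


def rule_of_72_rate(doubling_years):
    """Rule-of-72 approximation of the rate that doubles money in `doubling_years`."""
    return 72 / doubling_years / 100


exact = implied_annual_rate(principal=1, repayment=2, years=5)
approx = rule_of_72_rate(doubling_years=5)

print(f"compound rate: {exact:.1%}")
print(f"rule of 72:    {approx:.1%}")
```

Either way the rate is in the mid-teens per year, which is the range where usury limits could plausibly become relevant, depending on the jurisdiction.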
I should also note that it is pretty difficult in many places to force a sale on someone’s primary residence if you hold certain types of security interests (as opposed to, e.g., a primary mortgage). So you might be holding a lien that doesn’t have much practical value unless/until Greg decides to sell and there is value after paying off whoever is ahead in line on payment. Again, I can only advise seeking legal counsel in the right jurisdiction.
The off-the-wall thought I have is that Greg might be able to get around some difficulties by delivering a promissory note backed by a recorded security interest to an unrelated charity. But at the risk of sounding like a broken record, everyone would need legal advice from someone licensed in the jurisdiction before embarking on any approach in this rather unusual and interesting scenario.
Thanks for sharing your thoughts, Jason!
Cool, thanks. I link to one post in the comment above. But see also.
Thanks! Could you also clarify where is your house, whether you live there or elsewhere, and how much cash you expect to have by the end of 2027 (feel free to share the 5th percentile, median and 95th percentile)?
It’s in Manchester, UK. I live elsewhere—renting currently, but shortly moving into another owned house that is currently being renovated (I’ve got a company managing the would-be-collateral house as an Airbnb, so no long term tenants either). Will send you more details via DM.
Cash is a tricky one, because I rarely hold much of it. I’m nearly always fully invested. But that includes plenty of liquid assets like crypto. Net worth wise, in 2027, assuming no AI-related craziness, I would expect it to be in the 7-8 figure range (5-95% maybe $500k-$100M).
Update. I bet Greg Colbourn 10 k€ that AI will not kill us all by the end of 2027.
Greg can presumably also just take out a loan? I think this will likely dominate the bet you proposed given that your implied interest rates are very high.
The bet might be nice symbolism though.
As I say above, I’ve been offering a similar bet for a while already. The symbolism is a big part of it.
I can currently only take out crypto-backed loans, which have been quite high interest lately (don’t have a stable income so can’t get bank loans or mortgages), and have considered this but not done it yet.
Thanks for the suggestion, Ryan. As a side note, I would be curious to know how my comment could be improved, as I see it was downvoted. I guess it is too adversarial.
I feel like there is a nice point there, but I am not sure I got it. By taking a loan, Greg would lose purchasing power in expectation (meanwhile, I have replaced “$” by “2023-$” in my comment), but he would gain it by taking the bet. People still take loans because they could value additional purchasing power now more than in the future, but this is why I said the bet would only make sense if my and Greg’s marginal earnings would continue to go towards donations if we lost the bet. To ensure this, I would consider the bet a risky investment, and move some of my investments from stocks to bonds to offset at least part of the increase in risk. Even then, I would want to set up an agreement with signatures from both of us, and a 3rd party before going ahead with the bet.
Yes, I think symbolism would plausibly dominate the benefits for Greg.
The key thing is that you don’t have to pay off loans if we’re all dead. So all loans are implicitly bets about whether society will collapse by some point.
Re risk, as per my offer on X, I’m happy to put my house up as collateral if you can be bothered to get the paperwork done. Otherwise happy to just trade on reputation (you can trash mine publicly if I don’t pay up).
(I didn’t downvote it.)
Would be interested to see your reasoning for this, if you have it laid out somewhere. Is it mainly because you think it’s ~impossible for AGI/ASI to happen in that time? Or because it’s ~impossible for AGI/ASI to cause human extinction?
I have not engaged so much with AI risk, but my views about it are informed by considerations in the 2 comments in this thread. Mammal species usually last 1 M years, and I am not convinced by arguments for extinction risk being much higher (I would like to see a detailed quantitative model), so I start from a prior of 10^-6 extinction risk per year. Then I guess the risk is around 10 % as high as that because humans currently have tight control of AI development.
To be consistent with 10^-7 extinction risk, I would guess 0.1 % chance of gross world product growing at least 30 % in 1 year until 2027, due to bottlenecks whose effects are not well modelled in Tom Davidson’s model, and 0.01 % chance of human extinction conditional on that.
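The two routes to 10^-7 above can be multiplied out in a few lines of Python. This is only a sketch of the stated arithmetic: the variable names are illustrative, and it takes the percentages at face value without modelling whether they are per-year or cumulative figures.

```python
import math

# Route 1: prior from mammal species lasting about 1 M years, scaled to 10 %.
species_lifetime_years = 1_000_000
prior_per_year = 1 / species_lifetime_years
ai_adjustment = 0.10
risk_from_prior = prior_per_year * ai_adjustment

# Route 2: chance of at least 30 % growth in 1 year, times extinction given that.
p_fast_growth = 0.001        # 0.1 %
p_extinction_given = 0.0001  # 0.01 %
risk_from_growth = p_fast_growth * p_extinction_given

print(f"route 1: {risk_from_prior:.1e}")
print(f"route 2: {risk_from_growth:.1e}")
print("consistent:", math.isclose(risk_from_prior, risk_from_growth))
```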
Interesting. Obviously I don’t want to discourage you from the bet, but I’m surprised you are so confident based on this! I don’t think the prior of mammal species duration is really relevant at all, when for 99.99% of the last 1M years there hasn’t been any significant technology. Perhaps more relevant is homo sapiens wiping out all the less intelligent hominids (and many other species).
On the question of priors, I liked AGI Catastrophe and Takeover: Some Reference Class-Based Priors. It is unclear to me whether extinction risk has increased in the last 100 years. I estimated an annual nuclear extinction risk of 5.93*10^-12, which is way lower than the prior for wild mammals of 10^-6.
I see in your comment on that post, you say “human extinction would not necessarily be an existential catastrophe” and “So, if advanced AI, as the most powerful entity on Earth, were to cause human extinction, I guess existential risk would be negligible on priors?”. To be clear: what I’m interested in here is human extinction (not any broader conception of “existential catastrophe”), and the bet is about that.
Agreed.
See my comment on that post for why I don’t agree. I agree nuclear extinction risk is low (but probably not that low)[1]. ASI is really the only thing that is likely to kill every last human (and I think it is quite likely to do that given it will be way more powerful than anything else[2]).
But to be clear, global catastrophic / civilisational collapse risk from nuclear is relatively high (these often get conflated with “extinction”).
Not only do I think it will kill every last human, I think it’s quite likely it will wipe out all known carbon-based life.
I upvoted this offer. I have an alert for bet proposals on the forum, and this is the first genuine one I’ve seen in a while.
“I think AGI is 0-5 years away” != “I am certain AGI will happen within five years.” I think it is best read as implying somewhere between 51 and 100% confidence, at least standing alone. Depending on where it is set, you probably should offer another ~12-18 months.
Nice point, Jason! I have adjusted the numbers above to account for that. I have also replaced “$” by “2023-$” to account for inflation.
Re p(doom) being high, I don’t think you need to commit to the view that the most likely outcome of AGI is doom. Surveys of AI researchers put the risk from rogue AI at 5%. In the XPT survey professional forecasters put the risk of extinction from AI at 0.03% by 2050 and 0.38% by 2100, whereas domain experts put the chance at 1.1% and 3%, respectively. Though they had longer timelines than you seem to endorse here.
I think your argument goes through even if the risk is 1% conditional on AGI and that seems like an estimate unlikely to upset too many people, so I would just go with that
I still don’t understand where the 95% for non-doom is coming from. I think it’s useful to look at actual mechanisms for why people think this (and so far I’ve found them lacking). The qualifications of the “professional forecasters” in the XPT survey are in doubt (and again, it was pre-GPT-4).
The argument might go through even if the risk is 1%, but people sure aren’t acting like that. At least in EA, broadly speaking (where I imagine the average p(doom|AGI) is closer to 10%). Also, I’d rather just say what I actually believe, even if it sounds “alarmist”. At least I’ve tried to argue for it in some detail. The main reason I am prioritising this so much is because I think it’s the most likely reason I, and everyone I know and love, will die. Unless we stop it. Forget longtermism and EA: people need to understand that this is a threat to their own personal near-term survival.
I think you come across as over-confident, not alarmist, and I think it hurts how you come across quite a lot. (We’ve talked a bit about the object level before.) I’d agree with John’s suggested approach.
Relatedly, I also think that your arguments for “p(doom|AGI)” being high aren’t convincing to people that don’t share your intuitions, and it looks like you’re relying on those (imo weak) arguments, when actually you don’t need to
I’m crying out for convincing gears-level arguments against (even have $1000 bounty on it), please provide some.
The issue is that both sides of the debate lack gears-level arguments. The ones you give in this post (like “all the doom flows through the tiniest crack in our defence”) are more like vague intuitions; equally, on the other side, there are vague intuitions like “AGIs will be helping us on a lot of tasks” and “collusion is hard” and “people will get more scared over time” and so on.
I’d say it’s more than a vague intuition. It follows from alignment/control/misuse/coordination not being (close to) solved and ASI being much more powerful than humanity. I think it should be possible to formalise it, even. “AGIs will be helping us on a lot of tasks”, “collusion is hard” and “people will get more scared over time” aren’t anywhere close to overcoming it imo.
These are what I mean by the vague intuitions.
Nobody has come anywhere near doing this satisfactorily. The most obvious explanation is that they can’t.
To be fair, I think I’m partly making wrong assumptions about what exactly you’re arguing for here.
On a slightly closer read, you don’t actually argue in this piece that it’s as high as 90% - I assumed that because I think you’ve argued for that previously, and I think that’s what “high” p(doom) normally means.
I do think it is basically ~90%, but I’m arguing here for doom being the default outcome of AGI; I think “high” can reasonably be interpreted as >50%.
I feel like this is a case of death by epistemic modesty, especially when it isn’t clear how these low p(doom) estimates are arrived at in a technical sense (and a lot seems to me like a kind of “respectability heuristic” cascade). We didn’t do very well with Covid as a society in the UK (and many other countries), following this kind of thinking.
What part of Greg writing comes across as over confident?
I suppose one solution might be to say that your personal view is that pdoom is >50%, but a range of estimates suggest >1% is plausible
Thanks for this post Greg.
Re your point about scaling, the Michael et al survey of NLP researchers suggests that researchers don’t think scaling will take us all the way there.
Figure 4.
Based on my limited understanding I agree with you that it seems pretty plausible that scaling does take us to human level AI and beyond, but the experts seem to disagree and I’m not sure why
Interesting. I’ll note that this survey was pre-GPT-4 (or even before GPT3.5 was in widespread use? May-June 2022) when (I think) people were still sceptical of LLMs being able to do well on university exams, amongst many other things. Would be interesting to see a similar survey that is post-GPT-4 (I’ve not been able to find anything). I predict that it will show a significantly higher % agreeing.
In general I think any survey on AI that was conducted in the pre-GPT-4 era is now woefully out of date.
I would agree, relying on pre-GPT4 estimates seems flawed.
Hmm… that isn’t exactly the question I’d like the answer to, which is more scaling + minor incremental improvements + creative prompting.
I agree with most of the points in this post (AI timelines might be quite short; probability of doom given AGI in a world that looks like our current one is high; there isn’t much hope for good outcomes for humanity unless AI progress is slowed down somehow). I will focus on one of the parts where I think I disagree and which feels like a crux for me on whether advocating AI pause (in current form) is a good idea.
You write:
I think framings like these do a misleading thing where they use the word “we” to ambiguously refer to both “humanity as a whole” and “us humans who are currently alive”. The “we” that decides how much risk to take is the humans currently alive, but the “we” that enjoys the dream holiday might be humans millions of years in the future.
I worry that “AI pause” is not being marketed honestly to the public. If people like Wei Dai are right (and I currently think they are), then AI development may need to be paused for millions of years potentially, and it’s unclear how long it will take unaugmented or only mildly augmented humans to reach longevity escape velocity.
So to a first approximation, the choice available to humans currently alive is something like:
Option A: 10% chance utopia within our lifetime (if alignment turns out to be easy) and 90% human extinction
Option B: ~100% chance death but then our descendants probably get to live in a utopia
For philosophy nerds with low time preference and altruistic tendencies (into which I classify many EA people and also myself), Option B may seem obvious. But I think many humans existing today would rather risk it and just try to build AGI now, rather than doing any AI pause, and to the extent that they say they prefer pause, I think they are being deceived by the marketing or acting under Caplanian Principle of Normality, or else they are somehow better philosophers than I expected they would be.
(Note: if you are so pessimistic about aligning AI without a pause that your probability on that is lower than the probability of unaugmented present-day humans reaching longevity escape velocity, then Option B does seem like a strictly better choice. But the older and more unhealthy you are, the less this applies to you personally.)
Are you simplifying here, or do you actually believe that “utopia in our lifetime” or “extinction” are the only two possible outcomes given AGI? Do you assign a 0% chance that we survive AGI, but don’t have a utopia in the next 80 years?
What if AGI stalls out at human level, or is incredibly expensive, or is buggy and unreliable like humans are? What if the technology required for utopia turns out to be ridiculously hard even for AGI, or substantially bottlenecked by available resources? What if technology alone can’t create a utopia, and the extra tech just exacerbates existing conflicts? What if AGI access is restricted to world leaders, who use it for their own purposes?
What if we build an unaligned AGI, but catch it early and manage to defeat it in battle? What if early, shitty AGI screws up in a way that causes a worldwide ban on further AGI development? What if we build an AGI, but we keep it confined to a box and can only get limited functionality out of it? What if we build an aligned AGI, but people hate it so much that it voluntarily shuts off? What if the AGI that gets built is aligned to the values of people with awful views, like religious fundamentalists? What if AGI wants nothing to do with us and flees the galaxy? What if [insert X thing I didn’t think of here]?
IMO, extinction and utopia are both unlikely outcomes. The bulk of the probability lies somewhere in the middle.
I was indeed simplifying, and e.g. probably should have said “global catastrophe” instead of “human extinction” to cover cases like permanent totalitarian regimes. I think some of the scenarios you mention could happen, but also think a bunch of them are pretty unlikely, and also disagree with your conclusion that “The bulk of the probability lies somewhere in the middle”. I might be up for discussing more specifics, but also I don’t get the sense that disagreement here is a crux for either of us, so I’m also not sure how much value there would be in continuing down this thread.
I would agree that “utopia in our lifetime” or “extinction” seems like a false dichotomy. What makes you say that you predict the bulk of the probability lies somewhere in the middle?
How about an Option A.1: pause for a few years or a decade to give alignment a chance to catch up? At least stop at the red lights for a bit to check whether anyone is coming, even if you are speeding!
I think this easily goes through, even for 1-10% p(doom|AGI), as it seems like ageing is basically already a solved problem or will be within a decade or so (see the video I linked to—David Sinclair; and there are many other people working in the space with promising research too).
I’m not an expert on most of the evidence in this post, but I’m extremely suspicious of the claim that GPT-4 represents AI that is “~ human level at language”, unless you mean something by this that is very different from what most people would expect.
Technically, GPT-4 is superhuman at language because whatever task you are giving it is in English, and the median human’s English proficiency is roughly nil. But a more commonsense interpretation of this statement is that a prompt-engineered AI and a trained human can do the task roughly as well.
What you link to shows the results of how GPT-4 performs on a bunch of different exams. This doesn’t really show how language is used in the real world, especially since the exams very closely match past exams that were in the training data. It’s good at some of them, but also extremely bad at others (AP English Literature and Codeforces in particular), which is an issue if you’re making a claim that it’s roughly human level.
Furthermore, language isn’t just putting words together in the right order and with the right inflection. It also includes semantic information (what the actual meaning of the sentences is) and pragmatic information (is the language conveying what it is trying to convey, not just the literal meaning). I’m not sure whether pragmatics in particular would be relevant for AI risk, but the fact that anecdotally even GPT-4 is pretty bad at pragmatics prevents a literal interpretation of your statement.
In my opinion, the best evidence for GPT-4 not being human level at language is that, in the real world, GPT-4 is much cheaper than a human but consistently unable to outcompete humans. News organizations have a strong incentive to overhype GPT-caused automation, but the examples that they’ve found are mostly of people saying that either GPT-4 or GPT-3 (it’s not always clear which) did their job much worse than them, but good enough for clients. Take https://www.washingtonpost.com/technology/2023/06/02/ai-taking-jobs/ as a typical story.
Exams aren’t exactly the real world, but the popular example of GPT-4 doing well on exams is https://www.slowboring.com/p/chatgpt-goes-to-harvard. This both ignores citations (a very important part of college writing, which GPT-3 couldn’t do whatsoever and at which GPT-4 is still significantly below what I would expect from a human) and relies on the false belief that Harvard is a hard school to do well at (grade inflation!).
I still agree with two big takeaways of your post, that an AI pause would be good and that we don’t necessarily need AGI for a good future, but that’s more because it’s robust to a lot of different beliefs about AI than because I agree with the evidence provided. Again, a lot of the evidence is stuff that I don’t feel particularly knowledgeable about; I picked this claim because I’ve had to think about it before and because it just feels false from my experience using GPT-4.
GPT-4 is also proficient at many other languages, so I don’t think English is the appropriate benchmark! Is GPT-4 as good as the median human at language in general? I think yes. In fact it’s probably quite a lot better.
Can you link to examples? Most examples I’ve seen on X are people criticising chatGPT-3.5 (or other models) and then someone coming along showing chatGPT-4 getting it right!
It’s nearly always GPT-3 (or 3.5). We only need to be concerned about the best AI models, not the lower tiers! I’ve heard anecdotes, in real life, of people who are using GPT-4 to do parts of their jobs—e.g. writing long emails that their boss was impressed with (they didn’t tell them it was chatGPT!)
Harvard is one of the best schools in the world. The average human is quite far from being smart enough to get into it. I don’t think saying this is helping the credibility of your argument! Seems a lot like goalpost moving.
Thanks, good to know :)
GPT4 is clearly above the median human when it comes to a range of exams. Do we have examples of GPT4’s comparison to the median human in non-exam-like conditions?
I agree with most of your conclusions in this post. I feel uncomfortable. I’ll write more once I have processed some of my emotions and can think in a more clear manner.
Hope everything is okay.
PS. I’m doing AI Safety movement building in Australia and New Zealand, so if you need someone to talk to, feel free to reach out.
Hi Chris, thanks for reaching out. Obviously things with the world aren’t ok; it seems insane that every country is staring down a massive national security risk and they haven’t done much about it.
How is movement building going?
I’ll reply via PM.
I know William is already aware, but for others who aren’t, there are groups focused on getting a pause: PauseAI (Discord) and the AGI-Moratorium-HQ Slack are two of them. And there are now quite a lot of people on X with ⏸️ or ⏹️ in their name. I find that it makes me feel better being pro-active about doing something about it.
I spoke with William at length yesterday. The situation is dire, but I don’t think it’s impossible.
Yea, thanks for the talk Greg, it was informative.
Strong upvote.
GPT-1 was released in 2018. GPT-4 has shown sparks of AGI.
We have early evidence of self-improvement—or conservatively—positive feedback loops are evident.
OpenAI intends to build ~AGI to automate alignment research. Sam Altman is attempting to raise $7T USD to build more GPUs.
Anthropic CEO estimates 2-3 years until AGI.
Meta has gone public about their goal of open-sourcing AGI.
Superalignment might even be impossible.
It seems to be difficult to defend a world against rogue AGIs, and it seems difficult for aligned AIs to defend us.
I’m skeptical about the tractability of making AGI development taboo within a very few years. It seems like this plan would require moderate timelines in order to be viable.
That said, I’m starting to wonder whether we should be trying to gain support for a pause in theory: specifically, people to agree that in an ideal world, we would pause AI development where we are now.
That could help open up the Overton window.
AGI development is already taboo outside of tech circles. Per the September poll by the AIPI, only 12% disagree that “Preventing AI from quickly reaching superhuman capabilities” should be an important AI policy goal. (56% strongly agree, 20% somewhat agree, 8% somewhat disagree, 4% strongly disagree, 12% not sure.) Despite the fact that world leaders are themselves influenced by tech circles’ positions, leaders around the world are quite clear that they take the risk seriously.
The only reason AGI development hasn’t been halted already is that the general public does not yet know that big tech is both trying to build AGI, and actually making real progress towards it.
The taboo only really needs to kick in on moderate timelines, so we’re in luck :) On short timelines, only massive data centres and the leading AI labs need to be regulated.
I’ve now made this post into an X thread (with some slight edits and some condensing).
Upvoted your post because you made some good points, but I think your analogy between human cloning and AI training is totally wrong.
There is no black market in human cloning, and no police state trying to stop it, because no one benefits very much from cloning. Cloning is just not that useful. Whereas if we stop corporate AI development for 40 years but computer hardware keeps improving, anyone can get rich by training an AGI on their gaming laptop. It would be like trying to confiscate all the drug imports in a world where everyone with a cell phone is a drug addict.
Thanks. I also address the “get rich” point though! People can’t get rich from AGI because they lose control of it (/the world ends) before they get rich. AGI is not that useful either, because it’s uncontrollable and has negative externalities that will come back and swamp any hoped for benefits, even for the producer (i.e. x-risk).