Hi, I’m Steve Byrnes, an AGI safety / AI alignment researcher in Boston, MA, USA, with a particular focus on brain algorithms. See https://sjbyrnes.com/agi.html for a summary of my research and a sorted list of my writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, Twitter, Mastodon, Threads, Bluesky, GitHub, Wikipedia, Physics-StackExchange, LinkedIn
Steven Byrnes
For what it’s worth, Yann LeCun is very confidently against LLMs scaling to AGI, and yet LeCun seems to have at least vaguely similar timelines-to-AGI as Ajeya does in that link.
Ditto for me.
Oh hey, here’s one more: Chollet himself (!!!) has vaguely similar timelines-to-AGI (source) as Ajeya does. (Actually, if anything, Chollet expects it a bit sooner: he says 2038-2048, Ajeya says median 2050.)
I agree with Chollet (and OP) that LLMs will probably plateau, but I’m also big into AGI safety—see e.g. my post AI doom from an LLM-plateau-ist perspective.
(When I say “AGI” I think I’m talking about the same thing that you called digital “beings” in this comment.)
Here are a bunch of agreements & disagreements.
if François is right, then I think this should be considered strong evidence that work on AI Safety is not overwhelmingly valuable, and may not be one of the most promising ways to have a positive impact on the world.
I think François is right, but I do think that work on AI safety is overwhelmingly valuable.
Here’s an allegory:
There’s a fast-breeding species of extraordinarily competent and ambitious intelligent aliens. They can do science much much better than Einstein, they can run businesses much much better than Bezos, they can win allies and influence much much better than Hitler or Stalin, etc. And they’re almost definitely (say >>90% chance) coming to Earth sooner or later, in massive numbers that will keep inexorably growing, but we don’t know exactly when this will happen, and we also don’t know in great detail what these aliens will be like—maybe they will have callous disregard for human welfare, or maybe they’ll be great. People have been sounding the alarm for decades that this is a big friggin’ deal that warrants great care and advance planning, but basically nobody cares.
Then some scientist Dr. S says “hey those dots in the sky—maybe they’re the aliens! If so they might arrive in the next 5-10 years, and they’ll have the following specific properties”. All of a sudden there’s a massive influx of societal interest—interest in the dots in particular, and interest in alien preparation in general.
But it turns out that Dr. S was wrong! The dots are small meteors. They might hit Earth and cause minor damage but nothing unprecedented. So we’re back to not knowing when the aliens will come or what exactly they’ll be like.
Is Dr. S’s mistake “strong evidence that alien prep is not overwhelmingly valuable”? No! It just puts us back where we were before Dr. S came along.
(end of allegory)
(Glossary: the “aliens” are AGIs; the dots in the sky are LLMs; and Dr. S would be a guy saying LLMs will scale to AGI with no additional algorithmic insights.)
It would make AI Safety work less tractable
If LLMs will plateau (as I expect), I think there are nevertheless lots of tractable projects that would help AGI safety. Examples include:
The human brain runs some sort of algorithm to figure things out, get things done, invent technology, etc. We don’t know exactly what that algorithm is (or else we would already have AGI), but we know much more than zero about it, and it’s obviously at least possible that AGI will be based on similar algorithms. (I actually believe something stronger, i.e. that it’s likely, but of course that’s hard to prove.) So now this is a pretty specific plausible AGI scenario that we can try to plan for. And that’s my own main research interest—see Intro to Brain-Like-AGI Safety. (Other varieties of model-based reinforcement learning would be pretty similar too.) Anyway, there’s tons of work to do on planning for that.
…For example, I list seven projects here. Some of those (e.g. this) seem robustly useful regardless of how AGI will work, i.e. even if future AGI is neither brain-like nor LLM-like, but rather some yet-unknown third category of mystery algorithm.
Outreach—After the invention of AGI (again, what you called digital “beings”), there are some obvious-to-me consequences, like “obviously human extinction is on the table as a possibility”, and “obviously permanent human oversight of AGIs in the long term would be extraordinarily difficult if not impossible” and “obviously AGI safety will be hard to assess in advance” and “if humans survive, those humans will obviously not be doing important science and founding important companies given competition from trillions of much-much-more-competent AGIs, just like moody 7-year-olds are not doing important science and founding important companies today” and “obviously there will be many severe coordination problems involved in the development and use of AGI technology”. But, as obvious as those things are to me, hoo boy there sure are tons of smart prominent people who would very confidently disagree with all of those. And that seems clearly bad. So trying to gradually establish good common knowledge of basic obvious things like that, through patient outreach and pedagogy, seems robustly useful and tractable to me.
Policy—I think there are at least a couple governance and policy interventions that are robustly useful regardless of whether AGI is based on LLMs (as others expect) or not (as I expect). For example, I think there’s room for building better institutions through which current and future tech companies (and governments around the world) can cooperate on safety as AGI approaches (whenever that may happen).
It seems that many people in Open Phil have substantially shortened their timelines recently (see Ajeya here).
For what it’s worth, Yann LeCun is very confidently against LLMs scaling to AGI, and yet LeCun seems to have at least vaguely similar timelines-to-AGI as Ajeya does in that link.
Ditto for me.
See also my discussion here (“30 years is a long time. A lot can happen. Thirty years ago, deep learning was an obscure backwater within AI, and meanwhile people would brag about how their fancy new home computer had a whopping 8 MB of RAM…”)
To be clear, you can definitely find some people in AI safety saying AGI is likely in <5 years, although Ajeya is not one of those people. This is a more extreme claim, and does seem pretty implausible unless LLMs will scale to AGI.
I think this makes me very concerned about a strong ideological and philosophical bubble in the Bay regarding these core questions of AI.
Yeah some examples would be:
many AI safety people seem happy to make confident guesses about what tasks the first AGIs will be better and worse at doing based on current LLM capabilities;
many AI safety people seem happy to make confident guesses about how much compute the first AGIs will require based on current LLM compute requirements;
many AI safety people seem happy to make confident guesses about which companies are likely to develop AGIs based on which companies are best at training LLMs today;
many AI safety people seem happy to make confident guesses about AGI UIs based on the particular LLM interface of “context window → output token”;
etc. etc.
Many ≠ All! But to the extent that these things happen, I’m against it, and I do complain about it regularly.
(To be clear, I’m not opposed to contingency-planning for the possibility that LLMs will scale to AGIs. I don’t expect that contingency to happen, but hey, what do I know, I’ve been wrong before, and so has Chollet. But I find that these kinds of claims above are often stated unconditionally. Or even if they’re stated conditionally, the conditionality is kinda forgotten in practice.)
I think it’s also important to note that these habits above are regrettably common among both AI pessimists and AI optimists. As examples of the latter, see me replying to Matt Barnett and me replying to Quintin Pope & Nora Belrose.
By the way, this might be overly-cynical, but I think there are some people (coming into the AI safety field very recently) who understand how LLMs work but don’t know how (for example) model-based reinforcement learning works, and so they just assume that the way LLMs work is the only possible way for any AI algorithm to work.
On the whole though, I think much of the case by proponents for the importance of working on AI Safety does assume that current paradigm + scale is all you need, or rest on works that assume it.
Yeah this is more true than I would like. I try to push back on it where possible, e.g. my post AI doom from an LLM-plateau-ist perspective.
There were however plenty of people who were loudly arguing that it was important to work on AI x-risk before “the current paradigm” was much of a thing (or in some cases long before “the current paradigm” existed at all), and I think their arguments were sound at the time and remain sound today. (E.g. Alan Turing, Norbert Wiener, Yudkowsky, Bostrom, Stuart Russell, Tegmark…) (OpenPhil seems to have started working seriously on AI in 2016, which was 3 years before GPT-2.)
I’m confused what you’re trying to say… Supposing we do in fact invent AGI someday, do you think this AGI won’t be able to do science? Or that it will be able to do science, but that wouldn’t count as “automating science”?
Or maybe when you said “whether ‘PASTA’ is possible at all”, you meant “whether ‘PASTA’ is possible at all via future LLMs”?
Maybe you’re assuming that everyone here has a shared assumption that we’re just talking about LLMs, and that if someone says “AI will never do X” they obviously mean “LLMs will never do X”? If so, I think that’s wrong (or at least I hope it’s wrong), and I think we should be more careful with our terminology. AI is broader than LLMs. …Well, maybe Aschenbrenner is thinking that way, but I bet that if you were to ask a typical senior person in AI x-risk (e.g. Karnofsky) whether it’s possible that there will be some big AI paradigm shift (away from LLMs) between now and TAI, they would say “Well yeah duh of course that’s possible,” and then they would say that they would still absolutely want to talk about and prepare for TAI, in whatever algorithmic form it might take.
OK yeah, “AGI is possible on chips but only if you have 1e100 of them or whatever” is certainly a conceivable possibility. :) For example, here’s me responding to someone arguing along those lines.
If there are any neuroscientists who have investigated this I would be interested!
There is never a neuroscience consensus, but fwiw I fancy myself a neuroscientist and have some thoughts at: Thoughts on hardware / compute requirements for AGI.
One of the various points I bring up there is:
(1) if you look at how human brains, say, go to the moon, or invent quantum mechanics, and you think about what algorithms could underlie that, then you would start talking about algorithms that entail building generative models, and editing them, and querying them, and searching through them, and composing them, blah blah.
(2) if you look at a biological brain’s low-level affordances, it’s a bunch of things related to somatic spikes and dendritic spikes and protein cascades and releasing and detecting neuropeptides etc.
(3) if you look at a silicon chip’s low-level affordances, it’s a bunch of things related to switching transistors and currents going down wires and charging up capacitors and so on.
My view is: implementing (1) via (3) would involve a lot of inefficient bottlenecks where there’s no low-level affordance that’s a good match to the algorithmic operation we want … but the same is true of implementing (1) via (2). Indeed, I think the human brain does what it does via some atrociously inefficient workarounds to the limitations of biological neurons, limitations which would not be applicable to silicon chips.
By contrast, many people thinking about this problem are often thinking about “how hard is it to use (3) to precisely emulate (2)?”, rather than “what’s the comparison between (1)←(3) versus (1)←(2)?”. (If you’re still not following, see my discussion here—search for “transistor-by-transistor simulation of a pocket calculator microcontroller chip”.)
Another thing is that, if you look at what a single consumer GPU can do when it runs an LLM or diffusion model… well it’s not doing human-level AGI, but it’s sure doing something, and I think it’s a sound intuition (albeit hard to formalize) to say “well it kinda seems implausible that the brain is doing something that’s >1000× harder to calculate than that”.
Yeah sure, here are two reasonable positions:
(A) “We should plan for the contingency where LLMs (or scaffolded LLMs etc.) scale to AGI, because this contingency is very likely what’s gonna happen.”
(B) “We should plan for the contingency where LLMs (or scaffolded LLMs etc.) scale to AGI, because this contingency is more tractable and urgent than the contingency where they don’t, and hence worth working on regardless of its exact probability.”
I think plenty of AI safety people are in (A), which is at least internally-consistent even if I happen to think they’re wrong. I also think there are lots of AI safety people who would say that they’re in (B) if pressed, but where they long ago lost track of the fact that that’s what they were doing, and instead they’ve started treating the contingency as a definite expectation, and thus they say things that omit essential caveats, or are wrong or misleading in other ways. ¯\_(ツ)_/¯
A big crux I think here is whether ‘PASTA’ is possible at all, or at least whether it can be used as a way to bootstrap everything else.
Do you mean “possible at all using LLM technology” or do you mean “possible at all using any possible AI algorithm that will ever be invented”?
As for the latter, I think (or at least, I hope!) that there’s wide consensus that whatever human brains do (individually and collectively), it is possible in principle for algorithms-running-on-chips to do those same things too. Brains are not magic, right?
I was under the impression that most people in AI safety felt this way—that transformers (or diffusion models) weren’t going to be the major underpinning of AGI.
I haven’t done any surveys or anything, but that seems very inaccurate to me. I would have guessed that >90% of “people in AI safety” are either strongly expecting that transformers (or diffusion models) will be the major underpinning of AGI, or at least they’re acting as if they strongly expect that. (I’m including LLMs + scaffolding and so on in this category.)
For example: people seem very happy to make guesses about what tasks the first AGIs will be better and worse at doing based on current LLM capabilities; and people seem very happy to make guesses about how much compute the first AGIs will require based on current LLM compute requirements; and people seem very happy to make guesses about which companies are likely to develop AGIs based on which companies are best at training LLMs today; and people seem very happy to make guesses about AGI UIs based on the particular LLM interface of “context window → output token”; etc. etc. This kind of thing happens constantly, and sometimes I feel like I’m the only one who even notices. It drives me nuts.
Hi, I’m an AI alignment technical researcher who mostly works independently, and I’m in the market for a new productivity coach / accountability buddy, to chat with periodically (I’ve been doing one ≈20-minute meeting every 2 weeks) about work habits, and set goals, and so on. I’m open to either paying fair market rate, or to a reciprocal arrangement where we trade advice and promises etc. I slightly prefer someone not directly involved in AI alignment—since I don’t want us to get nerd-sniped into object-level discussions—but whatever, that’s not a hard requirement. You can reply here, or DM or email me. :) Update: I’m all set now.
“Artificial General Intelligence”: an extremely brief FAQ
Some (problematic) aesthetics of what constitutes good work in academia
Humans are less than maximally aligned with each other (e.g. we care less about the welfare of a random stranger than about our own welfare), and humans are also less than maximally misaligned with each other (e.g. most people don’t feel a sadistic desire for random strangers to suffer). I hope that everyone can agree about both those obvious things.
That still leaves the question of where we are on the vast spectrum in between those two extremes. But I think your claim “humans are largely misaligned with each other” is not meaningful enough to argue about. What percentage is “largely”, and how do we even measure that?
Anyway, I am concerned that future AIs will be more misaligned with random humans than random humans are with each other, and that this difference will have important bad consequences, and I also think there are other disanalogies / reasons-for-concern as well. But this is supposed to be a post about terminology so maybe we shouldn’t get into that kind of stuff here.
My terminology would be that (2) is “ambitious value learning” and (1) is “misaligned AI that cooperates with humans because it views cooperating-with-humans to be in its own strategic / selfish best interest”.
I strongly vote against calling (1) “aligned”. If you think we can have a good future by ensuring that it is always in the strategic / selfish best interest of AIs to be nice to humans, then I happen to disagree, but it’s a perfectly reasonable position to be arguing, and if you use the word “misaligned” for those AIs (e.g. if you say “alignment is unnecessary”), I think it would be viewed as a helpful and clarifying way to describe your position, and not as a reductio or concession.
For my part, I define “alignment” as “the AI is trying to do things that the AGI designer had intended for it to be trying to do, as an end in itself and not just as a means-to-an-end towards some different goal that it really cares about.” (And if the AI is not the kind of thing for which the words “trying” and “cares about” are applicable in the first place, then the AI is neither aligned nor misaligned, and also I’d claim it’s not an x-risk in any case.) More caveats in a thing I wrote here:
Some researchers think that the “correct” design intentions (for an AGI’s motivation) are obvious, and define the word “alignment” accordingly. Three common examples are (1) “I am designing the AGI so that, at any given point in time, it’s trying to do what its human supervisor wants it to be trying to do”—this AGI would be “aligned” to the supervisor’s intentions. (2) “I am designing the AGI so that it shares the values of its human supervisor”—this AGI would be “aligned” to the supervisor. (3) “I am designing the AGI so that it shares the collective values of humanity”—this AGI would be “aligned” to humanity.
I’m avoiding this approach because I think that the “correct” intended AGI motivation is still an open question. For example, maybe it will be possible to build an AGI that really just wants to do a specific, predetermined, narrow task (e.g. design a better solar cell), in a way that doesn’t involve taking over the world etc. Such an AGI would not be “aligned” to anything in particular, except for the original design intention. But I still want to use the term “aligned” when talking about such an AGI.
Of course, sometimes I want to talk about (1,2,3) above, but I would use different terms for that purpose, e.g. (1) “the Paul Christiano version of corrigibility”, (2) “ambitious value learning”, and (3) “CEV”.
May I ask, what is your position on creating artificial consciousness?
Do you see digital suffering as a risk? If so, should we be careful to avoid creating AC?
I think the word “we” is hiding a lot of complexity here—like saying “should we decommission all the world’s nuclear weapons?” Well, that sounds nice, but how exactly? If I could wave a magic wand and nobody ever builds conscious AIs, I would think seriously about it, although I don’t know what I would decide—it depends on details I think. Back in the real world, I think that we’re eventually going to get conscious AIs whether that’s a good idea or not. There are surely interventions that will buy time until that happens, but preventing it forever and ever seems infeasible to me. Scientific knowledge tends to get out and accumulate, sooner or later, IMO. “Forever” is a very very long time.
The last time I wrote about my opinions is here.
Do you see digital suffering as a risk?
Yes. The main way I think about that is: I think eventually AIs will be in charge, so the goal is to wind up with AIs that tend to be nice to other AIs. This challenge is somewhat related to the challenge of winding up with AIs that are nice to humans. So preventing digital suffering winds up closely entangled with the alignment problem, which is my area of research. That’s not in itself a reason for optimism, of course.
We might also get a “singleton” world where there is effectively one and only one powerful AI in the world (or many copies of the same AI pursuing the same goals) which would alleviate some or maybe all of that concern. I currently think an eventual “singleton” world is very likely, although I seem to be very much in the minority on that.
Sorry if I missed it, but is there some part of this post where you suggest specific concrete interventions / actions that you think would be helpful?
Mark Solms thinks he understands how to make artificial consciousness (I think everything he says on the topic is wrong), and his book Hidden Spring has an interesting discussion (in chapter 12) on the “oh jeez now what” question. I mostly disagree with what he says about that too, but I find it to be an interesting case-study of someone grappling with the question.
In short, he suggests turning off the sentient machine, then registering a patent for making conscious machines, and assigning that patent to a nonprofit like maybe Future of Life Institute, and then
organise a symposium in which leading scientists and philosophers and other stakeholders are invited to consider the implications, and to make recommendations concerning the way forward, including whether and when and under what conditions the sentient machine should be switched on again – and possibly developed further. Hopefully this will lead to the drawing up of a set of broader guidelines and constraints upon the future development, exploitation and proliferation of sentient AI in general.
He also has a strongly-worded defense of his figuring out how consciousness works and publishing it, on the grounds that if he didn’t, someone else would.
I am not claiming analogies have no place in AI risk discussions. I’ve certainly used them a number of times myself.
Yes you have!—including just two paragraphs earlier in that very comment, i.e. you are using the analogy “future AI is very much like today’s LLMs but better”. :)
Cf. what I called “left-column thinking” in the diagram here.
For all we know, future AIs could be trained in an entirely different way from LLMs, in which case the way that “LLMs are already being trained” would be pretty irrelevant in a discussion of AI risk. That’s actually my own guess, but obviously nobody knows for sure either way. :)
It is certainly far from obvious: for example, devastating as the COVID-19 pandemic was, I don’t think anyone believes that 10,000 random re-rolls of the COVID-19 pandemic would lead to at least one existential catastrophe. The COVID-19 pandemic just was not the sort of thing to pose a meaningful threat of existential catastrophe, so if natural pandemics are meant to go beyond the threat posed by the recent COVID-19 pandemic, Ord really should tell us how they do so.
This seems very misleading. We know that COVID-19 has <<5% IFR. Presumably the concern is that some natural pandemics may be much much more virulent than COVID-19 was. So it’s important that the thing we imagine is “10,000 random re-rolls in which there is a natural pandemic”, NOT “10,000 random re-rolls of COVID-19 in particular”. And then we can ask questions like “How many of those 10,000 natural pandemics have >50% IFR? Or >90%? And what would we expect to happen in those cases?” I don’t know what the answers are, but that’s a much more helpful starting point I think.
We discussed the risk of `do-it-yourself’ science in Part 10 of this series. There, we saw that a paper by David Sarapong and colleagues laments “Sensational and alarmist headlines about DiY science” which “argue that the practice could serve as a context for inducing rogue science which could potentially lead to a ‘zombie apocalypse’.” These experts find little empirical support for any such claims.
Maybe this is addressed in Part 10, but this paragraph seems misleading insofar as Ord is talking about risk by 2100, and a major part of the story is that DIY biology in, say, 2085 may be importantly different and more dangerous than DIY biology in 2023, because the science and tech keeps advancing and improving each year.
Needless to say, even if we could be 100% certain that DIY biology in 2085 will be super dangerous, there obviously would not be any “empirical support” for that, because 2085 hasn’t happened yet. It’s just not the kind of thing that presents empirical evidence for us to use. We have to do the best we can without it. The linked paper does not seem to discuss that issue at all, unless I missed it.
(I have a similar complaint about the discussion of Soviet bioweapons in Section 4—running a bioweapons program with 2024 science & technology is presumably quite different than running a bioweapons program with 1985 science & technology, and running one in 2085 would be quite different yet again.)
(Recently I’ve been using “AI safety” and “AI x-safety” interchangeably when I want to refer to the “overarching” project of making the AI transition go well, but I’m open to being convinced that we should come up with another term for this.)
I’ve been using the term “Safe And Beneficial AGI” (or more casually, “awesome post-AGI utopia”) as the overarching “go well” project, and “AGI safety” as the part where we try to make AGIs that don’t accidentally [i.e. accidentally from the human supervisors’ / programmers’ perspective] kill everyone, and (following common usage according to OP) “Alignment” for “The AGI is trying to do things that the AGI designer had intended for it to be trying to do”.
(I didn’t make up the term “Safe and Beneficial AGI”. I think I got it from Future of Life Institute. Maybe they in turn got it from somewhere else, I dunno.)
(See also: my post Safety ≠ alignment (but they’re close!))
See also a thing I wrote here:
Some researchers think that the “correct” design intentions (for an AGI’s motivation) are obvious, and define the word “alignment” accordingly. Three common examples are (1) “I am designing the AGI so that, at any given point in time, it’s trying to do what its human supervisor wants it to be trying to do”—this AGI would be “aligned” to the supervisor’s intentions. (2) “I am designing the AGI so that it shares the values of its human supervisor”—this AGI would be “aligned” to the supervisor. (3) “I am designing the AGI so that it shares the collective values of humanity”—this AGI would be “aligned” to humanity.
I’m avoiding this approach because I think that the “correct” intended AGI motivation is still an open question. For example, maybe it will be possible to build an AGI that really just wants to do a specific, predetermined, narrow task (e.g. design a better solar cell), in a way that doesn’t involve taking over the world etc. Such an AGI would not be “aligned” to anything in particular, except for the original design intention. But I still want to use the term “aligned” when talking about such an AGI.
Of course, sometimes I want to talk about (1,2,3) above, but I would use different terms for that purpose, e.g. (1) “the Paul Christiano version of corrigibility”, (2) “ambitious value learning”, and (3) “CEV”.
I don’t recall the details of Tom Davidson’s model, but I’m pretty familiar with Ajeya’s bio-anchors report, and I definitely think that if you make an assumption “algorithmic breakthroughs are needed to get TAI”, then there really isn’t much left of the bio-anchors report at all. (…although there are still some interesting ideas and calculations that can be salvaged from the rubble.)
I went through how the bio-anchors report looks if you hold a strong algorithmic-breakthrough-centric perspective in my 2021 post Brain-inspired AGI and the “lifetime anchor”.
See also here (search for “breakthrough”) where Ajeya is very clear in an interview that she views algorithmic breakthroughs as unnecessary for TAI, and that she deliberately did not include the possibility of algorithmic breakthroughs in her bio-anchors model (…and therefore she views the possibility of breakthroughs as a pro tanto reason to think that her report’s timelines are too long).
OK, well, I actually agree with Ajeya that algorithmic breakthroughs are not strictly required for TAI, in the narrow sense that her Evolution Anchor (i.e., recapitulating the process of animal evolution in a computer simulation) really would work given infinite compute and infinite runtime and no additional algorithmic insights. (In other words, if you do a giant outer-loop search over the space of all possible algorithms, then you’ll find TAI eventually.) But I think that’s really leaning hard on the assumption of truly astronomical quantities of compute [or equivalent via incremental improvements in algorithmic efficiency] being available in like 2100 or whatever, as nostalgebraist points out. I think that assumption is dubious, or at least it’s moot—I think we’ll get the algorithmic breakthroughs far earlier than anyone would or could do that kind of insane brute force approach.