I think the point is making this explicit and having a solid exposition to point to when saying “progress is no good if we all die sooner!”
I don’t think it’s worth the effort; I’d personally be just as pleased with one snapshot of the participants in conversation as I would be with a whole video. The point of podcasts for me is that I can do something else while still taking in something useful for my alignment work. But I am definitely a tone-of-voice attender over a facial-expression attender, so others will doubtless get more value out of it.
Oops, I meant to say I wrote a post on one aspect of this interview on LW: Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours. It did produce some interesting discussion.
Yes, but pursuing excellence also costs time that could be spent elsewhere, and time/results tradeoffs are often highly nonlinear.
The perfect is the enemy of the good. It seems to me that the most common LW/EA personality already pursues excellence more than is optimal.
For more, see my LW comment:
Excellent work.
To summarize one central argument in briefest form:
Aschenbrenner’s conclusion in Situational Awareness is wrong in that it overstates the claim.
He claims that treating AGI as a national security issue is the obvious and inevitable conclusion for those who understand the enormous potential of AGI development in the next few years. But Aschenbrenner doesn’t adequately consider the possibility of treating AGI primarily as a threat to humanity instead of as a threat to the nation or to a political ideal (the free world). If we considered it primarily a threat to humanity, we might be able to cooperate with China and other actors to safeguard humanity.
I think this argument is straightforwardly true. Aschenbrenner does not adequately consider alternative strategies, so his claim that his conclusion will be the inevitable consensus is false.
But the opposite isn’t an inevitable conclusion, either.
I currently think Aschenbrenner is more likely correct about the best course of action. But I am highly uncertain. I have thought hard about this issue for many hours both before and after Aschenbrenner’s piece sparked some public discussion. But my analysis, and the public debate thus far, are very far from conclusive on this complex issue.
This question deserves much more thought. It has a strong claim to being the second most pressing issue in the world at this moment, just behind technical AGI alignment.
This post can be summarized as “Aschenbrenner’s narrative is highly questionable”. Of course it is. From my perspective, having thought deeply about each of the issues he’s addressing, I find his claims also highly plausible. To “just discard” this argument because it’s “questionable” would be very foolish. It would be like driving with your eyes closed once the traffic gets confusing.
This is the harshest response I’ve ever written. To the author, I apologize. To the EA community: we will not help the world if we fall back on vibes-based thinking and calling things we don’t like “questionable” to dismiss them. We must engage at the object level. While the future is hard to predict, it is quite possible that it will be very unlike the past, but in understandable ways. We will have plenty of problems with the rest of the world doing its standard vibes-based thinking and policy-making. The EA community needs to do better.
There is much to question and debate in Aschenbrenner’s post, but it must be engaged with at the object level. I will do that, elsewhere.
On the vibes/ad-hominem level, note that Aschenbrenner also recently wrote Nobody’s on the ball on AGI alignment. He appears to believe (there and elsewhere) that AGI is a deadly risk, and we might very well all die from it. He might be out to make a quick billion, but he’s also serious about the risks involved.
The author’s object-level claim is that they don’t think AGI is imminent. Why? How sure are you? How about we take some action or at least think about the possibility, just in case you might be wrong and the many people close to its development might be right?
Agreed. That juxtaposition is quite suspicious.
Unfortunately, most of Aschenbrenner’s claims seem highly plausible. AGI is a huge deal, it could happen very soon, and the government is very likely to do something about it before it’s fully transformative. Whether the government spending tons of money on his proposed Manhattan Project is the right move is highly debatable, and we should debate it.
I think the scaling hypothesis is false, and we’ll get to AGI quite soon anyway, by other routes. The better scaling works, the faster we’ll get there, but that’s gravy. We have all of the components of a human-like mind today; putting them together is one route to AGI.
I think a major issue is that the people who would be best at predicting AGI usually don’t want to share their rationale.
Gears-level models of the phenomenon in question are highly useful in making accurate predictions. Those with the best models are either worriers who don’t want to advance timelines, or enthusiasts who want to build it first. Neither has an incentive to convince the world it’s coming soon by sharing exactly how that might happen.
The exceptions are people who have really thought about how to get from AI to AGI, but are not in the leading orgs and are either uninterested in racing or want to attract funding and attention for their approach. Yann LeCun comes to mind.
Imagine trying to predict the advent of heavier-than-air flight without studying either birds or mechanical engineering. You’d get predictions like the ones we saw historically—so wild as to be worthless, except those from the people actually trying to achieve that goal.
(copied from LW comment since the discussion is happening over here)
I personally think LLMs will plateau around human level, but that they will be made agentic and self-teaching, and therefore self-aware (in sum, “sapient”) and truly dangerous, by scaffolding them into language model agents or language model cognitive architectures. See Capabilities and alignment of LLM cognitive architectures for my logic in expecting that.
That would be a good outcome. We’d have agents with their own goals, capable enough to do useful and dangerous things, but probably not quite capable enough to self-exfiltrate, and probably initially under the control of relatively sane people. That would scare the pants off the world, and we’d see some real efforts to align the things. Which is uniquely doable, since they’d take top-level goals in natural language and be readily interpretable by default (with plenty of real concerns remaining, including Waluigi effects and their utterances not reliably reflecting their real underlying cognition).
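To make “scaffolding” concrete, here’s a minimal sketch of the loop I have in mind. This is my own toy illustration, not any particular framework’s API; `call_llm`, the tool-calling format, and the memory list are all stand-ins:

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM API call; this toy version just declares itself done.
    return "DONE"

def run_agent(goal: str, tools: dict, max_steps: int = 10) -> list:
    """Minimal language-model-agent loop: natural-language goal, readable memory,
    repeated LLM calls that choose tool actions until the model says it's done."""
    memory = [f"Top-level goal (in natural language): {goal}"]
    for _ in range(max_steps):
        prompt = "\n".join(memory) + "\nWhat next? Reply 'TOOL: argument' or 'DONE'."
        decision = call_llm(prompt)
        if decision.strip() == "DONE":
            break
        tool_name, _, argument = decision.partition(":")
        observation = tools[tool_name.strip()](argument.strip())  # act in the world
        memory.append(f"Did {decision!r}, observed {observation!r}")  # episodic memory
    return memory  # the whole "thought process" is plain text

print(run_agent("tidy my notes", tools={}))
```

The relevant feature is that the top-level goal and the agent’s whole chain of reasoning live in natural language, which is what makes aligning and interpreting such agents unusually tractable, and also exactly where the Waluigi and unfaithful-utterance caveats bite.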
I think the general consensus, which I share, is that neither mind uploading nor good BCI to allow brain extensions are likely to happen before AGI. I wish I had citations ready to hand.
I haven’t heard as much discussion of the biological-superbrains approach. I think it’s probably feasible to increase intelligence through genetic engineering, but that would probably also take too long to help with alignment before AGI happens, if you took the route of altering embryos. Altering adults would be tougher and more limited. And it would hit the same legal problems.
I think that neuromorphic AGI is a possibility, which is why some of my alignment work addresses it. I think the best and most prominent work on that topic is Steve Byrnes’ Intro to Brain-Like-AGI Safety.
I think that’s quite a pessimistic take. I take Altman seriously on caring about x-risk, although I’m not sure he takes it quite seriously enough. This is based on public comments to that effect around 2013, before he started running OpenAI. And Sutskever definitely seems properly concerned.
I agree that those teams aren’t completely trustworthy, and in an ideal world, we should be making this decision by including everyone on earth. But with a partial pause, do you expect to have better or worse teams in the lead for achieving AGI? That was my point.
I’m not sure which is the better place to have this discussion, so I’m trying both. Copied from my comment on Less Wrong:
That all makes sense. To expand a little more on some of the logic:
It seems like the outcome of a partial pause rests in part on whether that would tend to put people in the lead of the AGI race who are more or less safety-concerned.
I think it’s nontrivial that we currently have three teams in the lead who all appear to honestly take the risks very seriously, and changing that might be a very bad idea.
On the other hand, the argument for alignment risks is quite strong, and we might expect more people to take the risks more seriously as those arguments diffuse. This might not happen if polarization becomes a large factor in beliefs about AGI risk. The evidence for climate change was also pretty strong, but we saw half of America believe in it less, not more, as the evidence mounted. The lines of polarization would be different in this case, but I’m afraid it could happen. I outlined that case a little in AI scares and changing public beliefs.
In that case, I think a partial pause would have negative expected value, as the current leaders’ advantage decayed and people who take the risks less seriously moved into the lead by circumventing the pause.
This makes me highly unsure whether a pause would be net positive. Having alignment solutions won’t help if they’re not implemented because the alignment tax is too high.
The creation of compute overhang is another reason to worry about a pause. It’s highly uncertain how far we are from making adequate compute for AGI affordable to individuals. Algorithms and compute will keep getting better during a pause. So will theory of AGI, along with theory of alignment.
This puts me, and I think the alignment community at large, in a very uncomfortable position of not knowing whether a realistic pause would be helpful.
It does seem clear that creating mechanisms and political will for a pause is a good idea.
Advocating for more safety work also seems clear cut.
To this end, I think it’s true that you create more political capital by successfully pushing for policy.
A pause now would create even more capital, but it’s also less likely to be a win, and it could wind up creating polarization, costing capital rather than creating it. It’s harder to argue for a pause now, when even most alignment folks think we’re years from AGI.
So perhaps the low-hanging fruit is pushing for voluntary RSPs and government funding for safety work. These are clear improvements, and likely to be wins that create capital for a pause as we get closer to AGI.
There’s a lot of uncertainty here, and that’s uncomfortable. More discussion like this should help resolve that uncertainty, and thereby help clarify and unify the collective will of the safety community.
I agree with you that humans have mismatched goals among ourselves, so some amount of goal mismatch is just a fact we have to deal with. I think the ideal is that we get an AGI that makes its goal the overlap in human goals; see [Empowerment is (almost) All We Need](https://www.lesswrong.com/posts/JPHeENwRyXn9YFmXc/empowerment-is-almost-all-we-need) and others on preference maximization.
I also agree with your intuition that having a non-maximizer improves the odds of an AGI not seeking power or doing other dangerous things. But I think we need to go far beyond the intuition; we don’t want to play odds with the future of humanity. To that end, I have more thoughts on where this will and won’t happen.
I’m saying “the problem” with optimization is actually mismatched goals, not optimization/maximization itself. In more depth, and hopefully more usefully: I think unbounded goals are a core problem with optimization (not the only problem, but a very big one).
If an AGI had a bounded goal like “make one billion paperclips”, it wouldn’t be nearly as dangerous; it might decide to eliminate humanity to make the odds of getting to a billion as good as possible (I can’t remember where I saw this important point; I think maybe Nate Soares made it). But it might decide that its best odds would come from just making some improvements to the paperclip business, in which case it wouldn’t cause problems.
Mismatched goals are the problem. The logic of instrumental convergence applies to any goal, not just maximization goals.
This is a start, but just a start. Optimization/maximization isn’t actually the problem. Any highly competent agent with goals that don’t match ours is the problem.
A world that’s 10% paperclips, with the rest composed of other stuff we don’t care about, is no better than the world a true optimizer would produce.
The idea “just don’t optimize” has a surprising amount of support in AGI safety, including quantilizers and satisficing. But to me they seem like only a bare start on taking the points off the tiger’s teeth. The tiger will still gnaw you to death if it wants to even a little.
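For readers who haven’t run into those proposals, here’s a toy sketch of the contrast, my own illustration rather than anyone’s actual algorithm, with invented action names and utility numbers. A quantilizer roughly samples from the better actions under some base distribution instead of taking the single best one:

```python
import random

# Invented utilities for candidate actions; the extreme action scores highest.
actions = {
    "improve_factory_efficiency": 10,
    "negotiate_cheaper_steel": 8,
    "hire_more_salespeople": 7,
    "run_an_ad_campaign": 6,
    "lobby_for_subsidies": 12,
    "seize_world_steel_supply": 1000,  # dangerous, but scores best on the stated goal
}

def maximizer(utilities: dict) -> str:
    # A pure optimizer always lands on the extreme action.
    return max(utilities, key=utilities.get)

def quantilizer(utilities: dict, q: float = 0.5) -> str:
    # Toy quantilizer with a uniform base distribution: sample uniformly from
    # the top-q fraction of actions by utility, rather than taking the argmax.
    ranked = sorted(utilities, key=utilities.get, reverse=True)
    top_q = ranked[: max(1, int(len(ranked) * q))]
    return random.choice(top_q)

print(maximizer(actions))         # always 'seize_world_steel_supply'
print(quantilizer(actions, 0.5))  # usually mundane, but the extreme action is still in the pool
```

That last line is the tiger’s-teeth point: softening the selection rule lowers the probability of the extreme action but doesn’t take it off the menu, and a competent agent with mismatched goals is still steering by its own lights.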
It means humans are highly imperfect maximizers of some imperfectly defined and ever-changing thing: your estimated future rewards according to your current reward function.
It doesn’t matter that you’re not exactly maximizing one certain thing; you’re working toward some set of things, and if you’re really good at that, it’s really bad for anyone who doesn’t like that set of things.
Optimization/maximization is a red herring. The core problem is highly competent agents with goals different from yours.
From a neuroscience/psychology perspective, I’d say that you are maximizing your future reward. And while that’s not a well-defined thing, it doesn’t matter; if you were highly competent, you’d make a lot of changes to the world according to what tickles you, and those might or might not be good for others, depending on your preferences (reward function). The slight difference between turning the world into one well-defined thing and a bunch of things you like isn’t that important to anyone who doesn’t like what you like.
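To spell out the framing I’m gesturing at, loosely and not as a claim about how brains literally compute, the standard reinforcement-learning picture is something like:

$$a^* = \arg\max_{a}\; \mathbb{E}\big[\textstyle\sum_{t \ge 0} \gamma^{t}\, R_{\text{now}}(s_t) \mid a\big]$$

where $R_{\text{now}}$ is whatever your reward function happens to be at the moment and $\gamma$ discounts the future. Nothing here requires $R_{\text{now}}$ to be well-defined, stable, or coherent for the worry to go through: a sufficiently competent agent reshaping the world toward whatever $R_{\text{now}}$ rewards is the dangerous part, whether or not the target is crisp.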
This is a broader and more intuitive form of the argument Miles is trying to make precise.
If you can be Dutch-booked without limit, well, you’re just not competent enough to be a threat; but you’re not going to let that happen, and a superintelligent version of you certainly won’t.
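For concreteness, here is a minimal money-pump illustration of what “Dutch-booked without limit” means; the cyclic preferences and the one-cent fee are invented for the example:

```python
# An agent with cyclic preferences A > B > C > A will pay a small fee for each
# "upgrade", so a bookie can walk it around the cycle and drain it indefinitely.
money = 0.0
holding = "C"
preferred_swap = {"C": "B", "B": "A", "A": "C"}  # each trade moves "up" the cycle

for _ in range(99):                    # the bookie keeps offering trades
    holding = preferred_swap[holding]  # the agent accepts: it strictly prefers the new item
    money -= 0.01                      # ...and pays a cent each time

print(holding, round(money, 2))  # 'C' -0.99: same holdings as it started with, 99 cents poorer
```

An agent that keeps falling for this leaks resources to anyone who notices the cycle; one competent enough to be dangerous will notice too and patch the incoherence, which is roughly the pressure toward coherence that the precise versions of this argument try to pin down.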
I completely agree.
But others may not, because most humans are neither longtermists nor utilitarians. So I’m afraid arguments like this won’t sway public opinion much at all. People like progress because it will get them and their loved ones (children and grandchildren, whose futures they can imagine) better lives. They barely care at all whether humanity ends after their grandchildren’s lifetimes (to the extent they can even think about it).
This is why I believe that most arguments against AGI x-risk are really based on differing timelines. People like to think that humans are so special that AI won’t surpass us for a long time. And they mostly care about the future for their loved ones.