Well, sure, but if there is a way to avoid the doom, then why, after 20 years, has no one published a plan for how to do it that doesn’t resemble a speculative research project of the type you try when you clearly don’t understand the problem, and that doesn’t resemble the vague output of a politician writing about a sensitive issue?
rhollerith
First of all, let me say I’m glad to see your name on a new post because I have fond memories of your LW posts from over 10 years ago. You write:
exists(X, implies([long logical form saying X is the prettiest girl in the county], wants(“Jamal”, marry(“Jamal”, X))))
Where you wrote “exists”, you meant “forall”.
Suppose Zelda is definitely not the prettiest girl in the county. Then implies([long logical form saying Zelda is the prettiest girl in the county], wants(“Jamal”, marry(“Jamal”, Zelda))) == implies(False, …) == True, which makes the exists form True regardless of whom Jamal wants to marry. I.e., the formula as you have it now does not constrain the space of possibilities the way you want it to. Another way to see it is that “For all x in X, x is green” gets formalized as forall(x, implies(x in X, x is green)), which in broad brush strokes is the form you want here.
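To make the vacuous-truth point concrete, here is a minimal Python sketch; the predicates prettiest and jamal_wants_to_marry and the three-girl domain are made up for illustration, standing in for the long logical forms. The exists-implies form comes out True in a world where Jamal does not want to marry the prettiest girl at all, while the forall-implies form correctly comes out False.

```python
# Toy model of the formulas under discussion, using a made-up three-girl domain.
girls = ["Alice", "Betty", "Zelda"]

def prettiest(x):
    # Stand-in for the long logical form "x is the prettiest girl in the county".
    return x == "Alice"

def jamal_wants_to_marry(x):
    # Suppose Jamal in fact wants to marry Zelda, who is not the prettiest.
    return x == "Zelda"

def implies(p, q):
    # Material implication: False only when p is True and q is False.
    return (not p) or q

# exists(X, implies(prettiest(X), wants(...))): satisfied vacuously by Betty or Zelda,
# because implies(False, anything) == True, so it places no constraint on Jamal.
exists_form = any(implies(prettiest(x), jamal_wants_to_marry(x)) for x in girls)

# forall(X, implies(prettiest(X), wants(...))): the intended reading, namely that
# whoever is the prettiest girl, Jamal wants to marry her. False in this world.
forall_form = all(implies(prettiest(x), jamal_wants_to_marry(x)) for x in girls)

print(exists_form)  # True  (vacuously satisfied, even though Jamal doesn't want the prettiest girl)
print(forall_form)  # False (correctly rules out this world)
```

(This is also why a restricted existential claim is normally formalized with a conjunction, exists(X, and(P(X), Q(X))), whereas a restricted universal claim uses an implication, as in the forall example above.)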
the capability program is an easier technical problem than the alignment program.
You don’t know that; nobody knows that
Do you concede that frontier AI research is intrinsically dangerous?
That it is among the handful of the most dangerous research programs ever pursued by our civilization?
If not, I hope you can see why those who do consider it intrinsically dangerous are not particularly mollified or reassured by “well, who knows? maybe it will turn out OK in the end!”
The distinction between capabilities and alignment is a useful concept when choosing research on an individual level; but it’s far from robust enough to be a good organizing principle on a societal level.
When I wrote “the alignment program” above, I meant something specific, which I believe you will agree is robust enough to organize society (if only we could get society to go along with it): namely, I meant thinking hard together about alignment without doing anything dangerous like training up models with billions of parameters till we have at least a rough design that most of the professional researchers agree is more likely to help us than to kill us even if it turns out to have super-human capabilities—even if our settling on that design takes us many decades. E.g., what MIRI has been doing the last 20 years.
I dislike the phrase “we all die”, nobody has justifiable confidence high enough to make that claim, even if ASI is misaligned enough to seize power there’s a pretty wide range of options for the future of humans
It makes me sad that you do not see that “we all die” is the default outcome that naturally happens unless a lot of correct optimization pressure is applied by the researchers to the design of the first sufficiently-capable AI before the AI is given computing resources. It would have been nice to have someone with your capacity for clear thinking working on the problem. Are you sure you’re not overly attached (e.g., for intrapersonal motivational reasons) to an optimistic vision in which AI research “feels like the early days of hacker culture” and “there are hackathons where people build fun demos”?
history is full of cases where people dramatically underestimated the growth of scientific knowledge, and its ability to solve big problems.
There are 2 concurrent research programs, and if one program (capability) completes before the other one (alignment), we all die, but the capability program is an easier technical problem than the alignment program. Do you disagree with that framing? If not, then how does “research might proceed faster than we expect” give you hope rather than dread?
Also, I’m guessing you would oppose a worldwide ban starting today on all “experimental” AI research (i.e., all use of computing resources to run AIs) till the scholars of the world settle on how to keep an AI aligned through the transition to superintelligence. That’s my guess, but please confirm. In your answer, please imagine that the ban is feasible and in fact can be effective (“leak-proof”?) enough to give the AI theorists all the time they need to settle on a plan even if that takes many decades. In other words, please indulge me in this hypothetical question because I suspect it is a crux.
“Settled” here means that a majority of non-senile scholars / researchers who’ve worked full-time on the program for at least 15 years of their lives agree that it is safe for experiments to start as long as they adhere to a particular plan (which this majority agree on). Kinda like the way scholars have settled on the conclusion that anthropogenic emissions of carbon dioxide are causing the Earth to warm up.
I realize that there are safe experiments that would help the scholars of the world a lot in their theorizing about alignment, but I also expect that the scholars / theorists could probably settle on a safe, effective plan without the benefit of any experiments beyond those that have been run up to now, even though it might take them longer than it would with that benefit. Do you agree?
Certainly, working with human-level AIs gives you learning opportunities that are simply unavailable otherwise. It is likewise certainly true that there are things you can learn about how to fight a grizzly bear that you can learn only while fighting a grizzly bear. That does not mean that you should get into a fight with a grizzly bear when you are armed only with your bare hands. Sadly, humanity currently has no weapon other than its own bare hands (metaphorically speaking) in this particular fight with the grizzly bear. Acquiring an effective weapon will probably take a few more decades, and it would be foolish to engage in any combat with the bear before then.
The reason I believe it will take a few more decades to acquire the metaphorical effective weapon is that MIRI thinks it will take at least 3 more decades to figure out how to align an AI more capable than us. (They’ve been working on the problem for about 20 years now.)
In this metaphor, the grizzly bear is not really an AI, but rather the community of AI capability researchers, constantly sharing information, and the human combatant is all of humanity, which is being dragged into the fight by the capability researchers and also by those “alignment” researchers who are up for fighting with bare hands (although that of course is not how they would describe it).
Also, it is hard to know before launching an AI whether it will be human-level or ten times human-level. Were any of the designers of GPT-4 able, for example, to predict that GPT-4 would score in the 90th percentile on the Uniform Bar Exam? No, they were not. They were as surprised as the rest of us.
After I wrote my paragraph about the clock (which you quoted) I noticed that the Bulletin has expanded the meaning of the clock to include risks from climate change, i.e., the clock is no longer about nuclear war specifically, so I deleted that paragraph.
First, thank you for the informative post.
they claim that we’re closer to nuclear war than any time since the Cuban Missile Crisis, which is clearly nonsense
John Mearsheimer also believes P(nuclear war) is higher now than at any time during the Cold War. If you like, I can try to find where he says that in a video of one of his speeches or interviews.
His reasoning is that the US national-security establishment has become much less competent since the decisive events of the Cold War with the result that Russia is much more likely to choose to start a nuclear war than they would be if the security situation the US currently finds itself in were being managed by the US’s Cold-War leaders. I’m happy to elaborate if there is interest.
If a permanent ban went into effect today on training ML models on anything larger than a single consumer-grade GPU card, e.g., Nvidia’s RTX 40 series, the work of MIRI researcher Scott Garrabrant would not be affected at all. How much of Redwood’s research would stop?
This needs more specificity. Obviously, for Garrabrant’s work to have any effect, it will need to influence the design and deployment of an AI eventually; it’s just that his research approach is probably decades away from the point where it can be profitably applied to an actual deployed AI. On the other hand, any AI researcher can remain productive if denied the use of a GPU cluster for a week: for example, he or she can use the week to tidy up his or her office and do related “housekeeping” tasks. I guess what I want to know is: if there is a ban on GPU clusters, how long (weeks? months? years?) can the median researcher at Redwood remain productive without abandoning most of his or her work up to now? And is there any researcher at Redwood doing work that is a lot more robust against such a ban than the median researcher’s work?
If you (the team that wrote this post) had the power to decide which organizations get shut down (with immediate effect), would Redwood be one of the orgs you shut down? Assume that you had enough power that, if you chose to, you could shut down all meaningful research on AI, and that you could be as selective as you like about which organizations and parts of organizations (e.g., academic departments) to shut down.
Thanks in advance.
I am willing to bite your bullet.
I had a comment here explaining my reasoning, but deleted it because I plan to make a post instead.
I’m curious about your “to begin with”: do you interpret Tamay as saying that doom is improbable even if few or no measures are taken to address AI safety?