Thank you! Yes, stories about how the movies “WarGames” and “The Day After” changed Ronald Reagan’s mind about cyberattacks and the risk of a global nuclear war were part of the inspiration for this AI Safety Camp project. Stories can indeed be powerful.
A Friendly Face (Another Failure Story)
Thank you very much! The cell comparison is very interesting.
OK, I updated: risk is less straightforward than I thought. While the AIs do call copies of themselves, rLLMs can’t really undergo a runaway replication cascade unless they can call themselves as “daemons” in separate threads (so that the control loop doesn’t have to wait for the output before continuing). And I currently don’t see an obvious profit motive to do so.
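To illustrate the difference I mean, here is a minimal, purely hypothetical Python sketch (the function names and the `call_llm` placeholder are my own invention, not taken from any actual agent framework): an agent whose control loop waits for each self-call stays a single sequential chain, while one that launches its copies as daemon threads and moves on can branch into ever more parallel copies.

```python
# Illustrative sketch only -- not a real agent framework.
import threading

def call_llm(prompt: str) -> str:
    """Placeholder for a call to some LLM API (assumed, not a real endpoint)."""
    return f"response to: {prompt}"

def agent_blocking(task: str, depth: int = 0) -> str:
    # The control loop blocks on each sub-agent's result, so at any moment
    # only one chain of agents is active -- no replication cascade.
    if depth >= 2:
        return call_llm(task)
    sub_result = agent_blocking(f"subtask of {task}", depth + 1)  # waits here
    return call_llm(f"combine {task} with {sub_result}")

def agent_daemon(task: str, depth: int = 0) -> None:
    # Each copy immediately spawns further copies as daemon threads and
    # continues without waiting -- the number of active agents can grow fast.
    if depth >= 2:
        call_llm(task)
        return
    for i in range(2):
        t = threading.Thread(
            target=agent_daemon,
            args=(f"subtask {i} of {task}", depth + 1),
            daemon=True,
        )
        t.start()  # no join(): the parent never waits for the child

    call_llm(task)

if __name__ == "__main__":
    agent_blocking("example task")  # sequential, bounded
    agent_daemon("example task")    # branching, potentially unbounded
```

A real agent would of course be far messier than this, but the structural difference between “wait for the copy” and “fire and forget” is the point: only the second pattern allows the kind of runaway cascade I was worried about.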
I’m not sure if I understand your point correctly. The LLMs wouldn’t have to be replicated, because different copies of the self-replicating agent could access the same LLM in parallel, just like many human users can access ChatGPT at the same time. At a later stage, when the LLM operators try to block access to their LLMs or even take them offline, the agent would have to find a way to replicate at least one LLM just once and run it on sufficiently powerful hardware to use it as (part of) its “brain”.
Agentic Mess (A Failure Story)
I’ll bring a simple board game about AI safety that I’ve developed recently in case anyone wants to do an initial test (not the one that was tested on the AISER, which was way too complex and slow ;).
Coordination by common knowledge to prevent uncontrollable AI
I fully agree, see this post.
As I have argued here in more detail, we don’t need AGI for an amazing future, including curing cancer. We don’t have to decide between “all in for AGI” and “full-stop in developing AI”. There’s a middle ground, and I think it’s the best option we have.
We don’t need AGI for an amazing future
EA Hamburg Session – Uncontrollable AI
VIRTUA: a novel about AI alignment
Policymakers and people in industry had, at least until ChatGPT, no idea what was going on (e.g., at the AI World Summit two months ago, very few people even knew about GPT-3). SOTA large language models are not really properly deployed, so nobody cared about them or even knew about them (until ChatGPT, at least).
As you point out yourself, what makes people interested in developing AGI is progress in AI, not the public discussion of potential dangers. “Nobody cared about” LLMs is certainly not true—I’m pretty sure the relevant people watched them closely. That many people aren’t concerned about AGI or doubt its feasibility by now only means that THOSE people will not pursue it, and any public discussion will probably not change their minds. There are others who think very differently, like the people at OpenAI, Deepmind, Google, and (I suspect) a lot of others who communicate less openly about what they do.
I agree that [a common understanding of the dangers] would be something good to have. But the question is: is it even possible to have such a thing?
I think that within the scientific community, it’s roughly possible (but then your book/outreach medium must be highly targeted towards that community). Within the general public, I think that it’s ~impossible.
I don’t think you can easily separate the scientific community from the general public. Even scientific papers are read by journalists, who often publish about them in a simplified or distorted way. There are already many alarming posts and articles out there, as well as books like Stuart Russell’s “Human Compatible” (which I think is very good and helpful), so it is way too late to keep a lid on the possibility of AGI and its profound impacts (it was probably already too late when Arthur C. Clarke wrote “2001 - A Space Odyssey”). Not talking about the dangers of uncontrollable AI for fear that this may lead to certain actors investing even more heavily in the field is both naive and counterproductive in my view.
And I would strongly recommend not publishing your book as long as you haven’t done that.
I will definitely publish it, but I doubt very much that it will have a large impact. There are many other writers out there with a much larger audience who write similar books.
I also hope that a lot of people who have thought about these issues have proofread your book because it’s the kind of thing that could really increase P(doom) substantially.
I’m currently in the process of translating it into English so I can do just that. I’ll send you a link as soon as I’m finished. I’ll also invite everyone else in the AI safety community (I’m probably going to post an invite on LessWrong).
Concerning the Putin quote, I don’t think that Russia is at the forefront of development, but China certainly is. Xi has said similar things in public, and I doubt very much that we know how much they currently spend on training their AIs. The quotes are not relevant in themselves, though; I just mentioned them to make the point that there is already a lot of discussion about the enormous impact AI will have on our future. I really can’t see how discussing the risks should be damaging, while discussing the great potential of AGI for humanity should not.
I strongly disagree with “Avoid publicizing AGI risk among the general public” (disclaimer: I’m a science fiction novelist about to publish a novel about AGI risk, so I may be heavily biased). Putin said in 2017 that “the nation that leads in AI will be the ruler of the world”. If anyone who could play any role at all in developing AGI (or uncontrollable AI as I prefer to call it) isn’t trying to develop it by now, I doubt very much that any amount of public communication will change that.
On the other hand, I believe our best chance of preventing or at least slowing down the development of uncontrollable AI is a common, clear understanding of the dangers, especially among those who are at the forefront of development. To achieve that, a large amount of communication will be necessary, both within development and scientific communities and in the public.
I see various reasons for that. One is the availability heuristic: People don’t believe there is an AI x-risk because they’ve never seen it happen outside of science fiction movies, and nobody but a few weird people in the AI safety community is talking seriously about it (very similar to climate change a few decades ago). Another reason is social acceptance: As long as everyone thinks AI is great and the nation with the most AI capabilities wins, if you’re working on AI capabilities, you’re a hero. On the other hand, if most people think that strong AI poses a significant risk to their future and that of their kids, this might change how AI capabilities researchers are seen, and how they see themselves. I’m not suggesting disparaging people working at AI labs, but I think working in AI safety should be seen as “cool”, while blindly throwing more and more data and compute at a problem and seeing what happens should be regarded as “uncool”.
Thanks for that!
Uncontrollable AI as an Existential Risk
Let’s talk about uncontrollable AI
You could have an AI with some meta-cognition, able to figure out what’s good and maximize it, in the same way EAs try to figure out what’s good and maximize it with parts of their lives.
I’m not sure how that would work, but we don’t need to discuss it further, I’m no expert.
I don’t think it’s a good method, and I think you should target a much more specific public, but yes, I know what you mean.
What exactly do you think is “not good” about a public discussion of AI risks?
The superintelligence is misaligned with our own objectives but is benign
I don’t see how this is possible. There is nothing like “a little misalignment”. Keep in mind that creating an unstoppable and uncontrollable AI is a one-shot event that can’t be undone and will have extremely wide and long-term effects on everything. If this AI is misaligned even very slightly, the differences between its goals and humanity’s will accumulate and grow over time. It’s similar to launching a rocket without any steering mechanism with the aim of landing it on the Jupiter moon Europa: You have to set every parameter exactly right or the rocket will miss the target by far. Even the slightest deviation, e.g. an unaccounted-for asteroid passing close to the rocket and altering its course very slightly through gravitational effects, will completely ruin the mission.
On the other hand, if we manage to build an AGI that is “docile” and “corrigible” (which I doubt very much we can do), this would be similar to having a rocket that can be steered from afar: In this case, I would say it is fully aligned, even if corrections are necessary once in a while.
Should we end up with both—a misaligned and an aligned AGI, or more of them—it is very likely that the worst AGI (from humanity’s perspective) will win the battle for world supremacy, so this is more or less the same as just having one misaligned AGI.
My personal view on your subject is that you don’t have to work in AI to shape its future. You can also do that by bringing the discussion into the public and creating awareness of the dangers. This is especially relevant, and may even be more effective than a career in an AI lab, if our only chance for survival is to prevent a misaligned AI, at least until we have solved alignment (see my post on “red lines”).
Thanks! I should point out that this isn’t my work alone, we were a team of six developing the story. Unfortunately, the co-authors Artem, Daniel, Ishan, Peter, and Sofia are not visible in this cross-post from LessWrong.