“Intro to brain-like-AGI safety” series—just finished!

(Previously on EAF: “Intro to brain-like-AGI safety” series—halfway point!)

For those who aren’t regular readers of the Alignment Forum or LessWrong: I’ve been writing a 15-part post series, “Intro to Brain-Like-AGI Safety”, and the final post is now up! 🥳🎉🎊

Some key claims that I argue for in the series

(copied from the final post)

  1. We know enough neuroscience to say concrete things about what “brain-like AGI” would look like (Posts #1–#9);

  2. In particular, while “brain-like AGI” would be different from any known algorithm, its safety-relevant aspects would have much in common with actor-critic model-based reinforcement learning with a multi-dimensional value function (Posts #6, #8, #9; see the illustrative sketch after this list);

  3. “Understanding the brain well enough to make brain-like AGI” is a dramatically easier task than “understanding the brain” full stop—if the former is loosely analogous to knowing how to train a ConvNet, then the latter would be loosely analogous to knowing how to train a ConvNet, and achieving full mechanistic interpretability of the resulting trained model, and understanding every aspect of integrated circuit physics and engineering, etc. Indeed, making brain-like AGI should not be thought of as a far-off sci-fi hypothetical, but rather as an ongoing project which may well reach completion within the next decade or two (Posts #2–#3);

  4. In the absence of a good technical plan for avoiding accidents, researchers experimenting with brain-like AGI algorithms will probably accidentally create out-of-control AGIs, with catastrophic consequences up to and including human extinction (Posts #1, #3, #10, #11);

  5. Right now, we don’t have any good technical plan for avoiding out-of-control AGI accidents (Posts #10–#14);

  6. Creating such a plan seems neither to be straightforward, nor to be a necessary step on the path to creating powerful brain-like AGIs—and therefore we shouldn’t assume that such a plan will be created in the future “by default” (Post #3);

  7. There’s a lot of work that we can do right now to help make progress towards such a plan (Posts #12–#15).
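To make claim 2 a bit more concrete, here is a minimal, hypothetical sketch (my own illustration, not code from the series) of an actor-critic learner whose critic outputs a vector of values, one per “drive”, rather than a single scalar. It deliberately leaves out the model-based (world-model) part, and all of the names, dimensions, learning rates, and the toy environment are illustrative assumptions, not anything taken from the posts.

```python
# Illustrative sketch only: actor-critic RL with a multi-dimensional (per-drive)
# value function. Everything here (state size, number of drives, the toy
# environment, the drive weights) is a made-up assumption for illustration.

import numpy as np

rng = np.random.default_rng(0)

N_STATE = 4      # toy state features
N_ACTIONS = 3    # toy discrete actions
N_DRIVES = 2     # one value dimension per "drive" (e.g. hunger, pain) -- hypothetical

# Critic: linear map from state features to a VECTOR of per-drive values.
W_critic = np.zeros((N_DRIVES, N_STATE))
# Actor: linear map from state features to action preferences (logits).
W_actor = np.zeros((N_ACTIONS, N_STATE))

GAMMA = 0.9
LR_CRITIC = 0.1
LR_ACTOR = 0.05
drive_weights = np.array([1.0, 2.0])   # how much each drive matters right now

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def step(state, action):
    """Hypothetical toy environment: returns a next state and a reward VECTOR
    with one component per drive (pure noise; only the update shapes matter)."""
    next_state = rng.normal(size=N_STATE)
    reward_vec = rng.normal(size=N_DRIVES) + 0.1 * action
    return next_state, reward_vec

state = rng.normal(size=N_STATE)
for t in range(1000):
    # Actor picks an action from a softmax over its preferences.
    probs = softmax(W_actor @ state)
    action = rng.choice(N_ACTIONS, p=probs)

    next_state, reward_vec = step(state, action)

    # Critic update: one TD error per drive (a vector, not a scalar).
    td_vec = reward_vec + GAMMA * (W_critic @ next_state) - (W_critic @ state)
    W_critic += LR_CRITIC * np.outer(td_vec, state)

    # Actor update: driven by a single scalar combining the per-drive TD errors.
    scalar_td = drive_weights @ td_vec
    grad_logp = -probs[:, None] * state[None, :]   # d log pi(a|s) / d W_actor
    grad_logp[action] += state
    W_actor += LR_ACTOR * scalar_td * grad_logp

    state = next_state
```

The only point of the sketch is the shape of the updates: the critic learns a per-drive (vector-valued) TD error, while the actor is steered by a scalar weighting of those components. For what that structure does and doesn’t buy you safety-wise, see Posts #6, #8, and #9.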

General notes

  • The series has a total length comparable to a 300-page book. But I tried to make it easy to skim and skip around. In particular, every post starts with a summary and table of contents.

  • The last post lists seven open problems / projects that I think would help with brain-like-AGI safety. I’d be delighted to discuss these more and flesh them out with potential researchers, potential funders, people who think they’re stupid or counterproductive, etc.

  • General discussion is welcome here, or at the last post, or you can reach out to me by email. :-)
