Hi I’m Steve Byrnes, an AGI safety researcher in Boston, MA, USA, with a particular focus on brain algorithms—see https://sjbyrnes.com/agi.html
I really liked this!!!
Since you asked for feedback, here’s a little suggestion, take it or leave it: I found a couple things at the end slightly out-of-place, in particular “If you choose to tackle the problem of nuclear security, what angle can you attack the problem from that will give you the most fulfillment?” and “Do any problems present even bigger risks than nuclear war?”
Immediately after such an experience, I think the narrator would not be thinking about the option of not bothering to work on nuclear security because other causes are more important, nor thinking about their own fulfillment. If other causes came to mind, I imagine it would be along the lines of “if I somehow manage to stop the nuclear war, what other potential catastrophes are waiting in the wings, ready to strike anytime in the months and years after that—and this time with no reset button?”
Or if you want it to fit better as written now, then shortly after the narrator snaps back to age 18 the text could say something along the lines of “You know about chaos theory and the butterfly effect; this will be a new re-roll of history, and there might not be a nuclear war this time around. Maybe last time was a fluke?” Then that might remove some of the single-minded urgency that I would otherwise expect the narrator to feel, and thus it would become a bit more plausible that the narrator might work on pandemics or whatever.
(Maybe that “new re-roll of history” idea is what you had in mind? Whereas I was imagining the Groundhog Day / Edge of Tomorrow / Terminator trope where the narrator knows 100% for sure that there will be a nuclear war on this specific hour of this specific day, if the narrator doesn’t heroically stop it.)
(I’m not a writer, don’t trust my judgment.)
Hmm, yeah, I guess you’re right about that.
Oh, you said “evolution-type optimization”, so I figured you were thinking of the case where the inner/outer distinction is clear cut. If you don’t think the inner/outer distinction will be clear cut, then I’d question whether you actually disagree with the post :) See the section defining what I’m arguing against, in particular the “inner as AGI” discussion.
Nah, I’m pretty sure the difference there is “Steve thinks that Jacob is way overestimating the difficulty of humans building AGI-capable learning algorithms by writing source code”, rather than “Steve thinks that Jacob is way underestimating the difficulty of computationally recapitulating the process of human brain evolution”.
For example, for the situation that you’re talking about (I called it “Case 2” in my post) I wrote “It seems highly implausible that the programmers would just sit around for months and years and decades on end, waiting patiently for the outer algorithm to edit the inner algorithm, one excruciatingly-slow step at a time. I think the programmers would inspect the results of each episode, generate hypotheses for how to improve the algorithm, run small tests, etc.” If the programmers did just sit around for years not looking at the intermediate training results, yes I expect the project would still succeed sooner or later. I just very strongly expect that they wouldn’t sit around doing nothing.
AlphaGo has a human-created optimizer, namely MCTS. Normally people don’t use the term “mesa-optimizer” for human-created optimizers.
Then maybe you’ll say “OK there’s a human-created search-based consequentialist planner, but the inner loop of that planner is a trained ResNet, and how do you know that there isn’t also a search-based consequentialist planner inside each single run through the ResNet?”
Admittedly, I can’t prove that there isn’t. I suspect that there isn’t, because there seems to be no incentive for that (there’s already a search-based consequentialist planner!), and also because I don’t think ResNets are up to such a complicated task.
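(To make the distinction concrete, here’s a minimal sketch, definitely not AlphaGo’s actual algorithm, of what I mean: the consequentialist search loop is ordinary human-written source code, and the trained network only shows up as a subroutine that scores positions. The `Game` interface and `value_net` are hypothetical stand-ins.)

```python
import random

def rollout_search(game, state, value_net, n_simulations=20, depth=5):
    """Pick the move whose simulated continuations score best under value_net."""
    best_move, best_score = None, float("-inf")
    for move in game.legal_moves(state):
        total = 0.0
        for _ in range(n_simulations):
            s = game.apply(state, move)
            # Roll the position forward a few plies with random play
            # (this whole loop is the hand-written "planner" part)...
            for _ in range(depth):
                if game.is_terminal(s):
                    break
                s = game.apply(s, random.choice(game.legal_moves(s)))
            # ...then ask the learned network how good the resulting position looks.
            total += value_net(s)
        score = total / n_simulations
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```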
I find most justifications and arguments made in favor of a timeline of less than 50 years to be rather unconvincing.
If we don’t have convincing evidence in favor of a timeline <50 years, and we also don’t have convincing evidence in favor of a timeline ≥50 years, then we just have to say that this is a question on which we don’t have convincing evidence of anything in particular. But we still have to take whatever evidence we have and make the best decisions we can. ¯\_(ツ)_/¯
(You don’t say this explicitly but your wording kinda implies that ≥50 years is the default, and we need convincing evidence to change our mind away from that default. If so, I would ask why we should take ≥50 years to be the default. Or sorry if I’m putting words in your mouth.)
I am simply not able to understand why we are significantly closer to AGI today than we were in 1950s
Lots of ingredients go into AGI, including (1) algorithms, (2) lots of inexpensive chips that can do lots of calculations per second, (3) technology for fast communication between these chips, (4) infrastructure for managing large jobs on compute clusters, (5) frameworks and expertise in parallelizing algorithms, (6) general willingness to spend millions of dollars and roll custom ASICs to run a learning algorithm, (7) coding and debugging tools and optimizing compilers, etc. Even if you believe that you’ve made no progress whatsoever on algorithms since the 1950s, we’ve made massive progress in the other categories. I think that alone puts us “significantly closer to AGI today than we were in the 1950s”: once we get the algorithms, at least everything else will be ready to go, and that wasn’t true in the 1950s, right?
But I would also strongly disagree with the idea that we’ve made no progress whatsoever on algorithms since the 1950s. Even if you think that GPT-3 and AlphaGo have absolutely nothing whatsoever to do with AGI algorithms (which strikes me as an implausibly strong statement, although I would endorse much weaker versions of that statement), that’s far from the only strand of research in AI, let alone neuroscience. For example, there’s a (IMO plausible) argument that PGMs and causal diagrams will be more important to AGI than deep neural networks are. But that would still imply that we’ve learned AGI-relevant things about algorithms since the 1950s. Or as another example, there’s a (IMO misleading) argument that the brain is horrifically complicated and we still have centuries of work ahead of us in understanding how it works. But even people who strongly endorse that claim wouldn’t also say that we’ve made “no progress whatsoever” in understanding brain algorithms since the 1950s.
Sorry if I’m misunderstanding.
isn’t there an infinite degree of freedom associated with a continuous function?
I’m a bit confused by this; are you saying that the only possible AGI algorithm is “the exact algorithm that the human brain runs”? The brain is wired up by a finite number of genes, right?
most contemporary progress on AI happens by running base-optimizers which could support mesa-optimization
GPT-3 is of that form, but AlphaGo/MuZero isn’t (I would argue).
I’m not sure how to settle whether your statement about “most contemporary progress” is right or wrong. I guess we could count how many papers use model-free RL vs model-based RL, or something? Well anyway, given that I haven’t done anything like that, I wouldn’t feel comfortable making any confident statement here. Of course you may know more than me! :-)
If we forget about “contemporary progress” and focus on “path to AGI”, I have a post, Against evolution as an analogy for how humans will create AGI, arguing against what (I think) you’re implying, for what it’s worth.
Ideally we’d want a method for identifying valence which is more mechanistic than mine. In the sense that it lets you identify valence in a system just by looking inside the system without looking at how it was made.
Yeah I dunno, I have some general thoughts about what valence looks like in the vertebrate brain (e.g. this is related, and this) but I’m still fuzzy in places and am not ready to offer any nice buttoned-up theory. “Valence in arbitrary algorithms” is obviously even harder by far. :-)
Have you read https://www.cold-takes.com/where-ai-forecasting-stands-today/ ?
I do agree that there are many good reasons to think that AI practitioners are not AI forecasting experts, such as the fact that they’re, um, obviously not—they generally have no training in it and have spent almost no time on it, and indeed they give very different answers to seemingly-equivalent timelines questions phrased differently. This is a reason to discount the timelines that come from AI practitioner surveys, in favor of whatever other forecasting methods / heuristics you can come up with. It’s not per se a reason to think “definitely no AGI in the next 50 years”.
Well, maybe I should just ask: What probability would you assign to the statement “50 years from today, we will have AGI”? A couple examples:
If you think the probability is <90%, and your intention here is to argue against people who think it should be >90%, well I would join you in arguing against those people too. This kind of technological forecasting is very hard and we should all be pretty humble & uncertain here. (Incidentally, if this is who you’re arguing against, I bet that you’re arguing against fewer people than you imagine.)
If you think the probability is <10%, and your intention here is to argue against people who think it should be >10%, then that’s quite a different matter, and I would strongly disagree with you, and I would be very curious how you came to be so confident. I mean, a lot can happen in 50 years, right? What’s the argument?
Let’s say a human writes code more-or-less equivalent to the evolved “code” in the human genome. Presumably the resulting human-brain-like algorithm would have valence, right? But it’s not a mesa-optimizer, it’s just an optimizer. Unless you want to say that the human programmers are the base optimizer? But if you say that, well, every optimization algorithm known to humanity would become a “mesa-optimizer”, since they tend to be implemented by human programmers, right? So that would entail the term “mesa-optimizer” kinda losing all meaning, I think. Sorry if I’m misunderstanding.
Addendum: In the other direction, one could point out that the authors were searching for “an approximation of an approximation of a neuron”, not “an approximation of a neuron”. (insight stolen from here.) Their ground truth was a fancier neuron model, not a real neuron. Even the fancier model is a simplification of real life. For example, if I recall correctly, neurons have been observed to do funny things like store state variables via changes in gene expression. Even the fancier model wouldn’t capture that. As in my parent comment, I think these kinds of things are highly relevant to simulating worms, and not terribly relevant to reverse-engineering the algorithms underlying human intelligence.
It’s possible much of that supposed additional complexity isn’t useful
Yup! That’s where I’d put my money.
It’s a foregone conclusion that a real-world system has tons of complexity that is not related to the useful functions that the system performs. Consider, for example, the silicon transistors that comprise digital chips—“the useful function that they perform” is a little story involving words like “ON” and “OFF”, but “the real-world transistor” needs three equations involving 22 parameters, to a first approximation!
By the same token, my favorite paper on the algorithmic role of dendritic computation has them basically implementing a simple set of ANDs and ORs on incoming signals. It’s quite likely that dendrites do other things too besides what’s in that one paper, but I think that example is suggestive.
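(If it helps, here’s a toy gloss of that kind of dendritic computation, my own illustration rather than the paper’s actual model: each dendritic branch acts as a logic gate over its own synaptic inputs, and the soma combines the branch outputs. All names here are hypothetical.)

```python
def branch_and(inputs):
    # A branch that only "fires" if all of its synaptic inputs are active
    return int(all(inputs))

def branch_or(inputs):
    # A branch that "fires" if any of its synaptic inputs is active
    return int(any(inputs))

def neuron_output(branch_a_inputs, branch_b_inputs):
    # Soma ORs together two branches, each of which ANDs its own inputs
    return branch_or([branch_and(branch_a_inputs), branch_and(branch_b_inputs)])

print(neuron_output([1, 1], [0, 1]))  # 1: branch A's inputs are all active
print(neuron_output([1, 0], [0, 1]))  # 0: neither branch has all its inputs active
```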
Caveat: I’m mainly thinking of the complexity of understanding the neuronal algorithms involved in “human intelligence” (e.g. common sense, science, language, etc.), which (I claim) are mainly in the cortex and thalamus. I think those algorithms need to be built out of really specific and legible operations, and such operations are unlikely to line up with the full complexity of the input-output behavior of neurons. I think the claim “the useful function that a neuron performs is simpler than the neuron itself” is always true, but it’s very strongly true for “human intelligence” related algorithms, whereas it’s less true in other contexts, including probably some brainstem circuits, and the neurons in microscopic worms. It seems to me that microscopic worms just don’t have enough neurons to not squeeze out useful functionality from every squiggle in their neurons’ input-output relations. And moreover here we’re not talking about massive intricate beautifully-orchestrated learning algorithms, but rather things like “do this behavior a bit less often when the temperature is low” etc. See my post Building brain-inspired AGI is infinitely easier than understanding the brain for more discussion kinda related to this.
See here, the first post is a video of a research meeting where he talks dismissively about Stuart Russell’s argument, and then the ensuing forum discussion features a lot of posts by me trying to sell everyone on AI risk :-P
(Other context here.)
There was a 2020 documentary We Need To Talk About AI. All-star lineup of interviewees! Stuart Russell, Roman Yampolskiy, Max Tegmark, Sam Harris, Jurgen Schmidhuber, …. I’ve seen it, but it appears to be pretty obscure, AFAICT.
I happened to watch the 2020 Melissa McCarthy film Superintelligence yesterday. It’s umm, not what you’re looking for. The superintelligent AI’s story arc was a mix of 20% arguably-plausible things that experts say about superintelligent AGI, and 80% deliberately absurd things for comedy. I doubt it made anyone in the audience think very hard about anything in particular. (I did like it as a romantic comedy :-P )
There’s some potential tension between “things that make for a good movie” and “realistic”, I think.
I saw Jeff Hawkins mention (in some online video) that someone had sent Human Compatible to him unsolicited but he didn’t say who. And then (separately) a bit later the mystery was resolved: I saw some EA-affiliated person or institution mention that they had sent Human Compatible to a bunch of AI researchers. But I can’t remember where I saw that, or who it was. :-(
No I don’t think we’ve met! In 2016 I was a professional physicist living in Boston. I’m not sure if I would have even known what “EA” stood for in 2016. :-)
It also seems like the technical problem does get easier in expectation if you have more than one shot. By contrast, I claim, many of the Moloch-style problems get harder.
I agree. But maybe I would have said “less hard” rather than “easier” to better convey a certain mood :-P
It does seem like within technical AI safety research the best work seems to shift away from Agent Foundations type of work and towards neural-nets-specific work.
I’m not sure what your model is here.
Maybe a useful framing is “alignment tax”: if it’s possible to make an AI that can do some task X unsafely with a certain amount of time/money/testing/research/compute/whatever, then how much extra time/money/etc. would it take to make an AI that can do task X safely? That’s the alignment tax.
The goal is for the alignment tax to be as close as possible to 0%. (It’s never going to be exactly 0%.)
In the fast-takeoff unipolar case, we want a low alignment tax because some organizations will be paying the alignment tax and others won’t, and we want one of the former to win the race, not one of the latter.
In the slow-takeoff multipolar case, we want a low alignment tax because we’re asking organizations to make tradeoffs for safety, and if that’s a very big ask, we’re less likely to succeed. If the alignment tax is 1%, we might actually succeed. Remember that there are many reasons organizations are incentivized to make safe AIs, not least because they want the AIs to stay under their control and do the things they want them to do, not to mention legal risks, reputation risks, employees who care about their children, etc. etc. So if all we’re asking is for them to spend 1% more training time, maybe they all will. If instead we’re asking them all to spend 100× more compute plus an extra 3 years of pre-deployment test protocols, well, that’s much less promising.
So either way, we want a low alignment tax.
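(In case the arithmetic helps, here’s the toy calculation I have in mind, with made-up numbers matching the examples above:)

```python
def alignment_tax(cost_unsafe, cost_safe):
    # Extra fractional cost of doing the same task safely (0.0 = no tax)
    return cost_safe / cost_unsafe - 1.0

print(alignment_tax(100, 101))    # 0.01 -> a 1% tax; plausibly everyone pays it
print(alignment_tax(100, 10000))  # 99.0 -> a 9900% tax; much less promising
```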
OK, now let’s get back to what you wrote.
I think maybe your model is:
“If Agent Foundations research pans out at all, it would pan out by discovering a high-alignment-tax method of making AGI”
(You can correct me if I’m misunderstanding.)
If we accept that premise, then I can see where you’re coming from. This would be almost definitely useless in a multipolar slow-takeoff world, and merely “probably useless” in a unipolar fast-takeoff world. (In the latter case, there’s at least a prayer of a chance that the safe actors will be so far ahead of the unsafe actors that the former can pay the tax and win the race anyway.)
But I’m not sure that I believe the premise. Or at least I’m pretty unsure. I am not myself an Agent Foundations researcher, but I don’t imagine that Agent Foundations researchers would agree with the premise that high-alignment-tax AGI is the best that they’re hoping for in their research.
Oh, hmmm, the other possibility is that you’re mentally lumping together “multipolar slow-takeoff AGI” with “prosaic AGI” and with “short timelines”. These are indeed often lumped together, even if they’re different things. Anyway, I would certainly agree that both “prosaic AGI” and “short timelines” would make Agent Foundations research less promising compared to neural-net-specific work.
I think that “AI alignment research right now” is a top priority in unipolar fast-takeoff worlds, and it’s also a top priority in multipolar slow-takeoff worlds. (It’s certainly not the only thing to do—e.g. there’s multipolar-specific work to do, like the links in Jonas’s answer on this page, or here etc.)
(COI note: I myself am doing “AI alignment research right now” :-P )
First of all, in the big picture, right now humanity is simultaneously pursuing many quite different research programs towards AGI (I listed a dozen or so here (see Appendix)). If more than one of them is viable (and I think that’s likely), then in a perfect world we would figure out which of them has the best hope of leading to Safe And Beneficial AGI, and differentially accelerate that one (and/or differentially decelerate the others). This isn’t happening today—that’s not how most researchers are deciding what AI capabilities research to do, and it’s not how most funding sources are deciding what AI capabilities research to fund. Could it happen in the future? Yes, I think so! But only if...
AI alignment researchers figure out which of these AGI-relevant research programs is more or less promising for safety,
…and broadly communicate that information to experts, using legible arguments…
…and do it way in advance of any of those research programs getting anywhere close to AGI.
The last one is especially important. If some AI research program has already gotten to the point of super-powerful proto-AGI source code published on GitHub, there’s no way you’re going to stop people from using and improving it. Whereas if the research program is still very early-stage and theoretical, and needs many decades of intense work and dozens more revolutionary insights to really start getting powerful, then we have a shot at this kind of differential technological development strategy being viable.
(By the same token, maybe it will turn out that there’s no way to develop safe AGI, and we want to globally ban AGI development. I think if a ban were possible at all, it would only be possible if we got started when we’re still very far from being able to build AGI.)
So for example, if it’s possible to build a “prosaic” AGI using deep neural networks, nobody knows whether it would be possible to control and use it safely. There are some kinda-illegible intuitive arguments on both sides. Nobody really knows. People are working on clarifying this question, and I think they’re making some progress, and I’m saying that it would be really good if they could figure it out one way or the other ASAP.
Second of all, slow takeoff doesn’t necessarily mean that we can just wait and solve the alignment problem later. Sometimes you can have software right in front of you, and it’s not doing what you want it to do, but you still don’t know how to fix it. The alignment problem could be like that.
One way to think about it is: How slow is slow takeoff, versus how long does it take to solve the alignment problem? We don’t know.
Also, how much longer would it take, once somebody develops best practices to solve the alignment problem, for all relevant actors to reach a consensus that following those best practices is a good idea and in their self-interest? That step could add on years, or even decades—as they say, “science progresses one funeral at a time”, and standards committees work at a glacial pace, to say nothing of government regulation, to say nothing of global treaties.
Anyway, if “slow takeoff” is 100 years, OK fine, that’s slow enough. If “slow takeoff” is ten years, maybe that’s slow enough if the alignment problem happens to have a straightforward, costless, highly-legible and intuitive, scalable solution that somebody immediately discovers. Much more likely, I think we would need to be thinking about the alignment problem in advance.
For more detailed discussion, I have my own slow-takeoff AGI doom scenario here. :-P
(not an expert) My impression is that a perfectly secure OS doesn’t buy you much if you use insecure applications on an insecure network etc.
Also, if you think about classified work, the productivity tradeoff is massive: you can’t use your personal computer while working on the project, you can’t use any of your favorite software while working on the project, you can’t use an internet-connected computer while working on the project, you can’t have your cell phone in your pocket while talking about the project, you can’t talk to people about the project over normal phone lines and emails… And then of course viruses get into air-gapped classified networks within hours anyway. :-P
Not that we can’t or shouldn’t buy better security, I’m just slightly skeptical of specifically focusing on building a new low-level foundation rather than doing all the normal stuff really well, like network traffic monitoring, vetting applications and workflows, anti-spearphishing training, etc. etc. Well, I guess you’ll say, “we should do both”. Sure. I guess I just assume that the other things would rapidly become the weakest link.
In terms of low-level security, my old company has a big line of business designing the chips themselves to be more secure; they spun out Dover Microsystems to sell that particular technology to commercial (as opposed to military) customers. Just FYI, that’s one thing I happen to be familiar with. Actually I guess it’s not that relevant.
Hmm, I guess I wasn’t being very careful. Insofar as “helping future humans” is a different thing than “helping living humans”, it means that we could be in a situation where the interventions that are optimal for the former are very-sub-optimal (or even negative-value) for the latter. But it doesn’t mean we must be in that situation, and in fact I think we’re not.
I guess if you think: (1) finding good longtermist interventions is generally hard because predicting the far-future is hard, but (2) “preventing extinction (or AI s-risks) in the next 50 years” is an exception to that rule; (3) that category happens to be very beneficial for people alive today too; (4) it’s not like we’ve exhausted every intervention in that category and we’re scraping the bottom of the barrel for other things … If you believe all those things, then in that case, it’s not really surprising if we’re in a situation where the tradeoffs are weak-to-nonexistent. Maybe I’m oversimplifying, but something like that I guess?
I suspect that if someone had an idea for an intervention that they thought was super great and cost-effective for future generations and awful for people alive today, well, they would probably post that idea on EA Forum just like anything else, and then people would have a lively debate about it. I mean, maybe there are such things… just nothing springs to my mind.
I feel like that guy’s got a LOT of chutzpah to not-quite-say-outright-but-very-strongly-suggest that the Effective Altruism movement is a group of people who don’t care about the Global South. :-P
More seriously, I think we’re in a funny situation where maybe there are these tradeoffs in the abstract, but they don’t seem to come up in practice.
Like in the abstract, the very best longtermist intervention could be terrible for people today. But in practice, I would argue that most if not all current longtermist cause areas (pandemic prevention, AI risk, preventing nuclear war, etc.) are plausibly a very good use of philanthropic effort even if you only care about people alive today (including children).
Or, in the abstract, AI risk and malaria are competing for philanthropic funds. But in practice, a lot of the same people seem to care about both, including many of the people that the article (selectively) quotes. …And meanwhile most people in the world care about neither.
I mean, there could still be an interesting article about how there are these theoretical tradeoffs between present and future generations. But it’s misleading to name names and suggest that those people would gleefully make those tradeoffs, even if it involves torturing people alive today or whatever. Unless, of course, there’s actual evidence that they would do that. (The other strong possibility is, if actually faced with those tradeoffs in real life, they would say, “Uh, well, I guess that’s my stop, this is where I jump off the longtermist train!!”).
Anyway, I found the article extremely misleading and annoying. For example, the author led off with a quote where Jaan Tallinn says directly that climate change might be an existential risk (via a runaway scenario), and then two paragraphs later the author is asking “why does Tallinn think that climate change isn’t an existential risk?” Huh?? The article could have equally well said that Jaan Tallinn believes that climate change is “very plausibly an existential risk”, and Jaan Tallinn is the co-founder of an organization that does climate change outreach among other things, and while climate change isn’t a principal focus of current longtermist philanthropy, well, it’s not like climate change is a principal focus of current cancer research philanthropy either! And anyway it does come up to a reasonable extent, with healthy discussions focusing in particular on whether there are especially tractable and neglected things to do.
So anyway, I found the article very misleading.
(I agree with Rohin that if people are being intimidated, silenced, or cancelled, then that would be a very bad thing.)
Just one guy, but I have no idea how I would have gotten into AGI safety if not for LW … I had a full-time job and young kids and not-obviously-related credentials. But I could just come out of nowhere in 2019 and start writing LW blog posts and comments, and I got lots of great feedback, and everyone was really nice. I’m full-time now, here’s my writings, I guess you can decide whether they’re any good :-P