I endorse Tsvi’s comment above. I’ll add that it’s hard to say how close we are to closing basic gaps in our understanding of things like “good reasoning”, because mathematical insight is notoriously difficult to predict. All I can say is that logical induction does seem like progress to me, and we’re pursuing several different approaches to the remaining problems. Also, yeah, one of those avenues is a follow-up to PPRHOL. (One experiment we’re running now is an attempt to implement, in HOL, a cellular automaton containing a reflective reasoner with access to the source code of the world, where the reasoner uses HOL to reason about the world and itself. The idea is to see whether we can get the whole stack to work simultaneously, and to smoke out all the implementation difficulties that arise in practice when you try to use a language like HOL for reasoning about HOL.)
Good question. The main effect is that I’ve increased my confidence that MIRI’s vague mathematical intuitions are good, and that the MIRI methodology for approaching big vague problems actually works. This doesn’t constitute a very large strategic shift, for a few reasons. One reason is that my strategy was already predicated on the idea that our mathematical intuitions and methodology are up to the task. As I said in last year’s AMA, visible progress on problems like logical uncertainty (and four other problems) was one of the key indicators of success that I was tracking; and as I said in February, failure to achieve results of this caliber in a 5-year timeframe would have caused me to lose confidence in our approach. (As of last year, that seemed like a real possibility.) The logical induction result increases my confidence in our current course, but it doesn’t shift that course much.
Another reason logical induction doesn’t affect my strategy too much is that it isn’t that big a result. It’s one step on a path, and it’s definitely mathematically exciting, and it gives answers to a bunch of longstanding philosophical problems, but it’s not a tool for aligning AI systems on the object level. We’re building towards a better understanding of “good reasoning”, and we expect this to be valuable for AI alignment, and logical induction is a step in that direction, but it’s only one step. It’s not terribly useful in isolation, and so it doesn’t call for much change in course.
Thanks for the write-up, Rob. OpenPhil actually decided to evaluate our technical agenda last summer, and Holden put Daniel Dewey on the job. The report isn’t done yet, in part because it has proven very time-intensive to fully communicate the reasoning behind our research priorities, even to someone with as much understanding of the AI landscape as Daniel Dewey. Separately, we have plans to get an independent evaluation of our organizational efficacy started later in 2016, which I expect to be useful for our admin team as well as prospective donors.
FYI, when it comes to evaluating our research progress, I doubt that the methods you propose would get you much Bayesian evidence. Our published output will look like round pegs shoved into square holes regardless of whether we’re doing our jobs well or poorly, because we’re doing research that doesn’t fit neatly into an existing academic niche. Our objective is to make direct progress on what appear to us to be the main neglected technical obstacles to developing reliable AI systems in the long term, with a goal of shifting the direction of AI research in a big way once we hit certain key research targets; and we’re specifically targeting research that isn’t compatible with industry’s economic incentives or academia’s publish-or-perish incentives. To get information about how well we’re doing our jobs, I think the key questions to investigate are (1) whether we’ve chosen good research targets; and (2) whether we’re making good progress towards them.
We’ve been focusing our communication efforts mainly on helping people evaluate (1): I’ve been working on explaining our approach and agenda, and OpenPhil is also on the job. To investigate (2), we’d need to spend a sizable chunk of time with mathematically adept evaluators — we still haven’t hit any of our key research targets, which means that evaluating our progress requires understanding our smaller results and why we think they’re progress towards the big results. In practice, we’ve found that explaining this usually requires explaining why we think the big targets are vital, as this informs (e.g.) which shortcuts are and are not acceptable. I plan to wait until after the OpenPhil report is finished before taking on another time-intensive eval.
Fortunately, (2) will become much easier to evaluate as we achieve (or persistently fail to achieve) those key targets. This also provides us with an opportunity to test our approach and methodology. People who understand our approach and find it uncompelling often predict that some of the results we’re shooting for cannot be achieved. This means we’ll get some evidence about (1) as we learn more about (2). For example, last year I mentioned “naturalized AIXI” as an ambitious 5-year research target. If we are not able to make concrete progress towards that goal over the next four years, I will lose confidence in our approach and eventually change our course dramatically. Conversely, if we make discoveries that are important pieces of that puzzle, I’ll update in favor of us being onto something, especially if we find puzzle pieces that knowledgeable critics predicted we wouldn’t find. This data will hopefully start rolling in soon, now that our research team is getting up to size.
(“Concrete progress” / “important puzzle pieces” in this case are satisfactory asymptotic algorithms for any of: (1) reasoning under logical uncertainty; (2) identifying the best available decision with respect to a utility function; (3) performing induction from inside an environment; (4) identifying the referents of goals in realistic world-models; and (5) reasoning about the behavior of smarter reasoners, the last of which is hopefully a subset of (1) and (2). The linked papers give rough descriptions of what counts as ‘satisfactory’ in each case; I’ll work to make the desiderata more explicit as time goes on.)
I want to push back a bit against point #1 (“Let’s divide problems into ‘funding constrained’ and ‘talent constrained’”). In my experience recruiting for MIRI, these constraints are tightly intertwined. To hire talent, you need money (and to get money, you often need results, which requires talent).
I think the “are they funding constrained or talent constrained?” model is incorrect, and potentially harmful. In the case of MIRI, imagine we’re trying to hire a world-class researcher for $50k/year, and can’t find one. Are we talent constrained, or funding constrained? (Our actual researcher salaries are higher than this, but they weren’t last year, and they still aren’t anywhere near competitive with industry rates.)
Furthermore, there are all sorts of things I could be doing to loosen the talent bottleneck, but only if I knew the money was going to be there. I could be setting up a researcher stewardship program, running seminars at Berkeley and Stanford, and hiring dedicated recruiting-focused researchers who know the technical work very well and spend a lot of time getting people excited about it—but I can only do this if I know we’re going to have the money to sustain those programs alongside our core research team, and if I know we’re going to have the money to make hires. If we reliably bring in only enough funding to sustain modest growth, I’m going to have a very hard time breaking the talent constraint.
And that’s ignoring the opportunity costs of being under-funded, which I think are substantial. For example, at MIRI there are numerous additional programs we could be setting up, such as a visiting professor + postdoc program, or a separate team that is dedicated to working closely with all the major industry leaders, or a dedicated team that’s taking a different research approach, or any number of other projects that I’d be able to start if I knew the funding would appear. All those things would lead to new and different job openings, letting us draw from a wider pool of talented people (rather than the hyper-narrow pool we currently draw from), and so this too would loosen the talent constraint—but again, only if the funding was there.
Right now, we have more trouble finding top-notch math talent excited about our approach to technical AI alignment problems than we have raising money, but don’t let this fool you—the talent constraint would be much, much easier to address with more money, and there are many things we aren’t doing (for lack of funding) that I think would be high impact.
All right, I’ll come back for one more question. Thanks, Wei. Tough question. Briefly,
(1) I can’t see that many paths to victory. The only ones I can see go through either (a) aligned de novo AGI (which needs to be at least powerful enough to safely prevent misaligned systems from undergoing intelligence explosions) or (b) very large amounts of global coordination (which would be necessary either to take our time and go cautiously, or to leap all the way to WBE without someone creating a neuromorph first). Both paths look pretty hard to walk, but in short, (a) looks slightly more promising to me. (Though I strongly support any attempts to widen path (b)!)
(2) It seems to me that the default path leads almost entirely to UFAI: insofar as MIRI research makes it easier for others to create UFAI, most of that effect isn’t replacing wins with losses, it’s just making the losses happen sooner. By contrast, this sort of work seems necessary in order to keep path (a) open. I don’t see many other options. (In other words, I think it’s net positive because it creates some wins and moves some losses sooner, and that seems like a fair trade to me.)
To make that a bit more concrete, consider logical uncertainty: if we attain a good formal understanding of logically uncertain reasoning, that’s quite likely to shorten AI timelines. But I think I’d rather have a 10-year time horizon and be dealing with practical systems built upon solid foundations that come from a decade’s worth of formally understanding what good logically uncertain reasoning looks like, than a 20-year time horizon where we have to deal with systems built using 19 years of hacks and 1 year of patches bolted on at the end.
(In other words, the possibility of improving AI capabilities is the price you have to pay to keep path (a) open.)
A bunch of other factors also play into my considerations (including a heuristic which says “the best way to figure out which problems are the real problems is to start solving the things that appear to be the problems,” and another heuristic which says “if you see a big fire, try to put it out, and don’t spend too much time worrying about whether putting it out might actually start worse fires elsewhere”, and a bunch of others), but those are the big considerations, I think.
Kinda. The current approach is more like “Pretend you’re trying to solve a much easier version of the problem, e.g. where you have a ton of computing power and you’re trying to maximize diamond instead of hard-to-describe values. What parts of the problem would you still not know how to solve? Try to figure out how to solve those first.”
(1) If we manage to (a) generate a theory of advanced agents under many simplifying assumptions, and then (b) generate a theory of bounded rational agents under far fewer simplifying assumptions, and then (c) figure out how to make highly reliable practical generally intelligent systems, all before anyone else gets remotely close to AGI, then we might consider teching up towards designing AI systems ourselves. I currently find this scenario unlikely.
(2) We’re currently far enough away from knowing what the actual architectures will look like that I don’t think it’s useful to try to build AI components intended for use in an actual AGI at this juncture.
(3) I think that making theorem provers easier to use is an important task and a worthy goal. I’m not optimistic about attempts to merge natural language with Martin-Löf type theory. If you’re interested in improving theorem-proving tools in ways that might make it easier to design safe reflective systems in the future, I’d point you more towards trying to implement (e.g.) Marcello’s Waterfall in a dependently typed language (which may well involve occasionally patching the language, at this stage).
You could call it a kind of moral relativism if you want, though it’s not a term I would use. I tend to disagree with many self-proclaimed moral relativists: for example, I think it’s quite possible for someone to be wrong about what they value, and I am not generally willing to concede that Alice thinks murder is OK just because Alice says Alice thinks murder is OK.
Another place I depart from most moral relativists I’ve met is by mixing in a healthy dose of “you don’t get to just make things up.” Analogy: we do get to make up the rules of arithmetic, but once we do, we don’t get to decide whether 7+2=9. This despite the fact that a “7” is a human concept rather than a physical object (if you grind up the universe and pass it through the finest sieve, you will find no particle of 7). Similarly, if you grind up the universe you’ll find no particle of Justice, and value-laden concepts are human concoctions, but that doesn’t necessarily mean they bend to our will.
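As a minimal illustration of the analogy (a Lean sketch of my own, assuming nothing beyond ordinary natural-number arithmetic): once the rules are laid down, the proof checker settles the question, and our preferences don’t enter into it.

```lean
-- Once the rules of natural-number arithmetic are fixed, 7 + 2 = 9 is forced:
example : 7 + 2 = 9 := rfl

-- ...and no amount of deciding otherwise makes the alternative provable:
example : 7 + 2 ≠ 10 := by decide
```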
My stance can roughly be summarized as “there are facts about what you value, but they aren’t facts about the stars or the void, they’re facts about you.” (The devil’s in the details, of course.)
First, I think that civilization had better be really dang mature before it considers handing over the reins to something like CEV. (Luke has written a bit about civilizational maturity in the past.)
Second, I think that the CEV paper (which is currently 11 years old) is fairly out of date, and I don’t necessarily endorse the particulars of it. I do hope, though, that if humanity (or posthumanity) ever builds a singleton, they build it with a goal of something like taking into account the extrapolated preferences of all sentients and fulfilling some superposition of those in a non-atrocious way. (I don’t claim to know how to fill in the gaps there.)
(1) I suspect it’s possible to create an artificial system that exhibits what many people would call “intelligent behavior,” and which poses an existential threat, but which is not sentient or conscious. (In the same way that Deep Blue wasn’t sentient: it seems to me like optimization power may well be separable from sentience/consciousness.) That’s no guarantee, of course, and if we do create a sentient artificial mind, then it will have moral weight in its own right, and that will make our job quite a bit more difficult.
(2) The goal is not to build a sentient mind that wants to destroy humanity but can’t. (That’s both morally reprehensible and doomed to failure! :-p) Rather, the goal is to successfully transmit the complicated values of humanity into a powerful optimizer.
Have you read Bostrom’s The Superintelligent Will? Short version is, it looks possible to build powerful optimizers that pursue goals we might think are valueless (such as an artificial system that, via very clever long-term plans, produces extremely large amounts of diamond, or computes lots and lots of digits of pi). We’d rather not build that sort of system (especially if it’s powerful enough to strip the Earth of resources and turn them into diamonds / computing power): most people would rather build something that shares some of our notion of “value,” such as respect for truth and beauty and wonder and so on.
It looks like this isn’t something you get for free. (In fact, it looks very hard to get: it seems likely that most minds would by default have incentives to manipulate and deceive in order to acquire resources.) We’d rather not build minds that try to turn everything they can into a giant computer for computing digits of pi, so the question is: how do you design the sort of mind that has things like respect for truth and beauty and wonder?
In Hollywood movies, you can just build something that looks cute and fluffy and then it will magically acquire a spark of human-esque curiosity and regard for other sentient life, but in the real world, you’ve got to figure out how to program in those capabilities yourself (or program something that will reliably acquire them), and that’s hard :-)
The most reliable strategy to date is “ask me” :-)
Luke talks about the pros and cons of various terms here. Then, long story short, we asked Stuart Russell for some thoughts and settled on “AI alignment” (his suggestion, IIRC).
Couldn’t it be that the returns on intelligence tend not to be very high for a self-improving agent around the human level?
Seems unlikely to me, given my experience as an agent at roughly the human level of intelligence. If you gave me a human-readable version of my source code, the ability to use money to speed up my cognition, and the ability to spawn many copies of myself (both to parallelize effort and to perform experiments with) then I think I’d be “superintelligent” pretty quickly. (In order for the self-improvement landscape to be shallow around the human level, you’d need systems to be very hardware-limited, and hardware currently doesn’t look like the bottleneck.)
(I’m also not convinced it’s meaningful to talk about “the human level” except in a very broad sense of “having that super powerful domain generality that humans seem to possess”, so I’m fairly uncomfortable with terminology such as “20x the human level.”)
Great question! I suggest checking out either our research guide or our technical agenda. The first is geared towards students who are wondering what to study in order to eventually gain the skills to be an AI alignment researcher; the second is geared more towards professionals who already have the skills and are wondering what the current open problems are.
In your case, I’d guess maybe (1) get some solid foundations via either set theory or type theory, (2) get a solid grounding in AI, perhaps via AI: A Modern Approach, (3) brush up on probability theory, formal logic, and causal graphical models, and then (4) dive into the technical agenda and figure out which open problems pique your interest.
(1) The things we have no idea how to do aren’t the implicit assumptions in the technical agenda, they’re the explicit subject headings: decision theory, logical uncertainty, Vingean reflection, corrigibility, etc :-)
We’ve tried to make it very clear in various papers that we’re dealing with very limited toy models that capture only a small part of the problem (see, e.g., basically all of section 6 in the corrigibility paper).
Right now, we basically have a bunch of big gaps in our knowledge, and we’re trying to make mathematical models that capture at least part of the actual problem—simplifying assumptions are the norm, not the exception. All I can easily say is that common simplifying assumptions include: you have lots of computing power, there is lots of time between actions, you know the action set, you’re trying to maximize a given utility function, etc. Assumptions tend to be listed in the paper where the model is described.
(2) The FLI folks aren’t doing any research; rather, they’re administering a grant program. Most FHI folks are focused more on high-level strategic questions (What might the path to AI look like? What methods might be used to mitigate x-risk? etc.) rather than object-level AI alignment research. And remember that they look at a bunch of other x-risks as well, and that they’re also thinking about policy interventions and so on. Thus, the comparison can’t easily be made. (Eric Drexler’s been doing some thinking about the object-level FAI questions recently, but I’ll let his latest tech report fill you in on the details there. Stuart Armstrong is doing AI alignment work in the same vein as ours. Owain Evans might also be doing object-level AI alignment work, but he’s new there, and I haven’t spoken to him recently enough to know.)
Insofar as FHI folks would say we’re making assumptions, I doubt they’d be pointing to assumptions like “UDT knows the policy set” or “assume we have lots of computing power” (which are obviously simplifying assumptions on toy models), but rather assumptions like “doing research on logical uncertainty now will actually improve our odds of having a working theory of logical uncertainty before it’s needed.”
(3) I think most of the FHI folks & FLI folks would agree that it’s important to have someone hacking away at the technical problems, but just to make the arguments more explicit, I think that there are a number of problems that it’s hard to even see unless you have your “try to solve FAI” goggles on. Consider: people have been working on some of these problems for decades (logical uncertainty) or even centuries (decision theory) without solving the AI-alignment-relevant parts.
We’re still very much trying to work out the initial theory of highly reliable advanced agents. This involves taking various vague philosophical problems (“what even is logical uncertainty?”) and turning them into concrete mathematical models (akin to the concrete model of probability theory attained by Kolmogorov & co).
We’re still in the preformal stage, and if we can get this theory to the formal stage, I expect we may be able to get a lot more eyes on the problem, because the ever-crawling feelers of academia seem to be much better at exploring formalized problems than they are at formalizing preformal problems.
Then of course there’s the heuristic of “it’s fine to shout ‘model uncertainty!’ and hover on the sidelines, but it wasn’t the armchair philosophers who did away with the epicycles, it was Kepler, who was up to his elbows in epicycle data.” One of the big ways that you identify the things that need working on is by trying to solve the problem yourself. By asking how to actually build an aligned superintelligence, MIRI has generated a whole host of open technical problems, and I predict that that host will be a very valuable asset now that more and more people are turning their gaze towards AI alignment.
Than a slow takeoff? Yes :-)
(1) Eventually. Predicting the future is hard. My 90% confidence interval conditioned on no global catastrophes is maybe 5 to 80 years. That is to say, I don’t know.
(2) I fairly strongly expect a fast takeoff. (Interesting aside: I was recently at a dinner full of AI scientists, some of them very skeptical about the whole long-term safety problem, who unanimously professed that they expect a fast takeoff—I’m not sure yet how to square this with the fact that Bostrom’s survey showed fast takeoff was a minority position).
It seems hard (but not impossible) to build something that’s better than humans at designing AI systems & has access to its own software and new hardware, yet does not self-improve rapidly. Scenarios where this doesn’t occur include (a) scenarios where the top AI systems are strongly hardware-limited; (b) scenarios where all operators of all AI systems successfully remove all incentives to self-improve; or (c) scenarios where the first AI system is strong enough to prevent all intelligence explosions, but is also constructed such that it does not itself self-improve. The first two scenarios seem unlikely from here; the third is more plausible (if the frontrunners explicitly try to achieve it) but still seems like a difficult target to hit.
(3) I think we’re pretty likely to eventually get a singleton: in order to get a multi-polar outcome, you need to have a lot of systems that are roughly at the same level of ability for a long time. That seems difficult but not impossible. (For example, this is much more likely to happen if the early AGI designs are open-sourced and early AGI algorithms are incredibly inefficient such that progress is very slow and all the major players progress in lockstep.)
Remember that history is full of cases where a better way of doing things ends up taking over the world—humans over the other animals, agriculture dominating hunting & gathering, the Brits, industrialization, etc. (Agriculture and arguably industrialization emerged separately in different places, but in both cases the associated memes still conquered the world.) One plausible outcome is that we get a series of almost-singletons that can’t quite wipe out other weaker entities and therefore eventually go into decline (which is also a common pattern throughout history), but I expect superintelligent systems to be much better at “finishing the job” and securing very long-term power than, say, the Romans were. Thus, I expect a singleton outcome in the long run.
The run-up to that may look pretty strange, though.
We don’t have a working definition of “what has intrinsic value.” My basic view on these hairy problems (“but what should I value?”) is that we really don’t want to be coding in the answer by hand. I’m more optimistic about building something that has a few layers of indirection, e.g., something that figures out how to act as intended, rather than trying to transmit your object-level intentions by hand.
In the paper you linked, I think Max is raising a slightly different issue. He’s talking about what we would call the ontology identification problem. Roughly, imagine building an AI system that you want to produce lots of diamond. Maybe it starts out with an atomic model of the universe, and you (looking at its model) give it a utility function that scores one point per second for every carbon atom covalently bound to four other carbon atoms (and then time-discounts or something). Later, the system develops a nuclear model of the universe. You do want it to somehow deduce that carbon atoms in the old model map onto six-proton atoms in the new model, and maybe query the user about how to value carbon isotopes in its diamond lattice. You don’t want it to conclude that none of these six-proton nuclei pattern-match to “true carbon”, and then turn the universe upside down looking for some hidden cache of “true carbon.”
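To make the failure mode concrete, here’s a toy Python sketch (the world-model representation and function names are invented for illustration, not drawn from any of our papers): a “diamond” utility function written against the atomic ontology stops recognizing carbon once the world-model describes atoms by proton count, unless the old concept gets remapped onto the new ontology.

```python
# Toy illustration of the ontology identification problem (hypothetical
# representation, invented for this example).

# Old ontology: atoms carry element labels.
atomic_world = {
    "atoms": [
        {"element": "C", "bonds": [1, 2, 3, 4]},  # a carbon bound to four carbons
        {"element": "C", "bonds": [0]},
        {"element": "C", "bonds": [0]},
        {"element": "C", "bonds": [0]},
        {"element": "C", "bonds": [0]},
    ]
}

def diamond_utility_atomic(world):
    """One point per carbon atom covalently bound to four other carbon atoms."""
    atoms = world["atoms"]
    return sum(
        1
        for a in atoms
        if a.get("element") == "C"
        and len(a["bonds"]) == 4
        and all(atoms[i].get("element") == "C" for i in a["bonds"])
    )

# New ontology: atoms are now described by proton count, not element labels.
nuclear_world = {
    "atoms": [
        {"protons": 6, "bonds": [1, 2, 3, 4]},
        {"protons": 6, "bonds": [0]},
        {"protons": 6, "bonds": [0]},
        {"protons": 6, "bonds": [0]},
        {"protons": 6, "bonds": [0]},
    ]
}

print(diamond_utility_atomic(atomic_world))   # 1: the old concept applies
print(diamond_utility_atomic(nuclear_world))  # 0: no "true carbon" found anywhere

# The desired behavior is to re-identify the old concept in the new ontology
# ("carbon" -> "six-proton nucleus"), not to conclude that carbon has vanished:
def diamond_utility_nuclear(world):
    atoms = world["atoms"]
    return sum(
        1
        for a in atoms
        if a.get("protons") == 6
        and len(a["bonds"]) == 4
        and all(atoms[i].get("protons") == 6 for i in a["bonds"])
    )

print(diamond_utility_nuclear(nuclear_world))  # 1: the concept has been remapped
```

The hard part, of course, is getting the system to find that mapping on its own (and to ask the user about edge cases like isotopes), rather than having a programmer hand-code the translation after every ontology shift.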
We have a few different papers that mention this problem, albeit shallowly: Ontological Crises in Artificial Agents’ Value Systems, The Value Learning Problem, Formalizing Two Problems of Realistic World-Models. There’s a lot more work to be done here, and it’s definitely on our radar, though also note that work on this problem is at least a little blocked on attaining a better understanding of how to build multi-level maps of the world.
I mostly agree with Daniel’s paper :-)
Great question! The short version is, writing more & publishing more (and generally engaging with the academic mainstream more) are very high on my priority list.
Mainstream publications have historically been fairly difficult for us, as until last year, AI alignment research was seen as fairly kooky. (We’ve had a number of papers rejected from various journals due to the “weird AI motivation.”) Going forward, it looks like that will be less of an issue.
That said, writing capability is a huge bottleneck right now. Our researchers are currently trying to (a) run workshops, (b) engage with & evaluate promising potential researchers, (c) attend conferences, (d) produce new research, (e) write it up, and (f) get it published. That’s a lot of things for a three-person research team to juggle! Priority number 1 is to grow the research team (because otherwise nothing will ever be unblocked), and we’re aiming to hire a few new researchers before the year is through. After that, increasing our writing output is likely the next highest priority.
Expect our writing output this year to be similar to last year’s (i.e., a small handful of peer reviewed papers and a larger handful of technical reports that might make it onto the arXiv), and then hopefully we’ll have more & higher quality publications starting in 2016 (the publishing pipeline isn’t particularly fast).
I’ll interpret this question as “what are the most plausible ways for you to lose confidence in MIRI’s effectiveness and/or leave MIRI?” Here are a few ways that could happen for me:
I could be convinced that I was wrong about the type and quality of AI alignment research that the external community is able to do. There’s some inferential distance here, so I’m not expecting to explain my model in full, but in brief, I currently expect that there are a few types of important research that academia and industry won’t do by default. If I was convinced that either (a) there are no such gaps or (b) they will be filled by academia and industry as a matter of course, then I would downgrade my assessment of the importance of MIRI accordingly.
I could learn that our research path was doomed, for one reason or another, and simultaneously learn that repurposing our skill/experience/etc. for other purposes was not worth the opportunity cost of all our time and effort.