I am Nate Soares, AMA!
Hello Effective Altruism Forum, I am Nate Soares, and I will be here to answer your questions tomorrow, Thursday the 11th of June, 15:00-18:00 US Pacific time. You can post questions here in the interim.
Last Monday, I took the reins as executive director of the Machine Intelligence Research Institute. MIRI focuses on studying technical problems of long-term AI safety. I’m happy to chat about what that means, why it’s important, why we think we can make a difference now, what the open technical problems are, how we approach them, and some of my plans for the future.
I’m also happy to answer questions about my personal history and how I got here, or about personal growth and mindhacking (a subject I touch upon frequently in my blog, Minding Our Way), or about whatever else piques your curiosity. This is an AMA, after all!
EDIT (15:00): All right, I’m here. Dang there are a lot of questions! Let’s get this started :-)
EDIT (18:00): Ok, that’s a wrap. Thanks, everyone! Those were great questions.
What are some of the most neglected sub-tasks of reducing existential risk? That is, what is no one working on which someone really, really should be?
Policy work / international coordination. Figuring out how to build an aligned AI is only part of the problem. You also need to ensure that an aligned AI is built, and that’s a lot harder to do during an international arms race. (A race to the finish would be pretty bad, I think.)
I’d like to see a lot more people figuring out how to ensure global stability & coordination as we enter a time period that may be fairly dangerous.
Nailed it.
Does anyone have any suggestions for how to make progress in this area?
Hi from Melbourne, Oz, Nate! Law and ethics research interests me: e.g., AI in drone aircraft and self-driving cars carries huge potential for harm. What formal inputs from MIRI to governments around the world exist to ensure that the relevant expertise sets the ethics-and-risk framework? Ryan Calo proposes a Federal Robotics Commission: http://www.brookings.edu/research/reports2/2014/09/case-for-federal-robotics-commission Glenn Floyd www.reachers.org
What is the top thing you think you’ll do differently now that you’re Executive Director?
What do you think is the biggest mistake MIRI has made in its past? How have you learned from it?
What do you think has been the biggest success MIRI has had? How have you learned from that?
(1) Things Executive!Nate will do differently from Researcher!Nate? Or things Nate!MIRI will do differently from Luke!MIRI? For the former, I’ll be thinking lots more about global coordination & engaging with interested academics etc, and lots less about specific math problems. For the latter, the biggest shift is probably going to be something like “more engagement with the academic mainstream,” although it’s a bit hard to say: Luke probably would have pushed in that direction too, after growing the research team a bit. (I have a lot of opportunities available to me that weren’t available to Luke at this time last year.)
(2) The old SIAI definitely made some obvious mistakes; see e.g. Holden Karnofsky’s 2012 critique. Luke tried to transfer a number of the lessons learned to me, but it remains to be seen whether I actually learned them :-) The concrete list includes things like (a) constantly drive to systematize, automate, and outsource the busywork; (b) always attack the biggest constraint (by contrast, most people seem to have a default mode of “try and do everything that meets a certain importance level”); (c) put less emphasis on explicit models that you’ve built yourself and more emphasis on advice from others who have succeeded in doing something similar to what you’re trying to do.
(3) MIRI played a pretty big role in getting long-term AI alignment issues onto the world stage. There are lots and lots of things I’ve learned from that particular success. Perhaps the biggest is “don’t disregard intellectual capital.”
What metrics does MIRI use to internally measure its own success?
(1) number of FAIs produced ;-)
Other important metrics include:
- number of agent foundations forum posts produced
- number of papers written
- number of papers published in conferences/journals
- number of papers published in high-prestige conferences/journals (a fuzzy metric)
- number of conferences attended
- number of collaborative papers written
- number of research associates
- number of people who have attended a workshop
- number of non-MIRI-employees who have produced a technical result
- amount of progress on core technical problems (a very fuzzy metric, which is why it’s important to also track the more concrete numbers above)
- size of research team
I also of course keep my eye on “number of dollars available.”
Congrats on the new position!
My question: what advances does MIRI hope to achieve in the next 5 years?
Short version: FAI. (You said “hope”, not “expect” :-p)
Longer version: Hard question, both because (a) I don’t know how you want me to trade off between how nice the advance would be and how likely we are to get it, and (b) my expectations for the next five years are very volatile. In the year since Nick Bostrom released Superintelligence, there has been a huge wave of interest in the future of AI (due in no small part to the efforts of FLI and their wonderful Puerto Rico conference!), and my expectations of where I’ll be in five years range all the way from “well that was a nice fad while it lasted” to “oh wow there are billions of dollars flowing into the field”.
But I’ll do my best to answer. The most obvious Schelling point I’d like to hit in 5 years is “fully naturalized AIXI,” that is, a solid theoretical understanding of how we would “brute force” an FAI if we had ungodly amounts of computing power. (AIXI is an equation that Marcus Hutter uses to define an optimal general intelligence under certain simplifying assumptions that don’t hold in the real world: AIXI is sufficiently powerful that you could use it to destroy the world while demonstrating something that would surely look like “intelligence” from the outside, but it’s not yet clear how you could use it to build a generally intelligent system that maximizes something in the world—for example, even if you gave me unlimited computing power, I wouldn’t yet know how to write the program that stably and reliably pursues the goal of turning as much of the universe as possible into diamond.)
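(For the curious, here is roughly what that equation looks like. This is my paraphrase of Hutter’s formulation, where $m$ is the horizon, $U$ is a universal Turing machine, $\ell(q)$ is the length of program $q$, and at each step the agent picks the action that maximizes expected total reward under a simplicity-weighted mixture over every program consistent with its interaction history so far:)

$$
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$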
Formalizing “fully naturalized AIXI” would require a better understanding of decision theory (How do we want advanced systems to reason about counterfactuals? Preferences alone are not enough to determine what counts as a “good action”; that notion also depends on how you evaluate the counterfactual consequences of taking various actions, and we lack a theory of idealized counterfactual reasoning.), logical uncertainty (What does it even mean for a reasoner to reason reliably about something larger than the reasoner? Solomonoff induction basically works by having the reasoner be just friggin’ bigger than the environment, and I’d be thrilled if we could get a working theoretical model of “good reasoning” in cases where the reasoner is smaller than the environment.), and a whole host of other problems (many of them covered in our technical agenda).
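(To gesture at what I mean by the reasoner being “bigger than the environment”: the Solomonoff prior assigns a string $x$ a probability along the lines of

$$
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},
$$

where the sum ranges over every program $p$ whose output on the universal machine $U$ begins with $x$. To use this, the reasoner effectively has to enumerate and simulate every computable hypothesis, including a complete model of whatever environment it’s embedded in; that is exactly the assumption that breaks for a bounded agent smaller than its world. This is my shorthand rendering, not the full formal setup.)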
5 years is a pretty wildly optimistic timeline for developing fully naturalized AIXI, though, and I’d be thrilled if we could make concrete progress in any one of the topic areas listed in the technical agenda.
What question should we be asking you?
I don’t even know what that word means ;-)
Haha, what useful and interesting question are we missing?
Which uncertainties about the trajectory to AI do you regard as of key strategic importance?
(a) how many major insights remain between us and strong AI? (b) how many of those insights will come from thinking hard, and how many will come from examining the brain? (c) how many more AI winters will there be? (d) how far ahead will the frontrunner be? (e) will there be an arms race?, to name a few.
Working without concrete feedback, how are you planning on increasing the chance that MIRI’s work will be relevant to the AI developers of the future?
That’s a good question: we don’t have a practical AGI to poke at, so why do we expect that we can do work today that’s likely to be relevant many years down the line?
I’ll answer in part with an analogy: Say you went back in time and dropped by to visit Kolmogorov back when he was trying to formalize probability theory, and you asked “working without concrete feedback, how are you planning to increase the chance that your probability theory will be relevant to people trying to reason probabilistically in the future?” It seems like the best response is for him to sort of cock his head and say “well, uh, I’m still trying to formalize what I mean by ‘chance’ and ‘probability’ and so on; once we’ve got those things ironed out, then we can chat.”
Similarly, we’re still trying to formalize the theory of advanced agents: right now, if you handed me unlimited computing power, I wouldn’t know how to program it to reliably and “intelligently” pursue a known goal, even a very simple goal, such as “produce as much diamond as possible.” There are parts of the problem of designing highly reliable advanced agents that we don’t understand even in principle yet. We don’t even know how to brute force the solution yet. We’re still trying to formalize the problems :-)
Also, note that working on theory doesn’t mean you can’t get feedback: we make various mathematical models that attempt to capture part of the problem, we investigate their behavior, we see which parts of the problems they do and don’t capture, and so on. (For example: Stuart Armstrong came up with a formal definition of a utility-indifferent agent; Benja responded by identifying a way Stuart’s agent succumbs to blackmail. I think this counts as pretty concrete feedback: it doesn’t get that much more concrete than “your idea provably doesn’t work”!)
As for relevance, there are definitely paths where this sort of work wouldn’t end up being relevant (jumping straight to whole-brain emulation, jumping straight to nanotech, etc.) but I currently don’t think those scenarios are all that likely. Other cases where it turns out these problems are irrelevant include (a) we needed the theory, but didn’t complete it in time, (b) it turns out you can build a safe AGI even if you don’t understand why it’s working, not even in theory, and (c) someone else got to the theory first. I’m trying to avoid (a), (b) doesn’t seem likely enough to bet the universe on it, and I’d count (c) as a win :-)
Asking on behalf of Daniel Satanove, former intern at MIRI (summer 2014):
What do other people who are concerned with AI safety (e.g., Elon Musk, Bill Gates, Stuart Russell, etc.) think the path to friendly AI is? Are there other people who are working directly on Friendly AI research other than MIRI?
(1) I don’t want to put words in their mouths. I’m guessing that most of us have fairly broad priors over what may happen, though. The future’s hard to predict.
(2) Depends what you mean by “Friendly AI research.” Does AI boxing count? Does improving the transparency of ML algorithms count? Once the FLI grants start going through, there will be lots of people doing long-term AI safety research that may well be useful, so if you count that as FAI research, then the answer is “there will be soon.” But if by “FAI research” you mean “working towards a theoretical understanding of highly reliable advanced agents,” then the answer is “not to my knowledge, no.”
Welcome, everybody!
Nate: on behalf of the EA community, thanks very much for showing up here. I think I speak for a lot of EAs when I say that since MIRI has such ambitious goals, it’s really valuable to keep things grounded with open conversations about why you’re doing what you’re doing, and how it’s turning out. So I think you’ve already won a lot of respect by making yourself available to answer questions here! Rest assured, you’re not expected to answer every single question!
Everyone else: feel free to ask more questions in the next couple of hours, and to comment, and to upvote the questions you find the most interesting. We’re lucky to have Nate around, so enjoy! :)
Wow, cheers for answering over 30 questions here, Nate! What a heroic effort.
Thanks for your questions everybody. That is a LOT of reading to go through for anyone interested in this problem, and plenty of interesting thoughts to be absorbed.
If EAs want to support MIRI financially or with relevant technical skills, it’s good that they now know more about what research they would be helping, and have an idea about the kind of person who will be leading it. Here is a link to how to get involved with MIRI as a researcher or donor: https://intelligence.org/get-involved/
Thanks very much Nate, and on behalf of the EA community, good luck in the new job!
How does MIRI plan to interface with important AI researchers that disagree with key pieces in the argument for safety?
There’s a big spectrum, there. Some people think that no matter what the AI does that’s fine because it’s our progeny (even if it turns as much matter as it can into a giant computer so it can find better YouTube recommendations). Other people think that you can’t actually build a superintelligent paperclip maximizer (because maximizing paperclips would be stupid, and we’re assuming that it’s intelligent). Other people think that yeah, you don’t get good behavior by default, but AI is hundreds and hundreds of years off, so we don’t need to start worrying now. Other people think that AI alignment is a pressing concern now but that improving our theoretical understanding of what we’re trying to do isn’t the missing puzzle piece. I interface with each of these different types of people in very different ways.
To actually answer your question, though, the default interface is “publish papers, attend conferences,” with a healthy dose of “talk to people in person when they’re in town” mixed in :-)
1) I see a trend that seems dangerous in the way new EAs concerned about the far future think about where to donate money; it goes:
I am an EA and care about impactfulness and neglectedness → Existential risk dominates my considerations → AI is the most important risk → Donate to MIRI.
The last step frequently involves very little thought; it borders on a cached thought.
How would you be conceiving of donating your X-risk money at the moment if MIRI did not exist? Which other researchers or organizations should be being scrutinized by donors who are X-risk concerned, and AI persuaded?
1) Huh, that hasn’t been my experience. We have a number of potential donors who ring us up and ask who in AI alignment needs money the most at the moment. (In fact, last year, we directed a number of donors to FHI, who had much more of a funding gap than MIRI did at that time.)
2) If MIRI disappeared and everything else was held constant, then I’d be pretty concerned about the lack of people focused on the object-level problems. (I’ll talk more about why I think this is so important in a little bit; I’m pretty sure at least one other person asks that question more directly.) There’d still be a few people working on the object-level problems (Stuart Russell, Stuart Armstrong), but I’d want lots more. In fact, that statement is also true in the actual world! We only have three people on the research team right now, remember, with a fourth joining in August.
In other words, if you were to find yourself in a world like this one except without a MIRI, then I would strongly suggest building something like a MIRI :-)
It seems easy to imagine scenarios where MIRI’s work is either irrelevant (e.g., mainstream AI research keeps going in a neuromorphic or heuristic trial-and-error direction and eventually “succeeds” that way) or actively harmful (e.g., publishes ideas that eventually help others to build UFAIs). I don’t know how to tell whether MIRI’s current strategy overall has positive expected impact. What’s your approach to this problem?
All right, I’ll come back for one more question. Thanks, Wei. Tough question. Briefly,
(1) I can’t see that many paths to victory. The only ones I can see go through either (a) aligned de-novo AGI (which needs to be at least powerful enough to safely prevent misaligned systems from undergoing intelligence explosions) or (b) very large amounts of global coordination (which would be necessary to either take our time & go cautiously, or to leap all the way to WBE without someone creating a neuromorph first). Both paths look pretty hard to walk, but in short, (a) looks slightly more promising to me. (Though I strongly support any attempts to widen path (b)!)
(2) It seems to me that the default path leads almost entirely to UFAI: insofar as MIRI research makes it easier for others to create UFAI, most of that effect isn’t replacing wins with losses, it’s just making the losses happen sooner. By contrast, this sort of work seems necessary in order to keep path (a) open. I don’t see many other options. (In other words, I think it’s net positive because it creates some wins and moves some losses sooner, and that seems like a fair trade to me.)
To make that a bit more concrete, consider logical uncertainty: if we attain a good formal understanding of logically uncertain reasoning, that’s quite likely to shorten AI timelines. But I think I’d rather have a 10-year time horizon and be dealing with practical systems built upon solid foundations that come from a decade’s worth of formally understanding what good logically uncertain reasoning looks like, rather than a 20-year time horizon where we have to deal with systems built using 19 years of hacks and 1 year of patches bolted on at the end.
(In other words, the possibility of improving AI capabilities is the price you have to pay to keep path (a) open.)
A bunch of other factors also play into my considerations (including a heuristic which says “the best way to figure out which problems are the real problems is to start solving the things that appear to be the problems,” and another heuristic which says “if you see a big fire, try to put it out, and don’t spend too much time worrying about whether putting it out might actually start worse fires elsewhere”, and a bunch of others), but those are the big considerations, I think.
In the past, people like Eliezer Yudkowsky (see 1, 2, 3, 4, 5) have argued that MIRI has a medium probability of success.
What is this probability estimate based on and how is success defined?
(Note that I’ve asked this before, but I’m curious for more perspective.)
To what degree is MIRI now restricted by lack of funding, and is there any amount of funding beyond which you could not make effective use of it?
Among recruiting new talent and having funding for new positions, what is the greatest bottleneck?
Right now we’re talent-constrained, but we’re also fairly well-positioned to solve that problem over the next six months. Jessica Taylor is joining us in August. We have another researcher or two pretty far along in the pipeline, we’re running four or five more research workshops this summer, and CFAR is running a summer fellows program in July. It’s quite plausible that we’ll hire a handful of new researchers before the end of 2015, in which case our runway would start looking pretty short, and it’s pretty likely that we’ll be funding-constrained again by the end of the year.
A modified version of this question: Assuming MIRI’s goal is saving the world (and not MIRI), at what funding level would MIRI recommend giving elsewhere, and where would it recommend giving?
I’m not sure how to interpret this question: are you asking how much money I’d like to see dumped on other people? I’d like to see lots of money dumped on lots of other people, and for now I’m going to delegate to the GiveWell, Open Philanthropy Project, and GoodVentures folks to figure out who and how much :-)
I think they mean “what is the quantity of funding at MIRI which would cause a shift in the best marginal use of money, and what organization would it switch to.”
mhpage, if this is not what you mean, let me know.
I’m not sure what the answer to this is going forward, but Nate said some relevant things in response to other questions on this page.
Indeed, that is what I meant.
I was assuming that MIRI’s position is that it presently is the most-effective recipient of funds, but that assumption might not be correct (which would itself be quite interesting).
What are your plans for taking MIRI to the next level? What is the next level?
Now that MIRI is focused on math research (a good move) and not on outreach, there is less of a role for volunteers and supporters. With the donation from Elon Musk, some of which will presumably get to MIRI, the marginal value of small donations has gone down. How do you plan to keep your supporters engaged and donating? (The alternative, which is perhaps feasible, could be for MIRI to be an independent research institution, without a lot of public engagement, funded by a few big donors.)
(a) grow the research team, (b) engage more with mainstream academia. I’d also like to spend some time experimenting to figure out how to structure the research team so as to make it more effective (we have a lot of flexibility here that mainstream academic institutes don’t have). Once we have the first team growing steadily and running smoothly, it’s not entirely clear whether the next step will be (c.1) grow it faster or (c.2) spin up a second team inside MIRI taking a different approach to AI alignment. I’ll punt that question to future-Nate.
So first of all, I’m not convinced that there’s less of a role for supporters. If we had just ten people earning-to-give at the (amazing!) level of Ethan Dickinson, Jesse Liptrap, Mike Blume, or Alexei Andreev (note: Alexei recently stopped earning-to-give in order to found a startup), that would bring in as much money per year as the Thiel Foundation. (I think people often vastly overestimate how many people are earning-to-give to MIRI, and underestimate how useful it is: the small donors taken together make a pretty big difference!)
Furthermore, if we successfully execute on (a) above, then we’re going to be burning through money quite a bit faster than before. An FLI grant (if we get one) will certainly help, but I expect it’s going to be a little while before MIRI can support itself on large donations & grants alone.
As for how I plan to keep supporters engaged & donating, I don’t expect it will be that much of a problem: I think that many of our donors are excited to see us publish peer-reviewed papers, attend conferences, and engage in the ongoing global conversation. It’s hard for me to say for sure, but it seems quite likely that the last year has been much more exciting for MIRI donors than the previous few years, even though there was no Singularity Summit and most of our output was math.
Any links on this?
https://intelligence.org/2014/06/11/mid-2014-strategic-plan/
What’s your response to Peter Hurford’s arguments in his article Why I’m Skeptical Of Unproven Causes...?
That post mixes a bunch of different assertions together; let me try to distill a few of them out and answer them in turn:
One of Peter’s first (implicit) points is that AI alignment is a speculative cause. I tend to disagree.
Imagine it’s 1942. The Manhattan project is well under way, Leo Szilard has shown that it’s possible to get a neutron chain reaction, and physicists are hard at work figuring out how to make an atom bomb. You suggest that this might be a fine time to start working on nuclear containment, so that, once humans are done bombing the everloving breath out of each other, they can harness nuclear energy for fun and profit. In this scenario, would nuclear containment be a “speculative cause”?
There are currently thousands of person-hours and billions of dollars going towards increasing AI capabilities every year. To call AI alignment a “speculative cause” in an environment such as this one seems fairly silly to me. In what sense is it speculative to work on improving the safety of the tools that other people are currently building as fast as they can? Now, I suppose you could argue that either (a) AI will never work or (b) it will be safe by default, but both those arguments seem pretty flimsy to me.
You might argue that it’s a bit weird for people to claim that the most effective place to put charitable dollars is towards some field of scientific study. Aren’t charitable dollars supposed to go to starving children? Isn’t the NSF supposed to handle scientific funding? And I’d like to agree, but society has kinda been dropping the ball on this one.
If we had strong reason to believe that humans could build strangelets, and society were pouring billions of dollars and thousands of human-years into making strangelets, and almost no money or effort was going towards strangelet containment, and it looked like humanity was likely to create a strangelet sometime in the next hundred years, then yeah, I’d say that “strangelet safety” would be an extremely worthy cause.
How worthy? Hard to say. I agree with Peter that it’s hard to figure out how to trade off “safety of potentially-very-highly-impactful technology that is currently under furious development” against “children are dying of malaria”, but the only way I know how to trade those things off is to do my best to run the numbers, and my back-of-the-envelope calculations currently say that AI alignment is further behind than the globe is poor.
Now that the EA movement is starting to look more seriously into high-impact interventions on the frontiers of science & mathematics, we’re going to need to come up with more sophisticated ways to assess the impacts and tradeoffs. I agree it’s hard, but I don’t think throwing out everything that doesn’t visibly pay off in the extremely short term is the answer.
Alternatively, you could argue that MIRI’s approach is unlikely to work. That’s one of Peter’s explicit arguments: it’s very hard to find interventions that reliably affect the future far in advance, especially when there aren’t hard objective metrics. I have three disagreements with Peter on this point.
First, I think he picks the wrong reference class: yes, humans have a really hard time generating big social shifts on purpose. But that doesn’t necessarily mean humans have a really hard time generating math—in fact, humans have a surprisingly good track record when it comes to generating math!
Humans actually seem to be pretty good at putting theoretical foundations underneath various fields when they try, and various people have demonstrably succeeded at this task (Church & Turing did this for computing, Shannon did this for information theory, Kolmogorov did a fair bit of this for probability theory, etc.). This suggests to me that humans are much better at producing technical progress in an unexplored field than they are at generating social outcomes in a complex economic environment. (I’d be interested in any attempt to quantitatively evaluate this claim.)
Second, I agree in general that any one individual team isn’t all that likely to solve the AI alignment problem on their own. But the correct response to that isn’t “stop funding AI alignment teams”—it’s “fund more AI alignment teams”! If you’re trying to ensure that nuclear power can be harnessed for the betterment of humankind, and you assign low odds to any particular research group solving the containment problem, then the answer isn’t “don’t fund any containment groups at all,” the answer is “you’d better fund a few different containment groups, then!”
Third, I object to the whole “there’s no feedback” claim. Did Kolmogorov have tight feedback when he was developing an early formalization of probability theory? It seems to me like the answer is “yes”—figuring out what was & wasn’t a mathematical model of the properties he was trying to capture served as a very tight feedback loop (mathematical theorems tend to be unambiguous), and indeed, it was sufficiently good feedback that Kolmogorov was successful in putting formal foundations underneath probability theory. We’re trying to do something similar with various other confusing aspects of good reasoning (such as logical uncertainty), and you’re welcome to raise concerns about whether we need to understand good reasoning under logical uncertainty in order to build an aligned AI, but saying that there’s “no feedback loop” seems to just misunderstand the approach.
Great article. My thoughts:
The smallpox vaccine was the first ever vaccine… a highly unproven cause. This site says it saved over half a billion lives. If there was an EA movement when Edward Jenner was alive hundreds of years ago, would it have sensibly advised Jenner to work on a different project because the idea of vaccines was an unproven one?
Note that most of the top lifesavers on ScienceHeros.com did research work, which is an inherently unprovable cause, but managed to save many more lives than a person donating to Givewell’s top charities can expect to save. Of course, scientific research can also backfire and cost lives. So one response to this might be to say: “scientific research is an unproven cause that’s hard to know the sign of, so we should ignore scientific research in favor of proven causes”. But to me this sounds like a head-in-the-sand approach. Scientific research is going to be by far the most significant bit affecting the future of life on Earth. I would rather see the EA movement try to develop tools to get better at predicting science impacts, or at least save money to nudge science when it’s more clear what impacts it might have.
I regret talking mainly about what is “unproven” when I really meant to talk about what (a) has tight feedback loops and (b) is approached experimentally. See the clarification in http://lesswrong.com/lw/ic0/where_ive_changed_my_mind_on_my_approach_to/
I think MIRI can fit this description in some ways (I’m particularly excited about the AI Impacts blog), but it doesn’t in other ways.
What do you think of the stability under self-modification example in this essay?
I haven’t taken the time to fully understand MIRI’s work. But my reading is that MIRI’s work is incremental without being empirical—like most people working in math & theoretical computer science, they are using proofs to advance their knowledge rather than randomized controlled trials. So this might meet the “tight feedback loops” criterion without meeting the “approached experimentally” criterion.
BTW, you might be interested in this comment of mine about important questions for which it’s hard to gather relevant experimental data.
Here are some related guesses of mine if anyone is interested:
The importance of the far future is so high that there’s nothing to do but bite the bullet and do the best we can to improve it.
MIRI represents a promising approach to improving the far future, but it shouldn’t be the only approach we investigate. For example, I would like to see an organization that attempted to forecast a broad variety of societal and technological trends, predict how they’ll interact, and try to identify the best spots to apply leverage.
The first thing to do is to improve our competency at predicting the future in general. The organization I describe could evolve out of a hedge fund that learned to generate superior returns through long-term trading, for instance. The approach to picking stocks that Charlie Munger, Warren Buffett’s partner, describes in Poor Charlie’s Almanack sounds like the sort of thing that might work for predicting other aspects of how the future will unfold. Munger reads a ton of books and uses a broad variety of mental frameworks to try to understand the assets he evaluates (more of a fox than a hedgehog).
(Interesting to note that the Givewell founders are ex-employees of Bridgewater, one of the world’s top hedge funds.)
A meta-level approach to predictions: push for the legalization of prediction markets that would let us aggregate the views of many people and financially incentivize them to forecast accurately. Although there are likely problems with this approach, e.g. markets for unwanted events creating financial incentives for speculators to cause those unwanted events.
When thinking about the far future, the best we may be able to do is identify specific key parameters that we think will have a positive impact on the future and then use experimental approaches with tight feedback loops to measure whether we are nudging those parameters in the right direction. For example, maybe we think a world with fewer belligerent people is one that’s more likely to survive existential threats. We write a bot that uses sentiment analysis to measure the level of belligerence in online discussion. We observe that the legalization of marijuana in a particular US state causes a noticeable drop in the level of belligerence of people talking online. We sponsor campaigns to legalize marijuana in other states and notice more drops. Etc. (This isn’t a serious suggestion, since legalizing marijuana in the US makes other countries like Iran and Russia even more belligerent by comparison; it’s just an illustration.)
(Cast in these terms, MIRI is valuable if “a taller stack of quality AI safety papers leads to a world that’s more likely to survive AGI threats”.)
But maybe we think that even the truth value of a statement like “a world with fewer belligerent people is more likely to survive existential threats” is essentially a coin toss once you look far enough out. In that case, the best we can do might be to try to attain wealth and positions of power as a movement, while improving our prediction capabilities so they are at least a little better than everyone else’s. Maybe we’d be able to see Bad Stuff on the horizon before others were paying much attention and direct resources to avert it.
It might also be wise to develop a core competency in “averting disasters on the horizon”, whatever that might look like… e.g. practice actually nudging society to see which strategies work effectively. The broad ability to nudge society is one that can be developed through experiment and tight feedback loops, and could be effective for lots of different things.
Related: Robin Hanson and Charles Twardy AMA on the SciCast tech forecasting project. Some correlates of forecasting success.
I’ve never understood this argument. There has always been a latent incentive to off CEOs or destroy infrastructure and trade on the resulting stock price swings. In practice this is very difficult to pull off. Prediction markets would be under more scrutiny and thus harder to game in this manner.
To take a step back, this objection is yet another example of one that gets trotted out against prediction markets all the time but which has been addressed in the white papers on the topic.
1) Your current technical agenda involves creating a math of logical uncertainty and forming world-models out of this. When (if possible) do you predict that such a math will be worked out, and will MIRI’s focus move to the value learning problem then?
2) How long do you estimate that formal logic will be the arena in which MIRI’s technical work takes place—that is, how long will knowing formal logic be of use to a potential researcher before the research moves to new places?
(1) That’s not quite how I’d characterize the current technical agenda. Rather, I’d say that in order to build an AI aligned with human interests, you need to do three things: (a) understand how to build an AI that’s aligned with anything (could you build an AI that reliably builds as much diamond as possible?), (b) understand how to build an AI that assists you in correcting things-you-perceive-as-flaws (this doesn’t come for free, but it’s pretty important, because humans are bad at getting software right on the first try), and (c) figure out how to build a machine that can safely learn human values & intentions from training data.
We’re currently splitting our time between all these problems. It’s not that we haven’t focused on the value learning problem yet, rather, it’s that the value learning problem is only a fraction of the whole problem. We’ll keep working on all the parts, and I’m not sure which parts will yield first. I can’t give you a timeline on how long various parts will take; scientific progress is very hard to predict.
(2) I wouldn’t currently say that “formal logic is the arena in which MIRI’s technical work takes place”—if anything, “math in general” is the arena, and that will probably remain the case until we have a much better understanding of the problems we’re trying to solve (and how to solve simplified versions of them), at which point computer programming will become much more essential. Again, it’s hard to say how long it will take to get there, because scientific progress is hard to predict.
Formal logic is one of many tools useful in mathematics (alongside probability theory, statistics, linear algebra, etc.) that shows up fairly frequently in our work, but I don’t think of our work as “focused on formal logic.” I don’t think we’ll “move away from formal logic” at a particular time; rather, we’ll just use whichever mathematical tools look useful for the problems at hand. That will change as the problems change :-)
Thank you for the response; it was helpful :^)
It seems a bit like the question behind the question might be “I’d like to help, but I don’t know formal logic; when will that stop being a barrier?” In which case it’s worth saying that I’m attending a MIRI decision theory workshop at the moment, and I don’t really know formal logic, but it isn’t proving too much of a barrier; I can think about the assertion “Suppose PA proves that A implies B” without really understanding exactly what PA is.
Hi Nate,
Thanks for the AMA. I’m most curious as to what MIRI’s working definition is for what has intrinsic value. The core worry of MIRI has been that it’s easy to get the AI value problem wrong, to build AIs that don’t value the correct thing. But how do we humans get the value problem right? What should we value?
Max Tegmark alludes to this in Friendly Artificial Intelligence: the Physics Challenge:
So I have two questions: (1) Do you see this (e.g., what Tegmark is speaking about above) as part of MIRI’s bailiwick? (2) If so, do you have any thoughts or research directions you can share publicly?
We don’t have a working definition of “what has intrinsic value.” My basic view on these hairy problems (“but what should I value?”) is that we really don’t want to be coding in the answer by hand. I’m more optimistic about building something that has a few layers of indirection, e.g., something that figures out how to act as intended, rather than trying to transmit your object-level intentions by hand.
In the paper you linked, I think Max is raising a slightly different issue. He’s talking about what we would call the ontology identification problem. Roughly, imagine building an AI system that you want to produce lots of diamond. Maybe it starts out with an atomic model of the universe, and you (looking at its model) give it a utility function that scores one point per second for every carbon atom covalently bound to four other carbon atoms (and then time-discounts or something). Later, the system develops a nuclear model of the universe. You do want it to somehow deduce that carbon atoms in the old model map onto six-proton atoms in the new model, and maybe query the user about how to value carbon isotopes in its diamond lattice. You don’t want it to conclude that none of these six-proton nuclei pattern-match to “true carbon”, and then turn the universe upside down looking for some hidden cache of “true carbon.”
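(Here’s a quick toy sketch of that failure mode in code rather than math; this is just my own illustration, not a formalism from any of our papers, and the class and function names are made up for the example. The utility function below is written against the atomic model, so once the world-model gets swapped out for a nuclear one it stops recognizing diamond anywhere:)

```python
# Toy illustration of the ontology identification problem: a utility
# function written against the agent's original atomic world-model stops
# "seeing" diamond after the model is upgraded to a nuclear one, unless
# old concepts get re-identified in the new ontology.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:                      # original world-model: atoms with element labels
    element: str                 # e.g. "C" for carbon
    bonds: List["Atom"] = field(default_factory=list)


@dataclass
class Nucleus:                   # upgraded world-model: no "element" field at all;
    protons: int                 # carbon is now "six protons", not the label "C"
    neutrons: int
    bonds: List["Nucleus"] = field(default_factory=list)


def diamond_utility(world) -> int:
    """One point per carbon covalently bound to four other carbons,
    with 'carbon' defined in terms of the *old* ontology."""
    def is_carbon(thing) -> bool:
        return getattr(thing, "element", None) == "C"
    return sum(
        1 for thing in world
        if is_carbon(thing) and sum(map(is_carbon, thing.bonds)) == 4
    )


# A tiny "diamond" in each model: one carbon bonded to four carbons.
old_world = [Atom("C", [Atom("C") for _ in range(4)])]
new_world = [Nucleus(6, 6, [Nucleus(6, 6) for _ in range(4)])]

print(diamond_utility(old_world))  # 1: scores as intended in the old model
print(diamond_utility(new_world))  # 0: after the ontology shift, the agent
                                   #    finds no "true carbon" anywhere
```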
We have a few different papers that mention this problem, albeit shallowly: Ontological Crises in Artificial Agents’ Value Systems, The Value Learning Problem, Formalizing Two Problems of Realistic World-Models. There’s a lot more work to be done here, and it’s definitely on our radar, though also note that work on this problem is at least a little blocked on attaining a better understanding of how to build multi-level maps of the world.
That diamond/carbon scenario is an excellent concrete example of the ontology problem.
What is your AI arrival timeline? Once we get AI, how quickly do you think it will self-improve? How likely do you think it is that there will be a singleton vs. many competing AIs?
(1) Eventually. Predicting the future is hard. My 90% confidence interval conditioned on no global catastrophes is maybe 5 to 80 years. That is to say, I don’t know.
(2) I fairly strongly expect a fast takeoff. (Interesting aside: I was recently at a dinner full of AI scientists, some of them very skeptical about the whole long-term safety problem, who unanimously professed that they expect a fast takeoff—I’m not sure yet how to square this with the fact that Bostrom’s survey showed fast takeoff was a minority position).
It seems hard (but not impossible) to build something that’s better than humans at designing AI systems & has access to its own software and new hardware, which does not self improve rapidly. Scenarios where this doesn’t occur include (a) scenarios where the top AI systems are strongly hardware limited; (b) scenarios where all operators of all AI systems successfully remove all incentives to self-improve; or (c) the first AI system is strong enough to prevent all intelligence explosions, but is also constructed such that it does not itself self-improve. The first two scenarios seem unlikely from here, the third is more plausible (if the frontrunners explicitly try to achieve it) but still seems like a difficult target to hit.
(3) I think we’re pretty likely to eventually get a singleton: in order to get a multi-polar outcome, you need to have a lot of systems that are roughly at the same level of ability for a long time. That seems difficult but not impossible. (For example, this is much more likely to happen if the early AGI designs are open-sourced and early AGI algorithms are incredibly inefficient such that progress is very slow and all the major players progress in lockstep.)
Remember that history is full of cases where a better way of doing things ends up taking over the world—humans over the other animals, agriculture dominating hunting & gathering, the Brits, industrialization, etc. (Agriculture and arguably industrialization emerged separately in different places, but in both cases the associated memes still conquered the world.) One plausible outcome is that we get a series of almost-singletons that can’t quite wipe out other weaker entities and therefore eventually go into decline (which is also a common pattern throughout history), but I expect superintelligent systems to be much better at “finishing the job” and securing very long-term power than, say, the Romans were. Thus, I expect a singleton outcome in the long run.
The run-up to that may look pretty strange, though.
Perhaps the first of them to voice a position on the matter expected a fast takeoff and was held in high regard by the others, so they followed along, having not previously thought about it?
Couldn’t it be that the returns on intelligence tend to not be very high for a self-improving agent around the human area? Like, it could be that modifying yourself when you’re human-level intelligent isn’t very useful, but that things really take off at 20x the human level. That would seem to suggest a possible (d): the first superhuman AI system self-improves for some time and then peters out. More broadly, the suggestion is that since the machine is presumably not yet superintelligent, there might be relevant constraints other than incentives and hardware. Plausible or not?
Seems unlikely to me, given my experience as an agent at roughly the human level of intelligence. If you gave me a human-readable version of my source code, the ability to use money to speed up my cognition, and the ability to spawn many copies of myself (both to parallelize effort and to perform experiments with) then I think I’d be “superintelligent” pretty quickly. (In order for the self-improvement landscape to be shallow around the human level, you’d need systems to be very hardware-limited, and hardware currently doesn’t look like the bottleneck.)
(I’m also not convinced it’s meaningful to talk about “the human level” except in a very broad sense of “having that super powerful domain generality that humans seem to possess”, so I’m fairly uncomfortable with terminology such as “20x the human level.”)
(1) What is the probability of mankind, or a “good” successor species we turn into, surviving for the next 1000 years? (2) What is the probability of MIRI being the first organization to create an AGI smart enough to, say, be better at computer programming than any human?
(1) Not great. (2) Not great.
(To be clear, right now, MIRI is not attempting to build an AGI. Rather, we’re working towards a better theoretical understanding of the problem.)
1) What are the implicit assumptions within MIRI’s research agenda, of the form “currently we have absolutely no idea how to do that, but we are taking this assumption for the time being, and hoping that in the future either a more practical version of this idea will be feasible, or that this version will be a guiding star for practical implementations”?
I mean things like
- UDT assumes it’s ok for an agent to have a policy ranging over all possible environments and environment histories.
- The notion of agent used by MIRI assumes to some extent that agents are functions, and that if you want to draw a line around the reference class of an agent, you draw it around all other entities executing that function.
- The list of problems in which the MIRI papers need infinite computability is: X, Y, Z etc…
- (something else)
And so on
2) How do these assumptions diverge from how FLI, FHI, or non-MIRI people publishing on the AGI 2014 book conceive of AGI research?
3) Optional: Justify the differences in 2 and why MIRI is taking the path it is taking.
1) The things we have no idea how to do aren’t the implicit assumptions in the technical agenda, they’re the explicit subject headings: decision theory, logical uncertainty, Vingean reflection, corrigibility, etc :-)
We’ve tried to make it very clear in various papers that we’re dealing with very limited toy models that capture only a small part of the problem (see, e.g., basically all of section 6 in the corrigibility paper).
Right now, we basically have a bunch of big gaps in our knowledge, and we’re trying to make mathematical models that capture at least part of the actual problem—simplifying assumptions are the norm, not the exception. All I can easily say is that common simplifying assumptions include: you have lots of computing power, there is lots of time between actions, you know the action set, you’re trying to maximize a given utility function, etc. Assumptions tend to be listed in the paper where the model is described.
2) The FLI folks aren’t doing any research; rather, they’re administering a grant program. Most FHI folks are focused more on high-level strategic questions (What might the path to AI look like? What methods might be used to mitigate xrisk? etc.) rather than object-level AI alignment research. And remember that they look at a bunch of other X-risks as well, and that they’re also thinking about policy interventions and so on. Thus, the comparison can’t easily be made. (Eric Drexler’s been doing some thinking about the object-level FAI questions recently, but I’ll let his latest tech report fill you in on the details there. Stuart Armstrong is doing AI alignment work in the same vein as ours. Owain Evans might also be doing object-level AI alignment work, but he’s new there, and I haven’t spoken to him recently enough to know.)
Insofar as FHI folks would say we’re making assumptions, I doubt they’d be pointing to assumptions like “UDT knows the policy set” or “assume we have lots of computing power” (which are obviously simplifying assumptions on toy models), but rather assumptions like “doing research on logical uncertainty now will actually improve our odds of having a working theory of logical uncertainty before it’s needed.”
(3) I think most of the FHI folks & FLI folks would agree that it’s important to have someone hacking away at the technical problems, but just to make the arguments more explicit, I think that there are a number of problems that it’s hard to even see unless you have your “try to solve FAI” goggles on. Consider: people have been working on some of these problems for decades (logical uncertainty) or even centuries (decision theory) without solving the AI-alignment-relevant parts.
We’re still very much trying to work out the initial theory of highly reliable advanced agents. This involves taking various vague philosophical problems (“what even is logical uncertainty?”) and turning them into concrete mathematical models (akin to the concrete model of probability theory attained by Kolmogorov & co).
We’re still in the preformal stage, and if we can get this theory to the formal stage, I expect we may be able to get a lot more eyes on the problem, because the ever-crawling feelers of academia seem to be much better at exploring formalized problems than they are at formalizing preformal problems.
Then of course there’s the heuristic of “it’s fine to shout ‘model uncertainty!’ and hover on the sidelines, but it wasn’t the armchair philosophers who did away with the epicycles, it was Kepler, who was up to his elbows in epicycle data.” One of the big ways that you identify the things that need working on is by trying to solve the problem yourself. By asking how to actually build an aligned superintelligence, MIRI has generated a whole host of open technical problems, and I predict that that host will be a very valuable asset now that more and more people are turning their gaze towards AI alignment.
Asking for a friend:
“What would it take to get hired by MIRI, if not in a capacity as a researcher? What others ways can I volunteer to help MIRI, operationally or otherwise?”
We’re actually going to be hiring a full-time office manager soon: someone who can just Make Stuff Happen and free up a lot of our day-to-day workload. Keep your eyes peeled, we’ll be advertising the opening soon.
Additionally, we’re hurting for researchers who can write fast & well, and before too long we’ll be looking for a person who can stay up to speed on the technical research but spend most of their time doing outreach and stewarding other researchers who are interested in doing AI alignment research. Both of these jobs would require a bit less technical ability than is required to make new breakthroughs in the field.
Many years ago, SIAI’s outlook seemed to be one of desperation—the world was mad, and probably doomed. Only nine coders, locked in a basement, could save it. Now things seem much more optimistic, the Powers That Be are receptive to AGI risk, and MIRI’s job is to help understand the issues. Is this a correct impression? If so, what caused the change?
It appears that the phrase “Friendly AI research” has been replaced by “AI alignment research”. Why was that term picked?
Luke talks about the pros and cons of various terms here. Then, long story short, we asked Stuart Russell for some thoughts and settled on “AI alignment” (his suggestion, IIRC).
How does existential risk affect you emotionally? If negatively, how do you cope?
What do you think of popular portrayals of AI-risk in general? Do you think there’s much of a point in trying to spread broad awareness of the issue? Or do you think that any such efforts ultimately do more harm than good, and that we should try to keep AI-risk more secretive?
For example, are things like Ex Machina, which doesn’t really present the full AI argument but does make it obvious that AI is a risk, or Wait But Why’s AI posts good?
Thanks!
What is your stance on whole brain emulation as a path to a positive singularity?
Hard to get there. Highly likely that we get to neuromorphic AI along the way. (Low-fidelity images or low-speed partial simulations are likely very useful for learning more about intelligence, and I currently expect that the caches of knowledge unlocked on the way to WBE probably get you to AI before the imaging/hardware supports WBE.)
What are your biggest flaws, skill gaps, areas to grow?
Hi, I’m a software developer with good knowledge of basic algorithms and machine learning techniques. What mathematics and computer science fields should I learn to be able to make a significant impact in solving AGI problem?
Great question! I suggest checking out either our research guide or our technical agenda. The former is geared towards students who are wondering what to study in order to eventually gain the skills to be an AI alignment researcher; the latter is geared more towards professionals who already have the skills and are wondering what the current open problems are.
In your case, I’d guess maybe (1) get some solid foundations via either set theory or type theory, (2) get solid foundations on AI, perhaps via AI: A Modern Approach, (3) brush up on probability theory, formal logic, and causal graphical models, and then (4) dive into the technical agenda and figure out which open problems pique your interest.
Let’s assume that an AGI is, indeed, created sometime in the future. Let us also assume that MIRI achieves its goal of essentially protecting us from the existential dangers that stem from it. My question may well be quite naive, but how likely is it for a totalitarian “New World Order” to seize control of said AGI and use it for their own purposes, deciding who gets to benefit from it and to what degree?
This is something I, myself, get asked a lot, and while the question reflects the current state of society, which probably looks nothing like future societies will, I can’t seem to properly reject it as a possibility.
I wouldn’t reject it as a possibility. MIRI wants AGI to have good consequences for human freedom, happiness, etc., but any big increase in power raises the risk that the power will be abused. Ideally we’d want the AI to resist being misused, but there’s a tradeoff between ‘making the AI more resistant to misuse by its users (when the AI is right and the user is wrong)’ and ‘making the AI more amenable to correction by its users (when the AI is wrong and the user is right).’
I wouldn’t say it’s inevitable either, though. It doesn’t appear to me that past technological growth has tended to increase how totalitarian the average state is.
Do you think a fast takeoff is more likely?
Than a slow takeoff? Yes :-)
What are MIRI’s plans for publication over the next few years, whether peer-reviewed or arxiv-style publications?
More specifically, what are the a) long-term intentions and b) short-term actual plans for the publication of workshop results, and what kind of priority does that have?
Great question! The short version is, writing more & publishing more (and generally engaging with the academic mainstream more) are very high on my priority list.
Mainstream publications have historically been fairly difficult for us, as until last year, AI alignment research was seen as fairly kooky. (We’ve had a number of papers rejected from various journals due to the “weird AI motivation.”) Going forward, it looks like that will be less of an issue.
That said, writing capability is a huge bottleneck right now. Our researchers are currently trying to (a) run workshops, (b) engage with & evaluate promising potential researchers, (c) attend conferences, (d) produce new research, (e) write it up, and (f) get it published. That’s a lot of things for a three-person research team to juggle! Priority number 1 is to grow the research team (because otherwise nothing will ever be unblocked), and we’re aiming to hire a few new researchers before the year is through. After that, increasing our writing output is likely the next highest priority.
Expect our writing output this year to be similar to last year’s (i.e., a small handful of peer reviewed papers and a larger handful of technical reports that might make it onto the arXiv), and then hopefully we’ll have more & higher quality publications starting in 2016 (the publishing pipeline isn’t particularly fast).
Hi Nate!
Daniel Dewey at FHI outlined some strategies to mitigate existential risk from a fast take-off scenario here: http://www.danieldewey.net/fast-takeoff-strategies.pdf
I expect you agree with the exponential decay model; if not, why not?
I would also like your opinion on his four strategic categories, namely:
International coordination
Sovereign AI
AI-empowered project
Other decisive technological advantage
Thanks for your attention!
I mostly agree with Daniel’s paper :-)
That was my guess :) To be more specific: do you (or does MIRI) have any preferences for which strategy to pursue, or is it too early to say? I get the sense from MIRI and FHI that aligned sovereign AI is the end goal. Thanks again for doing the AMA!
I am not Nate, but my view (and my interpretation of some median FHI view) is that we should keep options open about those strategies and as-yet unknown other strategies instead of fixating on one at the moment. There’s a lot of uncertainty, and all of the strategies look really hard to achieve. In short, no strongly favored strategy.
FWIW, I also think that most current work in this area, including MIRI’s, promotes the first three of those goals pretty well.
Follow-up: this comment suggests that Nate weakly favors strategies 2 and/or 3 over 1.
Are you single? What are some strategic methods that would make one successful at seducing you? (I’m giving a very liberal interpretation to “I’m also happy to answer questions about (...) whatever else piques your curiosity” :P)
The most reliable strategy to date is “ask me” :-)
1) What was the length of time between you reading the sequences and doing research on the value alignment problem?
2) What portion of your time will now be spent on technical research? Also, what is Eliezer Yudkowsky spending most of his work-time on? Is he still writing up introductory stuff like he said in the HPMOR author notes?
3) What are any unstated prerequisites for researching the value-alignment problem that aren’t in MIRI’s research guide? Examples might include real analysis or particular types of programming ability.
What is your best characterisation of Robin Hanson’s arguments against FOOM, and what is your analysis of the strengths and weaknesses of his argument?
I remember reading that you had plans to change the world via economic/political influence, and then you realized that existential risk was more important. The same thing happened to me.
What was that experience like for you? How long did it take you to change your mind? Other thoughts?
What are some ways in which you’ve changed your mind? Recently, important things, things that come to mind, whatever you want.
What path did MIRI’s staff take there? How many came from other charities?
Three questions:
1: As a past MIRI researcher, which one of the technical problems in the technical research agenda currently looks like the biggest pain in the ass/the one requiring the most lead time to solve?
2: When you become executive director, will that displace all of your research work, or will you still have a few thought cycles left over to contribute mathematically to workshops/do some part-time research?
3: My current life plan is “speedrun college in 3 years (mostly done), speedrun employment by living in a van and spending under 14k/year so I can build up enough money to live off the interest for the rest of my life in 7 years or so, then once financially independent, devote my life to solving the biggest problems in the world”.
Would you advise the path of “direct money towards financial independence, devote life to solving problem once financially independent”, or the path of “direct money towards X risk donations, slot direct work in as a hobby during the extended employment period” for a high percentile engineering student who can self-teach but is unsure of whether they are mathematically capable enough to meaningfully contribute? (Or some combination of the two)
Kieran Allen asks:
I’ll take a stab at this question too.
There are two different schools of thought about what the goal of AI as a field is. One is that the goal is to build a machine that can do everything humans can—possibly including experiencing emotions and other conscious states. On this view, a “full AI” would plausibly be a person, deserving of moral rights like any other.
The more common view within contemporary AI is that the goal of AI is to build machines that can effectively achieve a variety of practical goals in a variety of environments. Think Nate’s Deep Blue example, but generalized: instead of steering arrangements of chess pieces on a board toward some goal state, a “full” AI steers arbitrary arrangements of objects in space toward some goal state. Such an AI might not be conscious or have real preferences; it might have “goals” only in the limited sense that Deep Blue has “goals.” This is the kind of AI MIRI has in mind, and the kind we’re trying to plan for: a system that can draw inferences from sensor inputs and execute effective plans, but not necessarily one that has more moral weight than Google’s search engine algorithms do.
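To make the “generalized Deep Blue” framing concrete, here’s a minimal sketch (purely illustrative; the names and structure are mine, not anything from MIRI’s actual agenda) of a system that has “goals” only in this limited sense: it ranks candidate actions by how well their predicted outcomes score against a target, with no consciousness or real preferences anywhere in the loop.

```python
# Illustrative sketch only: an "agent" whose "goal" is just a scoring function
# over predicted outcomes, in the same limited sense that Deep Blue has "goals."
from typing import Callable, Iterable, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def choose_action(
    current: State,
    candidates: Iterable[Action],
    predict: Callable[[State, Action], State],  # world model: predicted next state
    score: Callable[[State], float],            # "goal": how good a state is
) -> Action:
    """Pick the candidate action whose predicted outcome scores highest."""
    return max(candidates, key=lambda action: score(predict(current, action)))


if __name__ == "__main__":
    # Toy example: steer a single number toward a target value.
    target = 10
    best = choose_action(
        current=3,
        candidates=[-1, 0, 1, 2],
        predict=lambda state, action: state + action,
        score=lambda state: -abs(target - state),
    )
    print(best)  # 2: the action that moves the state closest to the target
```

Nothing in that loop is conscious or has preferences in any morally loaded sense; “full” AI in the second school of thought is that loop made vastly more general and more capable.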
If it turns out that you do need to make AI algorithms conscious in order to make them effective at scientific and engineering tasks, that does make our task a lot harder, because, yes, we’ll have to take into account the AI’s moral status when we’re designing it, and not just the impact its actions have on other beings. For now, though, consciousness and intelligent behavior look like different targets, and there are obvious economic reasons why mainstream AI is likely to prioritize “high-quality decision making” over “emulating human consciousness.”
A better analogy to MIRI’s goal than “we build Hitler and then put him in chains” is “we build a reasonably well-behaved child and teach the child non-Hitler-ish values.” But both of those ways of thinking are still excessively anthropomorphized. A real-world AI, of the “high-quality decision-making” sort, may not resemble a human child any more closely than the earliest airplanes resembled a baby bird.
For more information about this, I can recommend Stuart Russell’s talk on the future of AI: https://www.youtube.com/watch?v=GYQrNfSmQ0M
(1) I suspect it’s possible to create an artificial system that exhibits what many people would call “intelligent behavior,” and which poses an existential threat, but which is not sentient or conscious. (In the same way that Deep Blue wasn’t sentient: it seems to me like optimization power may well be separable from sentience/consciousness.) That’s no guarantee, of course, and if we do create a sentient artificial mind, then it will have moral weight in its own right, and that will make our job quite a bit more difficult.
(2) The goal is not to build a sentient mind that wants to destroy humanity but can’t. (That’s both morally reprehensible and doomed to failure! :-p) Rather, the goal is to successfully transmit the complicated values of humanity into a powerful optimizer.
Have you read Bostrom’s The Superintelligent Will? Short version is, it looks possible to build powerful optimizers that pursue goals we might think are valueless (such as an artificial system that, via very clever long-term plans, produces extremely large amounts of diamond, or computes lots and lots of digits of pi). We’d rather not build that sort of system (especially if it’s powerful enough to strip the Earth of resources and turn them into diamonds / computing power): most people would rather build something that shares some of our notion of “value,” such as respect for truth and beauty and wonder and so on.
It looks like this isn’t something you get for free. (In fact, it looks very hard to get: it seems likely that most minds would by default have incentives to manipulate & deceive in order to acquire resources.) We’d rather not build minds that try to turn everything they can into a giant computer for computing digits of pi, so the question is: how do you design the sort of mind that has things like respect for truth and beauty and wonder?
In Hollywood movies, you can just build something that looks cute and fluffy and then it will magically acquire a spark of human-esque curiosity and regard for other sentient life, but in the real world, you’ve got to figure out how to program in those capabilities yourself (or program something that will reliably acquire them), and that’s hard :-)
I know that in the past LessWrong, HPMOR, and similar community-oriented publications have been a significant source of recruitment for areas that MIRI is interested in, such as rationality, EA, awareness of the AI problem, and actual research associates (including yourself, I think). What, if anything, are you planning to do to further support community engagement of this sort? Specifically, as a LW member I’m interested to know if you have any plans to help LW in some way.
I have a friend studying for a master’s degree in artificial intelligence, and he says:
How much does an internship at MIRI as a researcher pay?
Is MIRI’s hope/ambition that CEV (http://wiki.lesswrong.com/wiki/Coherent_Extrapolated_Volition) or something resembling CEV will be implemented, or is this not something you have a stance on?
(I’m not asking whether you think CEV should be the goal-system of the first superintelligence. I know it’s possible to have strategies such as first creating an oracle and then at some later point implement something CEV-like.)
First, I think that civilization had better be really dang mature before it considers handing over the reins to something like CEV. (Luke has written a bit about civilizational maturity in the past.)
Second, I think that the CEV paper (which is currently 11 years old) is fairly out of date, and I don’t necessarily endorse the particulars of it. I do hope, though, that if humanity (or posthumanity) ever builds a singleton, that they build it with a goal of something like taking into account the extrapolated preferences of all sentients and fulfilling some superposition of those in a non-atrocious way. (I don’t claim to know how to fill in the gaps there.)
As someone who’s spent a significant amount of time thinking about possible rearrangements of civilization, I found reading On Saving The World both tantalizing and frustrating (as well as cementing your position as one of the most impressive people I am aware of). I understand that building up from the ground, covering all the prerequisites and inferential distance, would be a huge effort and currently not worth your time, but I feel like even a terse summary, without any detailed justifications for suggestions based on all those years of thought, would be highly interesting, and a pointer towards areas worth exploring.
Would you be willing to at least summarize some of your high-level conclusions, with the understanding that you’re not going to attempt to defend, justify, or develop them in any depth since you have higher priorities?
Or could you at least lay out the inferential steps you see most lacking within the EA groups you meet? Or among LessWrongians?
What are your contrarian beliefs?
Cheeky question:
You probably believe in many strange things that most people do not. Nonetheless, I think you are very clever and trust you a lot. Can you think of any unusual beliefs you have that have implications for asset prices?
There are different inputs needed to advance AI safety: money, research talent, executive talent, and others. How do you see the tradeoff between these resources, and which seems most like a priority right now?
Looks like a few of Nate’s other answers partly address your question: “Right now we’re talent-constrained...” and “grow the research team...”
anonymous question from a big fan of yours on tumblr:
“Re: Nate Soares (thanks for doing this btw, it’s really nice of you), two questions. First, I understand his ethical system described in his recent “should” series and other posts to be basically a kind of moral relativism; is he comfortable with that label? Second, does he only intend it for a certain subset of humans with agreeable values, or does it apply to all value systems, even ones we would find objectionable?”
(I’m passing on questions without comment from anyone without an e-a.com account or who wants anonymity here.)
You could call it a kind of moral relativism if you want, though it’s not a term I would use. I tend to disagree with many self-proclaimed moral relativists: for example, I think it’s quite possible for one to be wrong about what they value, and I am not generally willing to concede that Alice thinks murder is OK just because Alice says Alice thinks murder is OK.
Another place I depart from most moral relativists I’ve met is by mixing in a healthy dose of “you don’t get to just make things up.” Analogy: we do get to make up the rules of arithmetic, but once we do, we don’t get to decide whether 7+2=9. This despite the fact that a “7” is a human concept rather than a physical object (if you grind up the universe and pass it through the finest sieve, you will find no particle of 7). Similarly, if you grind up the universe you’ll find no particle of Justice, and value-laden concepts are human concoctions, but that doesn’t necessarily mean they bend to our will.
My stance can roughly be summarized as “there are facts about what you value, but they aren’t facts about the stars or the void, they’re facts about you.” (The devil’s in the details, of course.)
igotthatreference.jpg
What are some of your techniques for doing good research?
So as I understand it, what MIRI is doing now is to think about theoretical issues and strategies and write papers about this, in the hope that the theory you develop can be made use of by others?
Does MIRI ever think of:
Developing AI yourselves at some point?
Creating a goal-alignment/safety framework to be used by people developing AGI? (Where e.g. reinforcement learners or other AI components can be “plugged in”, but in some sense are abstracted away.)
Also (feel free to skip this part of the question if it is too big/demanding):
Personally, I have a goal of progressing the field of computer-assisted proofs by making them more automated and by making the process of constructing them more user-friendly. The system would be made available through a website where people can construct proofs and see the proofs, but the components of the system would also be made available for use elsewhere. One of the goals would be to make it possible and practical to construct claims that are in natural language and are made using components of natural language, but that also have an unambiguous logical notation (probably in Martin-Löf type theory). The hope would be that this could be used for rigorous proofs about self-improving AI, and that the technologies/code-base developed and the vocabulary/definitions/claims/proofs in the system could be of use for a goal-alignment/safety framework.
(Anyone reading this who is interested in hearing more could get in touch with me, and/or take a look at this document:
https://docs.google.com/document/d/1GTTFO7RgEAJxy8HRUprCIKZYpmF4KJiVAGRHXF_Sa70/edit)
If I’ve gotten across what it is that I’m hoping to make: does it sound like this could be useful to the field of AI safety / goal alignment? Or are you unsure? Or does it seem like my understanding of what the field needs is flawed to some degree, and that my efforts would in all probability be better spent elsewhere?
Kinda. The current approach is more like “Pretend you’re trying to solve a much easier version of the problem, e.g. where you have a ton of computing power and you’re trying to maximize diamond instead of hard-to-describe values. What parts of the problem would you still not know how to solve? Try to figure out how to solve those first.”
(1) If we manage to (a) generate a theory of advanced agents under many simplifying assumptions, and then (b) generate a theory of bounded rational agents under far fewer simplifying assumptions, and then (c) figure out how to make highly reliable practical generally intelligent systems, all before anyone else gets remotely close to AGI, then we might consider teching up towards designing AI systems ourselves. I currently find this scenario unlikely.
(2) We’re currently far enough away from knowing what the actual architectures will look like that I don’t think it’s useful to try to build AI components intended for use in an actual AGI at this juncture.
(3) I think that making theorem provers easier to use is an important task and a worthy goal. I’m not optimistic about attempts to merge natural language with Martin-Löf type theory. If you’re interested in improving theorem-proving tools in ways that might make it easier to design safe reflective systems in the future, I’d point you more towards trying to implement (e.g.) Marcello’s Waterfall in a dependently typed language (which may well involve occasionally patching the language, at this stage).
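For readers who haven’t met the term: “dependently typed” means types can mention values, so proofs become ordinary terms checked by the same kernel that checks programs. Here is a minimal generic Lean illustration (not Marcello’s Waterfall or any MIRI construction, just the flavor of the language feature):

```lean
-- Length-indexed vectors: the length is part of the type, so the type checker
-- itself rules out constructions with the wrong length.
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons {n : Nat} : α → Vec α n → Vec α (n + 1)

-- A proof is just a term whose type is the proposition being proved.
example : 7 + 2 = 9 := rfl
```

The appeal of such languages for reflective constructions of the sort Nate mentions is that proof objects are data the system itself can inspect and re-check.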
I guess I am way late to the party, but.....
What part of the MIRI research agenda do you think is the most accessible to people with the least background?
How could AI alignment research be made more accessible?
Are there any areas of the current software industry that developing expertise in might be useful to MIRI’s research agenda in the future?
Do you believe a terminal value could ever be “rational”? Or is that a Wrong Question?
Could you say more about what you mean by “rational” in this context? Do you have a particular kind of rationality in mind?
Hey Nate, congratulations! I think we briefly met in the office in February when I asked Luke about his plans; now it turns out I should have been quizzing you instead!
I have a huge list of questions; basically the same list I asked Seth Baum, actually. Feel free to answer as many or as few as you want. Apologies if you’ve already written on the subject elsewhere; feel free to just link if so.
What are your current marginal project(s)? How much will they cost, and what’s the expected output (if they get funded)?
What is the biggest mistake you’ve made?
What is the biggest mistake you think others make?
What is the biggest thing you’ve changed your mind about recently? (say past year)
How do you balance the likelihood/risks of:
FAI supergood
Everything continues much as now
UFAI
e.g. for what p would you prefer a p chance of FAI and a 1-p chance of UFAI over a guarantee of mankind continuing in an AGI-less fashion? (Does this make sense in your current ontology?)
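(One way to write out the threshold I mean, purely as an illustration, with U standing for whatever utility assignment you’d endorse:

```latex
p \, U(\mathrm{FAI}) + (1 - p) \, U(\mathrm{UFAI}) \;\ge\; U(\text{no AGI})
\quad\Longleftrightarrow\quad
p \;\ge\; \frac{U(\text{no AGI}) - U(\mathrm{UFAI})}{U(\mathrm{FAI}) - U(\mathrm{UFAI})},
\qquad \text{assuming } U(\mathrm{FAI}) > U(\mathrm{UFAI}).
```

I’m asking for the smallest such p you’d accept.)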
What’s your probability distribution for AGI timescale?
Do you have any major disagreements with Eliezer or Luke about 1) expectations for the future 2) strategy?
What do you think about the costs and benefits of publishing in journals as strategy?
Do you think the world has become better or worse over time? How? Why?
Do you think the world has become more or less at risk over time? How? Why?
What do you think about value drift?
What do you think will be the impact of the Elon Musk money?
How do you think about weighing future value vs current value?
Personal question, feel free to disregard, but this is an AMA:
How has concern about AI affected your personal life, beyond the obvious? Has it affected your retirement savings? Do you plan to have children, or already have them?
Hey Larks, that’s a huge set of questions. It might be helpful to take some themed bundles of questions from here and split them off into their own comments, so that others can upvote and read the questions according to their interest.
Will you still be answering questions now, or in future?
Nate will answer questions in an hour and a half:
Ah, I meant would he still be answering questions that got asked later.
Ah, there are no plans for that, though I imagine Rob Bensinger wouldn’t mind me saying that if you have any useful follow-on questions, you can find his contact details on the MIRI website.
It could be useful to mention that sort of thing on future AMAs
I usually see MIRI’s goal stated in its technical agenda as “to ensure that the development of smarter-than-human intelligence has a positive impact on humanity.” Is there any chance of expanding this to include all sentient beings? If not, why not? Given that nonhuman animals vastly outnumber human ones, I would think the most pressing question for AI is its effect on nonhuman animals rather than on humans.
Yep :-)
The official mission statement is just “has a positive impact.” I’ll encourage people to also use phrasing that’s more inclusive to other sentients in future papers/communications.
Unless there are strategic concerns I don’t fully understand, I second this. I cringe a little every time I see such goal descriptions.
Personally, I would argue that the issue of largest moral concern is ensuring that new beings that can have good experiences and a meaningful existence are brought into existence, as the quality and quantity of consciousness experienced by such not-yet-existent beings could dwarf what is experienced by currently existing beings on our small planet.
I understand that MIRI doesn’t want to take stance on all controversial ethical issues, but I would also wonder if MIRI has considered replacing “a positive impact on humanity” with “a positive impact on humanity and …”, e.g. “a positive impact on humanity and other sentient beings” or “a positive impact on humanity and the universe”.
I am not worried as much as you about the effect of AI on nonhuman animals, but I agree that it would maybe be nice if MIRI was slightly more explicitly anti-speciesist in its materials. I think they have a pretty good excuse for not being clearer about this, though.
FWIW, MIRI people seem pretty un-speciesist to me, in the strict sense of not being biased based on species. (Eliezer is AFAIK alone among MIRI employees in his confidence that chickens etc are morally irrelevant.) I have had a few conversations with Nate about nonhuman animals, and I’ve thought his opinions were thoroughly reasonable.
(Nate can probably respond to this too, but I think it’s possible that I’m a more unbiased source on MIRI’s attitude to non-human animals.)
P[humans and animals survive a long time]: large
P[humans survive with animal life with super AI]: small
P[humans survive without animal life with super AI]: much smaller
P[animals survive without humans but with super AI]: nearly none?
It seems to me that by focusing on protecting humanity and its society, you’re protecting animals by implication pretty much.
Promoting animal liberation carries large weirdness points.
MIRI’s efforts are already hampered by weirdness points.
So using MIRI as a platform to promote animal liberation is probably not a wise move?
What are your thoughts on “normal people”? To what extent do they frustrate you?