You’re right that we can’t be sure that conscious beings will do good things, but we don’t have that assurance for any outcome we might push for.
One way to think about the goal is that we want to “zero in” on valuable futures: it’s unclear what exactly a good future looks like, and we can’t get an “assurance,” but for example a massive Manhattan Project to develop whole brain emulation is a not-implausible path to zeroing in, assuming WBE isn’t too difficult to achieve on the relevant timescale and assuming you can avoid accelerating difficult-to-align AI too much in the process. It’s a potentially promising option for zeroing in because emulated humans could be leveraged to do a lot of cognitive work in a compressed period of time to sort out key questions in moral psychology+philosophy, neuroscience, computer science, etc. that we need to answer in order to get a better picture of good outcomes.
This is also true for a Manhattan Project to develop a powerful search algorithm that generates smart creative policies to satisfy our values, while excluding hazardous parts of the search space—this is the AI route.
Trying to ensure that AI is conscious, without also solving WBE or alignment or global coordination or something of that kind in the process, doesn’t have this “zeroing in” property. It’s more of a gamble that good-ish futures have a high enough base rate, even when we don’t put much work into steering in a specific direction, that arbitrary conscious systems might make good things happen. But building a future of conscious AI systems opens up a lot of ways for suffering to proliferate in the universe, just as it opens up a lot of ways for happiness to proliferate in the universe. Just as it isn’t obvious that e.g. rabbits experience more joy than suffering in the natural world, it isn’t obvious that conscious AI systems in a multipolar outcome would experience more joy than suffering. (Or otherwise experience good-on-net lives.)
True, but if a singleton isn’t inevitable, trying to create one will itself pose a serious existential risk, as would any similar world takeover attempt.
I think if an AI singleton isn’t infeasible or prohibitively difficult to achieve, then it’s likely to happen eventually regardless of what we’d ideally prefer to have happen, absent some intervention to prevent it. Either it’s not achievable, or something needs to occur to prevent anyone in the world from reaching that point. If you’re worried about singletons, I don’t think pursuing multipolar outcomes and/or conscious-AI outcomes should be a priority for you, because I don’t think either of those paths concentrates very much probability mass (if any) into scenarios where singletons start off feasible but something blocks them from occurring.
Multipolar scenarios are likelier to occur in scenarios where singletons simply aren’t feasible, as a background fact about the universe; but conditional on singletons being feasible, I’m skeptical that achieving a multipolar AI outcome would do much (if anything) to prevent a singleton from occurring afterward, and I think it would make alignment much more difficult.
Alignment and WBE look like difficult tasks, but they have the “zeroing in” property, and we don’t know exactly how difficult they are. Alignment in particular could turn out to be much harder than it looks or much easier, because there’s so little understanding of what specifically is required. (Investigating WBE has less value-of-information because we already have some decent WBE roadmaps.)
To clarify: I don’t think it will be especially fruitful to try to ensure AIs are conscious, for the reason you mention: multipolar scenarios don’t really work that way. What happens will be determined by what’s efficient in a competitive world, which doesn’t leave much room for changes made now to actually persist.
And yes, if a singleton is inevitable, then our only hope for a good future is to do our best to align the singleton, so that it uses its uncontested power to do good things rather than just to pursue whatever nonsense goal it will have been given otherwise.
What I’m concerned about is the possibility that a singleton is not inevitable (which seems to me the most likely scenario) but that folks attempt to create one anyway. This includes realities where a singleton is impossible or close to it, as well as where a singleton is possible but only with some effort made to push towards that outcome. An example of the latter would just be a soft takeoff coupled with an attempt at forming a world government to control the AI—such a scenario certainly seems to me like it could fit the “possible but not inevitable” description.
A world takeover attempt has the potential to go very, very wrong—and then there’s the serious possibility that the creation of the singleton would succeed but its alignment would not. Given this, I don’t think it makes sense to push unequivocally for this option, with the enormous risks it entails, until we have a good idea of what the alternative looks like. That we can’t control that alternative is irrelevant—we can still understand it! Once we have a reasonable picture of that scenario, we can start to think about whether it’s so bad that we should embark on such dangerous strategies to try to avoid it.
One element of that understanding would be how likely AIs are to be conscious; another would be how good or bad the lives of conscious AIs would be in a multipolar scenario. I agree entirely that we don’t know this yet—whether for rabbits or for future AIs—that’s part of what I’d need to understand before I’d agree that a singleton seems like our best chance at a good future.
Did anything in Nate’s post or my comments strike you as “pushing for a singleton”? When people say “singleton,” I usually understand them to have in mind some kind of world takeover, which sounds like what you’re talking about here. The strategy people at MIRI favor tends to be more like “figure out what minimal AI system can end the acute risk period (in particular, from singletons), while doing as little else as possible; then steer toward that kind of system”. This shouldn’t be via world takeover if there’s any less-ambitious path to that outcome, because any added capability, or any added wrinkle in the goal you’re using the system for, increases accident risk.
More generally, alignment is something that you can partially solve for systems with some particular set of capabilities, rather than being all-or-nothing.
“I agree entirely that we don’t know this yet—whether for rabbits or for future AIs—that’s part of what I’d need to understand before I’d agree that a singleton seems like our best chance at a good future.”
I think it’s much less likely that we can learn that kind of generalization in advance than that we can solve most of the alignment problem in advance. Additionally, solving this doesn’t in any obvious way get you any closer to being able to block singletons from being developed, in the scenario where singletons are “possible but only with some effort made”. Knowing the utility of a multipolar outcome where no one ever builds a singleton can help you decide whether to aim for such an outcome, but it doesn’t get us any closer to knowing how to prevent anyone from ever building a singleton, even if you do find a way to achieve an initially multipolar outcome.
I’d also add that I think the risk of producing bad conscious states via non-aligned AI mainly lies in AI systems potentially having parts or subsystems that are conscious, rather than in the system as a whole (or executive components) being conscious in the fashion of a human.