KR

Karma: 56

Intellectual Diversity in AI Safety

KR 22 Jul 2020 19:07 UTC

21 points

8 comments · 3 min read · EA link

How do takeoff speeds affect the probability of bad outcomes from AGI?

KR 7 Jul 2020 17:53 UTC

18 points

0 comments · 8 min read · EA link

KR 7 Jul 2020 0:00 UTC
13 points
0 ∶ 0
on: KR’s Shortform
Thought experiment for longtermism: if you were alive in 1920 trying to have the largest possible impact today, would the ideas you came up with without the benefit of hindsight still have an effect today?
I find this a useful intuition pump in general. If someone says “X will happen in 50 years”, I imagine myself looking at 2020 from 1970 and ask how many predictions of that sort, made then, would have turned out accurate now. The world in 50 years is going to be at least as hard for us to imagine (hopefully harder, given exponential growth) as the world of today would have been from 1970. What did we know? What did we completely miss? What kinds of systematic mistakes might we be making?

KR 18 Jun 2020 4:18 UTC
5 points
0 ∶ 0
on: KR’s Shortform
An argument in favor of slow takeoff scenarios being generally safer is that we will get to see and experiment with the precursor AIs before they become capable of causing x-risks. Even if the behavior of this precursor AI is predictive of the superhuman AI’s, our ability to use it depends on the reaction to the potential dangers of this precursor AI. A society confident that there is no danger from increasing the capabilities of the machine that has been successfully running its electrical grid gains much less of an advantage from a slow takeoff (as opposed to the classic hard takeoff) than one with an awareness of its potential dangers.
Personally, I would expect a shift in attitudes towards AI as it becomes obviously more capable than humans in many domains. However, whether this shift involves being more careful or instead abdicating decisions to the AI entirely seems unclear to me. The way I play chess with a much stronger opponent is very different from how I play chess with a weaker or equally matched one. With the stronger opponent I am far more likely to expect obvious-looking blunders to actually be a set-up, for instance, and to spend more time trying to figure out what advantage they might gain from it. On the other hand, I never bother to check my calculator’s math by hand, because the odds that it’s wrong are far lower than the chance that I will mess up somewhere in my arithmetic. If someone came up with an AI-calculator that gave occasional subtly wrong answers, I certainly wouldn’t notice.
Taking advantage of the benefits of a slow takeoff also requires institutions capable of noticing and preventing problems. In a fast takeoff scenario, it is much easier for a single, relatively small project to unilaterally take off. This is, essentially, a gamble on that particular team’s capabilities. In a slow takeoff, it will quickly become obvious that some project(s) are trending in that direction, which makes it more likely that, if the project seems unsafe, there will be time to impose external control on it. How much of an advantage this is depends on how much you trust whichever institutions will be needed to impose those controls.
Humanity’s track record in this respect seems to me to be mixed. Some historical precedents for cooperation (or lack thereof) in controlling dangerous technologies and their side-effects are the Asilomar Conference, nuclear proliferation treaties, and various pollution agreements. Asilomar, which seems to me the most successful of these, involved a relatively small scientific field voluntarily adhering to some limits on potentially dangerous research until more information could be gathered. Nuclear proliferation treaties reduce the cost of a zero-sum arms race, but it isn’t clear to me whether they significantly reduced the risk of nuclear war. Pollution regulations have had very mixed results, with some major successes (e.g. acid rain) but on the whole failing to avert massive global change. Somewhat closer to home, the response to Covid-19 hasn’t been particularly encouraging. It is unclear to me which, if any, of these present a fair comparison, but none of them gives much reason for confidence.

KR 22 Jul 2020 22:39 UTC
3 points
0 ∶ 0
in reply to: Geoffrey Irving’s comment on: Intellectual Diversity in AI Safety
My impression is that people like you are pretty rare, but all of this is based on subjective impressions and I could be very wrong.
Have you met a lot of other people who came to AI safety from some background other than the Yudkowsky/Superintelligence cluster?

KR 7 Jul 2020 17:53 UTC
3 points
0 ∶ 0
in reply to: Aaron Gertler’s comment on: KR’s Shortform
Thanks! I ended up expanding it significantly and posting the full version here.

KR 7 Jul 2020 18:02 UTC
2 points
0 ∶ 0
in reply to: Buck’s comment on: KR’s Shortform
My understanding of the hinge of history argument is that the current time has more leverage than either the past or future. Even if that’s true, it doesn’t necessarily mean that it’s any more obvious what needs to be done to influence the future.
If I believed that e.g. AI is obviously the most important lever right now, and thought I knew which direction to push that lever, I would ask myself: “using the same reasoning, which levers would I have been trying to push, and in which direction, in 1920?” As far as I can tell, this is pretty agnostic about how easy it is to push these levers around; it only concerns which ones you would want to be pushing.

KR 19 Jun 2020 5:39 UTC
1 point
0 ∶ 0
in reply to: Buck’s comment on: KR’s Shortform
Thanks for the links. I googled briefly before I wrote this to check my memory and couldn’t find anything. I think what formed my impression was that even in very detailed conversations and writing on AI, as far as I could tell, there was by default no mention or implicit acknowledgement of the possibility. On reflection, though, I’m not sure I would expect it to be mentioned even if people did think it was likely.

KR 19 Jun 2020 3:32 UTC
1 point
0 ∶ 0
on: KR’s Shortform
EA-style discussion about AI seems to dismiss out of hand the possibility that AI might be sentient. I can’t find an example, but the possibility seems generally scoffed at in the same tone in which people dismiss Skynet and killer robot scenarios. Bostrom’s simulation hypothesis, however, is broadly accepted as at the very least an interestingly plausible argument.
These two stances seem entirely incompatible—if silicon can create a whole world inside of which are sentient minds, why can’t it just create the minds with no need for the framing device? It is plausible that sentience does not emerge unless you very precisely mimic natural (or “natural”) evolutionary pressures, but this seems unlikely. It’s likewise possible that something about the process by which we expect to create AI doesn’t allow for sentience, but in that case I think the burden of proof is on the people making the argument to identify this feature and argue for their reasons.
The strongest argument I can think of off the top of my head is that, if we expect a future AI created by something resembling modern machine learning methods to have a chance at sentience, we should likewise expect, say, worm-equivalent AIs to have it too. Is C. elegans sentient? Is OpenWorm? If you answered yes to the first and no to the second, what is OpenWorm missing that C. elegans has?
