I’m a computational physicist, and I generally donate to global health. I am skeptical of AI x-risk and of big-R Rationalism, and I intend to explain why in great detail.
titotal
More false confidence than not mentioning error ranges at all?
Do you think I can never make statements like “low confidence proposition X is more likely than high confidence proposition Y”? What would feel like a reasonable criteria for being able to say that kind of thing?
Honestly, yeah, I think it is a weird statement to definitively say that X, a wildly speculative thing, is more likely than Y, a well-known and well-studied thing (or to put it differently, when the error bounds on X are orders of magnitude wider than the error bounds on Y). It might help if you provided a counterexample here. I think my objections might be partially semantic: saying “X is more likely than Y” seems like smuggling certainty into a very uncertain proposition.
what does it actually mean to say that P(AI X-risk) is in [0.5%, 50%] rather than 5%
I think it conveys the state of knowledge about the situation more accurately, which is that you don’t know much at all.
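To make that concrete, here’s a minimal sketch in Python. The interval [0.5%, 50%] and the point estimate 5% are the ones from this thread; the observation about geometric midpoints is my own framing:

```python
import math

low, high = 0.005, 0.50  # the [0.5%, 50%] interval from this thread

# The geometric midpoint of the interval is exactly the 5% point estimate:
central = math.sqrt(low * high)
print(f"{central:.2%}")  # 5.00%

# So the interval carries the same central guess as the point estimate,
# plus the extra information that the estimate is uncertain across two
# orders of magnitude:
print(math.log10(high / low))  # 2.0
```

The interval and the point estimate agree on the best guess; the interval just refuses to hide the spread.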
(also, lol, fair point on the calculation error)
You seem to be jumping to the conclusion that if you don’t understand something, it must be because you are dumb, and not because you lack familiarity with community jargon or norms.
For example, take the Yudkowsky doompost that’s been much discussed recently. In the first couple of paragraphs, he namedrops people who would be completely unknown outside his specific subfield of work and expects the reader to know who they are. Then there are a lot of paragraphs like the following:
If nothing else, this kind of harebrained desperation drains off resources from those reality-abiding efforts that might try to do something on the subjectively apparent doomed mainline, and so position themselves better to take advantage of unexpected hope, which is what the surviving possible worlds mostly look like.
It doesn’t matter whether you have an Oxford degree or not; this will be confusing to anyone who has not been steeped in the jargon and worldview of the rationalist subculture. (My PhD in physics is not helpful at all here.)
This isn’t necessarily bad writing, because the piece is deliberately targeted at people who have been using this jargon for years. It would be bad writing if it were aimed at the general public, though, because they don’t know what these terms mean.
This is similar to scientific fields: when you publish a paper in a specific sub-discipline, a lot of knowledge is assumed. This avoids having to re-explain whole disciplines, but it does make papers incredibly hard to read for anyone who is even a little bit of an outsider. But when communicating results to the public (or even to someone in a different field of physics), you have to translate them into reasonably understandable English. I think people here should be mindful of who exactly their audience is and tailor their language appropriately.
I’m afraid to say there is a lot of room for improvement here. As others have pointed out, most climate justice advocates do not literally think climate change will wipe out humanity; they think it will kill a large number of people and make life very bad for many others, and they want to prevent this for obvious reasons.
But I mainly have to take serious issue with paragraphs like the one here:
In 2017 and 2018, the world invested an annual average of $579 billion dollars on climate change. This marks a 25% increase from 2015 and 2016 (Buchner, 2019). In the same period, greenhouse gas emissions have increased from 46.76 to 48.94 billion tonnes (Ritchie, 2020). The increase in spending from 2016 to 2017 did not correspond to any decline in emission rates. Clearly, the optimum amount to spend on mitigating climate change is less than what is currently being allocated.
There are so, so many things wrong with this paragraph. For starters, you can’t expect the effect of mitigation efforts to be instantaneous; it takes time to build things. A nuclear power plant could take ten years to build, so you don’t see its mitigation effect for a whole decade. And investments in energy technology research will look worthless for many, many years, until they finally pay off and save huge amounts of emissions.
Secondly, much of this money is invested, not donated, so much of it is not lost. If you put a solar panel on your house (under good conditions), it will pay for itself in several years, yet this is not accounted for in the $579 billion figure (a rough payback sketch follows after these points).
Thirdly, emissions increasing doesn’t mean the money didn’t do anything. Emissions always increase if we do nothing to stop them; it is likely that emissions would have increased much more if no mitigation efforts were in place.
Fourthly, the whole premise doesn’t make sense. If the current amount of money isn’t enough to stop emissions rising, that’s an argument for increasing funding, not decreasing it. Otherwise, how would you stop emissions from hitting the 11-degree doom level? (Of course, the actual answer is that mitigation efforts are working, albeit slowly, but that contradicts your argument.)
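To put rough numbers on the investment point above, here is a minimal payback sketch with made-up placeholder figures (real payback depends on location, electricity prices, and subsidies):

```python
# Hypothetical rooftop solar purchase; both numbers are illustrative.
install_cost = 10_000   # upfront cost in dollars
annual_savings = 1_500  # yearly electricity-bill savings in dollars

payback_years = install_cost / annual_savings
print(f"Payback period: {payback_years:.1f} years")  # ~6.7 years

# After the payback period, the "spending" has fully returned itself,
# so counting the purchase price as money lost overstates the cost.
```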
You can make a pretty good case for regulating AI deployment even if you’re an AI x-risk skeptic like myself. The simple point is that companies, and sometimes even governments, are deploying algorithms that they don’t fully understand, and the more power is handed over to these algorithms, the greater the potential damage when the code goes wrong. I would guess that AI “misalignment” already has a death toll; my go-to example is mass shooters whose radicalisation was aided by social media algorithms. Add to that the issues with algorithmic bias, the use of predictive policing, and so on, and the case for some sort of regulation is pretty clear.
I believe you left out another important reason why it’s okay not to go into AI: it’s okay to think that the risk from AI is wildly overblown. I’m worried that EA might be unwittingly drifting into a community where AI skeptics feel unwelcome and just leave (or never join in the first place), which is obviously bad for intellectual discourse, even if you think the skeptics are wrong.
Cryptocurrency is 99.9% Ponzis, frauds, and bubbles, and that’s a generous number. The underlying technology is pretty useless for actual applications (it’s essentially always more efficient to centralise an app, at which point the blockchain becomes redundant), but it is very useful for scamming people out of their money with get-rich-quick schemes, so that’s what it’s chiefly used for.
I think tying the EA movement to cryptocurrency is a great way to discredit your rationality and trustworthiness across the board. If you didn’t foresee crypto being useless, how can we expect you to foresee anything else?
I wonder how many other people are avoiding discussing their true beliefs about AI for similar reasons? I definitely don’t judge anyone for doing so; there are a lot of subtle discouragements against disagreeing with an in-group consensus, even if none of it is deliberate or conscious. You might feel that people will judge you as dumb for not understanding their arguments, or won’t be receptive to your other points; you might have the natural urge to avoid a debate when you are outnumbered, or just want to fit in and be popular.
I agree with this, although I think people would say that the “general” part of AGI means that domain-specific AIs won’t count; that’s just semantics, though.
It seems reasonable to me that narrow AIs focused on one specific thing will outperform “general” AI at most tasks, in the same way that the winner of an Olympic triathlon is usually not the winner of the individual swimming, running, or cycling events. If this is true, then there is dramatically less incentive to make a “general” AI rather than a boatload of hyper-specific ones for whatever purposes you need.
Just a sanity check: are there other people here who feel like the Sequences, while being fairly good pop-science write-ups, are also massively overrated and full of flaws? Especially when it comes to scientific fields; for example, the caricature of scientists in “Defy the Data” feels like it was written without once talking to one.
Taking as a given that EA is an imperfect movement (like every other movement), it’s worth considering whether external criticism should be taken on board rather than PR-managed. For example, the accusations of cultishness may be exaggerated, but I think there is a grain of truth there, in terms of the amount of unnecessary jargon, odd rituals (like pseudo-Bayesian updating), and extreme overconfidence in very shaky assumptions.
It’s concerning to me that the probability of “early rogue AI will inevitably succeed in defeating us” is not only taken to be near 100%, it’s not even stated as a premise! Regardless of what you think of that position (I’m preparing a post on why I think the probability is actually quite low), this is not a part of the equation you can just ignore.
Another quibble is that “alignment problem” and “existential risk” are treated as synonymous. It’s quite possible for the former to be real but not the latter (i.e., you think AIs will do things we don’t want them to do, but you don’t think those things will necessarily involve human extinction).
I think there is a very clear split, but it’s not over whether people want to do the most good or not. I would say the real split is between “empiricists” and “rationalists”, and it’s about how much actual certainty we should have before we devote our time and money to a cause.
The thing that made me supportive of EA was the rigorous research that went into cause areas. We have rigorous, peer-reviewed studies that definitively prove that malaria nets save lives. There is real, tangible empirical proof that your donation to a GiveWell cause does real good. There is plenty of uncertainty in these cause areas, but it is relatively bounded by the available data.
Longtermism, on the other hand, is inherently built on shakier ground, because you are speculating about unbounded problems whose estimates could differ wildly depending on your own personal biases. Rationalists think you can overcome this by thinking really hard about the problems and extrapolating from current experience into the far future, or onto things that don’t exist yet, like AGI.
You can probably tell that I’m an empiricist, and I find that the so-called “rationalists” have laid their foundations on a pile of shaky and questionable assumptions that I don’t agree with. That doesn’t mean I don’t care about the long term; climate change risk, for example, is very well studied.
So, I think there is a threshold of intelligence and bug-free-ness (which I’ll just call rationality) that will allow an AI to escape and attempt to attack humanity.
I also think there is a threshold of intelligence and rationality that could allow an AI to actually succeed in subjugating us all.
I believe that the second threshold is much, much higher than the first, so we would expect to see huge numbers of AI versions that pass the first threshold but not the second. If pre-alpha builds are intelligent enough to escape, they will be the first builds to attack.
Even if we’re looking at released builds, those builds will only have been debugged within specific domains. Nobody is going to debug the geopolitical abilities of an AI designed to build paperclips. So the fact that debugging occurred in one domain is no guarantee of success in any other.
This depends on what “human-level” means. There is some threshold such that an AI past that threshold could quickly take over the world, and it doesn’t really matter whether we call that “human-level” or not.
Indeed, this post is not an attempt to argue that AGI could never be a threat, merely that the “threshold for subjugation” is much higher than “any AGI”, as many people imply. Human-level is just a marker for a level of intelligence that most people will agree counts as AGI, but which (due to mental flaws) is most likely not capable of world domination. For example, I do not believe an AI brain upload of Bobby Fischer could take over the world.
This makes a difference, because it means that the world in which the actual x-risk AGI comes into being is one in which a lot of earlier, non-deadly AGI already exist and can be studied, or used against the rogue.
Sure. But the relevant task isn’t make something that won’t kill you. It’s more like make something that will stop any AI from killing you, or maybe find a way to do alignment without much cost and without sacrificing much usefulness. If you and I make stupid AI, great, but some lab will realize that non-stupid AI could be more useful, and will make it by default.
Current narrow machine-learning AI is extraordinarily stupid at things it isn’t trained for, and yet it is still massively funded and incredibly powerful. Nobody is hankering to put a detailed understanding of quantum mechanics into DALL-E. A “stupidity about world domination” module, focused on a few key dangerous areas like biochemistry, could potentially be implemented in most AIs without affecting performance at all. It wouldn’t solve the problem entirely, but it would help mitigate risk.
Alternatively, if you want to “make something that will stop AI from killing us” (presumably an AGI), you need to make sure that it can’t kill us instead, and that could also be helped by deliberate flaws and ignorance. So make it an idiot savant: superb at terminating AIs, but not at other things.
Yes, that’s a fair summary. I think that perfect alignment is pretty much impossible, as is perfectly rational/bug-free AI. I think the latter fact may give us enough breathing room to get alignment at least good enough to avert extinction.
I feel like it’s more fruitful to talk about specific classes of defects rather than all of them together. You use the word “bug” to mean everything from divide by zero crashes to wrong beliefs
That’s fair; I think if people were to further explore this topic, it would make sense to separate them out. And good point about the bugginess passage; I’ve edited it to be more accurate.
Yeah, I guess another consequence of how bugs are distributed is that the methodology of AI development matters a lot. An AI that is trained and developed over huge numbers of different domains is far, far, far more likely to succeed at takeover than one trained for specific purposes such as solving math problems. So the HFDT from that post would definitely be of higher concern if it worked (although I’m skeptical that it would).
I do think that any method of training will still leave holes, however. For example, the scenario where HFDT is trained by watching how experts use a computer would leave out all the non-computer domains of expertise. So even if it were a perfect reasoner over all scientific, artistic, and political knowledge, you couldn’t just shove it in a robot body and expect it to do a backflip on its first try, no matter how many backflipping manuals it had read. I think there will be sufficiently many out-of-domain problems to stymie world domination attempts, at least initially.
I think a main difference of opinion I have with AI risk people is that I think subjugating all of humanity is a near-impossibly hard task, requiring a level of intelligence and perfection across a range of fields that is stupendously far above human level, and I don’t think it’s possible to reach that level without vast, vast amounts of empirical testing.
So I think my most plausible scenario for AI success would be similar to yours: you build up wealth and power through some sucker corporation or small country that thinks it controls you, then use their R&D resources along with your intelligence to develop some form of world-destruction-level technology that can be deployed without resistance. I think this is orders of magnitude more likely to work than Yudkowsky’s ridiculous “make a nanofactory in a beaker from first principles” strategy.
I still think this plan is doomed to fail (for early AGI). It’s multistep, highly complicated, and requires interactions with a lot of humans, who are highly unpredictable. You really can’t avoid “backflip steps” in such a process; by that I mean there will be things it needs to do for which there isn’t sufficient data to perfect them, so it just has to roll the dice. For example, there is no training set for “running a secret globe-spanning conspiracy”, so it will inevitably make mistakes there. If we discover it before it’s ready to defeat us, it loses. Also, by the time it pulls the trigger on its plan, there will be other AGIs around, and other examples of failed attacks that have put humanity on alert.
They’ve learned within months for certain problems where learning can be done at machine speeds, i.e., game-like problems where the AI can “play against itself”, or problems where huge amounts of data are available in machine-friendly format. But that isn’t the case for every application. For example, developing self-driving cars to perfection has taken way, way longer than expected, partially because they have to deal with freak events outside the norm, so a lot more experience and data has to be built up, which takes human time. (Of course, humans are also not great at freak events, but remember we’re aiming for perfection here.) I think most tasks involved in taking over the world will look a lot more like self-driving cars than playing Go, which inevitably means mistakes, and a lot of them.
The issue is that by combining (relatively) high-confidence and low-confidence estimates in your calculation, your resulting numbers should be low-confidence. For example, if your error bounds on AI risk vary by an order of magnitude each way (which is frankly insanely small for something this speculative), then the error bounds on your relative risk estimate would give you a value between 0.6% and 87%. With an error range like this, I don’t think the statement “my most likely reason to die young is AI x-risk” is justified.
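To show the mechanism with a quick sketch (all inputs below are made-up placeholders, not your post’s actual figures, so the endpoints differ from the 0.6%–87% above):

```python
# Illustrative only: how an order-of-magnitude error band on a speculative
# risk swamps a relative-risk comparison.
p_ai_central = 0.05  # assumed point estimate for dying young from AI
p_other = 0.03       # assumed risk of dying young from all other causes

for p_ai in (p_ai_central / 10, p_ai_central, p_ai_central * 10):
    share = p_ai / (p_ai + p_other)
    print(f"P(AI) = {p_ai:.3f} -> AI share of early-death risk = {share:.1%}")

# Output: 14.3%, 62.5%, 94.3% -- the answer swings from a minority share
# to an overwhelming one, so a headline conclusion drawn from the point
# estimate alone isn't robust to the stated uncertainty.
```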