I basically agree with this, with one caveat: the EA and LW communities might eventually need to fight or block open-source efforts over issues like bioweapons, and it's very plausible that the open-source community would refuse to stop open-sourcing models even given clear evidence that they can immensely help with or automate biorisk. So while I think this fight was picked too early, the fighty/uncooperative parts of making AI safe might eventually matter more than is recognized today.
To respond to a local point here:
> Also, I am suspicious of framing “opposition to geoengineering” as bad—this, to me, is a red flag that someone has not done their homework on uncertainties in the responses of the climate system to large-scale interventions like albedo modification. Geoengineering the planet wrong is absolutely an X-risk.
While I can definitely buy that geoengineering could be net-negative, I don't yet see how geoengineering gone wrong actually results in X-risk, though I admittedly don't understand the issues that well.
It doesn't speak well that he frames opposition to geoengineering as automatically bad (even granting that the current arguments against geoengineering are quite weak).
This is roughly my take, with the caveat that I'd replace CEV with instruction following, and I wouldn't be so sure that alignment is easy (though I do think we can replace that assumption with the weaker one that solving the AI alignment problem is highly incentivized and that the problem is actually solvable).
Crossposting this comment from LW, because I think there is some value here:
The main points are that value alignment will be far more necessary for ordinary people to survive, no matter which institutions are adopted; that the world hasn't yet weighed in that much on AI safety and plausibly never will, but we do need to prepare for a future in which AI safety becomes mainstream; that Bayesianism is fine, actually; and many more points in the full comment.
The big reason I lean towards disagreeing nowadays is that I've come to expect the AI control/alignment problem to be much less neglected, and less important to solve, than I once thought. More generally, I've come to doubt the assumption that worlds in which we survive are worlds in which we achieve very large value (under my own value set), such that reducing existential risk is automatically good.
Late comment: I basically agree with the point being made here that we should avoid the lump-of-labor fallacy of assuming the amount of work to be done is constant, but I don't think this weakens the argument that human work will be totally replaced by AI work, for two reasons:
- In a world where AI labor can be copied extremely cheaply, wages fall for the same reason prices fall when more of a good is supplied. In particular, humans have a biological minimum wage of roughly 20-100 watts, which makes them fundamentally unemployable once AIs can be run for less than that, and if AIs are copied at huge scale, human wages are likely to fall below subsistence (a rough arithmetic sketch follows this list).
- While more total work will happen as the economy grows, it is still better to invest in AIs to do that work than to invest in humans. So even as total labor grows, human labor specifically can fall to essentially zero, which makes the full-automation hypothesis at least an economically consistent one to hold.
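To make the energy point concrete, here is a rough back-of-the-envelope sketch. The 20-100 W figure is from the bullet above; the electricity price is my own round-number assumption, and only the order of magnitude matters:

```python
# Rough arithmetic sketch: the ~20-100 W human metabolic floor expressed as an
# hourly energy cost. The electricity price is an assumed round number.
ELECTRICITY_PRICE_PER_KWH = 0.10  # USD per kWh, illustrative assumption

def energy_cost_per_hour(watts: float) -> float:
    """Cost in USD of supplying `watts` continuously for one hour."""
    return (watts / 1000.0) * ELECTRICITY_PRICE_PER_KWH

for watts in (20, 100):
    print(f"{watts:>3} W metabolic floor ~= ${energy_cost_per_hour(watts):.4f} per hour")
# Roughly $0.002/hour at 20 W and $0.010/hour at 100 W; actual human subsistence
# (food, housing, etc.) costs orders of magnitude more than this bare energy
# floor, so AI labor only has to undercut that much higher bar.
```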
To be honest, even granting the assumptions that AI alignment is achieved and that it matters who achieves AGI/ASI, I'd be much, much less confident that America should race, and I think racing is weakly negative.
One big reason for this is that the pressures AGI introduces are cross-cutting rather than nation-dependent, as in the intelligence-curse sort of scenario where elites have incentives to invest in their automated economy and leave the large non-elite population to starve or be repressed:
https://lukedrago.substack.com/p/the-intelligence-curse
I think this might not be irrationality, but a genuine difference in values.
In particular, I think something like a discount-rate disagreement is at the core of a lot of disagreements on AI safety, and to be blunt, you shouldn't expect convergence unless you successfully persuade people on that underlying disagreement.
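As a toy illustration of how a discount-rate difference alone, with no factual disagreement, can drive very different conclusions about far-future risks (all numbers here are my own illustrative choices):

```python
# Minimal sketch: how a fixed payoff 100 years in the future is weighted under
# different annual discount rates. Purely illustrative numbers.
def present_value_factor(annual_rate: float, years: float) -> float:
    """Multiplier applied to a payoff `years` in the future under exponential discounting."""
    return 1.0 / (1.0 + annual_rate) ** years

for rate in (0.0, 0.01, 0.05):
    print(f"discount rate {rate:.0%}: payoff 100 years out is weighted {present_value_factor(rate, 100):.3f}x")
# 0% -> 1.000x, 1% -> ~0.370x, 5% -> ~0.008x: a 5% discounter weights the same
# future over 100x less than a 0% discounter, before any factual disagreement.
```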
I ultimately decided to vote for the animal welfare groups, because I believe that animal welfare, in both its farmed and wild variants, is probably one of the largest and most robust problems in the world. With the exception of groups that form the logistical/epistemic backbone of the movement (they are valuable for gathering data and making sure the animal welfare groups can act), I've become more skeptical that other causes are robustly net-positive, especially reducing existential risks.
This sounds very much like the missile gap/bomber gap narrative, and yeah this is quite bad news if they actually adopt the commitments pushed here.
The evidence that China is racing to AGI is, quite frankly, very thin, and I can see a very dangerous arms race coming:
I honestly agree with this post. To translate it into my own framing: we should rather have AI that is superhuman at faithful chain-of-thought (CoT) reasoning than AI that is superhuman at wise forward-pass thinking.
The Peter Singer/Einstein-style legible reasoning corresponds to CoT reasoning, whereas a lot of the directions for intuitive, wise/illegible thinking depend on making forward-pass thinking more capable, which is not a great direction for reasons of trust and alignment.
In retrospect, I agree more with 3: while I still think AI timelines are plausibly very short, after-2030 timelines now seem reasonably plausible from my perspective.
I have become less convinced that takeoff speed, from the perspective of the state, will be slow. This is slightly because entropix reduced my confidence in the view that algorithmic progress won't suddenly go critical and make AI radically better, and more so because I now think there will be less flashy, public progress. More importantly, I think the gap between consumer AI and the internal AI used at OpenAI will only widen, so I don't expect many more GPT-4-style moments where people are wowed by AI and become very concerned about it.
So I expect AI governance to have less salience, by the time AIs can automate AI research, than the current AI governance field thinks, which means I've reduced my probability of a strong societal response from roughly 80-90% to only 45-60%.
This is mostly correct as a summary of my position, but on point 6 I want to note that while it is technically true, I fear the economic incentives are against this path.
Agree with the rest of the summary though.
My disagreements with “AGI ruin: A List of Lethalities”
I basically grant 2, sort of agree with 1, and drastically disagree with 3 (that timelines will be long).
Which makes me a bit of an outlier: while I do have real confidence in the basic story that governments are likely to influence AI a lot, I have my doubts that governments will try to regulate AI seriously, especially if timelines are short enough.
Yeah, several of the comments have much more severe issues than tone or stylistic choices, like rewording ~every claim by Ben, Chloe, and Alice, and then assuming that the transformed claims had the same truth value as the originals.
I'm in a position very similar to Yarrow's here: while Kat Woods has mostly convinced me that the most incendiary claims are likely false, and I'm sympathetic to the case for suing Ben and Habryka, there were dangerous red flags in the responses, so much so that I'd stop funding Nonlinear entirely, and I think it's quite bad that Kat Woods responded the way they did.
I unendorsed primarily because, apparently, the board didn't fire Sam Altman over safety concerns, though I'm not sure this is accurate.
It seems the board did not fire Sam Altman for safety reasons, but for other reasons. Utterly confusing, and IMO it demolishes my previous theory, though a lot of other theories also lost out.
Sources below, with their archive versions included:
https://twitter.com/norabelrose/status/1726635769958478244
https://twitter.com/eshear/status/1726526112019382275
While I generally agree that they almost certainly have more information on what happened, which is why I'm not really certain of this theory, my main reason is this: for the most part, AI safety as a cause got away with incredibly weak standards of evidence for a long time, until the deep-learning era from 2019 onwards, especially with all the evolution analogies, and even now it still tends to have very low standards (though I do believe it's slowly improving). This probably influenced a lot of EA safetyists like Ilya, who almost certainly absorbed the norms of the AI safety field, one of which is that a very low standard of evidence is enough to claim big things, and that norm is going to conflict with corporate/legal standards of evidence.
> But I don’t think most people who hold influential positions within EA (or EA-minded people who hold influential positions in the world at large, for that matter) are likely to be that superficial in their analysis of things. (In particular, I’m strongly disagreeing with the idea that it’s likely that the board “basically had no evidence except speculation from the EA/LW forum”. I think one thing EA is unusually good at – or maybe I should say “some/many parts of EA are unusually good at” – is hiring people for important roles who think for themselves and have generally good takes about things and acknowledge the possibility of being wrong about stuff. [Not to say that there isn’t any groupthink among EAs. Also, “unusually good” isn’t necessarily that high of a bar.])
I agree with this weakly, in the sense that being high up in EA is at least a slight update towards someone actually thinking things through and being able to make a real case. My disagreement is that this effect is probably not strong enough to wash away the cultural effects of operating in a cause area where, for many reasons, people don't need to meet any standard of evidence beyond long-winded blog posts and still get rewarded.
Also, the board second-guessed its decision, which is evidence for the theory that they couldn't make a case that met the standard of evidence of a corporate/legal setting.
If it were any other cause, say GiveWell or some other causes in EA, I would trust them much more to have good reasons. But AI safety has been so reliant on very low to non-existent standards of evidence and epistemics that they probably couldn't explain themselves in a way that would abide by the strictness of a corporate/legal standard of evidence.
Edit: The firing wasn't because of safety-related concerns.
While finm made a general comment in response to you, I want to specifically focus on the footnote, because I think it’s a central crux in why a lot of EAs are way less doomy than you.
Quote below:
I think the 13 nines of reliability can be reduced to something requiring closer to 1-2 nines at the very least, for two reasons (a rough probability sketch follows these two points):
1. I think you drastically overestimate how many chances the AI gets to act on misalignment, because those trillions of executions will use far, far too little compute per single action to lead to a takeover, even in worlds where a software intelligence explosion is possible and large software efficiencies are gainable. I expect at most millions of such plausible actions, and the more likely scenario is closer to hundreds or thousands of chances to escape, undermining this tweet:
https://x.com/gcolbourn/status/1762224406955216897
2. If we manage to catch an AI doing bad stuff, it becomes much, much harder for the AI to escape afterwards, because there are a lot of techniques that can be applied to make the issue go away, like proliferating the caught escape example.
More from Ryan Greenblatt here:
https://www.lesswrong.com/posts/i2nmBfCXnadeGmhzW/catching-ais-red-handed
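To make the nines arithmetic explicit, here is a minimal sketch under strong simplifying assumptions (independent, equally risky actions and a 50% overall survival target; the N values are mine, picked only to show the scaling):

```python
# Minimal sketch: per-action reliability ("nines") needed so that an AI with N
# independent chances to escape still fails every single time with ~50%
# probability. Independence and the 50% target are my simplifying assumptions.
import math

def nines_required(n_chances: float, overall_survival: float = 0.5) -> float:
    """Approximate nines of per-action reliability r such that r**N >= overall_survival."""
    per_action_failure = -math.log(overall_survival) / n_chances  # first-order approximation
    return -math.log10(per_action_failure)

for n in (1e13, 1e6, 1e2):
    print(f"N = {n:.0e} chances -> ~{nines_required(n):.1f} nines of per-action reliability")
# ~13.2 nines at N = 1e13, ~6.2 nines at N = 1e6, ~2.2 nines at N = 1e2: the
# reliability bar depends heavily on how many real chances the AI actually gets.
```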
I definitely think alignment needs to be reliable, but I do not think it needs to be so reliable that we cannot achieve it, nor do I think doom is very likely and the probabilities unchangeable.
I'd certainly say it's quite scary, but I think there's a reasonable enough hope of surviving and going on to thrive that alignment investment is worth the money.