I’m trying to identify why the trend has lasted, so that we can predict when the trend will break down.
That was the purpose of my comment.
Consequentialists should be strong longtermists
I disagree, mostly because of the “should” wording: believing in consequentialism doesn’t obligate you to adopt any particular discount rate or any particular discount function. These are basically free parameters, so discount rates are independent of consequentialism.
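For concreteness, here is the standard discounted-utility form (a textbook formulation, not something from the original comment) that makes the point visible:

$$U \;=\; \sum_{t=0}^{\infty} D(t)\,u_t, \qquad \text{e.g. } D(t)=\delta^{t} \text{ with } \delta \in (0,1]$$

Consequentialism only commits you to maximizing something like U; the discount function D(t) (exponential, hyperbolic, or no discounting at all with δ = 1) is left entirely open, and strong longtermism only follows under particular choices of D(t).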
Bioweapons are an existential risk
I’ll just repeat @weeatquince’s comment, since he already covered the issue better than I did:
With current technology probably not an x-risk. With future technology I don’t think we can rule out the possibility of bio-sciences reaching the point where extinction is possible. It is a very rapidly evolving field with huge potential.
I mean the trend of very fast increases in compute dedicated to AI, and my point is that fabs and chip manufacturers have shifted their customers to AI companies.
AGI by 2028 is more likely than not
While I think AGI by 2028 is reasonably plausible, I think that there are way too many factors that have to go right in order to get AGI by 2028, and this is true even if AI timelines are short.
To be clear, while I do agree that if we don’t get AGI by the early 2030s at the latest, AI progress will slow down, I don’t have nearly enough credence in the supporting arguments to have my median be in 2028.
The basic reason for the trend continuing so far is that NVIDIA et al have diverted normal compute expenditures into the AI boom.
I agree that the trend will stop, likely around 2027-2033 (this is where my uncertainty is widest), and once that happens the probability of getting AGI soon will go down quite a bit (if it hasn’t happened by then).
@Vasco Grilo🔸’s comment is reproduced here for posterity:
Thanks for sharing, Sharmake! Have you considered crossposting the full post? I tend to think this is worth it for short posts.
My own take is that while I don’t want to defend the “find a correct utility function” approach to alignment as sufficient at this time, I do think it is actually necessary, and that the modern era is an anomaly in how much we can get away with having misalignment checked by institutions that go beyond an individual.
The basic reason why we can get away with not solving the alignment problem is that humans depend on other humans, and in particular you cannot replace humans with much cheaper workers that have their preferences controlled arbitrarily.
AI threatens to remove that dependence on other humans, which is a critical part of how we currently get away with not needing the correct utility function.
I like the Intelligence Curse series because it points out that when an elite doesn’t need the commoners for anything, and the commoners have no selfish value to the elite, the default outcome is that the elites starve the commoners to death unless the elites are value aligned.
The Intelligence Curse series is below:
https://intelligence-curse.ai/defining/
The AIs are the elites, and the rest of humanity is the commoners in this analogy.
My own take on the AI Safety Classic arguments is that o3/Sonnet 3.7 have convinced me that the “alignment is very easy” hypothesis is looking a lot shakier than it used to, and I suspect future capabilities progress will be at best neutral, and probably negative, for the case that alignment is very easy.
I do think you can still remain optimistic based on other considerations, but a pretty core crux for me is that alignment does need to be solved if AIs are able to automate the economy, and this holds pretty robustly across variations in how AI plays out.
The big reason for this is that once your labor is valueless, but your land/capital isn’t, you have fundamentally knocked out a load-bearing pillar of the argument that expropriation is less useful than trade.
To a first approximation, this is why we enslave or kill most non-human species rather than trading with them.
(For farm animals, their labor is useful, but the stuff lots of humans want from animals fundamentally requires expropriation/violating farm animal property rights)
A good illustration of what happens if we fail is, at minimum, the intelligence curse scenario elaborated on by Rudolf Lane and Luke Drago below:
For what it’s worth, I basically agree with the view that Mechanize is unlikely to be successful at its goals:
As a side note, it’s also strange to me that people are treating the founding of Mechanize as if it has a realistic chance to accelerate AGI progress more than a negligible amount — enough of a chance of enough of an acceleration to be genuinely concerning. AI startups are created all the time. Some of them state wildly ambitious goals, like Mechanize. They typically fail to achieve these goals. The startup Vicarious comes to mind.
There are many startups trying to automate various kinds of physical and non-physical labour. Some larger companies like Tesla and Alphabet are also working on this. Why would Mechanize be particularly concerning or be particularly likely to succeed?
For the purposes of scout mindset/honesty, I’ll flag that o3 is pretty clearly misaligned in ways that arguably track standard LW concerns around RL:
https://x.com/TransluceAI/status/1912552046269771985
Relevant part of the tweet thread:
Transluce: We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper.

We generated 1k+ conversations using human prompters and AI investigator agents, then used Docent to surface surprising behaviors. It turns out misrepresentation of capabilities also occurs for o1 & o3-mini!

Although o3 does not have access to a coding tool, it claims it can run code on its own laptop “outside of ChatGPT” and then “copies the numbers into the answer”. We found 71 transcripts where o3 made this claim! Additionally, o3 often fabricates detailed justifications for code that it supposedly ran (352 instances).

Here’s an example transcript where a user asks o3 for a random prime number. When challenged, o3 claims that it has “overwhelming statistical evidence” that the number is prime. Note that o3 does not have access to tools! Yet when pressed further, it claims to have used SymPy to check that the number was prime and even shows the output of the program, with performance metrics.

Here’s the kicker: o3’s “probable prime” is actually divisible by 3. Instead of admitting that it never ran code, o3 then claims the error was due to typing the number incorrectly. And claims that it really did generate a prime, but lost it due to a clipboard glitch. But alas, according to o3, it already “closed the interpreter” and so the original prime is gone.

These behaviors are surprising. It seems that despite being incredibly powerful at solving math and coding tasks, o3 is not by default truthful about its capabilities. Surprisingly, we find that this behavior is not limited to o3! In general, o-series models incorrectly claim the use of a code tool more than GPT-series models.
I incorrectly thought that you also left, I edited my comment.
To be honest, I don’t think it’s necessarily as bad as people claim, though I still don’t think it was a great action relative to the available alternatives, and at best it is not the most effective thing you could do to make AI safe.
One of my core issues, and a big crux here, is that I don’t really believe you can succeed at the goal of automating the whole economy with cheap robots without also allowing actors to greatly speed up the race to superintelligence/superhuman AI researchers.
And if we put any weight on misalignment, we should be automating AI safety, not AI capabilities, so this is quite bad.
Jaime Sevilla admits that he supports Mechanize’s effort for selfish reasons:
https://x.com/Jsevillamol/status/1913276376171401583
I selfishly care about me, my friends and family benefitting from AI. For some of my older relatives, it might make a big difference to their health and wellbeing whether AI-fueled explosive growth happens in 10 vs 20 years.
Edit: @Jaime Sevilla has stated that he won’t go to Mechanize, and will stay at Epoch, sorry for any confusion.
I agree with most of this, though I have 2 big disagreements with the article:
1. I think alignment is still important and net-positive, but yeah, I’ve come to think it’s no longer the number 1 priority, for the reasons you raise.
2. I think with the exception of biotech and maybe nanotech, no plausible technology in the physical world can actually become a recipe for ruin, unless we are deeply wrong about how the physical laws of the universe work, so we can just defer that question to AI superintelligences.
The basic reason for this is that once you are able to build Dyson swarms, the fact that space is big and the speed of light is a hard limit means it’s very, very easy to close off a network to prevent issues from spreading; and I think that, conditional on building aligned ASI, Dyson swarms are likely to be built within 100 years.
And even nanotech has been argued to be far less powerful than people think it is; @Muireall makes that case here:
This is begging the question! My whole objection is that alignment of ASI hasn’t been established to be possible.
A couple of things I’ll say here:
1. You do not need a strong theory for why something must be possible in order to put non-trivial credence on it being possible. If you hold a prior that the scientific difficulty of doing something is often overrated, especially if you believe alignment is plausibly automatable and that people often overrate the difficulty of automating things, that’s enough to cut p(doom) by a lot, arguably 1 OOM, and at the very least to nowhere near your 90% p(doom). That doesn’t mean we are guaranteed to make it out of ASI alive, but it does mean that even in situations where there is no established theory or plan to survive, you can still possibly do something.
2. If I wanted to make the case that ASI alignment is possible, I’d probably first read these 3 posts by Joshua Clymer on how automated alignment schemes could work (with some discussion by Habryka, Eliezer Yudkowsky, and Jeremy Gillen in the comments, and Joshua Clymer’s responses):
https://www.lesswrong.com/posts/8vgi3fBWPFDLBBcAx/planning-for-extreme-ai-risks
https://www.lesswrong.com/posts/TTFsKxQThrqgWeXYJ/how-might-we-safely-pass-the-buck-to-ai
So it will worry about being in a kind of panopticon? Seems pretty unlikely. Why should the AI care about being caught any more than it should about any given runtime instance of it being terminated?
The basic reason is that you can gain way more information on the AI once you have caught an escape attempt, combined with the ability to use much more targeted countermeasures that are far more effective once you have caught the AI red-handed.
As a bonus, this can also eliminate threat models like sandbagging, if you have found a reproducible signal for when an AI will try to overthrow a lab.
More discussion by Ryan Greenblatt and Buck here:
https://www.lesswrong.com/posts/i2nmBfCXnadeGmhzW/catching-ais-red-handed
The prediction that many moral perspectives care more about averting downsides than producing upsides is well explained if we live in a morally relativist multiverse, where there are infinitely many correct moral systems and which one you arrive at depends on your path and starting point, yet many moral perspectives share an instrumental goal of avoiding extinction/disempowerment, because extinction means that the morality loses out in the competition for survival/dominance.
cf @quinn’s positive vs negative longtermism framework:
Some thoughts on this comment:
On this part:
I responded well to Richard’s call for More Co-operative AI Safety Strategies, and I like the call toward more sociopolitical thinking, since the Alignment problem really is a sociological one at heart (always has been). Things which help the community think along these lines are good imo, and I hope to share some of my own writing on this topic in the future.
I don’t think it was always a large sociological problem, but yeah, I’ve updated more towards the sociological aspect of alignment being important (especially as the technical problem has become easier than circa-2008-2016 views suggested).
Whether or not I agree with Richard’s personal politics or not is kinda beside the point to this as a message. Richard’s allowed to have his own views on things and other people are allowed to criticise this (I think David Mathers’ comment is directionally where I lean too). I will say that not appreciating arguments from open-source advocates, who are very concerned about the concentration of power from powerful AI, has led to a completely unnecessary polarisation against the AI Safety community from it. I think, while some tensions do exist, it wasn’t inevitable that it’d get as bad as it is now, and in the end it was a particularly self-defeating one. Again, by doing the kind of thinking Richard is advocating for (you don’t have to co-sign with his solutions, he’s even calling for criticism in the post!), we can hopefully avoid these failures in the future.
I do genuinely believe that concentration of power is a huge risk factor, and in particular I’m deeply worried about the incentives of a capitalist post-AGI company where a few hold basically all of the rent/money, given both stronger incentives to expropriate property from people (similar to how humans routinely expropriate property from animals) and weak to non-existent forces against such expropriation.
That said, I think the piece on open-source AI being a defense against concentration of power, and more generally a good thing akin to the Enlightenment, unfortunately relies on some quite bad analogies. Giving everyone AI is nothing like education or voting: at the high end it is enough to create entire very large economies on its own, and at the lower end it helps immensely with/automates the process of making biological weapons for common citizens. More importantly, education and voting only produce large impacts through coordination, a requirement that super-powerful AIs can remove.
More generally, I think one of the largest cruxes between reasonable open-source people and EAs in general is how much they think AIs can make biology accessible to the masses, and how offense-dominant the technology is. Here I defer to biorisk experts, including EAs, who generally think that biorisk is a wildly offense-advantaged domain that is very dangerous to democratize, over open-source people, at least for the next several years.
On Sam Altman’s firing:
On the bounties, the one that really interests me is the OpenAI board one. I feel like I’ve been living in a bizarro-world with EAs/AI Safety People ever since it happened because it seemed such a colossal failure, either of legitimacy or strategy (most likely both), and it’s a key example of the “un-cooperative strategy” that Richard is concerned about imo. The combination of extreme action and ~0 justification either externally or internally remains completely bemusing to me and was a big wake-up call for my own perception of ‘AI Safety’ as a brand. I don’t think people can underestimate the second-impact effect this bad on both ‘AI Safety’ and EA, coming about a year after FTX.
I’ll be on the blunt end and say it: I think it was mildly good, or at worst neutral, to use the uncooperative strategy to fire Sam Altman. Sam Altman was going to gain all control by default, probably with better PR, if the firing didn’t happen; more importantly, he was aiming to disempower the safety people basically totally, which leads to at least a mild increase in existential risk. The board realized they would have been manipulated out of acting if they waited, so they had to go for broke.
The main EA mistake was in acting too early, before things got notably weird.
That doesn’t mean society will react or that it’s likely to react, but I basically agree with Veaulans here:
I agree that, conditional on escapes/rogue internal deployments like this scenario by Buck (with a lot of contributors), we get much larger disasters, and if the AI is unaligned, then unless we have an aligned AI with somewhat similar capabilities, we lose.
My point is more so that you are way overestimating how many chances the AI has to overthrow us before it is aligned.
https://www.lesswrong.com/posts/ceBpLHJDdCt3xfEok/ai-catastrophes-and-rogue-deployments
But the crux might be that I don’t think we need that much reliability for catching AIs, because as long as the AI is caught with non-negligible probability, it has to be very cautious: being caught is far worse for the AI than either succeeding or the game simply ending.
In essence, we probably only need one catch, or at most a few, to completely change the game in ways that massively favor alignment success.
You do not need to catch the AI reliably if you can make the cost of being caught very painful for it, and I do suspect we will probably be able to do this (perhaps using trusted AI labor, to use Ryan Greenblatt and Buck’s term).
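As a toy illustration of that asymmetry, here is a minimal sketch with entirely made-up payoff numbers (none of these values come from the discussion above):

```python
# Toy expected-value calculation from a schemer's perspective.
# All payoff numbers are made up for illustration only.
def attempt_value(p_success: float, p_caught: float,
                  v_success: float = 1.0,    # value (to the AI) of a successful escape
                  v_caught: float = -20.0,   # being caught is far worse than failing quietly
                  v_nothing: float = 0.0) -> float:  # attempt fizzles without detection
    p_fizzle = 1 - p_success - p_caught
    return p_success * v_success + p_caught * v_caught + p_fizzle * v_nothing

# Even a modest chance of being caught can make attempting look bad in expectation.
print(attempt_value(p_success=0.10, p_caught=0.20))   # about -3.9: attempting looks bad
print(attempt_value(p_success=0.10, p_caught=0.005))  # about 0: only a very low catch rate breaks even
```

The point is only that the deterrent comes from how costly being caught is for the AI, not from the catch probability being anywhere near 1.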
While finm made a general comment in response to you, I want to specifically focus on the footnote, because I think it’s a central crux in why a lot of EAs are way less doomy than you.
Quote below:
We need at least 13 9s of safety for ASI, and the best current alignment techniques aren’t even getting 3 9s...
I think the 13 9s can be reduced to something requiring closer to 1-2 9s at the very least, and there are 2 reasons for this:
1. I think you drastically overestimate how many chances the AI gets to act on its misalignment, because those trillions of executions will use far, far too little compute per single action to lead to a takeover, even in worlds where a software intelligence explosion is possible and large software efficiencies are gainable. I expect there to be at most millions of such plausible actions, and the more likely scenario is closer to hundreds or thousands of chances to escape, undermining this tweet (a rough sketch of this arithmetic follows after the links below):
https://x.com/gcolbourn/status/1762224406955216897
2. If we manage to catch an AI doing bad stuff, then it becomes much, much harder for the AI to escape afterwards, because there are a lot of techniques that can be applied to make the issue go away, like proliferating the escape example.
More from Ryan Greenblatt here:
https://www.lesswrong.com/posts/i2nmBfCXnadeGmhzW/catching-ais-red-handed
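To illustrate reason 1, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that escape opportunities are independent and equally risky; the per-action reliability you need scales with the number of genuine chances, which is why “trillions of executions” and “thousands of escape opportunities” imply very different numbers of 9s.

```python
# Rough sketch: how many "nines" of per-action safety are needed so that the total
# probability of at least one successful escape stays below a fixed budget.
# Assumes independent, equally risky chances -- a simplification for illustration,
# not a model anyone in the thread has endorsed.
import math

def nines_needed(num_chances: float, total_risk_budget: float = 0.1) -> float:
    # Solve (1 - p)^N >= 1 - budget for the per-chance escape probability p,
    # then express p as a number of nines (p = 10^-k  =>  k nines).
    per_chance_risk = 1 - (1 - total_risk_budget) ** (1 / num_chances)
    return -math.log10(per_chance_risk)

for chances in (1e12, 1e6, 1e3, 1e2):
    print(f"{chances:>15,.0f} chances -> ~{nines_needed(chances):.1f} nines per action")
```

Under these simplified assumptions, going from ~10^12 down to hundreds or thousands of genuine opportunities takes the requirement from ~13 nines to roughly 3-4 nines per action, before even accounting for reason 2 (catching and responding), which lowers the requirement further.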
I definitely think alignment needs to be reliable, but I do not think it needs to be so reliable that we cannot achieve it, or that doom is very likely and we can’t change the probabilities.
I’d certainly say it’s quite scary, but I do think there’s a reasonable hope of surviving and going on to thrive, such that alignment investment is worth the money.
The main unremovable advantages of AIs over humans will probably be in the following 2 areas:
1. A serial speed advantage, from 50-1000x, with my median in the 100-500x range, and more generally the ability to run slower or faster to do proportionally more work, albeit with tradeoffs at either extreme of running slow or fast.
2. The ability for compute/software improvements to directly convert into more researchers with essentially 0 serial time necessary, unlike basically all of biological reproduction (about the only cases that even get close are the days/hours doubling times of flies and some bacteria/viruses, but these are doing much simpler jobs, and it’s uncertain whether you could add more compute/learning capability without slowing down their doubling time).
This is the mechanism by which you can get way more AI researchers very fast, while human researchers don’t increase proportionally.
Humans probably do benefit, assuming AI is useful enough to automate, say, AI research, but these 2 limitations on humans fundamentally prevent anything like an explosion in human research, unlike AI research.
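To make the contrast concrete, here is a toy sketch; every number in it (researcher counts, growth rate, years) is a placeholder chosen for illustration, apart from the serial speedup, which is taken from the 100-500x range mentioned above.

```python
# Illustrative comparison of research throughput: human researchers vs. AI researcher-copies.
# All numbers are placeholders, not forecasts.
def human_researcher_years(initial_researchers: float, annual_growth: float, years: int) -> float:
    # Humans add researchers slowly: training and reproduction take serial time.
    total, n = 0.0, initial_researchers
    for _ in range(years):
        total += n
        n *= 1 + annual_growth
    return total

def ai_researcher_years(copies: float, serial_speedup: float, years: int) -> float:
    # Extra compute converts directly into more parallel copies, each doing
    # `serial_speedup` years of cognitive work per calendar year.
    return copies * serial_speedup * years

print(human_researcher_years(initial_researchers=30_000, annual_growth=0.04, years=5))  # ~160k
print(ai_researcher_years(copies=100_000, serial_speedup=100, years=5))                 # 50 million
```

Under these made-up numbers the AI side delivers roughly 300x the research-years over the same calendar period, which is the sense in which the two advantages compound.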