I understand that, but my point is that I thought these were AI safety companies, and would therefore prioritize AI safety above all else. If they don’t, why do so many people still treat them as if they did?
Technoliberal
I don’t know, maybe eventually it could help, but with these “cutting edge” coding models doesn’t it seem irresponsible? What if the safeguards don’t work? Shouldn’t you release the model publicly only after you’ve exhaustively patched every single possible jailbreak? (Even then I would argue it’s still better to not release it, since billions of people means hundreds of thousands of bad actors, and again, as an AI safety company with “cutting edge” models I wouldn’t take any risks.)
So you’re saying it’s all BS? You’re saying that Anthropic and OpenAI ultimately don’t care about AI alignment? It sure seems like it to me, but browsing this website I have the feeling most people disagree with you.
I had a question. Why do all the AI safety companies seem to do the opposite of AI safety? Anthropic keeps publicly releasing models (which means they can be accessed by billions of people), same for OpenAI, and while these models are unlikely to cause major problems, if you’re releasing a product that is going to be used by billions of people you should make sure the product is around 99.9999% failure-proof. Anthropic themselves have said “AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities” when referring to Mythos. Now sure, Fable is claimed to be “safe for general use”, and maybe it is, but why take the risk? Especially after only around 2-3 months of safety testing? I would want a company that claims to be for AI safety to always err on the side of caution, but this frankly seems quite reckless.
To be more specific, by “significantly reduce existential risk” I mean “reducing the probability of all humanity (and only humanity) dying in a few short days/weeks from 50% to 10%”.
Also, I disagree with your methods. X risks aren’t especially bad because of all the utility lost (and “negative utility” created), they’re bad because after they happen there’s never any utility again. Unless apes re-evolve into humans and reestablish civilization all over again, but we’re getting too hypothetical. What’s 100, or even 1000 years of death and suffering compared to 10000 of utopia? If stalling/slowing down technological progress for 1000 years made P(Doom) go from 50% to 1%, I would definitely take it. Unless of course you think utopia is gonna be some short-lived thing, but I seriously doubt that.
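To make the arithmetic concrete, here is a rough sketch using the numbers above. The model is my own simplification, not a careful estimate: utility is counted in "utopia-years", each year of stalling is assumed to cost as much as one year of utopia is worth, and the function name and parameters are made up for illustration.

```python
def expected_utility(p_doom, utopia_years, delay_years=0.0, delay_cost_per_year=1.0):
    """Expected utility in 'utopia-year' units (a toy model, not a forecast).

    Assumes utopia is reached with probability (1 - p_doom), and that the
    cost of the delay is paid whether or not doom happens afterwards.
    """
    return (1.0 - p_doom) * utopia_years - delay_years * delay_cost_per_year


# No stalling: P(Doom) = 50%, followed by 10000 years of utopia if we survive.
rush = expected_utility(p_doom=0.50, utopia_years=10_000)

# Stalling for 1000 years: P(Doom) = 1%, with the 1000 years counted as pure cost.
stall = expected_utility(p_doom=0.01, utopia_years=10_000, delay_years=1_000)

print(f"rush: {rush:.0f}, stall: {stall:.0f}")
```

Under these assumptions, stalling comes out at roughly 8900 versus 5000. The gap only grows if utopia lasts longer than 10000 years, and it shrinks if you think each year of delay is much worse than a year of utopia is good.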
That’s fair, but I imagine X risks and S risks are very heavily correlated, especially with regard to “speed of progress”. Accelerationism will, in my view, obviously increase X risks (safety research takes time, the more time you have, the more time for research you have, the more research is done, therefore reducing risk) but also increase S risks (this is more personal opinion, but I don’t think the current leaders of AI innovation have stuff like animal welfare in mind. If we just keep chugging along, the first ASI might not care about animals at all).
I would imagine “significantly reducing” as going from 50% to 10%, but I should have been clearer.
Wrote a post about it, but the TL;DR is that extinction is THE worst case scenario. It is the end of all utility and completely irreversible, whereas progress can always be made at a later date.
Technoliberal’s Quick takes
I wanted to make this poll to see how the community views the speed/x-risk tradeoff. I’m personally 99% x-risk and 1% speed, so I would hard agree. My prediction is most people will agree, maybe a 70/30 split, but I’m curious to see.
You can’t solve every possible jailbreak, but you should solve every jailbreak humanly possible if you’re to release an AI that is claimed to be almost superhuman at cyber skills. I think current models are mostly bad for society, but I also think there’s a possibility that current models could achieve AGI. Maybe it’s only a 4% chance, but again, why take the risk? What is there to gain (other than money)?
I don’t understand how publicly releasing these models will help in researching AI safety (and when I say “AI safety” I mostly mean AGI alignment). I thought the whole point of an aligned AGI is that you don’t have to tell it to do stuff correctly, it already knows what’s correct, even more than you, so I don’t see how letting anyone use the models will help in aligning them. I’m not an AI expert or anything, but to me it seems aligning AGI is less of a “we don’t have enough data” problem and more of a “we don’t even know where to start” problem.