AI strategy & governance. ailabwatch.org.
Zach Stein-Perlman
I agree this is possible, and I think a decent fraction of the value of “AI welfare” work comes from stuff like this.
Those humans decide to dictate some or all of what the future looks like, and lots of AIs end up suffering in this future because their welfare isn’t considered by the decision makers.
This would be very weird: it requires either that the value-setters are very rushed or that they have lots of time to consult with superintelligent advisors but still make the wrong choice. Both paths seem unlikely.
Among your friends, I agree; among EA Forum users, I disagree.
Caveats:
I endorse the argument that we should figure out how to use LLM-based systems without accidentally torturing them, because they’re more likely to take catastrophic actions if we’re torturing them.
I haven’t tried to understand the argument that we should try to pay AIs to [not betray us / tell on traitors / etc.] and that working on AI-welfare stuff would help us offer AIs payment better; there might be something there.
I don’t understand the decision theory mumble mumble argument; there might be something there.
(Other than that, it seems hard to tell a story about how “AI welfare” research/interventions now could substantially improve the value of the long-term future.)
(My impression is these arguments are important to very few AI-welfare-prioritizers / most AI-welfare-prioritizers have the wrong reasons.)
My position on “AI welfare”
1. If we achieve existential security and launch the von Neumann probes successfully, we will be able to do >>10^80 operations in expectation. We could tile the universe with hedonium or do acausal trade or something and it’s worth >>10^60 happy human lives in expectation. Digital minds are super important.
2. Short-term AI suffering will be small-scale (less than 10^40 FLOP and far from optimized for suffering, even if suffering is incidental) and worth <<10^20 happy human lives (very likely <10^10).
3. 10^20 isn’t even a feather in the scales when 10^60 is at stake.
4. “Lock-in” [edit: of “AI welfare” trends on Earth] is very unlikely; potential causes of short-term AI suffering (like training and deploying LLMs) are very different from potential causes of astronomical-scale digital suffering (like tiling the universe with dolorium, the arrangement of matter optimized for suffering). And digital-mind-welfare research doesn’t need to happen yet; there will be plenty of subjective time for it before the von Neumann probes’ goals are set.
5. Therefore, to a first approximation, we should not trade off existential security for short-term AI welfare, and normal AI safety work is the best way to promote long-term digital-mind-welfare.
[Edit: the questionable part of this is #4.]
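To make the scale comparison concrete, here is a rough sketch of the arithmetic. It assumes a simple linear expected-value model (my framing for illustration, not something the argument depends on) and treats ">>" and "<<" as plain bounds; the function name is made up for this example.

```python
from fractions import Fraction

# Magnitudes taken from the list above; ">>" and "<<" are treated as plain bounds.
LONG_TERM_LIVES = 10**60          # long-term stakes, in happy human lives
SHORT_TERM_LIVES_MAX = 10**20     # stated upper bound on short-term AI suffering
SHORT_TERM_LIVES_LIKELY = 10**10  # the "very likely" bound


def break_even_risk(short_term: int, long_term: int) -> Fraction:
    """Existential-risk increase whose expected cost equals the short-term stakes.

    Assumes a linear expected-value model: cost = risk * long_term.
    """
    return Fraction(short_term, long_term)


upper = break_even_risk(SHORT_TERM_LIVES_MAX, LONG_TERM_LIVES)
likely = break_even_risk(SHORT_TERM_LIVES_LIKELY, LONG_TERM_LIVES)
print(upper == Fraction(1, 10**40))
print(likely == Fraction(1, 10**50))
```

Under those assumptions, even the upper bound on short-term stakes is matched by a 10^-40 change in the probability of existential security.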
I am surprised that you don’t understand Eliezer’s comments in this thread. I claim you’d do better to donate $X to PauseAI now than lock up $2X which you will never see again (plus lock up more for overcollateralization) in order to get $X to PauseAI now.
For anyone who wants to bet on doom:
I claim it can’t possibly be good for you
Unless you plan to spend all of your money before you would owe money back
People seem to think what matters is ∫bankroll when what actually matters is ∫consumption?
Or unless you’re betting on high rates of returns to capital, not really on doom
Good news: you can probably borrow cheaply. E.g. if you have $2X in investments, you can sell them, invest $X at 2x leverage, and effectively borrow the other $X.
No beliefs make this rational for Greg.
Greg made a bad bet. He could do strictly better, by his lights, by borrowing 10K, giving it to PauseAI, and paying back ~15K (10K + high interest) in 4 years. (Or he could just donate 10K to PauseAI. If he’s unable to do this, Vasco should worry about Greg’s liquidity in 4 years.) Or he could have gotten a better deal by betting with someone else; if there was a market for this bet, I claim the market price would be substantially more favorable to Greg than paying back 200% (plus inflation) over <4 years.
[Edit: the market for this bet is, like, the market for 4-year personal loans.]
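A quick sketch of the comparison, using only the figures above (10K borrowed, ~15K repaid after 4 years, versus repaying 200% over 4 years). The annual compounding convention and the function name are my assumptions; the bet's rate is a lower bound since it ignores the inflation adjustment and the "<4 years" horizon.

```python
def implied_annual_rate(repay_multiple: float, years: float) -> float:
    """Annually compounded rate r such that (1 + r) ** years == repay_multiple."""
    return repay_multiple ** (1.0 / years) - 1.0


# Loan alternative from the comment: borrow 10K, repay ~15K after 4 years.
loan_rate = implied_annual_rate(15_000 / 10_000, 4)

# The bet: receive 10K now, repay 200% of it (before inflation) within 4 years.
bet_rate = implied_annual_rate(2.0, 4)

print(f"loan: {loan_rate:.1%} per year")
print(f"bet:  {bet_rate:.1%} per year (lower bound)")
```

On these numbers the loan costs roughly 11% per year and the bet at least roughly 19% per year, which is the sense in which the loan does strictly better.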
Yes, I’ve previously made some folks at Anthropic aware of these concerns, e.g. associated with this post.
In response to this post, Zac Hatfield-Dodds told me he expects Anthropic will publish more information about its governance in the future.
I claim that public information is very consistent with the hypothesis that the investors hold an axe over the Trust; maybe the Trust will cause the Board to be slightly better or the investors will abrogate the Trust or the Trustees will loudly resign at some point; regardless, the Trust is very subordinate to the investors and won’t be able to do much.
And if so, I think it’s reasonable to describe the Trust as “maybe powerless.”
Maybe. Note that they sometimes brag about how independent the Trust is and how some investors dislike it, e.g. Dario:
Every traditional investor who invests in Anthropic looks at this. Some of them are just like, whatever, you run your company how you want. Some of them are like, oh my god, this body of random people could move Anthropic in a direction that’s totally contrary to shareholder value.
And I’ve never heard someone from Anthropic suggest this.
I suspect the informal agreement was nothing more than the UK AI safety summit “safety testing” session, which is devoid of specific commitments.
I agree such commitments are worth noticing and I hope OpenAI and other labs make such commitments in the future. But this commitment is not huge: it’s just “20% of the compute we’ve secured to date” (in July 2023), to be used “over the next four years.” It’s unclear how much compute this is, and with compute use increasing exponentially it may be quite little in 2027. Possibly you have private information but based on public information the minimum consistent with the commitment is quite little.
It would be great if OpenAI or others committed 20% of their compute to safety! Even 5% would be nice.
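To illustrate why a fixed commitment shrinks under exponential growth, here is a toy calculation. Everything beyond the "20%" and "four years" figures is an assumption for illustration: the growth factors are hypothetical, the commitment is assumed to be spread evenly across the four years, and total compute is assumed to grow by a constant factor each year from the July 2023 baseline.

```python
def committed_share(growth_per_year: float, years_elapsed: int,
                    committed_fraction: float = 0.20,
                    spread_years: int = 4) -> float:
    """Share of a lab's compute in a given year that the commitment covers.

    Baseline compute (July 2023) is normalized to 1. The committed amount is
    spread evenly over `spread_years` (an assumption), and total compute is
    assumed to grow by `growth_per_year` each year (hypothetical values).
    """
    per_year_commitment = committed_fraction / spread_years
    total_compute = growth_per_year ** years_elapsed
    return per_year_commitment / total_compute


# Hypothetical annual growth factors; none of these come from public data.
for growth in (1.0, 2.0, 3.0):
    print(f"growth x{growth:g}/yr -> share in 2027: {committed_share(growth, 4):.4%}")
```

With no growth the share stays at 5% per year; with compute doubling annually it falls to about 0.3% by 2027 under these assumptions.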
In November, leading AI labs committed to sharing their models before deployment to be tested by the UK AI Safety Institute.
I suspect Politico hallucinated this / there was a game-of-telephone phenomenon. I haven’t seen a good source on this commitment. (But I also haven’t heard people at labs say “there was no such commitment.”)
The original goal involved getting attention. Weeks ago, I realized I was not on track to get attention. I launched without a sharp object-level goal but largely to get feedback to figure out whether to continue working on this project and what goals it should have.
I share this impression. Unfortunately it’s hard to capture the quality of labs’ security with objective criteria based on public information. (I have disclaimers about this in 4-6 different places, including the homepage.) I’m extremely interested in suggestions for criteria that would capture the ways Google’s security is good.
Not necessarily. But:
There are opportunity costs and other tradeoffs involved in making the project better along public-attention dimensions.
The current version is bad at getting public attention; improving it and making it get 1000x public attention would still leave it with little; likely it’s better to wait for a different project that’s better positioned and more focused on getting public attention. And as I said, I expect such a project to appear soon.
Yep. But in addition to being simpler, the version of this project optimized for getting attention has other differences:
Criteria are better justified, more widely agreeable, and less focused on x-risk
It’s done—or at least endorsed and promoted—by a credible org
The scoring is done by legible experts and ideally according to a specific process
Even if I could do this, it would be effortful and costly and imperfect and there would be tradeoffs. I expect someone else will soon fill this niche pretty well.
Yep, that’s related to my “Give some third parties access to models to do model evals for dangerous capabilities” criterion. See here and here.
As I discuss here, it seems DeepMind shared super limited access with UKAISI (only access to a system with safety training + safety filters), so don’t give them too much credit.
I suspect Politico is wrong and the labs never committed to give early access to UKAISI. (I know you didn’t assert that they committed that.)
Briefly + roughly (not precise):
At some point we’ll send out lightspeed probes to tile the universe with some flavor of computronium. The key question (for scope-sensitive altruists) is what that computronium will compute. Will an unwise agent or incoherent egregore answer that question thoughtlessly? I intuit no.
I can’t easily make this intuition legible. (So I likely won’t reply to messages about this.)