Zach Stein-Perlman (AI strategy & governance, ailabwatch.org)
I believe that Anthropic’s policy advocacy is (1) bad and (2) worse in private than in public.
But Dario and Jack Clark do publicly oppose strong regulation. See https://ailabwatch.org/resources/company-advocacy/#dario-on-in-good-company-podcast and https://ailabwatch.org/resources/company-advocacy/#jack-clark. So this letter isn’t surprising or a new betrayal — the issue is the preexisting antiregulation position, insofar as it’s unreasonable.
Thanks.
I notice they have few publications.
Setting aside whether Neil’s work is useful, presumably almost all of the grant is for his lab. I failed to find info on his lab.
...huh, I usually disagree with posts like this, but I’m quite surprised by the 2022 and 2023 grants.
Actually, this is a poor description of my reaction to this post. Oops. I should have said:
Digital mind takeoff is maybe-plausibly crucial to how the long-term future goes. But this post seems to focus on short-term stuff such that the considerations it discusses miss the point (according to my normative and empirical beliefs). Like, the y-axis in the graphs is what matters short-term (and it’s at most weakly associated with what matters long-term: affecting the von Neumann probe values or similar). And the post is just generally concerned with short-term stuff, e.g. being particularly concerned about “High Maximum Altitude Scenarios”: aggregate welfare capacity “at least that of 100 billion humans” “within 50 years of launch.” Even ignoring these particular numbers, the post is ultimately concerned with stuff that’s a rounding error relative to the cosmic endowment.
I’m much more excited about “AI welfare” work that’s about what happens with the cosmic endowment, or at least (1) about stuff directly relevant to that (like the long reflection) or (2) connected to it via explicit heuristics like the cosmic endowment will be used better in expectation if “AI welfare” is more salient when we’re reflecting or choosing values or whatever.
The considerations in this post (and most “AI welfare” posts) are not directly important to digital mind value stuff, I think, if digital mind value stuff is dominated by possible superbeneficiaries created by von Neumann probes in the long-term future. (Note: this is a mix of normative and empirical claims.)
(Minor point: in an unstable multipolar world, it’s not clear how things get locked in, and for the von Neumann probes in particular, note that if you can launch slightly faster probes a few years later, you can beat rushed-out probes.)
When telling stories like your first paragraph, I wish people either said “almost all of the galaxies we reach are tiled with some flavor of computronium and here’s how AI welfare work affected the flavor” or “it is not the case that almost all of the galaxies we reach are tiled with some flavor of computronium and here’s why.”
"The universe will very likely be tiled with some flavor of computronium" is a crucial consideration, I think.
Briefly + roughly (not precise):
At some point we’ll send out lightspeed probes to tile the universe with some flavor of computronium. The key question (for scope-sensitive altruists) is what that computronium will compute. Will an unwise agent or incoherent egregore answer that question thoughtlessly? I intuit no.
I can’t easily make this intuition legible. (So I likely won’t reply to messages about this.)
I agree this is possible, and I think a decent fraction of the value of “AI welfare” work comes from stuff like this.
"Those humans decide to dictate some or all of what the future looks like, and lots of AIs end up suffering in this future because their welfare isn't considered by the decision makers."
This would be very weird: it requires that either the value-setters are very rushed or that they have lots of time to consult with superintelligent advisors but still make the wrong choice. Both paths seem unlikely.
Among your friends, I agree; among EA Forum users, I disagree.
Caveats:
- I endorse the argument that we should figure out how to use LLM-based systems without accidentally torturing them, because they're more likely to take catastrophic actions if we're torturing them.
- I haven't tried to understand the argument that we should try to pay AIs to [not betray us / tell on traitors / etc.] and that working on AI-welfare stuff would help us offer AIs payment better; there might be something there.
- I don't understand the decision theory mumble mumble argument; there might be something there.
(Other than that, it seems hard to tell a story about how “AI welfare” research/interventions now could substantially improve the value of the long-term future.)
(My impression is that these arguments are important to very few AI-welfare prioritizers, i.e. most AI-welfare prioritizers have the wrong reasons.)
My position on “AI welfare”
1. If we achieve existential security and launch the von Neumann probes successfully, we will be able to do >>10^80 operations in expectation. We could tile the universe with hedonium or do acausal trade or something, and it's worth >>10^60 happy human lives in expectation. Digital minds are super important.
2. Short-term AI suffering will be small-scale: less than 10^40 FLOP, far from optimized for suffering (any suffering would be incidental), and worth <<10^20 happy human lives (very likely <10^10).
3. 10^20 isn't even a feather in the scales when 10^60 is at stake. (A toy version of this comparison is sketched below.)
4. "Lock-in" [edit: of "AI welfare" trends on Earth] is very unlikely; potential causes of short-term AI suffering (like training and deploying LLMs) are very different from potential causes of astronomical-scale digital suffering (like tiling the universe with dolorium, the arrangement of matter optimized for suffering). And digital-mind-welfare research doesn't need to happen yet; there will be plenty of subjective time for it before the von Neumann probes' goals are set.
5. Therefore, to a first approximation, we should not trade off existential security for short-term AI welfare, and normal AI safety work is the best way to promote long-term digital-mind welfare.
[Edit: the questionable part of this is #4.]
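To make the scale comparison in points 1-3 concrete, here is a minimal back-of-the-envelope sketch in Python. The magnitudes are the order-of-magnitude guesses from the list above, not independent estimates, and the variable names are just illustrative labels:

```python
import math

# Back-of-the-envelope comparison of the magnitudes above (order-of-magnitude guesses only).
long_term_at_stake = 1e60   # happy-human-life-equivalents at stake in the cosmic endowment (point 1)
short_term_at_stake = 1e20  # generous upper bound on short-term AI suffering (point 2)

gap = math.log10(long_term_at_stake) - math.log10(short_term_at_stake)  # orders of magnitude
print(f"Long-term stakes exceed short-term stakes by ~10^{gap:.0f}")    # ~10^40
```

On these numbers, even if the short-term estimate were off by 10^10, the gap would still be ~10^30, so the conclusion in point 5 doesn't depend on the exact figure.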
I am surprised that you don’t understand Eliezer’s comments in this thread. I claim you’d do better to donate $X to PauseAI now than lock up $2X which you will never see again (plus lock up more for overcollateralization) in order to get $X to PauseAI now.
For anyone who wants to bet on doom:
- I claim it can't possibly be good for you
  - Unless you plan to spend all of your money before you would owe money back
    - People seem to think what matters is ∫bankroll when what actually matters is ∫consumption?
  - Or unless you're betting on high rates of return to capital, not really on doom
- Good news: you can probably borrow cheaply. E.g. if you have $2X in investments, you can sell them, invest $X at 2x leverage, and effectively borrow the other $X. (A worked example is sketched below.)
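A worked version of the leverage arithmetic in that last bullet, with purely hypothetical numbers (the $20K portfolio is an assumption for illustration, not a figure from the original):

```python
# Hypothetical numbers: the point is that 2x leverage on half the portfolio
# keeps market exposure constant while freeing up cash.
investments = 20_000                      # suppose you hold $2X = $20K in index funds
exposure_before = investments             # $20K of market exposure, no free cash

kept_invested = investments / 2           # sell everything, reinvest $X = $10K...
exposure_after = 2 * kept_invested        # ...at 2x leverage -> still $20K of exposure
freed_cash = investments - kept_invested  # the other $10K is effectively borrowed

print(exposure_before, exposure_after, freed_cash)  # 20000 20000.0 10000.0
```

The implicit interest rate on the freed-up cash is roughly the financing cost embedded in the leveraged position, which is why borrowing this way tends to be cheap.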
No beliefs make this rational for Greg.
Greg made a bad bet. He could do strictly better, by his lights, by borrowing $10K, giving it to PauseAI, and paying back ~$15K ($10K + high interest) in 4 years. (Or he could just donate $10K to PauseAI. If he's unable to do this, Vasco should worry about Greg's liquidity in 4 years.) Or he could have gotten a better deal by betting with someone else; if there were a market for this bet, I claim the market price would be substantially more favorable to Greg than paying back 200% (plus inflation) over <4 years.
[Edit: the market for this bet is, like, the market for 4-year personal loans.]
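For concreteness, a sketch of the implied annualized rates, using the 10K / ~15K / 200% figures from the comment above (the rate calculation itself is my addition, and it ignores the inflation adjustment):

```python
# Implied annualized rates, using the figures from the comment above.
principal = 10_000        # given to PauseAI now
bet_repayment = 20_000    # paying back 200% (ignoring the inflation adjustment)
loan_repayment = 15_000   # the rough "10K + high interest" personal-loan alternative
years = 4

bet_rate = (bet_repayment / principal) ** (1 / years) - 1    # ~18.9% per year
loan_rate = (loan_repayment / principal) ** (1 / years) - 1  # ~10.7% per year
print(f"bet: ~{bet_rate:.1%}/yr vs. loan: ~{loan_rate:.1%}/yr")
```

On these numbers the bet looks like a 4-year loan at roughly 19%/yr versus roughly 11%/yr for the alternative, which is the sense in which the relevant market price is the 4-year personal-loan rate.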
A few DC and EU people tell me that in private, Anthropic (and others) are more unequivocally antiregulation than their public statements would suggest.
I’ve tried to get this on the record—person X says that Anthropic said Y at meeting Z, or just Y and Z—but my sources have declined.