AI strategy & governance. ailabwatch.org.
Zach Stein-Perlman
Given 3, a key question is: what can we do to increase P(optimonium | ¬ AI doom)?
For example:
Averting AI-enabled human power grabs might increase P(optimonium | ¬ AI doom).
Averting premature lock-in and ensuring that the von Neumann probes are launched deliberately would increase P(optimonium | ¬ AI doom), but what can we do about that?
Some people seem to think that having norms of being nice to LLMs is valuable for increasing P(optimonium | ¬ AI doom), but I’m skeptical and I haven’t seen this written up.
(More precisely, we should talk about the expected fraction of resources that are optimonium rather than the probability of optimonium, but probability might be a fine approximation.)
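A minimal sketch of why probability can stand in for the expected fraction, assuming (my assumption, not stated above) that outcomes are roughly all-or-nothing:

\[
\mathbb{E}[f_{\text{optimonium}}] = \sum_i p_i f_i \;\approx\; P(f_{\text{optimonium}} \approx 1) \quad \text{when each } f_i \approx 0 \text{ or } 1,
\]

where \(p_i\) is the probability of outcome \(i\) and \(f_i\) is the fraction of our successors’ resources that are optimonium in that outcome.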
One key question for the debate is: what can we do / what are the best ways to “increas[e] the value of futures where we survive”?
My guess is it’s better to spend most effort on identifying possible best ways to “increas[e] the value of futures where we survive” and arguing about how valuable they are, rather than arguing about “reducing the chance of our extinction [vs] increasing the value of futures where we survive” in the abstract.
I want to make salient these propositions, which I consider very likely:
In expectation, almost all of the resources our successors will use/affect come via von Neumann probes (or maybe acausal trade or affecting the simulators).
If 1, the key question for evaluating a possible future from scope-sensitive perspectives is: will the von Neumann probes be launched, and what will they tile the universe with? (modulo acausal trade and simulation stuff)
[controversial] The best possible thing to tile the universe with (maybe call it “optimonium”) is wildly better than what you get if you’re not really optimizing for goodness,[1] so given 2, the key question is: will the von Neumann probes tile the universe with ~the best possible thing (or ~the worst possible thing) or something else?
Considerations about just our solar system or value realized this century miss the point, by my lights. (Even if you reject 3.)
[1]
Call computronium optimized to produce maximum pleasure per unit of energy “hedonium,” and that optimized to produce maximum pain per unit of energy “dolorium,” as in “hedonistic” and “dolorous.” Civilizations that colonized the galaxy and expended a nontrivial portion of their resources on the production of hedonium or dolorium would have immense impact on the hedonistic utilitarian calculus. Human and other animal life on Earth (or any terraformed planets) would be negligible in the calculation of the total. Even computronium optimized for other tasks would seem to be orders of magnitude less important.
So hedonistic utilitarians could approximate the net pleasure generated in our galaxy by colonization as the expected production of hedonium, multiplied by the “hedons per joule” or “hedons per computation” of hedonium (call this H), minus the expected production of dolorium, multiplied by “dolors per joule” or “dolors per computation” (call this D).
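As a formula, the footnote’s approximation (with H and D as defined above) is just:

\[
\text{Net pleasure} \;\approx\; \mathbb{E}[\text{hedonium produced}] \cdot H \;-\; \mathbb{E}[\text{dolorium produced}] \cdot D.
\]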
Meta: Frontier AI Framework
Agree. Nice (truth-tracking) comments seem high-leverage for boosting morale + reducing excessive aversion to forum-posting + countering the phenomenon where commenters are more critical than the average reader (which warps what authors think about their readers).
o3
This is circular. The principle is only compromised if (OP believes) the change decreases EV — but obviously OP doesn’t believe that; OP is acting in accordance with the do-what-you-believe-maximizes-EV-after-accounting-for-second-order-effects principle.
Maybe you think people should put zero weight on avoiding looking weird/slimy (beyond how weird/slimy you actually are) to low-context observers (e.g. college students learning about the EA club). You haven’t argued that here. (And if that’s true, then OP made a normal mistake; it’s not compromising principles.)
My impression is that CLTR mostly adds value via its private AI policy work. I agree its AI publications seem not super impressive but maybe that’s OK.
Probably the same for The Future Society and some others.
My top candidates:
AI Safety and Governance Fund
PauseAI US
Center for AI Policy
Palisade
MIRI
A classification of every other org I reviewed:
Good but not funding-constrained: Center for AI Safety, Future of Life Institute
Would fund if I had more money: Control AI, Existential Risk Observatory, Lightcone Infrastructure, PauseAI Global, Sentinel
Would fund if I had a lot more money, but might fund orgs in other cause areas first: AI Policy Institute, CEEALAR, Center for Human-Compatible AI, Manifund
Might fund if I had a lot more money: AI Standards Lab, Centre for the Governance of AI, Centre for Long-Term Policy, CivAI, Institute for AI Policy and Strategy, METR, Simon Institute for Longterm Governance
Would not fund: Center for Long-Term Resilience, Center for Security and Emerging Technology, Future Society, Horizon Institute for Public Service, Stop AI
Your ranking is negatively correlated with my (largely deference-based) beliefs (and I think weakly negatively correlated with my inside view). Your analysis identifies a few issues with orgs-I-support that seem likely true and important if true. So this post will cause me to develop more of an inside view or at least prompt the-people-I-defer-to with some points you raise. Thanks for writing this post. [This is absolutely not an endorsement of the post’s conclusions. I have lots of disagreements. I’m just saying parts of it feel quite helpful.]
Here’s my longtermist, AI focused list. I really haven’t done my research, e.g. I read zero marginal funding posts. MATS is probably the most popular of these, so this is basically a vote for MATS.
I would have ranked The Midas Project around 5 but it wasn’t an option.
The current state of RSPs
IAPS: Mapping Technical Safety Research at AI Companies
What AI companies should do: Some rough ideas
Anthropic rewrote its RSP
Model evals for dangerous capabilities
OpenAI o1
“Improve US AI policy 5 percentage points” was defined as:
Instead of buying think tanks, this option lets you improve AI policy directly. The distribution of possible US AI policies will go from being centered on the 50th-percentile-good outcome to being centered on the 55th-percentile-good outcome, as per your personal definition of good outcomes. The variance will stay the same.
(This is still poorly defined.)
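Here is a toy concretization of that definition, purely illustrative; the uniform “goodness percentile” scale and the spread are my own assumptions, not part of the poll:

```python
# Toy model: "goodness" of the realized policy outcome, measured as its
# percentile rank under the status-quo distribution (so the baseline mean is 50).
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200_000

status_quo = rng.uniform(0, 100, size=n)   # centered on the 50th-percentile outcome
with_option = status_quo + 5               # center shifts to 55; variance unchanged

print(f"status quo mean:  {status_quo.mean():.1f}")                     # ~50
print(f"with option mean: {with_option.mean():.1f}")                    # ~55
print(f"variance ratio:   {with_option.var() / status_quo.var():.2f}")  # 1.00
```

(The shifted sample can exceed the 100th percentile, which is one illustration of why the definition is still poorly specified.)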
Demis Hassabis — Google DeepMind: The Podcast
A few DC and EU people tell me that in private, Anthropic (and others) are more unequivocally anti-regulation than their public statements would suggest.
I’ve tried to get this on the record—person X says that Anthropic said Y at meeting Z, or just Y and Z—but my sources have declined.
I think for many people, positive comments would be much less meaningful if they were rewarded/quantified, because you would doubt that they’re genuine. (Especially if you excessively feel like an imposter and easily seize onto reasons to dismiss praise.)
I disagree with your recommendations despite agreeing that positive comments are undersupplied.