Ah sorry, I meant to use “volatility” to refer to something like “expected variance in one’s estimate of one’s own future beliefs,” which is maybe what you refer to as “beliefs about volatility.”
(I’m understanding your comment as providing an example of a situation where volatility goes to 0 with additional evidence.)
I agree it’s clear that this happens in some situations—it’s less immediately obvious to me whether this happens in every possible situation.
(Feel free to let me know if I misread. I’m also not sure what you mean by “like this.”)
Thanks for this! I wonder how common or rare the third [edit: oops, meant “fourth”] type of graph is. I have an intuition that there’s something weird or off about having beliefs that act that way (or thinking you do), but I’m having trouble formalizing why. Some attempts:
If you think you’re at (say) the upper half of a persistent range of volatility, that means you expect to update downward as you learn more. So you should just make the update proactively, bringing your confidence toward medium (and narrowing volatility around medium confidence).
Special case (?): if you’re reading or hearing a debate and your opinion keeps wildly swinging back and forth, at some point you should probably think, “well, I guess I’m bad at evaluating these arguments; probably I should stop strongly updating based on whether I find them compelling.”
For many estimators, variance decreases as you get more independent samples.
At the (unrealistic) limit of deliberation, you’ve seen and considered everything, and then there’s no more room for volatility.
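To put the first and third attempts above on slightly firmer footing, here’s a minimal sketch of the two standard facts I have in mind (my own notation, not anything from the post):

```latex
% Conservation of expected evidence: for a Bayesian, today's credence p_t
% already equals the expectation of tomorrow's credence p_{t+1}, so if you
% expect to revise downward, you should revise downward now:
\[ \mathbb{E}\left[ p_{t+1} \mid \text{current information} \right] = p_t \]

% Shrinking variance: for n independent samples each with variance \sigma^2,
% the sample mean's variance falls toward zero:
\[ \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n} \longrightarrow 0
   \quad \text{as } n \to \infty \]
```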
[Cross-posting from the comments]
I want to express two other intuitions that make me very skeptical of some alternatives to effective altruism.
As you write:
“By their fruits ye shall know them” holds for intellectual principles, not just people. If a set of principles throws off a lot of rotten fruit, it’s a sign of something wrong with the principles, a reductio ad absurdum.
I really like this thought. We might ask: what are the fruits of ineffective altruism? Admittedly, much good. But also the continued existence, at scale, of extreme poverty, factory farming, low-probability high-impact risks, and threats to future generations, long after many of these problems could have been decimated if there had been widespread will to act.
That’s a lot of rotten fruit.
In practice, by letting intuition lead, ineffective altruism systematically overlooks distant problems, leaving their silent victims to suffer and die en masse. So even if strong versions of EA are too much, EA in practice looks like a desperately needed corrective movement. Or at least, we need something different from whatever allows such issues to fester, often with little opposition.
Lastly, one thing that sometimes feels a bit lost in discussions of effectiveness: “effective” is just half of it. Much of what resonates is the emphasis on altruism (which is far from unique, but also far from the norm): the idea that, in a world with so much suffering, much of our lives should be oriented around helping others.
That’s helpful, thanks!
Thanks for writing this! I’d be curious to hear you expand on a few points.
If you’re holding the biosecurity portfolio in China when a crisis strikes, you might suddenly find yourself directly shaping the decisions of the president.
My naive guess would have been that, when a crisis strikes and senior national security officials turn their attention to something, junior staff get mostly swept aside (i.e., they might get listened to for information about what’s happening locally but wouldn’t have much room to set priorities or policy goals), because they’re (perhaps inappropriately) seen as insufficiently aware of broader political/geopolitical/intelligence considerations, as insufficiently aligned with the President’s priorities, and/or as rivals for an opportunity to show impressive leadership ability.
But it sounds like that’s not what happens—which of my above assumptions are mistaken, or what are they missing?
Third, a career in the Foreign Service offers you the best possible crash course in international affairs.
Why do you see it as a better crash course than working in U.S.-based international affairs roles (e.g., some U.S.-based roles in the State Department or National Security Council)? (As you mention, there are upsides from being on the front lines, but I could also imagine some U.S.-based roles offering more breadth and more insight into high-level foreign policymaking?)
Also, could you share a bit about how you see the network benefits of working in the Foreign Service comparing to the network benefits of U.S.-based policy roles? Is this a significant upside/downside?
I haven’t had time to read this carefully, but regarding funding, this list of funding opportunities compiled by Michael Aird might be useful. In case you haven’t yet come across these, there are some grants for self-study and independent research in this space, and there are a few grants-based mentorship programs. Grants might involve less overhead than charging readers to download a report, and they’d also be better for the dissemination of your research.
(Oh and since you asked about writing clarity—personally, I usually find a summary at the top really helpful for following a piece of writing.)
Thanks for the thoughtful comment! Without commenting on the candidacy or election overall, a response (lightly edited for clarity) to your point about pandemics:
You emphasize pandemic expertise, but pandemic prevention priorities are arguably more relevant to who will make a difference. It might not take much expertise to think that now is a bad time for Congress to slash pandemic prevention funding, which happened despite some lobbying against it. And for harder decisions, a non-expert member of Congress can hire or consult with expert advisors, as is common practice. Instead of expertise being most important in this case, a perspective I’ve heard from people very familiar with Congress is that Congress members’ priorities are often more important, since members face tough negotiations and tradeoffs. So maybe what’s lacking in Congress isn’t pandemic-related expertise or lobbying, but willingness to make it a priority to keep something like covid from happening again.
I’m thinking of ~IDA with a non-adversarial (e.g. truthful) model, but could easily be mistaken. Curious what you’re expecting?
Fair, I’m also confused.
Sure! I’ll follow up.
No worries, I was probably doing something similar.
I don’t expect a team that designs advanced AI to also choose what it optimizes for (and I think this is more clear if we replace “what it optimizes for” with “how it’s deployed,” which seems reasonable pre-superintelligence)
Could you say a bit more about where you’re coming from here? (My initial intuition would be: assuming alignment ends up being based on some sort of (amplified) human feedback, doesn’t the AI developer get a lot of choice, through its control over who gives the human feedback and how feedback is aggregated (if there are multiple feedback-givers)?)
I instinctively doubt “partly-utilitarian” systems provide much of the expected value from acausal trade
Ah sorry, to clarify, what I had in mind was mostly that (fully) non-utilitarian systems, by trading with (fully) utilitarian systems, would provide much utilitarian value. (Although on second thought, that doesn’t clearly raise the value of partly utilitarian systems more than it raises the value of fully utilitarian systems. Maybe that’s what you were suggesting?)
On second thought, another potential wrinkle, re: the representation of utilitarianism in the AI’s values. Here are two ways that could be defined:
In some sort of moral parliament, what % of representatives are utilitarian?
How good are outcomes relative to what would be optimal by utilitarian lights?
Arguably the latter definition is the more morally relevant one. The former is related but maybe not linearly. (E.g., if the non-utilitarians in a parliament are all scope-insensitive, maybe utilitarianism just needs 5% representation to get > 50% of what it wants. If that’s the case, then it may make sense to be risk-averse with respect to expected representation, e.g., maximize the chances that some sort of compromise happens at all.)
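To illustrate the mechanism behind that 5% figure, here’s a toy sketch in Python. The saturating satisfaction function and the specific numbers are made up for illustration; the point is just that scope-insensitive factions are cheap to satisfy, so even a small utilitarian delegation can bargain for most of the resources.

```python
# Toy sketch with made-up numbers: how a small utilitarian minority can get
# most of what it wants when the other factions are scope-insensitive.

def scope_insensitive_satisfaction(resource_share: float, saturation: float = 0.02) -> float:
    """Assumed: scope-insensitive values are ~fully satisfied once they control
    a small fraction of resources; additional resources add ~nothing."""
    return min(resource_share / saturation, 1.0)

def utilitarian_satisfaction(resource_share: float) -> float:
    """Assumed: utilitarian value is roughly linear in the fraction of
    resources optimized for utility."""
    return resource_share

# A compromise the scope-insensitive majority has little reason to resist:
# it keeps 5% of resources (well past saturation), 95% goes to utilitarian ends.
majority_share, utilitarian_share = 0.05, 0.95

print(scope_insensitive_satisfaction(majority_share))  # 1.0  -> majority gets ~all it wants
print(utilitarian_satisfaction(utilitarian_share))     # 0.95 -> utilitarians get >50% of what they want
```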
Good points re: negotiations potentially going poorly for Alice (added: and the potential for good compromise), and also about how I may be underestimating the probability of human values converging.
I still think scenario (1) is not so likely, because:
Any advanced AI will initially be created by a team, in which there will be pressures for at least intra-team compromise (and very possibly also external pressures).
More speculatively: maybe acausal trade will enable & incentivize compromise even if each world is unipolar (assuming there isn’t much convergence across worlds).
Thanks! From the other comment thread, now I’m less confident in the moral parliament per se being a great framework, but I’d guess something along those lines should work out.
Ah good point, I was thinking of a moral parliament where representation is based on value pluralism rather than moral uncertainty, but I think you’re still right that a moral parliament approach (as originally conceived) wouldn’t produce the outcome I had in mind.
Still, is it that hard for some approach to produce a compromise? (Is the worry that [creating a powerful optimizer that uses significant resources to optimize for different things] is technically hard even if alignment has been solved? Edited to add: My intuition is this isn’t hard conditional on alignment being solved, since e.g. then you could just align the AI to an adequately pluralistic human or set of humans, or maybe directly reward this sort of pluralism in training, but I haven’t thought about it much.)
(A lot of my optimism comes from my assumption that ~all (popularity-weighted) value systems which compete with utilitarianism are at least somewhat scope-insensitive, which makes them easy to mostly satisfy with a small fraction of available resources. Are there any prominent value systems other than utilitarianism that are fully scope-sensitive?)
Why would a reasonable candidate for a ‘partially-utilitarian AI’ lead to an outcome that’s ~worthless by utilitarian lights? I disagree with that premise—that sounds like a ~non-utilitarian AI to me, not a (nontrivially) partly utilitarian AI.
(Maybe I could have put more emphasis on what kind of AI I have in mind. As my original comment mentioned, I’m talking about “a sufficiently strong version of ‘partly-utilitarian.’” So an AI that’s just slightly utilitarian wouldn’t count. More concretely, I have in mind something like: an agent that operates via a moral parliament in which utilitarianism has > 10% of representation.)
[Added] See also my reply to Zach, in which I write:
What about the following counterexample? Suppose a powerful agent optimizes for a mixed objective, which leads to it optimizing ~half of the accessible universe for utilitarianism, the other ~half for some other scope-sensitive value, and a few planets for modal scope-insensitive human values. Then, even at high levels of capability, this universe will be ~half as good by utilitarian lights as a universe that’s fully optimized for utility, even though the optimizer wasn’t just optimizing for utility. (If you doubt whether there exist utility functions that would lead to roughly these outcomes, I’m happy to make arguments for that assumption.)
Thanks for pushing back!
I think “partly-utilitarian AI,” in the standard sense of the phrase, would produce orders of magnitude less utility than a system optimizing for utility, just because it’s likely that optimization for [anything else] probably comes apart from optimization for utility at high levels of capability.
What about the following counterexample? Suppose a powerful agent optimizes for a mixed objective, which leads to it optimizing ~half of the accessible universe for utilitarianism, the other ~half for some other scope-sensitive value, and a few planets for modal scope-insensitive human values. Then, even at high levels of capability, this universe will be ~half as good by utilitarian lights as a universe that’s fully optimized for utility, even though the optimizer wasn’t just optimizing for utility. (If you doubt whether there exist utility functions that would lead to roughly these outcomes, I’m happy to make arguments for that assumption.)
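To spell out the arithmetic behind “~half as good,” under the (substantive, assumed-by-me) premise that utilitarian value is roughly linear in the fraction of accessible resources optimized for utility:

```latex
% Assumption (mine, for illustration): utilitarian value scales roughly
% linearly with the fraction r of accessible resources optimized for utility.
\[ U(r) \approx r \cdot U_{\max} \quad\Longrightarrow\quad
   U(0.5) \approx 0.5\, U_{\max} \]
% i.e., the mixed optimizer's outcome is within a factor of ~2 of the fully
% utilitarian optimum, not orders of magnitude worse.
```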
(I also don’t yet find your linked argument convincing. It argues that “If a future does not involve optimizing for the good, value is almost certainly near-zero.” I agree, but imo it’s quite a leap to conclude from that that [If a future does not only involve optimizing for the good, value is almost certainly near-zero.])
If we stipulate that “partly-utilitarian AI” makes a decent fraction of the utility of a utilitarian AI, I think such a system is extremely unlikely to exist.
This seems like a possible crux, but I don’t fully understand what you’re saying here. Could you rephrase?
[Added] Pasting this from my reply to Josh:
(Maybe I could have put more emphasis on what kind of AI I have in mind. As my original comment mentioned, I’m talking about “a sufficiently strong version of ‘partly-utilitarian.’” So an AI that’s just slightly utilitarian wouldn’t count. More concretely, I have in mind something like: an agent that operates via a moral parliament in which utilitarianism has > 10% of representation.)
Thanks for this! I might tweak claim 1 to the following: The probability that this AI has partly utilitarian values dominates EV calculations. (In a soft sense of “dominates”—i.e., it’s the largest single factor, but not approximately the only factor.)
Argument for this version of the claim over the original one:
From a utilitarian view, partly-utilitarian AI would be “just” a few times less valuable than fully utilitarian AI (for a sufficiently strong version of “partly-utilitarian”).
There’s lots of room for moral trade / win-win compromises between different value systems. For example, common scope-insensitive values and utilitarian values can both get most of what they want. So partly-utilitarian AI could easily be roughly comparable in value to fully utilitarian AI (say, half as valuable).
And partly-utilitarian AI is more than a few times more likely than fully utilitarian AI to come about.
Most AI developers would be much more likely to make their AI partly utilitarian than fully utilitarian, since this pluralism may better reflect their values and better accommodate (internal and external) political pressures.
Efforts to make AI pluralistic also mitigate “race to the bottom” dynamics, by making “losing” much less bad for actors who don’t develop advanced AI first. So pluralistic efforts are significantly more likely to succeed at making aligned AI at all.
Since it’s at worst a factor of a few less valuable and it’s more than a few times more likely, the EV of partly utilitarian AI is higher than that of fully utilitarian AI.
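Putting that comparison into a rough inequality with illustrative (made-up) numbers: if partly-utilitarian AI is at worst ~3x less valuable and at least ~5x more likely, then

```latex
% Illustrative numbers only (the "factor of a few" above is not pinned down):
\[ \frac{\mathrm{EV}_{\text{partly}}}{\mathrm{EV}_{\text{fully}}}
   = \frac{P_{\text{partly}}}{P_{\text{fully}}} \cdot
     \frac{V_{\text{partly}}}{V_{\text{fully}}}
   \;\gtrsim\; 5 \times \frac{1}{3} \;\approx\; 1.7 \;>\; 1 \]
```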
[Edit: following the below discussion, I’m now less confident in the second premise above, so now I’m fairly unsure which of P(AI is fully utilitarian) and P(AI is partly utilitarian) is more important, and I suspect neither is > 10x more important than the other.]
Thanks for posting this!
[A]dvanced nanotechnology might allow states to manufacture weapons that pose an (even) greater probability of existential catastrophe than the highly risky weapons that are currently accessible (such as currently accessible nuclear and biological weapons). I’d place a “grey goo” scenario, where self-replicating nanoscale machines turn the entire world into copies of themselves, into this category.
I have a few uninformed doubts about how likely this is (although you never claimed it was especially likely):
There are already millions of different types of self-replicating nanoscale machines out there (biological organisms), and none of them seem to have gotten close to turning the entire world into copies of themselves. So it seems pretty hard to make “gray goo,” even with APM.
These machines might be more capable than biological ones if they were very intelligent, but that case seems to already be covered by the “APM accelerates TAI” scenario.
On the other hand, maybe APM is much more dangerous than evolution because operators could do more than just local optimization.
Niche point: How much is that argument undermined by anthropic considerations? I suspect not much, because:
I’m pointing out that we don’t see near-catastrophe, rather than that we don’t see total catastrophe.
Our actions arguably matter much more if we haven’t gotten lucky.
As a matter of armchair ecology, there seem to be non-luck reasons why there hasn’t been biological “gray goo” (“green goo”?) (although, admittedly, manufactured machines might be able to get around these):
There’s a tradeoff between versatility and specialization—it’s hard to be most successful in all niches.
There’s competition, e.g., if a population is very large, predators multiply.
Organisms seem unable to have both explosive population growth and fast motion, since organisms that rely on eating other organisms for energy run out of food if their population explodes, while organisms that rely on the sun for energy can’t move quickly.
The offense-defense balance might not be so bad: as suggested by the point about biological predators, APM might create strong defensive capabilities, e.g., the capability to quickly identify dangerously replicating machines and then create targeted/specialized countermeasures.
Being really good at replicating within human bodies (naively) seems much easier than being good at replicating in any environment. But the former worry is ~bioweapons, which are already covered by another risk scenario you mention.
Developing or using “gray goo” might not be very strategically appealing.
Assuming it could be made, it’d be very self-destructive (and/or maybe could be retaliated against), so using it would be a terrible idea, and it’d be hard to make credible threats with it. In other words, it might not be a type-2a vulnerability (“safe first strike”) after all.
It might still be a “safe first strike” vulnerability if there were great narrow-scope countermeasures that just one side could develop in advance and no secure second-strike capabilities, in which case these weapons would pose more local risk but less humanity-wide risk. (Or maybe more humanity-wide risk if a first strike were just somewhat safe?)
Thanks for this post!
We need funding pots specifically for the non-technical elements of AI Safety.
I agree there’s lots of room for non-technical work to help improve the impacts of AI. Still, I’m not sure funder interest and experience in non-technical AI work are what’s missing. As some examples of how this interest and experience are already present:
OpenPhil has a program officer who specializes in AI governance and policy, and it has given out over $80 million in AI governance grants.
The current fund chair of the Long-Term Future Fund has a background in non-technical AI research.
FTX Future Fund staff have backgrounds in philosophy, economics, and law.
(So why isn’t more AI governance work happening? In case you haven’t seen it yet, you might find discussion of bottlenecks in this post interesting.)
It looks like the other comments have already offered a good amount of relevant reading material, but in case you’re up for some more, I think the ideas expressed in this paper (video introduction here) are a big part of why some people think that we don’t know how to train models to have any (somewhat complex) objectives that we want them to have. That’s a response to point (1), partly to point (3), and also to point (2) (if we interpret the quote in (2) as described in Rob’s comment).
This report (especially pp. 1-8) might also make the potential difficulty of penalizing deception more intuitive.