Good points! I’d be curious to hear what Lewis thought of those two HSA grants and why Open Phil hasn’t done more since then.
Thanks! That’s encouraging to hear (although it would be better for animals if the charities did fill their funding gaps).
There could still be some funging if a smaller remaining funding gap discourages other donors, such as the Animal Welfare Fund, from giving more, but at least the effect is probably less drastic than if the org’s RFMF were filled completely.
Great point! Michael said something similar:
the funders may have specific total funding targets below filling their near term RFMF, and the closer to those targets, the less they give.
For example, the funders might aim for a marginal utility of 6 utilons per dollar, so using your example numbers, they would only want to fund the org up to $800K. And if someone else is already giving $100K, they would only want to give $700K.
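Here’s a minimal sketch of that marginal-utility framing, with hypothetical numbers chosen to reproduce the $800K/$700K figures (the utility curve and threshold are made up, not anyone’s actual model):

```python
# Toy model (hypothetical numbers): a funder with a declining marginal-utility
# curve gives until marginal utility per dollar falls to its target of 6.

def marginal_utility(total_funding):
    """Hypothetical marginal utility (utilons per dollar) as a function of the
    org's total funding; declines linearly from 10 at $0 to 5 at $1M."""
    return 10.0 - 5.0 * (total_funding / 1_000_000)

def grant_size(existing_funding, target_mu=6.0, step=1_000):
    """Give in $1K increments while marginal utility still exceeds the target."""
    grant = 0
    while marginal_utility(existing_funding + grant) > target_mu:
        grant += step
    return grant

print(grant_size(existing_funding=0))        # 800000: funder fills to the $800K point alone
print(grant_size(existing_funding=100_000))  # 700000: an outside $100K displaces funder dollars
```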
My guess would be that in practice, funders probably aren’t thinking much about a curve of marginal utility per dollar but are instead asking questions like: Is this org working on an important problem? Does it need more money to continue or expand this particular work? What percent of that funding gap do we want to fill?
But this is just my speculation about how I imagine people would make grants when they have lots of charities to review and lots of money to disburse, with limited time to investigate each one in depth. If they have time to review more detailed plans about what each incremental chunk of money would be spent on, they might get closer to the marginal-utility approach you mention.
Thanks! I may have several questions later. :) For now, I was curious about your thoughts on the funging framework in general. Do you think it is the case that if one EA gives more to you, then other EAs like the Animal Welfare Fund will tend to give at least somewhat less? And how much less?
I sort of wonder if that funging model is wrong, especially in the case of rapidly growing charities. For example, suppose the Animal Welfare Fund in year 1 thinks you have enough money, so they don’t grant any more. But another donor wants you to spend $25K (or whatever it costs) to hire an extra person. That other donor gives you a $25K donation in year 1, and you make the additional hire. Suppose that donor stops giving to you in year 2. Now you have an extra hire needing $25K/year—a funding gap that no one is filling. So in a sense, you now have that much more additional room for funding. And as long as that hire remains employed with you, you have that additional funding gap every year. The one-time donation of $25K resulted in an ongoing additional $25K of room for funding in subsequent years. This would be the opposite of the funging model assumed in my post: an individual donor’s gift led to more funding from the Animal Welfare Fund over the long run, not less.
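As a toy illustration of that dynamic (all numbers hypothetical, just mirroring the $25K example):

```python
# One-time $25K gift funds a new hire in year 1; in later years the hire's
# salary becomes a recurring funding gap that other funders may fill.

baseline_budget = 500_000   # hypothetical budget that big funders already consider covered
salary = 25_000             # ongoing annual cost of the extra hire
one_time_gift = 25_000      # individual donor's gift, given only in year 1

for year in range(1, 5):
    budget_needed = baseline_budget + salary
    covered = baseline_budget + (one_time_gift if year == 1 else 0)
    print(f"Year {year}: remaining gap = ${budget_needed - covered:,}")
# Year 1: $0 (the gift covers the hire); years 2+: a recurring $25,000 gap.
```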
Maybe if the Animal Welfare Fund thought you had enough money in year 1, then they would continue to stick by that assessment in year 2 and not want to fund the additional hire. But I doubt people’s opinions about exactly how much money can be spent well are that precise.
The funging model seems most applicable to a static organization that has a constant number of staff members and a constant cost of programs. If it gets more money than it needs in one year, it will consume less in the following year.
Thanks! That’s good to know. When I looked through the Animal Welfare Fund grantees recently, Healthier Hens was one that I picked out as a possible candidate for donating to. I’m more concerned about extreme pain than chronic pain, but I guess HH says that bone fractures cause some intense pain as well as chronic pain (and of course I care about chronic pain somewhat too).
Is there info about why grantors didn’t give more funding to HH? I wonder if there’s something they know that I don’t. (In general, that’s a main downside of trying to donate off the beaten path.)
Yeah, more research on questions like whether beef reduces net suffering would be extremely useful, both for my personal donation decisions and more importantly for potentially shifting the priorities of the animal movement overall. My worries about funging here ultimately derive from my thinking that the movement is missing some crucial considerations (or else just has different values from me), and the best way to fix that would be for more people to highlight those considerations.
I’m unsure how more research on the welfare of populous wild animals would shift people’s views. I guess relative to mainstream animal-rights ideology that says more wildlife is good, there’s only really room to move in a more pessimistic direction. But for people already thinking about wild-animal welfare, it’s less clear. To me it’s obvious that I would be horrified to be born as a random wild animal, but I’m often surprised by how little some classical utilitarians care about suffering relative to happiness.
This is one reason I’m more inclined these days to promote suffering-focused philosophy rather than generic antispeciesism. However, there aren’t that many ways to donate to suffering-focused philosophy at the moment, and depending on who is funded, that approach has its own possible downside risks. For example, I’ve considered whether the antinatalism movement could benefit from funding (because it’s people-rich and money-poor), but a lot of antinatalists are abrasive and may give suffering-focused ethics a bad name. Picking the right antinatalists (and other advocates of suffering-focused ethics) to fund would be a lot of work (but might be worth it). Also, this philosophy work doesn’t scratch my itch to have some amount of concrete suffering-reduction impact in the near term.
That’s a useful post! It’s an interesting idea. There could be some funging between Open Phil and other EA animal donors—like, if Open Phil is handling the welfare reforms, then other donors don’t have to and can donate more to non-welfare stuff. OTOH, the fact that a high-status funder like Open Phil does welfare reforms makes it more likely that other EAs follow suit.
Another thing I’d worry about is that if Open Phil’s preferred animal charities have less RFMF, then maybe Open Phil would allocate less of its funds to animal welfare in general, leaving more available for other cause areas. Some of those cause areas, like biorisk reduction, plausibly increase expected suffering. From the perspective of this worry, it may be safest to give to small charities that Open Phil would be unlikely to consider or charities that Open Phil doesn’t find promising enough for some reason.
You’d have to donate enough to reduce the recommendation status of an org, which seems unlikely for their Top Charities, at least
It’s unlikely, but if it did happen, it would be a huge negative impact, so in expectation it could still be nontrivial funging? For example, if I think one of ACE’s four top charities is way better than the others, then if I donate a small amount to it, there’s a tiny chance this leads to it becoming unrecommended, but if so, that would result in a ton less future funding to the org.
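A back-of-the-envelope version of that expected-value worry, with all numbers hypothetical:

```python
# Even a tiny chance that a marginal donation tips an org out of recommended
# status can imply nontrivial expected funging, because the downside is large.

p_lose_recommendation = 0.001     # hypothetical chance my gift pushes the org past a threshold
future_funding_lost = 2_000_000   # hypothetical future funding tied to the recommendation
my_donation = 5_000

expected_funging = p_lose_recommendation * future_funding_lost
print(expected_funging)                  # 2000.0 dollars of expected lost future funding
print(expected_funging / my_donation)    # 0.4, i.e. 40% of the donation itself
```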
But I suppose there could still be funging; the funders may have specific total funding targets below filling their near term RFMF, and the closer to those targets, the less they give.
Yeah. Or it could work in reverse: if they commit to giving only, say, 50% of an org’s budget, then if individual donors give more, this “unlocks” the ability for the big donors to give more also. However, Karnofsky says it’s a myth that Open Phil has a hard rule like this. Also, as I noted in the post, I wouldn’t want them to have a hard rule like this, because it could leave really valuable orgs significantly underfunded, which seems bad.
Probably the answer to how it actually works varies depending on the specific case. For example, I imagine that an org that everyone thinks is outstanding would be more likely to get fully topped up, while an org that seems average wouldn’t be. But as an outsider, I can only speculate about how these decisions are made, which is why I posted this question.
Thanks!
so the effect may be thought of as additional money going to the worst/borderline EA Animal welfare grantee
Yeah, that’s the funging scenario that I had in mind. :) It’s fine if everyone agrees about the ranking of the different charities. It’s not great if the donor to the funged charity thinks the funged charity is significantly better than the average Animal Welfare Fund grant.
EA Animal Welfare fund does ask on their application form about counterfactual funding
Interesting! That does support the idea that some funging happens intentionally. Of course, I don’t think that’s a bad thing in general. The Animal Welfare Fund has a duty to spend its donors’ money as effectively as possible, and that includes not funding something that another (perhaps less EA-minded) donor would have funded anyway. The problem is that I take the same attitude: I don’t want to spend my limited money on something that other donors (including the Animal Welfare Fund) would have funded otherwise. The result is a “giver’s dilemma”, which Karnofsky illustrated with this example:
Imagine that two donors, Alice and Bob, are both considering supporting a charity whose room for more funding is $X, and each is willing to give the full $X to close that gap. If Alice finds out about Bob’s plans, her incentive is to give nothing to the charity, since she knows Bob will fill its funding gap. Conversely, if Bob finds out about Alice’s funding plans, his incentive is to give nothing to the charity and perhaps support another instead. This creates a problematic situation in which neither Alice nor Bob has the incentive to be honest with the other about his/her giving plans and preferences—and each has the incentive to try to wait out the other’s decision.
This is relevant to the idea I suggested in the penultimate paragraph of my post. If the Animal Welfare Fund published a list of funding gaps that it wasn’t going to fill, this could encourage individual donors to only give to places that the Animal Welfare Fund wouldn’t. But then the Animal Welfare Fund would take on the burden of funding all of the best charities (by its lights), which wouldn’t be fair to people who donated to the Animal Welfare Fund. The Fund would prefer for the charities it thinks are best to get as much funding from others as possible. That could imply not disclosing ahead of time where the Fund would be donating, although I doubt the Fund managers are thinking too strategically about this particular issue, and their future grants are often somewhat predictable from past grants anyway.
If the AI didn’t face any competition and was a rational agent, it might indeed want to be extremely cautious about making changes to itself or building successors, for the reason you mention. However, if there’s competition among AIs, then just like in the case of a human AI arms race, there might be pressure to self-improve even at the risk of goal drift.
If an AI is built to value helping humans, and if that value can remain intact, then it wouldn’t need to be “enslaved”; it would want to be nice of its own accord. However, I agree with what I take to be the thrust of your question, which is that the chances seem slim that an AI would continue to care about human concerns after many rounds of self-improvement. It seems too easy for things to slide askew from what humans wanted one way or another, especially if there’s a competitive environment with complex interactions among agents.
Thanks. :) I’m personally not one of those transhumanists who welcome the transition to weird posthuman values. I would prefer for space not to be colonized at all, in order to avoid astronomically increasing the amount of sentience (and therefore the amount of expected suffering) in our region of the cosmos. I think there could be some common ground, at least in the short run, between suffering-focused people who don’t want space colonized in general and existential-risk people who want to radically slow down the pace of AI progress. If it were possible, the Butlerian Jihad solution could be pretty good both for the AI doomers and the negative utilitarians. Unfortunately, it’s probably not politically possible (even domestically, much less internationally), and I’m unsure whether half measures toward it are net good or bad. For example, maybe slowing AI progress in the US would help China catch up, making a competitive race between the two countries more likely, thereby increasing the chance of catastrophic Cold War-style conflict.
Interesting point about most mutants not being very successful. That’s a main reason I tend to imagine that the first AGIs who try to overpower humans, if any, would plausibly fail.
I think intelligence at the human level and above differs from that of other animals in its adaptability to new circumstances, because human-level intelligence can figure out problems by reason and doesn’t have to wait for evolution to brute-force its way into genetically based solutions. Humans have changed their environments dramatically from the ancestral ones without killing themselves (yet), based on this ability to be flexible using reason. Even the smarter non-human animals display some amount of this ability (cf. the Baldwin effect). (A web search shows that you’ve written about the Baldwin effect and how being smarter leads to faster evolution, so feel free to correct/critique me.)
If you mean that posthumans are likely to be fragile at the collective level, because their aggregate dynamics might result in their own extinction, then that’s plausible, and it may happen to humans themselves within a century or two if current trends continue.
Work related to AI trajectories can still be important even if you think the expected value of the far future is net negative (as I do, relative to my roughly negative-utilitarian values). In addition to alignment, we can also work on reducing s-risks that would result from superintelligence. This work tends to be somewhat different from ordinary AI alignment, although some types of alignment work may reduce s-risks also. (Some alignment work might increase s-risks.)
If you’re not a longtermist or think we’re too clueless about the long-run future, then this work would be less worthwhile. That said, AI will still be hugely disruptive even in the next few years, so we should pay some attention to it regardless of what else we’re doing.
I think GPT-4 is an early AGI. I don’t think it makes sense to use a binary threshold, because various intelligences (from bacteria to ants to humans to superintelligences) have varying degrees of generality.
The goalpost shifting seems like the AI effect to me: “AI is anything that has not been done yet.”
I don’t think it’s obvious that GPT-4 isn’t conscious (even for non-panpsychists), nor is it obvious that its style of intelligence is that different from what happens in our brains.
Suppose that near-term AGI progress mostly looks like making GPT smarter and smarter. Do people think this, in itself, would likely cause human extinction? How? Due to mesa-optimizers that would emerge during training of GPT? Due to people hooking GPT up to control of actions in the real world, with those autonomous systems themselves going off the rails? Just due to accelerating disruptive social change that makes all sorts of other risks (nuclear war, bioterrorism, economic or government collapse, etc.) more likely? Or do people think the AI extinction risk mainly comes when people start building explicitly agentic AIs to automate real-world tasks like making money or national defense, not just text chats and image understanding as GPT does?
I think humans may indeed find ways to scale up their control over successive generations of AIs for a while, and successive generations of AIs may be able to exert some control over their successors, and so on. However, I don’t see how at the end of a long chain of successive generations we could be left with anything that cares much about our little primate goals. Even if individual agents within that system still cared somewhat about humans, I doubt the collective behavior of the society of AIs overall would still care, rather than being driven by its own competitive pressures into weird directions.
An analogy I often give is to consider our fish ancestors hundreds of millions of years ago. Through evolution, they produced somewhat smarter successors, who produced somewhat smarter successors, and so on. At each point along that chain, the successors weren’t that different from the previous generation; each generation might have said that they successfully aligned their successors with their goals, for the most part. But over all those generations, we now care about things dramatically different from what our fish ancestors did (e.g., worshipping Jesus, inclusion of trans athletes, preventing children from hearing certain four-letter words, increasing the power and prestige of one’s nation). In the case of AI successors, I expect the divergence may be even more dramatic, because AIs aren’t constrained by biology in the way that both fish and humans are. (OTOH, there might be less divergence if people engineer ways to reduce goal drift and if people can act collectively well enough to implement them. Even if the former is technically possible, I’m skeptical that the latter is socially possible in the real world.)
Some transhumanists are ok with dramatic value drift over time, as long as there’s a somewhat continuous chain from ourselves to the very weird agents who will inhabit our region of the cosmos in a million years. But I don’t find it very plausible that in a million years, the powerful agents in control of the Milky Way will care that much about what certain humans around the beginning of the third millennium CE valued. Technical alignment work might help make the path from us to them more continuous, but I’m doubtful it will avert human extinction in the long run.
I think a simple reward/punishment signal can be an extremely basic neural representation that “this is good/bad”, and activation of escape muscles can be an extremely basic representation of an imperative to avoid something. I agree that these things seem almost completely unimportant in the simplest systems (I think nematodes aren’t the simplest systems), but I also don’t see any sharp dividing lines between the simplest systems and ourselves, just degrees of complexity and extra machinery. It’s like the difference between a :-| emoticon and the Mona Lisa. The Mona Lisa has lots of extra detail and refinement, but there’s a continuum of possible drawings in between them and no specific point where something qualitatively different occurs.
That’s my current best guess of how to think about sentience relative to my moral intuitions. If there turns out to be a major conceptual breakthrough in neuroscience that points to some processing that’s qualitatively different in complex brains relative to nematodes or NPCs, I might shift my view, although I find it hard not to extend a tiny bit of empathy toward the simpler systems anyway, because they do have preferences and basic neural representations. If we were to discover that consciousness is a special substance or the like that only exists at all in certain minds, then it would be easier for me to understand saying that nematodes or NPCs have literally zero amounts of it.
That’s right. :) There are various additional details to consider, but that’s the main idea.
Catastrophic risks have other side effects in scenarios where humanity does survive, and in most cases, humanity would survive. My impression is that apart from AI risk, biorisk is the most likely form of x-risk to cause actual extinction rather than just disruption. Nuclear winter and especially climate change seem to have a higher ratio of (probability of disruption but still survival)/(probability of complete extinction). AI extinction risk would presumably still involve intelligent agents reaching the stars, so it still may lead to astronomical amounts of suffering.
There are also considerations about cooperation. For example, if one has enough credence in Evidential Cooperation in Large Worlds (ECL), then even a negative utilitarian should support reaching the stars because many other value systems want it (though some don’t, even for reasons besides reducing suffering). Even ignoring ECL, it seems like a bad idea to actively increase biorisk because of the backlash it would provoke. However, due to the act/omission distinction, it’s probably ok to encourage others to omit funding for biorisk-safety work, or at least to try to avoid increasing such funding yourself. Given that work on reducing AI risk isn’t necessarily bad from a suffering-reduction standpoint, shifting biorisk funding to AI risk (or other EA cause areas) is a way to do this omission in a way that may not be that objectionable to most EAs, because the risk of human extinction is still being reduced in either case.