Thanks for taking the time to write this, Ezra—I found it useful.
Since the groups above seem to exhaust the space of beneficiaries (if what we care about is well-being), we can’t expect to get more effectiveness improvements in this way. In future, such improvements will have to come from finding new interventions, or intervention types.
Though I think the conclusion may well be correct, this argument doesn’t seem valid to me. Thinking about it more produced some ideas I found interesting.
Imagine that we instead had only one group of beneficiaries: all conscious beings. We could run the same argument—this group exhausts all possible beneficiaries, etc.—and conclude that discovering new beneficiary groups isn’t helpful. However, breaking down “conscious beings” into present and future groups, and breaking down further into humans and animals, has in fact been very helpful, so we would have been wrong to stop looking for beneficiary groups.
From where we stand now, I can imagine discovering more useful beneficiary groups by breaking down the three you highlight further. Arguably, this is what happened with people in extreme poverty: they are a very help-able subgroup. Similarly, factory-farmed animals seem to be a very help-able subgroup of non-human animals, and maybe chickens are the most-help-able. Maybe the discovery of more very help-able subgroups, e.g. subgroups of future conscious beings (artificial beings? future animals?) or subgroups of wild animals (species that suffer a lot in the wild?), will lead to big EA breakthroughs in the future.
Of course, which groups are help-able basically depends on the interventions available, so splitting EA research into “find new beneficiary groups” and “find new interventions” is a blurry distinction.
Thanks Tara! I’d like to do more writing of this kind, and I’m thinking about how to prioritize it. It’s useful to hear that you’d be excited about those topics in particular.
I have very mixed feelings about Sarah’s post; the title seems inaccurate to me, and I’m not sure about how the quotes were interpreted, but it’s raised some interesting and useful-seeming discussion. Two brief points:
I understand what causes people to write comments like “lying seems bad but maybe it’s the best thing to do in some cases”, but I don’t think those comments usually make useful points (they typically seem pedantic at best and edgy at worst), and I hope people aren’t actually guided by considerations like those. Most EAs I work with, AFAICT, strive to be honest about their work and believe that this is the best policy even when there are prima facie reasons to be dishonest. Maybe it’s worth articulating some kind of “community-utilitarian” norms, probably drawing on rule utilitarianism, to explain why I think honesty is the best policy?
I think the discussion of what “pledge” means to different people is interesting; a friend pointed out to me that blurring the meaning of “pledge” into something softer than an absolute commitment could hurt my ability to make absolute commitments in the future, and I’m now considering ways to be more articulate about the strength of different commitment-like statements I make. Maybe it’s worth picking apart and naming some different concepts, like game-theoretic cooperation commitments, game-theoretic precommitments (e.g. virtues adopted before a series of games is entered), and self-motivating public statements (where nobody else’s decisions lose value if I later reverse my statement, but I want to participate in a social support structure for shared values)?
Re: donation: I’d personally feel best about donating to the Long-Term Future EA Fund (not yet ready, I think?) or the EA Giving Group, both managed by Nick Beckstead.
I’ve often found the EAs around me to be
(i) very supportive of taking on things that are ex ante good ideas, but carry significant risk of failing altogether, and
(ii) good at praising these decisions after they have turned out to fail.
It doesn’t totally remove the sting to have those around you say “Great job taking that risk, it was the right decision and the EV was good!” and really mean it, but I do find that it helps, and praising these kinds of decisions after the fact as much as I praise big successes is a habit I’m trying to build.
Of course there is some tension; often, if a thing fails to produce value, it’s useful to figure out how we could have anticipated that failure, and why it might not have been the right decision ex ante. Balance, I guess.
Thanks Kerry, Benito! Glad you found it helpful.
I am very bullish on the Far Future EA Fund, and donate there myself. There’s one other possible nonprofit that I’ll publicize in the future if it gets to the stage where it can use donations (I don’t want to hype this up as an uber-solution, just a nonprofit that I think could be promising).
I unfortunately don’t spend a lot of time thinking about individual donation opportunities, and the things I think are most promising often get partly funded through Open Phil (e.g. CHAI and FHI), but I think diversifying the funding source for orgs like CHAI and FHI is valuable, so I’d consider them as well.
Thanks Nate!
The end goal is to prevent global catastrophes, but if a safety-conscious AGI team asked how we’d expect their project to fail, the two likeliest scenarios we’d point to are “your team runs into a capabilities roadblock and can’t achieve AGI” or “your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time.”
This is particularly helpful to know.
We worry about “unknown unknowns”, but I’d probably give them less emphasis here. We often focus on categories of failure modes that we think are easy to foresee. As a rule of thumb, when we prioritize a basic research problem, it’s because we expect it to help in a general way with understanding AGI systems and make it easier to address many different failure modes (both foreseen and unforeseen), rather than because of a one-to-one correspondence between particular basic research problems and particular failure modes.
Can you give an example or two of failure modes or “categories of failure modes that are easy to foresee” that you think are addressed by some HRAD topic? I’d thought previously that thinking in terms of failure modes wasn’t a good way to understand HRAD research.
As an example, the reason we work on logical uncertainty isn’t that we’re visualizing a concrete failure that we think is highly likely to occur if developers don’t understand logical uncertainty. We work on this problem because any system reasoning in a realistic way about the physical world will need to reason under both logical and empirical uncertainty, and because we expect broadly understanding how the system is reasoning about the world to be important for ensuring that the optimization processes inside the system are aligned with the intended objectives of the operators.
I’m confused by this as a follow-up to the previous paragraph. This doesn’t look like an example of “focusing on categories of failure modes that are easy to foresee”; it looks like a case where you’re explicitly not using concrete failure modes to decide what to work on.
“how do we ensure the system’s cognitive work is being directed at solving the right problems, and at solving them in the desired way?”
I feel like this fits with the “not about concrete failure modes” narrative that I believed before reading your comment, FWIW.
Welcome! :)
I think your argument totally makes sense, and you’re obviously free to use your best judgement to figure out how to do as much good as possible. However, a couple of other considerations seem important, especially when it comes to claims about what a “true effective altruist” would do.
1) One factor of your impact is your ability to stick with your giving; this could give you a reason to adopt something less scary and demanding. By analogy, it might seem best for fitness to commit to intense workouts 5 days a week, strict diet changes, and no alcohol, but in practice trying to do this may result in burning out and not doing anything for your fitness, while a less-demanding plan might be easier to stick with and result in better fitness over the length of your life.
Personally, the prospect of giving up retirement doesn’t seem too demanding; I like working, and retirement is so far away that it’s hard to take seriously. However, I’d understand if others didn’t feel this way, and I wouldn’t want to push them into a commitment they won’t be able to keep.
2) Another factor of your impact is the other people you influence who may start giving, and would not have done so without your example—in fact, it doesn’t seem implausible that this could make up the majority of your impact over your life. To the extent that giving is a really significant cost for people, it’s harder to spread the idea (e.g. many more people are vegetarian than vegan [citation needed]), and asking people to give up major parts of their life story like retirement (or a wedding, or occasional luxuries, or christmas gifts for their families, etc.) comes with real costs that could be measured in dollars (with lots of uncertainty). More broadly, the norms that we establish as a community affect the growth of the community, which directly affects total giving—if people see us as a super-hardcore group that requires great sacrifice, I just expect less money to be given.
For these reasons, I prefer to follow and encourage norms that say something like “Hey, guess what—you can help other people a huge amount without sacrificing anything huge! Your life can be just as you thought it would be, and also help other people a lot!” I actually expect these norms to have better consequences in terms of helping people than stricter norms (like “don’t retire”) do, mostly for reasons 1 and 2.
There’s still a lot of discussion on these topics, and I could imagine finding out that I’m wrong—for example, I’ve heard that there’s evidence of more demanding religions being more successful at creating a sense of community and therefore being more satisfying and attractive. However, my best guess is that “don’t retire” is too demanding.
(I looked for an article saying something like this but better to link to, but I didn’t quickly find one—if anyone knows where one is, feel free to link!)
Thanks for recommending a concrete change in behavior here!
I also appreciate the discussion of your emotional engagement / other EAs’ possible emotional engagement with cause prioritization—my EA emotional life is complicated, I’m guessing others have a different set of feelings and struggles, and this kind of post seems like a good direction for understanding and supporting one another.
ETA: personally, it feels correct when the opportunity arises to emotionally remind myself of the gravity of the ER-triage-like decisions that humans have to make when allocating resources. I can do this by celebrating wins (e.g. donations / grants others make, actual outcomes) as well as by thinking about how far we have to go in most areas. It’s slightly scary, but makes me more confident that I’m even-handedly examining the world and its problems to the best of my abilities and making the best calls I can, and I hope it keeps my ability to switch cause areas healthy. I’d guess this works for me partially because those emotions don’t interfere with my ability to be happy / productive, and I expect there are people whose feelings work differently and who shouldn’t regularly dwell on that kind of thing :)
This is a great point—thanks, Jacob!
I think I tend to expect more from people when they are critical—i.e. I’m fine with a compliment/agreement that someone spent 2 minutes on, but expect critics to “do their homework”, and if a complimenter and a critic were equally underinformed/unthoughtful, I’d judge the critic more harshly. This seems bad!
One response is “poorly thought-through criticism can spread through networks; even if it’s responded to in one place, people cache and repeat it in other places where it’s not responded to, and that’s harmful.” This applies equally well to poorly thought-through compliments; maybe the unchallenged-compliment problem is even worse, because I have warm feelings about this community and its people and orgs!
Proposed responses (for me, though others could adopt them if they thought they’re good ideas):
- For now, assume that all critics are in good faith. (If we have / end up with a bad-critic problem, these responses need to be revised; I’ll assume for now that the asymmetry of critique is a bigger problem.)
- When responding to critiques, thank the critic in a sincere, non-fake way, especially when I disagree with the critique (e.g. “Though I’m about to respond with how I disagree, I appreciate you taking the critic’s risk to help the community. Thank you! [response to critique]”)
- Agree or disagree with critiques in a straightforward way, instead of saying e.g. “you should have thought about this harder”.
- Couch compliments the way I would couch critiques.
- Try to notice my disagreements with compliments, and comment on them if I disagree.
Thoughts?
There’s this policy report from September 2014, Unprecedented Technological Risks, signed by Beckstead, Bostrom, Bowerman, Cotton-Barratt, MacAskill, Ó hÉigeartaigh, and Ord. Not a long read, but I’d expect the references to be among the best available.
Thanks for these thoughts. (Your second link is broken, FYI.)
On empirical feedback: my current suspicion is that there are some problems where empirical feedback is pretty hard to get, but I actually think we could get more empirical feedback on how well HRAD can be used to diagnose and solve problems in AI systems. For example, it seems like many AI systems implicitly do some amount of logical-uncertainty-type reasoning (e.g. AlphaGo, which is really all about logical uncertainty over the result of expensive game-tree computations)—maybe HRAD could be used to understand how those systems could fail?
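(To make the AlphaGo example a bit more concrete, here’s a minimal toy sketch of my own, not MIRI’s or DeepMind’s actual methods: the exact minimax value of a position is deterministic but expensive to compute, while averaging random playouts gives a cheap, noisy estimate of it, i.e. a probabilistic belief about a fully determined quantity. AlphaGo’s real machinery, value networks plus MCTS, is much more sophisticated; this only illustrates the bare idea.)

```python
# Toy illustration only (my own sketch, not MIRI's or AlphaGo's actual method).
# The exact minimax value of a toy game is deterministic but expensive to compute;
# random playouts give a cheap, uncertain estimate of what that computation returns.
import random
import statistics

def leaf_value(path):
    # Deterministic toy payoff: depends only on the sequence of moves taken.
    return sum((1 if move == 0 else -1) * (i + 1) for i, move in enumerate(path))

def exact_value(path, depth, maximizing=True):
    # Expensive "ground truth": full minimax search over a binary game tree.
    if depth == 0:
        return leaf_value(path)
    child_values = [exact_value(path + [m], depth - 1, not maximizing) for m in (0, 1)]
    return max(child_values) if maximizing else min(child_values)

def rollout_estimate(depth, n_rollouts=200):
    # Cheap estimate: average the payoffs of random playouts.
    # A noisy (and biased) stand-in for the minimax value: a "belief"
    # about a quantity that is in fact fully determined.
    samples = [leaf_value([random.randint(0, 1) for _ in range(depth)])
               for _ in range(n_rollouts)]
    return statistics.mean(samples), statistics.stdev(samples)

if __name__ == "__main__":
    depth = 12
    mean, spread = rollout_estimate(depth)
    print(f"rollout estimate: {mean:.2f} +/- {spread:.2f}")
    print(f"exact minimax value: {exact_value([], depth):.2f}")
```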
I’m less convinced that the “ignored physical aspect of computation” is a very promising direction to follow, but I may not fully understand the position you’re arguing for.
I’m going to try to answer these questions, but there’s some danger that I could be taken as speaking for MIRI or Paul or something, which is not the case :) With that caveat:
I’m glad Rob sketched out his reasoning on why (1) and (2) don’t play a role in MIRI’s thinking. That fits with my understanding of their views.
(1) You might think that “learning to reason from humans” doesn’t accomplish (1) because a) logic and mathematics seem to be the only methods we have for stating things with extremely high certainty, and b) you probably can’t rule out AI catastrophes with high certainty unless you can “peer inside the machine” so to speak. HRAD might allow you to peer inside the machine and make statements about what the machine will do with extremely high certainty.
My current take on this is that whatever we do, we’re going to fall pretty far short of proof-strength “extremely high certainty”—the approaches I’m familiar with, including HRAD, are after some mix of
- a basic explanation of why an AI system designed a certain way should be expected to be aligned, corrigible, or to have some mix of these or other similar properties, and
- theoretical and empirical understanding that makes us think that an actual implementation follows that story robustly / reliably.
HRAD makes different trade-offs than other approaches do, and it does seem to me like successfully-done HRAD would be more likely to be amenable to formal arguments that cover some parts of our confidence gap, but it doesn’t look to me like “HRAD offers proof-level certainty, other approaches offer qualitatively less”.
(2) Produce an AI system that can help create an optimal world… You might think that “learning to reason from humans” doesn’t accomplish (2) because it makes the AI human-limited. If we want an advanced AI to help us create the kind of world that humans would want “if we knew more, thought faster, were more the people we wished we were” etc. then the approval of actual humans might, at some point, cease to be helpful.
It’s true that I’m more focused on “make sure human values keep steering the future” than on the direct goal of “optimize the world”; I think that making sure human values keep steering the future is the best leverage point for creating an optimal world.
My hope is that for some decisions, actual humans (like us) would approve of “make this decision on the basis of something CEV-like—do things we’d approve of if we knew more, thought faster, etc., where those approvals can be predicted with high confidence, don’t pose super-high risk of lock-in to a suboptimal future, converge among different people, etc.” If you and I think this is a good idea, it seems like an AI system trained on us could think this as well.
Another way of thinking about this is that the world is currently largely steered by human values, AI threatens to introduce another powerful steering force, and we’re just making sure that that power is aligned with us at each timestep. A not-great outcome is that we end up with the world humans would have made if AI were not possible in the first place, but we don’t get toward optimality very quickly; a more optimistic outcome is that the additional steering power accelerates us very significantly along the track to an optimal world, steered by human values along the way.
Nice work, and looks like a good group of advisors!
For many Americans, income taxes might go down; probably worth thinking about what to do with that “extra” money.
Has anyone here seen any good analyses of helping Syrian refugees as a cause area, or the most effective ways to do it? I’ve seen some commentary on opening borders and some general tips on disaster relief from GiveWell, but not much beyond that. Thanks!
I am not Nate, but my view (and my interpretation of some median FHI view) is that we should keep options open about those strategies and as-yet unknown other strategies instead of fixating on one at the moment. There’s a lot of uncertainty, and all of the strategies look really hard to achieve. In short, no strongly favored strategy.
FWIW, I also think that most current work in this area, including MIRI’s, promotes the first three of those goals pretty well.
Prediction-making in my Open Phil work does feel like progress to me, because I find making predictions and writing them down difficult and scary, indicating that I wasn’t doing that mental work as seriously before :) I’m quite excited to see what comes of it.