I agree with your central claim that we need more implementation, but I either disagree with or am confused by a number of other parts of this post. I think the heart of my confusion is that it focuses on only one piece of end-to-end impact stories: Is there a plausible story for how the proposed actions actually make the world better?
You frame this post as “A general strategy for doing good things”. This is not what I care about. I do not care about doing things, I care about things being done. This is semantic but it also matters? I do not care about implementation for its own sake, I care about impact. The model you use assumes preparation, implementation and the unspoken impact. If the action leading to the best impact is to wait, this is the action we should take, but it’s easy to overlook this if the focus is on implementation. So my Gripe #1 is that we care about impact, not implementation, and we should say this explicitly. We don’t want to fall into a logistic trap either [1].
The question you pose is confusing to me:
“if the entire community disappeared, would the effects still be good for the world?”.
I’m confused by the timeline of the answer to this question (the effects in this instant or in the future?). I’m also confused by what the community disappearing means – does this mean all the individual people in the community disappear? As an example, MLAB skills up participants in machine learning; it is unclear to me if this is Phase 1 or Phase 2 because I’m not sure the participants disappear; if they disappear then no value has been created, but if they don’t disappear (and we include future impact) they will probably go make the world better in the future. If the EA community disappeared but I didn’t, I would still go work on alignment. It seems like this is the case for many EAs I know. Such a world is better than if the EA community never existed, and the future effects on the world would be positive by my lights, but no Phase 2 activities happened up until that point. It seems like MLAB is probably Phase 1, as is university, as is the first half of many people’s careers where they are failing to have much impact and are building skills and career capital. If you do mean disappearing all community members, is this defined by participation in the community or level of agreement with key ideas (or something else)? I would consider it a huge win if the voting members of OpenAI’s board of directors were all members of the EA community, or if they had EA-aligned beliefs; this would actually make us less likely to die. Therefore, I think doing outreach to these folks, or more generally “educating people in key positions about the risks from advanced AI” is a pretty great activity to be doing – even though we don’t yet know most of the steps to AGI going well. It seems like this kind of outreach is considered Phase 1 in your view because it’s just building the potential influence of EA ideas. So Gripe #2: The question is ambiguous so I can’t distinguish between Phase 1 and 2 activities on your criteria.
You give the example of
writing an AI alignment textbook would be useful to the world even absent our communities, so would be Phase 2
I disagree with this. I don’t think writing a textbook actually makes the world much better. (An AI alignment textbook exists) is not the thing I care about; (aligned AI making the future of humanity go well) is the thing I care about. There’s like 50 steps from the textbook existing to the world being saved, unless your textbook has a solution for alignment, and then it’s only like 10 steps[2]. But you still need somebody to go do those things.
In such a scenario, if we ask “if the entire community disappeared [including all its members], would the effects still be good for the world?”, then I would say that the textbook existing is counterfactually better than the textbook not existing, but not by much. I don’t think the steps needed to prevent the world from ending would be taken. To me, assuming (the current AI alignment community all disappears) cuts our chances of survival in half, at least[3]. I think this framing is not the right one because it is unlikely that the EA or alignment communities will disappear, and I think the world is unfortunately dependent on whether or not these communities stick around. To this end, I think investing in the career and human capital of EA-aligned folks who want to work on alignment is a class of activities relatively likely to improve the future. Convincing top AI researchers and math people etc. is also likely high EV, but you’re saying it’s Phase 2. Again, I don’t care about implementation, I care about impact. I would love to hear AI alignment specific Phase 2 activities that seem more promising than “building the resource bucket (# of people, quality of ideas, $ to a lesser extent, skills of people) of people dedicated to solving alignment”. By more promising I mean have a higher expected value or increase our chances of survival more. Writing a textbook doesn’t pass the test, I don’t think. There are some very intractable ideas I can think of, like the UN creating a compute monitoring division. Of the FTX Future Fund ideas, AI Alignment Prizes are maybe Phase 2 depending on the prize, but that depends on how we define the limit of the community; probably a lot of good work deserving of a prize would result in an Alignment Forum or LessWrong post without directly impacting people outside these communities much. Writing about AI Ethics suffers from the same problem as the alignment textbook because it just relies on other people taking it seriously (which they probably won’t).
Gripe 3: In terms of AI Alignment, the cause area I focus on most, we don’t seem to have promising Phase 2 ideas but some Phase 1 ideas seem robustly good.
I guess I think AI alignment is a problem where not many things actually help. Creating an aligned AGI helps (so research contributing to that goal has high EV, even if it’s Phase 1), but it’s only something we get one shot at. Getting good governance helps; much of the way to do this is Phase 1 of aligned people getting into positions of power; the other part is creating strategy and policy etc; CSET could create an awesome plan to govern AGI, but, assuming policy makers don’t read reports from disappeared people, this is Phase 1. Policy work is Phase 1 up until there is enough inertia for a policy to get implemented well without the EA community. We’re currently embarrassingly far from having robustly good policy ideas (with a couple exceptions). Gripe 3.5: There’s so much risk of accidental harm from acting soon, and we have no idea what we’re doing.
I agree that we need implementation, but not for its own sake. We need it because it leads to impact or because it’s instrumentally good for getting future impact (as you mention, better feedback, drawing in more people, time diversification based on uncertainty). The irony and cognitive dissonance of being a community dedicated to doing lots of good that then spends most of its time thinking does not elude me; as a group organizer at a liberal arts college I think about this quite a bit.
I think the current allocation between Phase 1 and Phase 2 could be incorrect, and you identify some decent reasons why it might be. What would change my mind is a specific plan where having more Phase 2 activities actually increases the EV of the future. In terms of AI Alignment, Phase 1 activities just seem better in almost all cases. I understand that this was a high-level post, so maybe I’m asking for too much.
[1] The concept of a logistics magnet is discussed in Chapter 11 of “Did That Just Happen?!: Beyond “Diversity”―Creating Sustainable and Inclusive Organizations” (Wadsworth, 2021). “This is when the group shifts its focus from the challenging and often distressing underlying issue to, you guessed it, logistics.” (p. 129)
[2] Paths to impact like this are very fuzzy. I’m providing some details purely to show there are lots of steps and not because I think they’re very realistic. Some steps might be: a person reads the book, they work at an AI lab, they get promoted into a position of influence, they use insights from the book to make some model slightly more aligned and publish a paper about it; 30 other people do similar things in academia and industry, eventually these pieces start to come together and somebody reads all the other papers and creates an AGI that is aligned, this AGI takes a pivotal act to ensure others don’t develop misaligned AGI, we get extremely lucky and this AGI isn’t deceptive, we have a future!
[3] I think it sounds self-important to make a claim like this, so I’ll briefly defend it. Most of the world doesn’t recognize the importance or difficulty of the alignment problem. The people who do and are working on it make up the alignment community by my definition; probably a majority consider themselves longtermist or EAs, but I don’t know. If they disappeared, almost nobody would be working on this problem (from a direction that seems even slightly promising to me). There are no good analogies, but… If all the epidemiologists disappeared, our chances of handling the next pandemic well would plunge. This is a bad example partially because others would realize we have a problem and many people have a background close enough that they could fill in the gaps.
Re. Gripe #2: I appreciate I haven’t done a perfect job of pinning down the concepts. Rather than try to patch them over now (I think I’ll continue to have things that are in some ways flawed even if I add some patches), I’ll talk a little more about the motivation for the concepts, in the hope that this can help you to triangulate what I intended:
I think that there’s a (theoretically possible) version of EA which has become sort of corrupt, and continues to gather up resources while failing to deploy them for good ends
I think keeping a certain amount of Phase 2 work keeps EA honest, and connected to its roots of trying to do good in the world
The ability to credibly point to achieved impact is asymmetrically deployable by memeplexes which are really gearing up to do good things and help people achieve more good things, over versions of the memeplex which tell powerful narratives about why they’re really the most important thing but will ultimately fail to achieve anything
In slogan form: “Phase 2 work guarantees EA isn’t a Ponzi scheme”
I think keeping more attention on “what are our current best guesses about concrete things that we can go do” prevents people’s pictures of what’s important from getting too unmoored from reality
Re. Gripe #3 (/#3.5): I also think AI stuff is super important and that we’re mostly not ready for Phase 2 stuff. But I’m also very worried that a lot of work people do on it is kind of missing the point of what ends up mattering …
So I think that AI alignment etc. would be in a better place if we put more effort into Phase 1.5 stuff. I think that this is supported by having some EA attention on Phase 2 work for things which aren’t directly about alignment, but affect the background situation of the world and so are relevant for how well AI goes. Having the concrete Phase 2 work there encourages serious Phase 1.5 work about such things — which probably helps to encourage serious Phase 1.5 work about other AI things (like how we should eventually handle deployment).
Sorry for the long and disorganized comment.
Thanks for the clarification! I would point to this recent post on a similar topic to the last thing you said.