I’m Aaron. I’ve done university group organizing at the Claremont Colleges for a while. My current cause prioritization is AI alignment.
Aaron_Scher
Sorry for the long and disorganized comment.
I agree with your central claim that we need more implementation, but I either disagree with or am confused by a number of other parts of this post. I think the heart of my confusion is that the post focuses on only one piece of end-to-end impact stories: is there a plausible story for how the proposed actions actually make the world better?
You frame this post as “A general strategy for doing good things”. This is not what I care about. I do not care about doing things; I care about things being done. This is a semantic point, but it also matters. I do not care about implementation for its own sake; I care about impact. The model you use assumes preparation, implementation, and (unspoken) impact. If the action leading to the best impact is to wait, that is the action we should take, but it’s easy to overlook this if the focus is on implementation. So my Gripe #1 is that we care about impact, not implementation, and we should say this explicitly. We don’t want to fall into a logistics trap either.[1]
The question you pose is confusing to me:
“if the entire community disappeared, would the effects still be good for the world?”.
I’m confused by the timeline of the answer to this question (the effects in this instant or in the future?). I’m also confused by what the community disappearing means – does this mean all the individual people in the community disappear? As an example, MLAB skills up participants in machine learning; it is unclear to me if this is Phase 1 or Phase 2 because I’m not sure whether the participants disappear. If they disappear, then no value has been created, but if they don’t disappear (and we include future impact), they will probably go make the world better in the future. If the EA community disappeared but I didn’t, I would still go work on alignment. It seems like this is the case for many EAs I know. Such a world is better than if the EA community never existed, and the future effects on the world would be positive by my lights, but no Phase 2 activities happened up until that point. It seems like MLAB is probably Phase 1, as is university, as is the first half of many people’s careers, where they are failing to have much impact and are building skills and career capital. If you do mean disappearing all community members, is this defined by participation in the community or by level of agreement with key ideas (or something else)? I would consider it a huge win if OpenAI’s voting board of directors were all members of the EA community, or if they had EA-aligned beliefs; this would actually make us less likely to die. Therefore, I think doing outreach to these folks, or more generally “educating people in key positions about the risks from advanced AI,” is a pretty great activity to be doing – even though we don’t yet know most of the steps to AGI going well. It seems like this kind of outreach is considered Phase 1 in your view because it’s just building the potential influence of EA ideas. So Gripe #2: the question is ambiguous, so I can’t distinguish between Phase 1 and Phase 2 activities on your criteria.
You give the example of
writing an AI alignment textbook would be useful to the world even absent our communities, so would be Phase 2
I disagree with this. I don’t think writing a textbook actually makes the world much better. (An AI alignment textbook exists) is not the thing I care about; (aligned AI making the future of humanity go well) is the thing I care about. There are like 50 steps from the textbook existing to the world being saved – unless your textbook has a solution for alignment, and then it’s only like 10 steps.[2] But you still need somebody to go do those things.
In such a scenario, if we ask “if the entire community disappeared [including all its members], would the effects still be good for the world?”, then I would say that the textbook existing is counterfactually better than the textbook not existing, but not by much. I don’t think the requisite steps needed to prevent the world from ending would be taken. To me, assuming (the current AI alignment community all disappears) cuts our chances of survival in half, at least.[3] I think this framing is not the right one, because it is unlikely that the EA or alignment communities will disappear, and I think the world is unfortunately dependent on whether or not these communities stick around. To this end, I think investing in the career and human capital of EA-aligned folks who want to work on alignment is a class of activities relatively likely to improve the future. Convincing top AI researchers, math people, etc. is also likely high EV, but you’re saying it’s Phase 2. Again, I don’t care about implementation; I care about impact. I would love to hear AI-alignment-specific Phase 2 activities that seem more promising than “building the resource bucket (# of people, quality of ideas, $ to a lesser extent, skills of people) of people dedicated to solving alignment.” By more promising, I mean activities that have a higher expected value or increase our chances of survival more. I don’t think writing a textbook passes this test. There are some very intractable ideas I can think of, like the UN creating a compute monitoring division. Of the FTX Future Fund ideas, AI alignment prizes are maybe Phase 2 depending on the prize, but this depends on how we define the limits of the community; probably a lot of good work deserving of a prize would result in an Alignment Forum or LessWrong post without directly impacting people outside these communities much. Writing about AI ethics suffers from the same problem as the alignment textbook: it relies on other people (who probably won’t) taking it seriously.
Gripe #3: in terms of AI alignment, the cause area I focus on most, we don’t seem to have promising Phase 2 ideas, but some Phase 1 ideas seem robustly good.
I guess I think AI alignment is a problem where not many things actually help. Creating an aligned AGI helps (so research contributing to that goal has high EV, even if it’s Phase 1), but it’s only something we get one shot at. Getting good governance helps; much of the way to do this is Phase 1 (aligned people getting into positions of power); the other part is creating strategy, policy, etc. CSET could create an awesome plan to govern AGI, but, assuming policymakers don’t read reports from disappeared people, this is Phase 1. Policy work is Phase 1 up until there is enough inertia for a policy to get implemented well without the EA community. We’re currently embarrassingly far from having robustly good policy ideas (with a couple of exceptions). Gripe #3.5: there’s so much risk of accidental harm from acting soon, and we have no idea what we’re doing.
I agree that we need implementation, but not for its own sake. We need it because it leads to impact or because it’s instrumentally good for getting future impact (as you mention: better feedback, drawing in more people, time diversification based on uncertainty). The irony and cognitive dissonance of being a community dedicated to doing lots of good that then spends most of its time thinking does not elude me; as a group organizer at a liberal arts college, I think about this quite a bit.
I think the current allocation between Phase 1 and Phase 2 could be incorrect, and you identify some decent reasons why it might be. What would change my mind is a specific plan where having more Phase 2 activities actually increases the EV of the future. In terms of AI Alignment, Phase 1 activities just seem better in almost all cases. I understand that this was a high-level post, so maybe I’m asking for too much.
[1] The concept of a logistics magnet is discussed in Chapter 11 of Did That Just Happen?!: Beyond “Diversity”―Creating Sustainable and Inclusive Organizations (Wadsworth, 2021): “This is when the group shifts its focus from the challenging and often distressing underlying issue to, you guessed it, logistics.” (p. 129)
[2] Paths to impact like this are very fuzzy. I’m providing some details purely to show there are lots of steps, not because I think they’re very realistic. Some steps might be: a person reads the book; they work at an AI lab; they get promoted into a position of influence; they use insights from the book to make some model slightly more aligned and publish a paper about it; 30 other people do similar things in academia and industry; eventually these pieces start to come together, and somebody reads all the other papers and creates an AGI that is aligned; this AGI takes a pivotal act to ensure others don’t develop misaligned AGI; we get extremely lucky and this AGI isn’t deceptive; we have a future!
[3] I think it sounds self-important to make a claim like this, so I’ll briefly defend it. Most of the world doesn’t recognize the importance or difficulty of the alignment problem. The people who do, and are working on it, make up the alignment community by my definition; probably a majority consider themselves longtermists or EAs, but I don’t know. If they disappeared, almost nobody would be working on this problem (from a direction that seems even slightly promising to me). There are no good analogies, but: if all the epidemiologists disappeared, our chances of handling the next pandemic well would plunge. This is a bad example, partially because others would realize we have a problem, and many people have a background close enough that they could fill in the gaps.
I read this post around the beginning of March this year (~6 months ago). I think reading this post was probably net-negative for my life plans. Here are some thoughts about why I think reading this post was bad for me, or at least not very good. I have not re-read the post since then, so maybe some of my ideas are dumb for obvious reasons.
I think the broad emphasis on general skill and capacity building often comes at the expense of directly pursuing your goals. In many ways, the post says “Skill up in an aptitude because in the future this might be instrumentally useful for making the future go well,” and I think this is worse than “Identify what skills might help the future go well, then skill up in those skills, then you can cause impact.” I think the aptitudes framework is what I might say if I knew a bunch of un-exceptional people were listening to me and taking my words as gospel, but it is not what I would advise an exceptional person who wants to change the world for the better (I would try to instill a sense of specifically aiming at the thing they want and pursuing it more directly). This distinction is important. To flesh this out: if only geniuses were reading my post, I might advise that they try high-variance, high-EV things which have a large chance of ending up in the tails (e.g., startups, at which most people will fail). But I would not recommend to a broader crowd that they try startups, because more of them would fail, and then the community I was trying to create to help the future go well would be largely made up of people who took long-shot bets and failed, making them not so useful, and making my community less useful when it’s crunch time (although I am currently unsure what we need at crunch time; having a bunch of people who pursued aptitudes growth is probably good). Therefore, I think I understand and somewhat endorse safer, aptitudes-based advice at a community scale, but I don’t want it to get in the way of people who are willing to take greater risks and do wacky career stuff actually doing so.
My personal experience is that reading this post gave me the idea that I could sort of continue life as normal, but with a slight focus on developing particular aptitudes like building organizational success, research on core longtermist topics, and maybe communicating. I currently think that plan was bad and, if adopted more broadly, has a very bad chance of working (i.e., of AI alignment getting solved). However, I also suspect that my current path is suboptimal – I am not investing in my career capital or human capital for the long run as much as I should be.
So I guess my overall take is something like: people should consider the aptitudes framework, but they should also think about what needs to happen in the world in order to get the things they care about. Taking a safer, aptitudes-based approach is likely the right path for many people, but not for everybody. If you take seriously the career advice that you read, it seems pretty unlikely that this would cause you to take roughly the same actions you were planning on taking before reading – you should be suspicious of this surprising convergence.
Thanks for writing this up and making it public. A couple of comments:
On average 45 applications were submitted to each position.
CEA Core roles received an average of 54 applications each; EOIs received an average of 53 applications each.
Is the first number a typo? Shouldn’t it be ~54?
Ashby hires 4% of applicants, compared to 2% at CEA
...
Overall, CEA might be slightly more selective than Ashby’s customers, but it does not seem like the difference is large
Whether this is “large” is obviously subjective. When I read this, I see ‘CEA is twice as selective as industry over the last couple of years’. Therefore my conclusion is something like: yes, it is still hard to get a job in EA, as evidenced by CEA being around twice as selective as industry for some roles; there are about 54 applicants per role at CEA. I think the summary of this post should be updated to say something like “CEA is more competitive than, but in the same ballpark as, industry.”
The article doesn’t seem to have a comment section so I’m putting some thoughts here.
Economic growth: I don’t feel I know enough about historical economic growth to comment on how much to weigh the claim that “the trend growth rate of GDP per capita in the world’s frontier economy has never exceeded three percent per year.” I’ll note that I think the framing here is quite different from that of Christiano’s Hyperbolic Growth, despite them looking at roughly the same data as far as I can tell.
Scaling current methods: the article seems to cherry-pick the evidence pretty significantly and makes the weak claim that “Current methods may also not be enough.” It is obvious that my subjective probability that current methods are enough should be <1, but I have yet to come across arguments that push that credence below, say, 50%.
“Scaling compute another order of magnitude would require hundreds of billions of dollars more spending on hardware.” This is straightforwardly false. The table included in the article, from the Chinchilla paper with additions, is a bit confusing because it doesn’t include where we are now, and because it lists only model size rather than total training compute (FLOP). Based on Epoch’s database of models, PaLM 2 was trained with about 7.34e24 FLOP, and GPT-4 is estimated at 2.10e25 (note these are not official numbers). This corresponds to being around the 280B-param (9.9e24 FLOP) or 520B-param (3.43e25 FLOP) rows in the table. In this range, tens of millions of dollars are being spent on compute for the biggest training runs now. It should be obvious that you can get a couple more orders of magnitude of compute before hitting hundreds of billions of dollars. In fact, the 10-trillion-param row in the table, listed at $28 billion, corresponds to a total training compute of 1.3e28 FLOP, which is more than 2 orders of magnitude above where the biggest publicly known models are estimated. I agree that cost may soon become a limiting factor, but the claim that another order of magnitude would push us into hundreds of billions is clearly wrong given that current costs are tens of millions.
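To make the arithmetic explicit, here is a back-of-the-envelope sketch. The FLOP figures are the estimates quoted above; the ~$50M current cost and the linear cost-per-FLOP scaling are my assumptions for illustration:

```python
# Back-of-the-envelope: training cost scales roughly linearly with FLOP.
import math

current_flop = 2.1e25   # estimated GPT-4 training compute (Epoch estimate, unofficial)
current_cost = 50e6     # assumed: ~$50M of compute for a frontier run today

table_flop = 1.3e28     # the table's 10T-param row
table_cost = 28e9       # $28B, as listed in the article's table

# Orders of magnitude between today's biggest runs and the $28B row
oom_gap = math.log10(table_flop / current_flop)

# Cost of one more order of magnitude at linear $/FLOP scaling
one_more_oom_cost = current_cost * 10

print(f"OOM gap from today's runs to the $28B row: {oom_gap:.1f}")        # ~2.8
print(f"Cost of one more OOM from today: ~${one_more_oom_cost / 1e9:.1f}B")  # ~$0.5B
```

So under these assumptions, one more order of magnitude lands around half a billion dollars, and hundreds of billions only arrive almost three orders of magnitude out.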
Re cherry-picked data, I guess one of the most important points missing from this section is the rate of algorithmic improvement. I would point to Epoch’s work here.
“Constitutional AI, a state-of-the-art alignment technique that has even reached the steps of Capitol Hill, also does not aim to remove humans from the process at all: ‘rather than removing human supervision, in the longer term our goal is to make human supervision as efficacious as possible.’” This seems to me like a misunderstanding of Constitutional AI, for which a main component is “RL from AI Feedback.” Constitutional AI is all about removing humans from the loop in order to get high-quality data more efficiently. There’s a politics thing where developers don’t want to say they’re removing human supervision, and it’s also true that human supervision will probably play a role in data generation in the future, but the ratio of human to total (AI + human) contribution to data is surely going to go down. For example, research now uses AIs where we used to use humans; see also Anthropic’s Model-Written Evaluations paper and the AI-labeled MACHIAVELLI benchmark. More generally, I would bet the trend toward automating datasets and benchmarks will continue, even if humans remain in the loop somewhat; insofar as humans are a limiting factor, developers will try to make them less necessary, and we already have AIs that perform very similarly to human raters at some tasks.
“We are constantly surprised in our day jobs as a journalist and AI researcher by how many questions do not have good answers on the internet or in books, but where some expert has a solid answer that they had not bothered to record. And in some cases, as with a master chef or LeBron James, they may not even be capable of making legible how they do what they do.” Not a disagreement, but I do wonder how much of this is a result of information being diffuse and just hard to properly find – a kind of task I expect AIs to be good at. For instance, 2025 language models equipped with search might be similarly useful to having a panel of relevant experts you could ask questions.
Noting that section 3 (“Even if technical AI progress continues, social and economic hurdles may limit its impact”) matters for some outcomes and not for others. It matters given that the authors define transformative AI “in terms of its observed economic impact.” It matters for many outcomes I care about, like human well-being, that are related to economic impacts. It applies less to worries around existential risk and human disempowerment, for which powerful AIs may pose risks even while not causing large economic impacts ahead of time (e.g., bioterrorism doesn’t require first creating a bunch of economic growth).
Overall I think the claim of section 3 is likely to be right. A point pushing the other direction is that there may be a regulatory race to the bottom where countries want to enable local economic growth from AI and so relax regulations, think medical tourism for all kinds of services.
“Yet as this essay has outlined, myriad hurdles stand in the way of widespread transformative impact. These hurdles should be viewed collectively. Solving a subset may not be enough.” I definitely don’t find the hurdles discussed here to be sufficient to make this claim. It feels like there’s a motte and bailey, where the easy-to-defend claim is “these 3+ hurdles might exist, and we don’t have enough evidence to discount any of them,” and the harder-to-defend claim is “these hurdles disjunctively prevent transformative AI in the short term, so all of them must be conquered to get such AI.” I expect this shift isn’t intended by the authors, but I’m noting that I think it’s a leap.
“Scenarios where AI grows to an autonomous, uncontrollable, and incomprehensible existential threat must clear the same difficult hurdles an economic transformation must.” I don’t think this is the case. For example, section 3 seems not to apply, as I mentioned earlier. It’s worth noting that AI safety researcher Eliezer Yudkowsky has made a similar argument to what you make in section 3, and he also thinks existential catastrophe in the near term is likely. I think the point you’re making here is directionally right, however: AI which poses existential risk is likely to be transformative in the sense you’re describing. That is, it’s not necessary for such AI to be economically transformative, and there are a couple of other ways catastrophically dangerous AI could bypass the hurdles you lay out, but I think it’s overall a good bet that existentially dangerous AIs are also capable of being economically transformative, so the general picture of hurdles, insofar as they are real, will affect such risks as well [I could easily see myself changing my mind about this with more thought]. I welcome more discussion on this point and have some thoughts myself, but I’m tired and won’t include them in this comment; happy to chat privately about where “economically transformative” and “capable of posing catastrophic risks” lie on various spectrums.
While my comment has been negative and focused on criticism, I am quite glad this article was written. Feel free to check out a piece I wrote, laying out some of my thinking around powerful AI coming soon, which is mostly orthogonal to this article. This comment was written sloppily, partially as my off-the-cuff notes while reading, sorry for any mistakes and impolite tone.
I disagree with a couple specific points as well as the overall thrust of this post. Thank you for writing it!
A maximizing viewpoint can say that we need to be cautious lest we do something wonderful but not maximally so. But in practice, embracing a pragmatic viewpoint, saving money while searching for the maximum seems bad.
I think I strongly disagree with this because opportunities for impact appear heavy-tailed. Funding 2 interventions that are in the 90th percentile is likely less good than funding 1 intervention in the 99th percentile. Given this state of the world, spending much of our resources trying to identify the maximum is worthwhile. I think the default of the world is that I donate to a charity in the 50th percentile. And if I adopt a weak mandate to do lots of good (a non-maximizing frame, or an early EA movement), I will probably identify and donate to a charity in the 90th percentile. It is only when I take a maximizing stance and a strong mandate to do lots of good (or when many thousands of hours have been spent on global priorities research) that I will find and donate to the very best charities. The ratios matter, of course: if I were faced with donating $1,000 to 90th-percentile charities or $1 to a 99th-percentile charity, I would probably donate to the 90th-percentile charities, but if the numbers were $2 and $1, I should donate to the 99th-percentile charity. I am claiming: the distribution of altruistic opportunities is roughly heavy-tailed; the best (and maybe only) way to end up in the heavy tail is to take a maximizing approach; the “wonderful” thing that we would do without maximizing is, as measured ex post (looking at the results in retrospect), significantly worse than the best thing; a claim that I am also making, though which I think is weakest, is that we can differentiate between the “wonderful” and the “maximal available” opportunities ex ante (beforehand) given research and reflection; the thing I care about is impact, and the EA movement is good insofar as it creates positive impact in the world (including for members of the EA community, but they are a small piece of the universe).
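To make the heavy-tail intuition concrete, here is a toy calculation. The lognormal shape and the sigma value are my assumptions for illustration, not anything from the post:

```python
# Toy model: charity cost-effectiveness as a lognormal distribution.
# sigma is the spread of log cost-effectiveness, chosen arbitrarily.
import math
from statistics import NormalDist

sigma = 2.0
quantile = lambda p: math.exp(sigma * NormalDist().inv_cdf(p))  # relative to the median

p90 = quantile(0.90)
p99 = quantile(0.99)

print(f"90th percentile: {p90:.0f}x the median")
print(f"99th percentile: {p99:.0f}x the median")
print(f"99th vs 90th:    {p99 / p90:.1f}x")
```

With sigma = 2, the 99th-percentile charity is roughly 8x the 90th-percentile one, so a dollar to the former beats several dollars to the latter – exactly the $2-vs-$1 case above. The conclusion is sensitive to how heavy the tail really is, which is why the sigma assumption matters.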
There are presumably people who would have pursued PhDs in computer science, and would have been EA-aligned tenure track professors now, but who instead decided to earn-to-give back in 2014. Whoops!
To me this seems like it doesn’t support the rest of your argument. I agree that the correct allocation of EA labor is not everyone doing AI safety research, and we need outreach and career-related resources to support people with various skills, but to me this is more so a claim that we are not maximizing well enough – we are not properly seeking the optimal labor allocation because we’re a relatively uncoordinated set of individuals. If we were better at maximizing at a high level, and doing a good job of it, the problem you are describing would not happen, and I think it’s extremely likely that we can solve this problem.
With regard to the thrust of your post: I cannot honestly tell a story about how the non-maximizing strategy wins. That is, when I think about all the problems in the world: pandemics, climate change, existential threats from advanced AI, malaria, mass suffering of animals, unjust political imprisonment, etc., I can’t imagine that we solve these problems if we approach them like exercise or saving for retirement. If I actually cared about exercise or saving for retirement, I would treat them very differently than I currently do (and I have had periods in my life where I cared more about exercise and thus spent 12 hours a week in the gym). I actually care about the suffering and happiness in the world, and I actually care that everybody I know and love doesn’t die from unaligned AI or a pandemic or a nuclear war. I actually care, so I should try really hard to make sure we win. I should maximize my chances of winning, and practically this means maximizing for some of the proxy goals I have along the way. And yes, it’s really easy to mess up this maximize thing and to neglect something important (like our own mental health), but that is an issue with the implementation, not with the method.
Perhaps my disagreement here is not a disagreement about what EA descriptively is and more so a claim about what I think a good EA movement should be. I want a community that’s not a binary in/out, that’s inclusive and can bring joy and purpose to many people’s lives, but what I want more than those things is for the problems in the world to be solved – for kids to never go hungry or die from horrible diseases, for the existence of humanity a hundred years from now to not be an open research question, for billions+ of sentient beings around the world to not live lives of intense suffering. To the extent that many in the EA community share this common goal, perhaps we differ in how to get there, but the strategy of maximizing seems to me like it will do a lot better than treating EA like I do exercise or saving for retirement.
Thanks for writing this! Epistemic note: I am engaging in highly motivated reasoning and arguing for veg*n.
As BenStewart mentioned, virtue ethics seems relevant. I would similarly point to Kant’s moral imperative of universalizability: “act only in accordance with that maxim through which you can at the same time will that it become a universal law.” Not engaging in moral atrocities is a case where we should follow such an ideal in my opinion. We should at least consider the implications under moral uncertainty and worldview diversification.
My journey in EA has in large part been a journey of “aligning my life and my choices to my values,” or trying to lead a more ethical life. To this end, it is fairly clear that being veg*n is the ethical thing to do relative to eating animal products (I would note I’m somewhere between vegan and vegetarian, and I think moving toward veganism is ethically better).
The signaling effect of being veg*n seems huge at both an individual and community level. As Luke Freeman mentioned, it would be hard to take EA seriously if we were less veg*n than average. Personally, I would likely not be in EA if being veg*n wasn’t relatively normal. This was a signal to me that these people really care and aren’t just in it when it’s convenient for them. This point seems pretty important and one of the things that hopefully sets EA apart from other communities oriented around doing good. I want to call back Ben Kuhn’s idea from 2013 of trying vs. pretending to try in terms of EA:
“A lot of effective altruists still end up satisficing—finding actions that are on their face acceptable under core EA standards and then picking those which seem appealing because of other essentially random factors.”
3.5. At an individual level, when I tell people about Intro to EA seminars I can say things like “In Week 3 we read about expanding our moral consideration and animal welfare. I realized that I wasn’t giving animals the moral consideration I think they deserve, and now I try to eat fewer animal products to align my values and my actions.” (I’ve never said it this eloquently). While I haven’t empirically tested it, people seem to like anecdotes like this.
4. I think as a community we’re asking for a lot of trust; something like “we want to align AI to not kill us all, and nobody else is doing it, so you have to trust us to do it.” Maybe this is an argument for hedging under moral uncertainty, or similarly for trying to be less radical. I feel like an EA that is mostly veg*n is less radical than one with no veg*ns, given some of the other ethical claims we make (e.g., strong longtermism). Being less radical while still upholding our values sounds like a reasonable spot to be in when (implicitly) asking for the reins to the future.
4.5 In this awesome paper, Evan Williams argues that hedging against individual possible moral catastrophes is quite difficult. In this case, it appears to me that we can still hedge here, and we should, given our position of influence.
5. Intuitively, diversifying across a range of activities that might be valuable seems useful: having some things with a 0.01% chance of avoiding x-risk, some with a 10% chance of reducing animal suffering, some with a 95% chance of reducing malaria deaths, and some with a 50% chance of reducing the number of animals suffering on factory farms. I need to write out my thoughts on this in more detail, but I think it’s useful to diversify across {chance of having any impact at all}, and not eating animals is a place where we can be pretty sure we’re having an impact in the long term.
This is great and I’m glad you wrote it. For what it’s worth, the evidence from global health does not appear to me strong enough to justify high credence (>90%) in the claim “some ways of doing good are much better than others” (maybe operationalized as “the top 1% of charities are >50x more cost-effective than the median”, but I made up these numbers).
The DCP2 (2006) data (cited by Ord, 2013) gives the distribution of the cost-effectiveness of global health interventions. This is not the distribution of the cost-effectiveness of possible donations you can make. The data tells us that treatment of Kaposi sarcoma is much less cost-effective than antiretroviral therapy in terms of avoiding HIV-related DALYs, but it tells us nothing about the distribution of charities, and therefore does not actually answer the relevant question: of the options available to me, how much better are the best than the others?
If there is one charity focused on each of the health interventions in the DCP2 (and they are roughly equally good at turning money into the interventions) – and therefore one action corresponding to each intervention – then it is true that the very best ways of doing good available to me are much better than average.
The other extreme is that the most cost-effective interventions were funded first (or people only set up charities to do the most cost-effective interventions), and therefore the best opportunities still available are very close to average cost-effectiveness. I expect we live somewhere between these two extremes, and there are more charities set up for antiretroviral therapy than for Kaposi sarcoma treatment.
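Here is a small simulation of these two extremes; the lognormal shape and all parameters are made up for illustration:

```python
# Compare the best *available* donation under the two extremes:
# (a) every intervention still has an unfunded charity, vs.
# (b) the top half of interventions were funded first and are gone.
import random
import statistics

random.seed(0)
interventions = sorted(random.lognormvariate(0, 1.5) for _ in range(1000))
mean_ce = statistics.fmean(interventions)

best_all_open = interventions[-1]    # extreme (a): best intervention still open
best_top_gone = interventions[499]   # extreme (b): best remaining after top 50% funded

print(f"best/mean with everything open:  {best_all_open / mean_ce:.1f}x")
print(f"best/mean with top half funded:  {best_top_gone / mean_ce:.1f}x")
```

In extreme (a) the best available option is far better than the mean; in extreme (b) the best remaining option actually sits below the mean (a lognormal’s median is below its mean). Where reality falls between these determines how much the DCP2 spread matters for donors.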
The evidence that would change my mind is if somebody publicly analyzed the cost-effectiveness of all (or many) charities focused on global health interventions. I have been meaning to look into this, but haven’t yet gotten around to it. It’s a great opportunity for the Red Teaming Contest, and others should try to do this before me. My sense is that GiveWell has done some of this but only publishes the analysis for their recommended charities; and probably they already look at charities they expect to be better than average – so they wouldn’t have a representative data set.
I’m a bit confused by this post. I’m going to summarize the main idea back, and I would appreciate it if you could correct me where I’m misinterpreting.
Human psychology is flawed in such a way that we consistently estimate the probability of existential risk from each cause to be ~10% by default. In reality, the probability of existential risk from particular causes is generally less than 10% [this feels like an implicit assumption], so finding more information about the risks causes us to decrease our worry about those risks. We can get more information about easier-to-analyze risks, so we update our probabilities downward after getting this correcting information, but for hard-to-analyze risks we do not get such correcting information, so we remain quite worried. AI risk is currently hard to analyze, so we remain in this state of prior belief (although the 10% varies by individual; it could be 50% or 2%).
I’m also confused about this part specifically:
“initially assign something on the order of a 10% credence to the hypothesis that it will by default lead to existentially bad outcomes. In each case, if we can gain much greater clarity about the risk, then we should think there’s about a 90% chance this clarity will make us less worried about it”
– why is there a 90% chance that more information leads to less worry? Is this assuming that for 90% of risks, they have P(Doom) < 10%, and for the other 10% of risks P(Doom) ≥ 10%?
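Here is my guessed interpretation as a toy calculation. The 10% prior and the "clarity = learning the truth outright" assumption are mine, not the post's:

```python
# Toy model of the quoted claim, under my guessed interpretation: a 10%
# prior credence that a technology leads to doom by default, with
# "gaining clarity" modeled as learning the truth outright.
p_doom_prior = 0.10

# Perfect clarity reveals "doom" with probability 0.10 and "no doom"
# with probability 0.90 -- so 90% of the time the update is downward.
p_update_down = 1 - p_doom_prior

# Conservation of expected evidence: the expected posterior equals the
# prior, so the many downward updates are balanced by rare, large
# upward ones.
expected_posterior = p_doom_prior * 1.0 + (1 - p_doom_prior) * 0.0

print(p_update_down)       # 0.9
print(expected_posterior)  # 0.1
```

If this is the intended reading, the 90% figure falls directly out of the 10% prior rather than being an independent empirical claim, which is part of what I'd like the author to confirm or deny.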
Thanks for writing this up, it’s fantastic to get a variety of perspectives on how different messaging strategies work.
Do you have evidence, or a sense, of whether the people you have talked to have changed their actions as a result? I worry that the approach you use is so similar to what people already think that it doesn’t lead to shifts in behavior. (But we need nudges where we can get them.)
I also worry about anchoring on small near-term problems leading to a moral-licensing type effect for safety (and a false sense of security), though it is unclear how likely this is. For example, if people care about AI safety but lack the big picture, they might establish a safety team dedicated to, say, algorithmic bias. If the counterfactual is no safety team, this is likely good. If the counterfactual is a safety team focused on interpretability, this is likely bad. It could also be that “having a safety team” makes an org or the people in it feel more justified in taking risks or investing less in other elements of safety (seems likely); this would be bad. To me, the cruxes here are something like: “what do people do after these conversations?”, “are the safety things they work on relevant to the big problems?”, and “how does safety culture interact with moral licensing or a false sense of security?” I hope this comment didn’t come off aggressively. I’m super excited about this approach, and particularly the way you meet people where they’re at, which is usually a much better strategy than how messaging around this usually comes off.
Progressives might be turned off by the phrasing of EA as “helping others.” Here’s my understanding of why. Speaking anecdotally from my ongoing experience as a college student in the US, mutual aid is getting tons of support among progressives these days. Mutual aid involves members of a community asking for assistance (often monetary) from their community, and the community helping out. This is viewed as a reciprocal relationship in which different people will need help with different things and at different times from one another, so you help out when you can and you ask for assistance when you need it; it is also reciprocal because benefiting the community is inherently benefiting oneself. This model implies a level field of power among everybody in the community. Unlike charity, mutual aid relies on social relations and being in community to fight institutional and societal structures of oppression (https://ssw.uga.edu/news/article/what-is-mutual-aid-by-joel-izlar/).
“[Mutual Aid Funds] aim to create permanent systems of support and self-determination, whereas charity creates a relationship of dependency that fails to solve more permanent structural problems. Through mutual aid networks, everyone in a community can contribute their strengths, even the most vulnerable. Charity maintains the same relationships of power, while mutual aid is a system of reciprocal support.” (https://williamsrecord.com/376583/opinions/mutual-aid-solidarity-not-charity/).
Within this framework, the idea of “helping people” often relies on people with power aiding the helpless, but doing so in a way that reinforces power difference. To help somebody is to imply that they are lesser and in need of help, rather than an equal community member who is particularly hurt by the system right now. This idea also reminds people of the White Man’s Burden and other examples of people claiming to help others but really making things worse.
I could ask my more progressive friends if they think it is good to help people, and they would probably say yes – or at least I could demonstrate that they agree with me given a few minutes of conversation – but that doesn’t mean they wouldn’t be peeved at hearing “Effective Altruism is about using evidence and careful reasoning to help others the best we can.”
I would briefly note that mutual aid is not incompatible with EA to the extent that EA is a question; however, requiring that we be in community with people in order to help them means that we are neglecting the world’s poorest people who do not have access to (for example) the communities in expensive private universities.
Great post Mauricio! I’m a senior undergrad this year and this is the first semester I have deliberately taken fewer classes and focused on things I find more important/interesting (mostly EA organizing). Best decision I’ve made in a while, and I’m getting much more out of my college experience now than before.
In regard to caveat 3 and people who benefit from structure/oversight, I would suggest the following:
Participate in or facilitate fellowships/reading groups for EA if EA is something you want to do. Having other people depend on you or expect things from you can be really motivating.
This evidence doesn’t update me very much.
I would prefer an EA Forum without your critical writing on it, because I think your critical writing has similar problems to this post...
I interpret this quote to be saying, “this style of criticism — which seems to lack a ToC and especially fails to engage with the cruxes its critics have, which feels much closer to shouting into the void than making progress on existing disagreements — is bad for the forum discourse by my lights. And it’s fine for me to dissuade people from writing content which hurts discourse”
Buck’s top-level comment is gesturing at a “How to productively criticize EA via a forum post, according to Buck”, and I think it’s noble to explain this to somebody even if you don’t think their proposals are good. I think the discourse around the EA community and criticisms would be significantly better if everybody read Buck’s top level comment, and I plan on making it the reference I send to people on the topic.
Personally I disagree with many of the proposals in this post and I also wish the people writing it had a better ToC, especially one that helps make progress on the disagreement, e.g., by commissioning a research project to better understand a relevant consideration, or by steelmanning existing positions held by people like me, with the intent to identify the best arguments for both sides.
Here are my notes which might not be easier to understand, but they are shorter and capture the key ideas:
Uneasiness about chains of reasoning with imperfect concepts
Uneasy about conjunctiveness: It’s not clear how conjunctive AI doom is (AI doom being conjunctive would mean that Thing A and Thing B and Thing C all have to happen or be true in order for AI doom to occur; this is opposed to being disjunctive, where A alone, or B, or C would be sufficient for AI doom), and Nate Soares’s response to Carlsmith’s power-seeking AI report is not a silver bullet; there is social pressure in some places to just accept that Carlsmith’s report uses a biased methodology and to move on. But obviously there’s some element of conjunctiveness that has to be dealt with.
Don’t trust the concepts: a lot of the early AI risk discussions came before deep learning. Some of the concepts should port over to near-term-likely AI systems, but not all of them (e.g., alien values, a maximalist desire for world domination).
Uneasiness about in-the-limit reasoning: Many arguments go something like this: an arbitrarily intelligent AI will adopt instrumental power seeking tendencies and this will be very bad for humanity; progress is pushing toward that point, so that’s a big deal. Often this line of reasoning assumes we hit in-the-limit cases around or very soon after we hit greater than human intelligence; this may not be the case.
AGI, so what?: Thinking AGI will be transformative doesn’t mean it will be maximally transformative; e.g., the Industrial Revolution was transformative, but people adapted to it.
I don’t trust chains of reasoning with imperfect concepts: When your concepts are not very clearly defined/understood, it is quite difficult to accurately use them in complex chains of reasoning.
Uneasiness about selection effects at the level of arguments
“there is a small but intelligent community of people who have spent significant time producing some convincing arguments about AGI, but no community which has spent the same amount of effort looking for arguments against”
The people who don’t believe the initial arguments don’t engage with the community or with further arguments. If you look at the reference class “people who have engaged with this argument for more than 1 hour” and see that they all worry about AI risk, you might conclude that the argument is compelling. However, you are ignoring the major selection effects in who engages with the argument for an hour. Many other ideological groups have a similar dynamic: the class “people who have read the new testament” is full of people who believe in the Christian god, which might lead you to believe that the balance of evidence is in their favor — but of course, that class of people is highly selected for those who already believe in god or are receptive to such a belief.
“the strongest case for scepticism is unlikely to be promulgated. If you could pin folks bouncing off down to explain their scepticism, their arguments probably won’t be that strong/have good rebuttals from the AI risk crowd. But if you could force them to spend years working on their arguments, maybe their case would be much more competitive with proponent SOTA”
Ideally we want to sum all the evidence for and all the evidence against and compare. What happens instead is that skeptics come with 20 units of evidence and we shoot them down with 50 units of evidence for AI risk. In reality there could be 100 units of evidence against and only 50 for, and we would not know this unless we had really-well-informed skeptics or were summing their arguments over time.
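As a toy illustration of this dynamic, reusing the invented 20/50/100 numbers above (the "effort fraction" framing is my own gloss on why only 20 of the 100 units surface):

```python
# Invented numbers mirroring the example in the notes: suppose the full
# body of evidence is 50 units for AI risk and 100 against, but arguments
# against only get developed in proportion to skeptics' (much smaller)
# effort.
evidence_for_total = 50
evidence_against_total = 100
skeptic_effort_fraction = 0.2  # skeptics invest 20% as much effort

observed_for = evidence_for_total  # fully developed by the community
observed_against = evidence_against_total * skeptic_effort_fraction

# The observed balance (50 vs 20) points the opposite way from the
# true balance (50 vs 100).
print(observed_for, observed_against)
```

The point of the sketch is only that the observed ratio tracks who put in effort, not where the truth lies, so "we rebutted every skeptic" is weak evidence on its own.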
“It is interesting that when people move to the Bay area, this is often very “helpful” for them in terms of updating towards higher AI risk. I think that this is a sign that a bunch of social fuckery is going on.”
“More specifically, I think that “if I isolate people from their normal context, they are more likely to agree with my idiosyncratic beliefs” is a mechanisms that works for many types of beliefs, not just true ones. And more generally, I think that “AI doom is near” and associated beliefs are a memeplex, and I am inclined to discount their specifics.”
Miscellanea
Difference between in-argument reasoning and all-things-considered reasoning: Often the gung-ho people don’t make this distinction.
Methodological uncertainty: forecasting is hard
Uncertainty about unknown unknowns: Most of the unknown unknowns seem likely to delay AGI, things like Covid and nuclear war
Updating on virtue: You can update based on how morally or epistemically virtuous somebody is. Historically, some of those pushing AI Risk were doing so not for the goal of truth seeking but for the goal of convincing people
Industry vs AI safety community: Those in industry seem to be influenced somewhat by AI Safety, so it is hard to isolate what they think
Conclusion
Main classes of things pointed out: Distrust of reasoning chains using fuzzy concepts, Distrust of selection effects at the level of arguments, Distrust of community dynamics
Now in a position where it may be hard to update based on other people’s object-level arguments