Trying to better understand the practical epistemology of EA, and how we can improve upon it.
Violet Hour
Effective Altruism’s Implicit Epistemology
FTX, ‘EA Principles’, and ‘The (Longtermist) EA Community’
(No problem with self-linking, I appreciate it!)
Also, I think there’s an adequate Kantian response to your example. Am I missing something?

Part of the relevant context for your action includes, by stipulation, you knowing that the needed 50 votes have been cast. This information changes your context, and thus you reason “what action would rational agents in this situation (my epistemic state included) perform to best achieve my ends” — in this case, as the 50 votes have been cast, you don’t cast another.
So, I act on the basis of maxims, but changes in my epistemic state can still appropriately inform my decision-making.
Thanks for the comment, this is a useful source. I agree that SBF’s actions violated “high standards of honesty” (as well as, um, more lenient ones), and don’t seem like the actions of a good citizen.
Still, I’ll note that I still feel hesitant about claims like “Sam violated the principles of the EA community”, because your cited quote is not the only way that EA is defined. I agree that we can find accounts of EA under which Sam violated those principles. Relative to those criteria, it would be correct to say “Sam violated EA principles”. Thus, I think you and I both agree that saying things like “Sam acted in accordance with EA principles” would be wrong.

However, I have highlighted other accounts of “what EA is about”, under which I think it’s much harder to say that Sam straightforwardly violated those principles — accounts which place more emphasis on the core idea of maximization. And my intuitions about when it’s appropriate to make claims of the form ‘Person X violated the principles of this community’ require something close to unanimity, prior to X’s action, about what the core principles actually are, and what they commit you to. Due to varying accounts of what EA ‘is’, or is ‘about’, I reject the claim that Sam violated EA principles for much the same reason that I reject claims like “Sam acted in accordance with them”. So, I still think I stand behind my indeterminacy claim.
I’m unsure where we disagree. Do you think you have more lenient standards than me for when we should talk about ‘violating norms’, or do you think that (some of?) the virtues listed in your quote are core EA principles, and close-to-unanimously agreed upon?
The short answer is: I think the norm delivers meaningfully different verdicts for certain ways of cashing out ‘act consequentialism’, but I imagine that you (and many other consequentialists) are going to want to say that the ‘Practical Kantian’ norm is compatible with act consequentialism. I’ll first discuss the practical question of deontic norms and EA’s self-conception, and then respond to the more philosophical question.
1.
If I’m right about your view, my suggested Kantian spin would (for you) be one way among many to talk about deontic norms, which could be phrased in more explicitly act-consequentialist language. That said, I still think there’s an argument for EA as a whole making deontic norms more central to its self-conception, as opposed to a conception where some underlying theory of the good is more central. EA is trying to intervene on people’s actions, after all, and your underlying theory of the good (at least in principle) underdetermines your norms for action. So, to me, it seems better to just directly highlight the deontic norms we think are valuable. EA is not a movement of moral theorists qua moral theorists, we’re a movement of people trying to do stuff that makes the world better. Even as a consequentialist, I guess that you’re only going to want involvement with a movement that shares broadly similar views to you about the action-relevant implications of consequentialism.
I also think there should be clear public work outlining how the various deontic norms we endorse in EA clearly follow from consequentialist theories. Otherwise, I can see internal bad actors (or even just outsiders) thinking that statements about the importance of deontological norms are just about ‘brand management’, or whatever. I think it’s important to have a consistent story about the ways in which our deontic norms relate to our more foundational principles, both so that outsiders don’t feel like they’re being misled about what EA is about, and so that we have really explicit grounds on which to condemn certain behaviors as legitimately and unambiguously violating norms that we care about.
(Also, independently: I’ve met many people in EA, for example, who seem to flit between ‘EUT is the right procedure for practical decision-making’ and ‘EUT is an underratedly useful tool’ — even aside from discussions of side-constraints, I don’t think we have a clear conception of what our deontic norms are, and I think clarifying this would be independently beneficial. For instance, I think it would be good to have a clearer account of the procedures that really drive our prioritization decisions.)

2.
On a more philosophical level, I believe that various puzzle cases in decision theory help motivate the case for treating maxims as the appropriate evaluative focal point wrt rational decision-making, rather than acts. Here are some versions of act consequentialism that I think will diverge from the Practical Kantian norm:
Kant+CDT tells you to one-box in the standard Newcomb problem, whereas Consequentialism+CDT doesn’t (a worked illustration follows below).
Consequentialism+EDT is vulnerable to XOR blackmail, whereas Kant+CDT isn’t.
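For readers who want the mechanics: here’s the textbook way the CDT/EDT divergence falls out of the expected-payoff arithmetic (standard Newcomb payoffs, nothing specific to the Kantian framing):

```latex
% Opaque box: $1{,}000{,}000$ iff one-boxing was predicted; transparent box: $1{,}000$.
% CDT holds the (already-fixed) contents constant; let q = P(opaque box is full):
\mathrm{EU}_{\mathrm{CDT}}(\text{two-box}) = 1{,}001{,}000\,q + 1{,}000\,(1-q)
  \;>\; 1{,}000{,}000\,q = \mathrm{EU}_{\mathrm{CDT}}(\text{one-box}) \quad \text{for every } q,
% so CDT two-boxes no matter what. EDT conditions on the act; let p = predictor accuracy:
\mathrm{EU}_{\mathrm{EDT}}(\text{one-box}) = 1{,}000{,}000\,p, \qquad
\mathrm{EU}_{\mathrm{EDT}}(\text{two-box}) = 1{,}001{,}000\,(1-p) + 1{,}000\,p,
% so EDT one-boxes whenever p > 0.5005.
```

The Kantian layer shifts the evaluative focal point from the act to the maxim, which is how Kant+CDT can end up endorsing one-boxing while still computing causal expectations.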
Perhaps there is a satisfying decision theory which, combined with act-consequentialism, provides you with (what I believe to be) the right answers to decision-theoretic puzzle cases, though I’m currently not convinced. I think I might also disagree with you about the implications of collective action problems for consequentialism (though I agree that what you describe as “The Rounding to Zero Fallacy” and “The First-Increment Fallacy” are legitimate errors), but I’d want to think more about those arguments before saying anything more.
I disagree with this inference. If I’d heard that (say) supportive feminist tweets were routinely getting fewer retweets than tweets critical of feminism, I don’t think I’d believe that feminists were “definitely doing things wrong PR-wise”. Tweet numbers could be relevant evidence, given some wider context, like “there’s a social trend where the most controversial and peripheral feminist ideas get disproportionately promulgated, at the expense of more central and popular ideas”, but I’m not convinced EA is in a similar situation.
I don’t have a view on whether buying Wytham was a good idea, but I do agree with Owen that we should “let decisions be guided less by what we think looks good, and more by what we think is good”. I want people to act on important ideas, and I think it’s bad when people are turned away from important ideas — but one important idea I want to spread is Owen’s, where we emphasize the virtue of performing actions you can ultimately stand behind, even if the action has bad optics.
Some brief, off-the-cuff sociological reflections on the Bostrom email:
EA will continue to be appealing to those with an ‘edgelord’ streak. It’s worth owning up to this, and considering how to communicate going forward in light of that fact.
I think some of the reaction is indicative of an unhealthy attitude towards population-level averages.
I also think the ‘epistemic integrity’ angle is important.
Each consideration is discussed below.
1.
I think basically everyone here agrees that white people shouldn’t be using (or mentioning) racial slurs. I think you should also avoid generics. I think that you very rarely gain anything, even from a purely epistemic point of view, from saying (for example): “men are more stupid than women”.
EA skews young, and does a lot of outreach on university campuses. I also think that EA will continue to be attractive to people who like to engage in the world via a certain kind of communication, and I think many people interested in EA are likely to be drawn to controversial topics. I think this is unavoidable. Given that it’s unavoidable, it’s worth being conscious of this, and directly tackling what’s to be gained (and lost) from certain provocative modes of communication, in what contexts.
The Flint water crisis affected an area that was majority African American, and we have strong evidence that lead poisoning affects IQ. Here’s one way of ‘provocatively communicating’ these facts:
The US water system is poisoning black people.
I don’t like this statement and think it is true.
Deliberately provocative communication probably does have its uses, but it’s a mode of communication that can be in tension with nuanced epistemics, as well as kindness. If I’m to get back in touch with my own old edgelord streak for a moment, I’d say that one (though obviously not the major) benefit of EA is the way it can transform ‘edgelord energy’ into something that can actually make the world better.
I think, as with SBF, there’s a cognitive cluster which draws people towards both EA, and towards certain sorts of actions most of us wish to reflectively disavow. I think it’s reasonable to say: “EA messaging will appeal (though obviously will not only appeal) disproportionately to a certain kind of person. We recognize the downsides in this, and here’s what we’re doing in light of that.”
2.
There are fewer women than men in computer science. This doesn’t give you any grounds — once a woman says “I’m interested in computer science” — to be like “oh, well, but maybe she’s lying, or maybe it’s a joke, or … ”
Why? You have evidence that means you don’t need to rely on such coarse data! You don’t need to rely on population-level averages! To the extent that you do, or continue to treat such population-averages as highly salient in the face of more relevant evidence, I think you should be criticized. I think you should be criticized because it’s a sign that your epistemics have been infected by pernicious stereotypes, which makes you worse at understanding the world, in addition to being more likely to cause harm when interacting in that world.
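To put the point in explicitly Bayesian terms, with toy numbers of my own choosing (purely illustrative): suppose the population-level base rate of interest in computer science is 5%, and suppose people say “I’m interested in computer science” with probability 0.9 when it’s true and 0.01 when it isn’t. Then:

```latex
P(\text{interested} \mid \text{says so})
  = \frac{0.9 \times 0.05}{0.9 \times 0.05 + 0.01 \times 0.95}
  = \frac{0.045}{0.0545} \approx 0.83
```

The person’s own statement swamps the base rate: the population-level average contributes almost nothing once the far more relevant individual-level evidence arrives, which is exactly why continuing to treat the average as salient signals something wrong with your epistemics.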
3.
You should be careful to believe true things, even when they’re inconvenient.
On the epistemic level, I actually think that we’re not in an inconvenient possible world wrt the ‘genetic influence on IQ’ question, partially because I think certain conceptual discussions of ‘heritability’ are confused, and partially because I think that it’s obviously reasonable to look at the (historically quite recent!) effects of slavery, and conclude “yeaah, I’m not sure I’d expect the data we have to look all that different, conditioned on racism causing basically all of the effects we currently see”.
But, fine, suppose I’m in an inconvenient possible world. I could be faced with data that I’d hate to see, and I’d want to maintain epistemic integrity.
One reason I personally found Bostrom’s email sad was that I sensed a missing mood. To support this, here’s an intuition pump that might be helpful: suppose you’re back in the early days of EA, working for $15k in the basement of an estate agent. You’ve sacrificed a lot to do something weird, sometimes you feel a bit on the defensive, and you worry that people aren’t treating you with the seriousness you deserve. Then, someone comes along, says they’ve run some numbers, and tells you that EA is more racist than other cosmopolitan groups, and — despite EA’s intention to do good — is actually far more harmful to the world than other comparable groups. Suppose further that we also ran surveys and IQ tests, and found that EA is also more stupid and unattractive than other groups. I wouldn’t say:
EA is harmful, racist, ugly, and stupid. I like this sentence and think it’s true.

Instead, I’d communicate the information, if I thought it was important, in a careful and nuanced way. If I saw someone make the unqualified statement quoted above, I wouldn’t personally wish to entrust that person with promoting my best interests, or with leading an institute directed towards the future of humanity.
I raise this example not because I wish to opine on contemporary Bostrom, based on his email twenty-six years ago. I bring this example up because, while (like 𝕮𝖎𝖓𝖊𝖗𝖆) I’m glad that Bostrom didn’t distort his epistemics in the face of social pressure, I think it’s reasonable to think (like Habiba, apologies if this is an unfair paraphrase) that Bostrom didn’t take ownership of his previously missing mood, and communicate why his subsequent development leads him to now repudiate what he said.
I don’t want to be unnecessarily punitive towards people who do shitty things. That’s not kindness. But I also want to be part of a community that promotes genuinely altruistic standards, including a fair sense of penance. With that in mind, I think it’s healthy for people to say: “we accept that you don’t endorse your earlier remark (Bostrom originally apologized within 24 hours, after all), but we still think your apology misses something important, and we’re a community that wants people who are currently involved to meet certain standards.”
Puzzles for Some
Interesting comment! But I’m also not convinced. :P
… or, more precisely, I’m not convinced by all of your remarks. I actually think you’re right in many places, though I’ll start by focusing on points of disagreement.
(1) On Expected Payoffs.
You ask whether I’m saying: “when given a choice, you can just … choose the option with a worse payoff?”
I’m saying ‘it’s sometimes better to choose an option with lower expected payoff’. Still, you might ask: “why would I choose an option with lower expected payoff?”
First, I think the decision procedure “choose the option with the highest expected payoff” requires external justification — and I take it that people agree, since they appeal to (e.g.) long-run arguments for maximizing expected utility, which is an acknowledgment that the procedure doesn’t justify itself.
Arguments for EU maximization are meant to show you how to do better by your own values. If I can come up with an alternative decision procedure which does better by my values, this is an argument for not choosing the action with the highest expected payoff. And I take myself to be appealing to the same standards (doing better by your own lights) which are appealed to in defense of EU maximization.
I interpret you as also asking a separate question, of the form: “you’re recommending a certain course of action — why (or on what basis) do you recommend that course of action?”
Trying to justify my more foundational reasons will probably take us a bit too far afield, but in short: when I decide upon some action, I ask myself the question “do I recommend that all rational agents with my values in this decision context follow the decision procedure I’m using to determine their action?”
I think this criterion is independently justified, and indeed more foundational than purported justifications for EU maximization. Obviously, I don’t expect you (or anyone else) to be convinced by this short snippet, but I do take myself to have reasons for action. I just think those reasons provide more foundational justifications than the use of EUM as a decision-procedure, and in certain cases license rejecting EUM (qua decision procedure).
In the case offered by Beckstead and Thomas, I think STOCHASTIC does better by my values than the decision procedure “choose, at each point, the action with the highest expected payoff”. That’s why I decide in the way I do.
In summary: Beckstead and Thomas’ case provides me with exogenously given payoffs, and then invites me to choose, in line with my preferences over payoffs, on a given course of action. I don’t think I’m deciding to act in a way which, by my lights, is worse than some other act. My guess is that you interpret me as choosing an option which is worse by my own lights because you interpret me as having context-independent preferences over gambles, and choosing an option I disprefer.
I’m saying, instead, that the assumption of ‘context-independent preferences over gambles’ is not given for free, and is, at least, not explicitly provided in Beckstead and Thomas’ setup. I have preferences over fixed states of the world, or ‘degenerate’ gambles. Given that I don’t possess certainty over future states of the world, I use some decision procedure for navigating this uncertainty. This doesn’t mean that I ‘really’ possess context-independent preferences over gambles, and then act in accordance with them.
“Wait, are you really saying that you don’t prefer a state of the world which delivers Utopia with probability p, and something marginally worse than the current world with the remaining probability?”
Yes, but I think that’s less weird than it sounds. The gamble I mentioned is better than the status quo with some probability, and worse with some probability. Is that state of the world ‘better’? Well, it’s better with some probability, and worse with some probability! I don’t feel the need to construct some summary term which captures my “all-things-considered betterness ranking under uncertainty”.
(2) On STOCHASTIC More Specifically
I think you’re right that some alternative strategy to STOCHASTIC is preferable, and probably you’re right that taking exactly N tickets is preferable. I’ll admit that I didn’t think through a variety of other procedures; STOCHASTIC was just the first thought that came to mind.
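For concreteness, here’s a minimal sketch of the sort of procedure I had in mind, under my urn-style gloss (keep taking tickets on white balls, stop for good at the first black ball). The function shape and the `p_continue` parameter are illustrative assumptions of mine, not Beckstead and Thomas’ formal statement:

```python
import random

def stochastic(n_offers: int, p_continue: float = 0.99) -> int:
    """Accept offers while drawing 'white balls'; stop permanently at the
    first 'black ball'. The number of offers accepted is (truncated)
    geometrically distributed, so the agent almost surely stops after
    finitely many offers rather than riding the sequence all the way down."""
    accepted = 0
    for _ in range(n_offers):
        if random.random() > p_continue:  # drew a black ball: stop here
            break
        accepted += 1                     # drew a white ball: take this offer
    return accepted

# Illustrative run: 1,000 available offers, ~1% stopping chance per draw.
print(stochastic(1_000, p_continue=0.99))
```

The appeal of the randomization, at least as I was imagining it, is that you almost surely stop after finitely many offers without committing in advance to a single deterministic stopping point.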
One final critical response.
I would also be disappointed if I drew a black ball first. But I think I would be similarly disappointed if I drew a black ball at a later time. I think this is just a consequence of the fact that you can design decision-environments in which people will always be disappointed.
For example, you can always (trivially) design an environment in which the option with the highest payoff includes a “disappointment” term. In which case, you’ll always be disappointed if you choose the option with the highest payoff for you. Does this mean that you didn’t actually want the option with the highest payoff?
Thanks for the helpful pushback, and apologies for the late reply!
I have some interest in this, although I’m unsure whether I’d have time to read the whole book — I’m open to collaborations.
This is a really wonderful post, Joe. When I receive notifications for your posts, I feel like I’m put in touch with the excitement that people in the 1800s might have felt when being delivered newspapers containing serially published chapters of famous novels. : )
Okay, enough buttering up. Onto objections.
I very much like your notions of taking responsibility, and of seeing yourself whole. However, I object to certain ways you take yourself to be applying these criteria.
(I’ll respond in two comments; the points are related, but I wanted to make it easier to respond to each point independently)
1. Understanding who we are, and who it’s possible to be
My first point of pushback: I think that your suggested way of engaging with population axiology can, in many cases, impede one’s ability to take full responsibility for one’s values, through improperly narrowing the space of who it’s possible to be.
When I ask myself why I care about understanding what it’s possible to be, it’s because I care about who I can be — what sort of thing, with what principles, will the world allow me to be?
In your discussion of Utopia and Lizards, you could straightforwardly bring out a contradiction in the views of your interlocutor, because you engineered a direct comparison between concrete worlds, in a way that was analogous to the repugnant conclusion.
Moreover, your interlocutor endorsed certain principles that were collectively inconsistent. You need to have your interlocutor endorse principles, because you don’t get inconsistency results from mere behavior.
People can just decide between concrete worlds however they like. You can only show that someone is inconsistent if they take themselves to be acting on the basis of incompatible principles.
I agree that doing ethics (broadly construed) can, for the anti-realist, help them understand which sets of principles it even makes sense to endorse as a whole. So I agree with your abstract claim about ethics helping the anti-realist see which principles they can coherently endorse together. But I also believe that certain kinds of formal theorizing can inhibit our sense of what (or who) it’s possible to be, because certain kinds of theorizing can (incorrectly) lead us to believe that we are operating within a space which captures the only possible way to model our moral commitments.
For instance: I don’t think that I’m committed to a well-defined, impartial, and context-independent aggregate welfare ranking with the property of finite fine-grainedness. The axioms of Arrhenius’ impossibility theorem (to which you allude) quantify over welfare levels with well-defined values.
If I reflect on my principles, I don’t find this aggregate welfare measure directly, nor do I see that it’s entailed by any of my other commitments. If I decide on one concrete world over another, I don’t take this to be grounded in a claim about aggregate welfare.
I don’t mean to say that I think there are no unambiguous cases where societies (worlds) are happier than others. Rather, I mean to say that granting some determinate welfare rankings over worlds doesn’t mean that I’m thereby committed to the existence of a well-defined, impartial welfare ranking over worlds in every context.
So: I think I have principles which endorse the claim: ‘Utopia > Lizards’, and I don’t think that leaves me endorsing some unfortunate preference about concrete states of affairs. In Utopia and Lizards, Z (to me) seems obviously worse than A+. In the original Mere Addition Paradox, it’s a bit trickier, because Parfit’s original presentation assumes the existence of ‘an’ aggregate welfare-level, which is meant to represent some (set of) concrete state of affairs. And I think more would need to be said in order to convince me that there’s some fact of the matter about which concrete situations instantiate Parfit’s puzzle.
How does this all relate to your initial defense of moral theorizing? In short, I think that moral theorizing can have benefits (which you suggest), but — from my current perspective — I feel as though moral theorizing can also impose an overly narrow picture of what a consistent moral self-conception must look like.
(Second Comment)
2. On seeing ourselves whole

You say, in response to messy pluralism:
“We can talk, individually, about each of a zillion little choice vectors one by one; but we don’t know where they push in combination, what they are doing, what explains them; what they represent. We can see ourselves making any given specific choice. But we can’t see ourselves whole.”
I love the sentiment you express here. I engage in moral reasoning as an attempt to see (and indeed construct) myself whole. With that said, I’m unsure how much “self-knowledge” we actually lose by adopting messy pluralism. I want to look at three components of the quote, and explain how I see myself whole in response to each.
A. What are my little choice vectors doing?
At an abstract level, my choice vectors are pushing me towards actions I can genuinely stand behind. They’re pushing me towards actions which, if I reflect, I can prescribe for all agents with my fuzzy, inchoate values in the decision-context I find myself.
B. What explains my little choice vectors?
Well, there’ll be some causal stories of the ordinary, standard type. But you know this. I take it that, through this question, you’re asking: what rationalizes my choices? What makes it the case that I am acting agentically, and with responsibility? Thus the final question.
C. What do my choice vectors represent?
In the ideal case, they represent something like my answer in (A): that is, they represent the actions I’d prescribe for all agents with my fuzzy, inchoate values in my decision-context.
You might reasonably point out that this response is largely uninformative. What do my fuzzy, inchoate values actually represent?
To see myself whole is to see myself as I actually am. That means, yes, seeing myself as someone who is genuinely committed to certain principles, and seeing myself as someone who can be surprised by what’s entailed by my principles. But to see myself honestly is also to see myself as someone in the process of becoming more whole; it’s to see myself as someone who has not yet (fully, at least) worked themselves out.
So, what do my choice vectors represent? They represent a desire to alleviate the distress of those who are (and will be) suffering. They represent a desire to face up to the vast scale of the world, and a desire to face up to the fact that the world may not be how I wish it to be. And my choice vectors represent a desire to “look again” at morality, and to allow for the possibility that there’s something I might have missed.
D. Concluding messy pluralism
I think that acknowledging some degree of messy pluralism is part of what allows me to see myself whole. It allows me to encounter my values (my heart, my sentiment, whatever) as they actually are, rather than the values of some hypothetical, more precisely systematized offshoot of me.
I agree that, to see oneself whole, one should look at the totality of one’s choices and principles, and then ask “wait, what exactly is going on here?”. Indeed, I think that this is particularly important to do when certain tensions in our principles or actions are brought to light.
That said, I’m skeptical of how much more “self-knowledge” the utilitarian framework actually provides. The utilitarian can say, of course, that they are a force for “total utility”. But what does this mean, exactly? What’s the mapping between valenced experiential states and welfare numbers, and, indeed, what justifies any particular mapping?
When we press on what exactly we mean by total utility, I become unsure about what the utilitarian is “a force” for. I think this unclarity is evident in population axiology and infinite ethics. Of course, there are more humdrum cases where things are clearer (though so too for the right kind of pluralist), and we may hope for some clever workaround in infinite cases.
But, insofar as utilitarians hope for and try to construct workarounds to (e.g.) cases in infinite ethics, then I think we’ve shown that real-life utilitarians are primarily a force for something more fuzzy and foundational than straightforward utilitarianism. This force, after all, is what motivates utilitarians to reject claims of equal welfare between (some) infinite worlds. At bottom, I think, the utilitarian doesn’t really have much more of a sense of what “force” they are than (at least some) pluralists — they’re primarily using “total utility” as a placeholder for a set of more complicated sentiments.
Hm, I still feel as though Sanjay’s example cuts against your point somewhat. For instance, you mentioned encountering the following response:
“It is better for us to have AGI first than [other organization], that is less safety minded than us.”
To the extent that regulations slow down potential AGI competitors in China, I’d expect stronger incentives towards safety, and a correspondingly lower chance of encountering potentially dangerous capabilities races. So, even if export bans don’t directly slow down the frontier of AI development, it seems plausible that such bans could indirectly do so (by weakening the incentives to sacrifice safety for capabilities development).
Your post + comment suggests that you nevertheless expect such regulation to have ~0 effect on AGI development races, although I’m unsure which parts of your model are driving that conclusion. I can imagine a couple of alternative pictures, with potentially different policy implications.
Your model could involve potential participants in AGI development races viewing themselves primarily in competition with other (e.g.) US firms. This, combined with short timelines, could lead you to expect the export ban to have ~0 effect on capabilities development.
On this view, you would be skeptical about the usefulness of the export ban on the basis of skepticism about China developing AGI (given your timelines), while potentially being optimistic about the counterfactual value of domestic regulation relating to chip production.
If this is your model, I might start to wonder “Could the chip export ban affect the regulatory Overton Window, and increase the chance of domestic chip controls?”, in a way that makes the Chinese export ban potentially indirectly helpful for slowing down AGI.
To be clear, I’m not saying the answer to my question above is “yes”, only that this is one example of a question that I’d have on one reading of your model, which I wouldn’t have on other readings.
Alternatively, your model might instead be skeptical about the importance of compute, and consequently skeptical about the value of governance regimes surrounding a wide variety of even-somewhat-quixotic-suggestions relating to domestic chip regulation.
I sensed that you might have a less compute-centric view based on your questions to leading AI researchers, asking if they “truly believe there are any major obstacles left” which major AI companies were unable to “tear down with their [current?] resources”.
Based on that question – alongside your assigning a significant probability to <5 year timelines – I sensed that you might have a (potentially not-publicly-disclosable) impression about the current rate of algorithmic progress.[1]
I don’t want to raise overly pernickety questions, and I’m glad you’re sharing your concerns. I’m asking for more details about your underlying model because the audience here will consist of people who (despite being far more concerned about AGI than the general population) are on average far less concerned – and on average know less about the technical/governance space – than you are. If you’re skeptical about the value of extant regulation affecting AGI development, it would be helpful at least for me (and I’m guessing others?) to have a bit more detail on what’s driving that conclusion.
[1] I don’t mean to suggest that you couldn’t have more ‘compute-centric’ reasons for believing in short timelines, only that some of your claims (+ tone) updated me a bit in this direction.
Thanks for the comment!
(Fair warning, my response will be quite long)
I understand you to be offering two potential stories to justify ‘speculativeness-discounting’.
First, EAs don’t (by and large) apply a speculativeness-discount ex post. Instead, there’s a more straightforward ‘Bayesian+EUM’ rationalization of the practice. For instance, the epistemic practice of EAs may be better explained with reference to more common-sense priors, potentially mediated by orthodox biases.
Or perhaps EAs do apply a speculativeness-discount ex post. This too can be justified on Bayesian grounds.
We often face doubts about our ability to reason through all the relevant considerations, particularly in speculative domains. For this reason, we update on higher-order uncertainty, and implement heuristics which themselves are justified on Bayesian grounds.
In my response, I’ll assume that your attempted rationale for Principle 4 involves justifying the norm with respect to the following two views:
Expected Utility Maximization (EUM) is the optimal decision-procedure.
The relevant probabilities to be used as inputs into our EUM calculation are our subjective credences.
The ‘Common Sense Priors’ Story
I think your argument in (1) is very unlikely to provide a rationalization of EA practice on ‘Bayesian + EUM’ grounds.[1]
Take Pascal’s Mugging. The stakes can be made high enough that the value involved can easily swamp your common-sense priors. Of course, people have stories for why they shouldn’t give the money to the mugger. But these stories are usually generated because handing over their wallet is judged to be ridiculous, rather than the judgment arising from an independent EU calculation. I think other fanatical cases will be similar. The stakes involved under (e.g.) various religious theories and our ability to acausally affect an infinite amount of value are simply going to be large enough to swamp our initial common-sense priors.
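To see the swamping concretely, with deliberately cartoonish numbers of my own (no one’s actual credences):

```latex
\mathrm{EU}(\text{pay the mugger})
  \approx \underbrace{10^{-50}}_{\text{prior that the mugger delivers}}
    \times \underbrace{10^{100}}_{\text{promised value}}
    \;-\; \underbrace{c}_{\text{value of your wallet}}
  \approx 10^{50}.
```

Because the mugger names the stakes after you’ve fixed your prior, any fixed common-sense prior can be outbid; blocking the conclusion requires a prior that shrinks at least as fast as the promised stakes grow, which is no longer a common-sense prior.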
Thus, I think the only feasible ‘Bayes+EUM’ justification you could offer would have to rely on your ‘higher-order evidence’ story about the fallibility of our first-order reasoning, which we’ll turn to below.
The ‘Higher-Order Evidence’ Story
I agree that we can say: “we should be fanatical insofar as my reasoning is correct, but I am not confident in my reasoning.”
The question, then, is how to update after reflecting on your higher-order evidence. I can see two options: either you have some faith in your first-order reasoning, or no faith.
Let’s start with the case where you have some faith in your first-order reasoning. Higher-order evidence about your own reasoning might decrease the confidence in your initial conclusion. But, as you note, “we might find that the EV of pursuing the speculative path warrants fanaticism”. So, what to do in that case?
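(To see why a mere confidence discount rarely changes the verdict, plug cartoon numbers of my own into the obvious adjustment:

```latex
\mathrm{EU}_{\text{adjusted}}
  \approx \underbrace{0.01}_{\text{credence your reasoning is sound}}
    \times \underbrace{10^{50}}_{\text{first-order EV}}
  = 10^{48}.
```

A flat multiplicative discount merely rescales the EV without changing which option comes out on top; to block fanaticism, the discount itself would have to grow with the stakes.)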
I think it’s true that many people will cite considerations of the form “let’s pragmatically deprioritize the high EV actions that are both speculative and fanatical, in anticipation of new evidence”. I don’t think that provides a sufficient justificatory story of the epistemic norms to which most of us hold ourselves.
Suppose we knew that our evidential situation was as good as it’s ever going to be. Whatever evidence we currently have about (e.g.) paradoxes in infinite ethics, or the truth of various religions constitutes ~all the evidence we’re ever going to have.
I still don’t expect people to follow through on the highest EV option, when that option is both speculative and fanatical.
Under MEC, EAs should plausibly be funneling all their money into soteriological research. Or perhaps you don’t like MEC, and think we should work out the most plausible worldview under which we can affect strongly-Ramsey-many sentient observers.[2]
Or maybe you have a bounded utility function. In that case, imagine that the world already contains a sufficiently large number of suffering entities. How blasé are you, really, about the creation of arbitrarily many suffering-filled hellscapes?
There’s more to say here, but the long and short of it is: if you fail to reach a point where you entirely discount certain forms of speculative reasoning, I don’t think you’ll be able to recover anything like Principle 4. My honest view is that many EAs have a vague hope that such theories will recover something approaching normality, but very few people actually try to trace out the implications of such theories on their own terms, and follow through on these implications. I’m sympathetic to this quote from Paul Christiano:
I tried to answer questions like “How valuable is it to accelerate technological progress?” or “How bad is it if unaligned AI takes over the world?” and immediately found that EU maximization with anything like “utility linear in population size” seemed to be unworkable in practice. I could find no sort of common-sensical regularization that let me get coherent answers out of these theories, and I’m not sure what it would look like in practice to try to use them to guide our actions.
Higher-Order Evidence and Epistemic Learned Helplessness
Maybe you’d like to say: “in certain domains, we should assign our first-order calculations about which actions maximize EU zero weight. The heuristic ‘sometimes assign first-order reasoning zero weight’ can be justified on Bayesian grounds.”
I agree that we should sometimes assign our first-order calculations about which actions maximize EU zero weight. I’m doubtful that Bayesianism or EUM play much of a role in explaining why this norm is justified.
When we’re confronted with the output of an EUM calculation that feels off, we should listen to the parts of us which tell us to check again, and ask why we feel tempted to check again.
If we’re saying “no, sorry, sometimes I’m going to put zero weight on a subjective EU calculation”, then we’re already committed to a view under which subjective EU calculations only provide action-guidance in the presence of certain background conditions.
If we’re willing to grant that, then I think the interesting justificatory story is a story which informs us of what the background conditions for trusting EU calculations actually are — rather than attempts to tell post hoc stories about how our practices can ultimately be squared with more foundational theories like Bayesianism + EUM.
If you’re interested, I’ll have a post in April touching on these themes. :)
I’m not Joe, but I thought I’d offer my attempt. It’s a little more than a few lines (~350 words), though hopefully it’s of some use.
Moral anti-realists often think about moral philosophy, even though they believe there are no moral facts to discover. If there are no facts to be discovered, we might ask “why bother? What’s the point of doing ethics?”
Joe provides three possible reasons:
Through moral theorizing, we can better understand which sets of principles it’s possible to consistently endorse.
Sometimes, ethical theorizing can help you discover a tension among the different principles you’re drawn to. EJT offers one nice example in his comment. Joe takes various impossibility results in population ethics to provide another example.
Through moral theorizing, you can also develop a better self-understanding. If you’re just jumbling along without ever reflecting on your principles, you don’t know what you stand for.
Consider the total utilitarian. They’ve engaged in moral theorizing, and now better understand (so claims Joe) what they stand for. If you never reflect on your values, you forgo some degree of agency. You forgo the ability to properly push for what you care about, because to a large degree you don’t know exactly what you care about.
We can call the first two benefits of moral theorizing ‘static benefits’ (not Joe’s term). Moral theorizing can benefit you by taking your psychology as given, providing you with tools to better understand it, and making your existing principles more coherent. However, there’s also a more dynamic benefit to be had from moral theorizing.
Moral theorizing can help you construct the person you want to be. This benefit is harder to precisely convey.
My analogy: I like going to galleries with friends who know more about the visual arts than I do. Sometimes, I’ll look at a painting and just not get it. Then, my friend will point out a detail I’ve missed, and get me to look again.
In many cases, this will make me like the painting more. It’s not that my friend provided me with more self-understanding by informing me that “I liked the painting all along”. Rather, I’ve grown to like the painting more through seeing it more clearly. Ethical theorizing can provide a similar benefit. When we engage in ethical theorizing, we “look again” or “look more deeply” at who we are. This is partly about understanding who we already were, and partly about understanding who we want to become.
Probabilities, Prioritization, and ‘Bayesian Mindset’
Thanks :)
I’m sympathetic to the view that calibration on questions with larger bodies of obviously relevant evidence isn’t transferable to predictions on more speculative questions. Ultimately I believe that the amount of skill transfer is an open empirical question, though I think the absence of strong theorizing about the relevant mechanisms involved counts heavily against deferring to (e.g.) Metaculus predictions about AI timelines.
A potential note of disagreement on your final sentence. While I think focusing on calibration can Goodhart us away from some of the most important sources of epistemic insight, there are “predictions” (broadly construed) that I think we ought to weigh more highly than “domain-relevant specific accomplishments and skills”.
E.g., if you’re sympathetic to EA’s current focus on AI, then I think it’s sensible to think “oh, maybe Yudkowsky was onto something”, and upweight the degree to which you should engage in detail with his worldview, and potentially defer to the extent that you don’t possess a theory which jointly explains both his foresight and the errors you currently think he’s making.
My objection to ‘Bayesian Mindset’ and the use of subjective probabilities to communicate uncertainty is (in part) due to the picture imposed by the probabilistic mode of thinking, which is something like “you have a clear set of well-identified hypotheses, and the primary epistemic task is to become calibrated on such questions.” This leads me to suspect that EAs are undervaluing the ‘novel hypotheses generation’ component of predictions, though there is still a lot of value to be had from (novel) predictions.
I haven’t read Kosoy & Diffractor’s stuff, but I will now!
FWIW I’m pretty skeptical that their framework will be helpful for making progress in practical epistemology (which I gather is not their main focus anyway?). That said, I’d be very happy to learn that I’m wrong here, so I’ll put some time into understanding what their approach is.
Nice post!
I think I’d want to revise your first taxonomy a bit. To me, one (perhaps the primary) disagreement among ML researchers regarding AI risk consists of differing attitudes to epistemological conservatism, which I think extends beyond making conservative predictions. Here’s why I prefer my framing:
As you note, to say that someone makes a conservative prediction comes with other connotations, like predictions being robust to uncertainty.
If I say that someone has a conservative epistemology, I think this more faithfully captures the underlying disposition — namely, that they are conservative about the power of abstract theoretical arguments to deliver strong conclusions in the absence of more straightforwardly relevant empirical data.
I don’t interpret the most conservative epistemologists as primarily driven by a fear of making ‘extreme’ predictions. Rather, I interpret them as expressing skepticism about the presence of any evidential signal offered by certain modes of more abstract argumentation.
For example, Richard has a more conservative epistemology than you, though obviously he is highly non-conservative relative to most. David Thorstad seems more conservative still. The hard-nosed lab scientist with little patience for philosophy is yet more conservative than David.
I also think that the language of conservative epistemology helps counteract (what I see as) a mistaken frame motivating this post. (I’ll try to motivate my claim, but I’ll note that I remain a little fuzzy on exactly what I’m trying to gesture at.)
The mistaken frame I see is something like “modeling conservative epistemologists as if they were making poor strategic choices within a non-conservative world-model”. You state:
The level of concern and seriousness I see from ML researchers discussing AGI on any social media platform or in any mainstream venue seems wildly out of step with “half of us think there’s a 10+% chance of our work resulting in an existential catastrophe”.
I have concerns about you inferring this claim from the survey data provided,[1] but perhaps more pertinently for my point: I think you’re implicitly interpreting the reported probabilities as something like all-things-considered credences in the proposition researchers were queried about. I’m much more tempted to interpret the probabilities offered by researchers as meaning very little. Sure, they’ll provide a number on a survey, but this doesn’t represent ‘their’ probability of an AI-induced existential catastrophe.
I don’t think that most ML researchers have, as a matter of psychological fact, any kind of mental state that’s well-represented by a subjective probability about the chance of an AI-induced existential catastrophe. They’re more likely to operate with a conservative epistemology, in a way that isn’t neatly translated into probabilistic predictions over an outcome space that includes the outcomes you are most worried about. I think many people are likely to filter out the hypothesis given the perceived lack of evidential support for the outcome.
I actually do think the distinction between ‘conservative predictions’ and ‘conservative decision-making’ is helpful, though I’m skeptical about its relevance for analyzing different attitudes to AI risk.
Here’s one place I think the distinction between ‘conservative predictions’ and ‘conservative decision-making’ would be useful: early decisions about COVID.
Many people (including epidemiologists!) claimed that we lacked evidence about the efficacy of masks for preventing COVID, but didn’t suggest that people should wear masks anyway.
I think ‘masks might help prevent COVID’ would have been in the outcome space of relevant decision-makers, and so we can describe their decision-making as (overly) conservative, even given their conservative predictions.
However, I think that ‘literal extinction from AGI’ just isn’t in the outcome space of many ML researchers, because arguments for that claim become harder to make as your epistemology becomes more conservative.
I don’t think that ‘[Person] will offer a probability when asked in a survey’ provides much evidence about whether that outcome is in [Person]’s outcome space in anything like a stable way.
If my analysis is right, then a first-pass at the practical conclusions might consist in being more willing to center arguments about alignment from a more empirically grounded perspective (e.g. here), or more directly attempting to have conversations about the costs and benefits of more conservative epistemological approaches.
[1] First, there are obviously selection effects present in surveying OpenAI and DeepMind researchers working on long-term AI. Citing this result without caveat feels similar to using (e.g.) PhilPapers survey results revealing that most specialists in philosophy of religion are theists to support the claim that most philosophers are theists. I can also imagine similar selection effects being present (though to lesser degrees) in the AI Impacts Survey. Given these selection effects, and given that response rates from the AI Impacts survey were ~17%, I think your claim is misleading.
Thanks for the suggestion! Done now. :)