I don't know exactly what you mean by "feels very hard to compare". I'd appreciate more direct responses to the arguments in this post, namely, about how the comparison seems arbitrary.
It looks like you are inferring incomparability between the value of 2 futures (non-discrete overlap between their UEVs) from the subjective feeling (in your mind) that their EVs feel very hard to compare (given all the evidence you considered), as any comparisons involve decisive arbitrary assumptions. I mean "arbitrary" as used in common language.
Comparisons among the expected cost-effectiveness of the vast majority of interventions seem arbitrary to me too due to effects on soil animals and microorganisms. However, the same goes for comparisons among the expected mass of seemingly identical objects with a similar mass if I can only assess their mass using my hands, but this does not mean their mass is incomparable. To assess this, we have to empirically determine which fraction of the uncertainty in their mass is irreducible. 10 k years ago, it would not have been possible to determine which of 2 rocks with around 1 kg was the heavier if their mass only differed by 10^-6 kg. Yet, this is possible today. Some semi-micro balances have a resolution of 0.01 mg, 10^-8 kg. So I would say the expected mass of the rocks was comparable 10 k years ago. Do you agree? There could be some irreducible uncertainty in the mass of the rocks, but much less than suggested by the evidence available 10 k years ago.
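To make the rock example concrete, here is a minimal sketch. It is only an illustration, not part of the original argument. The helper `can_rank` and the hand-weighing resolution of 0.05 kg are assumptions made up for the example; the 10^-6 kg difference and the 10^-8 kg balance resolution are the figures from the comment above.

```python
def can_rank(mass_difference_kg: float, resolution_kg: float) -> bool:
    """Return True if an instrument with this resolution can tell which object is heavier."""
    return abs(mass_difference_kg) >= resolution_kg


MASS_DIFFERENCE_KG = 1e-6       # difference between the 2 rocks (from the comment)
HAND_RESOLUTION_KG = 0.05       # assumed value, purely illustrative
BALANCE_RESOLUTION_KG = 1e-8    # semi-micro balance, 0.01 mg (from the comment)

rankable_by_hand = can_rank(MASS_DIFFERENCE_KG, HAND_RESOLUTION_KG)
rankable_by_balance = can_rank(MASS_DIFFERENCE_KG, BALANCE_RESOLUTION_KG)

print(rankable_by_hand, rankable_by_balance)  # prints: False True
```

The point of the sketch is only that whether the comparison can be made depends on the instrument, while the difference in mass is the same in both cases.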
I don't exactly understand what argument you're making here.
My core argument in the post is: Take any intervention X. We want to weigh up its impact for all sentient beings across the cosmos, where this "weighing up" is aggregation over all hypotheses. Now suppose we want to force ourselves to compare X with inaction, i.e., say either UEV(do X) > UEV(don't do X) or vice versa. We have such an extremely coarse-grained understanding (if any) of these hypotheses[1] that, when we do the weighing-up, whether we say UEV(do X) > UEV(don't do X) or vice versa seems to depend on an arbitrary choice.
Can you say how your argument relates to mine?
[1] Relative to the amount of fine-grained detail necessary to evaluate the hypothesis, when what we value is "well-being of all sentient beings across the cosmos".
Thanks for following up, Anthony.
My best guess about which of 2 identical objects has a larger mass in expectation will be arbitrary if their mass only differs by 10^-6 kg, and I have no way of assessing this small difference. However, this does not mean the expected mass of the 2 objects is fundamentally incomparable. Likewise, my best guess about which of 2 actions increases welfare more in expectation may be arbitrary without this implying that their expected change in welfare is incomparable.
I am not sure it matters whether one endorses precise expected values (EVs) or not. In practice, I still like to test different EVs when the underlying probability density function (PDF) is very arbitrary and uncertain, as is the case for PDFs of welfare ranges. In such cases, I suspect decreasing uncertainty to find the best options has higher EV than the supposedly imprecise EVs of going with the current best option.
I worry you're reifying "expectations" as something objective here. The relative actual masses of the objects are clearly comparable. But if you subjectively can't compare them, then they're indeed incomparable "in expectation" in the relevant sense.
I would be able to subjectively compare the mass of the 2 objects with more evidence. Some comparisons may not be feasible with currently available evidence, but the degree of imprecision should be set by what is physically possible?
If you had more evidence, you could make the comparison. But you currently have no clue which direction the comparison would go, in expectation over the evidence you might receive. So how are you supposed to compare them right now?
I would simply say the expected mass is practically (not exactly) the same given the evidence available to me, and consider gathering additional evidence depending on how much I expected this to change future decisions. Likewise for altruistic interventions among which comparisons of the expected change in welfare feel very arbitrary.
I don't know what you mean by "practically the same", can you say more?
Regardless, the problem is that "gathering evidence" vs "doing something else" is itself a decision, whose consequences you'll be clueless about. I discuss this more here.
I meant my future decisions would be the same in reality if I could not gather additional evidence regardless of whether the mass of the 2 identical objects was exactly the same or differed by 10^-6 kg.
Do you think annual human welfare per human-year has increased since 1900? Child mortality decreased 37.3 pp (= 0.41 − 0.037) since then until 2023. If you agree annual human welfare per human-year has increased since 1900, are you confident that similar progress cannot be extended to non-humans? Would you have argued 200 years ago that we are all clueless about how to increase human welfare? I agree research can backfire. However, at least historically, doing research on the sentience of animals, and on how to increase their welfare has mostly been beneficial for the target animals?
Perhaps, but that's consistent with incomparability. Given the independent motivations we've discussed (/given in my post) for calling the two options incomparable, I'd say you should call them incomparable.
I think I address your questions in the second paragraph in "Why we're especially unaware of large-scale consequences" (this post) and "Meta-extrapolation" (post #4). See also my discussion with Richard here.
(Sorry, due to lack of time I don't expect I'll reply further. But thank you for the discussion! A quick note:)
EV is subjective. I'd recommend this post for more on this.
You are welcome to return to this later. I would be curious to know your thoughts.
I liked the post. I agree EV is subjective to some extent. The same goes for the concept of mass, which depends on our imperfect understanding of physics. However, the expected mass of objects is still comparable, unless there is only an infinitesimal difference between their mass.
Do you think it's reasonable for two people with all of the same evidence to disagree on precise probabilities and expected values? If so, how would you justify picking your own precise probabilities over someone else's, if you think theirs are just as defensible?
Or would you just average yours and theirs in some way to get a new distribution? How?
And how far would you go, if you consider all the defensible precise probability distributions anyone could assign (whether or not anyone actually does so)? How do you weigh them all if there are infinitely many of them and no uniform distribution over them?
Here's another example I like.
Hi Michael.
It depends on what is included in "all of the same evidence". If 2 people had exactly the same evidence about everything, including internal states about the plausibility of the probabilities, they would be the same people, and therefore would agree on everything. In practice, different people share some evidence, but start with different priors, and therefore do not have to agree on precise probabilities and expected values. The stronger the evidence they share relative to their priors, the more they will agree.
If 2 probability density functions (PDFs) feel exactly as plausible, I would simply use the mean between them.
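As a minimal sketch of what "using the mean between them" could look like, the snippet below forms an equal-weight mixture of 2 PDFs. The 2 normal distributions and their parameters are assumptions chosen only for illustration, and the names `dist_a`, `dist_b`, `mixture_pdf` and `mixture_mean` are hypothetical. Nothing here is meant as the definitive way to combine credences.

```python
from statistics import NormalDist

# 2 hypothetical, equally plausible PDFs (parameters are illustrative assumptions).
dist_a = NormalDist(mu=0.0, sigma=1.0)
dist_b = NormalDist(mu=2.0, sigma=0.5)
WEIGHTS = (0.5, 0.5)  # equal weights, since both feel exactly as plausible


def mixture_pdf(x: float) -> float:
    """Density of the equal-weight mixture at x (pointwise mean of the 2 PDFs)."""
    return WEIGHTS[0] * dist_a.pdf(x) + WEIGHTS[1] * dist_b.pdf(x)


def mixture_mean() -> float:
    """Expected value under the mixture: the weighted mean of the 2 expected values."""
    return WEIGHTS[0] * dist_a.mean + WEIGHTS[1] * dist_b.mean


combined_mean = mixture_mean()
print(combined_mean)  # prints: 1.0
```

With equal weights, the expected value of the mixture is just the mean of the 2 expected values. Unequal weights (for example, based on track record) would only change `WEIGHTS`, although how to pick them is exactly what is being debated in this thread.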
Side note. I often link to concepts in my comments that I am sure the person I am replying to is familiar with, but I do it anyway in case others find it relevant.
I think you've simplified the problem too much. There can be special cases where we can use symmetry and just take simple averages, but many practical cases are not like that. Indeed, that's the point of the distinction between complex and simple cluelessness in the first place.
I think, ideally, we should look for and exploit as much evidential symmetry as possible, but I don't think we'll always find enough of it to land on a unique precise distribution. I'd guess this is in principle impossible in many cases (probably almost all cases of intervention and cause area research) without further evidence.
It's true that direct impressions (e.g. internal states about the plausibility of the probabilities) could be considered evidence, but to the extent that for the same objective external evidence, these direct impressions can vary between people or depending on how or when you present the evidence, they seem arbitrary.
Would you take a direct impression that came from your brain (an inscrutable process, prone to cognitive biases of various kinds, whose reliability you can at best verify by track records in limited domains where feedback is practical, and where track records may not generalize across tasks and domains well) to be better evidence than a direct impression from another person's brain (with similar problems), with access to the same objective external evidence?
Or, what if there are multiple people with different distributions and different track records in relevant domains? How do you weigh them? How much should track record be worth? EDIT: What if their track records are measured in different ways, e.g. you have forecasters with Brier scores, investors or bettors with measures of their gains and losses, researchers and grantmakers of various seniorities at different organizations?
And what's the range of direct impressions humans or other semi-rational agents could have, and how would you weigh them all?
I'd also be keen to get your response to this (and also this, if you have the time.)
I agree with the points you make in the 1st 3 paragraphs of your comment.
Not necessarily. It depends on which evidence is being assessed. I am certainly not the best person to assess all kinds of evidence.
I think track record weighted by the relevance of the domain to X is one of the most important sources of evidence to decide on how much to weigh the views of different people with respect to X. However, I believe it is often tricky to know how much a good track record in a given domain generalises to another domain.
It depends on the case. Do you think my answer to the above should influence which interventions I prioritise? My current top recommendations are research on i) the welfare of soil animals and microorganisms, and ii) comparisons of (expected hedonistic) welfare across species and digital systems. Could you see these changing if I thought EVs were imprecise instead of precise at a fundamental level?
I have added the comments to my reading list.
Besides the links Michael shared, I highly recommend this really short post.
Thanks for sharing, Anthony. I just commented there.
I think there's a lot that could change if you very seriously weighed others' actual or possible direct impressions/intuitions without heavily privileging your own, before we even get into the question of precise vs imprecise credences. Epistemic modesty is going to do a lot of work first.
Holding your current normative views ~constant, with precise credences, epistemic modesty would make infinite expected values (and possibly cardinally larger infinities) your focus, as long as there are well-defined consistent ways to handle them without always getting infinity minus infinity errors in practice. With imprecise credences, you could plausibly justify ignoring them on some versions of bracketing (also see here), say because they're so speculative and you're clueless about the direction of your impacts on infinities, including possibly even the effects of research into infinite effects (because the research could be used in ways you'd judge to be very bad).
(Independently of precise vs imprecise) If you're a moral realist, then you wouldn't privilege your own direct normative intuitions just for being yours either, and this would plausibly mean not privileging consequentialism, utilitarianism, hedonism, risk neutrality, etc. This could have important implications. Your current priorities might still be among your top priorities, but your list of priorities could expand a lot.
It might be impossible to compare these priorities; there's no universal common standard/unit across all normative stances. You might go for a portfolio of interventions.
If you're not a moral realist, or for the part of you that isn't, you can just not care about views that conflict too much with your most important intuitions.
If you're doing some version of bracketing with imprecise credences, some vertebrate welfare work could be worth prioritizing. I'm clueless about whether crops or nature is better for wild animals, even though I'm suffering-focused, so I ignore conversions between nature and crops. Far future effects and acausal influence could guide some priorities unless you're clueless about them and bracket them away.
Again, potentially impossible comparisons + portfolio.
With imprecise credences, I think you would also be more pessimistic about the marginal value of research to compare welfare ranges and sentience across types of possible moral patients. You should also be more pessimistic about the value of further research into the sign of the welfare of moral patients. That doesn't mean no such research is worth doing, but I think it would focus on scoping out possibilities and their implications and gathering evidence that could basically rule out the more extreme hypotheses (e.g. for (near-)constant welfare ranges and for welfare ranges with the most extreme ratios between potential moral patients). Examples include arguments like the two envelopes problem, conscious subsystems, how moral weights could scale with neuron counts, and gradations/vagueness, as well as looking for more ways to assign welfare ranges with very different implications from the ones we have now. If you're gathering empirical evidence, you would aim it at shifting or ruling out extremes.
Personally, I've decided to draw some lines in practice, and basically leave out nematodes and simpler systems as priorities. This depends largely on my normative views (and I'm not a moral realist, so I'm more willing to make some judgement calls about this). I think what counts as consciousness is largely normative and subjective, I have some objections to aggregation (e.g. torture vs dust specks) and I'm not entirely risk neutral or ambiguity neutral. The capacities I've observed in them don't seem so compelling. Maybe some of it is motivated reasoning, though. And maybe some sentience research on nematodes would be worth doing. If they met some of the standards here or here or we found evidence for some of the most sophisticated cognitive capacities we observe in fruit flies, I might take them pretty seriously.
I have replied to both comments.
Thanks for elaborating on this. I imagine I could arrive at different (practical) priorities if I changed my mind about the topics you listed. At the same time, my more foundational philosophical views have historically changed very little. Investigations about empirical matters have updated my priorities a lot more. So I would be curious to know if you think there are areas which are more amenable to empirical investigation, and where I am not giving enough consideration to the views of others.
I agree it is very unclear whether increasing cropland is good or bad, even for suffering-focussed people.