Has anyone written down the thing you’re proposing in detail? I haven’t seen it in MCDA or Bayesian literature before and a quick Google Scholar search didn’t turn anything useful up. Does it have a name or some standard terms/keywords that I should search? Is there any particular thing you’d recommend reading?
Would you estimate what percentage of the EA community agrees with you and knows how to do this well?
Here’s an attempt to restate what you said in terms that are closer to how I think. Do you understand and agree with this?
Convert every dimension being evaluated into a utility dimension. The article uses the term “goodness” instead of “utility” but they’re the same concept.
When we only care about utility, dimensions are not relevantly qualitatively different. Each contains some amount of utility. Anything else contained in each dimension, which may be qualitatively different, is not relevant and doesn’t need to be converted. Information-losing conversions are OK as long as no relevant information is lost. Only information related to utility is relevant.
(So converting between qualitatively different dimensions is impossible in the general case, like the article says. But this is a big work around which we can use whenever we’re dealing with utility, e.g. for ~all decision making.)
When the dimensions are approximately independent, it’s pretty easy because we can evaluate utility for one dimension at a time, then use addition.
When the dimensions aren’t independent, then it may be complicated and hard.
Sometimes we should use multiplication instead of addition. (A small sketch of both combination rules follows.)
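To make the last two points concrete, here is a minimal sketch (my own illustration, not something from the discussion) of the two combination rules, using made-up per-dimension utilities:

```python
# A minimal sketch of the two combination rules, using made-up
# per-dimension utilities (illustration only).

utility_per_dimension = {"dimension_a": 0.8, "dimension_b": 0.6}

# Approximately independent dimensions: evaluate each separately, then add.
additive_total = sum(utility_per_dimension.values())  # ~1.4

# Interacting dimensions (e.g. one acts as a multiplier on the other):
# multiplication can be the more appropriate combination rule.
multiplicative_total = 1.0
for value in utility_per_dimension.values():
    multiplicative_total *= value  # ends at ~0.48

print(additive_total, multiplicative_total)
```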
I may be misinterpreting something, but I think what Emrik has described is basically how generic multi-attribute utility instruments (MAUIs) are used by health economists in the calculation of QALYs.
For example, as described in this excellent overview, the EQ-5D questionnaire asks about 5 different dimensions of health, which are then valued in combination:
mobility (ability to walk about)
self-care (ability to wash and dress yourself)
usual activities (ability to work, study, do housework, engage in leisure activities, etc.)
pain/discomfort
anxiety/depression
Each dimension is scored at one of three levels: 1 (no problems), 2 (some/moderate problems), or 3 (extreme problems). These scores are combined into a five-digit health state profile, e.g., 21232 means some problems walking about, no problems with self-care, some problems performing usual activities, extreme pain or discomfort, and moderate anxiety or depression. However, this number has no mathematical properties: 31111 is not necessarily better than 11112, as problems in one dimension may have a greater impact on quality of life than problems in another. Obtaining the weights for each health state, then, requires a valuation exercise.
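To make the profile encoding concrete, here is a small sketch (my own illustration, not an official EQ-5D tool) that unpacks a profile like 21232 into dimensions and levels; the dimension names are the five listed above and the level labels are generic wording I've assumed:

```python
# A small sketch (not an official EQ-5D tool) that unpacks a five-digit
# EQ-5D-3L profile such as "21232". The profile is a label, not a number,
# so comparing profiles numerically would be invalid.

DIMENSIONS = ["mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression"]
LEVELS = {"1": "no problems", "2": "some/moderate problems", "3": "extreme problems"}

def decode_profile(profile: str) -> dict:
    """Map each digit of the profile to its dimension's level description."""
    assert len(profile) == len(DIMENSIONS), "expected one digit per dimension"
    return {dim: LEVELS[level] for dim, level in zip(DIMENSIONS, profile)}

print(decode_profile("21232"))
# {'mobility': 'some/moderate problems', 'self-care': 'no problems',
#  'usual activities': 'some/moderate problems',
#  'pain/discomfort': 'extreme problems',
#  'anxiety/depression': 'some/moderate problems'}
```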
Valuation methods
There are many ways of generating a value set (set of weights or utilities) for the health states described by a health utility instrument. (For reviews, see e.g., Brazier, Ratcliffe, et al., 2017 or Green, Brazier, & Deverill, 2000; they are also discussed further in Part 2.) The following five are the most common:
Time tradeoff: Respondents directly trade off duration and quality of life, by stating how much time in perfect health is equivalent to a fixed period in the target health state. For example, if they are indifferent between living 10 years with moderate pain and 8 years in perfect health, the weight for moderate pain (state 11121 in the EQ-5D-3L) is 0.8.
Standard gamble: Respondents trade off quality of life and risk of death, by choosing between a fixed period (e.g., 10 years) in the target health state and a “gamble” with two possible outcomes: the same period in perfect health, or immediate death. If they would be indifferent between the options when the gamble has a 20% probability of death, the weight is 0.8.
Discrete choice experiments: Respondents choose the “best” health state out of two (or sometimes three) options. Drawing on random utility theory, the location of the utilities on an interval scale is determined by the frequency each is chosen, e.g., if 55% of respondents say the first person is healthier than the second (and 45% the reverse), they are close together, whereas if the split is 80:20 they are far apart. This ordinal data then has to be anchored to 0 and 1; some ways of doing so are presented in Part 2. Less common ordinal methods include:
Ranking: Placing several health states in order of preference.
Best-worst scaling: Choosing the best and worst out of a selection of options.
Visual analog scale: Respondents mark the point on a thermometer-like scale, usually running from 0 (e.g., “the worst health you can imagine”) to 100 (e.g., “the best health you can imagine”), that they feel best represents the target health state. If they are also asked to place “dead” on the scale, a QALY value can be easily calculated. For example, with a score of 90/100 and a dead point of 20/100, the weight is (90-20)/(100-20) = 70/80 = 0.875.
Person tradeoff (previously called equivalence studies): Respondents trade off health (and/or life) across populations. For example, if they think an intervention that moves 500 people from the target state to perfect health for one year is as valuable as extending the life of 100 perfectly healthy people for a year, the QALY weight is 1 – (100/500) = 0.8.[13] (A worked sketch of these example calculations follows.)
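As referenced above, here is a worked sketch of the example weight calculations; the numbers are the ones used in the text, not real valuation data:

```python
# A worked sketch of the example weight calculations above; the numbers are
# the ones used in the text, not real valuation data.

tto_weight = 8 / 10                  # time tradeoff: 8 healthy years ~ 10 years in the state
sg_weight = 1 - 0.20                 # standard gamble: indifferent at a 20% risk of death
vas_weight = (90 - 20) / (100 - 20)  # visual analog scale: score 90, "dead" placed at 20
pto_weight = 1 - (100 / 500)         # person tradeoff: 500 people cured ~ 100 life-years gained

print(tto_weight, sg_weight, vas_weight, pto_weight)  # 0.8 0.8 0.875 0.8
```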
Thanks. I have seen similar valuation methods elsewhere which might interest you. 1000minds’ Multi-Criteria Decision Analysis (MCDA/MCDM) article has a list of methods, with summaries, like Direct rating, Points allocation, SMART, SMARTER, AHP, etc.
So, when you have 31111, each number is in a separate dimension and there’s no problem so far. Then the valuation method handles the hard part. Each valuation method you quote (and many MCDM ones) has a common property: they all rely on the intuition or judgment of a decision maker. The decision maker is asked to make comparisons involving multiple dimensions. But that doesn’t explain how to do it; it relies on people somehow doing it by unspecified methods and then stating answers. Does that make sense and do you see the problem? Do you think you know how decision makers can come up with the answers needed by these valuation methods?
Put another way, I read the valuation methods as attempts to make pre-existing knowledge more explicit and quantified. It assumes a decision maker already knows some answers about how to value different dimensions against each other, rather than telling him how to do it. But I’m interested in how to get the knowledge in the first place.
Yes, that’s an excellent reformulation of what I meant!
I think this roughly corresponds to how people do it intuitively in practice, but I doubt most people would be able to describe it in as much detail (or even be aware of it) if asked. But at least among people who read LessWrong, it’s normal to talk about “assigning utility”. Among people who self-describe as EA, the percentage who think like this goes up the longer they’ve been in EA; maybe something like 80% of people who attended an EAG in 2022 do. (Pure speculation; I haven’t even been to an EAG.)
I don’t know of anywhere it’s written up like this, but it probably exists somewhere on LW and probably in academic literature as well. On second thought, I remembered Arbital probably has something good on it. Here.
Yes, that’s an excellent reformulation of what I meant!
Great. Let’s try two things next.
First, do you think my solution could work? Do you think it’s merely inferior or is there something fully broken about it? I ask because there are few substantially different epistemologies that could work at all, so I think every one is valuable and should be investigated a lot. Maybe that point will make sense to you.
Second, I want to clarify how assigning utility works.
One model is: The utility function comes first. People can look at different outcomes and judge how much utility they have. People, in some sense, know what they want. Then, when making decisions, a major goal is figuring out which decisions will lead to higher-utility outcomes. For complicated decisions, it’s not obvious how to get a good outcome, so you can e.g. break the decision down and evaluate factors separately. Simplifying, you might notice a correlation where many high-utility outcomes have high scores for factor X (and you might also be able to explain why X is good), so you’d be more inclined to make a decision that you think will get a lot of X, and in factor-summing approaches you’d assign X a high weighting.
A different model is: Start with factors and then calculate utility. Utility is a downstream consequence of factors. Instead of knowing what has high utility by looking at it and judging how much you like it, or something along those lines, you have to figure out what has high utility. You do that by figuring out e.g. which factors are good and then concluding that outcomes with a lot of those factors probably have high utility.
Does one of these models fit your thinking well?
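To make the contrast concrete, here is a minimal sketch (my own illustration, with made-up outcomes, factor names, and weights) of the two models:

```python
# A minimal sketch contrasting the two models; outcomes, factors, and
# weights are hypothetical, for illustration only.

# Model 1: the utility function comes first. Outcomes are judged directly,
# and factors are only used afterwards, to help predict which decisions
# lead to the outcomes already judged good.
judged_utility = {"outcome_a": 9.0, "outcome_b": 3.0, "outcome_c": 8.0}

# Model 2: start with factors and weights, then *calculate* utility.
factor_scores = {
    "outcome_a": {"x": 0.9, "y": 0.7},
    "outcome_b": {"x": 0.2, "y": 0.5},
    "outcome_c": {"x": 0.8, "y": 0.9},
}
weights = {"x": 6.0, "y": 4.0}  # hypothetical factor weights

def calculated_utility(outcome: str) -> float:
    """Model 2: utility is a downstream consequence of weighted factors."""
    return sum(weights[f] * score for f, score in factor_scores[outcome].items())

print(max(judged_utility, key=judged_utility.get))  # model 1's pick: outcome_a
print(max(factor_scores, key=calculated_utility))   # model 2's pick: outcome_c
```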
I don’t have time to engage very well, but I would say the first model you describe fits me better. I don’t look at the world to figure out my terminal utilities (well, I do, but the world’s information is used as a tool to figure out the parts of my brain which determine what I terminally want). Instead, there’s something in my brain that determines how I tend to assign utility to outcomes, and then I can reason about the likely paths to those outcomes and make decisions. The paths that lead to outcomes I assign terminal utility to will have instrumental utility.
I haven’t investigated this nearly as deeply as I would like, but supposedly there are some ways of using Aristotelian logic (although I don’t know which kind) to derive probability theory and expected utility theory from more basic postulates. I would also look at whether any epistemology I’m considering is liable to be Dutch-booked, because I don’t want to be Dutch-booked.
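For readers unfamiliar with the term, here is a minimal illustration (my own, not from the comment) of a Dutch book: an agent whose stated probabilities are incoherent can be sold a set of bets that guarantees a loss no matter what happens.

```python
# A minimal Dutch-book illustration with made-up numbers.

p_rain, p_no_rain = 0.6, 0.6   # incoherent beliefs: 0.6 + 0.6 > 1
stake = 1.0                    # each bet pays out `stake` if it wins

cost_of_both_bets = (p_rain + p_no_rain) * stake  # agent pays 1.20 in total
guaranteed_payout = stake                         # exactly one of the bets wins

print(cost_of_both_bets - guaranteed_payout)      # 0.20 guaranteed loss either way
```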
“I ask because there are few substantially different epistemologies that could work at all, so I think every one is valuable and should be investigated a lot.”
Agreed. Though it depends on what your strengths are and what role you wish to play in the research-and-doing community. I think it’s fine that a lot of people defer to others on the logical foundations of probability and utility, but I still think some of us should be investigating it and calling “foul!” if they’ve discovered something that needs a revolution. This could be especially useful for a relative “outsider”[1] to try. I doubt it’ll succeed, but the expected utility of trying seems high. :p
I’m not making a strict delineation here, nor a value judgment. I just mean that if someone is motivated to try to upend some popular EA/rationalist epistemology, it might be easier for them if they haven’t already been deeply steeped in that paradigm.
Agreed. Though it depends on what your strengths are and what role you wish to play in the research-and-doing community. I think it’s fine that a lot of people defer to others on the logical foundations of probability and utility, but I still think some of us should be investigating it
I agree. People can specialize in what works for them. Division of labor is reasonable.
That’s fine as long as there are some people working on the foundational research stuff and some of them are open to serious debate. I think EA has people doing that sort of research but I’m concerned that none of them are open to debate. So if they’re mistaken, there’s no good way for anyone who knows to fix it (and conversely, any would-be critic who is mistaken has no good way to receive criticism from EA and fix their own mistake).
To be fair, I don’t know of any Popperian experts who are very open to debate, either, besides me. I consider lack of willingness to debate a very large, widespread problem in the world.
I think working on that problem – poor openness to debate – might do more good than everything EA is currently doing. Better debates would e.g. improve science and could make a big difference to the replication crisis.
Another way better openness to debate would do good is: currently EA has a lot of high-effort, thoughtful arguments on topics like animal welfare, AI alignment, clean water, deworming, etc. Meanwhile, there are a bunch of charities, with a ton of money, which do significantly less effective (or even counter-productive) things and won’t listen, give counterarguments, or debate. Currently, EA tries to guide people to donate to better charities. It’d potentially be significantly higher leverage (conditional on ~being right) to debate the flawed charities and win, so that they change to using their money in better ways. I think many EAs would be very interested in participating in those debates; the thing blocking progress here is poor societal norms about debate and error correction. I think if EA’s own norms were much better on those topics, then it’d be in a better position to call out the problem, lead by example, and push for change in ways that many observers would find rational and persuasive.