Thought about this some more. This isn’t a summary of your work, it’s an attempt to understand it in my terms. Here’s how I see it right now: we can use pairwise comparisons of outcomes to elicit preferences, and people often do, but they typically choose to insist that each outcome has a value representable as a single number and use the pairwise comparisons to decide which number to assign each outcome. Insisting that each outcome has a value is a constraint on preferences that can allow us to compute which outcome is preferred between two outcomes for which we do not have direct data.
I see this post as arguing that we should instead represent preferences as a table of value ratios. This is not about eliciting preferences, but representing them. Why would we want to represent them like this? At first glance:
If the important thing is we represent preferences as a table, then we can capture every important comparison with a table of binary preferences
If we want to impose additional constraints so that we can extrapolate preferences, preference ratios seems to push us back to assigning one or more values to every outcome
What makes value ratios different from other schemes with multiple valuation functions is that value ratios give us a value function for each outcome we investigate. That is, there is a one-to-one correspondence between outcomes and value functions.
Here is a theory of why that might be useful: When we talk about the value of outcomes (such as “$5”), we are actually talking about that outcome in some context (such as “$5 for me now” or “$5 for someone who is very poor, now”). Preference relations can and do treat these outcomes as different depending on the context - $5 for me is worth less than $5 for someone who is very poor. Because of this, a value scale based on “$5-equivalents” will be different depending on the context of the $5.
A key proposition to motivate value ratios, Proposition 1: every outcome which we consider comes with a unique implied mixture of contexts. That is, if I say “the value of $5”, I mean ∑_c P_{$5}(c) V($5 | c), where P_{$5} is the mixture of contexts implied by my having said “$5”.
This means, if I want to compare “the value of $10m” to “the value of saving a child’s life”, I have two options: I can compare ∑_c P_{$10m}(c) V($10m | c) to ∑_c P_{$10m}(c) V(savelife | c), or I can compare ∑_c P_{savelife}(c) V($10m | c) to ∑_c P_{savelife}(c) V(savelife | c). These might give me different answers, and the correct comparison depends on which applied context I am considering these options in.
A value ratio could therefore be considered a table where each column is a context and each row specifies the relative value of the given item in that context. Note that, under this interpretation, we should not expect x_ij = 1/x_ji unless i = j. This is because items have different values in different contexts.
This can be extended to distributions over value ratios, in which case perhaps each sample comes with a context sampled from the distribution of contexts for that column of the table (I’m not entirely sure that works, but maybe it does). This can allow us to represent within-column correlations if we know that one outcome is x times better than another, regardless of context.
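Under this context-mixture reading, the asymmetry x_ij ≠ 1/x_ji falls out directly. Here is a minimal sketch with entirely hypothetical numbers (the contexts, values, and mixture weights below are invented for illustration, not taken from the post):

```python
# Sketch of the context-mixture reading of a value ratio table.
# All numbers are hypothetical: two contexts, two outcomes.
contexts = ["me_now", "poor_person_now"]

# V(outcome | context): value of each outcome in each context.
V = {
    "$5":        {"me_now": 1.0,    "poor_person_now": 20.0},
    "save_life": {"me_now": 5000.0, "poor_person_now": 5000.0},
}

# P_outcome(c): the mixture of contexts implied by naming each outcome.
P = {
    "$5":        {"me_now": 0.9, "poor_person_now": 0.1},
    "save_life": {"me_now": 0.2, "poor_person_now": 0.8},
}

def mixed_value(outcome, mixture_of):
    """Value of `outcome` under the context mixture implied by `mixture_of`."""
    return sum(P[mixture_of][c] * V[outcome][c] for c in contexts)

def ratio(i, j):
    # Value of i relative to j, both evaluated in j's implied context mixture.
    return mixed_value(i, j) / mixed_value(j, j)

x_ij = ratio("$5", "save_life")
x_ji = ratio("save_life", "$5")
print(x_ij * x_ji)  # not 1: the two implied context mixtures differ
```

With these numbers, x_ij · x_ji comes out to roughly 5.6 rather than 1, because the two outcomes' implied mixtures weight the poor-person context differently.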
I don’t think proposition 1 is plausible if we interpret it strictly. I’m pretty sure at different times people talk about the value of $5 with different implied contexts, and at other times I think people probably make some effort to consider the value of quite different outcomes in a common context. However, I think there still might be something to it. Whenever you’re weighing up different outcomes, you definitely have an implicit context in mind. Furthermore, there probably is a substantial correlation between the context and the outcome—if two different people are considering the value of saving a child’s life then there probably is substantial overlap between the contexts they’re considering. Moreover, it’s plausible that context sensitivity is an issue for the kinds of value comparisons that EAs want to make.
There’s a lot here, and it will take me some time to think about. It seems like you’re coming at this from the lens of the pairwise comparison literature. I was coming at this from the lens of (what I think is) simpler expected value maximization foundations.
I’ve spent some time trying to understand the pairwise comparison literature, but haven’t gotten very far. What I’ve seen has been focused very much on (what seem to me) narrow elicitation procedures. As you stated, I’m more focused on representation.
“Tables of value ratios” are meant to be a natural extension of “big lists of expected values”.
You could definitely understand a “list of expected value estimates” to be a function that helps convey certain preferences, but it’s a bit of an unusual bridge, outside the pairwise comparison literature.
On Contexts
You spend a while expressing the importance of clear contexts. I agree that precise contexts are important. It’s possible that the $1 example I used was a bit misleading—the point I was trying to make is that many value ratios will be less sensitive to changes in context than absolute values (the typical alternative in expected value theory) would be.
Valuing V($5)/V($1) should give fairly precise results, for people of many different income levels. This wouldn’t be the case if you tried converting dollars to a common unit of QALYs or something first, before dividing.
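To see why the ratio can be stable while absolute values are not, here is a toy model (the log-utility assumption is mine, for illustration only): the value of receiving a dollar amount is the resulting change in log wealth.

```python
import math

def dollar_value(amount, wealth):
    # Toy log-utility sketch: value of receiving `amount` at a given wealth.
    return math.log(wealth + amount) - math.log(wealth)

for wealth in [100, 10_000, 1_000_000]:
    v5 = dollar_value(5, wealth)
    v1 = dollar_value(1, wealth)
    print(f"wealth={wealth}: V($5)={v5:.6f}, V($5)/V($1)={v5 / v1:.3f}")
```

Across a 10,000× range of wealth levels, the absolute value of $5 changes by several orders of magnitude, while V($5)/V($1) stays close to 5 throughout.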
Now, I could definitely see people from the discrete choice literature saying, “of course you shouldn’t first convert to QALYs; instead you should use better mathematical abstractions to represent direct preferences”. In that case I’d agree; there’s just a somewhat pragmatic set of choices about which abstractions give a good fit of practicality and specificity. I would be very curious whether people from this background would suggest other approaches to the kind of large-scale, collaborative estimation I’m trying to achieve here.
I would expect that with Relative Value estimation, as with EV estimation, we’d generally want precise definitions of things, especially if they were meant as forecasting questions. But “precise definitions” could mean “a precise set of different contexts”. Like, “What is the expected value of $1, as judged by 5 random EA Forum readers, for themselves?”
The only piece of literature I had in mind was von Neumann and Morgenstern’s representation theorem. It says: if you have a set of probability distributions over a set of outcomes, and for each pair of distributions you have a preference (one is better than the other, or they are equal), and if this relation satisfies the additional requirements of transitivity, continuity and independence, then you can represent the preferences with a utility function that is unique up to positive affine transformation.
Given that this is a foundational result for expected utility theory, I don’t think it is unusual to think of a utility function as a representation of a preference relation.
Do you envision your value ratio table to be underwritten by a unique utility function? That is, could we assign a single number V(x) to every outcome x such that the table cell corresponding to the outcome pair (x, y) is always equal to V(x)/V(y)? These utilities could be treated as noisy estimates, which allows for correlations between V(x) and V(y) for some pairs.
My remarks concern what a value ratio table might be if it is more than just a “visualisation” of a utility function.
The value ratio table, as shown, is a presentation/visualization of the utility function (assuming you have joint distributions).
The key question is how to store the information within the utility function.
It’s really messy to try to store meaningful joint distributions in regular ways, especially if you want to approximate said distributions using multiple pieces. It’s especially hard to do this with multiple people, because then they would need to coordinate to ensure they are using the right scales.
The value ratio functions are basically one specific way to store/organize and think about this information. I think this is feasible to work with, in order to approximate large utility functions without too many trade-offs.
“Joint distributions on values where the scales are arbitrary” seem difficult to intuit/understand, so I think that typically representing them as ratios is a useful practice.
So constructing a value ratio table means estimating a joint distribution of values from a subset of pairwise comparisons, then sampling from the distribution to fill out the table?
In that case, I think estimating the distribution is the hard part. Your example is straightforward because it features independent estimates, or simple functional relationships.
Estimation is actually pretty easy (using linear regression), and has essentially been a solved problem since 1952. Scheffé, H. (1952). An Analysis of Variance for Paired Comparisons. Journal of the American Statistical Association, 47(259), 381–400. https://doi.org/10.1080/01621459.1952.10501179
I wrote about the methodology (before finding Scheffé′s paper) here.
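A minimal sketch of this regression approach (not Scheffé's exact formulation; the data here are simulated and the true values are hypothetical): observe noisy log-ratios for each pair, then run ordinary least squares to recover the log-values, pinning one item to resolve the scale indeterminacy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true log-values for 4 items (identified up to a constant).
true_log_v = np.array([0.0, 1.0, 2.5, 4.0])
n = len(true_log_v)

# Observe noisy log-ratios: log R_ij = log v_i - log v_j + noise.
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
y = np.array([true_log_v[i] - true_log_v[j] + rng.normal(0, 0.1)
              for i, j in pairs])

# Design matrix: row for pair (i, j) has +1 at i and -1 at j.
X = np.zeros((len(pairs), n))
for row, (i, j) in enumerate(pairs):
    X[row, i], X[row, j] = 1.0, -1.0

# Pin item 0's log-value to 0 by dropping its column.
X = X[:, 1:]

est = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(est, 2))  # should be close to [1.0, 2.5, 4.0]
```

The pinned column reflects that pairwise ratios only identify values up to a common multiplicative scale.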
I can see how this gets you E(value_i | comparisons) for each item i, but not P((value_i)_{i ∈ items} | comparisons). One of the advantages Ozzie raises is the possibility of keeping track of correlations in value estimates, which requires more than the marginal expectations.
I’m not sure what you mean. I’m thinking about pairwise comparisons in the following way.
(a) Every pair of items i, j has a true ratio of expectations E(X_i)/E(X_j) = μ_ij. I hope this is uncontroversial.
(b) We observe the variables R_ij according to log R_ij ∼ log μ_ij + ε_ij for some normally distributed ε_ij. Error terms might be dependent, but that complicates the analysis. (And is most likely not worth it.) This step could be more controversial, as there are other possible models to use.
Note that you will get a distribution over every E(X_i) too with this approach, but that would be in the Bayesian sense, i.e., p(E(X_i) | comparisons), when we have a prior over E(X_i).
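One way to sketch that Bayesian version: with a Gaussian prior on the log-values and Gaussian noise on the observed log-ratios, the posterior over all log-values is multivariate normal, and joint samples from it carry the cross-item correlations that marginal expectations alone would lose. All numbers below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
# Hypothetical observed log-ratios, for pairs (0, 1) and (1, 2) only.
pairs = [(0, 1), (1, 2)]
y = np.array([-1.0, -1.5])
sigma2 = 0.1 ** 2  # noise variance on each log-ratio
tau2 = 2.0 ** 2    # prior variance on each log-value (prior mean 0)

# Design matrix: row for pair (i, j) has +1 at i and -1 at j.
X = np.zeros((len(pairs), n))
for row, (i, j) in enumerate(pairs):
    X[row, i], X[row, j] = 1.0, -1.0

# Conjugate Gaussian posterior over the vector of log-values.
prec = X.T @ X / sigma2 + np.eye(n) / tau2
cov = np.linalg.inv(prec)
mean = cov @ (X.T @ y) / sigma2

# Joint samples preserve correlations between items' values.
samples = rng.multivariate_normal(mean, cov, size=10_000)
print(np.corrcoef(samples.T).round(2))
```

Because the comparisons only constrain differences of log-values, the posterior leaves the common level uncertain, so the sampled values of linked items are strongly positively correlated.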
I don’t understand your notion of context here. I’m understanding pairwise comparisons as standard decision theory—you are comparing the expected values of two lotteries, nothing more. Is the context about psychology somehow? If so, that might be interesting, but adds a layer of complexity this sort of methodology cannot be expected to handle.
Players may have different utility functions, but that might be reasonable to ignore when modelling all of this. In any case, every intervention A_i will have its own, unique expected utility for each player p, hence x^p_ij = E[U_p(A_i)]/E[U_p(A_j)] = 1/x^p_ji. (This is ignoring noise in the estimates, but that is pretty easy to handle.)
FWIW, I think the general kind of model underlying what I’ve written is a joint distribution that models value something like P(c | outcome) P(V | c, outcome).