Someone asked for an ELI5 explanation to help them grok “what the heck is this thing”, so here is my attempt at a non-technical explainer.[1]
The high-level idea is that this is supposed to be a flexible framework for comparing many different types of objects head-to-head, without losing information where you don’t have to.
Here’s an example use case that Ozzie and I discussed IRL. [2]
Say that you want to create a framework to help evaluate how good various projects at CEA are compared to each other.
If you want to do head-to-head comparisons of objects (i.e. projects) that are quite similar (how much do we value this Forum post vs. this other post? how valuable was EAGxBerlin vs. EAGxPrague?), it’s often somewhat easy to cash these out into the same ‘currency’ to do a reasonable approximation (e.g. karma / engagement, # connections).
But once you want to compare quite different objects (how many Forum posts do we value relative to one EAGx?), you suddenly need to cash out into a whole other ‘currency’.
To solve this, people usually try to convert everything into some sort of standard unit, e.g. 80,000 Hours might cash all of their programs out into ‘number of years of quality-adjusted career speed-ups bought per program’, or Giving What We Can into ‘USD moved to effective charities per program’. (These are made up for this example.)
Some problems with the above approach: (1) converting everything into the same hacky unit introduces much larger uncertainty ranges into your comparisons, even when comparing similar-ish objects (two Forum posts, or two EAGxs), and (2) the result is way less legible (what does an ‘adjusted speed-up unit’ mean?).
Suddenly, if you’re stuck with ‘# speed up years’ as your yardstick to do both the EAGxA vs. EAGxB comparison AND the ‘EAGxA vs. Forum post A’ comparison, you’re losing a ton of information in the first case.
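To make the ‘losing information’ point concrete, here is a minimal Python sketch (not Squiggle, and not anyone’s real estimates). All medians and spreads are made-up assumptions, and it assumes the two ‘speed-up years’ estimates are drawn independently, which is the situation where the common yardstick hurts most. It compares the 80% interval for ‘EAGxA vs. EAGxB’ when estimated directly against the same ratio when both events are first converted into a noisy shared unit.

```python
import math
import random

random.seed(0)
N = 20_000  # number of Monte Carlo samples (arbitrary choice)


def interval_80(samples):
    """Return the 10th and 90th percentile of a list of samples."""
    s = sorted(samples)
    return s[len(s) // 10], s[(9 * len(s)) // 10]


# Direct comparison: "one EAGxA is worth about 1.2 EAGxBs", fairly confident.
# (Hypothetical numbers, chosen only for illustration.)
direct = [random.lognormvariate(math.log(1.2), 0.2) for _ in range(N)]

# Via a common unit: each event is valued separately in "speed-up years",
# and each of those conversions is very uncertain (hypothetical numbers).
eagx_a_years = [random.lognormvariate(math.log(60), 1.0) for _ in range(N)]
eagx_b_years = [random.lognormvariate(math.log(50), 1.0) for _ in range(N)]
via_unit = [a / b for a, b in zip(eagx_a_years, eagx_b_years)]

direct_lo, direct_hi = interval_80(direct)
unit_lo, unit_hi = interval_80(via_unit)

print(f"direct:   80% CI {direct_lo:.2f} to {direct_hi:.2f}")
print(f"via unit: 80% CI {unit_lo:.2f} to {unit_hi:.2f}")
```

Under these assumptions the second interval comes out far wider than the first, even though both describe the same pair of events.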
So the core insight is: instead of converting everything into ONE currency and then using that currency to compare different objects, what if you just treat every OBJECT as its own ‘currency’, i.e. express values relative to each other?
Imagine a 4x4 matrix whose rows are labelled EAGxA, EAGxB, Forum post A, and Forum post B, and whose columns carry the same labels. Each cell indicates how much 1 [thing in column] is worth in terms of [thing in row]s, plus the 80% CI, e.g. ‘how much do we value EAGxA in terms of number of Forum post A’s’.
To populate this matrix, you can write a bunch of Squiggle code to estimate how much you value EAGxA vs. EAGxB (fairly small CI), Post A vs. Post B (small CI), and then make one costly comparison across types, e.g. EAGxA vs. Post A (large CI).
And now, notice you have all the information you need to fill out this matrix!
Notice especially that you do not need to estimate the EAGxB vs. Post A, EAGxB vs. Post B, or EAGxA vs. Post B pairwise comparisons directly; you get those for free as long as your preferences are nicely transitive, comparable, etc.
This allows you to preserve the nice properties of being able to do a fairly robust EAGxA vs. EAGxB comparison, while letting you do more speculative comparisons too.
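Here is a rough Python sketch of filling in the whole matrix from just those three comparisons. It is a stand-in for what you might write in Squiggle, not a description of any real tool, and every number in it is invented. It assumes each comparison can be modelled as an independent lognormal and that preferences are transitive, so every item can be expressed relative to one anchor (EAGxA here, chosen arbitrarily).

```python
import math
import random

random.seed(0)
N = 10_000  # number of Monte Carlo samples (arbitrary choice)
ITEMS = ["EAGxA", "EAGxB", "Post A", "Post B"]


def lognormal(median, sigma):
    """Draw N lognormal samples with the given median and log-scale spread."""
    return [random.lognormvariate(math.log(median), sigma) for _ in range(N)]


# The only three comparisons we actually estimate (hypothetical numbers).
a_in_b = lognormal(1.2, 0.2)            # 1 EAGxA = ? EAGxBs   (small CI)
post_a_in_post_b = lognormal(2.0, 0.3)  # 1 Post A = ? Post Bs (small CI)
a_in_post_a = lognormal(500.0, 1.5)     # 1 EAGxA = ? Post As  (large CI)

# Express every item in units of the anchor, EAGxA, sample by sample.
values = {
    "EAGxA": [1.0] * N,
    "EAGxB": [1.0 / r for r in a_in_b],
    "Post A": [1.0 / r for r in a_in_post_a],
}
values["Post B"] = [v / r for v, r in zip(values["Post A"], post_a_in_post_b)]


def summarize(samples):
    """Return (10th percentile, median, 90th percentile)."""
    s = sorted(samples)
    return s[len(s) // 10], s[len(s) // 2], s[(9 * len(s)) // 10]


# matrix[row][col] = how many [row]s one [col] is worth, with an 80% interval.
matrix = {
    row: {
        col: summarize([c / r for c, r in zip(values[col], values[row])])
        for col in ITEMS
    }
    for row in ITEMS
}

for row in ITEMS:
    for col in ITEMS:
        lo, med, hi = matrix[row][col]
        print(f"1 {col} = {med:.3g} {row}s (80% CI {lo:.3g} to {hi:.3g})")
```

Because the ratios are computed per sample, the EAGxA vs. EAGxB cell keeps its narrow interval, while the derived cells (e.g. EAGxB vs. Post B) inherit the wide uncertainty of the one cross-type comparison.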
If you ever want to re-introduce ‘standard units’ like dollars or something (“we can’t just report how good all of CEA’s programs are in terms of # EAGxBerlin unit equivalents”), you can just add a ‘$1’ object to the matrix above, along with at least one comparison linking it to an existing object.
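As a sketch of what adding a dollar object could look like, here is a small self-contained Python example. The helper names `add_item` and `worth_in` are hypothetical, the numbers are invented, and it assumes (as above) independent lognormal comparisons and transitive preferences. One new comparison, ‘how many dollars is one EAGxA worth’, is enough to price every other item in dollars.

```python
import math
import random

random.seed(1)
N = 10_000  # number of Monte Carlo samples (arbitrary choice)


def lognormal(median, sigma):
    """Draw N lognormal samples with the given median and log-scale spread."""
    return [random.lognormvariate(math.log(median), sigma) for _ in range(N)]


# Each item's value in units of the anchor EAGxA (hypothetical numbers).
values = {
    "EAGxA": [1.0] * N,
    "Post A": [1.0 / r for r in lognormal(500.0, 1.5)],
}


def add_item(values, new_item, known_item, known_in_new):
    """Add new_item given samples of how many new_items one known_item is worth."""
    values[new_item] = [v / r for v, r in zip(values[known_item], known_in_new)]


def worth_in(values, item, unit):
    """Return (10th percentile, median, 90th percentile) of 1 item, in units."""
    s = sorted(i / u for i, u in zip(values[item], values[unit]))
    return s[len(s) // 10], s[len(s) // 2], s[(9 * len(s)) // 10]


# One made-up comparison: 1 EAGxA is worth roughly $100,000, very uncertain.
add_item(values, "$1", "EAGxA", lognormal(100_000.0, 1.0))

eagx_lo, eagx_med, eagx_hi = worth_in(values, "EAGxA", "$1")
post_lo, post_med, post_hi = worth_in(values, "Post A", "$1")
print(f"1 EAGxA  = ${eagx_med:,.0f} (80% CI ${eagx_lo:,.0f} to ${eagx_hi:,.0f})")
print(f"1 Post A = ${post_med:,.0f} (80% CI ${post_lo:,.0f} to ${post_hi:,.0f})")
```

The dollar figure for Post A is never estimated directly; it falls out of the EAGxA vs. Post A and EAGxA vs. $1 comparisons, and carries the uncertainty of both.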
You might have other confusions like, “how do you produce a set of these pairwise comparisons in a robust way?”, which I feel less equipped to answer.
(Ozzie pointed out privately that this is a general challenge of value estimation using any units.)
As a disclaimer, this is not necessarily meant to represent what CEA as an organization does in our own internal evaluations — this is a toy example for learning!
All mistakes are mine, and I’m writing in a personal capacity, not on behalf of my employer.