I’m currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.
Ozzie Gooen
Thanks for writing this up!
Quick thing I flagged:
> Probably a spectrum is far too simple a way of thinking of this. Probably it’s more complicated, but I think economic forces probably push more toward the middle of the spectrum, not the very extreme end, for the reason that: suppose you’re employing someone to plan your wedding, you would probably like them to stick to the wedding planning and, you know, choose flowers and music and stuff like that, and not try to fix any problems in your family at the moment so that the seating arrangement can be better. [You want them to] understand what the role is and only do that. Maybe it would be better if they were known to be very good at this, so things could be different in the future, but it seems to be [that] economic forces push away from zero agency but also away from very high agency.
I think this intuition is misleading. In most cases we can imagine, a wedding planner that attempts to do other things would be bad at it, and thus undesirable. There’s a trope of people doing more than they’re tasked with, which often goes badly. One reason is that, given they were tasked with something specific, the requester probably assumed that they would mess up other things.
If the agent is good enough to actually do a good job at other things, the situation would look different from the start. If I knew that this person who can do “wedding planning” is also awesome at doing many things, including helping with my finances and larger family issues, then I’d probably ask them something more broad, like, “just make my life better”.
In cases where I trust someone to do a good job at many broad things in my life or business, I typically assign tasks accordingly.
Now, with GPT4, I’m definitely asking it to do many kinds of tasks.
I think economic forces are pushing for many bounded workflows, but because that’s just more effective and economical—it’s easier to make a great experience at “AI for writing”—not because people really otherwise would want it that way.
The value ratio table, as shown, is a presentation/visualization of the utility function (assuming you have joint distributions).
The key question is how to store the information within the utility function.
It’s really messy to try to store meaningful joint distributions in regular ways, especially if you want to approximate said distributions using multiple pieces. It’s especially hard to do this with multiple people, because they would then need to coordinate to ensure they are using the right scales.
The value ratio functions are basically one specific way to store/organize and think about this information. I think this is feasible to work with, in order to approximate large utility functions without too many trade-offs.
“Joint distributions on values where the scales are arbitrary” seem difficult to intuit/understand, so I think that typically representing them as ratios is a useful practice.
> Is the meaning of each entry “How many times more value is there in item i than in item j? (Provide a distribution)”?
Yep, that’s basically it.
> Would one only use ‘direct steps’ in decision-making? How is “path dependency” interpreted?
I’m not sure what you are referring to here. I would flag that the relative value type specification is very narrow—it just states how valuable things are, not the “path of impact” or anything like that.
> what is the necessary knowledge for people who want to use relative value functions? Can I do worse compared to using a single unit by using relative values naively?
You need some programming infrastructure to do them. The Squiggle example I provided is one way of going about this. I’d flag that it would take some fiddling to do this in other languages.
If you try doing relative values “naively” (without functions), then I’d expect you’d run into issues when dealing with a lot of heterogeneous kinds of value estimates (assuming you’d be trying to compare them all to each other). Single unit evaluations are fine for small lists of similar things.
> the tables created in the web app are fully compatible with having a single unit.
> For every single table, one could use a single line of the table to generate the rest of the table. Knowing V(x_i)/V(x_1) for all i, we can use V(x_i)/V(x_j) = (V(x_i)/V(x_1)) / (V(x_j)/V(x_1)) to construct arbitrary entries.
The problem here is the correlations. The function you describe would work if you kept the correlations, but doing so would be very difficult.
In practice, when lists are done with respect to a single unit, the correlations / joint densities are basically never captured.
If you don’t capture the correlations, then the equation you provided would result in a value that is often much more uncertain than would actually be the case.
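A minimal sketch of this effect, under made-up assumptions: two hypothetical items are both estimated against a single unit, they share one wide source of uncertainty about what that unit is worth, and item B is about twice as valuable as item A. The distributions, parameters, and function names are all illustrative and not taken from the web app or the Squiggle example.

```python
import math
import random
import statistics


def simulate(n=20000, seed=0):
    """Return (joint_ratios, independent_ratios) for two hypothetical items."""
    rng = random.Random(seed)
    a_samples, b_samples = [], []
    for _ in range(n):
        # Shared, wide uncertainty about how much the common unit is worth.
        shared = rng.lognormvariate(0.0, 1.5)
        # Item-specific uncertainty is narrow; B is about twice as valuable as A.
        a_samples.append(shared * rng.lognormvariate(0.0, 0.1))
        b_samples.append(2.0 * shared * rng.lognormvariate(0.0, 0.1))

    # Correlations kept: divide sample by sample.
    joint = [b / a for a, b in zip(a_samples, b_samples)]

    # Correlations dropped: shuffle one marginal before dividing.
    shuffled_a = a_samples[:]
    rng.shuffle(shuffled_a)
    independent = [b / a for a, b in zip(shuffled_a, b_samples)]
    return joint, independent


def log_spread(ratios):
    """Standard deviation of log(ratio), a rough measure of uncertainty."""
    return statistics.stdev([math.log(r) for r in ratios])


joint, independent = simulate()
print("spread with correlations:   ", round(log_spread(joint), 2))
print("spread without correlations:", round(log_spread(independent), 2))
```

With the correlations kept, the shared uncertainty cancels in the ratio and only the narrow item-specific noise remains. With the correlations dropped, the shared uncertainty is counted twice, so the reconstructed ratio comes out far wider than it should be.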
There’s a lot here, and it will take me some time to think about. It seems like you’re coming at this from the lens of the pairwise comparison literature. I was coming at this from the lens of (what I think is) simpler expected value maximization foundations.
I’ve spent some time trying to understand the pairwise comparison literature, but haven’t gotten very far. What I’ve seen has focused very much on what seem to me like narrow elicitation procedures. As you stated, I’m more focused on representation.
“Table of value ratios” are meant to be a natural extension of “big lists of expected values”.
You could definitely understand a “list of expected value estimates” to be a function that helps convey certain preferences, but it’s a bit of an unusual bridge, outside the pairwise comparison literature.
You spend a while expressing the importance of clear contexts. I agree that precise contexts are important. It’s possible that the $1 example I used was a bit misleading. The point I was trying to make is that many value ratios will be less sensitive to changes in context than absolute values (the typical alternative, in expected value theory) would be.
Valuing V($5)/V($1) should give fairly precise results, for people of many different income levels. This wouldn’t be the case if you tried converting dollars to a common unit of QALYs or something first, before dividing.
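A small sketch of why the ratio can be stable while the absolute value is not. It assumes logarithmic utility of money, which is purely an illustrative assumption on my part; `value_of_gain` and `ratio_5_to_1` are hypothetical helper names, and "wealth" stands in loosely for income level.

```python
import math


def value_of_gain(gain, wealth):
    """Utility gained from receiving `gain` dollars at a given wealth.

    Assumes log utility: log(wealth + gain) - log(wealth).
    """
    return math.log1p(gain / wealth)


def ratio_5_to_1(wealth):
    """V($5) / V($1) for a person at the given wealth level."""
    return value_of_gain(5, wealth) / value_of_gain(1, wealth)


for wealth in (1_000, 10_000, 100_000, 1_000_000):
    print(
        f"wealth={wealth:>9}  "
        f"V($1)={value_of_gain(1, wealth):.2e}  "
        f"V($5)/V($1)={ratio_5_to_1(wealth):.3f}"
    )
```

Under this assumption, the absolute value of $1 in utility terms shrinks by roughly a factor of a thousand across these wealth levels, while the ratio stays very close to 5 throughout.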
Now, I could definitely see people from the discrete choice literature saying, “of course you shouldn’t first convert to QALYs, instead you should use better mathematical abstractions to represent direct preferences”. In that case I’d agree, there’s just a somewhat pragmatic set of choices about which abstractions give a good fit of practicality and specificity. I would be very curious if people from this background would suggest other approaches to large-scale, collaborative, estimation, as I’m trying to achieve here.
I would expect that with Relative Value estimation, as with EV estimation, we’d generally want precise definitions of things, especially if they were meant as forecasting questions. But “precise definitions” could mean “a precise set of different contexts”. Like, “What is the expected value of $1, as judged by 5 random EA Forum readers, for themselves?”
> If all we are doing is binary comparisons between a set of items, it seems to me that it would be sufficient to represent relative values as a binary—i.e., is item1 better, or item2?
Why do you think this is all we’re doing? We often want to know how much better some items are than others—relative values estimate this information.
You can think of relative values a lot like “advanced and scalable expected value calculations”. There are many reasons to actually know the expected value of something. If you want to do extrapolation (“The EV of one person going blind is ~0.3 QALYs/year, so the EV of 20 people going blind is probably...”), it’s often not too hard to ballpark it.
Relatedly, businesses often use dollar approximations of the costs of very different things. This is basically a set of estimates of the value of the cost.
Thanks for the comment!
If people were doing it by hand, there could be contradictory properties, as you mention. But with programming, which we likely want anyway, it’s often trivial or straightforward to make consistent tables.
> I think that this is actually the additional information which having such a table adds compared to using a single central unit of comparison. If there were no path dependency, the table would be redundant and could be replaced by a single central unit (= any single line of the table). This makes me extra curious about the question of what this “extra information” really means?
I think you might not quite yet grok the main benefits of relative values I’m trying to get at. I’ve had a hard time explaining them. It’s possible that going through the web app, especially with the video demo, would help.
Single tables could work for very similar kinds of items, but have a lot of trouble with heterogeneous items. There’s often no unit that’s a good fit for everything. If you were to try to put things into one table, you’d get the problems I flag in the two thought experiments.
> Possibly something like this is the best we can do as long as we cannot define an explicit utility function
To be clear, relative values, as I suggest, are basically more explicit than utility functions, not less. You still create explicit utility functions, but there’s better support for appreciating some uncertain combinations, while storing other signal.
Relative Value Functions: A Flexible New Format for Value Estimation
Good to know, thanks!
>We have a shared event calendar so that you can track whether your usage spikes might overlap
Minor note: The link to the calendar seems broken for me.
Happy to see this! My impression is that it could be really great to get a lot of EAs in spaces like this, but also that it can be quite tricky to do. Hopefully as the tech improves it will become easier.
I also am a fan of Immersed VR, but fewer people have VR headsets and work in them. I find that more engaging though.
I’ll try the Gather Town out, will see if I can use it in my workflow.
Yea, we don’t have many great options here. We have a choice between words with mediocre connotations and making up new terms.
The main thing to me is the concept. I see myself calling this “local/private benefit” in some cases, and “convenience or comfort” in others, but I’ll try to make it clear they’re referring to the same thing.
Hopefully if it’s a good idea, some common word will catch on, then we could consistently use that.
Thinking of Convenience as an Economic Term
Patrick Gruban on Effective Altruism Germany and Nonprofit Boards in EA
Estimating Everything Everywhere Always
Owain Evans on LLMs, Truthful AI, AI Composition, and More
Seeking expertise to improve EA organizations
I agree, the legal aspect is my main concern, doubly so if people can exchange/sell these agreements later on.
One weakness that these Agreements have is that they require the client (or a third party) to ensure that questions are written and scored, instead of the consultant.
This is similar to an issue that Prediction Markets have, but not one that existing forecasting contracts often have. These contracts often have the forecasting contractors do the work of question specification and resolution.
So, Accuracy Agreements are probably in-between Prediction Markets and current contractor agreements, in complexity.
For big government construction projects, I believe some firms/agencies will do a lot of preparation and outlining, before a bidding process might begin. Getting things specific enough for a large bidding process is itself a fair bit of work. This can be useful for large projects, or in cases where the public has little trust in the key decision makers, but is probably cost-prohibitive for other situations.
Yea, I really don’t think they’re complicated conceptually, it’s just tricky to be explicit about. It’s a fairly simple format all things considered.
I think that using them in practice takes a little time to feel very comfortable. I imagine most users won’t need to think about a lot of the definitions that much.