On the whole I liked this a lot, and I broadly agree.
Around “academics being too optimistic”: I’ve seen similar a few times before and am pretty tired of it at this point. I’m happy that interesting ideas are brought forward, but I think the bias is pretty harmful. In fairness though, this is really a community issue; if our community epistemics were better, then the overconfidence of academic takes wouldn’t have led to much overconfidence in community beliefs.
Some thoughts:
1. I agree that the implementation of “general purpose many-employee prediction markets/tournaments” so far has been fairly costly. At the very least, at this point it’s clearly not “a clear big win.”
2. That said, I think the above category is narrower than what we should limit ourselves to. Note that:
A. “Structured, team prediction systems” already perform well: see the regular financial projections made by accounting teams, the sales figures made by sales teams, and the delivery time estimates made by technical teams. I think these systems could very clearly be improved by “prediction tournament” methods, like using scoring rules (a quick sketch below).
B. Many companies already pay analysts/consultants/strategists to do broad forecasts. These could also be augmented with forecasting techniques.
I think it’s pretty clear that some sorts of forecasting work and provide business value, and that others don’t. This post didn’t seem to get into these sorts of forecasting setups.
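To make the “scoring rules” point concrete, here’s a minimal sketch (the function and the example numbers are mine, purely for illustration, not from the report) of how a team’s probabilistic estimates could be graded with a Brier score, so that better-calibrated forecasters become identifiable over time:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; a confident and correct forecaster approaches 0.0.
    """
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical delivery forecasts from two teams: probability that each
# project ships on time, scored against whether it actually did.
team_a = brier_score([0.9, 0.6, 0.8], [1, 0, 1])     # ~0.14
team_b = brier_score([0.99, 0.99, 0.99], [1, 0, 1])  # ~0.33, overconfidence penalized
print(team_a, team_b)
```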
3. So far, a few groups have tried “in-house general purpose many-employee prediction markets/tournaments”, but they haven’t taken off. I think our prior should be that they’re difficult, as most “weird new trends” either take a lot of figuring out to get right, or just die. To get a sense of the potential, it’s much more important to focus on the successes than on the average participant. This means that, at this point, some promising future research would be to “try to identify the successes and understand them very well.” (Again, I agree that our expectations should be much lower than what some have suggested. Given that, it makes sense to realize that we’re starting closer to the bottom than others thought, and to take the corresponding steps.)
4. I didn’t particularly like the section, “must not have too large negative side-effects”:
First, I agree that cultural dynamics matter a lot, and agree with a few specific points.
This section seemed interesting as a set of hypotheses about why internal prediction markets haven’t taken off, but it isn’t very valuable as a guide to whether or not we should be excited about internal prediction markets.
To the above point: I’m sure there are also a lot of positive side-effects, and these weren’t mentioned in this piece. It seems to me like encouraging these markets would help make an organization more truth-seeking and candid, which is a cultural transformation that seems like it could be really positive. Public discussions of a company’s prediction systems would act as advertising to recruit the kinds of people this would be a fit for. My guess is that it’s a niche thing for now, but for the right sorts of corporate cultures (like hedge funds), it could be great.
The specific points in this section seemed particularly speculative to me, like they come from a particular worldview.
One likely dynamic is not a negative side-effect for the organization, but rather that these systems are just bad for important managers because of principal-agent problems. For example, a middle manager really doesn’t want others to know that a team is failing, even if the CEO would much prefer it be known. By the same token, this would make these systems attractive to top management.
5. In the “conclusions” section, the authors seem to assume a binary: that the company can either “go all in on internal prediction markets” or “not do them at all.” Maybe this was assumed as part of the research proposal, but I don’t agree with the assumption.
Doing small-scale experiments with motivated actors is really cheap and is almost always the first step when trying out any new method. I agree with the conclusion that “going all in at once” is a bad strategy, but it seems really easy to try it out on smaller scales for a while and see how it goes. This could mean:
1) Try it with a few small teams that seem like particularly good fits.
2) Make it available to the entire organization, but only have around 5-20 interesting questions per year.
3) Consider really bare-bones versions of prediction tournaments, like just having a whiteboard of some key questions, or simple spreadsheets, or similar.
6. Thinking about it more, it seems like internal prediction systems are probably a good fit for cultures that are candid/nerdy, and a bad fit for others (especially as it’s so early).
7. There’s also a pretty big positive externality of clever groups trying these sorts of methods out and writing about them. We need some organizations to do this in order to develop the methods better. I’m not sure at all if this was a key consideration for Upstart, but I think it could be for others reading this.
“I’ve seen similar a few times before and am pretty tired of it at this point”
I think I’d only sort of encountered the issue theoretically, and maybe in some ambiguous cases, but I researched this one in some depth, and it was more shocking.
Fair point on 2. (prediction markets being too restrictive) and 3. (focusing on the successes).
4., I think, is a feature of the report being aimed at a particular company, so considerations around, e.g., office politics making prediction markets fail are still important. As you kind of point out, overall this isn’t really the report I would have written for EA, and I’m glad I got bought out of that.
5. I don’t think this is what we meant, e.g., see:
“Like Eli below, I am also in favour of starting with small interventions and titrating one’s way towards more significant ones.”
“For internal predictions, start with interventions that take the least amount of employee time.”
I.e., we agree that small experiments (e.g., “Delphi-like automatic prediction markets built on top of dead-simple polls”) are great. This could maybe have been expressed more clearly.
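To be concrete about how bare-bones that can be, a “dead-simple poll” aggregator could literally be a few lines. The following is just my illustration of the idea (the question and numbers are made up), not something from the report:

```python
from statistics import median

def aggregate_poll(probabilities):
    """Aggregate employees' probability estimates for one question.

    The median is robust to a single wildly overconfident answer.
    """
    return median(probabilities)

# Hypothetical question: "Will feature X ship by the end of Q3?"
round_1 = aggregate_poll([0.3, 0.5, 0.7, 0.4, 0.9])   # 0.5
# Share the round-1 result, let people revise, and poll again (Delphi-style).
round_2 = aggregate_poll([0.4, 0.5, 0.6, 0.45, 0.7])  # 0.5
print(round_1, round_2)
```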
On the other hand, I didn’t really have the impression that there was someone inside Upstart willing to put in the time to do the experiments if we didn’t.
6. Sure. One thing we were afraid of was cultures sort of having an incentive to pretend they are more candid than they really are. Social desirability bias feels strong.
7. (experimentation having positive externalities.) Yep!
On 4., I very much agree that this section could be more nuanced by mentioning some positive side-effects as well. There might be many managers who fear being undermined by their employees, and surely many employees might feel ashamed if they are wrong all the time. However, I think the converse is also true: some managers are insecure and would love for the company to take decisions on complex, hard-to-determine issues collectively, and some employees would like an arena to express their thoughts (where their judgments are heard, and maybe even serve to influence company strategy). I think this is an important consideration that didn’t come through very clearly. There are other plausible goods of prediction markets that aren’t mentioned in the value prop, but which might be relevant to their expected value.