I was going to raise a similar comment to what others have said here. I hope this adds something.
I think we need to distinguish the quality and quantity of ‘output’ from ‘success’ (the outcome of that output). I am deliberately not using ‘performance’ as it’s unclear, in common language, which of the two it refers to. Some outputs are very reproducible: anyone can listen to a music track or read an academic paper. There are often huge rewards to being the best vs second best, e.g. winning in sports. And sometimes success generates further success (the ‘Matthew effect’): more people want to work with you, etc. Hence, I don’t find it at all weird to think that small differences in outputs, as measured on some cardinal scale, sometimes generate huge differences in outcomes.
I’m not sure exactly what follows from this. I’m a bit worried you’re concentrating on the wrong metric (success) when it’s outputs that are more important. Can you explain why you focus on outcomes?
Let’s say you’re thinking about funding research. How much does it matter to fund the best person? I mean, they will get most of the credit, but if you fund the less-than-best, that person’s work is probably not much worse and ends up being used by the best person anyway. If the best person gets 1,000 more citations, should you be prepared to spend 1,000 more to fund their work? Not obviously.
I’m suspicious you can do a good job of predicting ex ante outcomes. After all, that’s what VCs would want to do and they have enormous resources. Their strategy is basically to pick as many plausible winners as they can fund.
It might be interesting to investigate differences in quality and quantity of outputs separately. Intuitively, it seems the best people do produce lots more work than the good people, but it’s less obvious the quality of the best people is much higher than of the good. I recognise all these terms are vague.
On your main point, this was the kind of thing we were trying to make clearer, so it’s disappointing that hasn’t come through.
Just on the particular VC example:
Most VCs only pick from the top 1-5% of startups. E.g. YC’s acceptance rate is 1%, and very few startups they reject make it to series A. More data on VC acceptance rates here: https://80000hours.org/2014/06/the-payoff-and-probability-of-obtaining-venture-capital/
So, while it’s mostly luck once you get down to the top 1-5%, I think there are a lot of predictors before that.
Also see more on predictors of startup performance here: https://80000hours.org/2012/02/entrepreneurship-a-game-of-poker-not-roulette/
YC having a low acceptance rate could mean they are highly confident in their ability to predict ex ante outcomes. It could also mean that they get a lot of unserious applications. Essays such as this one by Paul Graham bemoaning the difficulty of predicting ex ante outcomes make me think it is more the latter. (“it’s mostly luck once you get down to the top 1-5%” makes it sound to me like ultra-successful startups should have elite founders, but my take on Graham’s essay is that ultra-successful startups tend to be unusual, often in a way that makes them look non-elite according to traditional metrics—I tend to suspect this is true of exceptionally innovative people more generally)
Hello Ben.
I’m not trying to be obtuse; it wasn’t super clear to me on a quick-ish skim. Maybe if I’d paid more attention I’d have clocked it.
Yup, I was too hasty on VCs. It seems like they are pretty confident they know what the top 5% are, but not that they can say anything more precise than that. (Although I wonder what evidence indicates they can reliably tell the top 5% from those below, rather than just thinking they can.)
Canada’s Inventors Assistance Program (IAP) provides inventors with a rating of how good their invention is, for a nominal fee. A large fraction of the people who get a bad rating try to make a company anyway, so we can judge the accuracy of the program’s evaluations.
Of the inventions given the highest rating, 55% achieve commercial success, compared to 0% of those given the lowest rating.
https://www.researchgate.net/publication/227611370_Profitable_Advice_The_Value_of_Information_Provided_by_Canadas_Inventors_Assistance_Program
Ah, this is great: evidence that the selectors could tell the top 2% from the rest, but that 2%-20% was much of a muchness. Shame that it doesn’t give any more information on ‘commercial success’.
This is amazing data, and not what I would have expected—I’ve just had my mind changed on the predictability of invention success. Thanks!
This is really cool, thank you!
That’s very interesting, thanks for sharing!
ETA: I’ve added this to our doc acknowledging your comment.
FWIW I think it’s the authors’ job to anticipate how their audience is going to engage with their writing, where they’re coming from, etc. You were not the only one who reacted by pushing back against our framing, as is evident e.g. from Khorton’s much-upvoted comment.
So no matter what we tried to convey, and what info is in the post or document if one reads closely enough, I think this primarily means that I (as main author of the wording in the post) could have done a better job, not that you or anyone else is being obtuse.
I agree that looking at e.g. VC practices is relevant evidence. However, it seems to me that if VCs thought they couldn’t predict anything, they would allocate their capital by a uniform lottery among all applicants, or something like that. I’m not aware of a VC adopting such a strategy (though possibly I just haven’t heard of it); to the extent that they can distinguish “plausible” from “implausible” winners, this does suggest some amount of ex-ante predictability. Similarly, my vague impression is that VCs and other investors often specialize by domain/sector, which suggests they think they can utilize their knowledge and network when making decisions ex ante.
Sure, predictability may be “low” in some sense, but I’m not sure we’re saying anything that would commit us to denying this.
Yeah, I’d be interested to know if VCs were better than chance. Not quite sure how you would assess this, but probably someone’s tried.
But here’s where it seems relevant. If you want to pick the top 1% of people, as they provide so much of the value, but you can only pick the top 10%, then your efforts to pick are much less cost-effective and you would likely want to rethink how you did it.
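To put rough numbers on this point, here is a minimal sketch. It assumes, purely for illustration, that outcomes follow a lognormal distribution with a hypothetical shape parameter `sigma = 2.0`; nothing in this thread establishes that distribution or that parameter. The helper `mean_of_top_fraction` is an illustrative name, not something from the post. It compares the average outcome when you can pick from within the top 1% against the average when you can only narrow things down to the top 10%.

```python
from math import exp
from statistics import NormalDist


def mean_of_top_fraction(p, sigma=2.0, mu=0.0):
    """Mean outcome among the top fraction p (0 < p < 1) of a
    lognormal(mu, sigma) population. The lognormal shape and the
    default sigma are assumptions made for illustration only."""
    std_normal = NormalDist()
    # Cutoff for the top fraction p, in standard-normal units.
    z = std_normal.inv_cdf(1.0 - p)
    overall_mean = exp(mu + sigma ** 2 / 2)
    # Standard result for the partial expectation of a lognormal
    # above a quantile, divided by p to get a conditional mean.
    return overall_mean * std_normal.cdf(sigma - z) / p


top_1_percent = mean_of_top_fraction(0.01)
top_10_percent = mean_of_top_fraction(0.10)

print(f"average outcome within top 1%:  {top_1_percent:.1f}")
print(f"average outcome within top 10%: {top_10_percent:.1f}")
print(f"ratio: {top_1_percent / top_10_percent:.1f}")
```

Under these assumed parameters, a selector who can only identify the top 10% gets a noticeably lower average outcome per pick than one who can identify the top 1%. How large the gap is depends entirely on how heavy the tail is, so the ratio here should not be read as an empirical estimate.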
I think it’s plausible that VCs aren’t better than chance when choosing between a suitably restricted “population”, i.e. investment opportunities that have passed some bar of “plausibility”.
I don’t think it’s plausible that they are no better than chance simpliciter. In that case I would expect to see a lot of VCs who cut costs by investing literally zero time into assessing investment opportunities and literally fund on a first-come, first-served or lottery basis.
And yes, I totally agree that how well we can predict (rather than just the question whether predictability is zero or nonzero) is relevant in practice.
If the ex-post distribution is heavy-tailed, there are a bunch of subtle considerations here I’d love someone to tease out. For example, if you have a prediction method that is very good for the bottom 90% but biased toward ‘typical’ outcomes, i.e. the median, then you might be better off, in expectation, allocating by a lottery over the full population (because this gets you the mean, which for heavy-tailed distributions will be much higher than the median).
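The mean-versus-median point can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of real data: outcomes are drawn from a lognormal distribution with a hypothetical `sigma = 2.0`, and the "biased" selector is caricatured as one that simply picks the candidates whose outcomes lie closest to the population median. The function name `compare_strategies` and all parameters are made up for this example.

```python
import random
import statistics


def compare_strategies(n_population=100_000, n_picks=1_000, sigma=2.0, seed=0):
    """Compare a median-seeking selector with a uniform lottery on a
    simulated heavy-tailed population (lognormal; an assumption)."""
    rng = random.Random(seed)
    outcomes = [rng.lognormvariate(0.0, sigma) for _ in range(n_population)]
    median = statistics.median(outcomes)

    # Caricature of a predictor biased toward 'typical' outcomes:
    # it selects the n_picks candidates closest to the median.
    typical_picks = sorted(outcomes, key=lambda x: abs(x - median))[:n_picks]

    # Uniform lottery over the full population, without replacement.
    lottery_picks = rng.sample(outcomes, n_picks)

    return {
        "population_median": median,
        "population_mean": statistics.fmean(outcomes),
        "typical_avg": statistics.fmean(typical_picks),
        "lottery_avg": statistics.fmean(lottery_picks),
    }


results = compare_strategies()
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

In this toy setup the median-seeking selector earns roughly the median per pick, while the lottery earns something close to the mean, which sits well above the median because a few very large outcomes dominate the total. A real predictor would be neither this biased nor this uninformed, so the sketch only shows the direction of the effect.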
Data from the IAP indicates that they can identify the top few percent of successful inventions with pretty good accuracy. (Where “success” is a binary variable – not sure how they perform if you measure financial returns.)
I’m not sure I agree that outputs are more important. I think it depends a lot on the question or decision we’re considering, which is why I highlighted a careful choice of metric as one of the key pieces of advice.
So e.g. if our goal is to set performance incentives (e.g. salaries), then it may be best to reward people for things that are under their control. E.g. pay people more if they work longer hours (inputs), or if there are fewer spelling mistakes in their report (cardinal output metric) or whatever. At other times, paying more attention to inputs or outputs rather than outcomes or things beyond the individual performer’s control may be justified by considerations around e.g. fairness or equality.
All of these things are of course really important to get right within the EA community as well, whether or not we care about them instrumentally or intrinsically. There are a lot of tricky and messy questions here.
But if we can say anything general, then I think that especially in EA contexts we care more, or more often, about outcomes/success/impact on the world, and less about inputs and outputs, than usual. We want to maximize well-being, and from ‘the point of view of the universe’ it doesn’t ultimately matter if someone is happy because someone else produced more outputs or because the same outputs had greater effects. Nor does it ultimately matter if impact differences are due to differences in talent, resource endowments, motivation, luck, or …
Another way to see this is that often actors that care more about inputs or outputs do so because they don’t internalize all the benefits from outcomes. But if a decision is motivated by impartial altruism, there is a sense in which there are no externalities.
Of course, we need to make all the usual caveats against ‘naive consequentialism’. But I do think there is something important in this observation.
I was thinking the emphasis on outputs might be the important part as those are more controllable than outcomes, and so the decision-relevant bit, even though we want to maximise impartial value (outcomes).
I can imagine someone thinking the following way: “we must find and fund the best scientists because they have such outsized outcomes, in terms of citations.” But that might be naive if it’s really just the top scientist who gets the citations and the work of all the good scientists has a more or less equal contribution to impartial value.
FWIW, it’s not clear we’re disagreeing!
Thanks for this comment!
I’m sympathetic to the point that we’re lumping together quite different things under the vague label “performance”, perhaps stretching it beyond its common use. That’s why I said in bold that we’re using a loose notion of performance. But it’s possible it would have been better if I had spent more time coming up with better terminology.
Okay good! Yeah, I would be curious to see how much the analysis would change if it distinguished outputs from outcomes and, further, between different types of outputs.