Can you explain why attributing all impact to senior staff increases the width of the confidence interval (in log space)? I’d naively expect this to remove a source of uncertainty.
I had a quick look at the Guesstimate model, and what I think is going on is that you just have much wider error bars over how much senior staff time will be taken; but you include scenarios with negative senior staff time(!), which may contribute significantly to the expectation of the value-per-year figure, but isn’t very meaningful. Am I just confused?
That part is now fixed, but it doesn’t look like it contributed meaningfully to the end calculation.
This doesn’t look fixed to me (possible I’m seeing an older cached version?). I no longer see negative numbers in the summary statistics, but you’re still dividing by things involving normal distributions—these have a small chance of being extremely small or even negative. That in turn means that the expectation of the eventual distribution is undefined.
Empirically I think this is happening, because: (i) the sampling seems unstable—refreshing the page a few times gives me quite different answers each time; (ii) the “sensitivity” tool in Guesstimate suggests something funny is going on there (but I’m not sure exactly how this diagnostic tool works, so take with some salt).
To avoid this, I’d change all of the normal distributions that you may end up dividing by to log-normals.
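To illustrate the pathology with made-up numbers (not the model’s actual values): dividing by a normal distribution that puts even a little mass at or below zero makes the Monte Carlo mean of the ratio jump around between runs, exactly like refreshing the Guesstimate page, while a log-normal denominator keeps it stable. A minimal sketch:

```python
import numpy as np

# Hypothetical, illustrative numbers (not from the model):
# impact ~ N(10, 2), senior staff-years ~ N(2, 1).  A normal with
# mean 2 and sd 1 puts ~2.3% of its mass at or below 0, so
# impact / staff_years has no finite expectation, and its
# Monte Carlo mean never settles down.
def mc_mean_ratio(denominator_dist, n=5000, seed=0):
    rng = np.random.default_rng(seed)
    impact = rng.normal(10.0, 2.0, n)
    staff_years = denominator_dist(rng, n)
    return np.mean(impact / staff_years)

normal_denom = lambda rng, n: rng.normal(2.0, 1.0, n)
# Log-normal with a roughly matching median: always strictly positive.
lognormal_denom = lambda rng, n: rng.lognormal(np.log(2.0), 0.5, n)

# Ten independent 5K-sample runs, mimicking ten page refreshes.
normal_means = [mc_mean_ratio(normal_denom, seed=s) for s in range(10)]
lognormal_means = [mc_mean_ratio(lognormal_denom, seed=s) for s in range(10)]

print("normal denominator:   ", [round(m, 1) for m in normal_means])
print("lognormal denominator:", [round(m, 1) for m in lognormal_means])
# The first list typically varies wildly between seeds; the second
# stays close to a single value.
```

The instability in the first list is the same effect as the answers changing on refresh: occasional samples very close to zero dominate the sample mean.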
Okay, I’ve now done this.
Let me know if you think the model is better and I can update the post.
Re (i): that is true because Guesstimate uses a Monte Carlo method with (I think) 5K samples.
Re (ii): I don’t know how to read the sensitivity outputs well, but nothing looks weird to me. Could you explain?
I think this has removed the pathology. There’s still more variation in this number, but that now comes from genuine uncertainty about the amount of senior staff time needed. If the decision-relevant question under consideration is “how many of these could we do sequentially?”, then it is appropriate to weight this uncertainty in this way.
Thanks. I updated the post accordingly.
No, that’s a good and valid question and I’m unsure of the answer. At first I thought it was because I had not properly separated out some of the calculations (e.g., I had a single cell that accounted for both the number of years taken to become a top charity and the number of staff-years per year). But once I separated them out (see “Alternate Calculation” in the model), the uncertainty range is still larger for senior staff than for all staff, even though the all-staff calculation unambiguously includes all the values of the senior-staff calculation and more.
So now I think either there’s an error somewhere in my model, or some of the uncertainties are interfering with each other and cancelling out. I think the second can happen when you divide two intervals.
Also, the fact that the senior staff range includes numbers close to 0 creates the potential for a very large tail of high impact, which is not present in the total-staff-year model, whose range does not include numbers close to 0.
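A quick sketch of that last point, using hypothetical ranges rather than the model’s actual numbers: a denominator whose range dips toward 0 (senior staff-years) produces a much fatter upper tail in the impact-per-staff-year ratio than a denominator bounded well away from 0 (total staff-years), even when the numerator is identical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical, illustrative distributions (not the model's numbers):
# senior staff-years have a wide range that dips near 0; total
# staff-years are bounded well away from 0.
senior = rng.lognormal(np.log(1.0), 0.9, n)    # median 1, can get close to 0
total = rng.lognormal(np.log(10.0), 0.3, n)    # median 10, never near 0
impact = rng.lognormal(np.log(100.0), 0.5, n)  # same numerator for both

per_senior_year = impact / senior
per_total_year = impact / total

# Ratio of the 99th percentile to the median, as a crude tail measure:
tail = lambda x: np.percentile(x, 99) / np.median(x)
print(f"impact per senior staff-year: 99th pct is {tail(per_senior_year):.1f}x the median")
print(f"impact per total staff-year:  99th pct is {tail(per_total_year):.1f}x the median")
```

The senior-staff ratio’s tail stretches much further above its median, which is what widens its confidence interval in log space despite the all-staff calculation containing strictly more sources of staff time.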