My comment is gonna be pretty negative, but I think it’s excellent that you included a cost-effectiveness estimate (CEE). IMO, CEEs should pretty much be mandatory for cause area proposals, and it bugs me every time I see a post without one. Writing a CEE makes you easier to pick on (which is why I’m picking on you), but having a flawed CEE is far preferable to being not even wrong. And your CEE is pretty structurally solid, and I’m glad you made it. I like how you look at not just the EV but the variance because variance matters—a higher variance means the new intervention has a better chance of being better than the baseline intervention. I also like that you’re comparing consistent units (happy hours per dollar).
Ok, here’s the negative part:
The CEE in the linked Guesstimate looks optimistic to the point of being impossible. Given the quoted numbers of 32 acts of kindness per day with each act producing an average of 0.7 happy hours, that’s 22 happy hours produced per person-day of acts of kindness. If you said people’s acts of kindness increased overall happiness by 10%, I’d say that sounds too high. If you say it produces 22 happy hours, when the average person is only awake for 17 hours...well that’s not even possible.
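To make that arithmetic easy to check, here is a minimal Python sketch. The three figures are the ones quoted in the paragraph above; the variable names are my own and are not taken from the Guesstimate model.

```python
# Figures quoted in the comment above; names are illustrative, not from the Guesstimate model.
acts_per_day = 32          # quoted acts of kindness per person per day
happy_hours_per_act = 0.7  # quoted average happy hours produced per act
waking_hours = 17          # approximate waking hours per day, as stated above

# Total happy hours attributed to one person's acts of kindness in one day.
happy_hours_per_day = acts_per_day * happy_hours_per_act

# How that total compares with the length of a waking day.
ratio_to_waking = happy_hours_per_day / waking_hours

print(f"{happy_hours_per_day:.1f} happy hours per person-day")
print(f"{ratio_to_waking:.2f}x the length of a waking day")
```

Note that this only reproduces the multiplication; it says nothing about whether the total should be read as one person's happiness or as a sum over several recipients.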
I am also very skeptical of the reported claim that a one-time intervention of “watching an elevating video, enacting prosocial behaviors, and reflecting on how those behaviors relate to one’s value” (Baumsteiger 2019) can produce an average of 1600 additional acts of kindness per person. That number sounds about 1000x too high to me.
In general, psych studies are infamous for reporting impossibly massive effects and then failing to replicate. The given cost-effectiveness involves a conjunction of several impossibly massive effects, producing a resulting cost-effectiveness that I would guess is about 100,000x too high.
This still doesn’t tell us the cost-effectiveness of the proposed research project, which is what you were trying to estimate. The upside to the research project basically entirely comes from the small probability that the intervention turns out to be way more cost-effective than I think it is, and I think the chance of that is much higher than 1 in 100,000, but I still think the Guesstimate is significantly overestimating the cost-effectiveness of further research in this area.
This isn’t to say no behavioral intervention could be cost-effective; I understand that this was just one idea, and it’s a relatively simple model. I do think it’s important to have a promising preliminary cost-effectiveness estimate before putting a lot of funding into a cause area.
I have investigated the issues you highlighted, diagnosed the underlying errors, and revised the model accordingly. The root of the problem was that I had sourced some of the estimates of the frequency of prosocial behavior from studies of social behavior under special, unrepresentative conditions, such as infants interacting with adults for 10 minutes while being observed by researchers, and prosocial behavior in TV series. I have removed those estimates because they are biased as measures of how often prosocial behavior occurs in the real world. As a consequence, the predicted lifetime increase in the number of kind acts per person reached by the intervention dropped from 1600 to 64. The predicted cost-effectiveness of the research dropped from 110 times the cost-effectiveness of StrongMinds to 7.5 times the cost-effectiveness of StrongMinds.
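For readers who want the size of the revision at a glance, the following Python sketch computes the two reduction factors. It uses only the before and after figures stated above; the variable names are illustrative and do not come from the model itself.

```python
# Figures quoted in the comment above; variable names are illustrative.
acts_before, acts_after = 1600, 64  # predicted lifetime increase in kind acts per person
ce_before, ce_after = 110, 7.5      # cost-effectiveness as a multiple of StrongMinds

# Factor by which each estimate shrank in the revised model.
acts_reduction = acts_before / acts_after
ce_reduction = ce_before / ce_after

print(f"kind acts estimate fell by a factor of {acts_reduction:.0f}")
print(f"cost-effectiveness estimate fell by a factor of {ce_reduction:.1f}")
```

The kind-acts estimate shrinks by a factor of 25 and the cost-effectiveness estimate by a factor of roughly 15.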
In producing this revised version, I also made a few additional improvements. The most consequential of those was to base the estimated cost of deploying the intervention on empirical data on the effectiveness of online advertising in $ per install.
I am currently using Squiggle to program a much more rigorous version of this analysis. That version will include additional improvements and rigorously document and justify each of the model’s assumptions.
Thank you for engaging with and critiquing the cost-effectiveness analysis, Michael! There seem to be a few misunderstandings I would like to correct.
The CEE in the linked Guesstimate looks optimistic to the point of being impossible. Given the quoted numbers of 32 acts of kindness per day with each act producing an average of 0.7 happy hours, that’s 22 happy hours produced per person-day of acts of kindness. If you said people’s acts of kindness increased overall happiness by 10%, I’d say that sounds too high. If you say it produces 22 happy hours, when the average person is only awake for 17 hours...well that’s not even possible.
The value you calculated is the sum of the additional happiness of all the people to whom the person was kind. This includes everyone they interacted with that day in any way: the strangers they smiled at, the friends they messaged, the colleagues they helped at work, the customers they served, their children, their partner, and their parents and other family members. If you consider that the benefit of the kindness might be spread over more than a dozen people, then 22 hours of happiness might be no more than 1-2 hours per person. Moreover, the estimates also take into account that a person who benefits from your kindness today might still be slightly happier tomorrow.
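As a rough illustration of the per-recipient figure, here is a small Python sketch. The total comes from the 32 acts and 0.7 happy hours quoted above; the recipient counts (12, 16, 22) are hypothetical values chosen only to represent "more than a dozen people" and are not taken from the model.

```python
# Total from the quoted figures: 32 acts per day at 0.7 happy hours per act.
total_happy_hours = 32 * 0.7

# Hypothetical numbers of distinct recipients per day (assumed for illustration only).
recipient_counts = (12, 16, 22)

# Average happy hours received per recipient if the total is spread evenly.
per_recipient = {n: total_happy_hours / n for n in recipient_counts}

for n, hours in per_recipient.items():
    print(f"{n} recipients: {hours:.2f} happy hours each")
```

Spreading the total evenly is itself a simplifying assumption; in practice some recipients would presumably benefit much more than others.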
I am also very skeptical of the reported claim that a one-time intervention of “watching an elevating video, enacting prosocial behaviors, and reflecting on how those behaviors relate to one’s value” (Baumsteiger 2019) can produce an average of 1600 additional acts of kindness per person. That number sounds about 1000x too high to me.
The intervention by Baumsteiger (2019) was a multi-session program that lasted 12 days and involved planning, performing, and documenting one’s prosocial behavior for 10 days in a row. The effect-size distribution in the Guesstimate model is based on many different studies, some of which were even more intensive.
In general, psych studies are infamous for reporting impossibly massive effects and then failing to replicate.
Most of the estimates are based on meta-analyses of many studies. The results of meta-analyses are substantially more robust and more reliable than the result of a single study.
I think you are right that this first estimate was too optimistic. In particular, the probability distribution of the frequency of prosocial behavior is currently based on four estimates from different studies. One of those studies led to an estimate that appears to be far too high. This might be because the study defined prosocial behavior more liberally, since it involved interactions with children, or because participants knew that they were being observed. I will think about what the more general problem might be and how it can be addressed systematically.
Thanks for following up! Those sound like good changes.
Another thing you might do (if it’s feasible) is list the studies you’re using on something like Replication Markets.
Thank you for your feedback, Michael, and thank you very much for making me aware of those specialized prediction platforms. I really like your suggestion. I think making predictions about the likely results of replication studies would be helpful for me. It would push me to critically examine and quantify how much confidence I should put in the studies my models rely on. Obtaining the predictions of other people would be a good way to make that assessment more objective. We could then incorporate the aggregate prediction into the model. Moreover, we could use prediction markets to obtain estimates or forecasts for quantities for which no published studies are available yet. I think it might be a good idea to incorporate those steps into our methodology. I will discuss that with our team today.