EDIT (Jul 2022): I’m no longer nearly as confident in this idea, though if someone were excited about it, it might still be cool.
Reflecting a little on my shortform from a few years ago, I think I wasn’t ambitious enough in trying to actually move this forward.
I want there to be an org that does “human challenge”-style RCTs across lots of important questions that are extremely hard to get at otherwise, e.g. (the top 2 are repeated from my previous shortform; edited to clarify: these are some quick examples off the top of my head, and more consideration should go into which are the best for this org):
Health effects of veganism
Health effects of restricting sleep
Productivity of remote vs. in-person work
Productivity effects of blocking out focused/deep work
Edited to add: I no longer think “human challenge” is really the best way to refer to this idea (see comment that convinced me); I mean to say something like “large-scale RCTs of important things on volunteers who sign up on an app to randomly try or not try an intervention.” I’m open to suggestions on succinct ways to refer to this.
I’d be very excited about such an org existing. I think it could even grow to become an effective megaproject, pending further analysis on how much it could increase wisdom relative to power. But I don’t think it’s a good personal fit for me to found, given my current interests and skills.
However, I think I could plausibly provide some useful advice/help to anyone who is interested in founding a many-domain human-challenge org. If you are interested in founding such an org or know someone who might be and want my advice, let me know. (I will also be linking this shortform to some people who might be able to help set this up.)
--
Some further inspiration I’m drawing on to be excited about this org:
Freakonomics’ RCT on measuring the effects of big life changes like quitting your job or breaking up with your partner. This makes me optimistic about the feasibility of getting lots of people to sign up.
Holden’s note on doing these types of experiments with digital people. He mentions some difficulties with running these types of RCTs today, but I think an org specializing in them could help. (Edited to add: in particular, a mobile/web app for matching experiments to volunteers and tracking effects seems like it should be created.)
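As a purely illustrative sketch of the “randomly try or not try an intervention” piece of such an app (this is not from the original post, and says nothing about how a real app should be built): a seeded shuffle that splits signed-up volunteers into two arms of near-equal size. The function name `assign_arms`, the arm labels, and the volunteer IDs are all hypothetical.

```python
import random


def assign_arms(volunteer_ids, seed):
    """Randomly split volunteers into 'intervention' and 'control' arms.

    A seeded RNG makes the allocation reproducible for auditing.
    The two arm sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(volunteer_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        vid: ("intervention" if i < half else "control")
        for i, vid in enumerate(shuffled)
    }


# Hypothetical volunteers for a hypothetical "deep work blocks" study.
allocation = assign_arms(["v1", "v2", "v3", "v4", "v5"], seed=2022)
```

A real system would also need consent, adherence tracking, and outcome measurement, none of which this sketch attempts.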
Yeah, these are interesting questions, Eli. I’ve worked on a few big RCTs and they’re really hard and expensive to do. It’s also really hard to adequately power experiments for small effect sizes in noisy environments (e.g., productivity of remote/in-person work). Your suggestions to massively scale up those interventions and to do things online would make things easier. As Ozzie mentioned, the health ones require such long and slow feedback loops that I think they might not be better than well (statistically) controlled alternatives. I used to think RCTs were the only way to get definitive causal data. The problem is that, because of biases that can be almost impossible to eliminate (https://sites.google.com/site/riskofbiastool/welcome/rob-2-0-tool), RCTs are seldom perfect causal data. Conversely, with good adjustment for confounding, observational data can provide very strong causal evidence (think smoking; I recommend my PhD students do this course for this reason: https://www.coursera.org/learn/crash-course-in-causality). For the ones with fast feedback loops, I think some combination of “priors + best available evidence + lightweight tests in my own life” works pretty well to see if I should adopt something.
At a meta-level, in an ideal world, the NSF and NIH (and global equivalents) are presumably designed to fund people to address the questions that are most important and have the highest potential. There are probably dietetics/sleep/organisational psychology experts who have dedicated their careers to questions #1-4 above, and you’d hope that those people are getting funded if those questions are indeed critical to answer. In reality, science funding probably does not get distributed based on criteria that maximise impartial welfare, so maybe that’s why #1-4 would get missed. As mentioned in a recent forum post, I think the mega-org would be better focused on nudging scientific incentives toward those questions rather than working on those questions ourselves: https://forum.effectivealtruism.org/posts/JbddnNZHgySgj8qxj/improving-science-influencing-the-direction-of-research-and
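To make the point about powering for small effects concrete, here is a rough sketch (not from the original comment) of the standard normal-approximation sample-size formula for a two-arm trial comparing means: n per arm ≈ 2 × ((z for 1 − α/2 + z for power) / d)², where d is the standardized effect size (Cohen’s d). The function name and the defaults (two-sided α = 0.05, 80% power) are illustrative assumptions, and the formula ignores dropout, non-adherence, and clustering.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_size_d, alpha=0.05, power=0.8):
    """Approximate participants needed per arm for a two-sided,
    two-sample comparison of means (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Compare a "medium" effect with progressively smaller ones.
for d in (0.5, 0.2, 0.1):
    print(d, n_per_arm(d))
```

Under these assumptions, d = 0.5 needs about 63 people per arm while d = 0.1 needs about 1,570; required sample size scales with 1/d², so halving the effect size roughly quadruples the number of volunteers.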
Really appreciate hearing your perspective!
On causal evidence of RCTs vs. observational data: I’m intuitively skeptical of this, but the sources you linked seem interesting and worthwhile to think about more before setting an org up for this. (Edited to add:) Hearing your view already substantially updates mine, but I’d be really curious to hear more perspectives from others with lots of experience working on this type of stuff, to see if they agree; if so, I’d update more. If you have impressions of how much consensus there is on this question, that would be valuable too.
On nudging scientific incentives to focus on important questions rather than working on them ourselves: this seems pretty reasonable to me. I think building an app to do this still seems plausibly very valuable and I’m not sure how much I trust others to do it, but maybe we combine the ideas and build an app, then nudge other scientists to use this app to do important studies.
I should clarify: RCTs are obviously generally >> even a very well controlled propensity score matched quasi-experiment, but I just don’t think the former is ‘bulletproof’ anymore. The former should update your priors more but if you look at the variability among studies in meta-analyses, even among low-risk-of-bias RCTs, I’m now much less easily swayed by any single one.
I think the obvious answer is that doing controlled trials in these areas is a whole lot of work/expense for the benefit.
Some things like health effects can take a long time to play out; maybe 10-50 years. And I wouldn’t expect the difference to be particularly amazing. (I’d be surprised if the average person could increase their productivity by more than ~20% with any of those)
On “challenge trials”: I imagine the big question is how difficult it would be to convince people to accept a very different lifestyle for a long time. I’m not sure if it’s called a “challenge trial” in this case.
It wouldn’t shock me if an average vegan diet decreased lifetime productivity by more than 20% via a malnutrition → mental health link.
I think our main disagreement is around the likely effect sizes; e.g. I think blocking out focused work could easily have an effect size of >50% (but am pretty uncertain which is why I want the trial!). I agree about long-term effects being a concern, particularly depending on one’s TAI timelines.
Yeah, I’m most excited about challenges that last more like a few months to a year, though this isn’t ideal in all domains (e.g. veganism), so maybe this wasn’t best as the top example. I have no strong views on terminology.
The health interventions seem very different to me than the productivity interventions.
The health interventions have issues with long time-scales, which productivity interventions don’t have as much.
However, productivity interventions have major challenges with generality. When I’ve looked into studies around productivity interventions, often they’re done in highly constrained environments, or environments very different from mine, and I have very little clue what to really make of them. If the results are highly promising, I’m particularly skeptical, so it would take multiple strong studies to make the case.
I think it’s really telling that Google and Amazon don’t have internal testing teams to study productivity/management techniques in isolation. In practice, I just don’t think you learn that much, for the cost of it.
What these companies do do is allow different managers to try things out, survey them, and promote the seemingly best practices throughout. This happens very quickly. I’m sure we could make tools to make this process go much faster. (Better elicitation, better data collection of what already happens, lots of small estimates of impact to see what to focus more on, etc.)
In general, I think traditional scientific experimentation on humans is very inefficient, and we should be aiming for much more efficient setups. (But we should be working on these!)
This post is relevant: https://www.lesswrong.com/posts/vCQpJLNFpDdHyikFy/are-the-social-sciences-challenging-because-of-fundamental
This all makes sense to me overall. I’m still excited about this idea (slightly less so than before), but I agree there should be careful consideration of which interventions make the most sense to test.
A few things come to mind here:
The question of how much evidence it provides that Google/Amazon don’t do this feels related to the discussion around our corporate prediction market analysis. Note that I was probably the author who gave the least weight to the evidence that most corporations discontinued their prediction markets (see my conclusion), though I still think it’s fairly substantial.
I also agree with the point in your reply that setting up prediction markets and learning from them has positive externalities, and a similar thing should apply here.
I agree that more data collection tools for what already happens and other innovations in that vein seem good as well!
A variant I’d also be excited about (I could imagine even more so; it could go either way after more reflection) that could be contained within the same org or separate: the same thing but for companies (particularly startups). Edit to clarify: test policies/strategies across companies, not on people within companies.
Votes/considerations on why this is a good or bad idea are also appreciated!