> In general I think it’s not crazy to guess that the standard error of your measurement is proportional to the size of the effect you’re trying to measure.
Take a hierarchical model for effects. Each intervention $i$ has a true effect $\beta_i$, and all the $\beta_i$ are drawn from a common distribution $G$. Now for each intervention, we run an RCT and estimate $\hat\beta_i = \beta_i + \epsilon_i$, where $\epsilon_i$ is experimental noise.
By the CLT, $\epsilon_i \sim N(0, \sigma_i^2 / n_i)$ approximately, where $\sigma_i^2$ is the inherent sampling variance in your environment and $n_i$ is the sample size of your RCT. What you’re saying is that $\sigma_i^2$ has the same order of magnitude as the variance of $G$. But even if that’s true, the standard error shrinks like $1/\sqrt{n_i}$ as your RCT sample size grows, so the two should not be in the same OOM for reasonable values of $n_i$. I would have to do some simulations to confirm that, though.
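Here is a quick sketch of that simulation, with hypothetical numbers: the per-observation noise SD $\sigma$ is deliberately set equal to the SD of $G$ (the "same OOM" scenario above), and we check how the standard error compares to the spread of true effects as $n_i$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: true effects beta_i ~ G = N(1, tau^2), with the
# per-observation noise SD sigma set equal to tau (same OOM, worst case).
tau, sigma, n_interventions = 1.0, 1.0, 2000

beta = rng.normal(1.0, tau, size=n_interventions)  # true effects drawn from G

for n in [25, 100, 400, 1600]:
    # Each RCT averages n noisy observations, so beta_hat = beta + noise
    # with noise SD sigma / sqrt(n).
    se = sigma / np.sqrt(n)
    beta_hat = beta + rng.normal(0.0, se, size=n_interventions)
    print(f"n={n:5d}  SE={se:.3f}  SD(G)={tau:.1f}  SE/SD(G)={se / tau:.3f}")
```

Even when $\sigma$ and the SD of $G$ start in the same OOM, the standard error is an OOM smaller once the RCT sample size is in the hundreds.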
I also don’t think it’s likely to be true that $\sigma_i^2$ has the same OOM as the variance of $G$. The factors that cause sampling variance—randomness in how people respond to the intervention, randomness in who gets selected for a trial, etc—seem roughly comparable across interventions. But the intervention qualities are not roughly comparable—we know that the best interventions are OOMs better than the average intervention. I don’t think we have any reason to believe that the noisiest interventions are OOMs noisier than the average intervention.
> (I think that for something as clean as a well-set-up experiment with independent trials of a representative sample of the real world, you can estimate the standard error well, but I think the real world is sufficiently messy that this is rarely the case.)
I’m not sure what you mean by this; I think any collection of RCTs satisfies the setting I’ve laid out.
> I think you’re assuming your conclusion here:
> What if the noise is on the log scale?
The central limit theorem says exactly that $\sqrt{n_i}\,(\hat\beta_i - \beta_i) \to N(0, \sigma_i^2)$ in distribution, which implies what I said. The noise is not on the log scale, because the CLT puts it on the raw scale.
Now, if you transform your coefficient onto a log scale, then all bets are off. But that is not what is happening in this post, and it’s not really what happens in reality either; I don’t know why anyone would do it.
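A minimal sketch of the CLT claim, with hypothetical exponential outcomes (heavily skewed, so the raw data are far from normal): even then, $\sqrt{n_i}\,(\hat\beta_i - \beta_i)$ is approximately normal on the raw scale, with no log transform involved.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200        # sample size per simulated RCT
reps = 10_000  # number of simulated RCTs

# Outcomes are exponential with mean 1, so the true effect beta_i = 1
# and the per-observation SD sigma_i = 1, even though the data are skewed.
samples = rng.exponential(scale=1.0, size=(reps, n))
beta_hat = samples.mean(axis=1)       # one estimate per simulated RCT
z = np.sqrt(n) * (beta_hat - 1.0)     # sqrt(n) * (beta_hat - beta)

# Despite the skew in the raw outcomes, z is close to N(0, 1).
print("mean of z ≈", round(z.mean(), 2))
print("SD of z   ≈", round(z.std(), 2))
```

The point of the exponential choice is just that nothing about the raw outcomes is normal; the approximate normality of the estimator comes entirely from averaging.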