Thanks—I should have been a bit more careful with my words when I wrote that “measurement noise likely follows a distribution with fatter tails than a log-normal distribution”. The distribution I’m describing is your subjective uncertainty over the standard error of your experimental results. That is, you’re (perhaps reasonably) modeling your measurement as being the true quality plus some normally distributed noise. But—normal with what standard deviation? There’s an objectively right answer that you’d know if you were omniscient, but you don’t, so instead you have a subjective probability distribution over the standard deviation, and that’s what I was modeling as log-normal.
I chose the log-normal distribution because it’s a natural choice for the distribution of an always-positive quantity. But something more like a power law might’ve been reasonable too. (In general I think it’s not crazy to guess that the standard error of your measurement is proportional to the size of the effect you’re trying to measure—in which case, if your uncertainty over the size of the effect follows a power law, then so would your uncertainty over the standard error.)
(I think that for something as clean as a well-set-up experiment with independent trials of a representative sample of the real world, you can estimate the standard error well, but I think the real world is sufficiently messy that this is rarely the case.)
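To make the first point concrete, here is a minimal sketch (in Python; the log-normal parameters are illustrative, not from the post) of what a log-normal subjective distribution over the noise’s standard deviation implies for the marginal noise distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Subjective model: measurement = true quality + N(0, sigma^2) noise,
# where sigma itself is uncertain and given a log-normal distribution.
# (The log-normal parameters here are illustrative, not from the post.)
n = 1_000_000
sigma = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # subjective draws of the SD
noise = rng.normal(0.0, sigma)                      # noise given each draw

# Marginally, the noise is a scale mixture of normals, which is
# fatter-tailed than any single normal: its excess kurtosis is positive.
z = noise / noise.std()
excess_kurtosis = (z ** 4).mean() - 3.0
print(excess_kurtosis > 0)  # True
```

By Jensen’s inequality the excess kurtosis of such a scale mixture is strictly positive, so the check above is not an artifact of the seed.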
In general I think it’s not crazy to guess that the standard error of your measurement is proportional to the size of the effect you’re trying to measure
Take a hierarchical model for effects. Each intervention $i$ has a true effect $\beta_i$, and all the $\beta_i$ are drawn from a common distribution $G$. Now for each intervention, we run an RCT and estimate $\hat{\beta}_i = \beta_i + \epsilon_i$, where $\epsilon_i$ is experimental noise.
By the CLT, $\epsilon_i \sim N(0, \sigma_i^2 / n_i)$, where $\sigma_i^2$ is the inherent sampling variance in your environment and $n_i$ is the sample size of your RCT. What you’re saying is that $\sigma_i^2$ has the same order of magnitude as the variance of $G$. But even if that’s true, the standard error shrinks as $1/\sqrt{n_i}$ as your RCT sample size grows (the variance as $1/n_i$), so the two should not be in the same OOM for reasonable values of $n_i$. I would have to do some simulations to confirm that, though.
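That simulation is easy to sketch. Here is one (in Python; the lognormal shape for $G$, the per-RCT sample size of 1,000, and setting each $\sigma_i$ equal to $G$'s standard deviation are all illustrative assumptions, chosen to grant the claim under test):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: true effects beta_i ~ G (heavy-tailed), and the
# per-subject sampling SD sigma_i set equal to sd(G) -- i.e. granting
# the claim that sigma_i^2 and Var(G) share an OOM. Numbers illustrative.
n_interventions = 10_000
beta = rng.lognormal(mean=0.0, sigma=1.5, size=n_interventions)
sigma = np.full(n_interventions, beta.std())

n_rct = 1_000                  # per-RCT sample size
se = sigma / np.sqrt(n_rct)    # standard error of beta_hat_i

# Even granting sigma_i ~ sd(G), the SE sits sqrt(n_rct) below sd(G):
oom_gap = np.log10(beta.std() / se[0])
print(round(oom_gap, 1))  # 1.5
```

With $n_i = 1{,}000$ the gap is $\log_{10}\sqrt{1000} = 1.5$ orders of magnitude by construction, whatever $G$ looks like.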
I also don’t think it’s likely to be true that $\sigma_i^2$ has the same OOM as the variance of $G$. The factors that cause sampling variance—randomness in how people respond to the intervention, randomness in who gets selected for a trial, etc.—seem roughly comparable across interventions. But the intervention qualities are not roughly comparable—we know that the best interventions are OOMs better than the average intervention. I don’t think we have any reason to believe that the noisiest interventions are OOMs noisier than the average intervention.
(I think that for something as clean as a well-set-up experiment with independent trials of a representative sample of the real world, you can estimate the standard error well, but I think the real world is sufficiently messy that this is rarely the case.)
I’m not sure what you mean by this; I think any collection of RCTs satisfies the setting I’ve laid out.
I think you’re assuming your conclusion here:
What if the noise is on the log scale?
The central limit theorem is exactly that $\sqrt{n_i}\,(\hat{\beta}_i - \beta_i) \to N(0, \sigma_i^2)$ in distribution, which implies what I said. The noise is not on the log scale, because of the CLT.
Now, if you transform your coefficient onto a log scale, then all bets are off. But that is not what is happening anywhere in this post. And it’s not really what happens in reality either; I don’t know why anyone would do it.
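A quick way to see the raw-scale convergence (a Python sketch; the exponential outcome distribution and the sample sizes are illustrative assumptions, chosen because the exponential is visibly non-normal):

```python
import numpy as np

rng = np.random.default_rng(0)

# CLT sketch: per-subject outcomes are exponential (skewed, not normal),
# with true mean beta = 1 and variance sigma^2 = 1. The exponential
# outcome model and the sample sizes are illustrative assumptions.
beta, sigma2 = 1.0, 1.0
n, reps = 500, 20_000
outcomes = rng.exponential(beta, size=(reps, n))

# sqrt(n) * (beta_hat - beta) should be approximately N(0, sigma^2),
# on the raw scale -- no log transform involved.
z = np.sqrt(n) * (outcomes.mean(axis=1) - beta)
print(abs(z.mean()) < 0.05, abs(z.var() - sigma2) < 0.05)  # True True
```

Even though each individual outcome is strongly skewed, the scaled estimation error matches the stated normal limit on the raw scale.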