Thanks—I should have been a bit more careful with my words when I wrote that “measurement noise likely follows a distribution with fatter tails than a log-normal distribution”. The distribution I’m describing is your subjective uncertainty over the standard error of your experimental results. That is, you’re (perhaps reasonably) modeling your measurement as being the true quality plus some normally distributed noise. But—normal with what standard deviation? There’s an objectively right answer that you’d know if you were omniscient, but you don’t, so instead you have a subjective probability distribution over the standard deviation, and that’s what I was modeling as log-normal.
I chose the log-normal distribution because it’s a natural choice for the distribution of an always-positive quantity. But something more like a power law might’ve been reasonable too. (In general I think it’s not crazy to guess that the standard error of your measurement is proportional to the size of the effect you’re trying to measure—in which case, if your uncertainty over the size of the effect follows a power law, then so would your uncertainty over the standard error.)
(I think that for something as clean as a well-set-up experiment with independent trials of a representative sample of the real world, you can estimate the standard error well, but I think the real world is sufficiently messy that this is rarely the case.)
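To make the first point concrete, here is a minimal sketch (in Python; the log-normal parameters are illustrative, not from the post) of what a log-normal subjective distribution over the noise’s standard deviation implies for the marginal noise distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Subjective model: measurement = true quality + N(0, sigma^2) noise,
# where sigma itself is uncertain and given a log-normal distribution.
# (The log-normal parameters here are illustrative, not from the post.)
n = 1_000_000
sigma = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # subjective draws of the SD
noise = rng.normal(0.0, sigma)                      # noise given each draw

# Marginally, the noise is a scale mixture of normals, which is
# fatter-tailed than any single normal: its excess kurtosis is positive.
z = noise / noise.std()
excess_kurtosis = (z ** 4).mean() - 3.0
print(excess_kurtosis > 0)  # True
```

By Jensen’s inequality the excess kurtosis of such a scale mixture is strictly positive, so the check above is not an artifact of the seed.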
In general I think it’s not crazy to guess that the standard error of your measurement is proportional to the size of the effect you’re trying to measure
Take a hierarchical model for effects. Each intervention $i$ has a true effect $\beta_i$, and all the $\beta_i$ are drawn from a common distribution $G$. Now for each intervention, we run an RCT and estimate $\hat{\beta}_i = \beta_i + \epsilon_i$, where $\epsilon_i$ is experimental noise.
By the CLT, $\epsilon_i \sim N(0, \sigma_i^2 / n_i)$, where $\sigma_i^2$ is the inherent sampling variance in your environment and $n_i$ is the sample size of your RCT. What you’re saying is that $\sigma_i^2$ has the same order of magnitude as the variance of $G$. But even if that’s true, the standard error shrinks as $1/\sqrt{n_i}$ as your RCT sample size grows (the variance as $1/n_i$), so the two should not be in the same OOM for reasonable values of $n_i$. I would have to do some simulations to confirm that, though.
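That simulation is easy to sketch. Here is one (in Python; the lognormal shape for $G$, the per-RCT sample size of 1,000, and setting each $\sigma_i$ equal to $G$'s standard deviation are all illustrative assumptions, chosen to grant the claim under test):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: true effects beta_i ~ G (heavy-tailed), and the
# per-subject sampling SD sigma_i set equal to sd(G) -- i.e. granting
# the claim that sigma_i^2 and Var(G) share an OOM. Numbers illustrative.
n_interventions = 10_000
beta = rng.lognormal(mean=0.0, sigma=1.5, size=n_interventions)
sigma = np.full(n_interventions, beta.std())

n_rct = 1_000                  # per-RCT sample size
se = sigma / np.sqrt(n_rct)    # standard error of beta_hat_i

# Even granting sigma_i ~ sd(G), the SE sits sqrt(n_rct) below sd(G):
oom_gap = np.log10(beta.std() / se[0])
print(round(oom_gap, 1))  # 1.5
```

With $n_i = 1{,}000$ the gap is $\log_{10}\sqrt{1000} = 1.5$ orders of magnitude by construction, whatever $G$ looks like.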
I also don’t think it’s likely to be true that $\sigma_i^2$ has the same OOM as the variance of $G$. The factors that cause sampling variance—randomness in how people respond to the intervention, randomness in who gets selected for a trial, etc.—seem roughly comparable across interventions. But the intervention qualities are not roughly comparable—we know that the best interventions are OOMs better than the average intervention. I don’t think we have any reason to believe that the noisiest interventions are OOMs noisier than the average intervention.
(I think that for something as clean as a well-set-up experiment with independent trials of a representative sample of the real world, you can estimate the standard error well, but I think the real world is sufficiently messy that this is rarely the case.)
I’m not sure what you mean by this; I think any collection of RCTs satisfies the setting I’ve laid out.
I think you’re assuming your conclusion here:
What if the noise is on the log scale?
The central limit theorem is exactly that $\sqrt{n_i}\,(\hat{\beta}_i - \beta_i) \to N(0, \sigma_i^2)$ in distribution, which implies what I said. The noise is not on the log scale, because of the CLT.
Now, if you transform your coefficient onto a log scale, then all bets are off. But that is not what is happening anywhere in this post. And it’s not really what happens in reality either; I don’t know why anyone would do it.
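A quick way to see the raw-scale convergence (a Python sketch; the exponential outcome distribution and the sample sizes are illustrative assumptions, chosen because the exponential is visibly non-normal):

```python
import numpy as np

rng = np.random.default_rng(0)

# CLT sketch: per-subject outcomes are exponential (skewed, not normal),
# with true mean beta = 1 and variance sigma^2 = 1. The exponential
# outcome model and the sample sizes are illustrative assumptions.
beta, sigma2 = 1.0, 1.0
n, reps = 500, 20_000
outcomes = rng.exponential(beta, size=(reps, n))

# sqrt(n) * (beta_hat - beta) should be approximately N(0, sigma^2),
# on the raw scale -- no log transform involved.
z = np.sqrt(n) * (outcomes.mean(axis=1) - beta)
print(abs(z.mean()) < 0.05, abs(z.var() - sigma2) < 0.05)  # True True
```

Even though each individual outcome is strongly skewed, the scaled estimation error matches the stated normal limit on the raw scale.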