david_reinstein comments on Make RCTs cheaper: smaller treatment, bigger control groups

david_reinstein 28 Feb 2023 13:18 UTC
2 points
1 ∶ 0
The claim about “problems caused by unbalanced samples” doesn’t seem coherent to me; I’m not sure what they are talking about.

If the underlying variance is different between the treatment and the control group:
- That might justify a larger sample for the group with larger variance
- But I would expect the variance to be larger for the treatment group in many/most relevant cases
- Overall, there will still tend to be some efficiency advantage to having a larger sample from the less costly group, generally the control group
- Mario Reutter 4 Mar 2023 11:32 UTC
  3 points
  0 ∶ 0
  Unbalanced samples are not a problem per se. You can run into a problem of representation/generalization for the smaller sample, but this argument is independent of balancing and only has to do with small sample sizes.
  @david_reinstein made an excellent point about heteroscedasticity / variance. To factor this into your original post: You want to optimize the cost-effectiveness of the precision of your group-level difference score. For a given budget, this is achieved by minimizing the standard error (SE) of the difference, which combines the SEs of the two group-level estimates; each of these is just the standard deviation (SD) divided by the square root of the respective number of observations. So your term would expand to:
  Control-to-treat-ratio = sqrt(treatment_cost/control_cost) * control_SD/treatment_SD.
  The problem, in practice, is that you usually know the costs a priori but not the SDs. If variances are not equal, however, I would agree with @david_reinstein that the treatment group will more likely show greater variance on your outcome variable (if the control group has more variance, I would rather reconsider the choice of the outcome variable).
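  As a sketch of where the control-to-treat ratio stated above comes from, assume independent groups, a fixed total budget B, and per-participant costs c_T and c_C. These symbols and the Lagrange multiplier λ are introduced here for illustration and do not appear in the original post:

  ```latex
  \begin{align*}
  \text{minimize}\quad & \mathrm{SE}_{\text{diff}}^2 = \frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C}
  \quad\text{subject to}\quad c_T n_T + c_C n_C = B \\
  \text{first-order conditions:}\quad & \frac{\sigma_T^2}{n_T^2} = \lambda c_T, \qquad \frac{\sigma_C^2}{n_C^2} = \lambda c_C \\
  \Rightarrow\quad & \frac{n_C}{n_T} = \sqrt{\frac{c_T}{c_C}} \cdot \frac{\sigma_C}{\sigma_T}
  \end{align*}
  ```

  With equal SDs the last line reduces to the square root of the cost ratio alone.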
  If you want to read more about the concept of precision and its relation to statistical power (also cf. the paper that @Karthik Tadepalli cited), we just put together a preprint here that is supposed to double as a teaching resource: https://doi.org/10.31234/osf.io/m8c4k (introduction and discussion will suffice since the middle part focusses on biological/neuroscientific measurements that have vastly different properties than, e.g., questionnaire data).
  Here is the glossary that is mentioned in the paper: https://osf.io/2wjc4
  And here is the associated Twitter post with some digest about the most important insights: https://twitter.com/bioDGPs_DGPA/status/1616014732254756865
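
The control-to-treat ratio from the reply above can be tried out numerically. The following is only a minimal sketch: the function names, the fixed-budget framing, and the example costs are assumptions made for illustration, and sample sizes are treated as continuous rather than rounded to whole participants.

```python
import math


def control_to_treat_ratio(treatment_cost, control_cost,
                           treatment_sd=1.0, control_sd=1.0):
    """Ratio n_control / n_treatment from the formula in the reply:
    sqrt(treatment_cost / control_cost) * control_SD / treatment_SD."""
    return math.sqrt(treatment_cost / control_cost) * control_sd / treatment_sd


def allocate(budget, treatment_cost, control_cost,
             treatment_sd=1.0, control_sd=1.0):
    """Split a fixed budget so that the group sizes follow the ratio above.

    Returns (n_treatment, n_control) as floats; the budget constraint is
    treatment_cost * n_treatment + control_cost * n_control = budget.
    """
    ratio = control_to_treat_ratio(treatment_cost, control_cost,
                                   treatment_sd, control_sd)
    n_treat = budget / (treatment_cost + control_cost * ratio)
    n_control = ratio * n_treat
    return n_treat, n_control


def se_of_difference(n_treat, n_control, treatment_sd=1.0, control_sd=1.0):
    """Standard error of the difference between two independent group means."""
    return math.sqrt(treatment_sd ** 2 / n_treat + control_sd ** 2 / n_control)


if __name__ == "__main__":
    # Hypothetical numbers: treatment costs 4 units per participant,
    # control costs 1 unit, total budget 600 units, equal SDs.
    n_t, n_c = allocate(budget=600, treatment_cost=4, control_cost=1)
    n_balanced = 600 / (4 + 1)  # equal group sizes under the same budget
    print("optimal:", n_t, n_c, se_of_difference(n_t, n_c))
    print("balanced:", n_balanced, n_balanced,
          se_of_difference(n_balanced, n_balanced))
```

Passing estimated SDs (for example from a pilot study) as `treatment_sd` and `control_sd` shifts the allocation toward the group with the larger variance, in line with the heteroscedasticity point above.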