So, in our toy model with an R-square of 0.9, an estimate which is 1SD above the mean estimate puts the expected value at 0.9SD above the mean expected value.
I think there’s a confusion here about what the different distributions are. Normally when thinking about regression I think of having a prior over cost-effectiveness of the intervention at hand, and a distribution representing model uncertainty, which tells you the likelihood of having got your model output, given varying true values for the parameter. If those are the distributions you have, and they’re both normal with the same variance, then regression would end up centring on the mid-point (and so be linear with the number of SDs up you are).
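To make the equal-variance case concrete, here is a minimal sketch of the standard normal-normal update. The function name `posterior_mean` and the example numbers are illustrative assumptions, not anything taken from the original model.

```python
def posterior_mean(prior_mean, prior_var, estimate, error_var):
    """Posterior mean of the true value, given a normal prior and normal model error."""
    # The weight on the estimate is the prior's share of the total variance.
    weight = prior_var / (prior_var + error_var)
    return prior_mean + weight * (estimate - prior_mean)


# Equal variances (both 1.0 here, an arbitrary choice): the posterior mean
# sits at the mid-point between the prior mean and the estimate.
for estimate in (1.0, 2.0, 3.0):
    print(estimate, posterior_mean(0.0, 1.0, estimate, 1.0))
```

The adjustment is the same fraction of the distance from the prior mean however far out the estimate is, which is the sense in which the regression is linear.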
I think that the distributions you are looking at, however, are the prior, and a prior over the distribution of the estimate numbers. Given this, the amount of regression is not linear with the number of standard deviations out. I think rather it goes up super-linearly.
Working with the perspective you are bringing in terms of the different distributions could be a useful angle on the problem. It’s not obvious to me it’s better than the normal approach to regression, though, mostly because it seems harder to give an inside view of correlation than of possible model error.
In the multivariate-normal case, the two approaches are exactly equivalent: if you know the marginals (unconditional true effectiveness and unconditional estimate value), and R^2, then you know the entire shape of the distribution (and hence the distribution of the true mean given the estimated mean).
A model in which the estimate is bivariate normal with R^2=0.9 to the ground truth corresponds to, if my stats are right, X~N(0, 0.9), E~N(0, 0.1), Y=X+E (where X is the ground truth, E the error, and Y the estimate; the second arguments are variances; this is true up to an affine transformation). As such, it follows from e.g. this theorem cited on Wikipedia that the actual mean scales linearly with the measured mean, although when both are measured in standard deviations the coefficient is not quite what Gregory said (it’s R, not R^2).
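A quick way to check the R versus R^2 point is to work the numbers for X~N(0, 0.9), E~N(0, 0.1), Y=X+E. This is only a sketch: the variable names, the sample size and the seed are my own choices, and the simulation is a sanity check rather than a proof.

```python
import math
import random

var_x, var_e = 0.9, 0.1          # variances of ground truth X and error E
var_y = var_x + var_e            # X and E are independent, so variances add
cov_xy = var_x                   # Cov(X, X + E) = Var(X)

r = cov_xy / math.sqrt(var_x * var_y)   # correlation between X and Y
raw_slope = cov_xy / var_y              # E[X | Y = y] = raw_slope * y
# Slope when X and Y are each measured in their own standard deviations.
std_slope = raw_slope * math.sqrt(var_y) / math.sqrt(var_x)

print("R^2 =", round(r ** 2, 4))
print("slope in raw units =", round(raw_slope, 4))
print("slope in SD units =", round(std_slope, 4), "and R =", round(r, 4))

# Monte Carlo sanity check of the raw-unit slope (seed and n are arbitrary).
rng = random.Random(0)
n = 100_000
sum_xy = sum_yy = 0.0
for _ in range(n):
    x = rng.gauss(0.0, math.sqrt(var_x))
    y = x + rng.gauss(0.0, math.sqrt(var_e))
    sum_xy += x * y
    sum_yy += y * y
simulated_slope = sum_xy / sum_yy   # least-squares slope through the origin
print("simulated raw slope =", round(simulated_slope, 3))
```

In raw units the slope equals R^2 here only because Y has unit variance; once both variables are expressed in their own SDs, an estimate 1SD above the mean corresponds to an expected true value R (about 0.95) SDs above its mean.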
Thanks Ben, you’re exactly right. I’d convinced myself of the contrary with a spurious geometric argument.