Yup, binomial.
The respondents in each treatment were shown a message and asked how compelling they thought it was. The control group was shown no message.
Yeah; the plots are the predicted values for those given a particular treatment, and the Average Treatment Effect is the difference from the control.
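For concreteness, here is a minimal sketch of how an ATE like that can be read off a fitted model. This assumes a brms fit object `fit`, a data frame `survey`, and a `treatment` column with "control" and "message_A" levels; none of these names come from the original analysis.

```r
# Sketch only: predict everyone under a given message and under control,
# then take the posterior of the average difference.
library(brms)

d_msg  <- transform(survey, treatment = "message_A")   # hypothetical level name
d_ctrl <- transform(survey, treatment = "control")

p_msg  <- posterior_epred(fit, newdata = d_msg)        # draws x respondents
p_ctrl <- posterior_epred(fit, newdata = d_ctrl)

ate_draws <- rowMeans(p_msg - p_ctrl)                  # one ATE per posterior draw
quantile(ate_draws, c(0.025, 0.5, 0.975))
```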
I did not include every control used in the provided questionnaire. There was a mix of demographic, attitudinal, and behavioral questions asked in the survey that I also used. These controls, particularly previous donations, were important for decreasing variance.
I used a multilevel model to estimate the effects among those with and without a bachelor’s degree. So, the bachelor’s estimate borrows power from those without a degree, reducing problems with overfitting.
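As a rough sketch of the kind of partial pooling being described (assuming a brms/Stan setup; the outcome, data frame, and column names below are placeholders, not the actual model):

```r
# Sketch only: the treatment effect varies by education group under a shared
# hyperprior, so the bachelor's and non-bachelor's estimates shrink toward
# each other (partial pooling).
library(brms)

fit <- brm(
  outcome ~ treatment + (1 + treatment | has_bachelors),  # placeholder names
  data   = survey,                                        # placeholder data frame
  family = bernoulli(),                                   # binary outcome assumed
  prior  = set_prior("normal(0, 1)", class = "b"),
  chains = 4, cores = 4, seed = 1
)
```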
These models were fit in Stan, which handles multilevel models well. Convergence was assessed with Gelman-Rubin statistics.
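A sketch of that check using the posterior package (assuming a reasonably recent brms fit named `fit`; nothing here is taken from the original analysis):

```r
# Pull the Gelman-Rubin / R-hat diagnostic for every parameter and flag
# anything that looks off.
library(posterior)

draws <- as_draws_df(fit)
conv  <- summarise_draws(draws, rhat)
print(conv)

subset(conv, rhat > 1.01)   # a common rule of thumb for "look closer"
```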
Ah, I guess that’s better than no control, and presumably paying attention to a paragraph of text doesn’t make someone substantially more or less generous. Did you fit a bunch of models with different predictors and test for a sufficient improvement of fit with each? Might do to be wary of overfitting in that regard, maybe… though since those aren’t the focal parameters, Bayes tends to be pretty robust there, imo, so long as you used sensible priors.
“I used a multilevel model to estimate the effects among those with and without a bachelor’s degree. So, the bachelor’s estimate borrows power from those without a degree, reducing problems with overfitting.”
If I’m understanding correctly, you had a hyperprior on the effect of education level? With just two options? IDK that that would help you much (if you had more levels, e.g. HS, BA/BS, MS, PhD, it might, but I’d try to preserve the ordering there, myself).
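If it helps, one way to keep that ordering with more levels is a monotonic effect. This is purely illustrative (assuming brms; it is not what the analysis did, and all names are made up):

```r
# Treat education as an ordered factor and let mo() estimate a monotonic
# effect across the ordered categories.
library(brms)

survey$education <- factor(survey$education,
                           levels = c("HS", "BA", "MS", "PhD"), ordered = TRUE)

fit_ord <- brm(
  outcome ~ treatment * mo(education),   # placeholder names
  data   = survey,
  family = bernoulli(),
  chains = 4, cores = 4
)
```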
“These models were fit in Stan, which handles multilevel models well. Convergence was assessed with Gelman-Rubin statistics.”
Stan’s great, but certainly not magic or perfect, and though idk them personally I’m sure its authors would strongly advocate paranoia about its output. So you got convergence with multiple (2?) chains from random (hopefully) starting values? R-hats were all ≈1? That’s good! Did all the other cheap diagnostics turn up OK (e.g. trace plots, autocorrelation times/ESS, marginal histograms, quick within-chain metrics, etc.)?
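For reference, most of those cheap checks are only a few lines (assuming bayesplot and a brms fit; the parameter name is illustrative — brms prefixes population-level effects with "b_"):

```r
library(bayesplot)

post <- as.array(fit)                   # iterations x chains x parameters
mcmc_trace(post, pars = "b_treatment")  # trace plots
mcmc_hist(post, pars = "b_treatment")   # marginal histograms
mcmc_rhat(rhat(fit))                    # R-hat overview across parameters
mcmc_neff(neff_ratio(fit))              # effective-sample-size ratios
```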
No; I did not fit multiple models. Lasso regression was used to fit a propensity model using the predictors.
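A minimal sketch of that kind of lasso propensity model (assuming glmnet; the covariate names and the treatment indicator `treated` are invented for illustration):

```r
library(glmnet)

# Design matrix of pre-treatment covariates (names are placeholders).
X <- model.matrix(~ age + income + prior_donations + attention_check,
                  data = survey)[, -1]   # drop the intercept column

cvfit      <- cv.glmnet(X, survey$treated, family = "binomial", alpha = 1)  # alpha = 1: lasso
propensity <- predict(cvfit, newx = X, s = "lambda.min", type = "response")
```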
Using bachelor’s vs. non-bachelor’s has advantages in interpretability, so I think this was the right move for my purposes.
I did not spend an exorbitant amount of time investigating diagnostics, for the same reason I used a proprietary package that has been built for running these tests at a production level and has been thoroughly code reviewed. I don’t think it’s worth the time to construct an overly customized analysis.
Ah, gotcha. But re: code review, even the most beautifully constructed chains can fail, and how you specify your model can easily cause things to go kabloom even if the machine’s doing everything exactly how it’s supposed to. And it only takes a few minutes to drag your log files into something like Tracer and do some basic peace-of-mind checks (and others, e.g. examining bivariate posterior distributions to assess non-identifiability w.r.t. your demographic params). More sophisticated diagnostics are scattered across a few programs but don’t take too long to run either (unless you have e.g. hundreds or thousands of chains, like in marginal likelihood estimation with stepping stones… a friend’s actually coming out with a program soon, BONSAI, that automates a lot of that grunt work, which might be worth looking out for!). :]
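The bivariate-posterior check, at least, is quick to do straight from the fit (a sketch, assuming bayesplot and illustrative parameter names):

```r
library(bayesplot)

# Strong ridges or banana shapes in these panels suggest the two parameters
# aren't well identified separately.
mcmc_pairs(as.array(fit), pars = c("b_treatment", "b_has_bachelorsTRUE"))
```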
(on phone at gym with shit wifi so can’t provide links/refs atm, sorry!)
Do you have any good textbooks or educational resources to learn these kinds of techniques?
Sure! Though unfortunately most of the stuff comes from scattered lectures, workshops, discussions, book chapters, seminars, papers, etc. But for intro multilevel Bayesian regression in R/Stan I’d say John Kruschke’s “Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan” and Richard McElreath’s “Statistical Rethinking: A Bayesian Course with Examples in R and Stan” would be really solid (Richard also has his course lectures up on YouTube if you prefer that, though I found his book super readable, so much so that when I took the class with him a few years back I skipped most of his lectures since the room was really hot. But don’t let that dissuade you from watching them; he’s a great guy/speaker and quite fun and funny!).
Purely in terms of building my own intuitions/understanding, though, I’ve found little more helpful than just looking up the relevant algorithms and implementing the damn things from scratch (to talk of reinventing square wheels above lol… though ofc you’d use the far superior underlying code others have written for your actual analysis).
Sounds interesting. Would love to take a look when you get a chance to provide the links.