Have you considered A/B testing changes? As you note, looking at engagement numbers before/after isn’t capable of assigning causality.
A/B testing in general is great. For UI-related changes you generally want to make the assignment sticky per user, to reduce confusion and give users time to adapt to the changes. This does add statistical complexity, though, because one heavy user in an experimental treatment can have a large impact on aggregate statistics like “total number of comments per category”.
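To make both points concrete, here is a minimal Python sketch. It is not the Forum's actual A/B testing code; the function names, the experiment key, and the `(user_id, variant, comment_count)` event format are all assumptions for illustration. Hashing the user ID together with an experiment name gives a sticky assignment without storing any state, and summarizing per user (rather than only summing) shows how much a single heavy commenter can move the total.

```python
import hashlib
from collections import defaultdict
from statistics import mean, median


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant (hypothetical helper).

    The same (experiment, user_id) pair always hashes to the same variant,
    so a user sees one consistent UI for the whole experiment.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def summarize_by_user(events):
    """Aggregate comment counts per user, then summarize each variant.

    `events` is an iterable of (user_id, variant, comment_count) tuples.
    Users with no events at all do not appear here; a real analysis would
    need to count them as zeros.
    """
    per_user = defaultdict(lambda: defaultdict(int))
    for user_id, variant, count in events:
        per_user[variant][user_id] += count

    summary = {}
    for variant, counts in per_user.items():
        values = list(counts.values())
        summary[variant] = {
            "total": sum(values),            # sensitive to one heavy user
            "mean_per_user": mean(values),   # also pulled by outliers
            "median_per_user": median(values),  # robust to one heavy user
            "users": len(values),
        }
    return summary


if __name__ == "__main__":
    variant = assign_variant("user-123", "new-frontpage")
    print("user-123 is in:", variant)

    # Made-up numbers: one heavy user dominates the treatment total.
    events = [
        ("a", "treatment", 1),
        ("b", "treatment", 1),
        ("c", "treatment", 98),
        ("d", "control", 2),
        ("e", "control", 3),
        ("f", "control", 1),
    ]
    for name, stats in summarize_by_user(events).items():
        print(name, stats)
```

In the made-up data above the treatment total is far larger than the control total, but the per-user median is not, which is the kind of gap that tells you the total is being driven by one person. Treating the user (not the comment) as the unit of analysis is the usual way to handle this.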
Happy to talk more about this if you’d find it helpful; this is an area I used to work in.
I am generally a fan of A/B testing. We have some nice architecture for doing so, custom-written by @jimrandomh, which I think we under-use. We were quite tempted to A/B test a specific change here, but were not tempted by A/B testing the entire UI refactor. Let me explain:
Getting our UI substantially changed while keeping it consistent was a large project: it totaled 4.5k lines of code changed. Making all those changes while keeping both the old UI and the new UI working and internally consistent would have been a more challenging project. We would also have had to maintain the correct behavior for LessWrong, which shares our codebase. Maybe it still would have been worth doing? I have an opinion on the answer, but the thing I want to communicate is the tradeoffs.