Thanks for the comment Jessica! This makes sense. I have a few thoughts about this:
More time for people to answer, and in particular to reflect, sounds like it could have been useful (though I wasn’t at the event, so I can’t comment on the tradeoffs here).
My impression is that the difficulty of the survey is mostly due to the inherent difficulty of the questions we were asked to elicit judgements about, and less due to the operationalization of the questions themselves (see the confusion about earlier iterations of the questions). The questions were hard either because they were substantively difficult and required a lot of information and reflection (e.g. what is the optimal growth rate for EA?), or because they were very conceptually difficult and counterintuitive (e.g. how much value do you assign to x relative to y, controlling for the value of x’s converting into y’s).
One possible response to this, which was mentioned in feedback, is that it could be valuable to pose these questions to dedicated working groups, who devote extensive amounts of time to deliberating on them. Fwiw, this sounds like a very useful (though very costly) initiative to me. It would also have the downside of limiting input to an even smaller subset of the community, so perhaps ideally one would want to pose these questions to a dedicated group, have them present their findings to the wider MCF audience, and then ask the MCF audience for their take after hearing the working group’s findings. Of course, this would take much more time from everyone, so it wouldn’t be costless.
Another possible response is to just try to elicit much simpler judgements. For example, rather than trying to get a quantitative estimate of “how many resources do you think each cause should get?”, we could just ask “Do you think x should get more/less?” I think the devil is in the details here, and it would work better for some questions than others: in some cases, merely knowing whether people think a cause should get more/less would not be action-guiding for decisionmakers, but in other cases it would. (We’re entirely dependent on what decisionmakers tell us they want to elicit here, since I see our role as designing questions to elicit the judgements we’re asked for, not deciding what judgements we should try to elicit.)
I would have preferred working groups, especially for the questions around the monetary value of talent, which seemed particularly hard to get a sense of.
If the survey had framed the same questions in multiple ways for higher reliability, or had some kind of consistency checking*, I would trust that respondents endorsed their numbers more. I’m not necessarily saying this is a good trade to make, as it would increase the length of the survey.
*e.g., asking separately in different parts of the survey about the impact of:
• Animal welfare $ / Global health $
• Global health $ / AI $
• Animal welfare $ / AI $
…and then checking if the responses are consistent across all sections.
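(To make the footnote concrete, here is a minimal sketch of what such a consistency check could look like. The ratio values and tolerance below are purely illustrative assumptions, not anything elicited in the survey: the idea is just that the directly elicited Animal welfare $ / AI $ ratio should roughly equal the product of the other two.)

```python
import math

# Hypothetical elicited "impact per $" ratios from one respondent (illustrative values only).
aw_per_gh = 3.0   # Animal welfare $ / Global health $
gh_per_ai = 0.5   # Global health $ / AI $
aw_per_ai = 2.0   # Animal welfare $ / AI $ (elicited directly, in a separate section)

def consistency_gap(r_ab: float, r_bc: float, r_ac: float) -> float:
    """Gap (on a log scale) between the directly elicited A/C ratio and the
    ratio implied by chaining A/B * B/C; 0 means perfectly consistent."""
    implied_ac = r_ab * r_bc
    return abs(math.log(r_ac) - math.log(implied_ac))

gap = consistency_gap(aw_per_gh, gh_per_ai, aw_per_ai)
TOLERANCE = math.log(1.5)  # flag answers that disagree by more than 1.5x (arbitrary threshold)
print(f"log-scale gap: {gap:.2f} ->", "consistent" if gap <= TOLERANCE else "inconsistent")
```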
Yeah, I definitely agree that asking multiple questions per object of interest to assess reliability would be good. But I also agree that this would lengthen a survey that people already thought was too long (which would likely reduce response quality in itself). So I think this would only be possible if people wanted us to prioritise gathering more data about a smaller number of questions.
Fwiw, for the value-of-hires questions, we have at least seen these questions posed in multiple different ways over the years (e.g. here), and they continually produce very high valuations. My guess is that, if those high valuations are misleading, this is driven more by factors like social desirability than by difficulty or conceptual confusion. There are some other questions which have been asked in different ways across years (we made a few changes to the wording this year to improve clarity, but aimed to keep the wording the same where possible), though I’ve not formally assessed how those results differ.