Very quickly: I feel like it’s useful to share that I did this survey and found it very hard, and a lot of other people did too. In particular, it did feel pretty rushed for such difficult questions that we didn’t necessarily have a fully informed pre-existing take on. OP does mention this, but I wanted to stress that for people reading this post.
I still think it has a lot of useful information and is directionally very informative. I might get a chance to write up more thoughts here, but I am not sure I will be able to. I mostly wanted to give a quick additional flag :)
I had a similar sense of feeling underprepared and rushed while taking the survey and think my input would have been better with more time and a different setting. At the same time I can see that it could have been hard to get the same group of people to answer without these constraints.
For the monetary value of talent estimates, I’m especially cautious about putting much weight on them, as I haven’t seen much discussion of such estimates and coming up with numbers in minutes is hard.
Rather than being taken at face value, the numbers may be more useful for illustrating directional thinking at a specific moment in time.
Thanks for the comment, Jessica! This makes sense. I have a few thoughts about this:
More time for people to answer, and in particular to reflect, sounds like it could have been useful (though I wasn’t at the event, so I can’t comment on the tradeoffs here).
My impression is that the difficulty of the survey is mostly due to the inherent difficulty of the questions we were asked to elicit judgements about, either because the questions were substantively difficult and required a lot of information and reflection (e.g. what is the optimal growth rate for EA?), because they’re very conceptually difficult or counterintuitive (e.g. how much value do you assign to x relative to y, controlling for the value of x’s converting into y’s?), or both, and less because of the operationalization of the questions themselves (see the confusion about earlier iterations of the questions).
One possible response to this, which was mentioned in feedback, is that it could be valuable to pose these questions to dedicated working groups who devote extensive amounts of time to deliberating on them. Fwiw, this sounds like a very useful (though very costly) initiative to me. It would also have the downside of limiting input to an even smaller subset of the community, so perhaps ideally one would pose these questions to a dedicated group, have them present their findings to the wider MCF audience, and then ask the MCF audience for their take after hearing the working group’s findings. Of course, this would take much more time from everyone, so it wouldn’t be costless.
Another possible response is to just try to elicit much simpler judgements. For example, rather than trying to get a quantitative estimate of “how many resources do you think each cause should get?”, we could just ask “Do you think x should get more or less?” I think the devil is in the details here, and it would work better for some questions than others: in some cases, merely knowing whether people think a cause should get more or less would not be action-guiding for decisionmakers, but in other cases it would. (We’re entirely dependent on what decisionmakers tell us they want to elicit here, since I see our role as designing questions to elicit the judgements we’re asked for, not deciding what judgements we should try to elicit.)
If the survey had framed the same questions in multiple ways for higher reliability, or had some kind of consistency checking,* I would trust that respondents endorsed their numbers more. I’m not necessarily saying this is a good trade to make, as it would increase the length of the survey.
*e.g., asking separately in different parts of the survey about the impact of:
• Animal welfare $ / Global health $
• Global health $ / AI $
• Animal welfare $ / AI $
…and then checking if the responses are consistent across all sections.
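A minimal sketch of what such a consistency check could look like, assuming hypothetical ratio responses (the numbers, variable names, and tolerance below are illustrative, not from the survey):

```python
import math

# Hypothetical pairwise ratio judgements from one respondent
# (values are illustrative only, not actual survey responses).
aw_per_gh = 2.0   # value of $1 to animal welfare / $1 to global health
gh_per_ai = 0.5   # value of $1 to global health / $1 to AI
aw_per_ai = 1.2   # value of $1 to animal welfare / $1 to AI (asked directly)

# If the judgements are internally consistent, the chained ratio
# (AW/GH) * (GH/AI) should roughly equal the directly elicited AW/AI.
implied_aw_per_ai = aw_per_gh * gh_per_ai

# Compare on a log scale so over- and under-statements count symmetrically.
log_discrepancy = abs(math.log(implied_aw_per_ai) - math.log(aw_per_ai))

TOLERANCE = math.log(1.5)  # flag discrepancies larger than a factor of 1.5
if log_discrepancy > TOLERANCE:
    print(f"Inconsistent: implied AW/AI = {implied_aw_per_ai:.2f}, "
          f"stated AW/AI = {aw_per_ai:.2f}")
else:
    print("Responses are roughly consistent.")
```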
Yeah, I definitely agree that asking multiple questions per object of interest to assess reliability would be good. But I also agree that this would lengthen a survey that people already thought was too long (which would likely reduce response quality in itself). So I think this would only be possible if people wanted us to prioritise gathering more data about a smaller number of questions.
Fwiw, for the value of hires questions, we have at least seen these questions posed in multiple different ways over the years (e.g. here), and they continually produce very high valuations. My guess is that, if those high valuations are misleading, this is driven more by factors like social desirability than by difficulty or conceptual confusion. There are some other questions which have been asked in different ways across years (we made a few changes to the wording this year to improve clarity, but aimed to keep it the same where possible), but I’ve not formally assessed how those results differ.
I would have preferred working groups, particularly for the questions around the monetary value of talent, which seemed especially hard to get a sense of.
Could you flag which questions you felt most comfortable with? Or least comfortable with? Whichever is easier :)