On Q2

Overall, my independent impression is that more/most/all EA research orgs should run such surveys, and that it might be worth individual EA researchers experimenting with doing so as well. I also expect I'll run another survey of this kind next year. This is partly due to the potential upsides, but also partly due to the seemingly low costs, as I think these surveys should usually take little time to run and reflect on. More details and caveats follow.

Indications of other people's views

Rethink's post about their survey says:

Two of our founding values at Rethink Priorities are transparency and impact assessment. Here we present the results of our stated intention to annually run a formal survey to discover if one of our target audiences, decision-makers and donors in the areas which we investigate, has read our work and if it has influenced their decision-making. Due to the small sample of 47 respondents, the disproportionate importance of some of these respondents, and the ability to highlight comments from only those who opted to share their responses publicly, the precise results should not be taken too seriously. It is also very important to note that this is just one of the many ways we are assessing our impact. Nevertheless, we will present the overall results and the results by cause area.

Max Daniel commented on that post:

(Though note that impact could also be evaluated without a survey of this kind.)
On the first page of their currently running annual impact survey, 80,000 Hours writes:

They help us understand whether we're doing any good, and if so what part of our work is actually helping you.

That means we can focus on doing more of the things that are having an impact, and deprioritise those that aren't valuable, or are even doing harm.
My views before running my survey
My views before running my survey were similar to my current view, though even more tentative and even less fleshed out. My basic reasoning was as follows:
First, I think that getting clear feedback on how well one is doing, and how much one is progressing, tends to be somewhat hard in general, but especially when it comes to:
Research
Actually improving the world compared to the counterfactual
Rather than, e.g., getting students' test scores up, meeting an organisation's KPIs, or publishing a certain number of papers

(I also think this applies especially to relatively big-picture/abstract research, rather than applied research, and to longtermism. This was relevant to my case, but isn't central to the following points.)
Second, I think some of the best metrics by which to judge research are whether people:

are bothering to pay attention to it

think it's interesting

think it's high-quality/rigorous/well-reasoned

think it addresses important topics

think it provides important insights

think they've actually changed their beliefs, decisions, or plans based on that research

etc.
I think this data is most useful if these people have relevant expertise, are in positions to make especially relevant and important decisions, etc. But anyone can at least provide input on things like how well-written or well-reasoned some work seems to have been. And whoever the respondents are, whether the research influenced them probably provides at least weak evidence regarding whether the research influenced some other set of people (or whether it could, if that set of people were to read it).
Third, impact surveys are one way to gather data on these metrics. Such surveys aren't the only way to do that, and these metrics aren't the only ones that matter. But I expect it to tend to be useful to gather more data than people would by default, and to gather data from a more diverse set of sources (each with their own, different limitations).

Fourth, a lot of the data I'd gotten was from people actively reaching out to me, unprompted and non-anonymously. I expect this data to be biased towards positive feedback, because:
people who like my work are more likely to reach out to me
a lack of anonymity may bias people towards being friendly / avoiding being "rude" / avoiding hurting my feelings.
Surveys face similar sampling and response biases, but perhaps to a smaller extent, because:
people are at least prompted to participate (though they still choose at that point to opt in or out)
respondents are anonymous.
With my survey in particular, I wanted to get additional inputs into my thinking about:
whether EA-aligned research and/or writing is my comparative advantage (as I'm also actively considering a range of alternative pathways)

which topics, methodologies, etc. within research and/or writing are my comparative advantage

specific things I could improve about my research and/or writing (e.g., topic choice, how rigorous vs rapid-fire my approach should be, how concise I should be)
How I've updated my views based on further thought

Potential relevance of one's theory of change
I'd guess that key components of Rethink Priorities and 80,000 Hours' theories of change involve relatively directly influencing key decisions that are (a) made by people outside their organisations, and (b) not just about further research. This could include things like career and donation decisions.

There may be many research orgs for which that is not the case. For example, some orgs may view their research as being almost entirely intended to lay the groundwork for further research done by themselves or by others. (I expect Rethink and 80k also have this as one goal for their research, but that for them this goal isn't as dominant.) Orgs in this category may include MIRI and most academic institutes (including GPI and FHI).
If this is true, it might mean that orgs like Rethink and 80k have an unusually large and hard-to-pin-down key audience for their work. Perhaps many other orgs can be satisfied simply with things like:
seeing how many citations their papers get, and how the citing papers build on theirs
getting a sense of their reputation among the other few dozen relevant researchers at a conference for their field
Potential relevance of how diverse one's topic choices are

Similarly, if an org/individual writes only on a handful of relatively narrow areas, it may be easier for them to identify a small set of particularly relevant people and get their feedback without running a survey. In contrast, if an org/individual's writings span many areas, it may be more valuable to publicly post a survey in fora where their writings are read.

Potential relevance of one's speed of output

Perhaps if an org/individual produces something like 1-5 outputs per year, it makes sense to just solicit input on individual pieces. In contrast, a larger amount of output per unit of time might increase the value of a survey gathering views on all of this output, including on which pieces were most widely read, seemed most useful, etc.
Potential reputational relevance of "EA vs not" and "academic vs not"

Perhaps outside of EA, and perhaps in academia, running this sort of survey would seem weird and somehow tarnish one's reputation? (I have no real reason for believing this; it just seems plausible.)
However...
I think all of those points except the last one would merely somewhat reduce how useful surveys are, rather than making them useless. It still seems to me that surveys would often provide relevant and useful data, and data with different limitations to the data from other sources (which is useful because it means findings that come up consistently are more likely to accurately reflect reality).
And I think surveys could be made, promoted, analysed, and reflected on:

in 2 hours if one really wants to go fast

in fewer than 10 hours in most cases
Exceptions would be cases where one constructs a particularly large survey, gets text responses from a particularly large sample, or wants to reflect particularly rigorously/extensively on the results. E.g., I expect the process for 80,000 Hours' impact survey this year will end up having taken substantially more than 10 hours.
So it seems to me that the expected value of a research org running such a survey will tend to outweigh the costs, given that it:
will probably usually provide at least slightly useful info
will probably have a nontrivial chance of providing very useful info
will probably not take much time
I'm unsure if this is true for individual researchers, as they'll tend to have less output and write on a smaller set of topics. But I do think it was worthwhile for me to run my survey. (Though note that I've written unusually many outputs and written on many different areas. This is in turn partly because I've been writing posts rather than papers.)
How I've updated my views based on the survey

I spent ~1 hour 10 minutes creating my survey; writing a post and comment to promote it and explain the rationale behind it; publishing that to the EA Forum and LessWrong; and later also publishing shortform comments promoting the survey. Replicating these steps next year would likely take closer to 20 minutes.
I spent ~2 hours analysing and reflecting on the results (a minimal sketch of what this sort of analysis can look like is included below). I expect I could do this about twice as fast next year, though I may also get more responses, which would cause the reflection process to take longer.
I spent around 5 hours writing up my reflections publicly, as well as this post and comment. I think I was inefficient in how I did this. But in any case, other orgs/researchers could skip this step, or do a much smaller version of it. (The main reasons I did this step the way I did were that I'm interested in the question of whether and how others should run similar surveys, and that I think seeing my data, reflections, and thoughts might be useful for others.)
I think I benefitted noticeably, but not incredibly much, from the survey data. For details, see my reflections.
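To illustrate how lightweight the analysis step can be, here is a minimal Python sketch of the kind of tallying involved. This is just an illustrative sketch, not what I actually ran: the file name and column names (survey_responses.csv, usefulness_1_to_5, outputs_read) are hypothetical, and it assumes responses have been exported as a CSV, as tools like Google Forms allow.

```python
import csv
from collections import Counter
from statistics import mean

# Hypothetical file and column names -- adjust to whatever your survey
# tool (e.g. Google Forms) exports.
with open("survey_responses.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(f"{len(rows)} responses")

# Average a 1-5 "how useful was this work?" rating, skipping blank cells.
ratings = [int(r["usefulness_1_to_5"]) for r in rows if r.get("usefulness_1_to_5")]
if ratings:
    print(f"Mean usefulness: {mean(ratings):.1f} (n={len(ratings)})")

# Tally which outputs respondents said they'd read. Multi-select answers
# are typically exported as a single comma-separated cell.
read_counts = Counter(
    title.strip()
    for r in rows
    for title in r.get("outputs_read", "").split(",")
    if title.strip()
)
for title, n in read_counts.most_common():
    print(f"{n:3d}  {title}")
```

Most of the analysis time goes into reading and reflecting on free-text responses rather than computing tallies like these, so the quantitative part stays cheap even for a somewhat larger survey.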
[Not meant to express an overall view.] I don't think you mention the time of the respondents as a cost of these surveys, but I think it can be one of the main costs. There's also a risk of survey fatigue if EA researchers all double down on surveys.

Strong upvote for two good points that, in retrospect, I feel should've been obvious to me!
In light of those points, as well as what I mentioned above, my new, quickly adjusted bottom-line view is that:

People considering running these surveys should take into account the cost and risk you mention.
I probably still think most EA research organisations should run such a survey at least once.
In many cases, it may make the most sense to just send the survey to some particular group of people, or to post it somewhere more tailored to one's target audience than the EA Forum as a whole. This would reduce the risk of survey fatigue somewhat, in that not all these surveys would be publicised to basically all EAs.
In many cases, it may make sense for the survey to be even shorter than mine.
In many cases, it may make sense to run the survey only once, rather than something like annually.
Probably no/very few individual researchers who are working at organisations that are themselves running surveys should run their own, relatively publicly advertised individual surveys (even if it's at a different time from the org's survey).

This is because those individuals' surveys would probably provide relatively little marginal value, while still having roughly the same time costs and survey fatigue risk.
But maybe this doesn't hold if the org only does a survey once, and the researcher is considering running a survey more than a year later.

And maybe it doesn't hold for surveys sent out in a more targeted manner.

Even among individual researchers who work independently, or whose org isn't running surveys, probably relatively few should run their own, relatively publicly advertised individual surveys.
The exceptions may tend to be those who wrote a large number of outputs, on a wide range of topics, for relatively broad audiences. (For the reasons alluded to in my parent comment.)
I could definitely imagine shifting my views on this again, though.
This all seems reasonable to me, though I haven't thought much about my overall take.

I think the details matter a lot for "Even among individual researchers who work independently, or whose org isn't running surveys, probably relatively few should run their own, relatively publicly advertised individual surveys".

A lot of people might get a lot of the value from a fairly small number of responses, which would minimise costs and negative externalities. I even think it's often possible to close a survey after a certain number of responses.

A counterargument is that the people who respond earliest might be unrepresentative. But for a lot of purposes, it's not obvious to me that you need a representative sample. "Among the people who are making the most use of my research, how is it useful?" can be pretty informative on its own.
A lot of people might get a lot of the value from a fairly small number of responses, which would minimise costs and negative externalities.
Agreed.
This sort of thing is part of why I wrote "relatively publicly advertised", and added "And maybe it doesn't hold for surveys sent out in a more targeted manner." But good point that someone could run a relatively publicly advertised survey and then just close it after a small-ish number of responses; I hadn't considered that option.