Thanks Michael. Going through your options one by one.
Inform decisions about donations that are each in something like the $10–$5,000 range. Not an aim I had, but sure, why not.
Inform decisions about donations/grants that are each in something like the >$50,000 range. Rather than inform those directly, I'd aim to inform the kind of research that you can either do or buy with money to inform such a donation. $50,000 feels a little low for commissioning research to make a decision, though (could a $5k to $10k investment in a better version of this post make a $50k donation more than 10–20% better? Plausibly.)
That said, I’d be curious if any largish donations are changed as a result of this post, and why, and in particular why they didn’t defer to the LTF fund.
Inform decisions about which of these orgs (if any) to work for. Not really an aim for myself, but I'd be happy for people to read this post as part of their decisions. Also, 80,000 Hours exists.
Provide feedback to these orgs that causes them to improve. Sure, but not a primary aim.
Provide an accountability mechanism for these orgs that causes them to work harder or smarter so that they look better on such evaluations in future. No, not really.
Just see if this sort of evaluation can be done, learn more about how to do that, and share that meta-level info with the EA public. Yep.
[Something else]. Show the kind of thing that an organization like QURI can do! In particular, you can't easily do this kind of thing with software other than Foretold: Metaculus is great, but these questions would be too ambiguous for it; getting questions approved takes time (and, in the case of a tournament, money); and for this post I only needed my own predictions (not that you can't run a tournament on Foretold).
[Something else]. Learn more about the longtermist ecosystem myself.
[Something else]. This was sort of on the edges of this project, but for making large numbers of predictions one does need a pipeline, and improving that pipeline has been on my mind (and on Ozzie Gooen's). For instance, creating the 27 predictions one by one would have been kind of a pain, so instead I used a Google Docs script which feeds them to Foretold.
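For illustration, here is a minimal sketch of what such a pipeline could look like, in Python rather than a Google Docs script. The `measurableCreate` mutation name, its input fields, and the channel ID are assumptions about a Foretold-style GraphQL API, not its confirmed schema:

```python
# Hypothetical sketch of a bulk-prediction pipeline like the one described:
# read one question title per spreadsheet row and build GraphQL request
# bodies for a Foretold-style API. The mutation name and fields below are
# assumptions, not Foretold's actual schema.

import csv
import io
import json

FORETOLD_GRAPHQL_URL = "https://api.foretold.io/graphql"  # schema details below are assumed


def build_payloads(csv_text, channel_id):
    """Turn one CSV row per question into a list of GraphQL request bodies."""
    payloads = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        payloads.append({
            "query": (
                "mutation($input: MeasurableCreateInput!) {"
                " measurableCreate(input: $input) { id } }"
            ),
            "variables": {"input": {
                "name": row["title"],
                "channelId": channel_id,
            }},
        })
    return payloads


if __name__ == "__main__":
    sample = "title\nWill org X publish a report in 2021?\n"
    for payload in build_payloads(sample, "my-channel-id"):
        # Each payload would then be POSTed to FORETOLD_GRAPHQL_URL
        # with an auth token; printing instead of sending here.
        print(json.dumps(payload))
```

The point is only that, once questions live in a spreadsheet or document, batch-creating 27 of them becomes a loop rather than 27 manual form submissions.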
I also think that 4. and 5. are too strongly worded. To the extent that I'm providing feedback, I imagine it's more a) of the sanity-check variety, or b) about how a relatively sane person perceives these organizations. For instance, if I don't get pushback about it in the comments, I'll think that it's a good idea for the APPGFG to expand, but I doubt it's something that they themselves haven't thought about.
In an ideal world we'd have intense evaluations of all organizations, specific to all possible uses, done in communication styles relevant to all people.
Unfortunately this is an impossible amount of work, so we have to find some messy shortcuts that get much of the benefit at a decent cost.
I'm not sure how best to focus longtermist organization evaluations to maximize gains across a diversity of decision types. Fortunately, I think that whenever one makes an evaluation for one specific purpose (funding decisions), it winds up being relevant for other purposes too (career decisions, organizational decisions).
My primary interest at this point is in evaluations of the following:
How much total impact is an organization having, positive or negative?
How can such impact be improved?
How efficient is the organization (in terms of money and talent)?
How valuable is it to other groups or individuals to read / engage with the work of this organization? (Think Yelp or Amazon reviews)
My guess is that such investigations will help answer a wide assortment of different questions.
To echo what Nuño said, some of my interest in this specific task was in making a fairly general-purpose attempt. I think that increasingly substantial attempts are a pretty good bet, because a whole lot could go wrong (this work could upset some group or include falsehoods), and new ideas could be figured out along the way (particularly by commenters, such as those on this post).
In the longer term my preference isn’t for QURI/Nuño to be doing the majority of public evaluations of longtermist orgs, but instead for others to do most of this work. Perhaps this could be something of a standard blog post type, and/or there could be 1-2 small organizations dedicated to it. I think it really should be done independently from other large orgs (to be less biased and more isolated), so it probably wouldn’t make sense for this work to be done as part of a much bigger organization.
Also, I’d agree that <$1Mil funding decisions aren’t the main thing I’m interested in. I think that talent and larger allocations are much more exciting.
For example, perhaps it's realized that one small nonprofit's work is much more valuable than expected, so future donors wind up spending $200Mil on related work down the line. Or there could be systemic effects: for instance, new founders might be inspired by trends identified in the evaluations and create better nonprofits as a result.
+1, to both the questions and the answers.