Our solutions to at least remove incentives like that (but not to additionally penalize them) are in the Solutions section of the article that Ofer linked.
Will those solutions work?
Do you have control over who can become a retro funder after the market is launched? To what extent will the retro funders understand or care about the complicated definition of Attributed Impact? And will they be aware of, and know how to account for, complicated considerations like: “if the certificate’s description is not very specific and leaves the door open to risky, net-negative, high-impact activities then we should take this fact into account when evaluating the ex ante impact”?
Could we end up with retro funders who use a much simpler criterion, e.g. just whether they “like” the impact? (Which is how altruistic retro funders are described in the OP.)
Have you resolved the unilateralist’s curse? To what extent have you consulted with the EA community about creating an impact market?
I can’t look into the future, so the most viable approach seems to me to be the one outlined in Toward Impact Markets. My current take:
We first spent a couple months (or about nine months) thinking about impact markets purely theoretically to assess whether they’re desirable at all. I’ve considered this done as of late March or so.
Then we start working on small-scale experiments, such as buying impact in EA Forum comments and soon hopefully buying impact in EA Forum posts in the context of a prize contest. The EA Forum has its own moderation team, so if someone posts something in the context of the contest that is harmful or likely to be harmful, we have two layers of protection against it: First the moderators can intervene, and then we and the retro funder (who seems really smart) can still catch and not buy any posts that still got published even though we consider them potentially net harmful.
We’ll learn from these experiments. If it turns out that the system incentivizes harmful actions, we can try to mitigate them or discontinue the whole project.
If we opt to mitigate them, we’ll run further iterations of the contest (and maybe other contests that seem similarly safe and contained to us) to test our solutions.
Finally we’ll be in a better position to answer your question about whether the solutions work than we are now when we haven’t tested them at all.
That seems safer to me even than abandoning the project altogether, because other groups are also working on retro funding and impact certificates, and I haven’t heard them talk much about safety in that context. Not to mention the opportunity costs. (If we come to the conclusion that impact markets are bad, however, we can also pivot to pure impact markets safety research, as it were.)
Are you really worried about people posting harmful Forum posts due to the market’s influence or only about OpenAI-level projects years down the line? But we’ll definitely ramp this all up very gradually to catch catastrophes when they’re still on the level of someone writing a post about panspermia without considering the s-risks it might cause. (Hurts me to think of that but the actual impact on the world is probably very limited.)
If we want to make progress on impact markets, I think the time has come for carefully controlled, small-scale actual experiments.
In AI safety a strong case can be made for more up-front theoretical thought because catastrophe is overdetermined – can occur through many disjunctive channels – and is in many cases final. But with impact markets we operate well within the space where we can assume that all the other actors are human, with human intuitions and restraints and human law above them. I think this relatively increases the opportunity costs of purely theoretical thought compared to the safety benefits. I don’t know to what extent, though. Exactly how careful we should be, i.e. exactly where we should strike the balance, is something we constantly think about.
Someone who surveyed many of the existing impact markets projects even noted that we have an “extreme focus on risks.”
Also note that a lot of the bad incentives we worry about already exist. The question is not so much whether impact markets might, in certain edge cases, also bear these risks but whether they’re going to exacerbate them or create them in fields where they didn’t previously exist. E.g., lots of bad stuff can be monetized, or at least it’s plausible enough that it can be monetized to attract investors. When that’s possible, it’s about as bad as IMs can ever get. (Very bad, admittedly.) And charities have monetary incentives to lie about negative study results on their intervention. That’s also not a reason to be generally skeptical of the nonprofit format because there are also those who actually shut down or pivot when they find out that their intervention doesn’t work.
In fact, I did just that (first pivot, later shut down) when, around 2010–2015, my own charity turned out to be nonoptimal. (That’s not even a euphemism; we were probably on par with Village Reach in cost-effectiveness but not scale. Village Reach was temporarily among the GW top charities.)
Do you have control over who can become a retro funder after the market is launched?
For the time being, “the market” is probably going to be a web2 platform that doesn’t exist yet with about the power of a spreadsheet on steroids. We can take it offline any time. The idea, however, is out there (has been for a long time, minus our thoughts on risk mitigation), so others can replicate it without our permission. Someone can simply decide to make tradeable SIBs, and we have a much more risky market on our hands than the one we’re aiming for because SIBs (always, I think) prescribe direct normativity.
So quite regardless of our marketplace, anyone can already become a retro funder. We want to be very careful with who we recruit as retro funders, though. So, for example, we’ll focus on winning over major EA funders for the job.
To what extent will the retro funders understand or care about the complicated definition of Attributed Impact?
Here’s a very highly gisted version that we’ve written for this purpose. Honestly, I would be surprised if our market (or prize contest) caused people to write more harmful forum posts than would’ve been written anyway. Then again receiving retro funding might signal-boost a post, so it’s something that we need to be careful with anyway. I remember very few forum posts, though, that I considered likely net harmful, and most were probably inconsequential. (Some might’ve also been harmful for reasons that would’ve been very hard to predict or that, even if predicted, were so unlikely that it still seemed valuable in expectation to publish the post.)
And will they be aware of, and know how to account for, complicated considerations like: “if the certificate’s description is not very specific and leaves the door open to risky, net-negative, high-impact activities then we should take this fact into account when evaluating the ex ante impact”?
(In case someone is confused: I think the idea here is similar to some forms of p-hacking. You write a vague description of what you’ve done; then you try all sorts of things that sort of fit into that vague description including super harmful ones; finally, when one of them succeeds and is not as super harmful as the others, you destroy all evidence of having had anything to do with the harmful attempts and only sell the successful one at a profit to the retro funder. E.g., you write a bunch of blog posts on stuff that is infohazardous if you’re wrong; publish them under different pseudonyms; wait for comments to come in that tell you whether you’re wrong; and then submit to the contest only the post that turned out well, with a description such as “I’ll write a thing.”)
I’m planning to read all submissions, so I hope I’ll catch potentially scammy ones. We haven’t assembled an evaluation committee yet. (Do you want to be on it? It’s one of my to-dos to recruit this group.) I think I’ll often err on the side of being lenient with issuers though, because vagueness is probably in most cases not going to be due to them trying to trick the retro funders but due to lacking experience with the completely new format.
But I think that for the foreseeable future and during our experiments the projects will be so small on average that very few will require monetary investments. The sorts of “investments” they’ll get will be rather in the form of proofreading, advice, designing graphics, etc. So to a large part the projects would’ve been possible anyway; the contest will just catalyze them. Most of the value will be in the value of information and only secondarily in that catalyzation.
We’re also interested in buying shares in exposés of certificates where Forum authors cheated in some fashion.
Could we end up with retro funders who use a much simpler criterion, e.g. just whether they “like” the impact? (Which is how altruistic retro funders are described in the OP.)
I intended “like” as an intuitive shorthand for “Has sufficiently high Attributed Impact, falls into a focus area of the retro funder, is within the budget of the retro funder, is legally accessible to the retro funder, etc.” So something stricter rather than looser than Attributed Impact alone.
If a simpler criterion also does the job, that’ll be great. You’ll know about the feedback effects that I’ve baked into the Attributed Impact definition. It’s crucial for the early retro funders to uphold Attributed Impact for those to get going. Once that cycle has started, we’ll be in a much safer position. And eventually we can also implement the “pot” to further cement the commitment of the market to Attributed Impact.
To what extent have you consulted with the EA community about creating an impact market?
I’ve been a very active member of the community for 8 years and was a grantmaker at my own charity for 3–4 years before that. I worked for EA orgs, was a board member of an EA org, started and was part of local communities, have been advising managers of EA orgs, etc., so I consider myself part of the community and most of my friends are EAs. If anything I’m oversharing when it comes to my EA interests. So I and my cofounders have discussed impact markets with maybe hundreds of other community members at length for what must be hundreds of hours at this point.
Or if you’re looking for trusted names (for good reason): I’ve coauthored IM content with Owen Cotton-Barratt, have received very helpful feedback from Nicole Ross of CEA, which she amalgamated from the feedback of many CEA staff (though her own contributions were brilliant of course), discussed it with Nonlinear, with our FTX regrantor and friends, with Ben Todd, Greg Colbourn, Justin Shovelain, and many more. In many cases I explicitly asked them to red-team the concept. This is not to say that they all endorsed all of my ideas! I usually asked for red-teaming and got red-teaming, and that’s it.
Dony has been very active in talking to countless people about IMs – at Bay Area EA meetups, at EA Global, in video calls, etc. He’s probably talked to as many EAs as I have in a shorter time span! Matt, too, has done his share of networking even though he’s only working part time on it while bootstrapping another startup!
And, as you know, we’ve always tried to lay out all of our thinking in EA Forum posts to attract further valuable feedback such as yours!
What unilateralist’s curse do you mean?