Dawn’s (Denis’s) Intellectual Turing Test Red-Teaming Impact Markets
[Edit: Before you read this, note that I failed. See the comment below.]
I want to check how well I understand Ofer’s position against impact markets. The “Imagined Ofer” below is how I imagine Ofer to respond (minus language – I’m not trying to imitate his writing style though our styles seem similar to me). I would like to ask the real Ofer to correct me wherever I’m misunderstanding his true position.
I currently favor using the language of prize contests to explain impact markets unless I talk to someone intimately familiar with for-profit startups. People seem to understand it more easily that way.
My model of Ofer is informed by (at least) these posts/comment threads.
Dawn: I’m doing these prize contests now where I encourage people to help each other (monetarily and otherwise) to produce awesome work to reduce x-risks, and finally I reward everyone who participated in the best of the projects. I’m writing software to facilitate this. I will only reward them in proportion to the gains from moral trade that they’ve generated, and I’ll use my estimate of a project’s ex ante EV as a ceiling for my overall evaluation of it.
This has all sorts of benefits! It’s basically a wide-open regrantor program where the quasi-regrantors (the investors) absorb most of the risk. It scales grantmaking up and down: grantmakers have ~10x less work and can thus scale their operations up by 10x, and the investors can be anyone around the world, so they can draw on their existing networks and consider many more, much smaller investments, or investments that require very niche knowledge or access. Many more ideas will get tried, and it’ll be easier for people to start projects even when they still lack personal contact with the right grantmakers.
Imagined Ofer: That seems very dangerous to me. What if someone else also offers a reward and also encourages people to help each other with the projects but does not apply your complicated ex ante EV ceiling? Someone may create a flashy but extremely risky project and attract a lot of investors for it.
Dawn: But they can do that already? All sorts of science prizes, all the other EA-related prizes, Bountied Rationality, new prizes they promote on Twitter, etc.
Imagined Ofer: Okay, but you’re building software to make it easier, so presumably you’ll thereby increase the number of people who offer such prizes and the number of people who attract investments in advance, because the user experience and the networking with investors are smoother and because they’re encouraged to do so.
Dawn: That’s true. We should make our software relatively unattractive to such prize offerers and their audiences, for example by curating the projects on it such that only the ones that are deemed to be robustly positive in impact are displayed (something I proposed from the start, in Aug. 2021). I could put together a team of experts for this.
Imagined Ofer: That’s not enough. What if you or your panel of experts overlook that a project was actually net-negative in ex ante EV, for example because it has already matured and happened to turn out well? Your assessment of the ex ante EV would be predictably biased upward. In fact, people could run a lot of risky projects and then only ever submit the ones that worked out fine.
Dawn: Well, we can try really hard… Pay bounties for spotting projects that were negative in ex ante EV but slipped through; set up a network of auditors; make it really easy and effortless to hold compounding short positions on projects that manage their −1x leverage automatically; recruit firms like Hindenburg Research (or individuals with similar missions) to short projects and publish exposés on them; require issuers to post collateral; set up mechanisms whereby it becomes unlikely that there’ll be other prizes with any but a small market share (such as the “pot”); maybe even require preregistration of projects to avoid the tricks you mention; etc. (All the various fixes I propose in Toward Impact Markets.)
Imagined Ofer: Those are only unreliable patches for a big fundamental problem. None of them is going to be enough, not even in combination. They are slow and incomplete. Ex ante negative projects can slip through the cracks or remain undetected for long enough to cause harm in this world or a likely counterfactual world.
Dawn: Okay, so one slips through, attracts a lot of investment, gets big, maybe even manages to fool us into awarding it prize money. It or new projects in the same reference class have some positive per-year probability of being found out due to all the safety mechanisms. Eventually a short-seller or an exposé-bounty poster will spot them and make a lot of money for doing so. We will react and make it super-duper clear going forward that we will not reward projects in that reference class ever again. Anyone who wants to get investments will need to make the case that their project is not in that reference class.
Imagined Ofer: But by that time the harm is done, even if only to a counterfactual world. Next time the harm will be done to the factual world. Besides, regardless of how safe you actually make the system, what’s important is that there can always be issuers and investors who believe (perhaps wrongly) that they can get their risky project retro-funded. You can’t prevent that no matter how safe you make the system.
Dawn: But that seems overly risk averse to me because prospective funders can also make mistakes, and current prizes – including prizes in EA – are nowhere near as safe. Once our system is safer than any existing method, the bad actors will prefer the existing methods.
Imagined Ofer: The existing methods are much safer. Prospective funding is as safe as it gets, and current prizes have a time window of months or so, so by the time the prizes are awarded, the projects that they are awarded to are still very young, so the prizes are awarded on the basis of something that is still very close to ex ante EV.
Dawn: But retroactive funders can decide when to award prizes. In fact, we have gone with a month in our experiment. But admittedly, in the end I imagine that cycles of a year or two are more realistic. That is still not that much more. (See this draft FAQ for some calculations. Retro funders will pay out prizes of up to 1000% in the success case, but outside the success case investors will lose all or most of their principal. They are hits-based investors, so their riskless benchmark profit is probably much higher than 5% per year. They’ll probably not want to stay in certificates for more than a few years even at 1000% return in the success case.)
Imagined Ofer: A lot more can happen in a year or two than in a month. EA, for example, looked very different in 2013 compared to 2015, but it looked about the same in January vs. February 2015. But more importantly, you write about tying the windfall clauses of AGI companies to retro funding with enormous budgets – budgets that surely offset even the 20 years it may take to get to that point and the very low probability of getting there.
Dawn: The plan I wrote about has these windfalls reward only projects that were previously already rewarded by our regular retro funders, no more.
Imagined Ofer: But what keeps a random, unaligned AGI company from just using the mechanism to reward anyone they like?
Dawn: True. Nothing. Let’s keep this idea private. I can unpublish my EA Forum post too, but maybe that’s the audience that should know about it if anyone should. As an additional safeguard against uncontrolled speculation, how about we require people to always select one or several actual present prize rounds when they submit a project?
Imagined Ofer: That might help, but people could just churn out small projects and select whatever prize happens to be offered at the time, while in actuality hoping that one of these prizes will eventually be mentioned in a windfall clause, or that their project will otherwise be retro-funded through a windfall clause or some other future funder who ignores the setting.
Dawn: Okay, but consider how far down the rabbit hole we’ve gone now: We have a platform that is moderated; we have relatively short cycles for the prize contest (currently just one month); we explicitly offer prizes for exposés; we limit our prizes to work that is, by dint of its format, unlikely to be very harmful; we even started with EA Forum posts, a forum that has its own highly qualified moderation team. Further, we want to institute more mechanisms – besides exposés – that make it easy to short certificates so as to encourage people to red-team them; mechanisms to retain control of the market norms even if many new retro funders enter; even stricter moderation; etc. We’re even considering requiring preregistration, mandatory selection of present prize rounds (even though it runs counter to how I feel impact markets should work), and very narrow targets set by retro funders (like my list of research questions in our present contest). Compare that to other EA prize contests. Meanwhile, the status quo is that anyone with some money and a Twitter following can run a prize contest, and anyone can make a contract with a rich friend to secure a seed investment that they’ll repay if they win. All of our countless safeguards should make it vastly more attractive for unaligned retro funders and unaligned project founders to do anything other than use our platform. All that remains is that maybe we’re spreading the meme that you can seed-invest in potential prize winners, but that’s also something that is already happening around the world with countless science prizes. What more can we do?
Imagined Ofer: This is not an accusation – we’re all human – but money and the sunk-cost fallacy around time already invested corrupt. For all I know this could be a motte-and-bailey type of situation: The moment a big crypto funder offers you a $1m grant, you might throw caution to the wind and write a wide-open, ungated blockchain implementation of an impact market.
Dawn: I hope I’ve made clear in my 20,000+ words of writing on impact market safety – writing that was unprompted by your comments (other than the first one in 2021) – that my personal prioritization has long rested on robustness over mere positive EV. I’ve just quit my well-paid ETG job as a software engineer in Switzerland to work on this. If I were in it for the money (more than what I need for my financial safety), I wouldn’t be. Our organization is also set up with a very general purview so that we can pivot easily. So if I should ever start work on a more open version of the currently fully moderated, centralized implementation, it’ll be because I’ve come to believe that it’s more robustly positive than I currently think it is. (Or it may well be possible to find a synthesis of permissionlessness and curation.) The only things that can convince me otherwise are evidence and arguments.
Imagined Ofer: I think that most interventions that have a substantial chance to prevent an existential catastrophe also have a substantial chance to cause an existential catastrophe, such that it’s very hard to judge whether they are net-positive or net-negative (due to complex cluelessness dynamics that are caused by many known and unknown crucial considerations). So the typical EA Forum post with sufficient leverage over our future to make a difference at all is about equally likely to increase or to decrease x-risk.
Dawn: I find that to be an unusual opinion. CEA and others try to encourage people to post on the EA Forum rather than discourage them. That was also the point of the CEA-run EA Forum contest. Personally, I also find it unintuitive that that should be the case: For any given post, I try to think of pathways along which it could be beneficial and detrimental. Usually there are few detrimental pathways, and if there are any, there are strong social norms against malice, and government institutions such as the police, standing in the way of pursuing them. A few posts come to mind that are rare, unusual exceptions to this theme, but it’s been several years since I read one of those. Complex cluelessness also doesn’t seem to make a difference here because it applies equally to any prospective funding, to prizes after one month, and to prizes after one year. Do you think that writing on high-leverage topics such as x-risks should generally be discouraged rather than encouraged on the EA Forum?
Imagined Ofer: Even if you create a very tightly controlled impact market that is safer than the average EA prize contest, you are still creating a culture and a meme around retroactive funding. You could inspire someone to post on Twitter: “The current impact markets are too curated. I’m offering a $10m retro prize for dumping 500 tons of iron sulfate into the ocean to solve climate change.” If someone posted this now, no one would take them seriously. If you create an impact market with tens of millions of dollars flowing through it and many market actors, it will become believable to some rogue players that this payout is likely real.
I do not endorse the text written by “Imagined Ofer” here. Rather than describing all the differences between that text and what I would really say, I’ve now published this reply to your first comment.