It’s not that hard to see when a project was at risk of having large downsides
I strongly disagree. It’s often extremely hard to judge whether a project related to anthropogenic x-risks was ex-ante net-negative. For example, was the creation of OpenAI/Anthropic/CSET net-positive or net-negative (ex-ante)? How about any particular gain-of-function research effort, or the creation of any particular BSL-4 virology lab?
Given a past project that is related to anthropogenic x-risks or meta-EA, it can be extremely hard to evaluate the ex-ante potential harm that the project could have had by, for example:
Potentially drawing attention to info hazards (e.g. certain exciting approaches for developing AGI).
If a researcher believes they came up with an impressive insight, they will probably be biased towards publishing it, even if it may draw attention to potentially dangerous information. Their career capital, future compensation and status may be on the line.
Here’s Alexander Berger (co-CEO of Open Phil):
I think if you have the opposite perspective and think we live in a really vulnerable world — maybe an offense-biased world where it’s much easier to do great harm than to protect against it — I think that increasing attention to anthropogenic risks could be really dangerous in that world. Because I think not very many people, as we discussed, go around thinking about the vast future.
If one in every 1,000 people who go around thinking about the vast future decide, “Wow, I would really hate for there to be a vast future; I would like to end it,” and if it’s just 1,000 times easier to end it than to stop it from being ended, that could be a really, really dangerous recipe where again, everybody’s well intentioned, we’re raising attention to these risks that we should reduce, but the increasing salience of it could have been net negative.
Potentially “patching” a problem and preventing a non-catastrophic, highly-visible outcome that would have caused an astronomically beneficial “immune response”. Here’s Nick Bostrom (“lightly edited for readability”):
Small and medium scale catastrophe prevention? Also looks good. So global catastrophic risks falling short of existential risk. Again, very difficult to know the sign of that. Here we are bracketing leverage at all, even just knowing whether we would want more or less, if we could get it for free, it’s non-obvious. On the one hand, small-scale catastrophes might create an immune response that makes us better, puts in place better safeguards, and stuff like that, that could protect us from the big stuff. If we’re thinking about medium-scale catastrophes that could cause civilizational collapse, large by ordinary standards but only medium-scale in comparison to existential catastrophes, which are large in this context, again, it is not totally obvious what the sign of that is: there’s a lot more work to be done to try to figure that out. If recovery looks very likely, you might then have guesses as to whether the recovered civilization would be more likely to avoid existential catastrophe having gone through this experience or not.
Potentially causing decision makers to have a false sense of security.
For example, perhaps it’s not feasible to solve AI alignment in a competitive way without strong coordination, etcetera. But researchers are biased towards saying good things about their field, their colleagues and their (potential) employers.
Potentially accelerating progress in AI capabilities in a certain way.
Potentially intensifying the competition dynamics among AI labs / states.
Potentially decreasing the EV of the EA community by exacerbating bad incentives and conflicts of interest, and by reducing coordination.
For example, by creating impact markets.
Potentially causing accidental harm via outreach campaigns or regulation advocacy (e.g. by causing people to get a bad first impression of something important).
Potentially causing a catastrophic leak from a virology lab.
You wrote:
unless the early funders irrationally expect oraculars to buy up bad EV bets which paid off.
Depending on the implementation of the impact market, it may be rational to expect that many retro funders will buy the impact of ex-ante net-negative projects that ended up being beneficial. Especially if the impact market is decentralized and cannot be controlled by anyone, and if it allows people to profit from recruiting new retro funders who are not very careful. For more important arguments about this point, see the section “Mitigating the risk is hard” in our post.
As for “weak consensus”, we have Scott Alexander, Paul Christiano, and Eliezer Yudkowsky coming down on the side of “Yes, retrofunding is great”. I’m not sure how that could be seen as anything other than strong consensus of key thought leaders,
The statement “retrofunding is great” is very vague. AFAIK, none of the people you mentioned gave a blanket endorsement of all possible efforts to create an impact market (including a decentralized market that no one can control). There should be a consensus in EA about a specific potential intervention to create an impact market, before it is decided to carry out that intervention. Also, the EA community is large, so it’s wrong to claim that there is a “strong consensus of key thought leaders” for doing something risky because a few brilliant, high-status people wrote positive things about it (especially if they wrote those things before there was substantial discourse about the downside risks).
not to mention the many other people who’ve thought carefully about this and decided it is one of the most important interventions to improve the future.
Who are you referring to here?
This is prohibitively vague. How do you operationalize this exactly? Can you give examples of when EA has achieved a consensus analogous to what you desire in this situation?
If e.g. the Future Fund and Open Phil were to use it, wouldn’t that be a pretty strong signal, especially since they would want to derisk it pretty heavily, with months of dialogue and planning, before scaled-up usage? What are you looking for here that wouldn’t already happen as a matter of course? Building and scaling this up in concert with grantmakers, donors, and charities would involve a significant amount of downside-mitigation work: they are pretty risk-averse on average and will generally want at least interim solutions to at least some of the downside risks we, you, and others have identified (and probably ones not yet identified).
I am pretty happy with Avengers Assemble-ing some kind of group discussion on impact markets as a consensing vehicle (perhaps a virtual event on the EA Forum using Polis, maybe a pop-up event during EAG SF), if it will please you or meet some objective criteria you specify. I find this sort of thing generally desirable regardless (cost willing), but I additionally want to know what gets your thumbs up specifically, given my impression is that you want to stop people doing anything before achieving this consensus, whereas I view this downside work as something done concurrently alongside the long and arduous empirical work of building, which will provide spades of course-correcting feedback.
about a specific potential intervention to create an impact market, before it is decided to carry out that intervention
Decision-making by committee (my current impression of your ask) on specific product versions is not how things get built, especially early on, especially with multiple parties involved, and is a recipe for not getting things done. The space of decisions is way too high-dimensional, and things change based on feedback. Approaching a consensus early on about the important parts of the general theory of impact markets, such that robust net-positivity is agreed upon, seems much more tractable and important in comparison, as does generally keeping people working in the field coordinated and in communication as they iterate through different parameters of their visions.
I should have phrased differently: It’s not that hard to pick out highly risky projects retroactively, relative to identifying them prospectively. I also think that the reference class which is most worrying is genuinely not that hard to identify as strongly negative EV.
Impact markets don’t solve the problem of funders being able to fund harmful projects. But they don’t make it differentially worse (they empower funders generally, but I don’t expect you would argue that grantmakers are net negative, so this still comes out net-positive).
I would welcome attempts to cause the culture of big grantmakers to more reliably make sure the recipients stay focused on the major challenges, but that is a separate project.
The classes of problem you list are all important questions of what should be funded, and it would be great to have better models of the EV of funding them, but none of them are impact-market specific. It’s already true that the funder who is most enthusiastic about a project can fund it unilaterally, and that this will sometimes be EV-negative. People can already recruit new funders who are risk-tolerant.
We’re making grantmakers generally more powerful, and that’s not entirely free of negative effects[1], but it does seem very likely net-positive.[2]
I do think it makes sense to not rush into creating a decentralized unregulatable system on general principles of caution, as we certainly should watch the operation of a more controllable one for some time before moving towards that.
The community as a whole cannot come to consensus on each of the huge number of important decisions to be made; the bandwidth of common knowledge is far, far too low, and we are faced with too many choices for that to be a viable option. Having several of the most relevant people strongly on board is about as good a sign as you could expect currently. I’m open to more opinions coming in, and would be very interested in seeing you debate with people on the other side and try to double crux on this or get more people on board with your position, but turning this into a committee is going to stall the project.
And your cataloging of the downsides does seem useful; there are things we can adjust to minimize them.
There are some other effects, around the cultural impact of making money flows more legible, which seem possibly concerning, but I’m not super worried about negative EV projects being run.
It’s not that hard to pick out highly risky projects retroactively, relative to identifying them prospectively.
Do you mean that, if a project ends up being harmful, we have Bayesian evidence that it was ex-ante highly risky? If so, I agree. But that fact does not alleviate the distribution mismatch problem, which is caused by the prospect of a risky project ending up going well.
Impact markets don’t solve the problem of funders being able to fund harmful projects. But they don’t make it differentially worse (they empower funders generally, but I don’t expect you would argue that grantmakers are net negative, so this still comes out net-positive).
If the distribution mismatch problem is not mitigated (and it seems hard to mitigate), investors are incentivized to fund high-stakes projects while regarding potential harmful outcomes as if they were neutral. (Including in anthropogenic x-risks and meta-EA domains.) That is not the case with EA funders today.
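As a toy illustration of that incentive gap (the numbers, and the simplifying assumption that a certificate’s retro price just tracks the positive value realized, are mine, purely for illustration):

```python
# Toy "distribution mismatch" example: hypothetical numbers only.
# Each outcome: (probability, social value created, retro price the certificate fetches).
outcomes = [
    (0.10,  1_000_000, 1_000_000),  # goes very well: retro funders buy the impact
    (0.60,          0,         0),  # fizzles: the certificate is worthless
    (0.30, -2_000_000,         0),  # causes serious harm: the certificate is also just worthless
]
stake = 50_000  # what the investor pays the project up front

social_ev   = sum(p * value for p, value, _ in outcomes)          # -500,000
investor_ev = sum(p * price for p, _, price in outcomes) - stake  #  +50,000

print(f"social EV:   {social_ev:,.0f}")    # an altruistic funder declines
print(f"investor EV: {investor_ev:,.0f}")  # a profit-motivated investor funds it
```

The harmful branch drags the social EV far below zero, but on the investor’s side it is indistinguishable from a mere fizzle.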
There are some other effects, around the cultural impact of making money flows more legible, which seem possibly concerning, but I’m not super worried about negative EV projects being run.
I think this is a highly over-optimistic take about cranking up the profit-seeking lever in EA and the ability to mitigate the effects of Goodhart’s law. It seems that when humans have an opportunity to make a lot of money (without breaking laws or norms) at the expense of some altruistic values, they usually behave in a way that is aligned with their local incentives (while convincing themselves it’s also the altruistic thing to do).
I do think it makes sense to not rush into creating a decentralized unregulatable system on general principles of caution, as we certainly should watch the operation of a more controllable one for some time before moving towards that.
If you run a fully controlled (Web2) impact market for 6-12 months, and the market funds great projects/posts and there’s no sign of trouble, will you then launch a decentralized impact market that no one can control (in which people can sell the impact of recruiting additional retro funders, and the impact of establishing that very market)?
If the distribution mismatch problem is not mitigated (and it seems hard to mitigate), investors are incentivized to fund high-stakes projects while regarding potential harmful outcomes as if they were neutral.
I thought a bunch more about this, and I do think there is something here worth paying attention to.
I am not certain that the pool of projects in the category you’re worried about is notable enough to offset the benefits of impact markets, but we would incentivise those that exist, and that has a cost.
If we’re limited to accredited investors, as Scott proposed, we have some pretty strong mitigation options. In particular, we can let oraculars pay to mark projects as having been strongly net negative, and have this detract from the ability of those who funded that project to earn on their entire portfolio. Since accounts will be hard to generate and only available to accredited investors, generating a new account for each item is not an available option.
I think I can make some modifications to the Awesome Auto Auction to include this fairly simply. AAA also does not allow selling as an action, which removes the other risk of people dumping their money and provides a natural structure for limiting withdrawals (just cut off their automatic payments until the “debt” is repaid).
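Here’s a minimal sketch of how that portfolio-level penalty, plus the “cut off automatic payments until the debt is repaid” rule, could fit together. All names, numbers, and the proportional-penalty rule are illustrative assumptions on my part, not the actual AAA design:

```python
# Hypothetical sketch of a portfolio-level penalty for funders of flagged projects.
# Illustrative only; not the actual Awesome Auto Auction implementation.
from dataclasses import dataclass, field

@dataclass
class Investor:
    name: str
    debt: float = 0.0      # outstanding penalty from projects flagged as net negative
    received: float = 0.0  # payouts actually released to the investor so far

    def credit(self, amount: float) -> float:
        """Route an automatic payout: pay down the debt first, release the rest."""
        repay = min(self.debt, amount)
        self.debt -= repay
        released = amount - repay
        self.received += released
        return released

@dataclass
class Market:
    funders: dict = field(default_factory=dict)  # project -> list of (investor, stake)

    def fund(self, project: str, investor: Investor, stake: float) -> None:
        self.funders.setdefault(project, []).append((investor, stake))

    def flag_net_negative(self, project: str, penalty: float) -> None:
        """An oracular funder marks a project as strongly net negative;
        its funders absorb the penalty in proportion to their stakes."""
        stakes = self.funders.get(project, [])
        total = sum(s for _, s in stakes) or 1.0
        for investor, stake in stakes:
            investor.debt += penalty * stake / total

    def pay(self, investor: Investor, amount: float) -> float:
        """Automatic payout from any of the investor's certificates; withheld while debt remains."""
        return investor.credit(amount)

# Example: alice funded a flagged project, so her later payouts service the "debt" first.
market, alice = Market(), Investor("alice")
market.fund("risky-project", alice, stake=10_000)
market.flag_net_negative("risky-project", penalty=15_000)
print(market.pay(alice, 12_000))  # 0.0    (fully withheld)
print(market.pay(alice, 12_000))  # 9000.0 (3,000 clears the remaining debt)
```

Since accounts are tied to accredited investors and hard to generate, the debt follows the person rather than a throwaway account.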
Would this be sufficient mitigation? And if not, what might you still fear about this?
If you run a fully controlled (Web2) impact market for 6-12 months, and the market funds great projects/posts and there’s no sign of trouble, will you then launch a decentralized impact market that no one can control (in which people can sell the impact of recruiting additional retro funders, and the impact of establishing that very market)?
I don’t see much benefit to a Web3 one assuming we can do microtransactions on a Web2 one, so I’d be fine with either not doing the Web3 version or only doing it after several years of running a Web2 version without any of those restrictions and with nothing going badly wrong (retaining the option to restrict new markets for it at any time).
In particular, we can let oraculars pay to mark projects as having been strongly net negative, and have this detract from the ability of those who funded that project to earn on their entire portfolio.
I think this approach has the following problems:
Investors will still be risking only the total amount of money they invest in the market (or place as collateral), while their potential gain is unlimited (see the toy numbers below).
People tend to avoid doing things that directly financially harm other individuals. Therefore, I expect retro funders would usually not use their power to mark a project as “ex-ante net negative”, even if it was a free action and the project was clearly ex-ante net negative (let alone if the retro funders need to spend money on doing it; and if it’s very hard to judge whether the project was ex-ante net negative, which seems a much more common situation).
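To put toy numbers on the first problem above (hypothetical figures): even if oraculars apply the penalty, the investor’s worst case is capped by their stake plus whatever the market can withhold from their portfolio, while the best case is not capped, so a sufficiently high-upside risky bet can stay attractive.

```python
# Hypothetical figures: capped downside vs. uncapped upside under the penalty scheme.
p_win, payoff_if_win = 0.05, 3_000_000
stake                = 40_000
max_clawback         = 60_000  # the most the penalty can withhold from this portfolio

best_case  = payoff_if_win - stake                                       # +2,960,000
worst_case = -(stake + max_clawback)                                     #   -100,000
expected   = p_win * payoff_if_win - stake - (1 - p_win) * max_clawback

print(best_case, worst_case, round(expected))  # expectation still ~ +53,000
```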
Seems essentially fine. There’s a reason society converged to loss-limited companies being the right thing to do, even though there is unlimited gain and limited downside, and that’s that individuals tend to be far too risk averse. Exposing them to a risk to the rest of their portfolio should be more than sufficient to make this not a concern.
Might be a fair point, but remember, this is in the case where some project was predictably net negative and then actually was badly net negative. My guess is at least some funders would be willing to step in and disincentivise that kind of activity, and the threat of it would keep people off the worst projects.
There’s a reason society converged to loss-limited companies being the right thing to do, even though there is unlimited gain and limited downside, and that’s that individuals tend to be far too risk averse.
I think the reason that states tend to allow loss-limited companies is that it causes them to have larger GDP (and thus all the good/adaptive things that are caused by having larger GDP). But loss-limited companies may be a bad thing from an EA perspective, considering that such companies may be financially incentivized to act in net-negative ways (e.g. exacerbating x-risks), especially in situations where lawmakers/regulators are lagging behind.
Yes, and greater GDP maps fairly well to greater effectiveness of altruism. I think you’re focused on downside risks too strongly. They exist, and they are worth mitigating, but inaction due to fear of them will cause far more harm. Inaction due to a heckler’s veto is not a free outcome.
Companies not being loss-limited would not cause them to stop producing x-risks when the literal death of all their humans is an insufficient motivation to discourage them. It would reduce a bunch of other categories of harm, but we’ve converged to accepting that risk to avoid crippling risk aversion in the economy.