Ofer (and Owen), I want to understand and summarize your cruxes one by one, so that I can pass your Ideological Turing Test well enough to regenerate the core of your perspective. Consider me your point person for communications.
Crux: Distribution Mismatch of Impact Markets & Anthropogenic X-Risk
If I understand one of the biggest planks of your perspective correctly, you believe that x-risk projects have a high-variance normal distribution of utility centered around 0, such that x-risk projects can often increase x-risk rather than decrease it. I have been concerned for a while that the x-risk movement may itself be increasing x-risk, so I am quite sympathetic to this claim, though I do believe some significant fraction of potential x-risk projects approach being robustly good. That said, I think we are basically in agreement that a large subset of conceivable x-risk projects would actually increase x-risk; it’s harder to be sure how large that share is in practice, given that people generally, if not always, avoid the obviously bad stuff.
It seems especially important to prevent the risk from materializing in the domains of anthropogenic x-risks and meta-EA.
The examples you are most concerned about are biosecurity and AI safety (as mentioned in a previous comment of yours), due to the potential for infohazardous posts on the EA Forum, as well as meta EA, mentioned above. You have therefore suggested that impact markets not deal with these causes, either early on (such as during our contest) or, presumably, indefinitely.
Let me offer at least one example set of particular submissions that may fall under these topics, and let me know what you think of them.
I was thinking it would be quite cool if Yudkowsky and Christiano submitted certificates for their respective posts, ‘List of Lethalities’ and ‘Where I agree and disagree with Eliezer’. These are valuable posts in my opinion, and they would help grow an impact marketplace.
My model of you would say either that:
1) funding those particular posts is net bad, or
2) funding those two posts in particular may be net good, but it sets a precedent that will cause there to be further counterfactual AI safety posts on EA Forum due to retroactive funding, which is net bad, or
3) posts on the EA Forum/LW/Alignment Forum being further incentivized would be net good (minus stuff such as infohazards, etc), but a more mature impact market at scale risks funding the next OpenAI or other such capabilities project, therefore it’s not worth retroactively funding forum posts if it risks causing that.
I am tentatively guessing your view is something at least subtly different from those rough disjunctions, though not too different.
Looking at our current submissions empirically, my sense is that the potentially riskiest certificate we have received is ‘The future of nuclear war’ by Alexei Turchin. The speculation in it could potentially provide new ideas to bad actors. I don’t know; I haven’t read or thought about this one in detail yet. For instance, core degassing could be a new x-risk, but it also seems highly unlikely. This certificate could also be the most valuable. My model of you says this certificate is net-negative. I would agree that it may be an example of the sort of situation where some people believe a project is a positive externality and some believe it’s a negative externality, but the distribution mismatch means it’s valued positively by a marketplace that can observe the presence of information but not its absence. Or maybe the market thinks riskier stuff may win the confidence game. ‘Variance is sexy’. This is a very provisional thought and not anything I would clearly endorse; I respect Alexei’s work quite highly!
After your comments saying it would be good to ban these topics, I was considering conceding that condition, since it doesn’t seem too problematic to do so for the contest. By and large I still think that, though again I would specifically quite like to see those two AI posts submitted if the authors want that.
I’m curious to know your evaluation of the following possible courses of action, particularly by what percentage each would reduce your concern relative to other issues:
- impact markets are isolated from x-risk topics for all time using magic, but are not isolated from funding meta EA, which could downstream affect x-risk
- impact markets are isolated from x-risk topics and from funding meta EA for all time using magic; they only fund object-level work such as global health and development
- we don’t involve x-risk topics in our marketplace for the rest of the year
- we don’t involve x-risk topics until there is a clear counterbalancing force to the distribution mismatch in the mechanism design, in a way that can be mathematically modelled (see the toy sketch after this list), which may be necessary if not sufficient for proving the mechanism design works
  - or until you, or agents you designate, are satisfied that a set of informal processes, norms, curation processes, etc. are achieving this for a centralized marketplace
  - though I would predict this does not address your crux that a centralized impact market may inspire, or devolve into, a simpler set of retro-funding equilibria that doesn’t use e.g. Attributed Impact, probably in conjunction with decentralization
  - I can start a comment thread for discussing this crux separately
- we do one of the above but allow the two AI posts as exceptions
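To make the “counterbalancing force” item slightly more concrete, here is a deliberately toy sketch. It is entirely my own illustration, with made-up numbers and a hypothetical bond mechanism; it is not Attributed Impact and not a proposed design. The point is only that a payout rule in which the harmful branch costs the issuer something can, in principle, pull the market’s valuation back toward the true expected value.

```python
# Toy model only: a made-up outcome distribution for a hypothetical certificate.
# Each entry is (realized net impact in arbitrary units, probability).
outcomes = {
    "very good": (+100, 0.25),
    "neutral":   (0,    0.50),
    "very bad":  (-100, 0.25),
}


def true_expected_value(outcomes):
    """Expected value that counts harm as harm."""
    return sum(u * p for u, p in outcomes.values())


def market_valuation(outcomes, bond=0.0):
    """Stylized certificate valuation.

    With bond=0 this is the status quo: a certificate of a harmful project is
    worth $0, same as a neutral one, so harmful branches contribute nothing.
    With bond>0, the issuer forfeits the bond when the project turns out
    harmful, which is one (hypothetical) way to push the valuation back toward
    the true expected value.
    """
    total = 0.0
    for u, p in outcomes.values():
        if u > 0:
            total += p * u      # retro funders reward realized benefit
        elif u < 0:
            total -= p * bond   # harm only "counts" via the forfeited bond
    return total


print(true_expected_value(outcomes))         # 0.0  -> not obviously worth doing
print(market_valuation(outcomes, bond=0))    # 25.0 -> looks attractive anyway
print(market_valuation(outcomes, bond=100))  # 0.0  -> mismatch cancelled in this toy case
```

In this toy case the bond is tuned so the correction is exact; in reality, estimating the harm branch (and collecting anything contingent on it) is the hard part, which is what I mean by wanting something that can be mathematically modelled.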
That list is just a rough mapping of potential actions; I have probably not characterized your position well enough to offer a full menu of actions you might like to see taken on this issue.
tl;dr: I’m basically curious 1) how much you think the risk is dominated by the distribution mismatch applying specifically to x-risk versus, say, global poverty, 2) on which timeframes it is most important to shape the cause scope of the market in light of that (now? at full scale? both?), and 3) whether banning x-risk topics from early impact markets (in ~2022) is a significant risk reducer by your lights.
(Meta note: I will drop in more links and quotes some time after publishing this.)
I think that most interventions that have a substantial chance to prevent an existential catastrophe also have a substantial chance to cause an existential catastrophe, such that it’s very hard to judge whether they are net-positive or net-negative (due to complex cluelessness dynamics that are caused by many known and unknown crucial considerations).
> My model of you would say either that:
> 1) funding those particular posts is net bad, or
> 2) funding those two posts in particular may be net good, but it sets a precedent that will cause there to be further counterfactual AI safety posts on EA Forum due to retroactive funding, which is net bad, or
> 3) posts on the EA Forum/LW/Alignment Forum being further incentivized would be net good (minus stuff such as infohazards, etc), but a more mature impact market at scale risks funding the next OpenAI or other such capabilities project, therefore it’s not worth retroactively funding forum posts if it risks causing that.
My best guess is that those particular two posts are net-positive (I haven’t read them entirely / at all). Of course, this does not imply that it’s net-positive to use these posts in a way that leads to the creation of an impact market.
In (3) you wrote “posts on the EA Forum/LW/Alignment Forum […] (minus stuff such as infohazards, etc)”. I think this description essentially assumes the problem away. Posts are merely information in a written form, so if you exclude all the posts that contain harmful information (i.e. info hazards), the remaining posts are by definition not net-negative. The hard part is to tell which posts are net-negative. (Or more generally, which interventions/projects are net-negative.)
> My model of you says this certificate is net-negative. I would agree that it may be an example of the sort of situation where some people believe a project is a positive externality and some believe it’s a negative externality, but the distribution mismatch means it’s valued positively by a marketplace that can observe the presence of information but not its absence. Or maybe the market thinks riskier stuff may win the confidence game. ‘Variance is sexy’. This is a very provisional thought and not anything I would clearly endorse;
The distribution mismatch problem is not caused by different people judging the EV differently. It would be relevant even if everyone in the world was in the same epistemic state. The problem is that if a project ends up being extremely harmful, its certificates end up being worth $0, same as if it ended up being neutral. Therefore, when market participants who follow their local financial incentives evaluate a project, they treat potential outcomes that are extremely harmful as if they were neutral. I’m happy to discuss this point further if you don’t agree with it. It’s the core argument in the OP, so I want to first reach an agreement about it before discussing possible courses of action.
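For what it’s worth, here is one stylized way to write that argument down; the notation and the proportionality assumption are mine, not anything from the OP. Let u be a project’s realized net impact, and suppose the certificate’s eventual worth is proportional to max(u, 0), i.e. a harmful outcome is priced the same as a neutral one.

```latex
% Stylized assumption (mine): certificate worth is proportional to max(u, 0),
% where u is the project's realized net impact.
\[
  \underbrace{\mathbb{E}\!\left[\max(u,0)\right]}_{\text{market's valuation}}
  \;=\;
  \underbrace{\mathbb{E}[u]}_{\text{true expected value}}
  \;+\;
  \underbrace{\mathbb{E}\!\left[\max(-u,0)\right]}_{\text{expected harm the \$0 floor discards}}
\]
```

Since the identity max(u, 0) = u + max(-u, 0) holds outcome by outcome, the gap is exactly the expected harm that gets priced as $0, and it persists no matter how closely market participants agree in their probability estimates, which I take to be your point.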