I think that most interventions that have a substantial chance to prevent an existential catastrophe also have a substantial chance to cause an existential catastrophe, such that it’s very hard to judge whether they are net-positive or net-negative (due to complex cluelessness dynamics that are caused by many known and unknown crucial considerations).
My model of you would say one of the following:
(1) funding those particular posts is net-bad, or
(2) funding those two posts in particular may be net-good, but it sets a precedent that will lead to further counterfactual AI safety posts on the EA Forum due to retroactive funding, which is net-bad, or
(3) posts on the EA Forum/LW/Alignment Forum being further incentivized would be net-good (minus stuff such as infohazards, etc), but a more mature impact market at scale risks funding the next OpenAI or another such capabilities project, so it’s not worth retroactively funding forum posts if it risks causing that.
My best guess is that those two particular posts are net-positive (though I haven’t read them in full, or at all). Of course, this does not imply that it’s net-positive to use these posts in a way that leads to the creation of an impact market.
In (3) you wrote “posts on the EA Forum/LW/Alignment Forum […] (minus stuff such as infohazards, etc)”. I think this description essentially assumes the problem away. Posts are merely information in written form, so if you exclude all the posts that contain harmful information (i.e. infohazards), the remaining posts are by definition not net-negative. The hard part is telling which posts are net-negative. (Or, more generally, which interventions/projects are net-negative.)
My model of you says this certificate is net-negative. I would agree that it may be an example of the sort of situation where some people believe a project has positive externalities and others believe it has negative externalities, but the distribution mismatch means it’s valued positively by a marketplace that can observe the presence of information but not its absence. Or maybe the market thinks riskier stuff may win the confidence game. ‘Variance is sexy.’ This is a very provisional thought and not anything I would clearly endorse.
The distribution mismatch problem is not caused by different people judging the EV differently; it would be relevant even if everyone in the world were in the same epistemic state. The problem is that if a project ends up being extremely harmful, its certificates end up being worth $0, the same as if it had ended up being neutral. Therefore, when market participants who follow their local financial incentives evaluate a project, they treat potential outcomes that are extremely harmful as if they were neutral. I’m happy to discuss this point further if you don’t agree with it. It’s the core argument in the OP, so I want to first reach agreement about it before discussing possible courses of action.
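To make that asymmetry concrete, here is a minimal sketch with made-up numbers (the probabilities and dollar values are purely illustrative, not drawn from any actual project), comparing a project’s expected social value with the expected value of its impact certificate when harmful outcomes are floored at $0:

```python
# Minimal illustration of the distribution mismatch (all numbers are hypothetical).
# Each outcome is (probability, social value in $); a negative value means net harm.
outcomes = [
    (0.6, 1_000_000),    # the project helps
    (0.4, -5_000_000),   # the project turns out to be extremely harmful
]

# Expected social value counts harmful outcomes at their full negative value.
expected_social_value = sum(p * v for p, v in outcomes)               # -1,400,000

# A certificate can never be worth less than $0, so a buyer following local
# financial incentives treats every harmful outcome as if it were neutral.
expected_certificate_value = sum(p * max(v, 0) for p, v in outcomes)  # 600,000

print(expected_social_value, expected_certificate_value)
```

Under these (made-up) numbers the project is net-negative in expectation, yet the certificate still gets a positive expected price, which is exactly the incentive gap described above.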