First, you increase the pressure on the “justification generator” to mask various black boxes by generating arguments supporting their conclusions.
Third, there's a risk that people get convinced by bad arguments: their “justification generator” produced a weak but legible explanation, you managed to refute it, and they updated. The problem comes if updating this way means discarding the output of the neural network, which was much smarter than the reasoning they accepted.
On the other hand, if someone in EA is making decisions about high-stakes interventions while their judgement is being influenced by a subconscious optimization for things like status and power, I think it's probably beneficial to subject their “justification generator” to a lot of pressure (in the hope that this will lead them, and onlookers, to end up making the best decisions from an EA perspective).