I think Leif Wenar’s “Open Letter to Young EAs” has significant flaws, but it also has a lot going for it, and I would seriously recommend that people who want to think about the ideal shape of EA read it.
I went through the letter making annotations about the bits I thought were good or bad. If you want to see my annotated version, you can do that here. If you want to be able to comment, let me know and I’ll quite likely be happy to grant you permission (but didn’t want to set it to “anyone with the link can comment” for fear of it getting overwhelmed).
As with ~all criticisms of EA, this open letter doesn’t have any concrete description of what would be better than EA. Like just once, I would like to see a criticism say, “You shouldn’t donate to GiveWell top charities, instead you should donate to X, and here is my cost-effectiveness analysis.”
The only proposal I saw was (paraphrased) “EA should be about getting teenagers excited to be effectively altruistic.” Ok, the movement-building arm of EA already does that. What is your proposal for what those teenagers should then actually do?
I mean, it kind of has the proposal that they each need to work that out for themselves. (I think this is mistaken, and it’s not the place I found the letter valuable.)
Most possible goals for AI systems are concerned with process as well as outcomes.
People talking about possible AI goals sometimes seem to assume something like “most goals are basically about outcomes, not how you get there”. I’m not entirely sure where this idea comes from, and I think it’s wrong. The space of goals which are allowed to be concerned with process is much higher-dimensional than the space of goals which are just about outcomes, so I’d expect that on most reasonable senses of “most”, process can have a look-in.
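To make that dimension-counting intuition concrete, here’s a toy formalisation (my own sketch; the finite state space and fixed trajectory length are illustrative assumptions, not anything from the original claim):

```latex
% Toy setting (assumed for illustration): finite state set S, trajectories of length T.
% Outcome-only goals score just the final state; process-sensitive goals may score
% the whole trajectory.
\[
  \{\, u : S \to \mathbb{R} \,\} \cong \mathbb{R}^{|S|}
  \qquad\text{vs.}\qquad
  \{\, u : S^{T} \to \mathbb{R} \,\} \cong \mathbb{R}^{|S|^{T}}
\]
% Even for modest |S| and T, the trajectory-scoring space has dimension |S|^T, and
% the outcome-only goals sit inside it as a mere |S|-dimensional subspace (the
% functions that depend only on the final coordinate).
```

On this toy picture, under anything like a uniform notion of size, almost all goals are sensitive to process, which is the sense in which process “can have a look-in”.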
What’s the interaction with instrumental convergence? (I’m asking because vibe-wise it seems like instrumental convergence is associated with an assumption that goals won’t be concerned with process.)
Process-concerned goals could undermine instrumental convergence (since some process-concerned goals could be fundamentally opposed to some of the things that would otherwise get converged-to), but many process-concerned goals won’t.
Since instrumental convergence is basically about power-seeking, there’s an evolutionary argument that you should expect the systems which end up with the most power to be the ones with power-seeking behaviours.
I actually think there are a couple of ways for this argument to fail:
If at some point you get a singleton, there’s now no evolutionary pressure on its goals (beyond some minimum required to stay a singleton).
A social environment can punish power-seeking, so that power-seeking behaviour is not the most effective way to arrive at power.
(There are some complications to this I won’t get into here.)
But even if the argument doesn’t fail, it pushes towards systems which have Omohundro’s basic AI drives (and so pushes away from process-concerned goals which could preclude those); it doesn’t push all the way to purely outcome-concerned goals.
In general I strongly expect humans to try to instil goals that are concerned with process as well as outcomes. Even if that goes wrong, I mostly expect them to end up with something which has incorrect preferences about process, not something that doesn’t care about process.
How could you get to purely outcome-concerned goals? I basically think this should only be expected if someone makes a deliberate choice to aim for it (though it might also be reachable via self-modification; the set of goals that would choose to self-modify to become purely outcome-concerned may be significantly bigger than the set of purely outcome-concerned goals). Overall I think purely outcome-concerned goals (or almost purely outcome-concerned goals) are a concern, and worth further consideration, but I really don’t think they should be treated as a default.
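One compact way to put that set-inclusion point (my restatement, not wording from the post):

```latex
\[
  \{\text{purely outcome-concerned goals}\}
  \;\subseteq\;
  \{\text{goals that would self-modify to become purely outcome-concerned}\}
\]
% The left-hand set trivially belongs to the right-hand one (such goals have nothing
% to change), while some process-concerned goals might also endorse the modification,
% which is why the right-hand set may be significantly bigger.
```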
Just a prompt to say that if you’ve been kicking around an idea of possible relevance to the essay competition on the automation of wisdom and philosophy, now might be the moment to consider writing it up—entries are due in three weeks.