A scattered and rambly note I jotted down in a Slack in February 2023 and didn't really follow up on
thinking of jotting down some notes on "what AI pessimism funding ought to be" that take into account forecasting and values disagreements.

The premises:
threatmodels drive research. This is true on lesswrong, where everyone knows it and agonizes over "am I splitting my time between hard math/cs and forecasting or thinking about theories of change correctly?", and it's true in academia, where people half-ass a "practical applications" paragraph in their paper.
people who don’t really buy into the threatmodel they’re ostensibly working on do research poorly
social pressures like funding and status make it hard to be honest about what threatmodels motivate you.
I don't overrate democracy or fairness as terminal values, and I'm bullish on a lot of deference and technocracy (whatever that means), but I may be feeling some virtue-ethicsy attraction toward "people feeling basically represented by the governance bodies that represent them", which I think is tactically useful for researchers because of the above point: research outputs are more useful when the motivation is clearheaded and honest.
fact-value orthogonality; additionally, the binary is good and we don't need a secret third thing if we confront uncertainty well enough.
The problems I want to solve:
thinking about inclusion and exclusion (into "colleagueness", or stuff that funders care about like "who do I fund") is fogged by tribal conflict where people pathologize each other (salient in "AI ethics vs. AI alignment"; twitter is the mindkiller, but occasionally I'll visit, and I always feel like it makes me think less clearly).
no actual set of standards within which disagreement can take place; instead we have wishy-washy stuff like "the purple hats undervalue standpoint epistemology, which is the only possible reason they could take extinction-level events seriously" or "the yellow hats don't unconsciously signal that they've read the sequences in their vocabulary, so I don't trust them". i.e. we want to know whether disagreements are about beliefs (anticipation constraints) or values (what matters), and we might want to form coalitions with people who don't think super clearly about the distinction.
standard “loud people (or people who are really good at grantwriting) are more salient than polling data” problems
standard forecasting error bar problems
funding streams misaligned with on-the-ground viewpoint diversity
I'm foggy-headed about whether I'm talking about "how openphil should allocate AI funds" vs. "how DARPA should allocate AI funds" vs. "how an arbitrary well-meaning 'software might be bad' foundation should allocate AI funds", sorry.

The desiderata for the solution:
"marketplace of ideas" applied to threatmodels has a preference-aggregation part (what people care about) and a forecasting part (what people think is gonna go down)
preference aggregation part: it might be good for polling data about the population’s valuation of future lives to drive the proportion of funding that goes to extinction-level threatmodels.
forecasting part: what are the relative merits of different threat models?
resolve the deep epistemic or evidentiary inequality between threatmodels where the ship sailed in 2015, ones where we might think crunch time is right now or next year, and ones we won't be able to evaluate until literal 2100.
mediating between likelihood (which is determined by forecasters) and importance (which is determined by polling data) for algorithmic funding decisions. No standard EV theory, because values aren't well typed (it's not useful to "add" the probability of everything Alice loves being wiped out times its disvalue to the probability of everything Bob loves being reduced by 60% times its disvalue). There's a toy sketch of one way around this below, after the desiderata.
some ideas related to Nuno's reply to Dustin on decentralization, voting theory / mechanism design, etc., to a minor degree. https://forum.effectivealtruism.org/posts/zuqpqqFoue5LyutTv/the-ea-community-does-not-own-its-donors-money?commentId=SuctaksGSaH26xMy2
unite on pessimism. Go after untapped synergies within the "redteaming software" community; be able to know when you have actual enemies, beyond just the sense that they're competing against you for finite funding. Think clearly about when an intervention designed for Alice's threatmodel also buys assurances for Bob's threatmodel, when it doesn't, and when Alice's or Bob's research outputs work against Sally's threatmodel. (An interesting piece of tribal knowledge that I don't think lesswrong has a name for: if you're uncertain about whether you'll end up in world A or world B, you make sure your plan for improving world A doesn't screw you over in the event that you end up in world B. There's a not very well understood generalization of this to social choice: uncertainty over your peers' uncertainty over world states, uncertainty over disagreements about what it means to improve a state, etc. A toy robustness check along these lines is sketched below, after the desiderata.)
the only people who should really be excluded are optimists who think everything's fine, even though people whose views aren't as popular as they think they are will feel excluded regardless.
an "evaluations engineering stack" to iterate, over time, on whose research outputs are actually making progress on their ostensible threatmodels.
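To gesture at the "likelihood × importance without standard EV theory" item above, here's a minimal sketch. Everything in it is hypothetical: the threat model names, the forecaster probabilities, the polled importance scores, and the equal-voice normalization are just one stand-in for the kind of aggregation I mean, not a worked-out proposal. The idea is that instead of adding raw disvalue across people, each respondent's importance profile gets normalized to one unit of concern before being mixed with the forecasts.

```python
# Hedged sketch: all names and numbers below are hypothetical.
THREAT_MODELS = ["misuse", "accident", "structural"]

# Forecaster-supplied probabilities that each threat model materializes.
likelihood = {"misuse": 0.4, "accident": 0.25, "structural": 0.35}

# Polled importance scores per respondent, on whatever scale each person used.
polled_importance = {
    "alice": {"misuse": 9, "accident": 2, "structural": 1},
    "bob":   {"misuse": 1, "accident": 1, "structural": 8},
}

def normalize(weights):
    """Rescale one respondent's scores to sum to 1, so scales aren't added across people."""
    total = sum(weights.values())
    return {tm: score / total for tm, score in weights.items()}

def funding_shares(likelihood, polled_importance):
    # Equal-voice aggregation: average the *normalized* importance profiles.
    profiles = [normalize(scores) for scores in polled_importance.values()]
    avg_importance = {
        tm: sum(profile[tm] for profile in profiles) / len(profiles)
        for tm in THREAT_MODELS
    }
    # Mix with the forecasts, then renormalize into shares of the funding pot.
    raw = {tm: likelihood[tm] * avg_importance[tm] for tm in THREAT_MODELS}
    total = sum(raw.values())
    return {tm: value / total for tm, value in raw.items()}

print(funding_shares(likelihood, polled_importance))
# roughly {'misuse': 0.48, 'accident': 0.09, 'structural': 0.43} with these made-up inputs
```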
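And a toy version of the cross-threatmodel robustness check from the "unite on pessimism" item. Again, the interventions, threat model labels, and effect scores are made up; the point is the shape of the check (does a plan aimed at Alice's threatmodel actively hurt under Bob's or Sally's?), not the scoring.

```python
# Hedged sketch: interventions, threat models, and effect scores are hypothetical.
# positive = buys assurances under that threat model, 0 = neutral, negative = works against it.
effects = {
    "intervention_A": {"alice_tm": +2, "bob_tm": +1, "sally_tm": 0},
    "intervention_B": {"alice_tm": +3, "bob_tm": 0, "sally_tm": -2},
}

def robust(intervention_effects, floor=0):
    """An intervention counts as robust if it isn't net-negative under any threat model."""
    return all(score >= floor for score in intervention_effects.values())

for name, per_threatmodel in effects.items():
    losers = [tm for tm, score in per_threatmodel.items() if score < 0]
    verdict = "robust" if robust(per_threatmodel) else "screws over " + ", ".join(losers)
    print(f"{name}: {verdict}")
```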
This institution couldn't possibly be implemented in real life, but I think if we got even one desideratum at least a little institutionalized it'd be a big W.

I'm predicting that Eli Lifland wants to be a part of this conversation. Maybe Ozzie Gooen's podcast is the appropriate venue? I feel stronger about my ability to explore verbally with someone than my ability to just drag the post into existence myself (I managed to paint a pretty convincing picture of the institution I'm dreaming about to Viv, my housemate who some of you know, verbally yesterday). Critch obviously has a lot to say, too, but he may not care about the "write down fantasy world desiderata" approach to communicating or progressing (idk, I've never actually talked to him).

Related notes I jotted down on EA Forum yesterday:

Is it acceptable to platform "not an AI Gov guy, applied type theorist who tried a cryptography-interp hybrid project and realized it was secretly a governance project a few weeks ago" (who, to be fair, has been prioritizing Critch-like threatmodels ever since ARCHES was published) instead of an AI Gov expert? This is also something that could frame a series of "across the aisle" dialogues, where we find someone who doesn't get extinction-level software threatmodels at all, or who has a disgust reaction at any currently-alive vs. future-lives tradeoff, and invite them onto the pod or something? Maybe that's a stretch goal lol.