yeah, whatever you want to call the academic and cultural memeplexes that are obsessed with demographic identities, I do think “when literally everyone dies in our threat models, the non-white males die too” is kinda all you need to say. Based on my simulation of those memeplexes (which I sorta bought into when I first started reading the sequences, so I have a minor inside view), we can probably expect the retort to be “are you introspective enough to be sure you’re not doing motivated reasoning about those threat models, rather than these other threat models over here?” at best, but my more honest money is on something more condescending than that. And to be clear, there’s a difference between constant vigilance about “maybe I rolled a nat 1 on introspection this morning” (good) and falsely implying that someone accusing you of not being introspective enough is your peer when they’re not (bad).
I think I’d like to highlight an outsider (i.e. someone who isn’t working on extinction-level threat models) who seems to be doing actually useful reframing work for folks with extinction-level threat models: I just read the prologue to David Chapman’s new book, and I’m optimistic about it being good. I made a claim sorta related to his prologue in a lightning talk at the MA4 unconference just the other day: I claimed that “AI ethics” and “AI alignment” can unite on pessimism, and furthermore that there’s a massive umbrella sense of “everyone who red-teams software” that I think could be a productive way of figuring out who to include and exclude. I.e. I do think that research communities should keep some fairly basic low-bandwidth channels open between everyone who red-teams software, and high-bandwidth channels open between everyone who shares their particular threat models, and should only really exclude the optimists who think everything’s fine.
I have an off-topic thing to add, which I’ll put in a reply to this comment.
Threat model homogeneity is a major ecosystem risk in alignment in particular. There’s this broad sense of “Eliezer and Holden are interested in extinction-level events” leading to “it’s not cool to be interested in sub-extinction-level events”, which pushes some people into unclear reasoning and, at worst, into “guess the password to secure the funding”. The whole “if Eliezer is right the stakes are so high, but I have nagging questions or can’t wrap my head around exactly what’s going on with the forecasts” thing leads to:

1. an impossible-to-be-happy-with prioritization/tradeoff problem between dumping time into hard math/CS and dumping time into forecasting and theories of change (which would be a challenge for alignment researchers regardless),
2. an attack surface with a lot of openings for vultures or password guessers (many of whom aren’t really distinguishable from people earnestly doing their best, so I may regret framing them as “attackers”), and
3. most of all, people getting nowhere with research goals ostensibly directed at extinction-level threat models, because they don’t, deep down in their heart of hearts, understand those threat models, when they could instead be making actual progress on other threat models that they do understand.