Threatmodel homogeneity is a major ecosystem risk in alignment in particular. There's a broad sense that "Eliezer and Holden are interested in extinction-level events," which becomes "it's not cool to be interested in sub-extinction-level events," which pushes some people into unclear reasoning and, at worst, into "guess the password to secure the funding." The whole dynamic of "if Eliezer is right the stakes are enormous, but I have nagging questions and can't wrap my head around exactly what's going on with the forecasts" leads to:

1. an impossible-to-be-happy-with prioritization/tradeoff problem between dumping time into hard math/CS and dumping time into forecasting and theories of change (which would be a challenge for alignment researchers regardless),

2. an attack surface with a lot of openings for vultures and password-guessers (many of whom aren't really distinguishable from people earnestly doing their best, so I may regret framing them as "attackers"), and

3. most of all, people getting nowhere on research goals ostensibly directed at extinction-level threatmodels, because deep down in their heart of hearts they don't understand those threatmodels, when they could instead be making actual progress on other threatmodels that they do understand.