giving the alignment research community an edge

epistemic status: shower thought

On whether humanity’s understanding of AI alignment will advance fast enough relative to its understanding of how to create AGI, many factors stack in favor of AGI: more organizations are working on it, there’s a direct financial incentive to do so, people tend to be more excited about the prospect of AGI than cautious about misalignment, et cetera. But one factor that gives me a bit of hope (besides the idea that alignment might turn out to be easier to figure out than AGI) is that alignment researchers tend to be cooperative while AGI researchers tend to be competitive. Alignment researchers are motivated to save the world, not make a buck, so if their discoveries are helpful for alignment alone, they’ll publish them openly, and if they’re helpful for alignment but possibly also for capabilities, they’ll share them only with other alignment researchers. Meanwhile, each company trying to create AGI has only its own cutting-edge research to work with – they tend to keep to themselves, while we’re more united.
I’m curious about ways the alignment research community could strengthen this dynamic. One would be restricting access to certain information to other alignment researchers only, namely 1) discoveries that might be helpful for alignment but also for AGI and 2) knowledge related to AI-assisted research and development. I get the impression this is already a norm, but the community might benefit from more formal and overt methods of doing so. For example, tammy created a “locked post” feature on her website that gives her control over who can decrypt certain posts of hers that relate to capabilities. In the same vein, maybe the AI Alignment Forum could add a feature that works similarly to Twitter Circle, where access to posts could be restricted to trusted members of a group:
Twitter Circle is a way to send Tweets to select people, and share your thoughts with a smaller crowd. You choose who’s in your Twitter Circle, and only the individuals you’ve added can reply to and interact with the Tweets you share in the circle.
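To make the idea concrete, here’s a minimal sketch of what a circle-restricted post could look like at the application level (this is an illustrative model only – not how Twitter Circle or tammy’s locked-post feature is actually implemented, and all names here are made up):

```python
from dataclasses import dataclass, field


@dataclass
class LockedPost:
    """A post visible only to its author and an explicit set of trusted readers."""
    author: str
    body: str
    circle: set[str] = field(default_factory=set)  # trusted usernames

    def read(self, reader: str) -> str:
        # The author can always read their own post; anyone else
        # must be a member of the circle.
        if reader == self.author or reader in self.circle:
            return self.body
        raise PermissionError(f"{reader} is not in this post's circle")


post = LockedPost(
    author="alice",
    body="capabilities-adjacent finding",
    circle={"bob", "carol"},
)
print(post.read("bob"))  # prints: capabilities-adjacent finding
```

A real version would enforce this server-side (or cryptographically, as in the locked-post approach, where the body is encrypted and only circle members hold decryption keys) rather than trusting the client.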
Of course, the forum’s developer team would then have to step up their security, since nation-state actors (for example) would be incentivized to hack the forum to learn all the latest AGI-related discoveries those alignment people are trying to keep to themselves. Another worry is that moles could network their way deep into the alignment community to gain access to privileged information and then pass it on to some company or nation (there may already be moles today, even without formal methods of restricting information). I’m sure there’s pre-existing literature on how to mitigate these risks.
Those with more knowledge of AI strategy, feel free to pick apart these thoughts; I only felt comfortable sharing them as a shortform because I feel like there’s a lot about this subject that I’m missing. Perhaps this has all been written about before.