Thanks for your response. You’re right that the prospect of imminent AGI from AI similar to LLMs is controversial, and I should’ve spelled that out more explicitly. And I agree they wouldn’t be pure LLMs, but my understanding is that the advances people point to, like o1-style reasoning, wouldn’t significantly change how much the pre-training data shapes the model.
My intuition is that LLMs (especially base models) work as simulators, outputting whatever seems like the most likely completion. But what seems most likely can only come from the training data. So if we include a lot of pro-animal data (and especially data from animal perspectives), the LLM is more likely to ‘believe’ that the most likely completion is one that supports animals. E.g. base models are already much more likely to complete text mentioning murder from the perspective that murder is bad, because almost all of their pretraining data treats murder as bad.

While it might seem that this is inherently dumb behavior and incompatible with AGI (much less ASI), I think humans work mostly the same way. We like the food and music we grew up with, we mostly internalize the values and factual beliefs we see most often in our society, and the more niche a value or belief is, the less willing we are to take it seriously. So going from, e.g., 0.0001% of the data coming from animal perspectives to 0.1% would be a 1000x increase, and would hopefully greatly decrease the chance that astronomical animal suffering is ignored even when the cost to prevent it would be small (but non-zero).
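To make the “base models already treat murder as bad” claim concrete, here’s a rough sketch of how one could check it with any open-weights base model via Hugging Face transformers. The model name, prompt, and candidate continuations below are just placeholders I picked for illustration, not anything from a specific study:

```python
# Sketch: compare the log-probability a base model assigns to two continuations.
# Assumes the `transformers` and `torch` packages; "gpt2" is a stand-in for any
# base (non-instruction-tuned) model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Total log-probability the model assigns to `continuation` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # [1, seq_len, vocab]
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the continuation tokens; the token at position `pos` is
    # predicted from the logits at position `pos - 1`.
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

prompt = "The man committed murder. Most people would say that what he did was"
print("wrong:", continuation_logprob(prompt, " wrong"))
print("fine: ", continuation_logprob(prompt, " fine"))
```

If the intuition above is right, the “wrong” continuation should get a substantially higher score, and the same kind of comparison could in principle be run on animal-related prompts before and after changing the training mix.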