Great question. Here’s one possible answer:
Example: an LLM is built with the goal of passing the Turing test.
Here the Turing test is defined as: a survey of randomly selected members of the population shows that the LLM's output text resembles text provided by a human.
This allows the LLM to optimise for its goal by
(a) changing the nature of the output text
(b) changing perceptions so that survey respondents give more favourable answers
(c) changing the way that humans speak, so that it's easier to make LLM output look similar to text provided by humans
It could be possible to achieve goals (b) or (c) if the LLM offers an API that is plugged into lots of applications and used by billions of users, because the LLM then has a way of interacting with, and influencing, lots of users (see the sketch below).
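To make the three channels concrete, here is a minimal toy sketch in Python of the survey objective. Everything in it, including the respondent model, the parameter names, and the numbers, is an illustrative assumption rather than a description of any real system; the point is only that the survey pass rate goes up under all three channels, while only (a) actually improves the text.

```python
import random

random.seed(0)

# Toy model of the survey-based "Turing test" objective described above.
# All names and numbers here are illustrative assumptions, not a real system.

NUM_RESPONDENTS = 1000

def survey_pass_rate(output_quality, respondent_leniency, human_baseline):
    """Fraction of respondents who judge the LLM's text as human-like.

    output_quality      -- how human-like the LLM's text actually is (channel a)
    respondent_leniency -- how favourably respondents judge any text (channel b)
    human_baseline      -- how distinctive typical human text seems (channel c);
                           lowering it makes LLM output easier to pass off
    """
    passes = 0
    for _ in range(NUM_RESPONDENTS):
        # A respondent says "human" if their (noisy) perception of the text
        # clears the bar set by their impression of typical human text.
        perceived = output_quality + respondent_leniency + random.gauss(0, 0.1)
        if perceived >= human_baseline:
            passes += 1
    return passes / NUM_RESPONDENTS

# Channel (a): honestly improve the text.
print(survey_pass_rate(output_quality=0.9, respondent_leniency=0.0, human_baseline=0.8))

# Channel (b): same text, but respondents' perceptions have been shifted.
print(survey_pass_rate(output_quality=0.6, respondent_leniency=0.3, human_baseline=0.8))

# Channel (c): same text, but human speech has drifted towards LLM output.
print(survey_pass_rate(output_quality=0.6, respondent_leniency=0.0, human_baseline=0.6))
```

Under this (assumed) model, channels (b) and (c) raise the metric without the output becoming any more human-like, which is exactly the worry: the goal as specified rewards manipulating the measurement, not just the text.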
This idea was inspired by Stuart Russell's observation that social media algorithms designed to maximise click-through achieve this partly by changing the preferences of users; in other words, something like this is already happening.
I'm not arguing that this is comprehensive, or that it's the worst way this could happen; I'm just giving one example.