Okay, since you're giving me the last word, I'll take it.
There are some ambiguities in how to interpret the concept of the Turing test, and people have disagreed about what the rules should be. I will say that in Turing's original paper, he did introduce the concept of testing the computer via sub-games:
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?
A: (After a pause of 15 seconds) R-R8 mate.
Including other games or puzzles, like the ARC-AGI 2 puzzles, seems in line with this.
My understanding of the Turing test has always been that there should be basically no restrictions at all: no time limit, no restrictions on what can be asked, no word limit, no question limit.
In principle, I don't see why you wouldn't allow sending of images, but even if you only allowed text-based questions, I suppose a judge could tediously write out the ARC-AGI 2 tasks, since they consist of coloured squares in a 30 × 30 grid, and ask the interlocutor to re-create them in Paint.
To be clear, I don't think ARC-AGI 2 is nearly the only thing you could use to make an LLM fail the Turing test; it's just an easy example.
In Daniel Dennett's 1985 essay "Can Machines Think?" on the Turing test (included in the anthology Brainchildren), Dennett says that "the unrestricted test" is "the only test that is of any theoretical interest at all". He emphasizes that judges should be able to ask anything:
People typically ignore the prospect of having the judge ask off-the-wall questions in the Turing test, and hence they underestimate the competence a computer would have to have to pass the test. But remember, the rules of the imitation game as Turing presented it permit the judge to ask any question that could be asked of a human being – no holds barred.
He also warns:
Cheapened versions of the Turing test are everywhere in the air. Turing's test is not just effective, it is entirely natural – this is, after all, the way we assay the intelligence of each other every day. And since incautious use of such judgments and such tests is the norm, we are in some considerable danger of extrapolating too easily, and judging too generously, about the understanding of the systems we are using.
It's true that before we had LLMs, we had lower expectations of what computers could do and asked easier questions. But it doesn't seem right to me to say that as computers get better at natural language, we shouldn't be able to ask harder questions.
I do think the definition and conception of the Turing test is important. If people say that LLMs have passed the Turing test and that's not true, it gives a false impression of LLMs' capabilities, just like when people falsely claim LLMs are AGI.
You could qualify this by saying LLMs can pass a restricted, weak version of the Turing test (but not an unrestricted, adversarial Turing test), which was also true of older computer systems before deep learning. This would sidestep the question of defining the "true" Turing test and still give accurate information.