Huh interesting, I just tried that direction and it worked fine as well. This isn't super important, but if you wanted to share the conversation I'd be interested to see the prompt you used.
I got an error trying to look at your link:
For the first attempt at hangman, when the word was "butterfly", the prompt I gave was just:
After o4-mini picked a word, I added:
It said the word was an animal.
I guessed B, it said there was no B, and at the end said the word was "butterfly".
The second time, when the word was "schmaltziness", the prompt was:
o4-mini responded:
I said:
There were three words where the clue was so obvious I guessed the word on the first try.
Clue: "This animal 'never forgets.'"
Answer: Elephant
Clue: "A hopping marsupial native to Australia."
Answer: Kangaroo
After kangaroo, I said:
Clue: "A tactic hidden beneath the surface."
Answer: Subterfuge.
A little better, but I still guessed the word right away.
I prompted again:
o4-mini gave the clue "A character descriptor" and this began the disastrous attempt where it said the word "schmaltziness" had no vowels.
Fixed the link. I also tried your original prompt and it worked for me.
But interesting! The "Harder word, much vaguer clue" prompt seems to push it to stop actually playing hangman and instead antagonistically construct a word post hoc after each guess so that your guess comes out wrong. I asked "Did you come up with a word when you first told me the number of letters, or are you changing it after each guess?" and it said "I picked the word up front when I told you it was 10 letters long, and I haven't changed it since. You're playing against that same secret word the whole time." (Despite the fact that I can see from its reasoning trace that this is not what it's doing.) When I say I give up, it says "I'm sorry – I actually lost track of the word I'd originally picked and can't accurately reveal it now." (Because it realized that there was no word consistent with its clues, as you noted.)
So I don't think it's correct to say that it doesn't know how to play hangman. (It knows, as you noted yourself.) It just wants so badly to make you lose that it lies about the word.
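To be clearer about what antagonistic-but-consistent play would even look like: there's a classic trick (sometimes called "evil hangman") where the program never commits to one word at all, it just keeps every word still compatible with what it has said so far. Here's a rough Python sketch with a made-up word list, purely to illustrate the idea (it's not a claim about what o4-mini actually does internally):

```python
from collections import defaultdict

# Adversarial-but-honest hangman: never commit to a single word. Keep every
# word still consistent with what has been revealed, and answer each guess
# using whichever consistent subset makes the guess wrong, if possible.
def answer_guess(candidates, guess):
    families = defaultdict(list)
    for word in candidates:
        # Key each word by where (if anywhere) the guessed letter appears.
        positions = tuple(i for i, ch in enumerate(word) if ch == guess)
        families[positions].append(word)
    if () in families:                      # some candidates lack the letter,
        return (), families[()]             # so "no <letter>" is truthful
    # Otherwise reveal the positions shared by the largest remaining family.
    return max(families.items(), key=lambda kv: len(kv[1]))

# Tiny made-up word list of 9-letter animals:
candidates = ["butterfly", "chameleon", "orangutan", "crocodile"]
revealed, candidates = answer_guess(candidates, "b")
print(revealed, candidates)  # () ['chameleon', 'orangutan', 'crocodile']
```

Played that way, every answer stays consistent with some real word, so the game can be maximally unhelpful without ever contradicting itself. What I saw instead was the model ending up in a state where no word fit its own clues.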
There is some ambiguity in claims about whether an LLM knows how to do something. The spectrum of knowing how to do things ranges all the way from "Can it do it at least once, ever?" to "Does it do it reliably, every time, without fail?".
My experience was that I tried to play hangman with o4-mini twice, and it failed both times in the same really goofy way: it counted guesses as wrong even when I guessed a letter that was in the word it later said I had been supposed to be guessing.
When I played the game with o4-mini where it said the word was "butterfly" (and also said there was no "B" in the word when I guessed "B"), I didn't prompt it to make the word hard. I just said, after it claimed to have picked the word:
"E. Also, give me a vague hint or a general category."
o4-mini said:
"It's an animal."
So, maybe asking for a hint or a category is the thing that causes it to fail. I don't know.
Even if I accepted the idea that the LLM "wants me to lose" (which sounds dubious to me), it doesn't know how to do that properly, either. In the "butterfly" example, it could, in theory, have chosen a word retroactively that filled in the blanks but didn't conflict with any guesses it said were wrong. But it didn't do that.
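To spell out what a consistent retroactive choice would have required: there has to be at least one word that matches the blanks it showed and avoids every letter it rejected. A quick sketch of that check (Python; the word list and pattern here are just illustrative, not the actual game state):

```python
def consistent_words(word_list, pattern, rejected_letters):
    """Words that fit the revealed pattern ('_' = blank) and contain none of
    the letters the game claimed were absent."""
    rejected = set(rejected_letters)
    return [
        w for w in word_list
        if len(w) == len(pattern)
        and not rejected & set(w)
        and all(p in ('_', ch) for p, ch in zip(pattern, w))
    ]

# Once the game has said there is no "B", "butterfly" can never be a
# consistent answer, no matter what the blanks look like:
print(consistent_words(["butterfly"], "____e____", rejected_letters="b"))  # []
```

By that check, "butterfly" stopped being a legal answer the moment it said there was no "B", so whatever it was doing, it wasn't a coherent retroactive strategy.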
In the attempt where the word was "schmaltziness", o4-mini's response about which letters were where in the word (which I pasted in a footnote to my previous comment) was borderline incoherent. I could hypothesize that this was part of a secret strategy on its part to follow my directives, but much more likely, I think, is that it just lacks the capability to execute the task reliably.
Fortunately, we don't have to dwell on hangman too much, since there are rigorous benchmarks like ARC-AGI-2 that show more conclusively that the reasoning abilities of o3 and o4-mini are poor compared to typical humans.