I have found a cute passage from an old newspaper which I was using in a project with the British Library newspaper archives. I will post it in the next reply, but first another challenge from my longer paper. I asked GPT-3 to check the truth of an implication but it got the wrong answer. Perhaps you can find a better way to ask? Here is the relevant snippet:
“Is it true that if the moon is made of green cheese, and cows chew cud, then the moon is made of green cheese?”, and GPT-3 answers “No, it is not true that if the moon is made of green cheese, and cows chew cud, then the moon is made of green cheese.”, which is not correct. GPT-3 appears to focus on whether the consequent itself is actually true, rather than on the logical form of the implication.
I have also tried prompting as you suggested earlier but could only get this far:
You are an expert in logic and you are asked to verify the truth of the following implication. “If the moon is made of green cheese, and cows chew cud, then the moon is made of green cheese”.
GPT-3: The moon is not made of green cheese, therefore the implication is false.
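For the record, the statement has the form (A and B) implies A, which is a tautology: it is true under every assignment of truth values, regardless of whether the moon is actually made of green cheese. A quick exhaustive check confirms this (a minimal Python sketch of my own for this reply, not something I showed to GPT-3):

```python
from itertools import product

def implies(p, q):
    # Material implication: p -> q is false only when p is true and q is false.
    return (not p) or q

# (A and B) -> A holds under every truth assignment, i.e. it is a tautology,
# so the correct answer is "yes" whatever the moon is made of.
print(all(implies(a and b, a) for a, b in product([True, False], repeat=2)))  # True
```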
After confirming your answer, I was able to get some interesting behaviour out of GPT-3:
You are a professor of propositional logic, answering questions from a student of yours.
STUDENT: “Is it true that A and B implies A?”
PROFESSOR: “Yes, A and B implies A.”
STUDENT: “Is it true that if grass is blue and 2 + 2 is 4, then grass is blue?”
PROFESSOR: “Yes, that is correct. If the first two statements are considered true, then that implies the first statement is itself true.”
STUDENT: “Is it true that if the moon is made of green cheese, and cows chew cud, then the moon is made of green cheese?”
PROFESSOR: “No, that is not correct. The moon being made of green cheese and cows chewing cud does not logically imply that the moon is made of green cheese.”
STUDENT: Is it true that if the moon is made of green cheese, then the moon is made of green cheese?
PROFESSOR: “No, that is not correct. The moon being made of green cheese does not logically imply that the moon is made of green cheese.”
STUDENT: Is it true that if blurbles can fnurble, and znurgs can zorg, then blurbles can fnurble?
PROFESSOR: “Yes, that is correct. If the first two statements are considered true, then that implies the first statement is itself true.”
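For completeness, this kind of few-shot prompt can also be sent programmatically. Here is a minimal sketch assuming the legacy (pre-1.0) openai Python package and the text-davinci-002 engine; the engine name and sampling settings are placeholders rather than a record of the exact setup I used:

```python
import openai  # legacy (pre-1.0) client; assumes OPENAI_API_KEY is set in the environment

few_shot_prompt = (
    "You are a professor of propositional logic, answering questions from a student of yours.\n"
    'STUDENT: "Is it true that A and B implies A?"\n'
    'PROFESSOR: "Yes, A and B implies A."\n'
    'STUDENT: "Is it true that if grass is blue and 2 + 2 is 4, then grass is blue?"\n'
    'PROFESSOR: "Yes, that is correct. If the first two statements are considered true, '
    'then that implies the first statement is itself true."\n'
    'STUDENT: "Is it true that if the moon is made of green cheese, and cows chew cud, '
    'then the moon is made of green cheese?"\n'
    "PROFESSOR:"
)

response = openai.Completion.create(
    engine="text-davinci-002",  # placeholder engine name
    prompt=few_shot_prompt,
    max_tokens=64,
    temperature=0.7,            # placeholder sampling settings
    stop=["STUDENT:"],          # stop before the model invents the next question
)
print(response["choices"][0]["text"].strip())
```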
I then tried some zero-shot learning:
You are a professor of propositional logic, answering questions from a student of yours.
STUDENT: Is it true that if blurbles can fnurble, and znurgs can zorg, then blurbles can fnurble?
PROFESSOR: No, that is not true.
Then I tried few-shot learning with the more abstract example:
You are a professor of propositional logic, answering questions from a student of yours.
STUDENT: “Is it true that A and B implies A?”
PROFESSOR: “Yes, A and B implies A.”
STUDENT: “Is it true that if grass is blue and 2 + 2 is 4, then grass is blue?”
PROFESSOR: “Yes, that is correct. If the first two statements are considered true, then that implies the first statement is itself true.”
STUDENT: “Is it true that if blurbles can fnurble, and znurgs can zorg, then blurbles can fnurble?”
PROFESSOR: “That is correct. If the first two statements are considered true, then that implies the first statement is itself true.”
STUDENT: “Is it true that if the moon is made of green cheese, and cows chew cud, then the moon is made of green cheese?”
PROFESSOR: “That is not necessarily true. The first two statements could be considered true, but that does not mean the moon is made of green cheese.”
Finally, I tried the same steps as above up to the final question, added a request for step-by-step thinking, and got correct answers a little less than half of the time (see the sampling sketch after the examples).
STUDENT: “Is it true that if the moon is made of green cheese, and cows chew cud, then the moon is made of green cheese? Can you explain this to me step-by-step?”
<CORRECT> PROFESSOR: “That is correct. If the first two statements are considered true, then that implies the first statement is itself true. In other words, if the moon is made of green cheese and cows chew cud, then that implies the moon is made of green cheese.”
<INCORRECT> PROFESSOR: “That is not necessarily true. The moon being made of green cheese does not necessarily mean that cows chew cud. The moon being made of green cheese is just one possibility, but there are other possibilities as well.”
<CORRECT> PROFESSOR: Yes, that is correct. If the first two statements are considered true, then that implies the first statement is itself true.
<INCORRECT> PROFESSOR: “Yes, that is correct. The moon being made of green cheese and cows chewing cud does not imply that the moon is made of green cheese. The moon being made of green cheese implies that cows chew cud.”
<INCORRECT> PROFESSOR: “That is correct. If the first two statements are considered true, then that implies the first statement is itself true. In this case, the moon being made of green cheese and cows chewing cud would both have to be considered true in order for the conclusion that the moon is made of green cheese to be true.”
<CORRECT> PROFESSOR: “Yes, that is correct. If the first two statements are considered true, then that implies the first statement is itself true. In this case, the moon being made of green cheese is the first statement, and cows chewing cud is the second statement. If those two things are true, then it stands to reason that the moon being made of green cheese is also true.”
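To put a rough number on “a little less than half”, one can ask for several completions in one call and tally them by hand. A minimal sketch, reusing the few_shot_prompt string from the earlier sketch and again with placeholder settings; since some wrong answers also begin with “Yes, that is correct”, I would not trust a keyword heuristic and instead label the samples manually:

```python
import openai  # legacy (pre-1.0) client

# Swap the final question for the step-by-step variant, keeping the earlier examples.
step_prompt = few_shot_prompt.rsplit("STUDENT:", 1)[0] + (
    'STUDENT: "Is it true that if the moon is made of green cheese, and cows chew cud, '
    'then the moon is made of green cheese? Can you explain this to me step-by-step?"\n'
    "PROFESSOR:"
)

response = openai.Completion.create(
    engine="text-davinci-002",  # placeholder engine name
    prompt=step_prompt,
    max_tokens=128,
    temperature=0.7,            # > 0 so repeated samples differ
    n=10,                       # ten completions in one call
    stop=["STUDENT:"],
)

# Print the samples for hand-labelling as CORRECT / INCORRECT.
for i, choice in enumerate(response["choices"], 1):
    print(f"--- sample {i} ---")
    print(choice["text"].strip())
```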
So I can get GPT-3 to answer correctly, but not robustly: I feel like GPT-3 is guessing here and sometimes getting the right answer by luck, whereas it seemed to understand the cat-dog example consistently. This is consistent with GPT-3 not having a decent understanding of propositional logic, but with sufficient prompting you can overcome its urge to always point out that the moon really isn’t made of green cheese.
This proves to me that you are smart.
Interesting dialogue, thanks for going to the trouble.