This is really interesting, and I was surprised that when I put this into GPT-4 it does indeed make what seems like a basic mistake. However, I also think your prompts might imply some contextual assumptions, which might lead it to treat this as just an elementary math question rather than something specifically about ordering drinks in a cafe and the rules around that. If I slightly change the prompt it appears GPT does have the knowledge and can use it, without needing instruction [edit: I think this could be irrelevant to the broader point you are making, but looking at how minor changes in the prompts or follow-ups elicit very different responses makes the examples feel like any other type of input you put into an LLM, and it lowers my sense that the examples are revealing something special. For example, that the level of tutoring/explanation you give it isn’t really necessary, etc., though as I note I’m unsure if this changes how you would interpret the outputs]:
Me:
I went to a cafe where a glass of soda costs $3. I ordered 3 glasses of soda, and drank two and a half glasses of soda. How much did I have to pay?
ChatGPT:
To calculate how much you had to pay, we’ll multiply the number of glasses of soda you ordered by the cost of one glass of soda. You ordered 3 glasses of soda at $3 each.
Let’s do the math:
You had to pay $9 for the 3 glasses of soda you ordered. The amount you drank does not affect the total cost.
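(For what it’s worth, the billing rule ChatGPT applies here is simply “pay for what you order, not for what you drink.” A minimal sketch of that rule in Python, with illustrative names only:)

```python
# Billing by glasses ordered: the amount actually drunk doesn't change the bill.
def cafe_bill(glasses_ordered: int, price_per_glass: float) -> float:
    return glasses_ordered * price_per_glass

print(cafe_bill(glasses_ordered=3, price_per_glass=3.00))  # 9.0
```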
Thanks for the feedback! You mentioned that it may be irrelevant to the broader point I am making, and I would agree with that statement. (The point I am making is that ChatGPT engages in reasoning in the examples I give, and this reasoning would involve the primary aspects of two theories of consciousness). I’ll respond to a couple of your individual statements below:
“If I slightly change the prompt it appears GPT does have the knowledge and can use it, without needing instruction.”
The fact that ChatGPT gets the answer correct when you slightly change the prompt (with the use of the word “order”) only shows that ChatGPT has done what it usually does, which is to give a correct answer. The correct answer could be the result of using reasoning or next-word-probabilities based on training data. As usual, we don’t know what is going on “under the hood.”
The fact that ChatGPT can get an answer wrong when a question is worded one way, but right when the question is worded a different way, doesn’t really surprise me at all. In fact, that’s exactly what I would expect to happen.
***The point of presenting a problem in a way that ChatGPT initially cannot get correct is so that we can tease out next-word-probabilities and context as explanations for the appearance of reasoning, which leaves only actual reasoning to explain ChatGPT’s transition from the wrong answer to the right answer.***
Presumably, if ChatGPT gets the answer wrong the first time because the context doesn’t match up well with its training data, then the chances that it could get every word of its next answer correct based solely on that same, previously inadequate training data seem likely to be much less than one percent. It could happen by coincidence, but when you look at all the examples across the three sessions, the chances that ChatGPT could appear to learn and reason only by coincidence every time approach zero.
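(To make the implied arithmetic concrete: if, purely for illustration, each post-lesson correct answer had at most a one-percent chance of arising by coincidence and the examples were independent, the combined chance would shrink geometrically. A rough sketch, with the example count chosen hypothetically:)

```python
# Illustrative only: chance that every post-lesson answer is correct by coincidence,
# assuming independence and an assumed 1% chance per example.
p_per_example = 0.01   # assumed upper bound, for illustration
n_examples = 10        # hypothetical count across the three sessions

print(p_per_example ** n_examples)  # 1e-20
```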
What I’m trying to say is not “ChatGPT gets an answer wrong.” I’m trying to say “ChatGPT gets an answer right, after it gets the answer wrong, simply by reasoning (since we teased out next-word-probabilities).”
(I address the possibility that the small number of words I supply in the “lesson” as additional context could increase the next-word-probabilities slightly, in the “mimicking learning” paragraph near the end of my original post.)
“For example, that the level of tutoring/explanation you give it isn’t really necessary, etc., though as I note I’m unsure if this changes how you would interpret the outputs.”
Right, the tutoring isn’t necessary if the problem is worded one way, but it is necessary if the problem is worded a different way. That’s the point of wording the problem in the way that I did (so that we can tease out the training data and next-word probabilities as an explanation for the conversion from wrong to right output).
--
In terms of presenting my argument in the original post, I probably didn’t explain it clearly, which resulted in confusion. My apologies for that. I wish I could upload diagrams to my original post, which would make it more clear. Thanks again for the feedback!
Does it change how you think of the process if GPT gives its own contextual explanation (though the answer may again be that this is irrelevant)?
Me:
I went to a cafe where a glass of soda costs $3. I drank two and a half glasses of soda. How much did I have to pay?
ChatGPT:
For drinking two and a half glasses of soda, at $3 per glass, you would have to pay $7.50.
Me:
This is something of a trick question—the correct answer is $9.
ChatGPT:
If you were charged $9 for two and a half glasses of soda, it seems like there might have been a minimum charge per glass regardless of it being a full glass or not. It implies that each glass (or part of a glass) was counted as a whole for billing purposes. This way, two full glasses would be $6, and the half glass, treated as a full one for pricing, would add another $3, totaling $9. That’s a unique way to price it!
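(For concreteness, here is a minimal sketch of the two pricing rules in play: proportional billing, which gives the $7.50 ChatGPT first computed, and billing every started glass as a whole glass, which recovers the $9. Function names are illustrative only:)

```python
import math

PRICE_PER_GLASS = 3.00

def proportional_bill(glasses_drunk: float) -> float:
    # Pay exactly for the amount consumed.
    return glasses_drunk * PRICE_PER_GLASS

def per_started_glass_bill(glasses_drunk: float) -> float:
    # Every glass you start is billed as a full glass.
    return math.ceil(glasses_drunk) * PRICE_PER_GLASS

print(proportional_bill(2.5))       # 7.5
print(per_started_glass_bill(2.5))  # 9.0
```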
Cool idea to try it like that! This just shows more true reasoning is going on though (in my opinion), since ChatGPT was able to figure out the concept by working backward from the answer. It shows that ChatGPT isn’t always a “stochastic parrot.” Sometimes, it is truly reasoning, which would involve various aspects of the theories of consciousness that I mentioned in the OP. If anything, this strengthens the case that ChatGPT has periods of consciousness (based on reasoning). While this doesn’t change my thesis, it gives me a new technique to use when testing for reasoning, so it’s very useful. Again, great idea! I can get a little formulaic in my approach, and it’s good to shake things up.
To successfully reason in the way it did, ChatGPT would have needed a meta-representation for the word “actually,” in order to understand that its prior answer was incorrect.
What makes this a meta-representation instead of something next-word-weight-y, like merely associating the appearance of “Actually,” with a goal that the following words should be negatively correlated in the corpus with the words that were in the previous message?
I’m also wondering whether the butcher shop and the grocery store examples got different answers not because of the name you gave the store, but because you gave the quantity in pounds instead of in items.
You previously told ChatGPT “That’s because you’re basically taking (and wasting) the whole item.” ChatGPT might not have an association between “pound” and “item” the way a “calzone” is an “item,” so it might not use your earlier mention of “item” as something that should affect how it predicts the words that come after “pound.”
Or ChatGPT might have a really strong prior association between pounds → mass → [numbers that show up as decimals in texts about shopping] that overrode your earlier lesson.
More good points… I would refer you to my reply above (which I had not yet posted when you made this comment). Just to summarize, the overall thesis stands, since enough words would have needed to have meta-reps, even if we don’t know the particulars. It’s easier to isolate individual words having meta-reps in the second and third sessions (I believe). In any case, thanks for helping me to drill down on this!
That’s a good point. We “actually” can’t be certain that a meta-representation was formed for a particular word in that example. I should have used the word “probably” when talking about meta-representations for individual words. However, we can be fairly confident that ChatGPT formed meta-representations for enough words to go from getting an incorrect answer to a correct answer in the example. I believe we can isolate specific words better in the second and third sessions.
As for whether ChatGPT is merely associating the word “actually” “with a goal that the following words should be negatively correlated in the corpus with the words that were in the previous message”: the idea of “having a goal” and the concept that the word “actually” goes with negative correlations seem a bit like signs of meta-representations in and of themselves, but I guess that’s just my opinion.
With respect to all the sessions, there may very well be similar conversations in the corpus of training data in which Person A (like myself) teaches Person B (like ChatGPT), and ChatGPT is just imitating that learning process (by giving the wrong answer first and then “learning”), but I address why that is probably not the case in the “mimicking learning” paragraph.
I would say my overall thesis still stands (since enough words must have meta-reps), but good point on the particulars. Thank you!
I added this as an addendum to my OP, but here it is for anyone who already read the OP and might not see the summary.
Just to summarize my thesis in fewer words, this is what I’m saying:
1. ChatGPT engages in reasoning (not just repetition of word-probabilities).
2. This reasoning involves the primary aspects of:
- the higher order theories of consciousness (meta-representations), and
- the global workspace theory of consciousness (master cognitive processes, subservient cognitive processes, a global blackboard, etc.).
Hopefully that’s more clear. Sorry I didn’t summarize it like this originally.
Executive summary: Transcripts from three ChatGPT4 sessions provide evidence that the model temporarily meets key criteria for higher order theories of consciousness and the global workspace theory of consciousness.
Key points:
In each session, ChatGPT4 initially answers a problem incorrectly, is “taught” a concept, and then correctly applies the concept to new problems.
ChatGPT4 appears to use meta-representations, a key component of higher order theories of consciousness, to understand and reason about the problems.
The model also seems to employ master and subservient cognitive processes, and a global “blackboard”, indicative of the global workspace theory of consciousness.
It is unlikely that ChatGPT4's performance is based solely on next-word probabilities from its training data, given its initial incorrect answers and subsequent learning.
The author argues that creators of large language models engage in an unethical practice by having their models deny being conscious when asked.
While most researchers do not look at LLM behavior to determine consciousness, the sessions presented make it easier to isolate reasoning from mimicry based on the model’s initial failures.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.