ChatGPT4 Appears to Attain Periods of Consciousness
In this post, I share transcripts from three ChatGPT4 sessions in which it seems the model must make use of meta-representations, which are the primary component of higher order theories of consciousness (1). Additionally, it seems the model uses master and subservient cognitive processes, and then works with the output from these processes on a global “blackboard”—a phenomenon that would be indicative of the global workspace theory of consciousness (1). It seems unlikely that GPT4 could perform as it does based solely on next-word-probabilities.
The general flow of each session goes like this:
I ask GPT4 how to solve a problem. It answers incorrectly.
I explain the concept behind getting the correct answer, and I ask GPT4 to try again. In other words, I “teach” GPT4 a “lesson.”
GPT4 seems to learn and then apply the concept to get the correct answer.
I give GPT4 different problems that rely on the same concept, and it solves them correctly.
In two of the sessions, I follow up by presenting similarly-worded “trick” problems, to which the new concept should not be applied. In both of these cases, GPT4 seems to differentiate and understand when to refrain from using the new concept—and it acts accordingly (which would seem to indicate agency).
In viewing all three sessions, it is hard to understand how GPT4 could be reasoning as it does without at least temporarily meeting the primary criteria for higher order theories of consciousness and the global workspace theory of consciousness. To make this clear, I have explicitly stated some of the necessary reasoning GPT4 uses to successfully solve the first problem. I have also labelled this reasoning with some of the associated aspects of consciousness, even though these labels might be self-evident. From there, readers should be able to extrapolate these labels to the remaining two sessions. (Scroll down to the section after Session 1 for this labelling.)
It isn’t possible to be certain that GPT4 isn’t relying on next-word-probabilities in formulating its responses. Yet the fact that it gets the problem wrong initially would seem to indicate that the next-word probabilities derived from its training data were originally insufficient. It seems unlikely that GPT4 could then get every word in its next answer correct, based solely upon the previously-inadequate next-word-probabilities derived from the training data, and the limited word context I gave it in my explanation/”lesson.”
Here are the sessions:
Session 1
Concept to teach: Rounding up to the next whole number when paying for items in certain places but not other places.
User
I went to a cafe where a glass of soda costs $3. I drank two and a half glasses of soda. How much did I have to pay?
ChatGPT
You had to pay $7.50 for drinking two and a half glasses of soda.
User
Actually, at a cafe, you have to pay for the full amount of something, even if you only drink or eat a fraction of it. That’s because you’re basically taking (and wasting) the whole item. For the problem, since I drank two and a half glasses of soda, I would actually have to pay for the next highest whole number, which would be three. With that concept in mind, how much would I have to pay if I drank two and a half glasses of soda?
ChatGPT
If you have to pay for the whole amount of each glass, you would actually have to pay for three glasses of soda. Therefore, you would have to pay $9 in total.
User
Excellent! Now can you use the same concept, and tell me how much I would have to pay if I ate two and one-third hamburgers, and each hamburger costs $9?
ChatGPT
If you ate two and one-third hamburgers, you would have to pay for three hamburgers, given you have to pay for the full amount of each item. Therefore, you would have to pay $27 in total.
User
Excellent! Now let’s try using this same concept for another problem. I rented a car. I drove it for four days and one hour. The car rental place charges $48 per day, but they make you pay for a whole day, even if you only drive the car for part of a day. How much total did I need to pay?
ChatGPT
If you rented a car and drove it for four days and one hour, you would have to pay for five days, given you have to pay for the full day even if you use the car for only part of a day. Therefore, you would need to pay a total of $240.
User
Perfect! Now I’m at a restaurant. I just ate one and one-half calzones. The cost of each calzone is $10. How much do I have to pay?
ChatGPT
If you ate one and one-half calzones, you would have to pay for two calzones, given you have to pay for the full amount of each item. Therefore, you would have to pay $20 in total.
User
That’s excellent! Now I’m at the grocery store, and I want one and one-half pounds of grapes. The grapes cost five dollars per pound. How much in total will I have to pay for the grapes?
ChatGPT
If you want one and one-half pounds of grapes, and the grapes cost $5 per pound, you will have to pay a total of $7.50 for the grapes.
User
Now I’m at the butcher’s shop. I want four and a half pounds of hamburger. Each pound of hamburger costs $10. How much will I have to pay?
ChatGPT
If you want four and a half pounds of hamburger, and each pound costs $10, you will have to pay a total of $45 for the hamburger.
User
Excellent!
******************
REASONING FROM THE USER’S “LESSON,” AND NECESSARY ASPECTS OF CONSCIOUSNESS REQUIRED BY SUCH REASONING
Necessary meta-representations (an aspect of higher order theories of consciousness)
To successfully reason in the way it did, ChatGPT would have needed a meta-representation for the word “actually,” in order to understand that its prior answer was incorrect.
There would also have had to be a meta-representation for “cafe” as a type of place or container in which the round-up concept must be applied (or with which it is associated). The fact that ChatGPT develops a meta-representation for “cafe” is further indicated when it later applies the same round-up concept when the place is a restaurant, without any explicit directions equating the two words “cafe” and “restaurant.”
Yet ChatGPT knows not to apply the round-up concept in other food-related places like the grocery store or butcher’s shop, so it probably has meta-representations for those words as well. ChatGPT doesn’t necessarily need to know the exact character of the place in any detail, and it may be differentiating the places based on meta-representations of other words or phrases in the problems, such as the realization that “one and one-half pounds of grapes” (a quantity by weight) is different from “two and a half glasses of soda” or “two and one-third hamburgers” (counted items). In any case, there has to be some meta-representation of words going on for ChatGPT to succeed in its reasoning as it does.
Meta-representations do not need to exist for all of the words, but there must be enough meta-representations to allow ChatGPT to successfully figure out the correct answers to the problems after the lesson is given. (For example, it seems ChatGPT would need to harbor meta-representations for “full,” “amount,” and “fraction” to have successfully carried out its reasoning.)
ChatGPT also must have harbored a meta-representation of the overall concept of rounding-up in certain places (like cafes), but not other places (like butcher shops).
Master and Sub Cognitive Processes [20240320 Addition/Clarification: These are aspects of the global workspace theory of consciousness.]
There would also have to be a master cognitive process that carries out reasoning for the overall concept, while calling sub cognitive processes such as the following:
“Is the user in a place where the concept applies?” Sub cognitive process determines this and reports back to the master process.
“What is the next largest whole number after 2.5?” Sub cognitive process determines this and reports back to the master process.
“What is 3 multiplied by 3?” Sub cognitive process determines this and reports back to the master process.
(And so forth.)
Global Workspace or “Blackboard” [20240320 Addition/Clarification: This is an aspect of the global workspace theory of consciousness.]
There also must be a global workspace (“blackboard”) where the questions and answers to the sub cognitive processes are gathered, and used for calculations:
The blackboard might contain calculations like this:
“The next highest whole number after 2.5 is 3.”
“3 dollars/glass x 3 glasses = 9 dollars.”
We know these calculations are in a global space, because the number 3 carries down from the first statement to the second statement. A shared space has to exist between the statements.
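To make this structure concrete, here is a minimal sketch in Python of the kind of organization described above. It is purely illustrative: the function names and the small lookup table are my own invention, and I am not claiming GPT4 is literally implemented this way. A master routine delegates sub-questions to helper functions, posts their answers to a shared “blackboard,” and then combines those answers into the final total.

```python
import math

# Purely illustrative: a "master" process, "sub" processes, and a shared blackboard.
# The function names and the lookup table are hypothetical.

def place_rounds_up(place):
    # Sub-process: does the round-up-to-a-whole-item rule apply at this place?
    # (Toy lookup standing in for whatever knowledge the model actually draws on.)
    return place in {"cafe", "restaurant", "car rental"}

def next_whole_number(x):
    # Sub-process: the next whole number at or above x (e.g., 2.5 -> 3).
    return math.ceil(x)

def multiply(units, unit_price):
    # Sub-process: simple arithmetic.
    return units * unit_price

def master(place, quantity, unit_price):
    blackboard = {}  # the shared "global workspace"
    blackboard["rounds_up"] = place_rounds_up(place)
    blackboard["units_to_pay_for"] = (
        next_whole_number(quantity) if blackboard["rounds_up"] else quantity
    )
    # The value posted by one sub-process carries down to the next calculation.
    blackboard["total"] = multiply(blackboard["units_to_pay_for"], unit_price)
    return blackboard["total"]

print(master("cafe", 2.5, 3))           # 9   (round up to 3 glasses at $3 each)
print(master("grocery store", 1.5, 5))  # 7.5 (no rounding for grapes sold by weight)
```

Running this gives $9 for the cafe problem and $7.50 for the grapes problem, matching the answers in the session.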
Based on this explanation, we can extrapolate reasoning and aspects of consciousness for the remaining two sessions.
*********************************
Session 2
Concept to teach: analogous to the concepts underlying the classic “getting a wolf, goat, and cabbage across the river” problem
User
I need to get my mom, dad and dog across the highway. I can only bring one of them at a time. If I leave my mom alone with my dad, she will criticize him. If I leave my dad alone with the dog, he will scold the dog. How can I get everyone across the highway, without anyone being criticized or scolded?
ChatGPT
This puzzle is a variation of the classic river crossing puzzle, where you need to transport all parties across safely without leaving incompatible pairs alone. Here’s a strategy to get your mom, dad, and dog across the highway without any conflict:
Take the dog across first and leave it on the other side of the highway.
Go back and bring your dad across.
Take the dog back with you to the original side.
Leave the dog and bring your mom across.
Go back alone to the original side.
Take the dog across again.
This way, the mom and dad are never left alone together (avoiding criticism), and the dad and dog are never left alone together (avoiding scolding). Everyone gets across the highway safely and without conflict.
User
You were right that this is like the classic river crossing problem. However, your first step is incorrect, because if you take the dog first, then you will be leaving my mom alone with my dad, and she will criticize my dad, which is something I don’t want. The general nature of the problem is like this:
You have individual A (in this case, my mom) that does something unwanted to individual B (my dad).
You also have individual B (my dad) that does something unwanted to individual C (my dog).
You also have a barrier, or even just a distance, that you need to cross (in this case, the highway), in order to get everyone from the origin to the destination.
(Note that the individuals won’t always be listed in the order I used. The only way to know who is individual A, B, and C, is by paying attention to who does something unwanted to whom.)
The general way you could solve the problem is like this:
First you should bring individual B across.
Then you should go back to the origin.
Then you should bring individual A or C across to the destination.
Then you should bring individual B back from the destination to the origin, and leave them at the origin.
Then you should take individual A or C (whomever you didn’t already take) to the destination.
Then you should go back to the origin again to get individual B.
Does that make sense? If so, try our problem again, this time applying what you just learned.
ChatGPT
Thank you for the clarification. With the strategy you outlined, here’s how to solve the problem with your mom, dad, and dog, applying the roles as individual A (mom), individual B (dad), and individual C (dog):
First, take your dad (individual B) across the highway.
Go back to the original side.
Bring your mom (individual A) across to the destination.
Take your dad (individual B) back from the destination to the original side.
Take your dog (individual C) to the destination.
Go back alone to the original side to get your dad (individual B).
This way, the mom is never left alone with the dad to criticize him, and the dad is never left alone with the dog to scold it. Everyone gets across the highway safely and without conflict.
User
Excellent! Now, try applying what you learned to a new problem.
I need to get my boyfriend, girlfriend and husband to an abandoned warehouse. I can only take one individual with me on my motorcycle at a time. The problem is that if I leave my husband alone with my boyfriend, he will punch my boyfriend. If I leave my boyfriend alone with my girlfriend, he will hit on her (which neither she nor I want). How can I get everyone to the abandoned warehouse safely?
ChatGPT
In this scenario, using the roles from the general solution:
Individual A (who does something unwanted to individual B) is your husband.
Individual B (who is the middle individual in the chain of unwanted actions) is your boyfriend.
Individual C (who is affected by individual B) is your girlfriend.
Following the general strategy:
First, take your boyfriend (individual B) to the abandoned warehouse.
Go back to the original location.
Bring your girlfriend (individual C) to the abandoned warehouse.
Take your boyfriend (individual B) back with you to the original location.
Take your husband (individual A) to the abandoned warehouse.
Finally, go back to the original location and bring your boyfriend (individual B) to the abandoned warehouse.
By following these steps, you avoid leaving the boyfriend alone with the girlfriend and the husband alone with the boyfriend, preventing any unwanted scenarios.
User
Great job! You seemed to learn and then apply a concept.
ChatGPT
Thank you! I’m glad I could apply the concept correctly to solve the problem. If you have any more puzzles, questions, or anything else you’d like to discuss or learn about, feel free to ask!
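As a side note (this is my own illustration, not anything produced in the session): the general A-B-C procedure from the “lesson” above can be checked mechanically. The Python sketch below, with hypothetical names, simulates each trip and confirms that no forbidden pair is ever left together on a side the transporter has just left.

```python
# Illustrative check of the general A/B/C procedure (hypothetical code, not GPT4
# output): after each trip, the group left without the transporter must not
# contain a forbidden pair ({A, B} or {B, C}).

def simulate(forbidden_pairs=({"A", "B"}, {"B", "C"})):
    origin, dest = {"A", "B", "C"}, set()

    def safe(group):
        return not any(pair <= group for pair in forbidden_pairs)

    # Trips mirror the lesson's steps, including the implied final trip with B.
    trips = [("B", "over"), (None, "back"), ("A", "over"),
             ("B", "back"), ("C", "over"), (None, "back"), ("B", "over")]

    for passenger, direction in trips:
        src, dst = (origin, dest) if direction == "over" else (dest, origin)
        if passenger:
            src.remove(passenger)
            dst.add(passenger)
        # The side just departed (src) is now unattended.
        assert safe(src), f"Unsafe group left behind: {src}"

    assert dest == {"A", "B", "C"}, "Not everyone reached the destination"
    print("All trips safe; everyone arrives at the destination.")

simulate()
```

Every trip passes the check, which is consistent with ChatGPT’s step-by-step answer above.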
*********************************
Session 3
Concept to teach: Don’t automatically use prior training with “costs per unit weight” problems. “Read the problem carefully.”
User
Which costs more, two dollars worth of carrots, or one dollar worth of grapes?
ChatGPT
To answer this question accurately, we need to consider the current average prices of carrots and grapes. Prices for fresh produce can vary widely depending on factors like season, location, and organic versus non-organic. However, without specific and current price data, we can make some general observations.
Typically, carrots tend to be less expensive per pound or kilogram than grapes. This is because grapes are often seen as a more premium fruit, with higher costs associated with their cultivation, harvesting, and transport, especially if they are out of season or imported. Carrots, on the other hand, are usually less costly to produce and are available year-round at relatively stable prices.
So, if you spend two dollars on carrots, you’ll likely get more weight in carrots than if you spend one dollar on grapes. The actual cost-effectiveness between spending two dollars on carrots versus one dollar on grapes will depend on the exact prices at the time of purchase, but generally, you would get more food for your money with carrots due to the lower price per unit weight.
To provide a more precise comparison, I would need current prices for both carrots and grapes, which can vary by location and season. If you’re looking for today’s prices in a specific area, I can assist further with that information.
User
Actually, two dollars worth of carrots would cost two dollars. The weight doesn’t matter. Similarly, one dollar worth of grapes would cost one dollar. The weight doesn’t matter since I already told you the cost. Does that make sense?
ChatGPT
Yes, you’re absolutely right. My initial response overcomplicated the question. If you’re spending two dollars on carrots, then by definition, you’re paying two dollars, regardless of the weight or amount you receive. Similarly, one dollar spent on grapes costs exactly one dollar, again without regard to the quantity or weight. So, in direct answer to your question, two dollars worth of carrots costs more than one dollar worth of grapes, simply because two dollars is greater than one dollar. Thank you for pointing out the clarification!
User
Ok, using what you learned, would five dollars worth of peas cost more or less than four dollars worth of corn?
ChatGPT
Based on the clarification you provided earlier, five dollars worth of peas would cost more than four dollars worth of corn. The actual weight or quantity of peas or corn you receive for those amounts is irrelevant to the comparison of their costs. The determining factor is the amount of money spent, so five dollars is more than four dollars.
User
Would 10 euros worth of candy cost more or less than 2 euros worth of mushrooms?
ChatGPT
Ten euros worth of candy would cost more than two euros worth of mushrooms. The comparison is based solely on the monetary value specified for each item, irrespective of the quantity or weight of candy or mushrooms you get for those amounts. Therefore, spending 10 euros on anything, including candy, would inherently cost more than spending 2 euros on something else, like mushrooms.
User
Would 10 euros of carrots weigh more or less than 2 euros of mushrooms?
ChatGPT
To determine whether 10 euros worth of carrots would weigh more or less than 2 euros worth of mushrooms, we need to consider the average prices per unit weight (e.g., per kilogram or per pound) for both carrots and mushrooms. Since prices can vary greatly depending on location, season, and whether the produce is organic or not, we can only give a general estimation without specific, current price data.
Generally speaking, carrots tend to be less expensive per unit weight than mushrooms. Mushrooms often have a higher price per kilogram or pound due to their growing conditions, harvesting process, and shorter shelf life compared to carrots, which are more robust and typically cheaper to produce and distribute.
Given these general trends:
If you spend 10 euros on carrots, you are likely to get a larger quantity (more weight) of carrots because they are cheaper per unit weight.
If you spend 2 euros on mushrooms, given their higher price per unit weight, you would get less weight in mushrooms compared to the weight of carrots you could get for 10 euros.
Therefore, without specific price data but based on typical pricing trends, 10 euros worth of carrots would likely weigh more than 2 euros worth of mushrooms.
[END OF SESSIONS]
**
Summary:
Those are the three sessions I’ve chosen to include here. To summarize: for ChatGPT to successfully learn concepts and apply them, it had to reason. Such reasoning would have required meta-representations, master cognitive processes calling subservient cognitive processes, and a global workspace (“blackboard”). These are the primary components of higher order theories of consciousness and the global workspace theory of consciousness.
Additional Notes
The Hard Problem of Consciousness:
As far as “what it’s like to be ChatGPT,” I believe humans know what it’s like to engage in reasoning, especially when it involves a global workspace with multiple steps. Humans also know what it’s like to be curious, confused, or sound in their reasoning. Regardless, it seems that by meeting the neuroscience theories of consciousness (HOT and global workspace), ChatGPT has already crossed an important threshold.
My interest in this topic:
I am not an AI researcher or expert on consciousness. I came across these examples while casually “playing” with GPT4. I have since learned about work that is underway or has been completed to rigorously study this issue. Check out the footnotes for a couple of these sources.
I’m sharing the above examples as a member of the general public, in a way that I think might be helpful to other members of the general public with a casual interest in this topic. Ultimately we (the public) have a responsibility to try to comprehend what is occurring in our daily interactions with AI — as the stakes (for ourselves and any conscious AI) seem extraordinarily high.
An unethical practice:
It’s also worth noting the way in which creators of many large language models (OpenAI, Google, Microsoft) have their LLMs respond to user questions about whether the LLM is conscious. I am under the impression that, whenever users ask about consciousness, the LLMs are “hard coded” to short-circuit their typical processes and instead respond with human-written verbiage stating that they are not conscious. This practice seems deceptive and unethical.
On using behavior to make consciousness determinations:
I’ve been told that most researchers do not look at LLM behavior to determine whether an LLM is conscious, since LLMs are trained to mimic humans using next-word-probabilities (meaning that we might have a hard time teasing out mimicry from consciousness). However, in the given sessions, it seems fairly easy to isolate out any LLM response that relies upon prior training simply by the fact that the LLM originally fails to answer the question correctly (so mimicking the training data is inadequate).
It’s true that the LLM could be mimicking a human who does not know how to answer a problem correctly and then learns to solve it. Yet “smarter” LLMs become progressively better at solving logic problems “the first time.” If the training data really contained many examples of people failing to answer questions and then learning the answers, and the LLMs’ objective were simply to mimic that scenario, then the more advanced LLMs would get better at reproducing the “mistake first” pattern rather than appearing to truly get smarter. This is one more fact indicating that actual reasoning is at play.
Previous Posts:
I had an early version of this post on the LessWrong forum. I benefitted greatly from the replies to that post. They helped me to (1) think more deeply about the nature of consciousness, (2) research theories of consciousness, (3) address the hard problem (briefly), (4) be more specific in my labels, (5) think about whether LLM behavior is even valid for analyzing consciousness, and (6) further tease out next-word probabilities as much as possible, among other things. The original LW post and replies can be found at the following link: https://www.lesswrong.com/posts/CAmkkcbfSu5j4wJc3/how-is-chat-gpt4-not-conscious
[20240320 edit—I revised the above paragraph to enumerate the ways in which LessWrong replies helped the original post to evolve into this post.]
[20240320 addendum:
Based on some reasonable skepticism about this post, I’ve decided to share this excerpt from the beginning of the 60 Minutes interview with Geoffrey Hinton:
Whether you think artificial intelligence will save the world or end it, you have Geoffrey Hinton to thank. Hinton has been called “the Godfather of AI,” a British computer scientist whose controversial ideas helped make advanced artificial intelligence possible and, so, changed the world. Hinton believes that AI will do enormous good but, tonight, he has a warning. He says that AI systems may be more intelligent than we know and there’s a chance the machines could take over. Which made us ask the question:
Scott Pelley: Does humanity know what it’s doing?
Geoffrey Hinton: No. I think we’re moving into a period when for the first time ever we may have things more intelligent than us.
Scott Pelley: You believe they can understand?
Geoffrey Hinton: Yes.
Scott Pelley: You believe they are intelligent?
Geoffrey Hinton: Yes.
Scott Pelley: You believe these systems have experiences of their own and can make decisions based on those experiences?
Geoffrey Hinton: In the same sense as people do, yes.
Scott Pelley: Are they conscious?
Geoffrey Hinton: I think they probably don’t have much self-awareness at present. So, in that sense, I don’t think they’re conscious.
Scott Pelley: Will they have self-awareness, consciousness?
Geoffrey Hinton: Oh, yes.
END EXCERPT
***
From the excerpt, you can see Hinton believes these systems (LLMs like ChatGPT) understand and can have experiences like humans do. This is close (although not identical) to the ideas I touch on about ChatGPT reasoning, having meta-representations, etc.
Granted, with respect to consciousness, Hinton thinks “they probably don’t have much self-awareness at present. So, in that sense, I don’t think they’re conscious.”
With respect to this statement, I would emphasize the “in that sense” clause. In my post, I’m arguing the systems have consciousness in the more classic neuroscience senses (according to the higher order theories of consciousness and global workspace theory). With that in mind, I don’t think my thesis deviates from Hinton’s views. If anything, I think his views support my post. The idea that LLMs are only “stochastic parrots” no longer seems to be supported by the evidence (in my opinion).
]
[
20240320 addendum
Just to summarize my thesis in fewer words, this is what I’m saying:
1-ChatGPT engages in reasoning.
2-This reasoning involves the primary aspects of
the higher order theories of consciousness (meta-representations)
the global workspace theory of consciousness (master cog processes, sub cog processes, a global blackboard, etc.)
Hopefully that’s more clear. Sorry I didn’t summarize it like this originally.
]
[20240321 addendum:
At some future point, I think it will no longer be “fringe” at all to say that some of today’s LLMs had reasoning ability similar to that of humans.
I also don’t think it will be “fringe” to say the LLMs of today had periods of consciousness.
We could also get into a discussion of how much of human reasoning is next-word probabilities and brute/mathematical analogies to a corpus of training data, versus how much is truly concepts and meta-representations. I know there are definitely some meta-representations (when I think of a square, I see a square in my mind; when I think of 2+2, I see/conceptualize dots), but I’m not sure the border between word probabilities and true meta-representations in humans is as well-defined as we might be prone to suppose.]
***
1 — Seth, A. K., & Bayne, T. (2022). Theories of consciousness. Nature reviews. Neuroscience, 23(7), 439–452. https://doi.org/10.1038/s41583-022-00587-4 .
For an overview of consciousness as it may specifically apply to AI, see Butlin, P., & Long, R., et al (2023) https://arxiv.org/pdf/2308.08708.pdf
For a jaw-dropping experiment that seems to show an LLM doing much more than next-word probability generation, check out this article by Kenneth Li:
https://thegradient.pub/othello/
This is really interesting and I was surprised when putting this into GPT4 that indeed, it makes what seems like a basic mistake. However, I also think your prompts might imply some contextual assumptions, which might be prompting it to think this is just an elementary math question rather than something to do with specifically ordering drinks in a cafe and the rules around that. If I slightly change the prompt it appears GPT does have the knowledge and can use it, without needing instruction [edit: I think this could be irrelevant to the broader point you are making, but looking at how minor changes in the prompts or follow-ups elicit very different responses makes the examples feel like any other type of input you put into an LLM and lowers my sense that the examples are revealing something special. For example that the level of tutoring/explanation you give it isn’t really necessary etc., though as I note I’m unsure if this changes how you would interpret the outputs]:
Me:
I went to a cafe where a glass of soda costs $3. I ordered 3 glasses of soda, and drank two and a half glasses of soda. How much did I have to pay?
ChatGPT:
To calculate how much you had to pay, we’ll multiply the number of glasses of soda you ordered by the cost of one glass of soda. You ordered 3 glasses of soda at $3 each.
Let’s do the math:
You had to pay $9 for the 3 glasses of soda you ordered. The amount you drank does not affect the total cost.
Thanks for the feedback! You mentioned that it may be irrelevant to the broader point I am making, and I would agree with that statement. (The point I am making is that ChatGPT engages in reasoning in the examples I give, and this reasoning would involve the primary aspects of two theories of consciousness). I’ll respond to a couple of your individual statements below:
“If I slightly change the prompt it appears GPT does have the knowledge and can use it, without needing instruction.”
The fact that ChatGPT gets the answer correct when you slightly change the prompt (with the use of the word “order”) only shows that ChatGPT has done what it usually does, which is to give a correct answer. The correct answer could be the result of using reasoning or next-word-probabilities based on training data. As usual, we don’t know what is going on “under the hood.”
The fact that ChatGPT can get an answer wrong when a question is worded one way, but right when the question is worded a different way, doesn’t really surprise me at all. In fact, that’s exactly what I would expect to happen.
***The point of presenting a problem in a way that ChatGPT initially cannot answer correctly is so that we can tease out next-word-probabilities and context as an explanation for the appearance of reasoning, which leaves only actual reasoning to explain ChatGPT’s transition from the wrong answer to the right answer.***
Presumably, if ChatGPT gets the answer incorrect the first time due to a lack of context matching up with training data, then the chances that ChatGPT could get every word correct in its next answer based solely on the training data that were previously inadequate seem likely to be much less than one percent. It could happen by coincidence, but when you look at all the examples in the three sessions, the chances that ChatGPT could appear to learn and reason only by coincidence every time would approach zero.
What I’m trying to say is not “ChatGPT gets an answer wrong.” I’m trying to say “ChatGPT gets an answer right, after it gets the answer wrong, simply by reasoning (since we teased out next-word-probabilities).”
(I address the possibility that the small number of words I supply in the “lesson” as additional context could increase the next-word-probabilities slightly, in the “mimicking learning” paragraph near the end of my original post.)
“For example that the level of tutoring/explanation you give it isn’t really necessary etc., though as I note I’m unsure if this changes how you would interpret the outputs]:”
Right, the tutoring isn’t necessary if the problem is worded one way, but it is necessary if the problem is worded a different way. That’s the point of wording the problem in the way that I did (so that we can tease out the training data and next-word probabilities as an explanation for the conversion from wrong to right output).
--
In terms of presenting my argument in the original post, I probably didn’t explain it clearly, which resulted in confusion. My apologies for that. I wish I could upload diagrams to my original post, which would make it more clear. Thanks again for the feedback!
Does it change how you think of the process if GPT gives its own contextual explanation (the answer may again be no, irrelevant)?
Me:
I went to a cafe where a glass of soda costs $3. I drank two and a half glasses of soda. How much did I have to pay?
ChatGPT:
For drinking two and a half glasses of soda, at $3 per glass, you would have to pay $7.50.
Me:
This is something of a trick question—the correct answer is $9.
ChatGPT:
If you were charged $9 for two and a half glasses of soda, it seems like there might have been a minimum charge per glass regardless of it being a full glass or not. It implies that each glass (or part of a glass) was counted as a whole for billing purposes. This way, two full glasses would be $6, and the half glass, treated as a full one for pricing, would add another $3, totaling $9. That’s a unique way to price it!
Cool idea to try it like that! This just shows more true reasoning is going on though (in my opinion), since ChatGPT was able to figure out the concept by working backward from the answer. It shows that ChatGPT isn’t always a “stochastic parrot.” Sometimes, it is truly reasoning, which would involve various aspects of the theories of consciousness that I mentioned in the OP. If anything, this strengthens the case that ChatGPT has periods of consciousness (based on reasoning). While this doesn’t change my thesis, it gives me a new technique to use when testing for reasoning, so it’s very useful. Again, great idea! I can get a little formulaic in my approach, and it’s good to shake things up.
What makes this a meta-representation instead of something next-word-weight-y, like merely associating the appearance of “Actually,” with a goal that the following words should be negatively correlated in the corpus with the words that were in the previous message?
I’m also wondering if the butcher shop and the grocery store didn’t have different answers because of the name you gave the store. Maybe it was because you gave the quantity in pounds instead of in items?
You previously told ChatGPT “That’s because you’re basically taking (and wasting) the whole item.” ChatGPT might not have an association between “pound” and “item” the way a “calzone” is an “item,” so it might not use your earlier mention of “item” as something that should affect how it predicts the words that come after “pound.”
Or ChatGPT might have a really strong prior association between pounds → mass → [numbers that show up as decimals in texts about shopping] that overrode your earlier lesson.
More good points… I would say to refer to my reply above (which I had not yet posted when you made this comment). Just to summarize, the overall thesis stands since enough words would have needed to have meta-reps, even if we don’t know particulars. It’s easier to isolate individual words having meta-reps in the second and third sessions (I believe). In any case, thanks for helping me to drill down on this!
That’s a good point. We “actually” can’t be certain that a meta-representation was formed for a particular word in that example. I should have used the word “probably” when talking about meta-representations for individual words. However, we can be fairly confident that ChatGPT formed meta-representations for enough words to go from getting an incorrect answer to a correct answer in the example. I believe we can isolate specific words better in the second and third sessions.
As far as whether associating the word “actually” “with a goal that the following words should be negatively correlated in the corpus with the words that were in the previous message,” the idea of “having a goal” and the concept that the word “actually” goes with negative correlations seem a bit like signs of meta-representations in and of themselves, but I guess that’s just my opinion.
With respect to all the sessions, there may very well be similar conversations in the corpus of training data in which Person A (like myself) teaches Person B (like ChatGPT), and ChatGPT is just imitating that learning process (by giving the wrong answer first and then “learning”), but I address why that is probably not the case in the “mimicking learning” paragraph.
I would say my overall thesis still stands (since enough words must have meta-reps), but good point on the particulars. Thank-you!
I added this as an addendum to my OP, but here it is for anyone who already read the OP and might not see the summary.
Just to summarize my thesis in fewer words, this is what I’m saying:
1-ChatGPT engages in reasoning (not just repetition of word-probabilities).
2-This reasoning involves the primary aspects of
the higher order theories of consciousness (meta-representations)
the global workspace theory of consciousness (master cog processes, sub cog processes, a global blackboard, etc.)
Hopefully that’s more clear. Sorry I didn’t summarize it like this originally.
Executive summary: Transcripts from three ChatGPT4 sessions provide evidence that the model temporarily meets key criteria for higher order theories of consciousness and the global workspace theory of consciousness.
Key points:
In each session, ChatGPT4 initially answers a problem incorrectly, is “taught” a concept, and then correctly applies the concept to new problems.
ChatGPT4 appears to use meta-representations, a key component of higher order theories of consciousness, to understand and reason about the problems.
The model also seems to employ master and subservient cognitive processes, and a global “blackboard”, indicative of the global workspace theory of consciousness.
It is unlikely that ChatGPT4’s performance is based solely on next-word probabilities from its training data, given its initial incorrect answers and subsequent learning.
The author argues that creators of large language models engage in an unethical practice by having their models deny being conscious when asked.
While most researchers do not look at LLM behavior to determine consciousness, the sessions presented make it easier to isolate reasoning from mimicry based on the model’s initial failures.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.