To successfully reason in the way it did, ChatGPT would have needed a meta-representation for the word “actually,” in order to understand that its prior answer was incorrect.
What makes this a meta-representation instead of something next-word-weight-y, like merely associating the appearance of “Actually,” with a goal that the following words should be negatively correlated in the corpus with the words that were in the previous message?
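To make the "next-word-weight-y" alternative concrete, here is a toy sketch. It is purely illustrative: the rule, the candidate tokens, and the numbers are all made up, and this is not a claim about how ChatGPT is actually implemented. It just shows how a purely associative mechanism could suppress continuations that repeat the previous message after "Actually," with nothing that looks like a representation *about* the earlier answer being wrong.

```python
# Toy illustration (not how ChatGPT works): a purely associative
# "next-word weighting" rule in which the token "Actually," simply
# penalizes candidate continuations that appeared in the previous
# message, with no meta-representation of that message being *wrong*.

def rescore_candidates(candidate_weights, previous_message_tokens,
                       contradiction_penalty=0.5):
    """Down-weight candidates that overlap with the previous message.

    candidate_weights: hypothetical base weights for candidate next tokens.
    previous_message_tokens: tokens from the message being "corrected".
    contradiction_penalty: how strongly "Actually," suppresses repeats.
    """
    previous = set(previous_message_tokens)
    return {
        token: weight * (contradiction_penalty if token in previous else 1.0)
        for token, weight in candidate_weights.items()
    }

# Hypothetical numbers, purely for illustration.
base_weights = {"correct": 0.2, "incorrect": 0.3, "3": 0.4, "2.5": 0.1}
previous_message = ["the", "answer", "is", "2.5"]

print(rescore_candidates(base_weights, previous_message))
# "2.5" is suppressed simply because it appeared in the previous message;
# no representation about the earlier answer is involved.
```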
I’m also wondering whether the butcher shop and the grocery store gave different answers not because of the name you gave the store, but because you gave the quantity in pounds instead of in items.
You previously told ChatGPT “That’s because you’re basically taking (and wasting) the whole item.” ChatGPT might not have an association between “pound” and “item” the way a “calzone” is an “item,” so it might not use your earlier mention of “item” as something that should affect how it predicts the words that come after “pound.”
Or ChatGPT might have a really strong prior association between pounds → mass → [numbers that show up as decimals in texts about shopping] that overrode your earlier lesson.
More good points… I’d refer you to my reply above (which I had not yet posted when you made this comment). To summarize: the overall thesis stands, since enough words would have needed to have meta-reps, even if we don’t know the particulars. I believe it’s easier to isolate individual words as having meta-reps in the second and third sessions. In any case, thanks for helping me drill down on this!
That’s a good point. We “actually” can’t be certain that a meta-representation was formed for a particular word in that example. I should have used the word “probably” when talking about meta-representations for individual words. However, we can be fairly confident that ChatGPT formed meta-representations for enough words to go from getting an incorrect answer to a correct answer in the example. I believe we can isolate specific words better in the second and third sessions.
As for associating the word “actually” “with a goal that the following words should be negatively correlated in the corpus with the words that were in the previous message”: the idea of “having a goal,” and the concept that the word “actually” goes with negative correlations, seem a bit like signs of meta-representations in and of themselves, but I guess that’s just my opinion.
With respect to all the sessions, there may very well be similar conversations in the corpus of training data in which Person A (like myself) teaches Person B (like ChatGPT), and ChatGPT is just imitating that learning process (by giving the wrong answer first and then “learning”), but I address why that is probably not the case in the “mimicking learning” paragraph.
I would say my overall thesis still stands (since enough words must have meta-reps), but good point on the particulars. Thank you!