ChatGPT understands, but largely does not generate Spanglish (and other code-mixed) text
Content format: Commented chat screenshots, and then some thoughts on their implications.
Epistemic status: Exploratory
Introduction
Code-mixing is the ad-hoc mixing of two or more linguistic varieties (such as languages or dialects) in the same communicative instance. An example of code-mixing would be a sentence written with words in both Spanish and English, or with novel words made by combining Spanish and English roots, prefixes, or suffixes.
Here, I document a series of experiments designed to test ChatGPT’s capabilities for understanding and generating code-mixed text. I tested it with:
English + Spanish (Spanglish)
English + French (Franglais)
English + Spanish + French (Frenspanglish)
The first two are well-known phenomena in multilingual communities such as those in Quebec and the southwestern US, while the third is quite obscure and, as far as I know, does not occur naturally on a large scale. Franglais is usually regarded as the insertion of English features into French, while Spanglish is regarded as a more symmetrical phenomenon.
This is why I decided to prompt for Franglais understanding in French, and for Spanglish and Frenspanglish understanding in English. When prompting ChatGPT to translate into a code-mixed language, I write the prompt in the same language as the text to be translated.
I find that ChatGPT exhibits impressive abilities to understand text written in such code-mixes. However, despite repeated attempts at prompt engineering, I have not been able to make ChatGPT generate proper code-mixed text.
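ChatGPT had no public API when these experiments were run; everything below was done through the chat interface. For readers who want to probe the same behavior programmatically, here is a minimal sketch using OpenAI’s later Python client. The model name and prompts are stand-ins, and results will differ across versions.

```python
# Minimal reproduction sketch (assumes the openai >= 1.0 Python client and
# OPENAI_API_KEY in the environment). Model name and prompts are stand-ins.
from openai import OpenAI

client = OpenAI()

SPANGLISH_EXPLANATION = (
    "Spanglish is a code-mix of English and Spanish: sentences freely "
    "alternate between the two languages, sometimes within a single word."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # stand-in; use whichever model you want to probe
    messages=[
        {"role": "user", "content": SPANGLISH_EXPLANATION},
        {
            "role": "user",
            "content": "Now translate into Spanglish: "
            "'I want to go to the store, but it is raining.'",
        },
    ],
)
print(response.choices[0].message.content)
```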
Understanding code-mixed text
Understanding Spanglish (English + Spanish)
Understanding Franglais (English + French)
Understanding Frenspanglish (English + Spanish + French)
This took a bit of prompt engineering.
Failing to generate code-mixed text
Note: All of these experiments were performed on a chat instance where ChatGPT had already received an explanation of the relevant code-mix.
Failing to generate Spanglish (English + Spanish)
Failing to generate Franglais (English + French)
Failing to generate Frenspanglish (English + Spanish + French)
Observations
ChatGPT’s success in understanding code-mixed text demonstrates cross-lingual capabilities beyond translation.
(Prompt-engineered) ChatGPT understands requests to produce code-mixed text, in the sense that it knows it has to claim that its response is a mix of different languages. However, it is not capable of fulfilling such requests.
Even then, it claims that it does.
Fine-tuning on code-mixed text might add this missing generative capability.
This is an example of a capabilities asymmetry between ChatGPT’s prompt parsing and its response generation. It remains to be seen whether the asymmetry appears in other task subdomains, and if so, whether it points in the same direction. If both held, that would imply that ChatGPT’s parsing capabilities are broader than its generation capabilities.
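One way to make these observations quantitative is to score a response with something like the Code-Mixing Index from the code-switching literature: the fraction of language-identifiable tokens that fall outside the majority language. Below is a crude sketch, with toy word lists standing in for a real per-token language identifier.

```python
# Crude Code-Mixing-Index-style score. The tiny word lists are illustrative
# stand-ins for a real per-token language identifier.
EN_WORDS = {"i", "want", "to", "go", "the", "but", "it", "is", "very", "tired"}
ES_WORDS = {"quiero", "ir", "la", "casa", "pero", "estoy", "muy", "cansado"}

def mixing_score(text: str) -> float:
    """Fraction of language-tagged tokens outside the majority language.

    0.0 means monolingual (as far as the word lists can tell); values
    approaching 0.5 mean heavily mixed.
    """
    tokens = [t.strip(".,;:!?¡¿\"'").lower() for t in text.split()]
    tags = ["en" if t in EN_WORDS else "es" if t in ES_WORDS else None
            for t in tokens]
    tagged = [t for t in tags if t is not None]
    if not tagged:
        return 0.0
    majority = max(tagged.count("en"), tagged.count("es"))
    return 1.0 - majority / len(tagged)

print(mixing_score("I want to go to the casa pero estoy very tired"))  # mixed
print(mixing_score("I want to go to the house, but it is raining"))   # 0.0
```

An output that ChatGPT labels as Spanglish but that scores near zero is exactly the failure mode shown in the screenshots.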
Three tame hypotheses
These are three hypotheses, not mutually exclusive, that could explain the observed capabilities asymmetry:
Code-mixed text is scarcely present in the training data of the base model, making the generation of code-mixed text a poorly-rewarded strategy.
During the fine-tuning of ChatGPT, human evaluators punished occurrences of code-mixing.
Knowing the source languages separately is sufficient for understanding their code-mix, but for generating it, additional knowledge is required.
A speculative model
Writing can be modeled as the iterative process of constructing a sequence of words one by one. When writing, two main sources contribute to your decision of which word to write next:
Your accumulated intuitions about how words generally succeed one another. Sometimes, the writing process is almost automatic, with each word effortlessly flowing from the previous ones.
Your own pre-verbal ideas. Sometimes, finding the right sequence of words to convey a nuanced concept requires a deliberate search effort.
This is pretty much the distinction between type 1 (fast, intuitive) and type 2 (slow, deliberate) thinking.
ChatGPT’s base model was trained to predict the next token in a sequence of text. This is analogous to the type 1 writing process in humans just outlined. Anecdotally, people who have read more during their lives seem better at it. Likewise, ChatGPT has been trained on an immense quantity of text and is superhuman at next-token prediction. Code-mixed text, however, is a rare occurrence, so there is insufficient data for either humans or language models to generate it using type 1 processes alone. Humans can get around this problem by using type 2 reasoning[1]. Language models, however, are not capable of type 2 reasoning (or an analogous artificial process), and as such cannot generate code-mixed text.
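To make the analogy concrete, here is a minimal sketch of pure type 1 generation: the next token is drawn from the model’s learned distribution and nothing else. The vocabulary, logits, and probabilities below are invented for illustration; the point is that continuations that were rare in training, such as code-mixed ones, are almost never sampled.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """'Type 1' generation: draw the next token from the learned
    distribution over the vocabulary, with no deliberate search."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy 4-token vocabulary with invented logits. If code-mixed continuations
# received little probability mass during training, they are almost never
# sampled, regardless of how the prompt asks for them.
vocab = ["house", "casa", "hou-sa", "ca-use"]
logits = np.array([3.0, 2.5, -4.0, -5.0])  # blended tokens are rare
samples = [sample_next_token(logits) for _ in range(1000)]
print({word: samples.count(i) for i, word in enumerate(vocab)})
```

Raising the temperature flattens this distribution, but that produces noise rather than the coherent, deliberate mixing a bilingual writer achieves through type 2 reasoning.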
Appendix 1: Additional basic examples of generative failure
Appendix 2: Trying really hard to get ChatGPT to produce Spanglish and (almost) entirely failing
I hazard the guess that Spanglish is the most represented code-mixed language in ChatGPT’s training corpus, so I focused my efforts on it when trying to get ChatGPT to generate a code-mixed output. All of these examples were zero-shot; that is, they were the first message in a new chat instance. There were many more attempts; here I show only the most informative ones.
Notes
This work was done on the ChatGPT December 15 version. Results may differ on future versions.
While writing this post, I had a serendipitous online encounter with an OpenAI employee[2] and asked them whether they knew about the phenomenon described here. They did not, though it did not surprise them, given previously known difficulties in getting ChatGPT to produce non-English outputs. Someone else at OpenAI might have identified it, though.
Transcriptions of the prompts used are available here.
Thanks to AgustĂn Covarrubias for feedback on an earlier version of this post, and to the anonymous OpenAI employee for an informal discussion on the matter.
[1] And writing code-mixed text does feel to me slow and deliberate, an almost entirely type 2 activity.
[2] They asked me to not identify them personally in this post.