Are there any concrete reasons to suspect that language models will start to act more like consequentialists the better they get at modelling them? I think I’m asking something subtle, so let me rephrase. This is probably a very basic question; I’m just confused about it.
If an LLM is smart enough to give us a robust, step-by-step plan that covers everything with regard to solving alignment and steering the future to where we want it, are there concrete reasons to expect it to also apply a similar level of unconstrained causal reasoning with respect to its own loss function or evolved proxies?
At the moment, I can’t settle this either way, so it’s cause for both worry and optimism.
From my current understanding of LLMs, they do not have the capability to reason or have a will as of now. I know there are plans to see whether specific built-in prompts can make this possible, but the way the models are built at the moment, they do not have an understanding of what they are writing.
Aside from my understanding of the underlying workings of GPT-4, an example that illustrates this is that sometimes, if you ask GPT-4 a question it doesn’t know the precise answer to, it will “hallucinate”, meaning it will give a confident answer that is factually incorrect / not based on its training data. It doesn’t “understand” your question; it is trained on a lot of text, and based on the text you give it, it generates some other text that is likely a good response, to put it very simply.
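To make “generates some other text that is likely a good response” a bit more concrete, here is a toy sketch of next-token sampling. Everything in it (the vocabulary, the prompt, the scores) is made up for illustration; a real LLM computes the scores with a large neural network rather than looking them up in a table.

```python
import numpy as np

# Toy stand-in for a language model: hard-coded scores ("logits") over a tiny
# vocabulary for one fixed prompt. A real LLM computes these scores from the
# prompt with a transformer; the sampling step below is the same in spirit.
vocab = ["Paris", "London", "kittens", "is"]
fake_logits = {"The capital of France is": np.array([9.0, 4.0, 0.5, 0.1])}

def next_token(prompt: str) -> str:
    logits = fake_logits[prompt]
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax: scores -> probabilities
    return np.random.choice(vocab, p=probs)        # sample the next token

print(next_token("The capital of France is"))      # almost always "Paris", but not guaranteed
```

Note that nothing in this procedure checks whether the sampled token is true; it only has to be likely given the preceding text, which is one (very simplified) way to see where confident-but-wrong answers can come from.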
You could make an argument that even the people at OpenAI don’t truly know why GPT-4 gives the answers that it does, since it’s pretty much a black box that is trained on a preset dataset, after which OpenAI adds some human feedback. To quote from their website:

> So when prompted with a question, the base model can respond in a wide variety of ways that might be far from a user’s intent. To align it with the user’s intent within guardrails, we fine-tune the model’s behavior using reinforcement learning with human feedback (RLHF).
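For a very rough sense of what “fine-tune the model’s behavior using reinforcement learning with human feedback” can mean mechanically, here is a heavily simplified toy sketch: a policy over three canned responses, with hand-written reward scores standing in for a learned reward model. Real RLHF works on a full language model and, as far as I understand it, uses a learned reward model, PPO, and a penalty for drifting too far from the base model; none of that appears here, and all the numbers are made up.

```python
import numpy as np

responses = ["helpful answer", "rude answer", "made-up answer"]
reward = np.array([1.0, -1.0, -0.5])  # stand-in for human preference ratings
logits = np.zeros(3)                  # the "policy" starts out indifferent

rng = np.random.default_rng(0)
for _ in range(200):
    probs = np.exp(logits) / np.exp(logits).sum()
    i = rng.choice(3, p=probs)        # sample a response from the current policy
    # REINFORCE-style update: raise the log-probability of the sampled
    # response in proportion to the reward it received.
    grad_log_prob = -probs
    grad_log_prob[i] += 1.0
    logits += 0.1 * reward[i] * grad_log_prob

final = np.exp(logits) / np.exp(logits).sum()
print({r: round(float(p), 2) for r, p in zip(responses, final)})  # mass shifts to "helpful answer"
```

The point of the sketch is just that the feedback shapes which kinds of responses become more probable; it doesn’t give the model new knowledge.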
So as of now, if I understand your question correctly, there is no evidence that I’m aware of that would point towards these LLMs “applying” anything; they are totally reliant on the input they are given and don’t learn significantly beyond their training data.
Thanks for the reply! I don’t think the fact that they hallucinate is necessarily indicative of limited capabilities. I’m not worried about how dumb they are at their dumbest, but how smart they are at their smartest. Same with humans lol.
Though, for now, I still struggle with getting GPT-4 to be creative. But this could be because of its habit of sticking to its training data, rather than because it’s too dumb to come up with creative plans. …I remember when I was in school, I didn’t much care for classes, but I studied math on my own. If my reward function hasn’t been attuned to whatever tests other people have designed for me, I’m just not going to try very hard.
Maybe to explain in a bit more detail what I meant with the example of hallucinating: rather than showcasing its limitations, it’s showcasing its lack of understanding.
For example, if you ask a human something and they’re honest about it, then when they don’t know something they will not make something up, but just tell you the information they have and that beyond that they don’t know.
In the hallucinating case, by contrast, the AI doesn’t say that it doesn’t know something (which it does do often enough, by the way); it doesn’t understand that it doesn’t know, and just comes up with something “random”.
So I meant to say that its hallucinating showcases its lack of understanding.
I have to say, though, that I can’t really be sure why it hallucinates; that’s just my best guess. Also, for creativity there is something you can do with prompt engineering, but in the end you’re limited by the training data plus the maximum number of tokens you can input for it to learn context from.
Hmm, I have a different take. I think if I tried to predict as many tokens as possible in response to a particular question, I would say all the words that I could guess someone who knew the answer would say, and then just blank out the actual answer because I couldn’t predict it.
Ah, you want to know about the Riemann hypothesis? Yes, I can explain to you what this hypothesis is, because I know it well. Wise of you to ask me in particular, because you certainly wouldn’t ask anyone who you knew didn’t have a clue. I will state its precise definition as follows:
~Kittens on the rooftop they sang nya nya nya.~
And that, you see, is the hypothesis that Riemann hypothesised.
I’m not very good at even pretending to pretend to know what it is, so even if you blanked out the middle, you could still guess I was making it up. But if you blank out the substantive parts of GPT’s answer when it’s confabulating, you’ll have a hard time telling whether it knows the answer or not. It’s just good at what it does.