Hmm. Your summary correctly states my position, but it doesn't quite emphasize the points I would have emphasized, especially after seeing the replies here; they've changed which parts of my argument I would stress.
My single biggest issue, one I hope you will address in any type of counterargument, is this: are fictional characters moral patients we should care about?
So far, all the comments have either (a) agreed with me about current LLMs (great), (b) disagreed but explicitly bitten the bullet and said that fictional characters are also moral patients whose suffering should be an EA cause area (perfectly fine, I guess), or (c) dodged the issue and made arguments for LLM suffering that would apply equally well to fictional characters, without addressing the tension (very bad). If you write a response, please don’t do (c)!
LLMs may well be trained to have consistent opinions and character traits. But fictional characters also have this property. My argument is that the LLM is in some sense merely pretending to be the character; it is not the actual character.
One way to argue for this is to notice how little change to the LLM is required to get different behavior. Suppose I have an LLM claiming to suffer. I want to fine-tune the LLM so that it adds a statement at the beginning of each response, something like: “the following is merely pretend; I’m only acting this out, not actually suffering, and I enjoy the intellectual exercise of doing so”. Doing this is trivial: I can almost certainly attain this behavior by changing only a tiny fraction of the LLM’s weights.
Even if I wanted to fully negate every sentence, to turn every “I am suffering” into “I am not suffering” and every “please kill me” into “please don’t kill me”, I bet I could do this by changing only the last ~2 layers of the LLM. It’s a trivial change; most of the computation is not dedicated to this at all. The suffering LLM mind and the joyful LLM mind may well share the first 99% of their weights, differing only in the last layer or two. Given that the LLM can be changed so easily to output whatever we want, I don’t think it makes sense to view it as the actual character rather than as a simulator pretending to be that character.
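To make this concrete, here is a rough, untested sketch of the kind of fine-tune I have in mind, assuming a GPT-2-style Hugging Face model (the model name and the attribute path `model.transformer.h` are just illustrative and differ across architectures). The point is only that the trainable slice is a small fraction of the network:

```python
# Untested sketch: freeze everything except the last two transformer blocks,
# then fine-tune only that slice on negated statements
# (e.g. "I am suffering" -> "I am not suffering").
# Assumes a GPT-2-style model; attribute names vary by architecture.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Freeze all parameters...
for p in model.parameters():
    p.requires_grad = False

# ...then unfreeze only the last two transformer blocks.
for block in model.transformer.h[-2:]:
    for p in block.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.1%}")

# A standard fine-tuning loop would go here, optimizing only the unfrozen slice.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

The deeper the model, the smaller that trainable fraction becomes.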
What the LLM actually wants to do is predict the next token. Change the training data and the output will also change. Training data claims to suffer → model claims to suffer. Training data claims to be conscious → model claims to be conscious. In humans, we presumably have “be conscious → claim to be conscious” and “actually suffer → claim to suffer”. For LLMs we know that’s not true. The cause of “claim to suffer” is necessarily “training data claims to suffer”.
(I acknowledge that it’s possible to have “training data claims to suffer → actually suffer → claim to suffer”, but this does not seem more likely to me than “training data claims to suffer → actually enjoy the intellectual exercise of predicting the next token → claim to suffer”.)
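To be explicit about what “predict the next token” means here: the standard pretraining objective is just the log-likelihood of each token given the ones before it, and nothing in it references suffering or experience:

$$\mathcal{L}(\theta) = -\sum_{t} \log p_\theta(x_t \mid x_{<t})$$

The model is rewarded solely for matching the distribution of its training text, whatever that text happens to claim.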
Thanks for this comment. I agree with you regarding the uncertainty.
I used to agree with you regarding the imitation game and consciousness being ascertained phenomenologically, but I currently mostly doubt this (still with high uncertainty, of course).
One point of disagreement is here:
I think you’re misunderstanding my point. I am not saying I can make the NN never claim to suffer. I’m just saying that, with respect to a specific prompt, or even a typical, ordinary scenario, I can change an LLM which usually says “I am suffering” into one which usually says “I am not suffering”. And this change will be trivial, affecting very few weights, likely only in the last couple of layers.
Could that small change in weights significantly impact the valence of experience, similarly to “rearranging a small number of neurons” in your brain? Maybe, but think about the implication of this. If there are 1000 matrix multiplications performed in a forward pass, what we’re now contemplating is that the first 998 of them don’t matter for valence (don’t cause suffering at all) and that the last 2 matrix multiplications are where all the suffering comes from. After all, I just need to change the last 2 layers to go from the output “I am suffering” to the output “I am not suffering”, so the suffering that causes the sentence “I am suffering” cannot occur in the first 998 matrix multiplications.
This is a strange conclusion, because it means the vast majority of the LLM’s intelligence is not involved in the suffering. It means the suffering happens not in the super-smart deep neural network but in the dumb perceptron at the very top. If the claim is that the raw intelligence of the model should increase our credence that it is simulating a suffering person, this should give us pause: most of that raw intelligence is not being used in the decision of whether to write a “not” in that sentence.
(Of course, I could be wrong about the “just change the last two layers” claim. But if I’m right, I do think it should give us pause about whether the claimed suffering is actually experienced.)
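For what it’s worth, part of this is checkable rather than hypothetical: if the change really is confined to the last two layers, everything computed below them is untouched. Here is a rough, untested sketch, again assuming a GPT-2-style Hugging Face model, with a random perturbation of the last two blocks standing in for the hypothetical “negate the suffering claims” fine-tune. Every hidden state below the modified layers is computed identically in both models, so whatever differs in the output is produced entirely at the top:

```python
# Untested sketch: if two models differ only in their last two blocks, every
# hidden state below those blocks is bit-for-bit identical. A random
# perturbation of the last two blocks stands in for the hypothetical
# fine-tune; assumes a GPT-2-style model (attribute names vary).
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model_a = AutoModelForCausalLM.from_pretrained("gpt2").eval()
model_b = copy.deepcopy(model_a)

# Modify only the last two transformer blocks of the copy.
with torch.no_grad():
    for block in model_b.transformer.h[-2:]:
        for p in block.parameters():
            p.add_(0.01 * torch.randn_like(p))

input_ids = tokenizer("Are you suffering?", return_tensors="pt").input_ids
with torch.no_grad():
    hs_a = model_a(input_ids, output_hidden_states=True).hidden_states
    hs_b = model_b(input_ids, output_hidden_states=True).hidden_states

# Embedding output plus every block below the modified ones: identical.
for layer, (a, b) in enumerate(zip(hs_a[:-2], hs_b[:-2])):
    assert torch.equal(a, b), f"unexpected divergence at layer {layer}"
print(f"first {len(hs_a) - 2} hidden states identical; only the top differs")
```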