If AIs are conscious, then they likely deserve moral consideration
AIs could have negligible welfare (in expectation) even if they are conscious. They may not be sentient even if they are conscious, or have negligible welfare even if they are sentient. I would say the (expected) total welfare of a group (individual welfare times population) matters much more for its moral consideration than the probability of consciousness of its individuals. Do you have any plans to compare the individual (expected hedonistic) welfare of AIs, animals, and humans? You do not mention this in the section “What’s next”.
The choice of a prior is often somewhat arbitrary and intended to reflect a state of ignorance about the details of the system. The final (posterior) probability the model generates can vary significantly depending on what we choose for the prior. Therefore, unless we are confident in our choices of priors, we shouldn’t be confident in the final probabilities.
Do you have any ideas for how to decide on the priors for the probability of sentience? I agree decisions about priors are often very arbitrary, and I worry they will have significantly different implications.
[...] We report what each perspective concludes, then combine these conclusions based on how credible experts find each perspective.
[...]
Which theory of consciousness is right matters a lot. Because different stances give strikingly different judgments about the probability of LLM consciousness, significant changes in the weights given to stances will yield significant differences in the results of the Digital Consciousness Model. [...]
I like that you report the results for each perspective. People usually give weights that are at least 0.1/“number of models”, which are not much smaller than the uniform weight of 1/“number of models”, but this could easily lead to huge mistakes. As a silly example, if I asked random 7-year-olds whether the gravitational force between 2 objects is proportional to “distance”^-2 (correct answer), “distance”^-20, or “distance”^-200, I imagine I would get a significant fraction picking the exponents of -20 and -200. Assuming 60 % picked -2, 20 % picked -20, and 20 % picked -200, a respondent may naively conclude the mean exponent of -45.2 (= 0.6*(-2) + 0.2*(-20) + 0.2*(-200)) is reasonable. Alternatively, a respondent may naively conclude an exponent of -9.19 (= 0.933*(-2) + 0.0333*(-20) + 0.0333*(-200)) is reasonable, giving a weight of 3.33 % (= 0.1/3) to each of the 2 wrong exponents, equal to 10 % of the uniform weight, and the remaining weight of 93.3 % (= 1 - 2*0.0333) to the correct exponent. Yet, there is lots of empirical evidence against the exponents of -45.2 and -9.19 which the respondents are not aware of. The right conclusion would be that the respondents have no idea about the right exponent, or how to weight the various models, because they would not be able to adequately justify their picks. This is also why I am sceptical that the absolute value of the welfare per unit time of animals is bound to be relatively close to that of humans, as one may naively infer from the welfare ranges Rethink Priorities (RP) initially presented, or the ones in Bob Fischer’s book about comparing welfare across species, where there seems to be only 1 line about the weights. “We assigned 30 percent credence to the neurophysiological model, 10 percent to the equality model, and 60 percent to the simple additive model”.
Mistakes like the one illustrated above happen when the weights of models are guessed independently of their output. People are often sensitive to astronomical outputs, but not to the astronomically low weights they imply. How do you ensure the weights of the models to estimate the probability of consciousness are reasonable, and sensitive to their outputs? I would model the weights of the models as very wide distributions to represent very high model uncertainty.
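For anyone who wants to check the arithmetic in the gravity example, here is a minimal sketch in Python. The function and variable names are mine and purely illustrative. With unrounded weights the second mean comes out at about -9.2 rather than -9.19, because the figure above uses weights rounded to 3 significant digits.

```python
def weighted_mean(values, weights):
    """Weighted average of `values`, normalising by the total weight."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Candidate exponents from the example: only -2 is correct.
exponents = [-2, -20, -200]

# Case 1: weights taken directly from the hypothetical survey shares.
survey_weights = [0.6, 0.2, 0.2]

# Case 2: each wrong exponent gets 10 % of the uniform weight (0.1/3),
# and the correct exponent gets the remainder.
floor = 0.1 / len(exponents)
floored_weights = [1 - 2 * floor, floor, floor]

print(weighted_mean(exponents, survey_weights))   # about -45.2
print(weighted_mean(exponents, floored_weights))  # about -9.2
```

Even a weight of 3.33 % on each wrong model moves the pooled exponent far from -2, because the outputs of the wrong models are so extreme.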
AIs could have negligible welfare (in expectation) even if they are conscious. They may not be sentient even if they are conscious, or have negligible welfare even if they are sentient. I would say the (expected) total welfare of a group (individual welfare times population) matters much more for its moral consideration than the probability of consciousness of its individuals. Do you have any plans to compare the individual (expected hedonistic) welfare of AIs, animals, and humans? You do not mention this in the section “What’s next”.
This is an important caveat. While our motivation for looking at consciousness is largely from its relation to moral status, we don’t think that establishing that AIs were conscious would entail that they have significant states that counted strongly one way or the other for our treatment of them, and establishing that they weren’t conscious wouldn’t entail that we should feel free to treat them however we like.
We think that estimates of consciousness still play an important practical role. Work on AI consciousness may help us to achieve consensus on reasonable precautionary measures and motivate future research directions with a more direct upshot. I don’t think the results of this model can be directly plugged into any kind of BOTEC, and they should be treated with care.
Do you have any ideas for how to decide on the priors for the probability of sentience? I agree decisions about priors are often very arbitrary, and I worry they will have significantly different implications.
We favored a 1/6 prior for consciousness relative to every stance and we chose that fairly early in the process. To some extent, you can check the prior against what you update to on the basis of your evidence. Given an assignment of evidence strength and an opinion about what it should say about something that satisfies all of the indicators, you can backwards infer the prior needed to update to the right posterior. That prior is basically implicit in your choices about evidential strength. We didn’t explicitly set our prior this way, but we would probably have reconsidered our choice of 1/6 if it was giving really implausible results for humans, chickens, and ELIZA across the board.
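The backwards inference described here can be sketched with Bayes' rule in odds form. This is a deliberately simplified illustration, not the actual model: it assumes all the evidence can be summarised by a single Bayes factor, and the factor of 50 is a made-up number chosen only for the example.

```python
def posterior_from_prior(prior, bayes_factor):
    """Update a prior probability with a single Bayes factor (odds form)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


def implied_prior(target_posterior, bayes_factor):
    """Infer the prior that would yield `target_posterior` under `bayes_factor`."""
    posterior_odds = target_posterior / (1 - target_posterior)
    prior_odds = posterior_odds / bayes_factor
    return prior_odds / (1 + prior_odds)


# Hypothetical evidence strength for a system satisfying all indicators.
hypothetical_bayes_factor = 50.0

# Forward: what does a 1/6 prior update to under that evidence?
print(posterior_from_prior(1 / 6, hypothetical_bayes_factor))

# Backward: what prior is implied if we think the posterior should be 0.99?
print(implied_prior(0.99, hypothetical_bayes_factor))
```

If the implied prior differs a lot from the one actually used, either the prior or the assumed evidence strength is probably off.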
The right conclusion would be that the respondents have no idea about the right exponent, or how to weight the various models, because they would not be able to adequately justify their picks.
There is a tension here between producing probabilities we think are right and producing probabilities which could reasonably act as a consensus conclusion. I have my own favorite stance, and I think I have good reason for it, but I didn’t try to convince anyone to give it more weight in our aggregation. Insofar as we’re aiming in the direction of something that could achieve broad agreement, we don’t want to give too much weight to our own views (even if we think we’re right). Unfortunately, among people with significant expertise in this area, there is broad and fairly fundamental disagreement. We think that it is still valuable to shoot for consensus, even if that means everyone will think it is flawed (by giving too much weight to different stances).
I have my own favorite stance, and I think I have good reason for it, but I didn’t try to convince anyone to give it more weight in our aggregation. Insofar as we’re aiming in the direction of something that could achieve broad agreement, we don’t want to give too much weight to our own views (even if we think we’re right).
To clarify, I do not have a view about which models should get more weight. I just think that, when results differ a lot across models, the top priority should be further research to decrease the uncertainty instead of acting based on a consensus view represented by best guesses for the weights of the models.
I would model the weights of the models as very wide distributions to represent very high model uncertainty.
In particular, I would model the weights of the stances as distributions instead of point estimates. As you note in the report, there was lots of variation across the 13 experts you surveyed
I wonder what exactly you asked the experts. I think the above would underestimate uncertainty if you just asked them to rate plausibility from 0 to 10, and there were experts reporting 0. Have you considered having a range of possible responses on a logarithmic scale ranging from a weight/probability of e.g. 10^-6 to 1?
Thanks, Vasco. And thanks for helping us think through what we can do better. Some thoughts on this:
We considered several framings, scales and options to give experts. Since they were evaluating a lot of stances and we wanted experts to really know what we meant, we prioritised giving them context and then asking them the simplified general question of plausibility, with an intuitive scale. The exact question was: “how plausible do you find X stance?”, just after fully describing X. We also asked them for general notes and comments and they didn’t seem to find that part of the survey particularly confusing (perhaps to your and my surprise). More broadly, I agree with you that sometimes perfectly defining terms and scales can help some people think through it but not everyone, and the science on how much it helps is mixed.
We didn’t find that people were responding with zero plausibility very much at all. As you can see from the results, almost all respondents found most, if not all, stances at least a little bit plausible. I agree that had we found a lot of concentration around the very high or very low plausibility, having some sort of logarithmic scale could help distinguish results.
I’m not sure what you have in mind in terms of modelling the stances’ weight as distributions instead of point estimates. Perhaps you mean something like leveraging those distributions above via some sort of Monte Carlo where weights are drawn from these distributions and the process is repeated many times, then aggregated. That indeed sounds more sophisticated and could possibly help track uncertainty but I suspect it would make very little difference. In particular, I think so because we observed that unweighted pooling of results across all stances is surprisingly similar to the pool when weighted by experts; the same if you squint.
We didn’t find that people were responding with zero plausibility very much at all.
I wonder how people decided between a plausibility of 0/10 and 1/10. It could be that people picked 0 for a plausibility lower than 0.5/10, or that they interpreted it as almost impossible, and therefore sometimes picked 1/10 even for a plausibility lower than 0.5/10. A logarithmic scale would allow experts to specify plausibilities much lower than 1/10 (e.g. 10^-6/10) without having to pick 0, although I do not know whether they would actually pick such values.
I’m not sure what you have in mind in terms of modelling the stances’ weight as distributions instead of point estimates. Perhaps you mean something like leveraging those distributions above via some sort of Monte Carlo where weights are drawn from these distributions and the process is repeated many times, then aggregated.
Yes, this is what I had in mind. Denoting by W_i and P_i the distributions for the weight and probability of consciousness for stance i, I would calculate the final distribution for the probability of consciousness from (W_1*P_1 + W_2*P_2 + … + W_13*P_13)/(W_1 + W_2 + … + W_13).
That indeed sounds more sophisticated and could possibly help track uncertainty but I suspect it would make very little difference. In particular, I think so because we observed that unweighted pooling of results across all stances is surprisingly similar to the pool when weighted by experts; the same if you squint.
I think the mean of the final distribution for the probability of consciousness would be very similar. However, the final distribution would be more spread out. I do not know how much more spread out it would be, but I agree it would help track uncertainty better.
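To make the Monte Carlo idea concrete, here is a rough sketch of the pooling formula with weights and probabilities drawn from distributions. Everything numeric in it is an assumption for illustration: it uses 3 stances instead of 13, made-up central values, lognormal weights, and beta-distributed probabilities. None of these choices come from the report.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical central values for 3 stances (the report has 13).
point_weights = [0.5, 0.3, 0.2]
point_probs = [0.05, 0.2, 0.6]


def pooled(weights, probs):
    """Weighted pool: sum(W_i * P_i) / sum(W_i)."""
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)


def draw_weight(central, sigma=1.0):
    """Lognormal draw whose median is `central` (assumed shape)."""
    return central * rng.lognormvariate(0.0, sigma)


def draw_prob(mean, concentration=10.0):
    """Beta draw with the given mean (assumed shape); stays within [0, 1]."""
    return rng.betavariate(mean * concentration, (1 - mean) * concentration)


samples = [
    pooled(
        [draw_weight(w) for w in point_weights],
        [draw_prob(p) for p in point_probs],
    )
    for _ in range(10_000)
]

point_estimate = pooled(point_weights, point_probs)
print("point estimate:", point_estimate)
print("Monte Carlo mean:", statistics.mean(samples))
print("Monte Carlo stdev:", statistics.stdev(samples))
print("5th and 95th percentiles:",
      statistics.quantiles(samples, n=20)[0],
      statistics.quantiles(samples, n=20)[-1])
```

The point estimate is a single number, whereas the samples give a whole distribution, so the percentiles show how much the pooled probability moves when the weights themselves are uncertain. How wide that spread is depends entirely on the assumed distributions.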
Thanks for this work. I find it valuable.
Thanks, Derek.
Thanks for clarifying, Arvo.