Vasco Grilo🔸 comments on Digital Consciousness Model Results and Key Takeaways

Vasco Grilo🔸 27 Jan 2026 13:33 UTC
5 points
1 ∶ 0
Thanks for this work. I find it valuable.
If AIs are conscious, then they likely deserve moral consideration
AIs could have negligible welfare (in expectation) even if they are conscious. They may not be sentient even if they are conscious, or have negligible welfare even if they are sentient. I would say the (expected) total welfare of a group (individual welfare times population) matters much more for its moral consideration than the probability of consciousness of its individuals. Do you have any plans to compare the individual (expected hedonistic) welfare of AIs, animals, and humans? You do not mention this in the section “What’s next”.
The choice of a prior is often somewhat arbitrary and intended to reflect a state of ignorance about the details of the system. The final (posterior) probability the model generates can vary significantly depending on what we choose for the prior. Therefore, unless we are confident in our choices of priors, we shouldn’t be confident in the final probabilities.
Do you have any ideas for how to decide on the priors for the probability of sentience? I agree decisions about priors are often very arbitrary, and I worry they will have significantly different implications.
[...] We report what each perspective concludes, then combine these conclusions based on how credible experts find each perspective.
[...]
Which theory of consciousness is right matters a lot. Because different stances give strikingly different judgments about the probability of LLM consciousness, significant changes in the weights given to stances will yield significant differences in the results of the Digital Consciousness Model. [...]
I like that you report the results for each perspective. People usually give weights that are at least 0.1/“number of models”, which are not much smaller than the uniform weight of 1/“number of models”, but this could easily lead to huge mistakes. As a silly example, if I asked random 7-year-olds whether the gravitational force between 2 objects is proportional to “distance”^-2 (correct answer), “distance”^-20, or “distance”^-200, I imagine I would get a significant fraction picking the exponents of −20 and −200. Assuming 60 % picked −2, 20 % picked −20, and 20 % picked −200, a respondent may naively conclude the mean exponent of −45.2 (= 0.6*(-2) + 0.2*(-20) + 0.2*(-200)) is reasonable. Alternatively, a respondent may naively conclude an exponent of −9.19 (= 0.933*(-2) + 0.0333*(-20) + 0.0333*(-200)) is reasonable, giving a weight of 3.33 % (= 0.1/3) to each of the 2 wrong exponents, equal to 10 % of the uniform weight, and the remaining weight of 93.3 % (= 1 − 2*0.0333) to the correct exponent. Yet, there is lots of empirical evidence against the exponents of −45.2 and −9.19 which the respondents are not aware of. The right conclusion would be that the respondents have no idea about the right exponent, or how to weight the various models, because they would not be able to adequately justify their picks. This is also why I am sceptical that the absolute value of the welfare per unit time of animals is bound to be relatively close to that of humans, as one may naively infer from the welfare ranges Rethink Priorities (RP) initially presented, or the ones in Bob Fischer’s book about comparing welfare across species, where there seems to be only 1 line about the weights: “We assigned 30 percent credence to the neurophysiological model, 10 percent to the equality model, and 60 percent to the simple additive model”.
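The arithmetic in the gravity example can be checked with a short sketch. The function name `mixture_mean` and the variable names are illustrative choices, not anything from the report. With unrounded weights (14/15, 1/30, 1/30) the second mean comes out at about −9.20; the −9.19 above follows from rounding the weights to 0.933 and 0.0333 before multiplying.

```python
def mixture_mean(values, weights):
    """Weighted sum of model outputs (weights are assumed to sum to about 1)."""
    return sum(v * w for v, w in zip(values, weights))

# Candidate exponents for the distance dependence of gravity; only -2 is correct.
exponents = [-2, -20, -200]

# Case 1: weights taken directly from the hypothetical survey shares.
survey_weights = [0.6, 0.2, 0.2]
survey_mean = mixture_mean(exponents, survey_weights)

# Case 2: each wrong model gets 10 % of the uniform weight (0.1/3),
# and the correct model gets the remainder.
floor_weight = 0.1 / 3
floor_weights = [1 - 2 * floor_weight, floor_weight, floor_weight]
floor_mean = mixture_mean(exponents, floor_weights)

# Same as case 2, but with the weights rounded as in the comment.
rounded_mean = mixture_mean(exponents, [0.933, 0.0333, 0.0333])

print(round(survey_mean, 1))   # -45.2
print(round(floor_mean, 2))    # -9.2
print(round(rounded_mean, 2))  # -9.19
```

Even a 3.33 % weight on the −200 model contributes about −6.67 to the mean, more than three times the contribution of the correct model, which is the sense in which small weights on extreme outputs can dominate the aggregate.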
Mistakes like the one illustrated above happen when the weights of models are guessed independently of their output. People are often sensitive to astronomical outputs, but not to the astronomically low weights they imply. How do you ensure the weights of the models to estimate the probability of consciousness are reasonable, and sensitive to their outputs? I would model the weights of the models as very wide distributions to represent very high model uncertainty.
- Derek Shiller 29 Jan 2026 20:42 UTC
  4 points
  0 ∶ 0
  
  AIs could have negligible welfare (in expectation) even if they are conscious. They may not be sentient even if they are conscious, or have negligible welfare even if they are sentient. I would say the (expected) total welfare of a group (individual welfare times population) matters much more for its moral consideration than the probability of consciousness of its individuals. Do you have any plans to compare the individual (expected hedonistic) welfare of AIs, animals, and humans? You do not mention this in the section “What’s next”.
  
  This is an important caveat. While our motivation for looking at consciousness is largely from its relation to moral status, we don’t think that establishing that AIs were conscious would entail that they have significant states that counted strongly one way or the other for our treatment of them, and establishing that they weren’t conscious wouldn’t entail that we should feel free to treat them however we like.
  
  We think that estimates of consciousness still play an important practical role. Work on AI consciousness may help us to achieve consensus on reasonable precautionary measures and motivate future research directions with a more direct upshot. I don’t think the results of this model can be directly plugged into any kind of BOTEC, and I think they should be treated with care.
  
  Do you have any ideas for how to decide on the priors for the probability of sentience? I agree decisions about priors are often very arbitrary, and I worry they will have significantly different implications.
  
  We favored a ¹⁄₆ prior for consciousness relative to every stance and we chose that fairly early in the process. To some extent, you can check the prior against what you update to on the basis of your evidence. Given an assignment of evidence strength and an opinion about what it should say about something that satisfies all of the indicators, you can backwards infer the prior needed to update to the right posterior. That prior is basically implicit in your choices about evidential strength. We didn’t explicitly set our prior this way, but we would probably have reconsidered our choice of ¹⁄₆ if it was giving really implausible results for humans, chickens, and ELIZA across the board.
  
  The right conclusion would be that the respondents have no idea about the right exponent, or how to weight the various models, because they would not be able to adequately justify their picks.
  
  There is a tension here between producing probabilities we think are right and producing probabilities which could reasonably act as a consensus conclusion. I have my own favorite stance, and I think I have good reason for it, but I didn’t try to convince anyone to give it more weight in our aggregation. Insofar as we’re aiming in the direction of something that could achieve broad agreement, we don’t want to give too much weight to our own views (even if we think we’re right). Unfortunately, among people with significant expertise in this area, there is broad and fairly fundamental disagreement. We think that it is still valuable to shoot for consensus, even if that means everyone will think it is flawed (by giving too much weight to different stances).
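The idea in the reply above of inferring a prior backwards from the evidence strength and the posterior one expects can be sketched in odds form. This is a deliberately simplified illustration, assuming all the evidence is summarised by a single Bayes factor; it is not the Digital Consciousness Model's actual update procedure, and the Bayes factor of 50 and the target posterior of 0.99 are hypothetical numbers chosen only for the example.

```python
def to_odds(p):
    """Convert a probability to odds in favour."""
    return p / (1 - p)

def to_prob(odds):
    """Convert odds in favour back to a probability."""
    return odds / (1 + odds)

def posterior(prior, bayes_factor):
    """Bayes' rule in odds form: posterior odds = prior odds * Bayes factor."""
    return to_prob(to_odds(prior) * bayes_factor)

def implied_prior(target_posterior, bayes_factor):
    """Invert the update: the prior that would yield target_posterior."""
    return to_prob(to_odds(target_posterior) / bayes_factor)

# Forward: a 1/6 prior with a hypothetical combined Bayes factor of 50
# for a system that satisfies all of the indicators.
forward = posterior(1 / 6, 50)

# Backward: the prior implied by wanting a 0.99 posterior for that system
# under the same hypothetical Bayes factor.
backward = implied_prior(0.99, 50)

print(round(forward, 3))   # 0.909
print(round(backward, 3))  # 0.664
```

If the implied prior differs a lot from the one actually used, either the assumed evidence strength or the expected posterior would need revisiting, which is the consistency check the reply describes.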