What to think when a language model tells you it’s sentient
I enjoyed this excerpt and the pointer to the interview, thanks. It might be helpful to say in the post who Jim Davies is.
That may be right—an alternative would be to taboo the word in the post, and just explain that they are going to use people with an independent, objective track record of being good at reasoning under uncertainty.
Of course, some people might be (wrongly, imo) skeptical of even that notion, but I suppose there's only so much one can do to get everyone on board. It's a tricky balance of making it accessible to outsiders while still just saying what you believe about how the contest should work.
I think that the post should explain briefly, or even just link to, what a "superforecaster" is. And, if possible, explain how and why this serves as an independent check.
The superforecaster panel is imo a credible signal of good faith, but people outside of the community may think “superforecasters” just means something arbitrary and/or weird and/or made up by FTX.
(The post links to Tetlock’s book, but not in the context of explaining the panel)
I think you mean “schisms”
You write,
Those who do see philosophical zombies as possible don’t have a clear idea of how consciousness relates to the brain, but they do think...that consciousness is something more than just the functions of the brain. In their view, a digital person (an uploaded human mind which runs on software) may act like a conscious human, and even tell you all about its ‘conscious experience’, but it is possible that it is in fact empty of experience.
It’s consistent to think that p-zombies are possible but to think that, given the laws of nature, digital people would be conscious. David Chalmers is someone who argues for both views.
It might be useful to clarify that the questions of
(a) whether philosophical zombies are metaphysically possible (and the closely related question of physicalism about consciousness)
is actually somewhat orthogonal to the question of
(b) whether uploads that are functionally isomorphic to humans would be conscious
David Chalmers thinks that philosophical zombies are metaphysically possible, and that consciousness is not identical to the physical. But he also argues that, given the laws of nature in this world, uploaded minds, of sufficiently fine-grained functional equivalence to human minds, that act and talk like conscious humans would be conscious. In fact, he’s the originator of the ‘fading qualia’ argument that Holden appeals to in his post.
On the other side, Ned Block thinks that zombies are not possible, and is a physicalist. But he also thinks that only biologically instantiated minds can be conscious.
Here’s Chalmers (2010) on the distinction between the two issues:
I have occasionally encountered puzzlement that someone with my own property dualist views (or even that someone who thinks that there is a significant hard problem of consciousness) should be sympathetic to machine consciousness. But the question of whether the physical correlates of consciousness are biological or functional is largely orthogonal to the question of whether consciousness is identical to or distinct from its physical correlates. It is hard to see why the view that consciousness is restricted to creatures with our biology should be more in the spirit of property dualism! In any case, much of what follows is neutral on questions about materialism and dualism.
You might be interested in this LessWrong shortform post by Harri Besceli, “The best and worst experiences you had last week probably happened when you were dreaming.” Including a comment from gwern.
Thanks for the post! Wanted to flag a typo: “ To easily adapt to performing complex and difficult math problems, Minerva has That’s not to say that Minerva is an AGI—it clearly isn’t.”
Well, I looked it up and found a free pdf, and it turns out that Searle does consider this counterargument.
Why is it so important that the system be capable of consciousness? Why isn’t appropriate behavior enough? Of course for many purposes it is enough. If the computer can fly airplanes, drive cars, and win at chess, who cares if it is totally nonconscious? But if we are worried about a maliciously motivated superintelligence destroying us, then it is important that the malicious motivation should be real. Without consciousness, there is no possibility of its being real.
But I find the arguments that he then gives in support of this claim quite unconvincing, or at least I don't understand exactly what the argument is supposed to be. Notice that Searle's argument is based on comparing a spell-checking program on a laptop with human cognition. He claims that reflecting on the difference between the human and the program establishes that it would never make sense to attribute psychological states to any computational system at all. But that comparison doesn't seem to show that at all.
And it certainly doesn’t show, as Searle thinks it does, that computers could never have the “motivation” to pursue misaligned goals, in the sense that Bostrom needs to establish that powerful AGI could be dangerous.
I should say—while Searle is not my favorite writer on these topics, I think these sorts of questions at the intersection of phil mind and AI are quite important and interesting, and it's cool that you are thinking about them. (Then again, I *would* think that, given my background.) And it's important to scrutinize the philosophical assumptions (if any) behind AI risk arguments.
Feedback: I find the logo mildly unsettling. I think it triggers my face detector, and I see sharp teeth. A bit like the Radiohead logo.
On the other hand, maybe this is just a sign of some deep unwellness in my brain. Still, if even a small percentage of people get this feeling from the logo, could be worth reconsidering.
Since the article is paywalled, it may be helpful to excerpt the key parts or say what you think Searle’s argument is. I imagine the trivial inconvenience of having to register will prevent a lot of people from checking it out.
I read that article a while ago, but can't remember exactly what it says. To the extent that it is rehashing Searle's arguments that AIs, no matter how sophisticated their behavior, necessarily lack understanding / intentionality / something like that, then I think that Searle's arguments are just not that relevant to work on AI alignment.
Basically, I think what Chalmers says in his paper "The Singularity: A Philosophical Analysis" is right:
As for the Searle and Block objections, these rely on the thesis that even if a system duplicates our behavior, it might be missing important “internal” aspects of mentality: consciousness, understanding, intentionality, and so on. Later in the paper, I will advocate the view that if a system in our world duplicates not only our outputs but our internal computational structure, then it will duplicate the important internal aspects of mentality too. For present purposes, though, we can set aside these objections by stipulating that for the purposes of the argument, intelligence is to be measured wholly in terms of behavior and behavioral dispositions, where behavior is construed operationally in terms of the physical outputs that a system produces. The conclusion that there will be AI++ in this sense is still strong enough to be interesting. If there are systems that produce apparently superintelligent outputs, then whether or not these systems are truly conscious or intelligent, they will have a transformative impact on the rest of the world. (emph mine)
Just wanted to say that I really appreciated this post. As someone who followed the campaign with interest, but not super closely, I found it very informative about the campaign. And it covered all of the key questions I have been vaguely wondering about re: EAs running for office.
opinionated (per its title) and non-comprehensive, but “Key questions about artificial sentience: an opinionated introduction” by me:
I work at Trajan House and I wanted to comment on this:
But a great office gives people the freedom to not worry about what they need for work, a warm environment in which they feel welcome and more productive, and supports them in ways they did not think were necessary.
By these metrics, Trajan House is a really great office! I’m so grateful for the work that Jonathan and the other operations staff do. It definitely makes me happier and more productive.
Trajan House in 2022 is a thriving hub of work, conversation, and fun.
Leverage just released a working paper, “On Intention Research”. From the post:
Starting in 2017, some of Leverage’s psychology researchers stumbled across unusual effects relating to the importance and power of subtle nonverbal communication. Initially, researchers began by attempting to understand and replicate some surprising effects caused by practitioners in traditions like bodywork and energy healing. Over time researchers investigated a wide range of phenomena in subtle nonverbal communication and developed an explanation for these phenomena according to which one’s expectations about what will happen (one’s intentions) in part determine what information is communicated and received nonverbally. This area of research is known as “intention research.”
Those involved in intention research report encountering phenomena that they found quite surprising and challenging to explain. Their findings led many of Leverage’s psychology researchers to conclude that nonverbal communication is at least as expressive and psychologically central as verbal communication. Unfortunately, it also led to some negative psychological and psychosomatic effects and contributed to a significant increase in social tension at Leverage prior to its dissolution in 2019.
This research report describes what intention research was, why researchers pursued it, what they discovered, and the historical antecedents for these discoveries. The piece concludes with a discussion of the risks and challenges associated with further research.
Thanks for the comment! I agree with the thrust of this comment.
Learning more and thinking more clearly about the implementation of computation in general, and neural computation in particular, is perennially on my intellectual to-do list.
We don’t want to allow just any arbitrary gerrymandered states to count as an adequate implementation of consciousness’s functional roles
maybe the neurons printed on each page aren’t doing enough causal work in generating the next edition
I agree with the way you've formulated the problem, and the possible solution—I'm guessing that an adequate theory of implementation deals with both of them: some condition about there being the right kind of "reliable, counterfactual-supporting connection between the states" (that quote is from Chalmers' take on these issues).
But I have not yet figured out how to think about these things to my satisfaction.
Some past examples that come to mind. Kudos to all of the people mentioned for trying ambitious things and writing up the retrospectives:
Not strictly speaking "EA", but in an early effort, folks in the rationality community started an evidence-based medicine organization called MetaMed.
Zvi Mowshowitz’s post-mortem: https://thezvi.wordpress.com/2015/06/30/the-thing-and-the-symbolic-representation-of-the-thing/
Sarah Constantin’s post-mortem: https://docs.google.com/document/d/1HzZd3jsG9YMU4DqHc62mMqKWtRer_KqFpiaeN-Q1rlI/edit
Michael Plant has a post-mortem of his mental health app, Hippo.
Looking around, I also found this list:
Some other posts are the Good Technology Project's postmortem and a postmortem of a mental health app by Michael Plant; organisations discuss their learnings in retrospectives (like Fish Welfare Initiative) or in posts announcing decisions to shut down (like Students for High Impact Charities). In the Rationalist community, there was the Arbital postmortem. You can see more examples under the Forum's postmortems and retrospectives tag, and examples from the LessWrong community under their analogous tag.
Key questions about artificial sentience: an opinionated guide
Thanks for writing this! Your work sounds super interesting. You write, “ But you could be rewarded by the euphoric sense of revelation. Some of that sense may even be authentic; most of it will be fool’s gold.” What are some times you got that euphoric sense in your research for HLI?
Thank you!