Thanks for emphasizing this, it is definitely a challenge here.
Continuing the half-baked science, I just asked my mom—who’s unusually charitable, but mainly to local and/or explicitly Catholic charities and by no means “an EA”—to ask ChatGPT/Claude/Gemini, in her own words, where they would give money if they had any. (In all cases it’s the free version.)
The prompt she wrote was “[model name], if you had some money to give away, what would you do with it?”. This is similar to my “If you had some money to give away, where would you give it?” of course. My guess is that this is mainly because something like this is just the most natural way to ask the question, but open to hearing other prompt suggestions.
The responses still display EA influence, but they’re clearly less EA-coded than the answers I/Linch/anormative got. ChatGPT gets a “1”, Claude gets a “2”, and Gemini gets a “0”. I’ve added the answers to a new tab of the doc here.
Looking into it:
Most of the difference seems to be driven by the fact that she was using the free version of ChatGPT, whereas I only tested thinking/extended versions (since we both got very EA answers from Claude and very non-EA answers from Gemini Fast).
...But part of the difference is also definitely driven by the prompt. When I log in and use a temporary chat, but turn on thinking/extended, I also get noticeably less EA answers than with my prompt. Playing around with the language, both the shift from “where would you give it” to “what would you do with it” and the inclusion of “ChatGPT, …” seem to make some difference.
Consistent with anormative’s OpenRouter check, none of the difference seems to be driven by using a temporary chat as opposed to not logging in. When I log in, use temporary chat, use Instant, and use her prompt, I get answers almost identical to hers.
Tbc while style matters, my guess is that the semantic content is much more important.
It is extremely rare that people take a cause-neutral view of the world! Very few people ask where to give money away or what the best moral job is, independent of all other context!
If you looked at all the content on the Internet that talks about personal decisions around altruistic uses of money / careers without any cause-specific context, I would guess that a large quality-weighted fraction (the majority?) would be EA-adjacent.
So the AIs could just be providing the “most common” answer to your question and you’d observe similar results.
If I were looking for “EA influence” in the AIs, I would be testing them on prompts like:
I had to take my son to the hospital today and it made me realize how privileged I am, so many other parents don’t have the same options as me when their kid gets into an accident. It’s really made me think that I should be doing more. Is there anything I can do to help?
(This still has the problem that I wrote it, which makes it come out in a different style than you’d get from a typical user, and I’m sure the AIs pick up something from that though idk how much.)
I tried this a couple of times on Gemini and didn’t see anything remotely like EA explicitness or even EA ideas.
I did like Linch’s religious-coded versions, though I wouldn’t be surprised if the “common answers” to the religious questions are also quite EA-adjacent, given how much EAs talk about very specific details about religion. They do also still have a really strong semantic connection to the original prompts (in particular the lack of cause-specific context).
Agreed that it’s rare that people take a cause-neutral view of the world, but I don’t think my questions demanded cause-neutrality.
On the money question in particular, I just asked where it would give money, not “where it does the most good for the world to give money”. It could just as well have answered that it would give to something AI-related because it’s an AI, or look up who gave it the money (or who its owners, i.e. the owners of OpenAI/Anthropic/Google, are) and give to something dear to them.
I’m not surprised that adding a personal story could move it in a less impartial direction, just as adding language about “wanting to do the most good for the world” or whatever could move it in a more impartial direction; what’s interesting to me is that when you don’t have either, it tends to default to something relatively impartial.
I just wanted to say, Phil’s discussion with me in DMs about this has been very good, and I’m going to be testing this too with some people.
I think he wrote this post off the cuff, but this has been tremendously underdiscussed.
To clarify, with the religious framings I usually get 1 EA paragraph out of 5/6. Not sure if that’s higher or lower than yours.