An EA used deceptive messaging to advance her project; we need mechanisms to avoid deontologically dubious plans
TL;DR: good-hearted EAs lack mechanisms to avoid putting out information that misleads people. Holly Elmore organized a protest whose messaging centered on OpenAI changing their documents to “work with the Pentagon,” while OpenAI only collaborates with DARPA on open-source cybersecurity tools and is in talks with the Pentagon about veteran suicide prevention. Many participants of the protest weren’t aware of this; neither the protest announcement nor the press release mentioned it. People were misled into thinking OpenAI is working on military applications of AI. OpenAI still prohibits the use of their services to “harm people, develop weapons, for communications surveillance, or to injure others or destroy property”. If OpenAI wanted a contract with the Pentagon to work on something bad, they wouldn’t have needed to change the usage policies of their publicly available services and could’ve simply provided any services through separate agreements. Holly was warned in advance that the messaging could be misleading in this way, but she didn’t change it. I think this was deceptive, and some protest participants agreed with me. The community should notice this failure mode and implement something that would prevent unilateral decisions with bad consequences or noticeable violations of deontology.
See a comment from Holly in a footnote[1] and a post she published before I published this post. In both, she continued to ignore the core of what was deceptive: not the “charter” mistake, but the claim that OpenAI was starting to work with the Pentagon, made without the important context of the nature of that work. She further made false statements in the comments, which is easily demonstrable with the messages she sent me; I’m happy to share those on request[2], DM or email me.
The original version of this post didn’t mention Holly’s name and tried to point at a failure mode I wanted the community to fix. But since Holly commented on the post and it’s clear who it is about, and the situation hasn’t improved, the redactions were needlessly making the post more confusing, so I edited the post and removed them.
Correction: the post previously asserted[3] that a co-organiser of the protest is a high-status member of the EA community; people told me this might be misleading, as many in the community disagree with their strategies; I removed that wording.
Thanks to everyone who provided helpful comments and feedback.
Recently, a group of EA/EA-adjacent people announced a protest against OpenAI changing their charter to work with the Pentagon.
But OpenAI didn’t change its charter. They changed only the usage policy: a document that describes how their publicly available service can be used. The previous version of the policy would not have in any way prevented OpenAI from, say, working on developing weapons for the Pentagon.
The nature of the announced work was also, in my opinion, beneficial: they announced a collaboration with DARPA on open-source cybersecurity, and said they’re in talks with the Pentagon about helping prevent suicides among veterans.
This differs from what people understood from both the initial announcement to protest OpenAI changing its charter to work with the Pentagon and the subsequently changed version that corrected (and acknowledged) the “charter” mistake:
Join us and tell OpenAI “Stop working with the Pentagon!”
On January 10th, without any announcement, OpenAI deleted the language in its usage policy* that had stated that OpenAI doesn’t allow its models to be used for “activities that have a high chance of causing harm” such as “military and warfare”. Then, on January 17th, TIME reported that OpenAI would be taking the Pentagon as a client. On 2/12, we will demand that OpenAI end its relationship with the Pentagon and not take any military clients. If their ethical and safety boundaries can be revised out of convenience, they cannot be trusted.
AI is rapidly becoming more powerful, far faster than virtually any AI scientist has predicted. Billions are being poured into AI capabilities, and the results are staggering. New models are outperforming humans in many domains. As capabilities increase, so do the risks. Scientists are even warning that AI might end up destroying humanity.
According to their charter, “OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at all economically valuable work—benefits all of humanity.” But many humans value their work and find meaning in it, and hence do not want their jobs to be done by an AGI instead. What protest co-organizer [name] of [org] calls “the Psychological Threat” applies even if AGI doesn’t kill us.
*an earlier version of this description incorrectly referred to the usage policy as a “charter”
The usage policies still prohibit the use of their services to “harm people, develop weapons, for communications surveillance, or to injure others or destroy property.” It is technically true that OpenAI wants to have the Pentagon as a client: they collaborate with DARPA on open-source cybersecurity and are talking to the Pentagon about veteran suicide prevention. But I think that, even with “charter” changed to “the usage policy”, the resulting phrasing is deceptive: a reader gets an impression that diverges from reality in ways that make them disagree with OpenAI’s actions and more likely to come to the protest. People understand it to mean that previously OpenAI couldn’t work with the Pentagon because of the policy, but now can. Which is, as far as I’m aware, false. Previously, it wasn’t clear whether the Pentagon could sign up on the OpenAI website, just like everyone else, and use the publicly available service; but nothing would’ve prevented OpenAI from making an agreement with the Pentagon outside their public terms of service. (Also, the usage policies were mostly changed to increase readability.)
A housemate realized all of this after a short conversation and said they no longer wanted to attend the protest, as the messaging was misleading and they didn’t want to contribute to spreading it. (I told them it’s fine to go and protest[4] reckless OpenAI actions without necessarily supporting the central messaging of the protest organizers.) Three or four pro-Palestine activists attended the protest because they saw the announcement and decided to speak out against OpenAI working with the Pentagon on military uses of AI, which might be helping Israel. They possibly wouldn’t have come to the protest if they had known OpenAI wasn’t actually helping the Pentagon with weapons.
This is not normal. If you’re organizing a protest (or doing any public comms), you want people to be more likely to attend the protest (or support your message) if they become more aware of the details of the situation (such as OpenAI only working with DARPA on open-source cybersecurity tools and being in talks with the Pentagon about veteran suicide prevention, and still having “no use for weapons or causing harm to humans” in their public policies), not less likely.
If their ethical and safety boundaries can be revised out of convenience, they cannot be trusted.
These were not OpenAI’s ethical and safety boundaries; these were a part of the policy for their publicly available service. Whether or not OpenAI can, e.g., develop weapons is not affected by this change.
I think when the organisers become more aware of the details like that, they should change the protest’s message (or halt or postpone the protest: generally, it’s important to make sure people don’t decide to come for the wrong reasons, though in this case simply fixing the messaging to not be misleading would’ve been fine)[5].
Instead, the organisers edited the press release at the last moment before sending it out, replacing “charter”, and went on with the protest and IMO deceptive messaging.
Even the best-intending people are not perfect, often have some inertia, and aren’t able to make sure the messaging they put out isn’t misleading and fully propagate updates.
Not being able to look at previously made plans and revise them in light of new information seems bad. Spreading misleading messages to the media and the people following you seems very bad.
I ask the community to think about designing mechanisms to avoid these failure modes in the future.
I feel that stronger norms around unilateral actions could’ve helped (for example, if people in the community looked into this and suggested not to do this bad thing).
[1] A comment from a protest organiser
(Holly wrote it in third person.)
> This differs from what people understood from both the initial announcement to protest OpenAI changing its charter to work with the Pentagon and the subsequently changed version that corrected (and acknowledged) the “charter” mistake
Holly Elmore explains that she simply made a mistake when writing the press release weeks before the event. She quoted the charter early on when drafting it, and then, in a kind of word mistake that is unfortunately common for her, started using the word “charter” for both the actual charter and the usage policy document. It was unfortunately a semantic mistake, so proofreaders didn’t catch it. She also did this verbally in several places. She even kind of convinced herself from hearing her own mistaken language that OpenAI had violated a much more serious boundary– their actual guiding document– than they had. She was horrified when she discovered the mistake because it conveyed a significantly different meaning than the true story, and could have slandered OpenAI. She spent hours trying to track down every place she had said it and people who may have repeated it so it could be corrected. She told the protesters right away about her mistake and explained changing the usage policy is a lot less bad than changing the charter, but the protest was still on, as it had been before the military story arose as the “small ask” focus to the “big ask” of pausing AI.
My reply
I felt confused, first by her private messages and then by her comments, until I realised she simply didn’t understand the problem I’m talking about in this post. I’m concerned not about the “charter”, which was indeed an honest and corrected mistake, but about the final messaging. The message remained “OpenAI changed some important documents to enable them to work with the Pentagon”, which creates an impression different from reality: people think OpenAI did something to be able to work on military applications of AI. The day after the protest, I talked to three participants who were surprised to hear that OpenAI is only working on open-source cybersecurity tools and is in talks about veteran suicide prevention, and that the change of the usage policies didn’t impact OpenAI’s ability to separately work with the military. They agreed the messaging of the protest was misleading. (All three would’ve been happy to come to a more general protest about OpenAI’s recklessness in racing to develop AGI.) Some people told me that, initially, a more general protest against OpenAI was planned, and the more specific message emerged after the news of the change of policies. I would have no problem with the protest whatsoever and wouldn’t be writing this post if the protest had fallen back to the original, more general messaging that wasn’t misleading people about OpenAI’s relationship with the Pentagon. I encouraged people to attend despite the messaging of the protest being misleading, because it’s possible for people to attend and communicate something that creates a truthful impression despite what the announcement and the press release say.
The point that I want to make is that the community needs to design mechanisms to avoid unilateral actions leading to something deontologically bad, such as spreading messages the community knows are not truthful. Making decisions under pressure, avoiding motivated cognition, etc., are hard; without accepted mechanisms for coordination, fact-checks, strong norms around preventing deception, etc., we might do harm.
[2] My policy is usually not to share private messages, but these contain threats; if I get threats from someone, my policy is instead to publish or freely share the messages from that person.
[3] One of the protest organisers is a high-status member of the EA community with a lot of connections within it, who spoke about organising protests at the recent EAG conference. I expect many members of the EA community might feel like they shouldn’t criticise actions talked about in an EAG talk, even if they notice these actions are bad. (I no longer strongly expect that people would be intimidated to criticise actions in this case.)
[4]
Participating in a protest that’s happening anyway can be positive, although I wasn’t entirely sure; it also didn’t feel right to reduce the number of people going to a protest that someone with similar goals was organising.
To be clear, in general, protests can be great, and I wouldn’t be writing this post and trying to get the community to pay attention to this protest if not for the misleading messaging.
[5]
I think the protest went mostly well, except for the message in the announcement and the press release. Most people participating in it were doing a great job, and most signs weren’t clearly wrong. If she had fallen back to a more general message instead of keeping the misleading one, I wouldn’t be writing this post.
There’s a lot going on in Mikhail’s post. I am trying to figure out where the major disagreements actually lie. Here is my own tentative analysis:
If I (1) knew ~nothing about AI safety, (2) received a flyer with this language, and (3) later learned that the work with the Pentagon was only on open-source cybersecurity stuff +/- veteran suicide prevention, then I would feel that (4) the flyer was significantly misleading and (5) I would be unlikely to listen to that organization anymore.[1] I’d probably have a slightly more favorable opinion of OpenAI and have slightly fewer concerns about AI safety than if I had never received the flyer.
I find the paragraph significantly misleading for two reasons. First, (6) although there is a balance between conciseness and fidelity, the failure to disclose the nature of the Pentagon-related work is problematic under these circumstances. (7) Without such a disclosure, it is highly likely that an ordinary reader will assume the work is related to more traditional military activities than open-source cybersecurity. Second, (8) given the lack of specificity in “Pentagon as a client,” the implied contrast is with the prior sentence—sharpened by the use of “Then” at the start of the second sentence. The first sentence is about “high chance of causing harm” and “military and warfare” uses of OpenAI’s work. (9) The difference between “charter” and “usage policy” is not material to the analysis above.
(10) Strategically, I think it is important not to push the envelope with advocacy pieces that a reasonable reader could find to be misleading. If the reader concludes that both sides of an issue are full of “spin,” s/he is most likely to respond with inaction—which is generally to the benefit of the disputant holding more power. Also, the disputant with the bigger megaphone has an advantage in those circumstances, and pro-AI forces are loaded.
(11) I have no opinion on whether the flyer was “deceptive” or “deontologically bad” because (12) those judgments imply evaluation of the intent (or other culpable mental state) of the author.
From the discussion so far, I’m not sure how many of those points are the subject of disagreement.
Given (1), a large negative update on the org’s reliability and a choice not to invest further resources into a more accurate appraisal would make sense.
I think perhaps I had curse of knowledge on this because I did not think people would assume the work was combat or weapons-related. I did a lot of thinking about the issue before formulating it as the small ask and I was probably pretty out of touch with how a naive reader would interpret it. My commenter/proofreaders are also immersed in the issue and didn’t offer the feedback that people would misunderstand it. In other communications I mentioned that weapons were still something the models cannot be used for (to say, “how do we know the next change won’t be that they can work on weapons?”).
I appreciate you framing your analysis without speculating on my motives or intent. I feel chagrined at having miscommunicated but I don’t feel hurt and attacked. I appreciate the information.
(Edit- I updated towards them knowing the message is misleading, so this comment no longer correctly represents the level of my uncertainty.)
To be clear, in the post, I’m not saying that you, personally, tried to deceive people. I don’t mention your name or any of the orgs. I’m saying the community as a whole had knowledge of the truth but communicated something misleading. My goal with the post is to ask the community to come up with ways of avoiding misleading people. I did not wish for you or your reputation to be involved in any way you wouldn’t want.
But it’s not clear to me what happened, exactly, that led to the misleading messaging being preserved. At the end of January, I shared this with you: “I mean, possibly I don’t know English well and parse this incorrectly, but when I read “OpenAI would be taking the Pentagon as a client” in your tweet, it parses in that context for me as “OpenAI is taking Pentagon as a client to help with offensives”, which is not quite different from what we know”, “ Like, it’s feels different from saying “OpenAI says they’re collaborating with DARPA on cybersecurity. We don’t know whether they’re working with Pentagon on weapons now, but the governance structure wouldn’t stop them from it””, “Like, it doesn’t matter for people aware of the context, but for people without the context, this might be misleading”.
I shared these concerns, thinking that it might be helpful to point out how a message you wrote might be misleading, assuming you were unaware, and not worrying about spending too much time on it.
I’m also confused about the “small ask”: the protest announcement was pretty focused on the Pentagon.
(To be clear, I don’t want to be checking anyone’s future comms when this is not my job. I want the community to have mechanisms that don’t involve me. It seemed cheap to make something slightly better a couple of times, and so I talked to a founder of an org organising protests and they fixed some pretty bad claims, sent you these messages, etc., but I dislike the situation where the community doesn’t have mechanisms to systematically avoid mistakes like that.)
Would you consider replacing “deceptive” in the title (and probably other places) with “misleading”? The former word is usually understood as requiring intent to mislead, while misleading someone can be innocent or merely negligent. More pithily, deception requires a deceiver. As a quote from this book explains (Google excerpt below):
I think clarifying your language might help refocus the discussion in a more helpful direction, and avoid giving the reader the impression that you believe that misleading statements were made with intent.
I appreciate this suggestion and, until Mikhail commented below saying this was not the case, I thought it might be an English-as-a-second-language issue where he didn’t understand that “deception” indicates intent. I would have been placated if any accusation of intentionally creating false impressions were removed from the post.
(Edit- I changed the title to better reflect the post contents and what I currently think happened; this comment doesn’t represent my views on the current post title)
I’m honest in that I consider the community as a whole to have been putting out deceptive messages, which is why I chose the post title; and, separately, I’m honestly saying that while it’s not my main guess for what happened, I can’t rule out the possibility that you understood the messaging could’ve been misleading and still went on with it, accepting that some people can be misled and thinking it’s ok. Things I have heard from you but wouldn’t share publicly don’t inspire confidence.
(Edit- I changed the title to better reflect the post contents and what I currently think happened; this comment doesn’t represent my views on the current post title or my current estimates of what happened)
I believe there’s a chance that protest organisers understood their phrasing could potentially cause people to have an impression (OpenAI is working with the military) not informed by details (on cybersecurity and veteran suicide prevention) but kept the phrasing because they thought it suited their goals better.
I don’t expect them to have an active malicious intent (I don’t think they wanted to do something bad). But I think it’s likely enough they were acting deceptively. I consider the chance of them thinking it’s not “misleading” in a bad way and it’s alright to do if it suits the goals to be much higher than what I would be comfortable with. (And people who attended the protest told me that yeah, this is deceptive, after familiarising themselves with the details.)
The title of the post was chosen because I think that, regardless of the above, the community (as a whole/as an entity/agent) locally expected to benefit from people having this impression, while knowing it to be a misled impression, and went on with the messaging. Even if individual actors did not take decisions they knew would create impressions that they knew differed from reality, I think the community as a whole can still be deceptive, in the sense of having the ability to realise all that and prevent misleading messaging, but not having done so. I think the community should work on strengthening and maybe institutionalising this ability, with the goal of being able to prevent even actively deceptive messaging, where the authors might hope to achieve something good but violate deontology.
Sorry Mikhail, but this:
Is accusing someone in the community of deliberately lying, and you seem to equivocate on that in other comments. Even earlier in this thread you say to Holly that “To be clear, in the post, I’m not implying that you, personally, tried to deceive people.” But to me you quite obviously are implying that, even with caveats. To then go back and talk about how this refers to the community as a whole feels really off to me.
I know that I am very much a contextualiser instead of a decoupler[1] but using a term like ‘deception’ is not something that you can neatly carve out from its well-understood social meaning as referring to a person’s character and instead use it to talk about a social movement as an agent.
I’d very much suggest you heed Jason’s advice earlier in the thread.
At least in EA-space, I think I’m fairly average for the general population if not maybe more decoupling than average
The messaging wasn’t technically false; it was just misleading, while saying technically true things. I’m not sure why you’re using “lying” here.
I would guess you’d agree organisations can use deceptive messaging? I’m pretty sure communities can also do that, including with dynamics where a community is deceptive but all parts have some sort of deniability/intentionlessness.
I think it’d be sad to have to discuss the likelihood of specific people being deceptive. It would make their lives even worse. If people Google their names, they’re going to find this. And it’s not really going to help with the problem, so this is not what I want to focus on. I titled the post this way because the community acted deceptively, regardless of how deceptive its members were.
I’m saying what I think in the comments and in the post, while avoiding, to the extent possible, talking about my current views of specific people’s actions or revealing the messages/words sent/said to me in private by these people.
And I don’t at all understand why we are both exchanging comments focusing on the words (which I thought were pretty clear, and which I ran past a bunch of people just before publishing, including protest participants, who told me they agreed with what I wrote) instead of focusing on the problems raised in the post.
A friend advised me to provide the context that I had spent maybe 6 hours helping Mikhail with his moratorium-related project (a website that I was going over for clarity as a native English speaker) and perhaps an additional 8 hours over the last few months answering questions about the direction I had taken with the protests. Mikhail had a number of objections which required a lot of labor on my part to understand to his satisfaction, and he usually did not accept my answers when I gave them but continued to argue with me, either straight up or by insisting I didn’t really understand his argument or was contradicting myself somehow.
After enough of this, I did not think it was worth my time to engage further (EDIT: on the general topic of this post, protest messaging for 2/12— we continued to be friends and talk about other things), and I told him that I made my decisions and didn’t need any more of his input a few weeks before the 2/12 protest. He may have had useful info that I didn’t get out of him, and that’s a pity because there are a few things that I would absolutely have done differently if I had realized at the time (such as removing language that implied OpenAI was being hypocritical that didn’t apply when I realized we were only talking about the usage policies changing but which didn’t register to me as needing to be updated when I corrected the press release) but I would make the same call again about how to spend my time.
I will not be replying to replies on this comment.
Huh. There are false claims in your comment, which is easily verifiable. I’m happy to share the messages that show that with anyone interested; please DM or email me. I saved the comment to the Web Archive.
This is not true. She didn’t tell me anything like that a few weeks before the protest.
I couldn’t find any examples of this from before the protest, no matter how I interpret the messages we exchanged about her strategy etc.
The context:
We last spoke for a significant amount of time in October[1]. You mentioned you’d be down to do an LW dialogue.
After that, in November, you messaged me on Twitter, asking for the details of the OpenAI situation. I then messaged you on Christmas (nothing EA/AI related).
On January 31 (a few weeks before the protest), I shared my concerns about the messaging being potentially misleading. You asked for advice on how you should go about anticipating misunderstandings like that. You said that you won’t say anything technically false and asked what solutions I propose. Among more specific things, I mentioned that “It seems generally good to try to be maximally honest and non-deceptive”.
At or before this point, you didn’t tell me anything about “not needing more of my input”. And you didn’t tell me anything like “I made my decisions”.
On February 4th, I attended your EAG talk, and on February 5th, I messaged you that it was great and that I was glad you gave it.
Then, on February 6, you invited me to join your Twitter space (about this protest) and I got promoted to a speaker. I didn’t get a chance to share my thoughts about allying with people promoting locally invalid views, and I shared them publicly on Twitter and privately with you on Messenger, including a quote I was asked to not share publicly until an LW dialogue is up. We chatted a little bit about general goals and allying with people with different views. You told me, “You have a lot of opinions and I’d be happy to see you organize your own protests”. I replied, “Sure, I might if I think it’s useful:)” and after a message sharing my confusion regarding one of your previous messages, didn’t share any more “opinions”. This could be interpreted as indirectly telling me you didn’t need my input, but that was a week after I shared my concerns about the messaging being misleading, and you still didn’t tell me anything about having made your decisions.
A day before the protest, I shared a picture of a sign I wanted to bring to the protest, and asked, “Hey, is it alright if I print and show up to the protest with something like this?”, because I wouldn’t want to attend the protest if you weren’t ok with the sign. You replied that it was ok, shared some thoughts, and told me that you liked the sentiment as it supports the overall impression that you want, so I brought the sign to the protest.
After the protest, I shared the draft of this post with you. After a short conversation, you told me lots of things and blocked me, and I felt like I should expect retaliation if I published the post.
Please tell me if I’m missing something. If I’m not, you’re either continuing to be inattentive to the facts, or you’re directly lying.
I’d be happy to share all of our message exchanges with interested third parties, such as CEA Community Health, or share them publicly, if you agree to that.
Before that, as I can tell from Google Docs edits and comments, you definitely spent more than an hour on the https://moratorium.ai text back in August; I’m thankful for that; I accepted four of your edits and found some of your feedback helpful. We’ve also had 4 calls, mostly about moratorium.ai and the technical problem (and my impression was that you found these calls helpful), and went on a walk in Berkeley, mostly discussing strategy. Some of what you told me during the walk was concerning.
It seems to me that you’re not maintaining at least two hypotheses consistent with the data.
A hypothesis you do not seem to consider is that she did make an attempt at communicating “I made my decision and do not need more of your input”, and that you did not understand this message.
This hypothesis seems more probable to me than her straightforwardly saying a false thing, as there seem to be multiple similar misunderstandings of this sort between you.
Another misunderstanding example:
It seems to me that this quote points to another similar misunderstanding, and that it was this misunderstanding that led to a breakdown in communication initially.
You seem to be paying lip service to the “missing something” hypothesis, but framing this as an issue of someone deliberately lying is not cooperative with Holly in the world where you are in fact missing something.
Asking to share messages publicly or showing them to a third party seems to unnecessarily up the stakes. I’m not sure why you’re suggesting that.
I carefully looked through all of our messages (there weren’t too many) a couple of times, because that was pretty surprising and I considered it more likely that I didn’t remember something than that she was saying something directly false. But there’s nothing like that, and she’s, unfortunately, straightforwardly saying a false thing.
This is also something I couldn’t find any examples of before the protest, no matter how I interpret the messages we have exchanged about her strategy etc.
I’m >99% sure that I’m not missing anything. I included it because it’s not a mathematical truth, and because adding “Are you sure? What am I missing, if you are?” is more polite than just saying “this is clearly false and we both know it and any third party can verify the falsehood; why are you saying that? Is there some other platform we exchanged messages on that I somehow totally forgot about?”. It’s technically possible I’m just blind to something: I can imagine a conceivable universe where I’m wrong about this. But I’m confident she’s saying a straightforwardly false thing. I’d feel good about betting up to $20k at 99:1 odds on this.
Like, “missing something” isn’t a hypothesis with any probability mass, really, I’m including it because it is a part of my epistemic situation and seems nicer to include in the message.
When someone is confidently saying false things about the contents of the messages we exchanged, it seems reasonable to suggest publishing them or having a third party look at them. I’m not sure how it’s “upping the stakes”. It’s a natural thing to do.
This analysis roughly aligns with mine and is also why I didn’t go to this protest (but did go to a previous protest organized by Pause AI). This protest seemed to me like it overall communicated pretty deceptively around how OpenAI was handling its military relations, and also I don’t really see any reason to think that engaging with the military increases existential risk very much (at least I don’t see recent changes as an update on OpenAI causing more risk, and wouldn’t see reversing those changes as progress towards reducing existential risk).
I made a mistake—did you think something beyond the mistake was deceptive?
EDITED TO ADD: The accusation of “deception” is extremely hurtful and uncalled for and I really don’t appreciate it from you. I still can’t understand what Mikhail is getting at that was “deceptive” if he wasn’t referring to my mistake. He seems to think it was my responsibility that no one have any false ideas about the situation with OpenAI and the military after reading 2-3 paragraphs about the event and thinks that his misconceptions are what any person would think, so I should have specifically anticipated them.
Yeah, though I don’t think it’s like super egregious. I do think that even after correcting the “charter”-mistake you continued to frame OpenAI usage policies as something that should be treated as some kind of contractual commitment of OpenAI that they walked back.
But that seems backwards to me, a ToS is a commitment by users of OpenAI towards OpenAI, not a commitment by OpenAI to its users (in the vast majority of cases). For example for LessWrong, our ToS includes very few commitments by us, and I definitely don’t see myself as having committed to never changing them. If we have a clause in our ToS that asks users to not make too many API requests in quick succession, then I definitely have not committed to not serve people who nevertheless make that many requests (indeed in many cases like search engines or users asking us for rate-limiting exceptions to build things like greaterwrong.com, I have totally changed how we treat users who make too many requests).
Framing it as having gone back on a commitment seems kind of deceptive to me.
I also think there is something broader that is off about organizing “Pause AI” protests that then advocate for things that seem mostly unrelated to pausing AI to me (and instead lean into other controversial topics). Like, I now have a sense that if I attend future Pause AI events, my attendance of those events will then be seen and used as social proof that OpenAI should give into pressure on some other random controversy (like making contracts with the military), and that feels like it has some deceptive components to it.
And then also at a high-level I feel like there was a rhetorical trick going on in the event messaging where I feel like the protest is organized around some “military bad because weapons bad” affect, without recognizing that the kind of relationship that OpenAI seems to have with the military seems pretty non-central for that kind of relationship (working on cybersecurity stuff, which I think by most people’s lights is quite different).
(I also roughly agree with Jason’s analysis here)
This was not at all intentional, although we were concerned about a future where there was more engagement with the military including with weapons, so I can see someone thinking we were saying they were working with weapons now if they weren’t paying close attention. Working on cybersecurity today is a foot in the door for more involvement with the military in the future, so I don’t think it’s so wrong to fear their involvement today because you don’t want AI weapons.
I see what you mean here, and I might even have done this a bit because of the conflation of the two documents in my head that didn’t get backpropagated away. I was rushing to correct the mistake and I didn’t really step back to reframe the whole thing holistically.
Yeah, makes sense. It seemed to me you were in a kind of tight spot, having scheduled and framed this specific protest around a thing that you ended up realizing had some important errors in it.
I think it was important to reframe the whole thing more fully when that happened, but man, running protests is hard and requires a kind of courage and defiance that I think is cognitively hard to combine with reframing things like this. I still think it was a mistake, but I also feel sympathetic to how it happened, at least how it played out in my mind (I don’t want to claim I am confident what actually happened, I might still be misunderstanding important components of how things came to pass).
There was honestly no aspect of unwillingness to correct the broader story of the protest. It just didn’t even occur to me that should be done. It seems like you guys don’t believe this, but I didn’t think it being the usage policy instead of the charter made a difference to the small ask of not working with militaries. It made a huge difference in the severity of the accusation toward OpenAI, and what I had sort of retconned myself into thinking was the severity of the hypocrisy/transgression, but either way starting to work with militaries was a good specific development to call attention to and ask them to reverse.
There was definitely language and a framing that was predicated on the idea they were being hypocritical, and if I were thinking more clearly I would have scrubbed that when I realized we were only talking about the usage policy. There are a lot of things I would have changed looking back. Mikhail says he tried to tell me something like this but I found his critiques too confusing (like I thought he was saying mostly that it wasn’t bad to work with the military bc it was cybersecurity, where to me that wasn’t the crux) and so those changes did not occur to me.
I mainly did not realize these things because I was really busy with logistics, not because I needed to be in soldier mindset to do the protest. (EDIT: I mean, maybe some soldier mindset is required and takes a toll, but I don’t think it would have been an issue here. If someone had presented me with a press release with all the revisions I mentioned above to send out as a correction instead of the one I sent out, I would have thought it was better and sent it instead. The problem was more that I panicked and wanted to correct the mistake immediately and wasn’t thinking of other things that should be corrected because of it.) Mikhail may have felt I was being soldier-y bc I wouldn’t spend more time trying to figure out what he was talking about, but that had more to do with me thinking I had basically understood his point (which I took to be basically that I wasn’t including enough details in the promotional materials so people wouldn’t have a picture he considered accurate enough) and just disagreed with it (I thought space was limited and many rationalists do not appreciate the cost of extra words and thoughts in advocacy communication).
? It is clearly not what I was telling you. At the end of January, I told you pretty directly that for people who are not aware of the context what you wrote might be misleading, because you omitted crucial details. It’s not about how good or bad what OpenAI are doing is. It’s about people not having important details of the story to judge for themselves.
I’m not sure where it’s coming from. I suggest you look at the messages we exchanged around January 31 and double-check you’re not misleading people here.
It seems to me that you are not considering the possibility that you may in fact not have said this clearly, and that this was a misunderstanding that you could have prevented by communicating another way.
I don’t think the miscommunication can be blamed on any one party specifically. Both could have taken different actions to reduce the risk of misunderstanding. I find it reasonable for both of them to think they had more important things to do than spend 10x the time reducing the risk of misunderstanding, and to think the responsibility is on the other person.
To give my two cents on this, each time I talked with Mikhail he had really good points on lots of topics, and the conversations helped me improve my models a lot.
However, I do have a harder time understanding Mikhail than understanding the average person, and definitely feel the need to put in lots of work to get his points. In particular, his statements tend to feel a lot like attacks (like saying you’re deceptive), and it’s straining to decouple and not get defensive to just consider the factual point he’s making.
EDIT: looking at the further replies from Holly and looking back at the messages we exchanged, I’m no longer certain it was miscommunication and not something intentional. As I said elsewhere, I’d be happy for the messages to be shared with a third party. (Please ignore the part about certainty in the original comment below.)
There certainly was miscommunication surrounding the draft of this post, but I don’t believe they didn’t understand, back at the end of January, that people could be misled.
You said a lot of things to me, not all of which I remember, but the above were two of them. I knew I didn’t get everything you wanted me to get about what you were saying, but I felt that I understood enough to know what the cruxes were and where I stood on them.
You said:
I said:
Are these not the same thing?
I wouldn’t care if people knew some number to some approximation and not fully. This is quite different from saying something that’s technically not false but creates a misleading impression you thought was more likely to get people to support your message.
I don’t want to be spending time this way and would be happy if you found someone we’d both be comfortable with reading our message exchange and figuring out how deceptive or not the protest messaging was.
(Edit- I no longer believe the above is a true recount of what happened and retract the below.)
Good to see this. I’m sad that you didn’t think that after reading “Even the best-intending people are not perfect, often have some inertia, and aren’t able to make sure the messaging they put out isn’t misleading and fully propagate updates” in the post. As I mentioned before, it could’ve been simply a postmortem.
I found that sentence unclear. It’s poorly written and I did not know what you meant by it. In context you were not saying I had good intentions— you declined to remove the “deceptive” language earlier because you thought I could have been deceptive.
I’m concerned I may not have comported myself well in these comments. When Mikhail brought this post to me as a draft it was emotionally difficult for me because of what I interpreted as questioning my integrity.
Unfortunately, the path I’m taking— which I believe is the right path for me to be taking— is probably going to involve lots more criticism, whether I consider it fair or not. I am going to have to handle it with more aplomb.
So I am not going to comment on this post anymore. I am going to practice taking the hit and moving on because that’s just sometimes how life is and it’s a cost of doing business in a visible advocacy role.
(Edited to remove blaming language.)
(warning: some parts contain sardonic tone. maybe de-snarkify through ChatGPT if you don’t like that)
Ok I have a lot of issues with this post/whole affair,[1] so after this I’m going to limit how much I respond (though Mikhail if you want to reach out via private message then please do so, but don’t feel obliged to)
This feels like a massive storm in a teacup over what honestly feels like quite a minor issue to me. I feel like a lot of this community heat and drama could have been avoided with better communication in private from the both of you, perhaps with the aid of a trusted third party involved.
I also get a large gut impression that this falls into the broad category of “Bay Area shenanigans that I don’t care about”. I encourage everyone too caught up in it to take a breath, count to five, donate to AMF/GiveDirectly/concrete-charity-of-your-choice, and then go about their day.
I don’t think you understand how communities function. They don’t function by diktat. Do you want CEA to scan every press release from every EA-related org? Do you want to disallow any unilateral action by people in the community? That’s nonsense. Social norms are community enforcement mechanisms, and we’re arguing about norms here. I think the organisers made a mistake; you think she violated a deontological rule. I think this has already gone too far; you think it needs a stronger response. We argue about norms, persuade each other and/or observers, and then norms and actions change. This is the enforcement mechanism.[2]
In any case, I’m much more interested in norms/social mechanisms for improving community error-correction than I am in avoiding all mistakes from the community (at least, mistakes below a certain bar). And my impression[3] is that the organisers have tried to correct the mistake to the extent that they believe they made a mistake, and anything else is going to be a matter of public debate. Again, this is how communities work.
I also think you generally overestimate your confidence about what the consequences of the protest will be, which deontological norms were broken and if so how bad they were, and how it will affect people’s impressions in general. I agree that I wouldn’t necessarily frame the protest in the way it was, but I think that’s going to end up being a lot less consequential in the grand scheme of things than a) the community drama this has caused and b) any community enforcement mechanisms you actually get set up
This doesn’t seem to be the first time Holly has clashed with rationalist norms, and when this has happened I tend to find myself generally siding with her perspective over whichever rationalist she’s questioning, fair warning.
What did you think it looked like? Vibes? Papers? Essays?
Which could ofc be wrong, you’re in possession of private messages I don’t have
Edit: I later discovered I explicitly warned Holly this messaging could be deceptive, which she understood, asked what could be done about it, but ended up not doing anything and leaving it as is.
5- I think the messaging around the protest is deceptive. I.e., it gives people a wrong impression of the world, in a way a part of the community assumed would better suit their goals. I think this is deontologically bad, and we shouldn’t even be talking about consequences. This is a bad thing to do regardless of how likely it is to backfire and how badly. If your calculations say that it’s alright, that the EV of deceiving people is positive, ignore these calculations and maybe read https://www.lesswrong.com/posts/K9ZaZXDnL3SEmYZqB/ends-don-t-justify-means-among-humans.
This is the sort of thing that led to the FTX collapse. This is the sort of decision-making procedure that predictably leads to bad consequences, no matter what our brains calculate.
And in this area, things like that backfiring (e.g., creating an impression among the general public, experts, media, or policymakers that the community isn’t fully candid) can increase the chance of a permanent end of life in the lightcone.
4- I’m also interested in mechanisms for early error-correction. This is what my post calls for. I don’t even mention the names of the protest organisers or their orgs, because I care about the community preventing deceptive messaging in the future.
It doesn’t have to be a community-wide mechanism or a central org that checks press releases; it can be more field-focused orgs and people, it can be practices like fact-checking final versions of public comms and edits to them with outside people who are good at that stuff and have a different background (see comments from @Jason), it can be any of lots of things. I want the community to think about and discuss these things. I’d be optimistic about people spending time figuring this out and suggesting solutions to the community.
My impression is that an organiser agreed they hadn’t propagated the update, as I suggest in the post, which led to a misleading message, and that this wasn’t understood or addressed at all until I published the post. I’m not confident this is what happened, though, as looking at the messages now, it seems fairly likely that an organiser understood the impression the messaging would create and decided to keep it that way.
3- I don’t think it’s fair, while none of us has spent a while thinking about potential mechanisms, to suggest only ones that seem like they wouldn’t be good or wouldn’t work. Current mechanisms are bad, as they don’t prevent deontologically dubious unilateral actions. I hope there can be better mechanisms.
I think the community as a whole acted deceptively. This seems really bad and the community should probably respond and improve. We don’t want to be the kind of community that misleads people when it suits our goals. I don’t think there’s been a reaction from the community where people tried to figure out how to address the problem. Instead, most of the comments are related to a protest organiser slowly realising what the issue people are talking about was.
2- I notice I’m confused and don’t really get your point. In global health and development, deceptive messaging is obviously unlikely enough to be beneficial that people don’t do it and don’t have much reason to engage with another part of the community having issues around deceptive messaging; so this post might not seem relevant to the whole community, to the extent that some parts of the community identify only with those parts and not with the whole. But this is not what you’re saying, and I’m failing to understand your point. If we’re discussing a cause area (x-risk) and deceptive messaging and its potentially bad consequences, how are other cause areas relevant here? I rarely go around shrimp welfare posts saying that I don’t care about these deep-sea shenanigans and that people should donate to MIRI instead.
1- I think better communication could’ve reduced the amount of public drama; but also there’s not a lot of it here, and I’m more concerned with the community not engaging enough with the problems raised in the post.
Thank you for linking the public tweets discussing deontology. If someone believes it’s OK to violate deontological norms if they calculate the consequences to be positive, I want them to directly say that. If someone decided deceptive messaging can be fine and was ok in this case, I want them to directly say that and not be deceptive towards the community as well. We’d then be able to have a discussion about cruxes in strategies and norms and not about what happened.
Experience teaches that there are (at least) two types of postmortem analyses when something goes wrong: a type that is more focused on questions of blame, and one that is focused on root cause analysis / lessons learned / how to improve / etc. These types of inquiry struggle to coexist in the same conversation, because the former creates an adversarial tone regarding the persons against whom blame is being considered.
Now, that is not to say that blame-focused inquiries are bad and the other type is better as a matter of course. But the title and tone of your post placed this discussion clearly in the blame-focused camp, and it’s hard for the non-blamey type of conversation to form out of such an environment.
I suspect most community members felt that the proffered evidence does not sufficiently make out a case of deception (i.e., intentional misrepresentation) and are thus disinclined to participate in discussion of downstream philosophical issues that only come into play if they reach a conclusion that deception was present (vs. making a mistake).
If you’re interested in learning more about the second type of analysis, I’d suggest reading more about just culture in fields like medicine and aviation.
To many community members, including protest participants, it’s pretty clear the messaging was deceptive.
A protest organiser is saying it was the curse of knowledge, but I sent them messages directly pointing out how people will see the messaging. As I mentioned elsewhere in the comments, I want to have a third party look at the messages exchanged between me and the protest organiser, if they agree.
Also, I expect many people to only skim the post and look at the protest organiser’s initial engagement with it or a shortform post they made before I published this post; all of these make it seem like I’m saying the organiser intentionally made the mistake they then corrected: “I ran a successful protest at [company] yesterday. Before the night was over, Mikhail Samin, who attended the protest, sent me a document to review that accused me of what sounds like a bait and switch and deceptive practices because I made an error in my original press release (which got copied as a description on other materials) and apparently didn’t address it to his satisfaction because I didn’t change the theme of the event more radically or cancel it.”
That post and those comments have not been corrected to show that this is not what I’m talking about, even after the protest organiser understood what misleading messaging this post talks about.
It seems that one of the protest organisers now says that what happened was what I described in the post as “Even the best-intending people are not perfect, often have some inertia, and aren’t able to make sure the messaging they put out isn’t misleading and fully propagate updates”.
I suggest focusing on what mechanisms can be designed to prevent misleading messaging from emerging in the future.
https://forum.effectivealtruism.org/posts/ubzPRYP4rikNEcBL5/holly_elmore-s-shortform?commentId=pCZHau3qGBdw5PazZ
Left a comment there, but to repeat:
This was, indeed, an honest mistake that you corrected. It is not at all what I’m talking about in the post. I want the community to pay attention to the final messaging being deceptive about the nature of OpenAI’s relationship with the Pentagon.
I am extremely saddened by the emotional impact this has on you. I did not wish that to happen and was surprised and confused by your reaction. Unfortunately, it seems that you still don’t understand the issue I’m pointing at; it is not the “charter” being in the original announcement, it’s the final wording misleading people.
Have I got this right? You’re saying that me saying they have a contract with the Pentagon is deceptive because people will assume it’s to do with weapons?
You are accusing me of deception in a very important arena. That is predictably emotionally devastating. Your writing is confusing and many people will come away thinking that what you’re saying is that I lied. The very thing I think you’re claiming I did to OpenAI you are doing to me.
But you think you’re expressing yourself clearly. I, too, thought I was expressing myself clearly and honestly. If people were confused, I regret that, and I would prefer they weren’t. I want them to have accurate impressions. One constraint you may not understand with advocacy is that you can’t just keep adding more text like on LessWrong. There’s only so much that can fit and it has to be comprehensible to lots of different people. I thought it would be confusing to add a bunch of stuff saying that maybe this particular contract with the military would be good (I’m not saying that, but I’m granting your take) when my point was that we don’t want certain boundaries to be crossed at all, because working with the military on something benign is a foot in the door to something more. I don’t think you understand what I meant to communicate with the protest small ask, and that is a failure on my part as the organizer, but it’s not a deception. For some reason you seem convinced I was trying to trick people into coming to this protest because I wouldn’t pitch the messaging to rationalist sensibilities (I suspect this is what you would consider “deontologically good”), but if I had, almost no one else would have understood it or paid attention to it.
I wonder if this relates to one of the takeaway lessons—perhaps we should give additional priority to showing drafts to people in different portions of the target audience, and discerning whether the message is conveying fair and accurate impressions to those different audience segments.
In my day job, I often write about niche topics for which I have much greater background knowledge than certain important audience segments will have. My immediate teammates often have a similar kind of background knowledge and understandings (although usually at a lower level by the time I’ve invested tons of time into a case!). So showing them my draft helps, but still leaves me open to blind spots. Better to also show my draft to some generalists, or specialists in unrelated areas of the law to figure out whether they are comprehending it accurately.
A mini-redteaming exercise might also help: Are there true, non-misleading one-sentence statements that my adversary could offer as a response that would leave some reasonable readers feeling misled? I see this issue in legal work too—a brief that could be seen as hiding the ball on key unfavorable facts or legal authorities is just not effective, because the judge may well feel the author was either of low competence or attempted to pull the wool over their eyes. After this happens ~two or three times, the judge now knows to view any further work product from that attorney (or on behalf of that client) with a jaundiced eye.
I should acknowledge that how much to engage with bad facts/law in one’s own legal advocacy brief is a tough balancing act on which the lawyers on a team often disagree. I assume that will be true in AI pause advocacy as well.