AI Should Not Be Used for Research Writing Tasks

Summary

There is significant pressure in research communities to use LLMs like ChatGPT or Claude for research writing tasks. This includes summarising documents, brainstorming ideas, core writing, and grammar/editing.

It has gotten to the point where, if you mention a research idea, people routinely respond “ask Claude” or “ask ChatGPT”. The assumption is that researchers should no longer work alone on tasks, but should always work in AI-human teams, where the human takes the back seat.

I argue that LLMs should not be used for any core writing or brainstorming tasks, in any research activity. I’m open to the idea that they can be used for summarising large amounts of information, search queries, data crunching, and editing and grammar.

My view comes from seeing firsthand the harms AI writing is causing to researchers. I have directly seen (i) deskilling, (ii) over-reliance, (iii) bias and errors of reasoning, (iv) gradual disempowerment.

I’m also concerned that AI writing distorts the job market, making unqualified candidates seem better than they are, leading to negative hiring outcomes.

Brief Background

AI writing seems like a natural thing to rely on. You can quickly generate ideas, generate drafts and generate feedback. The adoption of AI in research communities is leading to a surge in published papers.

There is an implicit assumption by EAs that AI should be used for all writing tasks. A related assumption is that all work will (soon) be delegated to AI, and that the human role is now to supervise that work (rather than doing any work ourselves):

As more work is delegated to AI, we’ll become increasingly reliant on experienced managers who can oversee AI-generated outputs, train others to use AI tools, and coordinate teams of humans and AIs.

Conor Barnes, 80,000 Hours

A. Origins of My Concern

I attended a grant proposal writing day last year, in the field of AI Safety. Various academics were coming together to collaborate on a grant, and we were holding a day to brainstorm core ideas.

The day began with the lead author saying:

“These are the ten ideas ChatGPT came up with, so let’s start from there.”

The human brainstorming I had turned up for was cancelled. There was no brainstorming. Instead, we were tasked with working on each of ChatGPT’s ideas. By the end of the day, the document we ended up with was essentially identical to ChatGPT’s initial response, with barely any human input.

Additionally, the ideas generated were (i) 6 years out of date, (ii) clichéd and dull, (iii) American-centric even though it was an EU project, and (iv) already done by other research teams.

The project got funding, making this a tangible negative impact on research advancement.

I sincerely believe that had human brainstorming occurred, this outcome would have been averted. Every human who brought up new research ideas in the room was told “we already have the ideas locked down.” This has happened to me multiple times since then.

Risks of Using AI for Research Writing

If you value original research and research that makes an impact, I argue that you should be skeptical of using AI for your writing.

Here are the problems I have witnessed in other researchers:

  • Over-reliance and dependency on AI systems (cf. the gradual disempowerment scenario).

  • Loss of human writing skills. (Human drafts by these researchers read a lot worse than they did a year ago; studies confirm this.)

  • A decline in the ability to answer questions on the spot (“Let me ask Chat,” “Let’s run this through Claude”).

  • A decline in critical thinking/originality.

  • Groupthink towards LLM views (LLMs shift your views to agree with theirs).

  • Dull, clichéd research topics (“We need more AI transparency” was fine to say in 2019, but is outdated now and ignores that the EU AI Act exists, etc.).

Added to this is the risk of hiring unqualified candidates because they sound good on paper. When AI is used to review resumes or cover letters, it favours AI writing over human writing. This means that we are already devaluing human work. This should already be treated as a crisis, but instead, everyone I talk to is leaning in to AI writing as a good thing.

The Groupthink Problem Elaborated

Groupthink is: “the process in which bad decisions are made by a group because its members do not want to express opinions, suggest new ideas, etc. that others may disagree with.”

I believe that bad decisions are now being made in research due to the groupthink that is caused by LLMs. LLMs tend to give clichéd, generic and popular answers to topics. When researchers use LLMs, this raises the salience/social capital of clichéd, generic, popular answers.

Murakami has a useful quote here: “If you only read the books that everyone else is reading, you can only think what everyone else is thinking.”

I’ve witnessed in my field a radical decline in new, original, interesting research proposals, in favour of proposals that reiterate well-trodden information and well-researched topics. In AI Safety, this means research proposals about transparency, accountability, fairness and accuracy (all topics that were prevalent 6 years ago and are highly prevalent in LLM data). I sincerely believe that this is because everyone is reading the same thing.

A Brief Refutation of Counterarguments

I am not talking about:

  • Editing your work with AI.

  • Using AI to look for information (per se, although I think this should be heavily balanced by other sources).

  • Data crunching with AI (to me, this is the highest-impact use case for AI, in medicine, language learning, etc.).

The above already addresses the common counterarguments:

  • “You just need to use the latest model” (No, I am seeing this trend with the latest models).

  • “You just need to prompt better” (No. This is said by the same people who are suffering from the effects above, so either they are not following their own advice, or the advice does not work).

  • “AI is the future.” (Okay, but we determine the parameters of that future. Also, this is a very strange comment to hear in AI Safety groups).

I would like to open up a debate about whether AI writing should be used for research, and I am open to alternative views on this.