Paper mills create opportunities for falsified and plagiarised research on an industrial scale. Over the past two decades, more than 400,000 research articles have been published with strong textual similarities to known studies produced by paper mills.
Join us to learn how paper mills are utilizing language models and what is being done to combat this.
Large language models like ChatGPT are trained on large amounts of text scraped from the internet. They learn to generate text by predicting what words are more likely to occur in a given sentence, and can write grammatically accurate syntax. It isn't surprising that even academics can be fooled into believing AI-generated abstracts are real.
Katyanna Quach
2023
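The next-word prediction Quach describes can be demonstrated directly. Below is a minimal sketch, assuming the Hugging Face transformers library and the openly available GPT-2 model (both chosen purely for illustration, not because any particular paper mill uses them); it prints the tokens the model considers most likely to continue a prompt.

```python
# Minimal sketch: next-token prediction with GPT-2 via Hugging Face
# transformers. Model and prompt are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The results of this experiment suggest that"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the token that would follow the prompt
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: p={prob.item():.3f}")
```

Repeatedly sampling from this distribution and appending the chosen token is, at its core, how such models produce fluent paragraphs, which is why the output can read convincingly while containing no real research.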
In 2022 alone, approximately 70,000 papers similar to known studies produced by paper mills were published. That is around 1.5-2% of all scientific papers published in 2022, rising to roughly 3% for biology and medicine papers. Without investigating each paper individually it is impossible to confirm that every one is a paper mill product, but the estimate is generally accepted within the scholarly communications industry as roughly representative.
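As a quick sanity check on those figures (back-of-the-envelope arithmetic, not data from any source), the reported count and rate imply a plausible total volume of papers published in 2022:

```python
# Back-of-the-envelope check: total 2022 output implied by the figures above.
suspect_papers = 70_000
for rate in (0.015, 0.02):  # the 1.5-2% range quoted above
    implied_total = suspect_papers / rate
    print(f"At {rate:.1%}: ~{implied_total:,.0f} papers published in 2022")
```

The implied total of roughly 3.5-4.7 million papers is broadly in line with common estimates of annual scholarly output, so the headline count and the percentage are consistent with each other.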
How are language models utilized by paper mills?
If an oversupply of journals and journal articles is already fuelling paper mills (which can, themselves, use AI to generate papers) then the whole scholarly publishing ecosystem could be about to collapse on itself. A commentary has asked whether using AI to assist writing papers will further increase the pressure to publish – given that publishing levels have increased drastically over the past decade already.
Danny Kingsley
2023
Language models are being used by paper mills to generate fake or substandard scientific papers in a range of ways. The aim is to produce something that passes as a genuine paper without the time, effort, or underlying research that a real paper requires.
Some specific ways paper mills use language models include:
· To generate text that is grammatically correct but lacks scientific accuracy
· To translate papers into other languages without crediting the original research
· To create fake images and data to give the appearance of genuine research
By doing this, paper mills can churn out large numbers of papers extremely quickly and with very few resources, then sell them or use them to artificially inflate metrics such as citation counts and impact factors for individuals or institutions.
What is being done to combat the use of language models by paper mills?
Publishers themselves also need to be open to collaboration with stakeholders (including other publishers) across the research ecosystem to tackle the root causes including a system of rewards and incentives that deter rather than feed into incentives to use paper mills. The STM Integrity Hub and its prototype paper mill detector shows what can be achieved through cross-publisher collaboration.
Rebecca Lawrence and Sabina Alam
2022
This unethical use of language models poses a significant threat to the integrity of scientific research and publication. It can erode trust in the scientific community and mislead readers who rely on these papers for information.
The STM Integrity Hub, an initiative to combat fraudulent science, has licensed a version of the Papermill Alarm software to help publishers detect potentially fabricated manuscripts.
Interestingly, language models and AI are also being used to combat paper mills. Used correctly, they can be valuable tools in this fight:
· Detecting fake papers: language models can be trained to recognize the characteristics of fake papers, such as unrealistic citations, grammar mistakes, and inconsistencies in style (see the classifier sketch after this list).
· Detecting fake imagery and data: AI tools can flag manipulated images and fabricated data, raising red flags that make it harder for paper mills to produce genuine-looking counterfeit papers (see the image-screening sketch after this list).
· Educating the public: AI can contribute to creating educational resources that raise awareness about the dangers of paper mills and teach individuals how to identify fake papers. AI-powered chatbots can answer questions and provide guidance on the topic.
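To make the first point concrete, here is a minimal sketch of a fake-paper classifier, assuming scikit-learn; TF-IDF features with logistic regression stand in for the more sophisticated language-model detectors described above, and the training texts, labels, and query are hypothetical placeholders.

```python
# Minimal sketch of a fake-paper text classifier (illustrative only).
# Real detectors such as the Papermill Alarm are far more sophisticated.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: 0 = genuine, 1 = suspected paper mill output
texts = [
    "We observed a dose-dependent decrease in cell viability after treatment.",
    "The novel biomarker plays vital role in progression of various cancer.",
    "Samples were collected under the protocol approved by the ethics board.",
    "Emerging evidence reveal that lncRNA exerts pivotal role in proliferation.",
]
labels = [0, 1, 0, 1]

detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word and word-pair features
    LogisticRegression(),
)
detector.fit(texts, labels)

# Probability that an unseen sentence resembles paper mill output
print(detector.predict_proba(["miRNA exerts vital role in migration of cells"])[0, 1])
```

In practice such classifiers are trained on thousands of labelled manuscripts and combined with signals such as citation patterns and submission metadata, but the pipeline shape is the same: featurize the text, score it, and flag high-probability submissions for human review.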
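For the second point, one widely used screening technique is perceptual hashing, which flags images that are identical or suspiciously similar across submissions. Here is a minimal sketch, assuming the Pillow and imagehash libraries; the file names and distance threshold are illustrative assumptions.

```python
# Minimal sketch of duplicate-image screening with perceptual hashing.
# File names are hypothetical; the threshold is an illustrative assumption.
from PIL import Image
import imagehash

hash_a = imagehash.phash(Image.open("figure_submission_a.png"))
hash_b = imagehash.phash(Image.open("figure_submission_b.png"))

# Hamming distance between hashes: 0 = identical, small = very similar
distance = hash_a - hash_b
if distance <= 5:
    print(f"Possible reused or manipulated figure (distance={distance})")
```

Perceptual hashes survive resizing and recompression, so reused figures can be caught even when the pixels are no longer byte-identical.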
Some great examples of AI-powered initiatives include Turnitin's plagiarism detector, used by universities and publishers; Copyleaks's Plagiarism Checker for identifying plagiarised content; and OpenAI's upcoming AI detection tool.
As with most technologies and tools, language models and AI can be used for positive practices as well as problematic ones, depending on who is involved and their motives. We recommend keeping up to date with this fast-changing area through our weekly round-up, where we share emerging developments in marketing, scholarly communications and AI. Discover more about this topic by exploring the links below:
Further reading
Risks of abuse of large language models, like ChatGPT, in scientific publishing: authorship, predatory publishing and paper mills - https://onlinelibrary.wiley.com/doi/10.1002/leap.1578
Risks and Benefits of Large Language Models for the Environment - https://pubs.acs.org/doi/10.1021/acs.est.3c01106
Exploring the impact of language models on cognitive automation with David Autor, ChatGPT, and Claude - https://www.brookings.edu/articles/exploring-the-impact-of-language-models/
AI paper mills and image generation require a co-ordinated response from academic publishers - https://blogs.lse.ac.uk/impactofsocialsciences/2022/12/15/ai-paper-mills-and-image-generation-require-a-co-ordinated-response-from-academic-publishers/
AI and Publishing: Moving forward requires looking backward - https://www.digital-science.com/tldr/article/ai-and-publishing-moving-forward-requires-looking-backward/
AI intensifies fight against ‘paper mills’ that churn out fake research - https://www.nature.com/articles/d41586-023-01780-w
Scientists tricked into believing fake abstracts written by ChatGPT were real - https://www.theregister.com/2023/01/11/scientists_chatgpt_papers/
GPT, Large Language Models, and the Trough of Disillusionment - https://scholarlykitchen.sspnet.org/2023/10/04/gpt-large-language-models-and-rough-of-disillusionment/