Web of Science Archives

Wiley Launches Paper Mill Detection Tool after Losing Millions Due to Fraudulent Journal Submissions

May 6, 2024 | International Laboratory News, Laboratory Management and Operations, Laboratory Resources, News From Dark Daily

Groups representing academic publishers are taking steps to combat paper mills that write the papers and then sell authorship spots

Clinical laboratory professionals rely on peer-reviewed research to keep up with the latest findings in pathology, laboratory medicine, and other medical fields. They should thus be interested in new efforts to combat the presence of “research paper mills,” defined as “profit oriented, unofficial, and potentially illegal organizations that produce and sell fraudulent manuscripts that seem to resemble genuine research,” according to the Committee on Publication Ethics (COPE), a non-profit organization representing stakeholders in academic publishing.

“They may also handle the administration of submitting the article to journals for review and sell authorship to researchers once the article is accepted for publication,” the COPE website states.

In a recent example of how paper mills impact scholarly research, multinational publishing company John Wiley and Sons (Wiley) announced in The Scholarly Kitchen last year that it had retracted more than 1,700 papers published in journals from the company’s Hindawi subsidiary, which specializes in open-access academic publishing.

“Often journals will invite contributions to a special issue on a specific topic and this provides an opening for paper mills to submit often many publications to the same issue,” explained a June 2022 research report from COPE and the International Association of Scientific, Technical and Medical Publishers (STM).

“In Hindawi’s case, this is a direct result of sophisticated paper mill activity,” wrote Jay Flynn, Wiley’s Executive Vice President and General Manager, Research, in a Scholarly Kitchen guest post. “The extent to which our processes and systems were breached required an end-to-end review of every step in the peer review and publishing process.”

In addition, journal indexer Clarivate removed 19 Hindawi journals from its Web of Science list in March 2023, due to problems with their editorial quality, Retraction Watch reported.

Hindawi later shut down four of the journals, which had been “heavily compromised by paper mills,” according to a blog post from the publisher.

Wiley also announced at that time that it would temporarily pause Hindawi’s special issues publishing program due to compromised articles, according to a press release.

“We urgently need a collaborative, forward-looking and thoughtful approach to journal security to stop bad actors from further abusing the industry’s systems, journals, and the communities we serve,” wrote Jay Flynn, Wiley EVP and General Manager, Research and Learning, in an article he penned for The Scholarly Kitchen. “We’re committed to addressing the challenge presented by paper mills and academic fraud head on, and we invite our publishing peers, and the many organizations that work alongside us, to join us in this endeavor.” Clinical laboratory leaders understand the critical need for accurate medical research papers. (Photo copyright: The Scholarly Kitchen.)

Using AI to Detect Paper Mill Submissions

Wiley acquired Hindawi in 2021 in a deal valued at $298 million, according to a press release, but the subsidiary has since become a financial drain for the company.

The journals earn their revenue by charging fees to authors. But in fiscal year 2024, which began last fall, “Wiley expects $35-40 million in lost revenue from Hindawi as it works to turn around journals with issues and retract articles,” Retraction Watch reported, citing an earnings call.

Wiley also revealed that it would stop using the Hindawi brand name and bring the subsidiary’s remaining journals under its own umbrella by the middle of 2024.

To combat the problem, Wiley announced it would launch an artificial intelligence (AI)-based service called Papermill Detection in partnership with Sage Publishing and the Institute of Electrical and Electronics Engineers (IEEE).

The service will incorporate tools to detect signs that submissions originated from paper mills, including similarities with “known papermill hallmarks” and use of “tortured phrases” indicating that passages were translated by AI-based language models, according to a press release.

These tools include:

Papermill Similarity Detection: Checks for known papermill hallmarks and compares content against existing papermill papers.
Problematic Phrase Recognition: Flags unusual alternatives to established terms.
Unusual Publication Behavior Detection: Identifies irregular publishing patterns by paper authors.
Researcher Identity Verification: Helps detect potential bad actors.
Gen-AI Generated Content Detection: Identifies potential misuse of generative AI.
Journal Scope Checker: Analyzes the article’s relevance to the journal.
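
As a rough illustration of the phrase-recognition idea in the list above, the following Python sketch flags “tortured phrases” by matching text against a small lookup table. It is not Wiley’s implementation, which the press release does not describe. The table entries, the function name flag_tortured_phrases, and the sample sentence are all assumptions chosen for illustration.

```python
import re

# Hypothetical lookup table: odd paraphrase -> the established term it likely replaces.
# A real screening tool would need a far larger, curated list.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "profound learning": "deep learning",
    "colossal information": "big data",
}


def flag_tortured_phrases(text):
    """Return (phrase, expected_term) pairs found in text, ordered by position."""
    hits = []
    for phrase, expected in TORTURED_PHRASES.items():
        # Whole-phrase, case-insensitive match.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), phrase, expected))
    hits.sort()  # order by where each phrase appears in the text
    return [(phrase, expected) for _, phrase, expected in hits]


# Invented sample sentence for demonstration only.
sample = "We apply Profound Learning and counterfeit consciousness to colossal information."
for phrase, expected in flag_tortured_phrases(sample):
    print(f"'{phrase}' may stand in for '{expected}'")
```

A flag of this kind is only a prompt for human review, since a lookup table cannot tell a tortured phrase from a legitimate but unusual word choice.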

The company said that the new service will be available through Research Exchange, Wiley’s manuscript submission platform, as early as next year.

Other Efforts to Spot Paper Mill Submissions

Previously, STM announced the launch of the STM Integrity Hub, with a mission “to equip the scholarly communication community with data, intelligence, and technology to protect research integrity,” Program Director Joris van Rossum, PhD, told The Scholarly Kitchen.

In 2023, the group announced that the hub would integrate Papermill Alarm from Clear Skies, a paper mill detection tool launched in 2022 with a focus on cancer research. It uses a “traffic-light rating system for research papers,” according to a press release.

In its coverage of the launch of Wiley’s Papermill Detection service, Retraction Watch suggested that one key to addressing the problem would be to reduce incentives for authors to use paper mills. Those incentives boil down to the pressure placed on many scientists, clinicians, and students to publish manuscripts, according to the research report from STM and COPE.

In one common scenario, the report noted, a paper mill will submit a staff-written paper to multiple journals. If the paper is accepted, the company will list it on a website and offer authorship spaces for sale.

“If a published paper is challenged, the ‘author’ may sometimes back down and ask for the paper to be retracted because of data problems, or they may try to provide additional supporting information including a supporting letter from their institution which is also a fake,” the report noted.

All of this serves as a warning to pathologists and clinical laboratory professionals to carefully evaluate the journals that publish studies in the areas of healthcare and laboratory medicine research that interest them.

—Stephen Beale

Related Information:

Potential “Paper Mills” and What to Do about Them: A Publisher’s Perspective

Up to One in Seven Submissions to Hundreds of Wiley Journals Flagged by New Paper Mill Tool

Guest Post: Addressing Paper Mills and a Way Forward for Journal Security

Paper Mills Research Report from COPE and STM

Wiley Paused Hindawi Special Issues amid Quality Problems, Lost $9 Million in Revenue

‘The Situation Has Become Appalling’: Fake Scientific Papers Push Research Credibility to Crisis Point

Publisher Retracts More than a Dozen Papers at Once for Likely Paper Mill Activity

STM Integrity Hub Incorporates Clear Skies’ Papermill Alarm Screening Tool

The New STM Integrity Hub

Upholding Research Integrity in the Age of AI

Netherlands University Researchers Question Validity of More Than 30,000 Published Scientific Studies; Findings Have Implications for Medical Laboratories

Dec 27, 2017 | Compliance, Legal, and Malpractice, Laboratory News, Laboratory Pathology, Laboratory Testing

Radboud University researchers fear oncology, molecular biology, pharmacology, and other cell-centric medical research efforts are at risk after verifying that at least 30,000 studies published in 33,000 scientific journals included data derived from misidentified or contaminated cell lines

Many research findings that underpin the science behind various diagnostic technologies used regularly by clinical laboratories and anatomic pathology groups may not be valid. This is because a large number of published studies may have used misidentified or contaminated cell lines.

Biomedical scientists have long known that many research papers report on the wrong cells due to cell line misidentification. And yet, few studies have measured the true scope of the problem. Until now. Researchers at Radboud University in the Netherlands have determined that this problem may have influenced the findings of thousands of published research studies, upon which many other research studies were based.

Because clinical laboratories and anatomic pathology groups use assays and diagnostic tests that are developed as a result of these research studies, identifying how many published papers have inaccurate findings that cannot be duplicated would affect how and when it is appropriate for physicians to order certain medical laboratory tests and rely on the results.

Cancer research is also based on cell line studies. Thus, it may prove necessary to restudy existing published findings and revise them as appropriate. In turn, these new findings might change how and when some cancer tests are ordered and the results interpreted.

Identifying Corrupted Published Data

Radboud researchers Serge P. J. M. Horbach, a doctoral student, and Willem Halffman, PhD, Associate Professor, Philosophy and Science Studies, used the Web of Science database to track down any scientific articles based on “known misidentified cell lines as listed by the International Cell Line Authentication Committee’s (ICLAC) Register of Misidentified Cell Lines,” according to an article in ScienceAlert.

“We considered a reference to this original article as a good proxy for the usage of a cell line,” the researchers noted in their study published in the journal PLOS ONE. “Since typically the original papers are focused on reporting the establishment of the cell line only.”

They focused on cell lines misidentified because of contamination by HeLa cells, a widely used line of “immortalized cells.” HeLa cells have been used in scientific research for decades. They were the first mass-producible cells that could be used in vitro, making them highly desirable for biomedical research.

However, the process of creating immortalized cells involves mutation, during which contamination can be introduced by other cells. As a result, immortalized cells can be identified as one type of cell when they are in fact another.

Research scientists have been aware of this problem for about as long as immortalized cells have been in use. They attempt to take it into account when completing their analyses, though not always successfully.

The Radboud researchers found 32,655 records of primary literature based on contaminated cell lines. They then cross-referenced the ICLAC Register of Misidentified Cell Lines with a range of databases to determine if articles were available for each of the 451 cell lines listed on Table One of the ICLAC Register.

With information from these databases, they then searched the Web of Science database for published articles using cell line identifiers. They noted both the primary literature and any citation report entries for each cell line.

The researchers noted in their published study, “As we only searched for cell lines known to be misidentified, this constitutes a conservative estimate of the scale of contamination in the primary literature. Moreover, to avoid false positives, we excluded several cell lines, such as the ones with non-unique identifiers or the cell lines for which verified stock is still in circulation.”
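
The exclusion logic the researchers describe can be sketched roughly as follows. This is an illustrative Python sketch, not the authors’ code. The CellLine fields, the function names, and the placeholder identifiers (LINE-A and so on) are assumptions, and the register entries and article records are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellLine:
    identifier: str          # name used to search the literature (placeholder values below)
    unique_identifier: bool  # False if the name could also match unrelated terms
    verified_stock: bool     # True if authentic stock is still in circulation


def searchable_lines(register):
    """Keep only lines that pass both exclusions described in the study."""
    return [
        line for line in register
        if line.unique_identifier and not line.verified_stock
    ]


def count_primary_records(register, records):
    """Count articles that mention at least one searchable misidentified line."""
    wanted = {line.identifier for line in searchable_lines(register)}
    return sum(1 for mentioned in records if wanted & set(mentioned))


# Invented register entries; the real ICLAC Register is far longer.
REGISTER = [
    CellLine("LINE-A", unique_identifier=True, verified_stock=False),
    CellLine("LINE-B", unique_identifier=False, verified_stock=False),  # excluded: ambiguous name
    CellLine("LINE-C", unique_identifier=True, verified_stock=True),    # excluded: verified stock exists
    CellLine("LINE-D", unique_identifier=True, verified_stock=False),
]

# Invented article records: the set of cell line identifiers each article mentions.
RECORDS = [
    {"LINE-A"},
    {"LINE-B"},
    {"LINE-C", "LINE-D"},
    {"LINE-E"},
]

print(count_primary_records(REGISTER, RECORDS))  # prints 2
```

Dropping the ambiguous and still-verified lines lowers the count, which is why the authors call their figure a conservative estimate.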

Their estimate for secondary contaminated literature based on primary articles is larger still. “In total, we can conservatively estimate the citations to the contaminated primary literature at over 500,000, excluding self-citations,” the authors noted in their PLOS ONE article. “Thereby leaving traces in a substantial share of the biomedical literature.” They concluded, “… the amount of research potentially building on false grounds remains worrisome.”

Impact of Contaminated Cell Lines on Research, Clinical Laboratory Communities

Many of the assays and diagnostic tests performed by clinical laboratories and pathology groups were developed using cell line research. Should further scrutiny into the ability to duplicate and verify study findings fail to produce positive outcomes, it might call into question the validity and appropriate use of these tests.

For the research community, these findings represent yet another call to promote accountability and define standards for verifying authenticity of cell lines to further strengthen research findings.

The Radboud researchers ranked the number of contaminated articles they discovered by research area. Top affected areas include:

Oncology
Molecular Biology
Pharmacology
Cell Biology
Immunology

The distribution of contaminated primary literature over the research areas as defined by Web of Science. Only the 25 most affected research areas are included. (Graphic copyright: PLOS ONE.)

Addressing the Problem of Cell Line Contamination and Misidentification

Adapting the ever-growing body of published medical literature to reflect the known misidentifications, as well as the possibility of invalid results, will be a major undertaking. Ultimately, resolving this problem could require changes to practices and procedures currently used by research facilities and medical laboratories.

While authenticating cell lines adds to the cost of research projects, the money spent on research that becomes invalidated by misidentified cell lines is far greater.

In a 2015 Retraction Watch article, Leonard P. Freedman, PhD, President, Global Biological Standards Institute, noted, “An NIH RePORT search identified 9,000 active projects using cell lines, totaling $3.7 billion. Required use of authentication techniques would affect over $900 million in research dollars annually.”

Additionally, failure to adopt authentication as a part of standard operations brings other consequences. “A 2004 survey reported that just one-third of laboratories authenticate their cell lines,” Freedman noted. “10 years later, a Sigma-Aldrich survey found that only 37% of respondents ‘validate the purity and identity before first use’ of cell lines. Understanding the existing barriers that prevent implementation of universal cell authentication is central to changing this sad state of affairs.”

Mixed Recommendations for Fixing Inaccurate Published Studies

Of course, none of this will change the vast body of archived literature that might contain errors due to misidentification. Recommendations for addressing this aspect of the problem vary. The Radboud study authors suggest posting notes on any previously published articles stating that misidentified cell lines were used.

However, in a STAT article, Retraction Watch co-founders Ivan Oransky, MD, and Adam Marcus, Managing Editor of Gastroenterology and Endoscopy News, recommend more severe measures. “When we polled readers of Retraction Watch last December about the issue, 55% said journals should correct papers known to describe contaminated or misidentified cell lines, and more than 40% said retraction was the right choice.”

As cell lines continue to power the innovations of modern biomedical research, the Radboud study will surely heighten concerns about cell-line authentication and the research findings that depend on it. For pathology groups and medical laboratories, staying abreast of these developments will help ensure data validity and reduce reputational and liability concerns.

—Jon Stone

Related Information:

Over 30,000 Published Studies Could Be Wrong Due to Contaminated Cells

The Ghosts of HeLa: How Cell Line Misidentification Contaminates the Scientific Literature

The Economics of Reproducibility in Preclinical Research

Crosscontamination of Cells in Culture

Cell Authentication Survey Shows Little Progress in a Decade

Apparent HeLa Cell Contamination of Human Heteroploid Cell Lines

Some 30,000 Biomedical Publications Report on Misidentified Cells

Cell Line Misidentification: The Beginning of the End

Fixing Problems with Cell Lines

Thousands of Studies Used the Wrong Cells, and Journals Are Doing Nothing