Groups representing academic publishers are taking steps to combat paper mills that write the papers and then sell authorship spots
Clinical laboratory professionals rely on peer-reviewed research to keep up with the latest findings in pathology, laboratory medicine, and other medical fields. They should thus be interested in new efforts to combat the presence of “research paper mills,” defined as “profit oriented, unofficial, and potentially illegal organizations that produce and sell fraudulent manuscripts that seem to resemble genuine research,” according to the Committee on Publication Ethics (COPE), a non-profit organization representing stakeholders in academic publishing.
“They may also handle the administration of submitting the article to journals for review and sell authorship to researchers once the article is accepted for publication,” the COPE website states.
In a recent example of how paper mills impact scholarly research, multinational publishing company John Wiley and Sons (Wiley) announced in The Scholarly Kitchen last year that it had retracted more than 1,700 papers published in journals from the company’s Hindawi subsidiary, which specializes in open-access academic publishing.
“In Hindawi’s case, this is a direct result of sophisticated paper mill activity,” wrote Jay Flynn, Wiley’s Executive Vice President and General Manager, Research, in a Scholarly Kitchen guest post. “The extent to which our processes and systems were breached required an end-to-end review of every step in the peer review and publishing process.”
In addition, journal indexer Clarivate removed 19 Hindawi journals from its Web of Science list in March 2023, due to problems with their editorial quality, Retraction Watch reported.
Hindawi later shut down four of the journals, which had been “heavily compromised by paper mills,” according to a blog post from the publisher.
Wiley also announced at that time that it would temporarily pause Hindawi’s special issues publishing program due to compromised articles, according to a press release.
“We urgently need a collaborative, forward-looking and thoughtful approach to journal security to stop bad actors from further abusing the industry’s systems, journals, and the communities we serve,” wrote Jay Flynn, Wiley EVP and General Manager, Research and Learning, in an article he penned for The Scholarly Kitchen. “We’re committed to addressing the challenge presented by paper mills and academic fraud head on, and we invite our publishing peers, and the many organizations that work alongside us, to join us in this endeavor.” Clinical laboratory leaders understand the critical need for accurate medical research papers.
Using AI to Detect Paper Mill Submissions
Wiley acquired Hindawi in 2021 in a deal valued at $298 million, according to a press release, but the subsidiary has since become a financial drain for the company.
The journals earn their revenue by charging fees to authors. But in fiscal year 2024, which began last fall, “Wiley expects $35-40 million in lost revenue from Hindawi as it works to turn around journals with issues and retract articles,” Retraction Watch reported, citing an earnings call.
Wiley also revealed that it would stop using the Hindawi brand name and bring the subsidiary’s remaining journals under its own umbrella by the middle of 2024.
Wiley’s new Papermill Detection service will incorporate tools to detect signs that submissions originated from paper mills, including similarities with “known papermill hallmarks” and use of “tortured phrases” indicating that passages were translated by AI-based language models, according to a press release.
These tools include:
Papermill Similarity Detection: Checks for known papermill hallmarks and compares content against existing papermill papers.
Problematic Phrase Recognition: Flags unusual alternatives to established terms.
Unusual Publication Behavior Detection: Identifies irregular publishing patterns by paper authors.
Researcher Identity Verification: Helps detect potential bad actors.
Gen-AI Generated Content Detection: Identifies potential misuse of generative AI.
Journal Scope Checker: Analyzes the article’s relevance to the journal.
The company said that the new service will be available through Research Exchange, Wiley’s manuscript submission platform, as early as next year.
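To make the idea behind a check like Problematic Phrase Recognition concrete, here is a minimal sketch in Python. It is not Wiley’s implementation, which has not been described in detail. The lookup table, the function name `flag_tortured_phrases`, and the sample sentence are all hypothetical and chosen only for illustration. A production tool would presumably rely on a much larger curated list and on statistical signals.

```python
import re

# Hypothetical lookup table: unusual phrase -> established term it replaces.
# These entries are illustrative only, not a vendor's actual list.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "profound learning": "deep learning",
    "colossal information": "big data",
}


def flag_tortured_phrases(text, table=TORTURED_PHRASES):
    """Return (phrase, established_term, offset) tuples in order of appearance."""
    hits = []
    for phrase, term in table.items():
        # Whole-phrase, case-insensitive match.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((phrase, term, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


sample = (
    "We apply Profound Learning and counterfeit consciousness "
    "to colossal information."
)
for phrase, term, offset in flag_tortured_phrases(sample):
    print(f"offset {offset}: '{phrase}' (established term: '{term}')")
```

A simple dictionary lookup like this only catches phrases someone has already catalogued, which is one reason such checks are typically combined with the other signals in the list above.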
Other Efforts to Spot Paper Mill Submissions
Previously, STM, the International Association of Scientific, Technical and Medical Publishers, announced the launch of the STM Integrity Hub, with a mission “to equip the scholarly communication community with data, intelligence, and technology to protect research integrity,” Program Director Joris van Rossum, PhD, told The Scholarly Kitchen.
In 2023, the group announced that the hub would integrate Papermill Alarm from Clear Skies, a paper mill detection tool launched in 2022 with a focus on cancer research. It uses a “traffic-light rating system for research papers,” according to a press release.
In its coverage of the launch of Wiley’s Papermill Detection service, Retraction Watch suggested that one key to addressing the problem would be to reduce incentives for authors to use paper mills. Those incentives boil down to the pressure placed on many scientists, clinicians, and students to publish manuscripts, according to a research report from STM and COPE.
In one common scenario, the report noted, a paper mill will submit a staff-written paper to multiple journals. If the paper is accepted, the company will list it on a website and offer authorship spaces for sale.
“If a published paper is challenged, the ‘author’ may sometimes back down and ask for the paper to be retracted because of data problems, or they may try to provide additional supporting information including a supporting letter from their institution which is also a fake,” the report noted.
All of this serves as a warning to pathologists and clinical laboratory professionals to carefully evaluate the journals that publish studies in the areas of healthcare and laboratory medicine research that interest them.
Radboud University researchers fear oncology, molecular biology, pharmacology, and other cell-centric medical research efforts are at risk due to verification that at least 30,000 studies published in 33,000 scientific journals included data derived from misidentified or contaminated cell lines
Many research findings that underpin the science behind various diagnostic technologies used regularly by clinical laboratories and anatomic pathology groups may not be valid. This is because a large number of published studies may have used misidentified or contaminated cell lines.
Biomedical scientists have long known that many research papers report on the wrong cells due to cell line misidentification. And yet, few studies have measured the true scope of the problem. Until now. Researchers at Radboud University in the Netherlands have determined that this problem may have influenced the findings of thousands of published research studies, upon which many other research studies were based.
Clinical laboratories and anatomic pathology groups use assays and diagnostic tests that were developed as a result of these research studies. Identifying how many published papers contain inaccurate findings that cannot be duplicated could therefore affect how and when it is appropriate for physicians to order certain medical laboratory tests and rely on the results.
Cancer research is based on cell line studies as well. Thus, it may prove necessary to restudy existing published findings and revise them as appropriate. In turn, these new findings might change how and when some cancer tests are ordered and the results interpreted.
“We considered a reference to this original article as a good proxy for the usage of a cell line,” the researchers noted in their study published in the journal PLOS ONE. “Since typically the original papers are focused on reporting the establishment of the cell line only.”
They focused on cell lines whose misidentification was caused by HeLa cells, a widely used line of “immortalized cells.” HeLa cells have been used in scientific research for decades. They were the first mass-producible cells that could be used in vitro, making them highly desirable for biomedical research.
However, the process of creating immortalized cells involves mutation, during which contamination can be introduced by other cells. As a result, immortalized cells can be identified as one type of cell when they are in fact another.
Research scientists have been aware of this problem for about as long as immortalized cells have been in use. They attempt to take it into account when completing their analyses, though not always successfully.
The Radboud researchers found 32,655 records of primary literature based on contaminated cell lines. To arrive at that figure, they cross-referenced the ICLAC Register of Misidentified Cell Lines with a range of databases to determine whether articles were available for each of the 451 cell lines listed in Table One of the ICLAC Register.
With this information, they further researched published articles in the Web of Science database using cell line identifiers. They noted both primary literature and any citation report entries for each cell line.
The researchers noted in their published study, “As we only searched for cell lines known to be misidentified, this constitutes a conservative estimate of the scale of contamination in the primary literature. Moreover, to avoid false positives, we excluded several cell lines, such as the ones with non-unique identifiers or the cell lines for which verified stock is still in circulation.”
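The general shape of such a cross-referencing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Radboud team’s actual method or code. The register entries, the `non_unique` flags, the article records, and the function name `match_articles` are all hypothetical sample data. A real search would run against the full ICLAC Register and a bibliographic database.

```python
import re
from collections import defaultdict

# Hypothetical register sample: identifier -> True if treated as non-unique.
# Consult the ICLAC Register for the actual list of misidentified cell lines.
REGISTER = {
    "HEp-2": False,
    "Chang liver": False,
    "KB": True,  # short identifier, assumed ambiguous here and therefore excluded
}

# Hypothetical article records standing in for database search results.
ARTICLES = [
    {"id": "A1", "text": "Cytotoxicity was measured in HEp-2 cells."},
    {"id": "A2", "text": "Chang liver cells were treated for 24 h."},
    {"id": "A3", "text": "Memory usage peaked at 512 KB."},
    {"id": "A4", "text": "Primary hepatocytes were isolated from donors."},
]


def match_articles(register, articles):
    """Map each unambiguous identifier to the IDs of articles that mention it."""
    matches = defaultdict(list)
    for identifier, non_unique in register.items():
        if non_unique:
            # Skip identifiers likely to produce false positives.
            continue
        # Match the identifier only when it is not part of a longer word.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(identifier) + r"(?!\w)", re.IGNORECASE
        )
        for article in articles:
            if pattern.search(article["text"]):
                matches[identifier].append(article["id"])
    return dict(matches)


print(match_articles(REGISTER, ARTICLES))
```

Excluding the ambiguous identifier keeps article A3, which mentions kilobytes rather than a cell line, out of the results. That mirrors the researchers’ stated reason for dropping non-unique identifiers: fewer false positives at the cost of a more conservative count.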
Their estimate for secondary contaminated literature based on primary articles is larger still. “In total, we can conservatively estimate the citations to the contaminated primary literature at over 500,000, excluding self-citations,” the authors noted in their PLOS ONE article. “Thereby leaving traces in a substantial share of the biomedical literature.” They concluded, “… the amount of research potentially building on false grounds remains worrisome.”
Impact of Contaminated Cell Lines on Research, Clinical Laboratory Communities
Many of the assays and diagnostic tests performed by clinical laboratories and pathology groups were developed using cell line research. Should further scrutiny into the ability to duplicate and verify study findings fail to produce positive outcomes, it might call into question the validity and appropriate use of these tests.
For the research community, these findings represent yet another call to promote accountability and define standards for verifying authenticity of cell lines to further strengthen research findings.
The Radboud researchers ranked the number of contaminated articles they discovered by research area. Top affected areas include:
Oncology
Molecular Biology
Pharmacology
Cell Biology
Immunology
The distribution of contaminated primary literature over the research areas as defined by Web of Science. Only the 25 most affected research areas are included. (Graphic copyright: PLOS ONE.)
Addressing the Problem of Cell Line Contamination and Misidentification
Adapting the ever-growing body of published medical literature to reflect the known misidentifications, as well as the possibility of invalid results, will be a major undertaking. Ultimately, resolving this problem could require changes to practices and procedures currently used by research facilities and medical laboratories.
While authenticating cell lines adds to the cost of research projects, the money spent on research that becomes invalidated by misidentified cell lines is far greater.
In a 2015 Retraction Watch article, Leonard P. Freedman, PhD, President, Global Biological Standards Institute, noted, “An NIH RePORT search identified 9,000 active projects using cell lines, totaling $3.7 billion. Required use of authentication techniques would affect over $900 million in research dollars annually.”
Additionally, failure to adopt authentication as a part of standard operations brings other consequences. “A 2004 survey reported that just one-third of laboratories authenticate their cell lines,” Freedman noted. “10 years later, a Sigma-Aldrich survey found that only 37% of respondents ‘validate the purity and identity before first use’ of cell lines. Understanding the existing barriers that prevent implementation of universal cell authentication is central to changing this sad state of affairs.”
Mixed Recommendations for Fixing Inaccurate Published Studies
Of course, none of this will change the vast body of archived literature that might contain errors due to misidentification. Recommendations for addressing this aspect of the problem vary. The Radboud study authors suggest posting notes on any previously published articles stating that misidentified cell lines were used.
However, in a STAT article, Ivan Oransky, MD, and Adam Marcus, Managing Editor, Gastroenterology and Endoscopy News, co-founders of Retraction Watch, recommend more severe measures. “When we polled readers of Retraction Watch last December about the issue, 55% said journals should correct papers known to describe contaminated or misidentified cell lines, and more than 40% said retraction was the right choice.”
As cell lines continue to power the innovations of modern biomedical research, the Radboud study will surely heighten concerns about cell-line authentication and the research findings that depend on it. For pathology groups and medical laboratories, staying abreast of these developments will help ensure data validity and reduce reputational and liability risks.