Enterprise AI

Pharma AI Catches Duplicate Patients and Site-Level Fraud Before Clinical Trials Are Compromised

AI Data Press - News Team | May 27, 2026

Shruti Kaushal, Senior Data Scientist at AbbVie, explains how an anomaly detection framework catches duplicate patients and site-level risks before they compromise clinical trial integrity.

Credit: AI Data Press News

In pharma, we're not building new tech. We're applying existing tech to novel use cases. And our efficiency metric is not about how many hours we save in a day. It's about how much we've been able to accelerate our clinical trials.

Shruti Kaushal

Senior Data Scientist
AbbVie

Clinical trial acceleration has become the primary success metric for pharma AI. The most valuable applications are not general-purpose automation tools but systems that protect trial integrity and compress timelines that run for years and cost billions of dollars. At AbbVie, that principle guided the development of an anomaly detection framework now embedded in the company's governance workflow and running across all trials at enterprise scale.

Shruti Kaushal, Senior Data Scientist on the Experimental AI team at AbbVie, leads the development of scalable ML systems for clinical trial and medical analytics. Her work spans anomaly detection, text classification, and vector-embedding systems for clinical research and patient safety. Before AbbVie, she built an AI-driven drug discovery framework at the Wyss Institute at Harvard, achieving 300% faster throughput on protein-target prediction. She holds a Master of Science in Data Science from Columbia and a Master of Science in Mathematics from IIT Delhi. In April, Kaushal received the Pistoia Alliance Young Data Scientist of the Year Award 2026 at the organization's London conference. The award recognizes data scientists driving AI innovation and impact in life sciences R&D.

"Anything that helps accelerate our clinical trial timelines, because the patients are at the center of everything that a pharma company does. A lot of the times when people ask me what experimental AI is and how do you really measure success, it's against these timelines," says Kaushal.

Two engines, one framework

The anomaly detection framework runs on two distinct engines. The first targets site-level anomalies: it compares data from each clinical trial site against all other sites to determine whether anything is out of the ordinary. That catches more than outright fraud, Kaushal says. "This framework is able to detect not just extreme cases like fraud. It can also detect data entry errors where everybody has entered hemoglobin in the expected range, but another site ended up adding a zero at the end. These things help with data quality, not just risk management."
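The article does not describe the statistics behind the site-level engine. As a rough illustration only, the idea of comparing each site against all others can be sketched with a leave-one-out robust z-score, a common outlier convention that is swapped in here and is not AbbVie's stated method. The function name, the site IDs, the hemoglobin readings, and the 3.5 threshold are all assumptions made for the example.

```python
from statistics import median

def flag_outlier_sites(site_values, threshold=3.5):
    """Flag sites whose typical value sits far from the typical value at all other sites.

    Illustrative sketch: assumes at least two sites and one numeric measure per site.
    """
    # Summarize each site by the median of its readings.
    site_medians = {site: median(values) for site, values in site_values.items()}
    flagged = []
    for site, value in site_medians.items():
        # Compare this site against every other site (leave-one-out).
        others = [m for s, m in site_medians.items() if s != site]
        center = median(others)
        mad = median([abs(m - center) for m in others])
        if mad == 0:
            continue  # no spread among the other sites, so no robust score is possible
        # 0.6745 rescales the median absolute deviation to be comparable to a standard deviation.
        robust_z = 0.6745 * abs(value - center) / mad
        if robust_z > threshold:
            flagged.append(site)
    return flagged

# Hypothetical hemoglobin readings (g/dL); site_D has an extra zero on every value.
readings = {
    "site_A": [13.5, 14.1, 12.9, 13.8],
    "site_B": [14.0, 13.2, 13.9, 14.4],
    "site_C": [12.8, 13.6, 14.2, 13.1],
    "site_D": [135.0, 141.0, 129.0, 138.0],
    "site_E": [13.3, 14.0, 13.7, 13.4],
}
flagged = flag_outlier_sites(readings)
```

With this made-up data, only site_D is flagged. Medians are used instead of means so that one bad site does not distort the baseline it is compared against.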

The framework produces quarterly health reports across all trials, giving review teams actual signals to investigate rather than forcing them to manually sort through hundreds of sites. "Our review teams are not sitting with 100 sites, thinking which one should I review. They now have actual signals they can drill down into."

The second engine handles duplicate patient detection, a problem that threatens both patient safety and statistical power. So-called "professional patients," motivated by compensation or other factors, enroll in multiple trials simultaneously. Their duplicated data dilutes a trial's statistical power and can lead to the trial being dismissed.

Kaushal says the key advantage is catching these patients before they receive treatment. "We're not detecting once they have been dosed, because then intervening is still compromising their safety. Current industry standards wait until the entire trial is finished and all data is collected. That's when they find they have duplicates. The damage is done."
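How the duplicate engine matches patients is not disclosed in the article. One plausible shape for a pre-dosing check, sketched here under stated assumptions, is to score each not-yet-dosed candidate against participants already enrolled elsewhere on a handful of attributes. Every field name, the height tolerance, and the 0.75 cutoff below are hypothetical, and a production system would need privacy-preserving identifiers and human review of every match.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Participant:
    # All fields are hypothetical; real trials restrict which identifiers may be compared.
    trial_id: str
    subject_id: str
    birth_date: date
    sex: str
    height_cm: float
    initials: str
    dosed: bool

def similarity(a, b, height_tolerance_cm=2.0):
    """Return the fraction of compared attributes on which two records agree."""
    checks = [
        a.birth_date == b.birth_date,
        a.sex == b.sex,
        abs(a.height_cm - b.height_cm) <= height_tolerance_cm,
        a.initials.upper() == b.initials.upper(),
    ]
    return sum(checks) / len(checks)

def find_possible_duplicates(candidates, enrolled, min_score=0.75):
    """List (candidate, enrolled, score) pairs worth human review, before any dosing."""
    matches = []
    for cand in candidates:
        if cand.dosed:
            continue  # the point is to intervene before treatment starts
        for other in enrolled:
            if other.trial_id == cand.trial_id:
                continue  # only look for enrollment in a different trial
            score = similarity(cand, other)
            if score >= min_score:
                matches.append((cand.subject_id, other.subject_id, score))
    return matches

# Made-up records: S-101 closely resembles T-007, who is enrolled in another trial.
candidates = [Participant("TRIAL-2", "S-101", date(1984, 3, 12), "F", 165.0, "ab", False)]
enrolled = [
    Participant("TRIAL-1", "T-007", date(1984, 3, 12), "F", 166.0, "AB", True),
    Participant("TRIAL-1", "T-008", date(1990, 7, 1), "F", 165.0, "CD", True),
]
review_queue = find_possible_duplicates(candidates, enrolled)
```

The output is a review queue, not a verdict, which fits the human-in-the-loop verification Kaushal describes for audit readiness.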

Getting data right first

Kaushal says the first question her team asks before evaluating any use case is whether the data exists and whether it meets FAIR principles: findable, accessible, interoperable, and reusable. Regulatory restrictions also limit what problems can be addressed.

For organizations looking to prepare their data infrastructure, Kaushal points to two priorities: building a semantic layer that harmonizes all incoming data to a single ontology, and establishing a central data warehouse that links clinical data across formats. "At any given point, I don't have to go to external vendors or do tedious research to figure out what stands for what or which terms need to be converted into which units."
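To make the semantic-layer idea concrete, here is a minimal sketch, assuming a small hand-written lookup rather than a real ontology service. It maps site-specific spellings of a lab term to one canonical name and converts values to one canonical unit. The table contents and the `harmonize` function are illustrative assumptions; the only domain fact relied on is that 1 g/dL equals 10 g/L.

```python
# Hypothetical synonym table: site-specific labels mapped to one canonical term.
CANONICAL_TERMS = {
    "hgb": "hemoglobin",
    "hb": "hemoglobin",
    "haemoglobin": "hemoglobin",
    "hemoglobin": "hemoglobin",
}

# Canonical reporting unit per term (an assumption made for this sketch).
CANONICAL_UNIT = {"hemoglobin": "g/dL"}

# Multipliers that convert a reported unit into the canonical unit.
TO_CANONICAL = {
    ("hemoglobin", "g/dL"): 1.0,
    ("hemoglobin", "g/L"): 0.1,  # 10 g/L = 1 g/dL
}

def harmonize(term, value, unit):
    """Return (canonical_term, value_in_canonical_unit, canonical_unit)."""
    canonical = CANONICAL_TERMS.get(term.strip().lower())
    if canonical is None:
        raise KeyError(f"unmapped term: {term!r}")
    factor = TO_CANONICAL.get((canonical, unit))
    if factor is None:
        raise ValueError(f"no conversion from {unit!r} for {canonical!r}")
    return canonical, round(value * factor, 4), CANONICAL_UNIT[canonical]

# Two sites report the same measurement under different labels and units.
row_a = harmonize("Hgb", 135.0, "g/L")
row_b = harmonize("Haemoglobin", 13.5, "g/dL")
```

Raising on unmapped terms or units, rather than passing values through, keeps unknown inputs from silently entering downstream comparisons.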

Why the inflection point is now

While pharma has historically lagged behind tech in AI adoption, this year brought notably aggressive moves: dedicated AI verticals and pharma leadership gaining seats at the tech table.

The Novartis CEO now sits on Anthropic's board. The appointment sends a dual signal: AI integration is an industry-wide goal for pharma, and the sector is a serious client that will require regulatory and safety considerations baked into the systems tech companies build. That shift is already reflected in products like Claude for Life Sciences and in Novo Nordisk's strategic partnership with OpenAI to integrate advanced AI across its global value chain, from drug discovery to commercial operations.

Kaushal says the catalyst is technical, not just strategic. Generative AI has made connecting multimodal datasets tractable in a way it simply was not before, which is why AI-driven drug discovery is booming despite being a field that has existed for years.

As trial timelines shrink and drug discovery accelerates, what will differentiate pharma companies three to ten years from now is AI applied to cutting-edge research that was previously too complex to implement. Digital twins are a clear example: Roche has published a peer-reviewed Digital Twin-GPT framework for assessing drug efficacy.

Validation is not optional

Every AI model at AbbVie requires a spec document and must be audit-ready. Validation is built into the project plan from the start, not treated as a later-stage checkpoint. Kaushal says this regulatory pressure, often seen as a constraint, actually makes adoption easier. "Human in the loop is almost necessary to have another human verify the signals for the entire process to be audit-ready. That's why it's easier for us to have our internal stakeholders adopt."

The FDA is paying attention. Major policy developments, including published AI reports, the CDER AI Council, and a focus on risk-based evaluation frameworks for AI outputs, signal that governing agencies recognize the pace of pharma tech and are actively restructuring their regulatory approach to enable innovation while protecting patients.

That mandatory rigor is also why Kaushal argues the AI bubble will not burst in pharma. "In pharma, we're not building new tech. We're applying existing tech to novel use cases. And our efficiency metric is not about how many hours we save in a day. It's about how much we've been able to accelerate our clinical trials." She points to AstraZeneca's stated goal of increasing regulatory submissions from one to six per year as evidence that the payoff is real.

The regulatory standards that pharma must follow force robustness from the start. "We will not be able to ship a model and say it works unless it's validated to a point where it meets all regulatory standards and guidelines. Being a data scientist in pharma AI is probably the best position to be in right now if you're more interested in application rather than creation."