How applying AI/ML to data management processes can empower biospecimen and biomarker operations, clinical, and translational teams

By David Caplan, VP, SaaS Solutions, and Dandan Xu, PhD, Sr. Dir. virtual Sample Inventory Management Operations & Data Curation

May 25, 2023 — This month, the U.S. Food and Drug Administration (FDA) released a discussion paper requesting industry feedback on “Using Artificial Intelligence and Machine Learning (AI/ML) in the Development of Drug and Biological Products.”

With generative AI frameworks emerging explosively in recent months, we appreciate the authors’ recognition that AI/ML has potential applications across every stage of therapeutic development, from discovery, nonclinical research, and clinical operations through post-market surveillance and manufacturing.

Every one of these activities is increasingly affected by data management challenges as drug development becomes more decentralized and operationally complex.

Enhancing data management processes while reducing risk

In providing SaaS (software-as-a-service) solutions to global clinical programs, we at QuartzBio have found growing support for the idea that AI/ML can be a powerful, error-reducing tool for managing both clinical sample data and biomarker data.

AI/ML-based tools should not replace human judgment, particularly for insight generation, at least until they have been extensively and rigorously validated (as any piece of critical software would be). Regulatory compliance and data privacy are also paramount and must be considered when building and using solutions that leverage generative AI frameworks.

Even so, near-term applications of AI/ML can substantially improve many tedious processes that require a person to inspect data manually. We list some of these processes below, along with the steps we recommend taking to reduce risk in each case.

How AI/ML can be applied to sample inventory and biomarker data management to augment existing processes and reduce human error:

  • Flagging Inconsistencies: Instead of making definitive changes autonomously, AI/ML can be used to flag potential inconsistencies or outliers in the data, which can then be manually reviewed by data managers.
  • Duplicate Detection: AI/ML can be used to suggest potential duplicates in participant data, with human verification for final decision-making.
  • Automated Data Harmonization: AI/ML can help automate the identification of similar terms across datasets and propose candidates for harmonization.
  • Imputation Recommendations: AI/ML can suggest imputations for missing data based on patterns and correlations in the existing data, to be reviewed and confirmed by data managers.
  • De-identification Assistance: AI/ML can assist in the process of de-identifying data by flagging potential personal identifiers, to be reviewed by humans to ensure no sensitive information is overlooked.
  • Data Quality Dashboard: AI/ML algorithms can continuously monitor data quality metrics and alert data managers to any potential issues, thereby speeding up the detection and resolution process.
  • Training AI/ML Models on Annotated Datasets: AI/ML models can be trained on datasets where errors have been manually annotated. This way, the models will learn to identify similar issues in new data, becoming valuable tools for data managers.
  • Supervised Learning: Data managers can continually review model output and feed their corrections back as labeled training data, gradually improving the models’ efficiency and reliability.
  • Enhanced User Experience (UX): Natural language interfaces, backed by AI models trained on how existing validated data platforms are used, can be layered on top of those platforms, giving non-technical users an easier way to find, access, summarize, and analyze relevant data.
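As a minimal sketch of the review-first pattern in this list, the Python below flags outliers, suggests possible duplicate participants, and proposes harmonization candidates, but only queues them as suggestions for a data manager to accept or reject. Simple heuristics (a median/MAD modified z-score and the standard-library `difflib` string matcher) stand in here for trained AI/ML models, and the record layout, field names (`volume_ml`, `subject`, `dob`), thresholds, and the `Suggestion` class are all hypothetical.

```python
# Illustrative sketch only: heuristics stand in for trained AI/ML models.
# Every function returns suggestions; none of them modifies the data.
from dataclasses import dataclass
from difflib import SequenceMatcher, get_close_matches
from statistics import median


@dataclass
class Suggestion:
    kind: str       # "outlier", "possible_duplicate", or "harmonize"
    items: tuple    # record IDs or terms the suggestion refers to
    detail: str
    status: str = "pending_review"  # a data manager sets "accepted"/"rejected"


def flag_outliers(records, column, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds threshold."""
    values = [r[column] for r in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to compare against, so flag nothing
    flagged = []
    for r in records:
        score = 0.6745 * (r[column] - med) / mad
        if abs(score) > threshold:
            flagged.append(Suggestion("outlier", (r["id"],),
                                      f"{column}={r[column]}, score {score:.1f}"))
    return flagged


def suggest_duplicates(participants, keys=("subject", "dob"), min_ratio=0.9):
    """Pair participant records whose key fields are near-identical strings."""
    def key(p):
        return "|".join(str(p[k]).lower() for k in keys)

    pairs = []
    for i, a in enumerate(participants):
        for b in participants[i + 1:]:
            ratio = SequenceMatcher(None, key(a), key(b)).ratio()
            if ratio >= min_ratio:
                pairs.append(Suggestion("possible_duplicate", (a["id"], b["id"]),
                                        f"similarity {ratio:.2f}"))
    return pairs


def suggest_harmonization(terms, vocabulary, cutoff=0.8):
    """Propose the closest controlled-vocabulary term for each non-standard term."""
    lookup = {v.lower(): v for v in vocabulary}
    proposals = []
    for term in terms:
        if term in vocabulary:
            continue  # already standard
        match = get_close_matches(term.lower(), list(lookup), n=1, cutoff=cutoff)
        if match:
            proposals.append(Suggestion("harmonize", (term,),
                                        f"map to '{lookup[match[0]]}'"))
    return proposals


if __name__ == "__main__":
    samples = [{"id": f"S{i}", "volume_ml": v}
               for i, v in enumerate([1.0, 1.1, 0.9, 1.0, 1.05, 25.0], start=1)]
    participants = [
        {"id": "P1", "subject": "SUBJ-0012", "dob": "1961-04-02"},
        {"id": "P2", "subject": "SUBJ0012", "dob": "1961-04-02"},
        {"id": "P3", "subject": "SUBJ-0450", "dob": "1975-11-30"},
    ]
    queue = (flag_outliers(samples, "volume_ml")
             + suggest_duplicates(participants)
             + suggest_harmonization(["whole blood", "Plasma", "PBMCs", "Urine"],
                                     ["Whole Blood", "Plasma", "Serum", "PBMC"]))
    for s in queue:
        print(s.kind, s.items, s.detail, s.status)
```

The key design choice is that every suggestion starts as `pending_review` and only a person changes it. Those accept/reject decisions could in turn be collected as labeled examples for retraining a real model, in the spirit of the supervised learning item above.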

Use cases: empowering all sponsor teams, regardless of data fluency

At QuartzBio, our mission is to empower cross-functional sponsor teams, including data managers, biospecimen operations, translational research, and biomarker operations, to focus on higher-value decisions and pursuits, instead of remaining mired in manual tasks.

Examples of specific use cases for AI/ML-enhanced workflows for sample data and biomarker data management:

  • Querying biospecimen inventory across clinical programs: Generative AI-based tools that support natural language user queries can make it easier for non-technical teams (such as clinical operations) to extract portfolio-level insights around sample status, consent, and location. For example, a user could ask, “How many of my samples have been tested, have failed the test, and have non-zero remaining volume?”
  • Developing Data Transfer Agreements (DTAs): Ensuring that data streams are consistent with established DTAs is an important part of managing quality and compliance. However, DTA checks are currently implemented manually, and because DTAs are often written before any data is generated, discrepancies are common. An Enterprise Data Platform can be used to train generative AI frameworks on the common terms and code lists used across a sponsor’s portfolio, easing DTA generation and reducing discrepancies from the outset.
  • Mapping disparate data sources into a common data model: AI/ML can help integrate disparate data sources with inconsistent nomenclature, annotations, and metadata that may be difficult to harmonize. AI/ML-based tools can automatically identify similar terms, propose opportunities for harmonization, and execute data integration, potentially reducing human error and bias.
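Behind the first two use cases sits deterministic logic that a generative AI layer would help users write or configure. A minimal Python sketch of both follows, assuming a simplified, hypothetical DTA format (required columns plus allowed code lists) and hypothetical sample fields (`tested`, `result`, `remaining_volume_ml`): `check_transfer` reports discrepancies between a data transfer and its DTA, and `count_tested_failed_with_volume` is the kind of structured filter the example natural-language question would need to resolve to.

```python
# Illustrative sketch only: the DTA layout, field names, and codes are hypothetical.
from dataclasses import dataclass, field


@dataclass
class DtaSpec:
    """Simplified DTA: required columns plus allowed values per coded column."""
    required: list
    code_lists: dict = field(default_factory=dict)


def check_transfer(rows, spec):
    """Return human-readable discrepancies between a data transfer and its DTA."""
    issues = []
    for n, row in enumerate(rows, start=1):
        for col in spec.required:
            if row.get(col) in (None, ""):
                issues.append(f"row {n}: missing required field '{col}'")
        for col, allowed in spec.code_lists.items():
            value = row.get(col)
            if value not in (None, "") and value not in allowed:
                issues.append(f"row {n}: '{col}'='{value}' not in DTA code list")
    return issues


def count_tested_failed_with_volume(samples):
    """Structured form of: tested, failed the test, and remaining volume > 0."""
    return sum(
        1 for s in samples
        if s["tested"] and s["result"] == "FAIL" and s["remaining_volume_ml"] > 0
    )


if __name__ == "__main__":
    spec = DtaSpec(required=["sample_id", "specimen_type"],
                   code_lists={"specimen_type": {"Plasma", "Serum"},
                               "visit": {"C1D1", "C2D1"}})
    rows = [
        {"sample_id": "S1", "specimen_type": "Plasma", "visit": "C1D1"},
        {"sample_id": "", "specimen_type": "plasma", "visit": "C1D1"},
        {"sample_id": "S3", "specimen_type": "Serum", "visit": "EOT"},
    ]
    for issue in check_transfer(rows, spec):
        print(issue)
```

Because checks like these are plain code, they can be validated like any other critical software, while the AI/ML layer is limited to helping draft the DTA terms or translate a user's question into the filter.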

These are just a few of the many opportunities we see for improving workflows for the clinical and scientific teams driving important advances in precision medicine.

Have more ideas for applying AI/ML to data management processes that are challenging your team?