February 26, 2021 — Public data assets, such as TCGA, are rich resources for enabling pre-clinical discovery, translational, and clinical biomarker teams to make faster drug development decisions that are informed by human disease data.
Historically, real-world human disease data has had little place at these stages of the drug life cycle, simply because it didn’t exist. Now that it does, it is imperative that we as researchers work out how best to integrate findings from public data.
In our last article on derisking drug development using TCGA, we discussed specific applications as well as some of the inherent challenges facing researchers who seek to use TCGA. In particular, significant data QC, processing, and normalization steps are required to align the data and make it “analysis-ready” before translational researchers can begin interrogating TCGA. These procedures need to be robust enough to guarantee reproducibility, yet flexible enough to absorb changes in TCGA annotations.
Given these challenges, which persist as new data is added to TCGA, translational researchers increasingly find that pre-aligned and pre-processed TCGA data helps them generate reliable insights faster to inform critical drug development decisions.