Funding, Deals & Partnerships: BIOLOGICS & MEDICAL DEVICES; BioMed e-Series; Medicine and Life Sciences Scientific Journal – http://PharmaceuticalIntelligence.com
#BioIT20 Plenary Keynote: a cutting-edge, innovative approach to #Science. Game On: How #AI, #CitizenScience and #HumanComputation are facilitating the next leap forward in #Genomics and #Biology, and perhaps in the future in #PrecisionMedicine @pharma_BI @AVIVA1950 https://pic.twitter.com/L52qktkeYc
NIH Office of Data Science Strategy (@NIHDataScience): We’ve made progress with #FAIRData, but we still have a ways to go and our future is bright. #BioIT20 #NIHData
#CRISPR Journal: BBC tagging system superior to metadata efforts in BioScience
Rob Lalonde (@HPC_Cloud_Rob): My #BioIT20 talk, “#Bioinformatics in the #Cloud Age,” is tomorrow at 3:30pm. I discuss cloud migration trends in life sciences and #HPC. Join us! A panel with
I’m going to Bio-IT World 2020, Oct 6-8, from home! It’s a virtual event. Join me!
My team is participating in Bio-IT World Virtual 2020, October 6-8. Join me! Use discount code 20NUA to save 20%! @bioitworld #BioIT20
NIH Office of Data Science Strategy (@NIHDataScience): One of the challenges we face today: we need an algorithm that can search across the 36+ PB of Sequence Read Archive (SRA) data now in the cloud. Imagine what we could do! #BioIT20 #NIHdata #SRAdata
NCBI Staff (@NCBI): NCBI’s virtual #BioIT20 booth will open in 15 minutes. There, you can watch videos, grab some flyers and even speak with an expert! https://bio-itworld.pathable.co/organizations/xjq6qckzkbMaYvxAY… The booth will close at 4:15 PM, but we’ll be back tomorrow, Oct 7 and Thursday, Oct 8 at 9AM.
PERCAYAI (@percayai): Happening soon at #BioIT20: Join our faculty inventor Professor Rich Head’s invited talk “CompBio: An Augmented Intelligence System for Comprehensive Interpretation of Biological Data.”
CIO Kjiersten Fagnan is part of the #BioIT20 Trends in the Trenches panel! Reserve your complimentary pass by Oct. 2 to hear her and others at the Oct. 6-8
RT VishakhaSharma_: Excited to speak and moderate a panel on Emerging #AI technologies bioitworld #BioIT20
Titian Software (@TitianSoftware): Meet Titian at #BioIT20 on 6-8th October and discover the latest research, science and solutions for exploring the world of precision medicine and the technologies that are powering it: https://bit.ly/2GjCj4B
‘s #DayofDravet Virtual Workshop! The opportunity to learn and connect is right around the corner. Pre-register by October 14th to attend this free event! https://bit.ly/3lZVZuv #Dravet
PERCAYAI (@percayai): Thanks for joining us, Wendy! You’ve done a great job summing up key points from the discussion. #BioIT20
Systems Biology analysis of Transcription Networks, Artificial Intelligence, and High-End Computing Coming to Fruition in Personalized Oncology
Curator: Stephen J. Williams, Ph.D.
In the June 2020 issue of the journal Science, writer Roxanne Khamsi has an interesting article, “Computing Cancer’s Weak Spots: An algorithm to unmask tumors’ molecular linchpins is tested in patients”[1], describing some early successes in combining cancer genome sequencing with artificial intelligence algorithms to reach personalized clinical treatment decisions for various tumor types. In 2016, oncologist Amy Tiersten collaborated with systems biologist Andrea Califano and cell biologist Jose Silva at Mount Sinai Hospital on a systems biology approach, which determined that the drug ruxolitinib, a JAK1/JAK2 inhibitor that blocks STAT3 signaling, would be effective against the aggressively recurring, Herceptin-resistant breast tumor of one of her patients. Rather than defining networks of driver mutations, Dr. Califano focused on identifying a few transcription factors that act as ‘linchpins’, or master controllers of transcriptional networks within tumor cells, hoping in essence to ‘bottleneck’ the transcriptional machinery behind potential oncogenic products. As Dr. Califano states
“targeting those master regulators and you will stop cancer in its tracks, no matter what mutation initially caused it.”
It is important to note that this approach also relies on profiling tumors by RNA-seq to determine how the underlying mutations alter which master regulators are pertinent in any one tumor. Given the wide heterogeneity within tumor samples, this sequencing effort may have to involve multiple biopsies (as discussed in earlier posts on tumor heterogeneity in renal cancer).
As stated in the article, Califano co-founded a company called DarwinHealth in 2015 to guide doctors by identifying the key transcription factors in a patient’s tumor and suggesting personalized therapeutics against those identified molecular targets (OncoTarget™). He had collaborated with the Jackson Laboratory and, most recently, Columbia University to conduct a $15 million, 3,000-patient clinical trial. This work was a bit of a stretch from his initial training as a physicist. In 1986, IBM hired him for artificial intelligence projects, and in 2003 he landed at Columbia, where he has been working on identifying the transcriptional nodes that govern cancer survival and tumorigenicity. Dr. Califano had figured that the number of genetic mutations that could potentially be drivers was too vast:
A 2018 study which analyzed more than 9000 tumor samples reported over 1.5 million mutations[2]
and that developing therapeutics against all of them would be impossible. He reasoned that one would instead have to identify the common connections between these pathways, the transcriptional nodes that he termed master regulators.
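The ‘bottleneck’ reasoning can be illustrated with a toy network. The sketch below is purely hypothetical, and its node names are placeholders rather than data from the Science article or DarwinHealth. Three independent upstream driver pathways converge on a single transcription factor. A breadth-first search checks which single node, if inhibited, disconnects every driver from the oncogenic output.

```python
from collections import deque

# Hypothetical toy network; edges point downstream. Names are placeholders.
NETWORK = {
    "driver_A": ["kinase_1"],
    "driver_B": ["kinase_2"],
    "driver_C": ["kinase_3"],
    "kinase_1": ["TF_master"],
    "kinase_2": ["TF_master"],
    "kinase_3": ["TF_master"],
    "TF_master": ["oncogenic_program"],
    "oncogenic_program": [],
}
DRIVERS = ["driver_A", "driver_B", "driver_C"]
OUTPUT = "oncogenic_program"


def reaches(graph, start, target, blocked):
    """Breadth-first search from start to target, never entering the blocked node."""
    if start == blocked:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt != blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_node_bottlenecks(graph, drivers, output):
    """Nodes whose removal alone cuts every driver off from the output."""
    candidates = [n for n in graph if n not in drivers and n != output]
    return [
        n for n in candidates
        if not any(reaches(graph, d, output, blocked=n) for d in drivers)
    ]


print(single_node_bottlenecks(NETWORK, DRIVERS, OUTPUT))
```

In this toy network, inhibiting any one kinase still leaves the other two routes open. Blocking the single convergent node cuts all three routes. Real regulatory networks are far larger and more redundant, so this sketch only conveys the intuition.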
A Pan-Cancer Analysis of Enhancer Expression in Nearly 9000 Patient Samples
Chen H, Li C, Peng X, et al. Cell. 2018;173(2):386-399.e12.
Abstract
The role of enhancers, a key class of non-coding regulatory DNA elements, in cancer development has increasingly been appreciated. Here, we present the detection and characterization of a large number of expressed enhancers in a genome-wide analysis of 8928 tumor samples across 33 cancer types using TCGA RNA-seq data. Compared with matched normal tissues, global enhancer activation was observed in most cancers. Across cancer types, global enhancer activity was positively associated with aneuploidy, but not mutation load, suggesting a hypothesis centered on “chromatin-state” to explain their interplay. Integrating eQTL, mRNA co-expression, and Hi-C data analysis, we developed a computational method to infer causal enhancer-gene interactions, revealing enhancers of clinically actionable genes. Having identified an enhancer ∼140 kb downstream of PD-L1, a major immunotherapy target, we validated it experimentally. This study provides a systematic view of enhancer activity in diverse tumor contexts and suggests the clinical implications of enhancers.
A diagram of how concentrating on these transcriptional linchpins or nodes may be more therapeutically advantageous as only one pharmacologic agent is needed versus multiple agents to inhibit the various upstream pathways:
VIPER Algorithm (Virtual Inference of Protein activity by Enriched Regulon Analysis)
The algorithm that Califano and DarwinHealth developed is a systems biology approach that uses a tumor’s RNA-seq data to determine the controlling nodes of transcription. They have recently used the VIPER algorithm to analyze RNA-seq data from more than 10,000 TCGA tumor samples and identified 407 transcription factor genes that acted as these linchpins across all tumor types. Only 20 to 25 of them were implicated in just one tumor type, so most of these potential nodes are common to many forms of cancer.
Other institutions, such as Cold Spring Harbor Laboratory, have been using VIPER in their patient tumor analyses, and linchpins have been found for other tumor types. For instance, VIPER identified the transcription factors IKZF1 and IKZF3 as linchpins in multiple myeloma. However, approved therapeutics are hard to come by for targets that are transcription factors, because most pharmaceutical companies have concentrated on inhibiting easier targets such as kinases and their associated activity. In general, developing transcription factor inhibitors is a more difficult undertaking for multiple reasons.
Identifying the multiple dysregulated oncoproteins that contribute to tumorigenesis in a given patient is crucial for developing personalized treatment plans. However, accurate inference of aberrant protein activity in biological samples is still challenging as genetic alterations are only partially predictive and direct measurements of protein activity are generally not feasible. To address this problem we introduce and experimentally validate a new algorithm, VIPER (Virtual Inference of Protein-activity by Enriched Regulon analysis), for the accurate assessment of protein activity from gene expression data. We use VIPER to evaluate the functional relevance of genetic alterations in regulatory proteins across all TCGA samples. In addition to accurately inferring aberrant protein activity induced by established mutations, we also identify a significant fraction of tumors with aberrant activity of druggable oncoproteins—despite a lack of mutations, and vice-versa. In vitro assays confirmed that VIPER-inferred protein activity outperforms mutational analysis in predicting sensitivity to targeted inhibitors.
Schematic overview of the VIPER algorithm. From: Alvarez MJ, Shen Y, Giorgi FM, Lachmann A, Ding BB, Ye BH, Califano A: Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nature Genetics 2016, 48(8):838-847.
(a) Molecular layers profiled by different technologies. Transcriptomics measures steady-state mRNA levels; Proteomics quantifies protein levels, including some defined post-translational isoforms; VIPER infers protein activity based on the protein’s regulon, reflecting the abundance of the active protein isoform, including post-translational modifications, proper subcellular localization and interaction with co-factors. (b) Representation of VIPER workflow. A regulatory model is generated from ARACNe-inferred context-specific interactome and Mode of Regulation computed from the correlation between regulator and target genes. Single-sample gene expression signatures are computed from genome-wide expression data, and transformed into regulatory protein activity profiles by the aREA algorithm. (c) Three possible scenarios for the aREA analysis, including increased, decreased or no change in protein activity. The gene expression signature and its absolute value (|GES|) are indicated by color scale bars, induced and repressed target genes according to the regulatory model are indicated by blue and red vertical lines. (d) Pleiotropy Correction is performed by evaluating whether the enrichment of a given regulon (R4) is driven by genes co-regulated by a second regulator (R4∩R1). (e) Benchmark results for VIPER analysis based on multiple-samples gene expression signatures (msVIPER) and single-sample gene expression signatures (VIPER). Boxplots show the accuracy (relative rank for the silenced protein), and the specificity (fraction of proteins inferred as differentially active at p < 0.05) for the 6 benchmark experiments (see Table 2). Different colors indicate different implementations of the aREA algorithm, including 2-tail (2T) and 3-tail (3T), Interaction Confidence (IC) and Pleiotropy Correction (PC).
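The core idea in panels (b) and (c) is to score a regulator's activity by where its activated and repressed targets fall in a sample's gene expression signature. A deliberately simplified sketch can illustrate this. It is not the published aREA algorithm, which adds interaction confidence, a three-tail test and pleiotropy correction, and it is not the interface of the VIPER software. The gene names, the regulon and the mode-of-regulation values are hypothetical.

```python
import math
from statistics import NormalDist


def rank_normal_scores(signature):
    """Map each gene's signature value to a standard-normal score via its rank.

    Rank r (1..n) -> quantile r / (n + 1) -> inverse normal CDF.
    Ties are broken arbitrarily in this sketch.
    """
    genes = sorted(signature, key=signature.get)
    n = len(genes)
    std_normal = NormalDist()
    return {g: std_normal.inv_cdf((i + 1) / (n + 1)) for i, g in enumerate(genes)}


def simple_activity_score(signature, regulon):
    """Simplified two-tailed enrichment score for one regulator.

    `regulon` maps target gene -> mode of regulation (+1 activated, -1 repressed).
    Activated targets near the top of the signature and repressed targets near
    the bottom both push the score up. For randomly chosen targets the score is
    approximately standard normal.
    """
    z = rank_normal_scores(signature)
    targets = [g for g in regulon if g in z]
    if not targets:
        raise ValueError("no regulon targets found in the signature")
    return sum(regulon[g] * z[g] for g in targets) / math.sqrt(len(targets))


# Hypothetical single-sample signature and regulon (gene names are placeholders).
signature = {"G1": 3.1, "G2": 2.4, "G3": 1.0, "G4": 0.2, "G5": -0.5, "G6": -1.8, "G7": -2.9}
regulon = {"G1": +1, "G2": +1, "G7": -1}
print(simple_activity_score(signature, regulon))
```

In this example the two activated targets rank near the top and the repressed target near the bottom, so the score is clearly positive. This corresponds to the "increased activity" scenario in panel (c). Flipping every mode of regulation flips the sign of the score.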
Other articles from Andrea Califano on VIPER algorithm in cancer include:
Echeverria GV, Ge Z, Seth S, Zhang X, Jeter-Jones S, Zhou X, Cai S, Tu Y, McCoy A, Peoples M, Sun Y, Qiu H, Chang Q, Bristow C, Carugo A, Shao J, Ma X, Harris A, Mundi P, Lau R, Ramamoorthy V, Wu Y, Alvarez MJ, Califano A, Moulder SL, Symmans WF, Marszalek JR, Heffernan TP, Chang JT, Piwnica-Worms H. Sci Transl Med. 2019 Apr 17;11(488):eaav0936. doi: 10.1126/scitranslmed.aav0936. PMID: 30996079
Chen H, Li C, Peng X, Zhou Z, Weinstein JN, Liang H: A Pan-Cancer Analysis of Enhancer Expression in Nearly 9000 Patient Samples. Cell 2018, 173(2):386-399.e12.
Alvarez MJ, Shen Y, Giorgi FM, Lachmann A, Ding BB, Ye BH, Califano A: Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nature Genetics 2016, 48(8):838-847.
Other articles of Note on this Open Access Online Journal Include:
This session will provide information regarding methodologic and computational aspects of proteogenomic analysis of tumor samples, particularly in the context of clinical trials. Availability of comprehensive proteomic and matching genomic data for tumor samples characterized by the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) and The Cancer Genome Atlas (TCGA) program will be described, including data access procedures and informatic tools under development. Recent advances on mass spectrometry-based targeted assays for inclusion in clinical trials will also be discussed.
Amanda G Paulovich, Shankha Satpathy, Meenakshi Anurag, Bing Zhang, Steven A Carr
Methods and tools for comprehensive proteogenomic characterization of bulk tumor to needle core biopsies
Shankha Satpathy
TCGA has 11,000 cancers with >20,000 somatic alterations but only 128 proteins measured, as proteomics was still a young field
CPTAC is the NCI’s proteomic effort
A chemical labeling approach is now the method of choice for quantitative proteomics
Looked at ovarian and breast cancers: to measure PTMs such as phosphorylation, sample preparation is critical
Data access and informatics tools for proteogenomics analysis
Bing Zhang
Raw and processed data (raw MS data) with linked clinical data can be extracted from CPTAC
Python scripts are available for bioinformatic programming
Pathways to clinical translation of mass spectrometry-based assays
Meenakshi Anurag
· Using kinase inhibitor pulldown (KIP) assay to identify unique kinome profiles
· Found single strand break repair defects in endometrial luminal cases, especially with immune checkpoint prognostic tumors
· Paper: JNCI 2019 analyzed 20,000 genes correlated with endocrine therapy (ET) resistance in luminal B cases (selected for a list of 30 genes)
· Validated in METABRIC dataset
· KIP assay uses magnetic beads to pull out kinases to determine druggable kinases
· Looked in xenografts and was able to pull out differential kinomes
· Matched with PDX data so good clinical correlation
· Were able to detect ESR1 fusion correlated with ER+ tumors
The adoption of omic technologies in the cancer clinic is giving rise to an increasing number of large-scale high-dimensional datasets recording multiple aspects of the disease. This creates the need for frameworks for translatable discovery and learning from such data. Like artificial intelligence (AI) and machine learning (ML) for the cancer lab, methods for the clinic need to (i) compare and integrate different data types; (ii) scale with data sizes; (iii) prove interpretable in terms of the known biology and batch effects underlying the data; and (iv) predict previously unknown experimentally verifiable mechanisms. Methods for the clinic, beyond the lab, also need to (v) produce accurate actionable recommendations; (vi) prove relevant to patient populations based upon small cohorts; and (vii) be validated in clinical trials. In this educational session we will present recent studies that demonstrate AI and ML translated to the cancer clinic, from prognosis and diagnosis to therapy.
NOTE: Dr. Fish’s talk is not eligible for CME credit to permit the free flow of information of the commercial interest employee participating.
Ron C. Anafi, Rick L. Stevens, Orly Alter, Guy Fish
Overview of AI approaches in cancer research and patient care
Rick L. Stevens
Deep learning is less likely to saturate as data increases
Deep learning attempts to learn multiple layers of information
The ultimate goal is prediction but this will be the greatest challenge for ML
ML models can integrate data validation and cross database validation
What limits the performance of cross validation is the internal noise of data (reproducibility)
Learning curves: it is not more data but more reproducible data that is important
Neural networks can outperform classical methods
Important to measure validation accuracy in training set. Class weighting can assist in developing the training set, especially for unbalanced data sets
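Class weighting for unbalanced data can be shown with a small sketch. It uses the common "balanced" heuristic, the same formula scikit-learn applies for class_weight="balanced". Under this heuristic, each class gets weight n_samples / (n_classes × class_count). The responder/non-responder labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get larger weights, so every class contributes the same
    total weight to the training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical unbalanced training labels: 90 non-responders, 10 responders.
labels = ["non_responder"] * 90 + ["responder"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

Here each responder weighs 5.0 and each non-responder about 0.56, so each class contributes a total weight of 50. Without weighting, a model could reach 90% accuracy by always predicting "non_responder".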
Discovering genome-scale predictors of survival and response to treatment with multi-tensor decompositions
Orly Alter
Finding patterns using SVD component analysis. Gene and SVD patterns match 1:1
Comparative spectral decompositions can be used for global datasets
Validation of CNV data using this strategy
Found Ras, Shh and Notch pathways with altered CNV in glioblastoma which correlated with prognosis
These predictors were significantly better than independent prognostic indicators such as age at diagnosis
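The SVD-based pattern finding above can be illustrated in miniature. Alter's work uses full SVD and its comparative and tensor generalizations. The stdlib-only sketch below only extracts the single dominant pattern of a small genes × samples matrix by power iteration, under the assumption that one pattern dominates. In practice one would call a library routine such as numpy.linalg.svd. The matrix is a hypothetical rank-1 example.

```python
import math
import random


def dominant_singular_triplet(matrix, iterations=200, seed=0):
    """Approximate the largest singular value and its left/right singular
    vectors of a genes x samples matrix (list of rows) by power iteration."""
    if iterations < 1:
        raise ValueError("need at least one iteration")
    n_rows, n_cols = len(matrix), len(matrix[0])
    rng = random.Random(seed)

    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec], norm

    v, _ = normalize([rng.random() + 0.1 for _ in range(n_cols)])  # positive start
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        u, _ = normalize(u)  # u = A v / ||A v||  (gene pattern)
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        v, sigma = normalize(v)  # v = A^T u / ||A^T u||; the norm converges to sigma
    return sigma, u, v


# Hypothetical rank-1 data: gene loadings (1, 2, 3) times a sample pattern.
genes_x_samples = [[a * b for b in (2.0, 1.0, 0.0, 1.0)] for a in (1.0, 2.0, 3.0)]
sigma, gene_pattern, sample_pattern = dominant_singular_triplet(genes_x_samples)
print(sigma, gene_pattern, sample_pattern)
```

For this rank-1 matrix the singular value is ||(1, 2, 3)|| × ||(2, 1, 0, 1)|| = √84. The recovered gene pattern is (1, 2, 3) rescaled to unit length.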
Identifying targets for cancer chronotherapy with unsupervised machine learning
Ron C. Anafi
Many clinicians have noticed that some patients do better when chemo is given at certain times of the day and felt there may be a circadian rhythm or chronotherapeutic effect with respect to side effects or with outcomes
ML was used to determine whether this chronotherapy effect is real: can we use unstructured data to determine molecular rhythms?
Found circadian transcription in human lung
Most cancer datasets come from one clinical trial, so more trials may need to be conducted that take circadian rhythms into consideration
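A classical cosinor fit shows how a rhythm can be detected in a molecular readout. Note that this is a swapped-in technique. It is supervised and needs known sampling times, whereas the approach described in the talk is unsupervised and must infer timing from the data. This sketch assumes a 24-hour period, equally spaced samples and hypothetical example values.

```python
import math


def cosinor_fit(times_h, values, period_h=24.0):
    """Single-component cosinor fit: y = M + a*cos(w t) + b*sin(w t).

    Uses the closed-form least-squares solution, which is valid only when
    samples are equally spaced and cover whole periods (so the cosine and
    sine terms are orthogonal). Returns (mesor, amplitude, acrophase_h).
    """
    n = len(values)
    w = 2 * math.pi / period_h
    mesor = sum(values) / n
    a = 2 / n * sum(y * math.cos(w * t) for t, y in zip(times_h, values))
    b = 2 / n * sum(y * math.sin(w * t) for t, y in zip(times_h, values))
    amplitude = math.hypot(a, b)
    acrophase_h = (math.atan2(b, a) / w) % period_h  # clock time of the peak
    return mesor, amplitude, acrophase_h


# Hypothetical expression sampled every 2 h, peaking at hour 6.
times = [2.0 * k for k in range(12)]
values = [5.0 + 2.0 * math.cos(2 * math.pi * (t - 6.0) / 24.0) for t in times]
print(cosinor_fit(times, values))
```

The fit recovers a mesor of 5, an amplitude of 2 and a peak at hour 6. With real, noisy data, one would also test whether the amplitude differs significantly from zero.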
Stratifying patients by live-cell biomarkers with random-forest decision trees
Guy Fish CEO Cellanyx Diagnostics
Some clinicians feel we may be overdiagnosing and overtreating certain cancers, especially the indolent disease
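A random forest averages many decision trees, each grown on bootstrap samples with random subsets of features. The basic building block is choosing a split. The sketch below shows only that single step for one hypothetical live-cell biomarker, using Gini impurity, a common split criterion. It is not Cellanyx's method, and the measurements and labels are invented.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 = pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Threshold on one biomarker minimizing the weighted Gini impurity.

    Samples with value <= threshold go left. Returns (threshold, impurity);
    (None, inf) if all values are identical.
    """
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split possible between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical biomarker readings for six patients.
marker = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
outcome = ["indolent"] * 3 + ["aggressive"] * 3
print(best_threshold(marker, outcome))
```

A full tree applies this search recursively across many features. A forest repeats it over bootstrapped trees and aggregates their votes.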
This educational session focuses on the chronic wound healing, fibrosis, and cancer “triad.” It emphasizes the similarities and differences seen in these conditions and attempts to clarify why sustained fibrosis commonly supports tumorigenesis. Importance will be placed on cancer-associated fibroblasts (CAFs), vascularity, extracellular matrix (ECM), and chronic conditions like aging. Dr. Dvorak will provide an historical insight into the triad field focusing on the importance of vascular permeability. Dr. Stewart will explain how chronic inflammatory conditions, such as the aging tumor microenvironment (TME), drive cancer progression. The session will close with a review by Dr. Cukierman of the roles that CAFs and self-produced ECMs play in enabling the signaling reciprocity observed between fibrosis and cancer in solid epithelial cancers, such as pancreatic ductal adenocarcinoma.
Harold F Dvorak, Sheila A Stewart, Edna Cukierman
The importance of vascular permeability in tumor stroma generation and wound healing
Harold F Dvorak
Aging in the driver’s seat: Tumor progression and beyond
Sheila A Stewart
Why won’t CAFs stay normal?
Edna Cukierman
Tuesday, June 23
3:00 PM – 5:00 PM EDT
Other Articles on this Open Access Online Journal on Cancer Conferences and Conference Coverage in Real Time Include
Live Notes, Real Time Conference Coverage 2020 AACR Virtual Meeting April 28, 2020 Session on Evaluating Cancer Genomics from Normal Tissues Through Metastatic Disease 3:50 PM
Presenter/Authors
Kelly L. Bolton, Ryan N. Ptashkin, Teng Gao, Lior Braunstein, Sean M. Devlin, Minal Patel, Antonin Berthon, Aijazuddin Syed, Mariko Yabe, Catherine Coombs, Nicole M. Caltabellotta, Mike Walsh, Ken Offit, Zsofia Stadler, Choonsik Lee, Paul Pharoah, Konrad H. Stopsack, Barbara Spitzer, Simon Mantha, James Fagin, Laura Boucai, Christopher J. Gibson, Benjamin Ebert, Andrew L. Young, Todd Druley, Koichi Takahashi, Nancy Gillis, Markus Ball, Eric Padron, David Hyman, Jose Baselga, Larry Norton, Stuart Gardos, Virginia Klimek, Howard Scher, Dean Bajorin, Eder Paraiso, Ryma Benayed, Maria Arcilla, Marc Ladanyi, David Solit, Michael Berger, Martin Tallman, Montserrat Garcia-Closas, Nilanjan Chatterjee, Luis Diaz, Ross Levine, Lindsay Morton, Ahmet Zehir, Elli Papaemmanuil. Memorial Sloan Kettering Cancer Center, New York, NY, University of North Carolina at Chapel Hill, Chapel Hill, NC, University of Cambridge, Cambridge, United Kingdom, Dana-Farber Cancer Institute, Boston, MA, Washington University, St Louis, MO, The University of Texas MD Anderson Cancer Center, Houston, TX, Moffitt Cancer Center, Tampa, FL, National Cancer Institute, Bethesda, MD
Abstract
Recent studies among healthy individuals show evidence of somatic mutations in leukemia-associated genes, referred to as clonal hematopoiesis (CH). To determine the relationship between CH and oncologic therapy we collected sequential blood samples from 525 cancer patients (median sampling interval time = 23 months, range: 6-53 months) of whom 61% received cytotoxic therapy or external beam radiation therapy and 39% received either targeted/immunotherapy or were untreated. Samples were sequenced using deep targeted capture-based platforms. To determine whether CH mutational features were associated with therapy-related myeloid neoplasm (tMN) risk, we performed Cox proportional hazards regression on 9,549 cancer patients exposed to oncologic therapy of whom 75 cases developed tMN (median time to transformation=26 months). To further compare the genetic and clonal relationships between tMN and the preceding CH, we analyzed 35 cases for which paired samples were available. We compared the growth rate of the variant allele fraction (VAF) of CH clones across treatment modalities and in untreated patients. A significant increase in the growth rate of CH mutations was seen in DDR genes among those receiving cytotoxic (p=0.03) or radiation therapy (p=0.02) during the follow-up period compared to patients who did not receive therapy. Similar growth rates among treated and untreated patients were seen for non-DDR CH genes such as DNMT3A. Increasing cumulative exposure to cytotoxic therapy (p=0.01) and external beam radiation therapy (p=2×10⁻⁸) resulted in higher growth rates for DDR CH mutations. Among 34 subjects with at least two CH mutations in which one mutation was in a DDR gene and one in a non-DDR gene, we studied competing clonal dynamics for multiple gene mutations within the same patient. The risk of tMN was positively associated with CH in a known myeloid neoplasm driver mutation (HR=6.9, p<10⁻⁶), and increased with the total number of mutations and clone size.
The strongest associations were observed for mutations in TP53 and for CH with mutations in spliceosome genes (SRSF2, U2AF1 and SF3B1). Lower hemoglobin, lower platelet counts, lower neutrophil counts, higher red cell distribution width and higher mean corpuscular volume were all positively associated with increased tMN risk. Among 35 cases for which paired samples were available, in 19 patients (59%), we found evidence of at least one of these mutations at the time of pre-tMN sequencing and in 13 (41%), we identified two or more in the pre-tMN sample. In all cases the dominant clone at tMN transformation was defined by a mutation seen at CH. Our serial sampling data provide clear evidence that oncologic therapy strongly selects for clones with mutations in the DDR genes and that these clones have limited competitive fitness, in the absence of cytotoxic or radiation therapy. We further validate the relevance of CH as a predictor and precursor of tMN in cancer patients. We show that CH mutations detected prior to tMN diagnosis were consistently part of the dominant clone at tMN diagnosis and demonstrate that oncologic therapy directly promotes clones with mutations in genes associated with chemo-resistant disease such as TP53.
therapy also resulted in clonal evolution, with changes seen in splice variants and spliceosome genes
therapy promotes expansion of existing DDR-mutant clones
clonal hematopoiesis due to selective pressures
number of mutations and variants all predictive of myeloid disease
deferring adjuvant therapy for breast cancer patients in the highest MDS risk group, based on biomarkers, greatly reduced their risk of MDS
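The abstract compares VAF growth rates across treatment groups but does not say how growth rate was computed. The sketch below assumes a simple exponential model between two blood draws, r = ln(VAF2 / VAF1) / Δt per year. Both the model and the patient values are illustrative assumptions.

```python
import math
from statistics import mean


def vaf_growth_rate(vaf_first, vaf_second, interval_months):
    """Annualized growth rate of a clone between two blood draws,
    assuming exponential growth: VAF(t) = VAF0 * exp(r * t)."""
    if vaf_first <= 0 or vaf_second <= 0:
        raise ValueError("VAFs must be positive to take a log ratio")
    years = interval_months / 12.0
    return math.log(vaf_second / vaf_first) / years


# Hypothetical (first VAF, second VAF, months between draws) per patient.
treated = [(0.02, 0.05, 24), (0.03, 0.06, 18)]
untreated = [(0.02, 0.022, 24), (0.04, 0.041, 20)]

print("treated mean r:", mean(vaf_growth_rate(*p) for p in treated))
print("untreated mean r:", mean(vaf_growth_rate(*p) for p in untreated))
```

With a log-ratio model, a clone whose VAF doubles in one year has r = ln 2 ≈ 0.69 per year, regardless of its starting size. Comparing groups as in the study would additionally need a statistical test.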
Presenter/Authors
Olivia W. Lee, Akash Mitra, Won-Chul Lee, Kazutaka Fukumura, Hannah Beird, Miles Andrews, Grant Fischer, John N. Weinstein, Michael A. Davies, Jason Huse, P. Andrew Futreal. The University of Texas MD Anderson Cancer Center, TX, The University of Texas MD Anderson Cancer Center, TX, Olivia Newton-John Cancer Research Institute and School of Cancer Medicine, La Trobe University, Australia
Disclosures
O.W. Lee: None. A. Mitra: None. W. Lee: None. K. Fukumura: None. H. Beird: None. M. Andrews: Merck Sharp and Dohme. G. Fischer: None. J.N. Weinstein: None. M.A. Davies: Bristol-Myers Squibb; Novartis; Array BioPharma; Roche and Genentech; GlaxoSmithKline; Sanofi-Aventis; AstraZeneca; Myriad Genetics; Oncothyreon. J. Huse: None. P. Futreal: None.
Abstract: Brain metastases (BM) occur in 10-30% of patients with cancer. Approximately 200,000 new cases of brain metastases are diagnosed in the United States annually, with median survival after diagnosis ranging from 3 to 27 months. Recently, studies have identified significant genetic differences between BM and their corresponding primary tumors. It has been shown that BM harbor clinically actionable mutations that are distinct from those in the primary tumor samples. Additional genomic profiling of BM will provide deeper understanding of the pathogenesis of BM and suggest new therapeutic approaches.
We performed whole-exome sequencing of BM and matched tumors from 41 patients with renal cell carcinoma (RCC), breast cancer, lung cancer, and melanoma, cancer types known to be more likely to develop BM. We profiled a total of 126 fresh-frozen tumor samples and performed subsequent analyses of BM in comparison to paired primary tumors and extracranial metastases (ECM). We found that lung cancer shared the largest number of mutations between BM and matched tumors (83%), followed by melanoma (74%), RCC (51%), and breast cancer (26%), indicating that cancer types with high tumor mutational burden share more mutations with BM. Mutational signatures displayed limited differences, suggesting a lack of mutagenic processes specific to BM. However, point-mutation heterogeneity revealed that BM evolve separately into different subclones from their paired tumors regardless of cancer type, and some cancer driver genes were found in BM-specific subclones. These models and findings suggest that these driver genes may drive prometastatic subclones that lead to BM. Thirty-two curated cancer gene mutations were detected, and 71% of them were shared between BM and primary tumors or ECM. The remaining 29% were specific to BM, implying that BM often accumulate additional cancer gene mutations that are not present in primary tumors or ECM. Co-mutation analysis revealed a high frequency of TP53 nonsense mutations in BM, mostly in the DNA binding domain, suggesting TP53 nonsense mutation as a possible prerequisite for the development of BM. Copy number alteration analysis showed statistically significant differences between BM and their paired tumor samples in each cancer type (Wilcoxon test, p < 0.0385 for all). Both copy number gains and losses were consistently higher in BM for breast cancer (Wilcoxon test, p = 1.307e-5) and lung cancer (Wilcoxon test, p = 1.942e-5), implying greater genomic instability during the evolution of BM.
Our findings highlight that there are more unique mutations in BM, with significantly higher copy number alterations and tumor mutational burden. These genomic analyses could provide an opportunity for more reliable diagnostic decision-making, and these findings will be further tested with additional transcriptomic and epigenetic profiling for better characterization of BM-specific tumor microenvironments.
are genomic signatures different in brain mets versus non-metastatic or normal tissue?
32 genes from curated databases were different between brain mets and primary tumor
frequent nonsense mutations in TP53
divergent clonal evolution of drivers in BMets from primary
they were able to match BM with other mutational signatures like smokers and lung cancer signatures
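The "shared" and "BM-specific" mutation fractions above can be computed with set operations. The abstract does not state the denominator used, so this sketch assumes it is the set of mutations found in the brain metastasis. The mutation identifiers are hypothetical labels.

```python
def shared_fraction(bm_mutations, matched_mutations):
    """Fraction of brain-metastasis (BM) mutations also present in the matched tumor."""
    bm = set(bm_mutations)
    if not bm:
        return 0.0
    return len(bm & set(matched_mutations)) / len(bm)


def bm_specific(bm_mutations, *other_samples):
    """Mutations seen in the BM but in none of the other samples (primary, ECM, ...)."""
    seen_elsewhere = set().union(*other_samples)
    return set(bm_mutations) - seen_elsewhere


# Hypothetical mutation calls for one patient.
bm = ["TP53_nonsense", "KRAS_G12C", "CDKN2A_del", "EGFR_L858R"]
primary = ["KRAS_G12C", "EGFR_L858R", "STK11_mut"]
ecm = ["EGFR_L858R"]

print(shared_fraction(bm, primary))
print(sorted(bm_specific(bm, primary, ecm)))
```

Using the union of the primary tumor and ECM calls for the "specific" set mirrors the abstract's comparison of BM with "primary tumors or ECM".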
Presenter/Authors
Peter Horak, Malachi Griffith, Arpad Danos, Beth A. Pitel, Subha Madhavan, Xuelu Liu, Jennifer Lee, Gordana Raca, Shirley Li, Alex H. Wagner, Shashikant Kulkarni, Obi L. Griffith, Debyani Chakravarty, Dmitriy Sonkin. National Center for Tumor Diseases, Heidelberg, Germany, Washington University School of Medicine, St. Louis, MO, Mayo Clinic, Rochester, MN, Georgetown University Medical Center, Washington, DC, Dana-Farber Cancer Institute, Boston, MA, Frederick National Laboratory for Cancer Research, Rockville, MD, University of Southern California, Los Angeles, CA, Sunquest, Boston, MA, Baylor College of Medicine, Houston, TX, Memorial Sloan Kettering Cancer Center, New York, NY, National Cancer Institute, Rockville, MD
Disclosures
P. Horak: None. M. Griffith: None. A. Danos: None. B.A. Pitel: None. S. Madhavan: Perthera Inc. X. Liu: None. J. Lee: None. G. Raca: None. S. Li: Sunquest Information Systems, Inc. A.H. Wagner: None. S. Kulkarni: Baylor Genetics. O.L. Griffith: None. D. Chakravarty: None. D. Sonkin: None.
Abstract
Somatic variants in cancer-relevant genes are interpreted from multiple partially overlapping perspectives. When considered in discovery and translational research endeavors, it is important to determine if a particular variant observed in a gene of interest is oncogenic/pathogenic or not, as such knowledge provides the foundation on which targeted cancer treatment research is based. In contrast, clinical applications are dominated by diagnostic, prognostic, or therapeutic interpretations which in part also depend on underlying variant oncogenicity/pathogenicity. The Association for Molecular Pathology, the American Society of Clinical Oncology, and the College of American Pathologists (AMP/ASCO/CAP) have published structured somatic variant clinical interpretation guidelines which specifically address diagnostic, prognostic, and therapeutic implications. These guidelines have been well-received by the oncology community.
Many variant knowledgebases, clinical laboratories/centers have adopted or are in the process of adopting these guidelines. The AMP/ASCO/CAP guidelines also describe different data types which are used to determine oncogenicity/pathogenicity of a variant, such as: population frequency, functional data, computational predictions, segregation, and somatic frequency. A second collaborative effort created the European Society for Medical Oncology (ESMO) Scale for Clinical Actionability of molecular Targets to provide a harmonized vocabulary that provides an evidence-based ranking system of molecular targets that supports their value as clinical targets. However, neither of these clinical guideline systems provides systematic and comprehensive procedures for aggregating population frequency, functional data, computational predictions, segregation, and somatic frequency to consistently interpret variant oncogenicity/pathogenicity, as has been published in the ACMG/AMP guidelines for interpretation of pathogenicity of germline variants. In order to address this unmet need for somatic variant oncogenicity/pathogenicity interpretation procedures, the Variant Interpretation for Cancer Consortium (VICC, a GA4GH driver project) Knowledge Curation and Interpretation Standards (KCIS) working group (WG) has developed a Standard Operating Procedure (SOP) with contributions from members of ClinGen Somatic Clinical Domain WG, and ClinGen Somatic/Germline variant curation WG using an approach similar to the ACMG/AMP germline pathogenicity guidelines to categorize evidence of oncogenicity/pathogenicity as very strong, strong, moderate or supporting. This SOP enables consistent and comprehensive assessment of oncogenicity/pathogenicity of somatic variants, and the latest version of the SOP can be found at https://cancervariants.org/wg/kcis/.
best to use this SOP for somatic mutations and not rearrangements
variants are categorized by the strength of oncogenicity evidence, from strong to weak
useful variant knowledge on pathogenicity curated from known databases
the recommendations would provide some guideline on curating unknown somatic variants versus known variants of hereditary diseases
they have not curated RB1 mutations or variants (or for other RBs like RB2? p130?)
test has a specificity over 90% and is intended to be used along with the guideline
The Circulating Cell-free Genome Atlas (CCGA) study (clinical trial NCT02889978) was divided into three substudies: identifying the highest-performing assay, refining the assay, and validating the assay
methylation-based assays (bisulfite sequencing) worked better than the other sequencing-based approaches
used a machine learning algorithm to help refine the assay
prediction was >90%; a subgroup was defined for high clinical suspicion of cancer (HCS)
HCS sensitivity was 100% and specificity very high, but sensitivity in the training set was 40%, and results may have been confounded by the inclusion of kidney cancer
TOO (tissue of origin) was predicted in greater than 99% in both training and validation sets
A first-of-its-kind prospective study of a multi-cancer blood test to screen and manage 10,000 women with no history of cancer
DETECT-A study: prospective interventional study; can a multi-cancer blood test be used prospectively and lead to personalized care? Can the screen be used to complement current therapy?
10,000 women aged 65-75 with no previous cancer; the study was conducted through the Geisinger Health Network; the multi-cancer test detects DNA and protein and was used alongside standard-of-care screening
the study focused on safety, so a committee was consulted on each case, and a diagnostic PET-CT was used
the blood test alone was not sufficient, but combined with protein markers and CT scans, detection was much higher (5-fold increase) for breast cancer
there are multiple opportunities, yet at the same time there are still challenges to using these cell-free tests in therapeutic monitoring, diagnosis, and screening; sensitivities for some cancers are still too low for large-scale screening, although the tests can supplement current screening guidelines
we have to ask about false positive rate and need to concentrate on prospective studies
we must consider how tests will be used, population health studies will need to show improved survival
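The false-positive concern can be made concrete with Bayes' rule: when cancer is rare in the screened population, even a highly specific test yields a modest positive predictive value (PPV). The sketch below uses hypothetical sensitivity, specificity and prevalence values chosen for illustration; they are not figures from CCGA or DETECT-A.

```python
from typing import Tuple


def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def expected_positives(n: int, sensitivity: float, specificity: float,
                       prevalence: float) -> Tuple[float, float]:
    """Expected (true positives, false positives) when screening n people."""
    cases = n * prevalence
    return sensitivity * cases, (1 - specificity) * (n - cases)


# Hypothetical inputs: 10,000 people screened, 1% prevalence,
# 50% sensitivity, 99% specificity.
tp, fp = expected_positives(10_000, 0.50, 0.99, 0.01)
print(f"true positives ~ {tp:.0f}, false positives ~ {fp:.0f}, "
      f"PPV = {ppv(0.50, 0.99, 0.01):.2f}")
```

With these inputs there are about 99 false positives for every 50 true positives, so roughly two of every three positive results would be false alarms, each needing work-up such as imaging.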
Phylogenetic tracking and minimal residual disease detection using ctDNA in early-stage NSCLC: A lung TRACERx study. Chris Abbosh, UCL
TRACERx study in collaboration with Charles Swanton.
multiplex PCR to track 200 SNVs: correlate tumor tissue biopsy with ctDNA
spike-in assay shows very good sensitivity and specificity for the SNVs tracked; over 400 TRACERx libraries were analyzed
sensitivity increases when tracking more variants, but specificity does go down a bit
tracking variants can show evidence of subclonal dynamics and evolution and copy number deletion events; they also show neoantigen editing, that is, changes in the tumour's neoantigens
this assay can detect low-frequency variants in a reproducible manner
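The trade-off noted above (tracking more variants raises sensitivity but lowers specificity) follows from a simple independence model. Assume each tracked variant is detected with probability p when tumour DNA is present and gives a spurious call with probability f when it is absent, and that a sample is called positive if any tracked variant is detected. Then sensitivity is 1 - (1 - p)^n and specificity is (1 - f)^n. The independence assumption, the any-variant calling rule, and the values of p and f are all illustrative; the TRACERx assay's actual calling rule is not described in these notes and may require more than one variant.

```python
def sensitivity(n_variants: int, p_detect: float) -> float:
    """P(at least one of n tracked variants detected | tumour DNA present), assuming independence."""
    return 1 - (1 - p_detect) ** n_variants


def specificity(n_variants: int, p_false: float) -> float:
    """P(no spurious call among n tracked variants | no tumour DNA), assuming independence."""
    return (1 - p_false) ** n_variants


# Hypothetical per-variant rates: 5% detection at very low ctDNA levels, 0.05% false calls.
for n in (1, 10, 50, 200):
    print(n, round(sensitivity(n, 0.05), 3), round(specificity(n, 0.0005), 3))
```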
The TRACERx (TRAcking Cancer Evolution through therapy (Rx)) lung study is a multi-million pound research project taking place over nine years, which will transform our understanding of non-small cell lung cancer (NSCLC) and take a practical step towards an era of precision medicine. The study will uncover mechanisms of cancer evolution by analysing the intratumour heterogeneity in lung tumours from approximately 850 patients and tracking its evolutionary trajectory from diagnosis through to relapse. At £14 million, it’s the biggest single investment in lung cancer research by Cancer Research UK, and the start of a strategic UK-wide focus on the disease, aimed at making real progress for patients.
Led by Professor Charles Swanton at UCL, the study will bring together a network of experts from different disciplines to help integrate clinical and genomic data and identify patients who could benefit from trials of new, targeted treatments. In addition, it will use a whole suite of cutting-edge analytical techniques on these patients' tumour samples, giving unprecedented insight into the genomic landscape of primary and metastatic tumours and the impact of treatment upon this landscape.
In future, TRACERx will enable us to define how intratumour heterogeneity impacts upon cancer immunity throughout tumour evolution and therapy. Such studies will help define how the clinical evaluation of intratumour heterogeneity can inform patient stratification and the development of combinatorial therapies incorporating conventional, targeted and immune based therapeutics.
Intratumour heterogeneity is increasingly recognised as a major hurdle to achieve improvements in therapeutic outcome and biomarker validation. Intratumour genetic diversity provides a substrate for tumour adaptation and evolution. However, the evolutionary genomic landscape of non-small cell lung cancer (NSCLC) and how it changes through the disease course has not been studied in detail. TRACERx is a prospective observational study with the following objectives:
Primary Objectives
Define the relationship between intratumour heterogeneity and clinical outcome following surgery and adjuvant therapy (including relationships between intratumour heterogeneity and clinical disease stage and histological subtypes of NSCLC).
Establish the impact of adjuvant platinum-containing regimens upon intratumour heterogeneity in relapsed disease compared to primary resected tumour.
Key Secondary Objectives
Develop and validate an intratumour heterogeneity (ITH) ratio index as a prognostic and predictive biomarker in relation to disease-free survival and overall survival.
Infer a complete picture of NSCLC evolutionary dynamics – define drivers of genomic instability, metastatic progression and drug resistance by identifying and tracking the dynamics of somatic mutational heterogeneity, and chromosomal structural and numerical instability present in the primary tumour and at metastatic sites. Individual tumour phylogenetic tree analysis will:
Establish the order of somatic events in relation to genomic instability onset and metastatic progression
Decipher genetic “bottlenecking” events following metastasis and drug therapy
Establish dynamics of tumour evolution during the disease course from early to late stage NSCLC.
Initiate a longitudinal biobank of circulating tumour cells (CTCs) and circulating-free tumour DNA (cfDNA) to develop analytical methods for the early detection and monitoring of tumour evolution over time.
Develop a longitudinal tissue resource to serve as a platform to assess the relationship between genetic intratumour heterogeneity and the host immune response.
Define relationships between intratumour heterogeneity and targeted/cytotoxic therapeutic outcome.
Use a lung cancer specific gene panel in a certified Good Clinical Practice (GCP) laboratory environment to define clonally dominant disease drivers to address the role of clonal driver dominance in targeted therapeutic response and to guide stratification of lung cancer treatment and future clinical study inclusion (paired primary-metastatic site comparisons in at least 270 patients with relapsed disease).
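The ITH ratio index named in the secondary objectives is not defined in this summary. One simple multi-region formulation, used here purely as an illustrative assumption, labels a mutation clonal if it is detected in every sampled region of a tumour and subclonal otherwise, and takes the ratio of subclonal to clonal mutations. Real analyses estimate clonality from cancer cell fractions corrected for purity and copy number, not from presence or absence alone; `ith_ratio` is a hypothetical helper.

```python
from typing import Dict, Set


def ith_ratio(regions: Dict[str, Set[str]]) -> float:
    """Ratio of subclonal to clonal mutations across sampled regions (hypothetical index).

    regions maps a region label to the set of mutation IDs detected in that region.
    """
    if not regions:
        raise ValueError("need at least one region")
    all_mutations = set().union(*regions.values())
    clonal = set.intersection(*regions.values())  # detected in every region
    subclonal = all_mutations - clonal
    if not clonal:
        raise ValueError("no clonal mutations; ratio undefined")
    return len(subclonal) / len(clonal)


# Toy three-region tumour with invented mutation IDs.
tumour = {
    "R1": {"m1", "m2", "m3"},
    "R2": {"m1", "m2", "m4"},
    "R3": {"m1", "m2"},
}
print(ith_ratio(tumour))
```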
Utility of longitudinal circulating tumor DNA (ctDNA) modeling to predict RECIST-defined progression in first-line patients with epidermal growth factor receptor mutation-positive (EGFRm) advanced non-small cell lung cancer (NSCLC)
Martin Johnson
Impact of the EML4-ALK fusion variant on the efficacy of lorlatinib in patients (pts) with ALK-positive advanced non-small cell lung cancer (NSCLC) Todd Bauer
Lorlatinib, a small-molecule inhibitor of ALK and ROS1, was granted accelerated U.S. Food and Drug Administration approval in November 2018 for patients with ALK-positive metastatic NSCLC whose disease has progressed on crizotinib and at least one other ALK inhibitor or whose disease has progressed on alectinib or ceritinib as the first ALK inhibitor therapy for metastatic disease. Todd M. Bauer, MD, a medical oncologist and senior investigator at Sarah Cannon Research Institute/Tennessee Oncology, PLLC, in Nashville, has been very involved with the development of lorlatinib since the beginning. In the following interview, Dr. Bauer discusses some of lorlatinib's unique toxicities, as well as his first-hand experiences with the drug.
BACKGROUND: Lorlatinib is a potent, brain-penetrant, third-generation inhibitor of ALK and ROS1 tyrosine kinases with broad coverage of ALK mutations. In a phase 1 study, activity was seen in patients with ALK-positive non-small-cell lung cancer, most of whom had CNS metastases and progression after ALK-directed therapy. We aimed to analyse the overall and intracranial antitumour activity of lorlatinib in patients with ALK-positive, advanced non-small-cell lung cancer.
METHODS: In this phase 2 study, patients with histologically or cytologically ALK-positive or ROS1-positive, advanced, non-small-cell lung cancer, with or without CNS metastases, with an Eastern Cooperative Oncology Group performance status of 0, 1, or 2, and adequate end-organ function were eligible. Patients were enrolled into six different expansion cohorts (EXP1-6) on the basis of ALK and ROS1 status and previous therapy, and were given lorlatinib 100 mg orally once daily continuously in 21-day cycles. The primary endpoint was overall and intracranial tumour response by independent central review, assessed in pooled subgroups of ALK-positive patients. Analyses of activity and safety were based on the safety analysis set (ie, all patients who received at least one dose of lorlatinib) as assessed by independent central review. Patients with measurable CNS metastases at baseline by independent central review were included in the intracranial activity analyses. In this report, we present lorlatinib activity data for the ALK-positive patients (EXP1-5 only), and safety data for all treated patients (EXP1-6). This study is ongoing and is registered with ClinicalTrials.gov, number NCT01970865.
FINDINGS: Between Sept 15, 2015, and Oct 3, 2016, 276 patients were enrolled: 30 who were ALK positive and treatment naive (EXP1); 59 who were ALK positive and received previous crizotinib without (n=27; EXP2) or with (n=32; EXP3A) previous chemotherapy; 28 who were ALK positive and received one previous non-crizotinib ALK tyrosine kinase inhibitor, with or without chemotherapy (EXP3B); 112 who were ALK positive with two (n=66; EXP4) or three (n=46; EXP5) previous ALK tyrosine kinase inhibitors with or without chemotherapy; and 47 who were ROS1 positive with any previous treatment (EXP6). One patient in EXP4 died before receiving lorlatinib and was excluded from the safety analysis set. In treatment-naive patients (EXP1), an objective response was achieved in 27 (90·0%; 95% CI 73·5-97·9) of 30 patients. Three patients in EXP1 had measurable baseline CNS lesions per independent central review, and objective intracranial responses were observed in two (66·7%; 95% CI 9·4-99·2). In ALK-positive patients with at least one previous ALK tyrosine kinase inhibitor (EXP2-5), objective responses were achieved in 93 (47·0%; 39·9-54·2) of 198 patients and objective intracranial response in those with measurable baseline CNS lesions in 51 (63·0%; 51·5-73·4) of 81 patients. Objective response was achieved in 41 (69·5%; 95% CI 56·1-80·8) of 59 patients who had only received previous crizotinib (EXP2-3A), nine (32·1%; 15·9-52·4) of 28 patients with one previous non-crizotinib ALK tyrosine kinase inhibitor (EXP3B), and 43 (38·7%; 29·6-48·5) of 111 patients with two or more previous ALK tyrosine kinase inhibitors (EXP4-5). Objective intracranial response was achieved in 20 (87·0%; 95% CI 66·4-97·2) of 23 patients with measurable baseline CNS lesions in EXP2-3A, five (55·6%; 21·2-86·3) of nine patients in EXP3B, and 26 (53·1%; 38·3-67·5) of 49 patients in EXP4-5. 
The most common treatment-related adverse events across all patients were hypercholesterolaemia (224 [81%] of 275 patients overall and 43 [16%] grade 3-4) and hypertriglyceridaemia (166 [60%] overall and 43 [16%] grade 3-4). Serious treatment-related adverse events occurred in 19 (7%) of 275 patients and seven patients (3%) permanently discontinued treatment because of treatment-related adverse events. No treatment-related deaths were reported.
INTERPRETATION: Consistent with its broad ALK mutational coverage and CNS penetration, lorlatinib showed substantial overall and intracranial activity both in treatment-naive patients with ALK-positive non-small-cell lung cancer, and in those who had progressed on crizotinib, second-generation ALK tyrosine kinase inhibitors, or after up to three previous ALK tyrosine kinase inhibitors. Thus, lorlatinib could represent an effective treatment option for patients with ALK-positive non-small-cell lung cancer in first-line or subsequent therapy.
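The response rates above are reported with 95% confidence intervals, but the abstract excerpt does not name the interval method. The exact (Clopper-Pearson) binomial interval is a standard choice for single-arm trials, and for 27 of 30 treatment-naive patients it gives approximately the quoted 73.5-97.9%. The sketch below uses only the standard library, inverting the binomial CDF by bisection; it is an illustration, not the trial's analysis code.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, iters: int = 100) -> float:
    """Bisection for p in [0, 1] with g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion x/n."""
    # Lower bound solves P(X >= x | p) = alpha/2, which increases with p.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2; negating the decreasing CDF makes it increasing.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: -binom_cdf(x, n, p), -alpha / 2)
    return lower, upper


for label, x, n in [("EXP1", 27, 30), ("EXP2-5", 93, 198), ("EXP2-3A", 41, 59)]:
    lo, hi = clopper_pearson(x, n)
    print(f"{label}: {x}/{n} = {x / n:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```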
lorlatinib could be used for crizotinib-resistant tumors based on EML4-ALK variants present in ctDNA
Live Notes, Real Time Conference Coverage 2020 AACR Virtual Meeting April 27, 2020 Minisymposium on AACR Project Genie & Bioinformatics 4:00 PM – 6:00 PM
April 27, 2020, 4:00 PM – 6:00 PM
Virtual Meeting: All Session Times Are U.S. EDT
Session Type
Virtual Minisymposium
Track(s)
Bioinformatics and Systems Biology
17 Presentations
4:00 PM – 6:00 PM
– Chairperson Gregory J. Riely. Memorial Sloan Kettering Cancer Center, New York, NY
4:00 PM – 4:01 PM
– Introduction Gregory J. Riely. Memorial Sloan Kettering Cancer Center, New York, NY
Precision medicine requires an end-to-end learning healthcare system, wherein the treatment decisions for patients are informed by the prior experiences of similar patients. Oncology is currently leading the way in precision medicine because the genomic and other molecular characteristics of patients and their tumors are routinely collected at scale. A major challenge to realizing the promise of precision medicine is that no single institution is able to sequence and treat sufficient numbers of patients to improve clinical-decision making independently. To overcome this challenge, the AACR launched Project GENIE (Genomics Evidence Neoplasia Information Exchange).
AACR Project GENIE is a publicly accessible international cancer registry of real-world data assembled through data sharing between 19 of the leading cancer centers in the world. Through the efforts of strategic partners Sage Bionetworks (https://sagebionetworks.org) and cBioPortal (www.cbioportal.org), the registry aggregates, harmonizes, and links clinical-grade, next-generation cancer genomic sequencing data with clinical outcomes obtained during routine medical practice from cancer patients treated at these institutions. The consortium and its activities are driven by openness, transparency, and inclusion, ensuring that the project output remains accessible to the global cancer research community for the benefit of all patients. AACR Project GENIE fulfills an unmet need in oncology by providing the statistical power necessary to improve clinical decision-making, particularly in the case of rare cancers and rare variants in common cancers. Additionally, the registry can power novel clinical and translational research.
Because we collect data from nearly every patient sequenced at participating institutions and have committed to sharing only clinical-grade data, the GENIE registry contains enough high-quality data to power decision making on rare cancers or rare variants in common cancers. We see the GENIE data providing another knowledge turn in the virtuous cycle of research, accelerating the pace of drug discovery, improving the clinical trial design, and ultimately benefiting cancer patients globally.
The first set of cancer genomic data aggregated through AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) was available to the global community in January 2017. The seventh data set, GENIE 7.0-public, was released in January 2020 adding more than 9,000 records to the database. The combined data set now includes nearly 80,000 de-identified genomic records collected from patients who were treated at each of the consortium’s participating institutions, making it among the largest fully public cancer genomic data sets released to date. These data will be released to the public every six months. The public release of the eighth data set, GENIE 8.0-public, will take place in July 2020.
The combined data set now includes data for over 80 major cancer types, including data from greater than 12,500 patients with lung cancer, nearly 11,000 patients with breast cancer, and nearly 8,000 patients with colorectal cancer.
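The statistical-power argument for pooling can be illustrated with simple arithmetic: a variant present at frequency f is expected in N·f of N records, and a Poisson approximation gives the chance of seeing at least m carriers. The variant frequency (0.1%), the threshold of 20 carriers, and the 1,000-record "single institution" cohort below are hypothetical; 12,500 and 80,000 echo the approximate lung cancer and total record counts quoted above.

```python
from math import exp, factorial


def expected_carriers(n_records: int, freq: float) -> float:
    """Expected number of records carrying a variant present at frequency freq."""
    return n_records * freq


def prob_at_least(m: int, n_records: int, freq: float) -> float:
    """P(at least m carriers), using a Poisson approximation to the binomial."""
    lam = expected_carriers(n_records, freq)
    below = sum(exp(-lam) * lam**k / factorial(k) for k in range(m))
    return max(0.0, 1 - below)  # clamp tiny negative rounding error


# Hypothetical 0.1% variant; hypothetical need for >= 20 carriers.
for n in (1_000, 12_500, 80_000):
    print(n, expected_carriers(n, 0.001), round(prob_at_least(20, n, 0.001), 3))
```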
For more details about the data, analyses, and summaries of the data attributes from this release, GENIE 7.0-public, consult the data guide.
Users can access the data directly via cbioportal, or download the data directly from Sage Bionetworks. Users will need to create an account for either site and agree to the terms of access.
For frequently asked questions, visit our FAQ page.
In fall 2019, the AACR announced the Bio Collaborative, which collects pan-cancer data in collaboration with, and with support from, a host of big pharma and biotech companies
they have a goal to expand to more than 6 cancer types and more than 50,000 records, including smoking habits, lifestyle data, etc.
They have started with NSCLC and have done mutational analysis on these cases
tumor mutational burden is included, and cBioPortal allows the genomic data to be explored even further
treatment data is included as well
need to collect highly CURATED data with PRISM backbone to get more than outcome data, like progression data
they might look to incorporate digital pathology but they are not there yet; will need good artificial intelligence systems
4:01 PM – 4:15 PM
– Invited Speaker Gregory J. Riely. Memorial Sloan Kettering Cancer Center, New York, NY
4:15 PM – 4:20 PM
– Discussion
4:20 PM – 4:30 PM
1092 – A systematic analysis of BRAF mutations and their sensitivity to different BRAF inhibitors: Zohar Barbash, Dikla Haham, Liat Hafzadi, Ron Zipor, Shaul Barth, Arie Aizenman, Lior Zimmerman, Gabi Tarcic. Novellusdx, Jerusalem, Israel
Abstract: The MAPK-ERK signaling cascade is among the most frequently mutated pathways in human cancer, with the BRAF V600 mutation being the most common alteration. FDA-approved BRAF inhibitors, as well as combination therapies of BRAF and MEK inhibitors, are available and provide survival benefits to patients with a BRAF V600 mutation in several indications. Yet non-V600 BRAF mutations are found in many cancers and are even more prevalent than V600 mutations in certain tumor types. As the use of NGS profiling in precision oncology becomes more common, novel alterations in BRAF are being uncovered. This has led to the classification of BRAF mutations into classes that depend on their biochemical properties and that affect their sensitivity to inhibitors. Therefore, annotation of these novel variants is crucial for assigning the correct treatment. Using a high-throughput method for functional annotation of MAPK activity, we profiled 151 different BRAF mutations identified in the AACR Project GENIE dataset and their response to 4 different BRAF inhibitors: vemurafenib and 3 different exploratory second-generation inhibitors. The system is based on rapid synthesis of the mutations and expression of the mutated protein together with fluorescently labeled reporters in a cell-based assay. Our results show that of the 151 different BRAF mutations, ~25% were found to activate the MAPK pathway. All of the class 1 and 2 mutations tested were found to be active, providing positive validation for the method. Additionally, many novel activating mutations were identified, some outside of the known domains. When testing the response of the active mutations to different classes of BRAF inhibitors, we show that while vemurafenib efficiently inhibited V600 mutations, other types of mutations, and specifically BRAF fusions, were not inhibited by this drug.
In contrast, the second-generation experimental inhibitors were effective against both V600 and non-V600 mutations. Using this large-scale approach to characterize BRAF mutations, we were able to functionally annotate the largest number of BRAF mutations to date. Our results show that the number of activating variants is large and that they possess differential sensitivity to different types of direct inhibitors. These data can serve as a basis for rational drug design as well as more accurate treatment options for patients.
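The functional-annotation logic in this abstract (measure pathway activity for each mutant, call it active if it clearly exceeds wild type, then ask whether each inhibitor suppresses the active mutants) can be sketched as follows. The 2-fold activity threshold, the 50% inhibition cutoff, the mutant and drug labels, and all readout values are hypothetical placeholders invented for illustration; they are not NovellusDx data or parameters.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ACTIVE_FOLD = 2.0      # hypothetical: >= 2x wild-type reporter signal counts as MAPK-activating
INHIBITED_BELOW = 0.5  # hypothetical: < 50% residual activity counts as inhibited


@dataclass
class MutantAssay:
    name: str
    fold_over_wt: float  # untreated reporter signal relative to wild-type BRAF
    residual: Dict[str, float] = field(default_factory=dict)  # drug -> treated/untreated signal


def active(assays: List[MutantAssay]) -> List[MutantAssay]:
    """Keep mutants whose basal pathway activity reaches the activation threshold."""
    return [a for a in assays if a.fold_over_wt >= ACTIVE_FOLD]


def inhibition_table(assays: List[MutantAssay], drugs: List[str]) -> Dict[str, Dict[str, bool]]:
    """For each active mutant, record whether each drug inhibits it.

    A drug with no measurement defaults to residual 1.0, i.e. not inhibited.
    """
    return {a.name: {d: a.residual.get(d, 1.0) < INHIBITED_BELOW for d in drugs}
            for a in active(assays)}


# Invented toy readouts with placeholder names (not measured data).
assays = [
    MutantAssay("mut_A", 6.0, {"drug_1": 0.1, "drug_2": 0.2}),
    MutantAssay("mut_B", 4.0, {"drug_1": 0.9, "drug_2": 0.3}),
    MutantAssay("mut_C", 1.1),
]
print(f"active fraction: {len(active(assays)) / len(assays):.0%}")
print(inhibition_table(assays, ["drug_1", "drug_2"]))
```

In this toy table, mut_B mimics the pattern described in the abstract: active, resisting one drug but inhibited by another.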
Molecular profiling is becoming imperative for successful targeted therapies
500 unique mutations in BRAF, so a bioinformatic pipeline is needed; start with NGS panels, then cluster according to different subtypes or class-specific patterns
certain mutations, like V600E, show distinct clustering in tumor types
25% of mutations occur with other mutations; mutations may not be functional; they used a high-throughput system to analyze other V600 BRAF mutations to determine whether they are functional
active yet uncharacterized BRAF mutations seen in a major proportion of human tumors
using genomic drug data, they found that many inhibitors like vemurafenib are specific to a particular mutation, but other inhibitors that are not specific to a cleft can inhibit other BRAF mutants
40% of 135 mutants were functionally active
USE of Functional Profiling instead of just genomic profiling
Q?: They have already used this platform and analysis for RTKs and other genes as well successfully
Q? how do you deal with co-occurring mutations: the platform is able to test RTKs plus signaling proteins
4:30 PM – 4:35 PM
– Discussion
4:35 PM – 4:45 PM
1093 – Calibration Tool for Genomic Aggregates (CTGA): A deep learning framework for calibrating somatic mutation profiling data from conventional gene panel data. Jordan Anaya, Craig Cummings, Jocelyn Lee, Alexander Baras. Johns Hopkins Sidney Kimmel Comprehensive Cancer Center, MD, Genentech, Inc., CA, AACR, Philadelphia, PA
Abstract: It has been suggested that aggregate genomic measures such as mutational burden can be associated with response to immunotherapy. Arguably, the gold standard for deriving such aggregate genomic measures (AGMs) would be from exome level sequencing. While many clinical trials run exome level sequencing, the vast majority of routine genomic testing performed today, as seen in AACR Project GENIE, is targeted / gene-panel based sequencing.
Despite the smaller size of these gene panels focused on clinically targetable alterations, it has been shown they can estimate, to some degree, exomic mutational burden; usually by normalizing mutation count by the relevant size of the panels. These smaller gene panels exhibit significant variability both in terms of accuracy relative to exomic measures and in comparison to other gene panels. While many genes are common to the panels in AACR Project GENIE, hundreds are not. These differences in extent of coverage and genomic loci examined can result in biases that may negatively impact panel to panel comparability.
To address these issues we developed a deep learning framework to model exomic AGMs, such as mutational burden, from gene panel data as seen in AACR Project GENIE. This framework can leverage any available sample and variant level information, in which variants are featurized to effectively re-weight their importance when estimating a given AGM, such as mutational burden, through the use of multiple instance learning techniques in this form of weakly supervised data.
Using TCGA data in conjunction with AACR Project GENIE gene panel definitions, as a proof of concept, we first applied this framework to learn expected variant features such as codons and genomic position from mutational data (greater than 99.9% accuracy observed). Having established the validity of the approach, we then applied this framework to somatic mutation profiling data, showing that data from gene panels can be calibrated to exomic TMB, thereby improving panel-to-panel compatibility. We observed approximately 25% improvements in mean squared error and R-squared metrics when using our framework over conventional approaches to estimate TMB from gene panel data across the 9 tumor types examined (spanning melanoma, lung cancer, colon cancer, and others). This work highlights the application of sophisticated machine learning approaches towards the development of needed calibration techniques across seemingly disparate gene panel assays used clinically today.
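The conventional baseline the CTGA abstract compares against (dividing a panel's mutation count by the panel's size) and the MSE and R-squared metrics used to judge calibration can be sketched as below. A plain least-squares line stands in for CTGA's deep learning model purely for illustration; it is not the authors' method. The panel size and all sample values are invented.

```python
from typing import List, Tuple

PANEL_MB = 1.2  # hypothetical panel footprint in megabases


def panel_tmb(mutation_count: int, panel_size_mb: float = PANEL_MB) -> float:
    """Conventional estimate: mutations per megabase of targeted territory."""
    return mutation_count / panel_size_mb


def fit_linear(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def mse(y: List[float], pred: List[float]) -> float:
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


def r_squared(y: List[float], pred: List[float]) -> float:
    my = sum(y) / len(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    return 1 - ss_res / ss_tot


# Invented toy data: panel mutation counts and matching exome TMB (mutations/Mb).
counts = [3, 12, 30, 45]
exome_tmb = [2.0, 8.5, 21.0, 30.5]
raw = [panel_tmb(c) for c in counts]
a, b = fit_linear(raw, exome_tmb)
calibrated = [a + b * r for r in raw]
print(f"raw: MSE={mse(exome_tmb, raw):.2f} R2={r_squared(exome_tmb, raw):.3f}")
print(f"calibrated: MSE={mse(exome_tmb, calibrated):.2f} R2={r_squared(exome_tmb, calibrated):.3f}")
```

Here the calibration is scored on the same samples it was fitted to; a fair comparison, like the one the abstract reports across tumor types, would use held-out data.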
4:45 PM – 4:50 PM
– Discussion
4:50 PM – 5:00 PM
1094 – Genetic determinants of EGFR-driven lung cancer growth and therapeutic response in vivo. Giorgia Foggetti, Chuan Li, Hongchen Cai, Wen-Yang Lin, Deborah Ayeni, Katherine Hastings, Laura Andrejka, Dylan Maghini, Robert Homer, Dmitri A. Petrov, Monte M. Winslow, Katerina Politi. Yale School of Medicine, New Haven, CT, Stanford University School of Medicine, Stanford, CA, Stanford University School of Medicine, Stanford, CA, Yale School of Medicine, New Haven, CT, Stanford University School of Medicine, Stanford, CA, Yale School of Medicine, New Haven, CT
5:00 PM – 5:05 PM
– Discussion
5:05 PM – 5:15 PM
1095 – Comprehensive pan-cancer analyses of RAS genomic diversity. Robert Scharpf, Gregory Riely, Mark Awad, Michele Lenoue-Newton, Biagio Ricciuti, Julia Rudolph, Leon Raskin, Andrew Park, Jocelyn Lee, Christine Lovly, Valsamo Anagnostou. Johns Hopkins Sidney Kimmel Comprehensive Cancer Center, Baltimore, MD, Memorial Sloan Kettering Cancer Center, New York, NY, Dana-Farber Cancer Institute, Boston, MA, Vanderbilt-Ingram Cancer Center, Nashville, TN, Amgen, Inc., Thousand Oaks, CA, AACR, Philadelphia, PA
5:15 PM – 5:20 PM
– Discussion
5:20 PM – 5:30 PM
1096 – Harmonization standards from the Variant Interpretation for Cancer Consortium. Alex H. Wagner, Reece K. Hart, Larry Babb, Robert R. Freimuth, Adam Coffman, Yonghao Liang, Beth Pitel, Angshumoy Roy, Matthew Brush, Jennifer Lee, Anna Lu, Thomas Coard, Shruti Rao, Deborah Ritter, Brian Walsh, Susan Mockus, Peter Horak, Ian King, Dmitriy Sonkin, Subha Madhavan, Gordana Raca, Debyani Chakravarty, Malachi Griffith, Obi L. Griffith. Washington University School of Medicine, Saint Louis, MO, Reece Hart Consulting, CA, Broad Institute, Boston, MA, Mayo Clinic, Rochester, MN, Washington University School of Medicine, Saint Louis, MO, Washington University School of Medicine, Saint Louis, MO, Baylor College of Medicine, Houston, TX, Oregon Health and Science University, Portland, OR, National Cancer Institute, Bethesda, MD, Georgetown University, Washington, DC, The Jackson Laboratory for Genomic Medicine, Farmington, CT, National Center for Tumor Diseases, Heidelberg, Germany, University of Toronto, Toronto, ON, Canada, University of Southern California, Los Angeles, CA, Memorial Sloan Kettering Cancer Center, New York, NY
Abstract: The use of clinical gene sequencing is now commonplace, and genome analysts and molecular pathologists are often tasked with the labor-intensive process of interpreting the clinical significance of large numbers of tumor variants. Numerous independent knowledgebases have been constructed to alleviate this manual burden; however, these knowledgebases are not interoperable. As a result, the analyst is left with a difficult tradeoff: for each knowledgebase used, the analyst must understand the nuances particular to that resource and integrate its evidence accordingly when generating the clinical report, but for each knowledgebase omitted, there is increased potential for missed findings of clinical significance. The Variant Interpretation for Cancer Consortium (VICC; cancervariants.org) was formed as a driver project of the Global Alliance for Genomics and Health (GA4GH; ga4gh.org) to address this concern. VICC members include representatives from several major somatic interpretation knowledgebases including CIViC, OncoKB, Jax-CKB, the Weill Cornell PMKB, the IRB-Barcelona Cancer Biomarkers Database, and others. Previously, the VICC built and reported on a harmonized meta-knowledgebase of 19,551 biomarker associations of harmonized variants, diseases, drugs, and evidence across the constituent resources. In that study, we analyzed the frequency with which the tumor samples from the AACR Project GENIE cohort would match to harmonized associations. Variant matches increased dramatically from 57% to 86% when broader matching to regions describing categorical variants was allowed. Unlike precise sequence variants with specified alternate alleles, categorical variants describe a collection of potential variants with a common feature, such as "V600" (non-valine alleles at the 600 residue), "Exon 20 mutations" (all non-silent mutations in exon 20), or "Gain-of-function" (hypermorphic alterations that activate or amplify gene activity).
However, matching observed sequence variants to categorical variants is challenging, as the latter are typically only described as unstructured text. Here we describe the expressive and computational GA4GH Variation Representation specification (vr-spec.readthedocs.io), which we co-developed as members of the GA4GH Genomic Knowledge Standards work stream. This specification provides a schema for common, precise forms of variation (e.g. SNVs and Indels) and the method for computing identifiers from these objects. We highlight key aspects of the specification and our work to apply it to the characterization of categorical variation, showcasing the variant terminology and classification tools developed by the VICC to support this effort. These standards and tools are free, open-source, and extensible, overcoming barriers to standardized variant knowledge sharing and search.
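The "method for computing identifiers" in the VRS specification derives an identifier by hashing a canonical serialization of the variant object with sha512t24u (SHA-512, truncated to 24 bytes, base64url-encoded). The sketch below shows that idea only: the simplified JSON serialization, the toy object layout, the placeholder sequence identifier, and the helper `computed_id` are assumptions, and the output will not match conformant VRS identifiers, which follow additional serialization rules defined in the spec.

```python
import base64
import hashlib
import json


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes, then base64url-encoded (32 characters)."""
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode("ascii")


def simplified_serialize(obj: dict) -> bytes:
    """Simplified canonical JSON: sorted keys, no whitespace (NOT the full VRS rules)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def computed_id(obj: dict, type_prefix: str = "VA") -> str:
    """Hypothetical helper forming a 'ga4gh:<prefix>.<digest>'-style identifier."""
    return f"ga4gh:{type_prefix}.{sha512t24u(simplified_serialize(obj))}"


allele = {  # toy allele; "example:chr7" is a placeholder, not a real sequence identifier
    "type": "Allele",
    "location": {"type": "SequenceLocation", "sequence_id": "example:chr7",
                 "start": 100, "end": 101},
    "state": {"type": "LiteralSequenceExpression", "sequence": "T"},
}
print(computed_id(allele))
```

Because the identifier depends only on the object's content, two resources that describe the same allele with the same normalized object obtain the same identifier regardless of key order or where the record came from.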
store information from different databases by curating them and classifying them then harmonizing them into values
harmonize each variant across their knowledgebase; at any level of evidence
29% of patients' variants matched when compared across many knowledgebases, versus only 13% when using individual databases
they are also trying to curate the database so a variant will have one code instead of various RefSeq codes or protein codes
VICC is an open consortium
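Matching a precise variant to the categorical variants described in the abstract ("V600" as any non-valine allele at residue 600; a region category such as "Exon 20 mutations") can be sketched at the protein level as below. The parser handles only simple substitutions such as "V600E", and the gene name GENE_X and its codon range are hypothetical placeholders rather than real exon coordinates; real matching would work from normalized genomic representations.

```python
import re
from dataclasses import dataclass

# Simple protein substitutions only, e.g. "V600E" or "R213*" (assumption: no indels or fusions).
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


@dataclass(frozen=True)
class ProteinVariant:
    gene: str
    ref: str
    pos: int
    alt: str


def parse(gene: str, change: str) -> ProteinVariant:
    m = _SUBSTITUTION.match(change)
    if m is None:
        raise ValueError(f"unsupported notation: {change}")
    ref, pos, alt = m.groups()
    return ProteinVariant(gene, ref, int(pos), alt)


def matches_residue(v: ProteinVariant, gene: str, ref: str, pos: int) -> bool:
    """'V600'-style category: any non-reference allele at one residue."""
    return v.gene == gene and v.ref == ref and v.pos == pos and v.alt != ref


def matches_region(v: ProteinVariant, gene: str, start: int, end: int) -> bool:
    """Region category: any non-silent substitution within codons start..end (inclusive)."""
    return v.gene == gene and start <= v.pos <= end and v.alt != v.ref


print(matches_residue(parse("BRAF", "V600E"), "BRAF", "V", 600))
print(matches_residue(parse("BRAF", "V600K"), "BRAF", "V", 600))
print(matches_region(parse("GENE_X", "L120P"), "GENE_X", 100, 150))  # hypothetical gene and range
```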
5:30 PM – 5:35 PM
– Discussion
5:35 PM – 5:45 PM
1097 – FGFR2 in-frame indels: A novel targetable alteration in intrahepatic cholangiocarcinoma. Yvonne Y. Li, James M. Cleary, Srivatsan Raghavan, Liam F. Spurr, Qibiao Wu, Lei Shi, Lauren K. Brais, Maureen Loftus, Lipika Goyal, Anuj K. Patel, Atul B. Shinagare, Thomas E. Clancy, Geoffrey Shapiro, Ethan Cerami, William R. Sellers, William C. Hahn, Matthew Meyerson, Nabeel Bardeesy, Andrew D. Cherniack, Brian M. Wolpin. Dana-Farber Cancer Institute, Boston, MA, Dana-Farber Cancer Institute, Boston, MA, Massachusetts General Hospital, Boston, MA, Brigham and Women’s Hospital, Boston, MA, Dana-Farber Cancer Institute, Boston, MA, Dana-Farber Cancer Institute, Boston, MA, Broad Institute of MIT and Harvard, Cambridge, MA, Massachusetts General Hospital, Boston, MA
5:45 PM – 5:50 PM
– Discussion
5:50 PM – 6:00 PM
– Closing Remarks Gregory J. Riely. Memorial Sloan Kettering Cancer Center, New York, NY
Personalized Medicine, Omics, and Health Disparities in Cancer: Can Personalized Medicine Help Reduce the Disparity Problem?
Curator: Stephen J. Williams, PhD
In a Science Perspectives article, Timothy Rebbeck discusses health disparities, specifically the cancer disparities existing in sub-Saharan African (SSA) nations, highlighting how cancer incidence there differs from cancer incidence in high-income areas of the world [1]. The sub-Saharan African nations display a much higher incidence of prostate, breast, and cervical cancer, and these cancers are predicted to double within the next twenty years, according to IARC [2]. Most importantly,
the histopathologic and demographic features of these tumors differ from those in high-income countries,
meaning that the differences seen in incidence may reflect a true health disparity, as increased rates of these cancers are not seen in high-income countries (HIC).
The most frequent cancers in SSA men include prostate, lung, and liver cancer, leukemia, non-Hodgkin’s lymphoma, and Kaposi’s sarcoma (a cancer frequently seen in HIV-infected patients [3]). In SSA women, breast and cervical cancer are the most common, and both occur at higher rates than in high-income countries. Liver cancer is seen in SSA women at twice the rate, and in SSA men at almost three times the rate, of that in high-income countries.
Reasons for cancer disparity in SSA
In SSA countries, patients with cancer are often diagnosed at a late stage. By contrast, patients in high-income countries usually have their cancers diagnosed at an earlier stage, and for many cancers, such as breast [4], ovarian [5, 6], and colon cancer, detecting the tumor early is critical for a favorable outcome and prognosis [7-10]. Late diagnosis also limits the therapeutic options available to the patient, and later-stage disease is much harder to manage, especially given unresponsiveness and/or resistance to many therapies. In addition, treatment in SSA must often be delivered in low-resource settings, where clinical laboratory work and imaging technologies may be limited.
Molecular differences in SSA versus HIC cancers which may account for disparities
Emerging evidence suggests that SSA tumors carry distinct molecular signatures with respect to histotype and pathology. For example, Dr. Rebbeck notes that Nigerian breast cancers were characterized by increased mutational signatures associated with deficiency of the homologous recombination DNA repair pathway, pervasive mutations in the tumor suppressor gene TP53, mutations in GATA binding protein 3 (GATA3), and a greater mutational burden, compared with breast tumors from African Americans or Caucasians [11]. However, more research will be required to understand the etiology and causal factors behind this distinct mutational spectrum.
Hereditary cancers are believed to occur at a higher rate in SSA, and many SSA cancers exhibit a more aggressive phenotype than in other parts of the world. For example, breast tumors in Black SSA patients are twice as likely as those in Caucasian SSA patients to be triple negative, a phenotype that is generally more aggressive and harder to detect and treat; because triple-negative cancers are HER2 negative, they are not candidates for Herceptin. BRCA1/2 mutations are also more frequent in Black SSA patients than in Caucasian SSA patients [12, 13].
Initiatives to Combat Health Disparities in SSA
Multiple initiatives have been proposed or are under way to bring personalized medicine to the sub-Saharan African nations. These include:
H3Africa empowers African researchers to be competitive in genomic sciences, establishes and nurtures effective collaborations among African researchers on the African continent, and generates unique data that could be used to improve both African and global health.
There is currently a global effort to apply genomic science and associated technologies to further the understanding of health and disease in diverse populations. These efforts work to identify individuals and populations who are at risk for developing specific diseases, and to better understand underlying genetic and environmental contributions to that risk. Given the large amount of genetic diversity on the African continent, there exists an enormous opportunity to utilize such approaches to benefit African populations and to inform global health.
The Human Heredity and Health in Africa (H3Africa) consortium facilitates fundamental research into diseases on the African continent while also developing infrastructure, resources, training, and ethical guidelines to support a sustainable African research enterprise – led by African scientists, for the African people. The initiative consists of 51 African projects that include population-based genomic studies of common, non-communicable disorders such as heart and renal disease, as well as communicable diseases such as tuberculosis. These studies are led by African scientists and use genetic, clinical, and epidemiologic methods to identify hereditary and environmental contributions to health and disease. To establish a foundation for African scientists to continue this essential work into the future, the consortium also supports many crucial capacity-building elements, such as: ethical, legal, and social implications research; training and capacity building for bioinformatics; capacity for biobanking; and coordination and networking.
Advancing precision medicine in a way that is equitable and beneficial to society means ensuring that healthcare systems can adopt the most scientifically and technologically appropriate approaches to a more targeted and personalized way of diagnosing and treating disease. In certain instances, countries or institutions may be able to bypass, or “leapfrog”, legacy systems or approaches that prevail in developed country contexts.
The World Economic Forum’s Leapfrogging with Precision Medicine project will develop a set of tools and case studies demonstrating how a precision medicine approach in countries with greenfield policy spaces can potentially transform their healthcare delivery and outcomes. Policies and governance mechanisms that enable leapfrogging will be iterated and scaled up to other projects.
Successes in personalized genomic research in SSA
As Dr. Rebbeck states:
Because of the underlying genetic and genomic relationships between Africans and members of the African diaspora (primarily in North America and Europe), knowledge gained from research in SSA can be used to address health disparities that are prevalent in members of the African diaspora.
For example, West African heritage and genomic ancestry has been reported to confer the highest genomic risk for prostate cancer of any population worldwide [14].
Science 03 Jan 2020:
Vol. 367, Issue 6473, pp. 27-28
DOI: 10.1126/science.aay474
Summary/Abstract
Cancer is an increasing global public health burden. This is especially the case in sub-Saharan Africa (SSA); high rates of cancer—particularly of the prostate, breast, and cervix—characterize cancer in most countries in SSA. The number of these cancers in SSA is predicted to more than double in the next 20 years (1). Both the explanations for these increasing rates and the solutions to address this cancer epidemic require SSA-specific data and approaches. The histopathologic and demographic features of these tumors differ from those in high-income countries (HICs). Basic knowledge of the epidemiology, clinical features, and molecular characteristics of cancers in SSA is needed to build prevention and treatment tools that will address the future cancer burden. The distinct distribution and determinants of cancer in SSA provide an opportunity to generate knowledge about cancer risk factors, genomics, and opportunities for prevention and treatment globally, not only in Africa.
Parkin DM, Ferlay J, Jemal A, Borok M, Manraj S, N’Da G, Ogunbiyi F, Liu B, Bray F: Cancer in Sub-Saharan Africa: International Agency for Research on Cancer; 2018.
Chinula L, Moses A, Gopal S: HIV-associated malignancies in sub-Saharan Africa: progress, challenges, and opportunities. Current opinion in HIV and AIDS 2017, 12(1):89-95.
Colditz GA: Epidemiology of breast cancer. Findings from the nurses’ health study. Cancer 1993, 71(4 Suppl):1480-1489.
Hamilton TC, Penault-Llorca F, Dauplat J: [Natural history of ovarian adenocarcinomas: from epidemiology to experimentation]. Contracept Fertil Sex 1998, 26(11):800-804.
Garner EI: Advances in the early detection of ovarian carcinoma. J Reprod Med 2005, 50(6):447-453.
Brockbank EC, Harry V, Kolomainen D, Mukhopadhyay D, Sohaib A, Bridges JE, Nobbenhuis MA, Shepherd JH, Ind TE, Barton DP: Laparoscopic staging for apparent early stage ovarian or fallopian tube cancer. First case series from a UK cancer centre and systematic literature review. European journal of surgical oncology : the journal of the European Society of Surgical Oncology and the British Association of Surgical Oncology 2013, 39(8):912-917.
Kolligs FT: Diagnostics and Epidemiology of Colorectal Cancer. Visceral medicine 2016, 32(3):158-164.
Rocken C, Neumann U, Ebert MP: [New approaches to early detection, estimation of prognosis and therapy for malignant tumours of the gastrointestinal tract]. Zeitschrift fur Gastroenterologie 2008, 46(2):216-222.
Srivastava S, Verma M, Henson DE: Biomarkers for early detection of colon cancer. Clinical cancer research : an official journal of the American Association for Cancer Research 2001, 7(5):1118-1126.
Pitt JJ, Riester M, Zheng Y, Yoshimatsu TF, Sanni A, Oluwasola O, Veloso A, Labrot E, Wang S, Odetunde A et al: Characterization of Nigerian breast cancer reveals prevalent homologous recombination deficiency and aggressive molecular features. Nature communications 2018, 9(1):4181.
Zheng Y, Walsh T, Gulsuner S, Casadei S, Lee MK, Ogundiran TO, Ademola A, Falusi AG, Adebamowo CA, Oluwasola AO et al: Inherited Breast Cancer in Nigerian Women. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2018, 36(28):2820-2825.
Rebbeck TR, Friebel TM, Friedman E, Hamann U, Huo D, Kwong A, Olah E, Olopade OI, Solano AR, Teo SH et al: Mutational spectrum in a worldwide study of 29,700 families with BRCA1 or BRCA2 mutations. Human mutation 2018, 39(5):593-620.
Lachance J, Berens AJ, Hansen MEB, Teng AK, Tishkoff SA, Rebbeck TR: Genetic Hitchhiking and Population Bottlenecks Contribute to Prostate Cancer Disparities in Men of African Descent. Cancer research 2018, 78(9):2432-2443.
NIH agenda on data: diverse data sets, including images from MRI, of cells, of organs, and of communities
Share images and link them to tables
Metadata: enable search across 34 PB; move data to the cloud for large-scale analysis
Sequence Read Archive (SRA): DNA sequence data
COVID-19 data from around the world in SRA, in the cloud, enabled by partnerships
Open Science: enhance software tools to make research cloud-ready
NIH has 12 Centers: Genomics, Neuro-imaging
SCH – Smart & Connected Health
IT, Sensor system hardware, effective usability, medical interpretation, Transformative data Science
Cancer, Alzheimer’s, Genomics, Medical Imaging, Brain circuits
Coding it Forward: students join NIH virtually, from home, through the Civic Digital Fellowship
COVID-19: data repositories for researchers:
Treatment for Interventions
Long term Sequelae
Clinical platforms: BioData Catalyst, All of Us, ADSO, National COVID Cohort Collaborative (N3C)
Across platforms: workflow after the August deployment of RAS (Researcher Auth Service): a passport for researchers to access data faster, privacy-preserving tokens, interoperability across clinical COVID databases
Making metadata rich enough to link to other new data sources is a challenging problem to solve across studies
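Privacy-preserving tokens generally let platforms link records for the same person without exchanging raw identifiers. The sketch below illustrates that general idea only, assuming a keyed hash (HMAC-SHA256) over normalized name and date-of-birth fields and a secret key shared by the participating sites; the field choice, normalization, and key handling are hypothetical and are not the scheme used by RAS or any NIH platform.

```python
import hashlib
import hmac


def normalize(value: str) -> str:
    """Lower-case and drop whitespace so trivially different spellings match."""
    return "".join(value.lower().split())


def make_token(first: str, last: str, dob: str, secret_key: bytes) -> str:
    """Return a hex token derived from identifiers (hypothetical scheme).

    HMAC-SHA256 over normalized first name, last name, and ISO date of birth,
    joined with '|'. Only the token leaves the site, not the raw values.
    """
    message = "|".join(normalize(v) for v in (first, last, dob)).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def link(site_a: dict, site_b: dict) -> set:
    """Return the tokens present at both sites, i.e. records that can be linked."""
    return set(site_a) & set(site_b)


if __name__ == "__main__":
    key = b"placeholder-shared-secret"  # hypothetical key, for illustration only
    site_a = {make_token("Ana", "Diaz", "1980-02-01", key): "record-A1"}
    site_b = {make_token(" ana ", "DIAZ", "1980-02-01", key): "record-B7"}
    print(len(link(site_a, site_b)))  # 1
```

Using a keyed hash rather than a plain hash means that a party without the key cannot recompute tokens from guessed names and birth dates.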
Scott Parker
Sinequa Corp
Director of Product Marketing
Disconnect between R&D & IT
Intelligent search applications for sensitive information: Sinequa is a leader
A single shared index: cost per document goes down and productivity increases
Rebecca Baker
NIH OD
Dir HEAL Initiative
NIH HEAL (Helping to End Addiction Long-term) Initiative: 20 NIH institutes and centers collaborating on studies
National overdose deaths from opioid drugs: synthetic fentanyl
Heroin, cocaine, methamphetamine
Overdoses increased during the COVID-19 pandemic
Increase in drug use overall and 67% of Fentanyl
Chronic pain: 25 million people with severe daily pain who can’t go to work
$500 million/year sustained research investment; 25+ HEAL research programs
HEAL Initiative: pain management, translating research, new prevention, enhanced outcomes for affected newborns, novel medication options, pre-clinical translational research in pain management
Improving treatments for opioid misuse & addiction
Many people with opioid use disorder do not receive treatment: justice-involved communities, collaborative care, ER, pregnant mothers
Medication-based treatment: patients do not stay on it long enough to achieve long-term recovery
People experience pain differently (muscular, neurological): biomarkers, endpoints, and signatures to test non-addictive treatments for specific types of pain
Pain control: balancing the risks of long-term opioid therapy
HEAL research: infants exposed to opioids in utero (effects on brain growth); infants born with withdrawal syndromes
Diversity of data under the HEAL Initiative: harmonize the data
Common Data Elements in HEAL Clinical Research in Pain Management
CORE CDE & Supplemental CDE
Making HEAL Data FAIR: Findable, Accessible, Interoperable, Reusable
Link HEAL data with community studies to predict behaviors
Data made available to the public
HEAL Data Lifecycle
Effect of changes in dosage: if the data are not collected, we cannot explore the relationships
Use the data to advance research beyond the current understanding of the problem
#NIHhealthInitiative
Ari Berman
BioTeam Inc
Chief Executive Officer
Distributed Questions from the Audience to the speakers
10:00 AM – 11:25 AM EDT on Tuesday, October 6
How to Hold on to Your Knowledge in an Agile World
Etzard Stolte
Roche Pharma
Global Head
October 7, 2020
The Chicagoland COVID-19 Commons: A Regional Data Commons Powering Research to Support Public Health Efforts
Matthew Trunnell
VP & Chief Data Officer
9:00 AM – 9:20 AM EDT on Wednesday, October 7
Seattle & COVID – samples from Seattle Flu Study
Public health practice vs research (data from human subjects): avoid diluting the control
Chicagoland COVID-19 Data Commons – in Chicago
Neighborhood level in Chicago
common data model
Powers predictive modeling efforts: case rate, total confirmed cases, deaths
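Neighborhood-level case and death counts are usually compared as rates per 100,000 residents, so that areas with different populations can be put side by side. A minimal sketch, assuming hypothetical neighborhood names and counts; it is not the Chicagoland COVID-19 Commons' code or common data model.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Events (e.g. confirmed cases or deaths) per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


def summarize(neighborhoods: dict[str, tuple[int, int, int]]) -> dict[str, tuple[float, float]]:
    """Map each neighborhood to (case rate, death rate) per 100,000.

    Input values are (population, total confirmed cases, deaths).
    """
    return {
        name: (rate_per_100k(cases, pop), rate_per_100k(deaths, pop))
        for name, (pop, cases, deaths) in neighborhoods.items()
    }


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    data = {"Area A": (50_000, 1_250, 25), "Area B": (20_000, 300, 3)}
    for name, (case_rate, death_rate) in summarize(data).items():
        print(f"{name}: {case_rate:.1f} cases, {death_rate:.1f} deaths per 100k")
```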
Panel sizes (500-1000x): the bigger the panel, the more computational time and the more data to investigate
Hotspot Panels,
Gene Panels,
Exomes
Cell-free DNA testing: liquid biopsy
Apoptosis
Necrosis
FoundationOne
Patient results: all mutations found, mutation burden
Gene EGFR – no mutation
For every mutation: which therapy is recommended among approved drugs
Clinical Trials for the mutations
VARIANTS of unknown significance
Workflow: many MDs send a sample and get a 38-page report
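Mutation burden on such reports is commonly expressed as the number of qualifying somatic mutations per megabase of sequenced territory. A minimal sketch, assuming a hypothetical `Variant` record, a hypothetical 0.5 Mb panel, and a simple filter (somatic, coding, not a known driver); the actual filtering rules of commercial assays such as FoundationOne are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A called variant with the flags this sketch filters on (hypothetical fields)."""
    gene: str
    somatic: bool
    coding: bool
    known_driver: bool


def tumor_mutation_burden(variants: list[Variant], panel_size_mb: float) -> float:
    """Mutations per megabase of sequenced territory.

    Counts somatic, coding variants that are not known drivers; this filter
    is an illustrative assumption, not a validated assay rule.
    """
    if panel_size_mb <= 0:
        raise ValueError("panel size must be positive")
    counted = sum(1 for v in variants if v.somatic and v.coding and not v.known_driver)
    return counted / panel_size_mb


if __name__ == "__main__":
    # Hypothetical calls on a hypothetical 0.5 Mb panel.
    calls = [Variant(f"GENE{i}", somatic=True, coding=True, known_driver=False) for i in range(8)]
    calls.append(Variant("DRIVER1", somatic=True, coding=True, known_driver=True))  # excluded: driver
    calls.append(Variant("GENE9", somatic=False, coding=True, known_driver=False))  # excluded: germline
    print(tumor_mutation_burden(calls, panel_size_mb=0.5))  # 16.0
```

Dividing by panel size is what makes scores from differently sized panels roughly comparable, which is one reason the panel size matters beyond compute time.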
Genomic Classification and Prognosis in AML: Mutations subset and therapies available
Paradigm Shift in Classification
2013 – Lung Adenocarcinoma
2011 – another cancer
mTOR System: A Database for Systems-Level Biomarker Discovery in Cancer
Iman Tavassoly – CANCELLED
C2i Genomics
Physician Scientist
10:20 AM – 10:40 AM EDT on Wednesday, October 7
mTOR system is a database I have designed for exploring biomarkers and systems-level data related to mTOR pathway in cancer. This database consists of different layers of molecular markers and quantitative parameters assigned to them through a current mathematical model. This database is an example of merging systems-level data with mathematical models for precision oncology.
FAIR and the (Tr)end of Data Lakes
Kees Van Bochove
The Hyve
Founder & Owner
10:20 AM – 10:40 AM EDT on Wednesday, October 7
Normalizing Regulatory Data Using Natural Language Processing (NLP)
Qais Hatim, Dr.
FDA CDER
Visiting Assoc
David Milward
Linguamatics
Senior Director, NLP Technology
10:40 AM – 11:00 AM EDT on Wednesday, October 7
ML focus on Disease
NLP: different words with the same meaning, different expressions with the same meaning, grammar & meaning
Normalizes output
Disease
Genes
Dates
Mutations
Transform unstructured data into structured data
Identifying Gaps in adverse events Labelling: Pain and Opioids
Improve drug safety
ChemAxon
Supplemental Approval Letters
Coding for Adverse events: “derived values of possible interest”
Use of Prominent Terminologies used at the FDA: UNII – Translation into ANSI thesaurus standard
Matching to the Variation found within Real Text: synonyms
Using ML for Normalization in Disease Context
Deep learning pre-training approach for annotated data = supervised learning
A set of rules to handle overlapping entities
normalized the amp extracted from concepts
BERN and Terminologies: BioBERT, PubMed Central, PubMed Articles
NER – Named Entity Recognition
Evaluation of the Approach
Conclusions
NLP, ML, Hybrid methods, Terminology +ML methods
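The normalization step described in this talk, mapping different surface expressions onto one preferred term, can be illustrated with a dictionary lookup. A minimal sketch, assuming a tiny hypothetical synonym table; it does not reproduce the FDA/Linguamatics pipeline, UNII mappings, or the BERN-based model, which rely on curated terminologies and machine learning for the many cases a dictionary misses.

```python
import re

# Hypothetical synonym table: surface form -> preferred term.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
}


def normalize_terms(text: str, synonyms: dict[str, str] = SYNONYMS) -> list[str]:
    """Return the preferred terms whose surface forms appear in text."""
    found = []
    lowered = text.lower()
    # Try longer phrases first, and blank out each match, so an overlapping
    # shorter entity cannot be matched inside an already-matched phrase.
    for surface in sorted(synonyms, key=len, reverse=True):
        pattern = r"\b" + re.escape(surface) + r"\b"
        if re.search(pattern, lowered):
            found.append(synonyms[surface])
            lowered = re.sub(pattern, " ", lowered)
    # Deduplicate, keeping the order in which terms were matched.
    return list(dict.fromkeys(found))


if __name__ == "__main__":
    print(normalize_terms("Patient had a heart attack (MI) and high blood pressure."))
```

The longest-match-first rule is one simple way to handle overlapping entities; the word-boundary anchors keep short forms such as "mi" from matching inside longer words.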
Building an Artificial Intelligence-Based Vaccine Discovery System: Applications in Infectious Diseases & Personalized Neoantigen-Related Immunotherapy for Treatment of Cancers
Kamal Rawal
Amity Univ
Assoc Prof
10:40 AM – 11:00 AM EDT on Wednesday, October 7
Classification of proteins
Data Collection
Feature selection: the most important of 1,447 features
Deep learning Model: Vaxi-DL: Layers, compilation
Strategy to prevent model overfitting
Balancing imbalanced data
Hyperparameter tuning: settings of the model chosen before training, as opposed to the internal parameters it learns
Stratified K-Fold Training and Validation
Ensembling approach: many weak classifiers combined to create a strong classifier
ROC Curve: Ensemble by Consensus
Before and after calibration
Benchmarking the system: Vaxi-DL Ensemble by Average vs by Consensus
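The two ensembling strategies benchmarked above can be contrasted directly: averaging the models' probabilities, versus thresholding each model and taking a majority (consensus) vote. The sketch also includes a simple stratified fold assignment of the kind used in stratified K-fold validation. It is a minimal sketch, assuming three hypothetical models, hypothetical probabilities, a 0.5 threshold, and no shuffling; it does not reproduce Vaxi-DL.

```python
from collections import defaultdict


def stratified_folds(labels: list[int], k: int) -> list[int]:
    """Assign each sample a fold index so every fold keeps roughly the class ratio.

    Each class is dealt round-robin across folds; no shuffling is done here.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    fold_of = [0] * len(labels)
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k
    return fold_of


def ensemble_by_average(probs_per_model: list[list[float]], threshold: float = 0.5) -> list[int]:
    """Average each sample's probability across models, then threshold."""
    n_models, n_samples = len(probs_per_model), len(probs_per_model[0])
    return [
        int(sum(model[i] for model in probs_per_model) / n_models >= threshold)
        for i in range(n_samples)
    ]


def ensemble_by_consensus(probs_per_model: list[list[float]], threshold: float = 0.5) -> list[int]:
    """Threshold each model separately, then take a strict majority vote."""
    n_models, n_samples = len(probs_per_model), len(probs_per_model[0])
    return [
        int(sum(model[i] >= threshold for model in probs_per_model) * 2 > n_models)
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # Hypothetical probabilities from three models for four candidate proteins.
    probs = [
        [0.90, 0.55, 0.20, 0.10],
        [0.80, 0.45, 0.30, 0.60],
        [0.70, 0.95, 0.10, 0.55],
    ]
    print(ensemble_by_average(probs))    # [1, 1, 0, 0]
    print(ensemble_by_consensus(probs))  # [1, 1, 0, 1]
    print(stratified_folds([1, 1, 0, 0, 0, 0], k=2))  # [0, 1, 0, 1, 0, 1]
```

The last candidate shows where the two strategies diverge: one very confident "no" pulls the average below the threshold, while two moderate "yes" calls still win the vote.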
Cohen Veterans Bioscience – not for profit – advancing Brain health
Biotyping and stratification
Biomarkers
Omics data
All meet in the commons (Brain Commons): clinicians, geneticists, scientists, bioinformaticians; RStudio, Python, JupyterHub
Multidimensional Biomarkers in Multiple Sclerosis
Pietro Michelucci
Human Computation Institute
Director
Why machines can’t tackle AI on their own and AI can’t do precision medicine on its own
Young people more than others; N of 1 – Precision Medicine
Scandinavians and Russians are immune
AI & Precision Medicine: can’t solve the complexity of messy data vs big data
Messy data: heterogeneous, multidimensional, too many combinations to explore; select which combinations to explore vs let the machine generate all the combinations, analyze them all, and discover patterns
Causal vs spurious
Logical reasoning, right-brain abstraction and shortcuts: the human brain does these routinely
Humans do better with context: not all information, such as context, is in the pixels
#ADS – SBIR suspected the hypothesis to be tested
Improving crowd-wisdom methods: 20 inputs from different people plus the machine
combine crowd answers with machine faster and improved accuracy
The machine has no intuition; human bias and machine bias are similar
Wisdom of Crowd: Bootstrapping hybrid Intelligence: CIVIUM
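One simple way to combine crowd answers with a machine, as described in this talk, is a weighted vote in which the machine counts as one more voter with its own weight. A minimal sketch, assuming categorical answers, 20 hypothetical crowd inputs, and a hypothetical machine weight; the aggregation methods actually used on CIVIUM are not described in these notes and are not reproduced here.

```python
from collections import Counter


def combine_crowd_and_machine(crowd_answers: list[str], machine_answer: str,
                              machine_weight: float = 1.0) -> tuple[str, float]:
    """Return the winning label and its share of the total vote weight.

    Each crowd member contributes weight 1; the machine contributes machine_weight.
    Ties go to whichever label was counted first.
    """
    tally = Counter({label: float(n) for label, n in Counter(crowd_answers).items()})
    tally[machine_answer] += machine_weight
    label, weight = tally.most_common(1)[0]
    return label, weight / sum(tally.values())


if __name__ == "__main__":
    crowd = ["yes"] * 11 + ["no"] * 9  # 20 hypothetical human inputs
    print(combine_crowd_and_machine(crowd, "no", machine_weight=1.0))  # crowd majority holds
    print(combine_crowd_and_machine(crowd, "no", machine_weight=3.0))  # machine tips the vote
```

The returned share gives a rough confidence: a narrow win suggests the item is a candidate for more human inputs.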
Advanced Imaging and AI Technologies Providing New Image and Data Analysis Challenges and Opportunities
Richard Goodwin
AstraZeneca
Dir & Head of Imaging & AI
2:30 PM – 2:50 PM EDT on Wednesday, October 7
AstraZeneca is empowering its scientists to see the complexity of a disease in unprecedented detail to enable effective development and selection of new medicines. This is enabled through the use of an extensive range of cutting-edge imaging technologies that support studies into the efficacy and safety of drugs through the R&D pipeline. This presentation will introduce the range of novel in vivo and ex vivo imaging technologies employed, describe the data challenges associated with scaling up the use of molecular imaging technologies, and address the new data integration and mining challenges. Novel computational methods are required for large cohort imaging studies that involve tissue based multi-omics analysis, which integrate spatial relationships in unprecedented detail.
Small molecule – not suitable for complex diseases
focus on quality vs quantity
compound for commercial value
right safety
Imaging supports R&D: Molecular, medical, big data and AI
convergence of ML for decision making
Spatial imaging: morphology
Multiplex imaging like MRI
Multimodal analysis: tissue data and in vivo data for a holistic understanding of drug delivery
Spatial transcriptomics and proteomics: imaging platforms in R&D
AZ invest in imaging technologies already impacting projects: AI-empowered imaging delivering subcellular resolution
Mass Spec Imaging (MSI): ex vivo imaging technique showing the spatial distribution of molecules
cartography of cancer: Drug metabolite distribution – NEW understanding of disease and drug distribution in tissue
Digital pathology and beyond: AI image analysis; AI outperforms pathologists and radiologists
Data volume and dimensionality challenge and opportunity
Data volume and dimensionality: complete image
AZ Oncology – disease is understood for drug discovery using Imaging technology
PANEL: Framework and Approach to Unlock the Potential of Quantum Computing in Drug Discovery
Brian Martin
AbbVie Inc
Research Fellow & Head
Philipp Harbach
Merck KGaA
Head of In Silico Research in Germany
chemistry and manufacturing with QC – end user in Pharmaceutical
The venture capital arm at Merck asks Merck experts to guide Merck’s investment in QC
50 people across Merck, in three areas at Merck (Pharmaceutics, Animal Health, Diagnostics)
Celia Merzbacher
SRI Intl
Assoc Dir Quantum Economic Dev Consortium (QEDC)
Methodology from Pistoia to be used in QC
QC R&D developed in parallel
Simulation of all the components is possible
John Wise
Pistoia Alliance Inc (2007)
We are a global, not-for-profit members’ organization working to lower barriers to innovation in life science and healthcare R&D through pre-competitive collaboration.
Consultant
How Pharmaceutical Industry can benefit from quantum computing
9 of 10 big Pharma are members of the Pistoia Alliance
IP created on specifications
Zahid Tharia
Pistoia Alliance Inc
Consultant
Barriers to adoption of quantum computing (QC) in Pharma are staff training and skills in the IT aspects of QC
3:10 PM – 4:00 PM EDT on Wednesday, October 7
In 2019, major life sciences companies mobilized to form a pre-competitive, collaborative quantum computing working group (QuPharm) and delineate a framework and approach to accelerate realizing the potential of quantum acceleration in drug discovery. Learn from industry thought leaders on how to valuate and map problems into quantum algorithms, set up organizations to enable and scale quantum computing pilots and establish effective cross-industry, tech, and start-up collaborations.
In the UK, 6 labs serve the entire country: all send their data to the Wellcome Sanger Institute for analysis
Metadata is the problem: coordinating the 6 labs to send their metadata created problems
Cindy Crowninshield
Cambridge Healthtech Institute
Executive Event Director
Vivien Bonazzi
Deloitte Consulting LLP
Managing Dir & Chief Biomedical Data Scientist
How organizations use bioscience data
Data Ecosystem: Hardware and software: Cloud and other options
Operationalize the two trends:
Platforms: end-to-end solutions resulting in silos; systems are native: data ingestion
Data Commons: Open arch, open source – integration and interdependence issues
Biomedical Agencies in NIH various Organizations in the Private sector: Sharing data must be more effective
IT, Data Science, Management – COVID – reduced barriers
Leadership: Different voices from different people
Data strategies & governance: not the whole but small pieces; incentives to share data
Chris Dagdigian
BioTeam Inc
Sr Dir
10th Anniversary to Trends from the Trenches
IT infrastructure changes
Research IT:
Genomics & BioInformatics
Image-based data acquisition and analysis: CryoEM, 3D microscopy, fMRI image analysis
ML and AI: GPUs, FPGAs, neural processors; adoption in organizations is driven bottom up
Chemistry & Molecular Dynamics
Storage and exploitation of data for insights
2020 Hype vs Reality
Scientific Data: managing and understanding, data movement, federated/access
Big Data: data storage, management & governance standards vs human curated data
IT needs guidance and decisions from Science Team
Culture change for joint management by Science & IT: data fidelity, attribution, allocation top down
NERSC file system quotas & purging overview; Silos & So
Petabytes of open access data, collaborative research resources: Data rich environments
Data Lakes: Gen3 Data Commons
Data hygiene: metadata is on the science side vs IT
Biased Data: Model & Data Bias
Failed Predictions:
Compilers matter again – not True
CPU benchmarking is back – WRONG
AMD vs Intel; arm64 vs both
Policy driven auto-tiering storage – wrong, USER self-service for tiering, movement and archive decision. Let researchers tier/move/archive based on Project, Experiment or Group
Single storage namespace – Wrong: Data intensive science: scientists must do some IT jobs themselves
Kjiersten Fagnan
Lawrence Berkeley Natl Lab
CIO
Genome Project of DOE
Data management with other agencies
COVID: Collaborations, breaking down barriers, small labs and big labs ALL generate data and sharing
Collaboration is needed regardless of COVID, but it does not happen
If the problem is too big, one lab can’t handle it all
Funding and training do not support collaborations, because the next round of funding depends on individual publications, which requires silos
Data cleaning and data management: standards are annoying and painful and are not needed to publish results as soon as possible; they are needed so that someone else will be able to use the data
Facebook has hundreds of curators; curating scientific data requires the same hundreds of curators, who are scientists and data scientists
Matthew Trunnell
Pandemic Response Commons, Seattle
VP & Chief Data Officer
Data commons for intra- and inter-mural data sharing
ML is needed for Data commons
Progress in FAIRness: NIH efforts driven by Susan Gregurick across all NIH centers
A large amount of B-to-B data sharing, e.g., Uber sharing data with the jurisdictions in which it operates
Snowflake: new cloud technology
COVID plays the role of an accelerator
Cancer vs COVID – transfer knowledge from COVID to Cancer
9:00 AM – 10:40 AM EDT on Thursday, October 8
The “Trends from the Trenches” will celebrate its 10th Anniversary at Bio-IT! Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, and cloud that are involved in supporting data-intensive science. In 2020, Chris will give the “Trends from the Trenches” presentation in its original “state-of-the-state address” followed by guest speakers giving podium talks on relevant topics. An interactive Q&A moderated discussion with the audience follows. Come prepared with your questions and commentary for this informative and lively session.
Project vs enterprise – Sequencing for internal research vs for clients’ data
Tension in governmental agencies – no robust solutions: IT, Science, Management
different Use cases need different infrastructure: HW & SW: Storage and data exploration
Data Lakes: rule base, enterprising – training is an issue in organizations
Management, scientists, IT in enterprises: terabytes of storage, budget issues, conversations on the limits that IT can offer, putting more burden on the scientists for triage and quotas; business and scientific value
New capabilities in organizations: hands-on, tactical data management, not IT but data engineering
Citizen Science: privacy vs plants and microbes – no privacy issues
Incentives need be changed for Data Citations in addition to Papers
Curation Citations as Authorship citation
Data sharing in Cancer: GEN3 – NCI Data Commons, Data Governance and Data Permission (Access) – NCI does work in data commons – much data outside this space
EBI – in UK Sanger Institute has the infrastructure in one place
Migrating Project based Data structure: that involves scientist decisions that should not be a quota (storage is full) in the IT space
Human to Human communications vs tools for data migration
Which organizations do data curation and annotation well: subject-matter expertise from day 1 (hard to teach) vs data engineering skills; teamwork in problem solving is critical in the biomedical space, but there are no incentives
BBC – Meta tagging system is outstanding
NCATS Translator – across organizations
Changing incentives – MORE organizations will do that task better
Common metadata across domains that anticipates future uses of the data: collaboration with computer scientists to create BBC-like tagging in science organizations
NCI – Cancer Data Commons – concierge services to organization on data services
Ravi Madduri – CVD large cohort
Univ of Chicago
Scientist
Lara Mangravite
Sage Bionetworks
President
Kees Van Bochove
The Hyve
Founder & Owner
11:10 AM – 11:30 AM EDT on Thursday, October 8
BREAKOUT: Driving Scientific Discovery with Data / Digitization
Timothy Gardner
Riffyn Inc
CEO
11:35 AM – 12:00 PM EDT on Thursday, October 8
PLENARY KEYNOTE – 12:00 PM – 1:25 PM EDT on Thursday, October 8
Robert Green
Brigham & Womens Hospital
Co-founder of Genome Medical
Prof & Dir G2P Research
Combining data to rapidly analyze COVID-19 Patients –
identify BIOMARKERS for vulnerability
Preventive Genomics – Angelina Jolie’s mastectomy as a preventive clinical measure
Patients access to own genomics data
Population screening – to predict risks
Genetic testing to consumers: preventive genomics conflated genotyping/sequencing and labs/care providers
Genetic Testing to Consumer: COST & Benefits – UNCLEAR
diagnosis of unsuspected genetic disease
stratification for surveillance
which pieces of the puzzle need to be brought to bear in patient care
Categories and Reporting criteria: Gene-Disease validity vs Variant Pathogenicity –>> Clinic
MedSeq Project: 10MM randomized study; in one arm all genome information was shared with the patient, in the other arm only selected genome data: 100 patients, 20% carry a monogenic condition; polygenic risk scores:
CAD – high cholesterol biomarker, A-Fib, DM2; 52% women, 48% men
No high-risk errors by PCPs in discussing and disclosing the sequencing results
Filtering the results: Indication -based testing vs Screening
BabySeq Project: sequencing infants to prevent disease: 11% carry a mutation in a gene for a monogenic condition, such as an abnormally narrowed aorta
MDR – Monogenic Disease Risk
MilSeq Project: US Air Force – Military active duty
5,8,10 – are all Polygenic studies
Polygenic Risk Scores – High risk
Classification needs to be repeated every few years (every 2 years, re-sequence) because of changes in health and new discoveries in curated data, which keep improving
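A polygenic risk score is conventionally a weighted sum of risk-allele dosages (0, 1, or 2 copies per variant), with weights taken from genome-wide association effect sizes, and "high risk" is often defined as scoring above a chosen percentile of a reference population. A minimal sketch, assuming hypothetical variant IDs, weights, reference scores, and a 95th-percentile cutoff; the scores and thresholds used in MedSeq, BabySeq, or MilSeq are not given in these notes.

```python
from bisect import bisect_left


def polygenic_risk_score(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of effect weight x risk-allele dosage over the scored variants.

    Variants missing from the genotype contribute nothing (an assumption;
    real pipelines often impute them instead).
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in weights.items())


def percentile_rank(score: float, reference_scores: list[float]) -> float:
    """Percentage of reference scores strictly below the given score."""
    if not reference_scores:
        raise ValueError("reference distribution is empty")
    ordered = sorted(reference_scores)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


def is_high_risk(score: float, reference_scores: list[float],
                 cutoff_percentile: float = 95.0) -> bool:
    """Flag scores at or above the cutoff percentile of the reference distribution."""
    return percentile_rank(score, reference_scores) >= cutoff_percentile


if __name__ == "__main__":
    weights = {"rsA": 0.2, "rsB": -0.1, "rsC": 0.35}  # hypothetical effect sizes
    person = {"rsA": 2, "rsB": 1, "rsC": 1}          # hypothetical genotype dosages
    reference = [i / 100 for i in range(100)]        # hypothetical reference scores
    score = polygenic_risk_score(person, weights)
    print(round(score, 2), is_high_risk(score, reference))
```

Because the percentile is taken against a reference distribution, the same raw score can be flagged differently as reference data or curated weights are updated, which is one reason periodic reclassification matters.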
Vaccine-preventable diseases: produce 1 billion vaccines a year
Reduction of incidence: Pertussis – 92% eradication
manage risk profile
Science mechanism translatable to machines
high automated ingestible data for AI
Digital is about people: Good data Good algorithms Good GUI
Vivien Bonazzi
Deloitte Consulting LLP
Managing Dir & Chief Biomedical Data Scientist
12:00 PM – 1:25 PM EDT on Thursday, October 8
12:00 Organizer’s Remarks
Cindy Crowninshield, RDN, LDN, Executive Event Director, Cambridge Healthtech Institute
12:05 Keynote Introduction
Juergen A. Klenk, PhD, Principal, Deloitte Consulting LLP
12:15 Toward Preventive Genomics: Lessons from MedSeq and BabySeq
Robert Green, MD, MPH, Professor of Medicine (Genetics) and Director, G2P Research Program/Preventive Genomics Clinic, Brigham & Women’s Hospital, Broad Institute, and Harvard Medical School
12:40 AI in Pharma: Where We Are Today and How We Will Succeed in the Future
Natalija Jovanovic, PhD, Chief Digital Officer, Sanofi Pasteur
1:05 LIVE Q&A: Session Wrap-Up Panel Discussion
PANEL MODERATORS:
Juergen A. Klenk, PhD, Principal, Deloitte Consulting LLP
Vivien R. Bonazzi, PhD, Managing Director & Chief Biomedical Data Scientist, Deloitte Consulting LLP
Rachana Ananthakrishnan, Executive Director, Globus, University of Chicago
Michael A. Cianfrocco, PhD, Assistant Professor, Department of Biological Chemistry and Research Assistant Professor, Life Sciences Institute, University of Michigan
Brigitte E. Raumann, Product Manager, Globus, University of Chicago
3. Connect with peers from across the industry during these dedicated networking times.
Looking to meet fellow attendees and have meaningful conversations, just as you would at an in-person event? This is the perfect way to do just that. Get to know your fellow attendees by joining this interactive speed networking event. Each attendee will be paired at random with another attendee and given 7 minutes to interact in a private Zoom room. Once the 7 minutes are up, you will move on to meet another attendee. Maximize your networking at the meeting and join in.
Take a minute to revitalize and join our friends from VOS Fitness for a stretch break. The professional trainer from VOS will take you through some easy moves that help with screen fatigue and ease your muscles after a long day of sitting at the computer. All moves can be done right at your desk and are appropriate for all fitness levels.
Earn points by completing the activities listed on our Game tab. Some activities award points only once, but others award points every time you complete them, so the more involved you are in the virtual event, the more points you will earn! You can start earning points one week before the event, so get ready to start sending meeting invitations, exploring our virtual expo and planning your schedule.
Attendees in the top 5% of points earned when the game closes at the end of the conference will be eligible to win a gift card worth $200 USD!
5. Take part in 1-on-1 networking with an easy-to-navigate profile search and scheduling platform.
Check out your recommended connections flagged as “Want to Meet” in the People Tab. These connections were chosen based on your similar roles, companies and conference program interests.
Take a moment to add relevant interest tags to your profile. Then search and connect with participants who have the same interests.
Engage with technology leaders in their booths and view relevant videos and demos.
Take part in live Q&A with speakers and participants following each educational session.
Create and join in ad hoc group discussions throughout the event.
In the spirit of open collaboration, the world’s premier bio-IT conference will bring together the community to focus on how we are using technologies and analytic approaches to solve problems, accelerate science, and drive the future of precision medicine. With a focus on AI, data science and other “data-driven” technologies that are advancing biomedical research, drug discovery and healthcare, the Bio-IT World Conference & Expo ’20 will bring together more than 3,000 participants at the Seaport World Trade Center in Boston from October 6-8, 2020.
The participants will have the chance to meet and share research/ideas with leading life sciences, pharmaceutical, clinical, healthcare, informatics and technology experts.
TRACK 3 Data Science and Analytics Technologies
TRACK 4 Software Applications and Services
TRACK 5 Data Security and Compliance
TRACK 6 Cloud Computing
TRACK 7 AI for Drug Discovery
TRACK 8 Emerging AI Technologies
TRACK 9 AI: Business Value Outcomes
TRACK 10 Data Visualization Tools
TRACK 11 Bioinformatics
TRACK 12 Pharmaceutical R&D Informatics
TRACK 13 Genome Informatics
TRACK 14 Clinical Research and Translational Informatics
TRACK 15 Cancer Informatics
TRACK 16 Open Access and Collaborations
2020 Plenary Keynote Speakers
Rebecca Baker, PhD
Director, HEAL (Helping to End Addiction Long-term) Initiative, Office of the Director, National Institutes of Health
Vivien Bonazzi, PhD
Chief Biomedical Data Scientist, Managing Director, Deloitte
Tim Cutts, PhD
Head, Scientific Computing, Wellcome Trust Sanger Institute
Chris Dagdigian
Co-Founder and Senior Director, Infrastructure, BioTeam, Inc
Kevin Davies, PhD
Executive Editor, The CRISPR Journal, Mary Ann Liebert, Inc.
Kjiersten Fagnan, PhD
Chief Informatics Officer, Data Science and Informatics Leader, DOE Joint Genome Institute, Lawrence Berkeley National Laboratory
Robert Green, MD, MPH
Professor of Medicine (Genetics) and Director, G2P Research Program/Preventive Genomics Clinic, Brigham & Women’s Hospital, Broad Institute, and Harvard Medical School
Susan K. Gregurick, PhD
Associate Director, Data Science (ADDS) and Director, Office of Data Science Strategy (ODSS), National Institutes of Health
Natalija Jovanovic, PhD
Chief Digital Officer, Sanofi Pasteur
Pietro Michelucci, PhD
Director, Human Computation Institute
Matthew Trunnell
Vice President and Chief Data Officer, Fred Hutchinson Cancer Research Center
Group of Researchers at the University of California, Riverside, the University of Chicago, the U.S. Department of Energy’s Argonne National Laboratory, and Northwestern University Solve a COVID-19 Protein Structure and Map Potential Therapeutics
Reporters: Stephen J Williams, PhD and Aviva Lev-Ari, PhD, RN
This illustration, created at the Centers for Disease Control and Prevention (CDC), reveals the ultrastructural morphology exhibited by coronaviruses. Note the spikes that adorn the outer surface of the virus, which impart the look of a corona surrounding the virion when viewed by electron microscopy. A novel coronavirus was identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China in 2019.
Image of newly mapped coronavirus protein, called Nsp15, which helps the virus replicate.
The 3-D structure of a potential drug target in a newly mapped protein of SARS-CoV-2, the coronavirus that causes COVID-19, has been solved by a team of researchers from the University of California, Riverside, the University of Chicago, the U.S. Department of Energy’s Argonne National Laboratory, and Northwestern University.
The scientists said their findings suggest drugs previously developed to treat the earlier SARS outbreak could now be developed as effective drugs against COVID-19.
The initial genome analysis and design of constructs for protein synthesis were performed by the bioinformatic group of Adam Godzik, a professor of biomedical sciences at the UC Riverside School of Medicine.
The protein Nsp15 from Severe Acute Respiratory Syndrome Coronavirus 2, or SARS-CoV-2, is 89% identical to the corresponding protein from SARS-CoV, the virus behind the earlier SARS outbreak. SARS-CoV-2 is responsible for the current outbreak of COVID-19. Studies of SARS-CoV published in 2010 revealed that inhibition of Nsp15 can slow viral replication. This suggests that drugs designed to target Nsp15 could prove effective against COVID-19.
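A figure such as "89% identical" is a percent-identity value taken from a sequence alignment. The sketch below shows one common way to compute it, assuming the two sequences are already aligned (equal length, "-" marking gaps) and counting only columns where neither sequence has a gap. The release does not say which definition was used, and other conventions (for example, dividing by the full alignment length) give somewhat different numbers. The toy fragments are hypothetical, not real Nsp15 sequences.

```python
# Minimal sketch: percent identity between two pre-aligned protein sequences.
# Assumption: identity = identical residues / ungapped aligned columns.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy fragments for illustration only.
    print(percent_identity("MSLENVAF", "MSLEAVAF"))  # 87.5
```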
Adam Godzik, UC Riverside professor of biomedical sciences Credit: Sanford Burnham Prebys Medical Discovery Institute
“While the [SARS-CoV-2] virus is very similar to the SARS virus that caused epidemics in 2003, new structures shed light on the small, but potentially important differences between the two viruses that contribute to the different patterns in the spread and severity of the diseases they cause,” Godzik said.
The structure of Nsp15, which will be released to the scientific community on March 4, was solved by the group of Andrzej Joachimiak, a distinguished fellow at Argonne National Laboratory, a University of Chicago professor, and Director of the Structural Biology Center at Argonne’s Advanced Photon Source, a Department of Energy Office of Science user facility.
“Nsp15 is conserved among coronaviruses and is essential in their lifecycle and virulence,” Joachimiak said. “Initially, Nsp15 was thought to directly participate in viral replication, but more recently, it was proposed to help the virus replicate possibly by interfering with the host’s immune response.”
Mapping a 3D protein structure of the virus, also called solving the structure, allows scientists to figure out how to interfere in the pathogen’s replication in human cells.
“The Nsp15 protein has been investigated in SARS as a novel target for new drug development, but that never went very far because the SARS epidemic went away, and all new drug development ended,” said Karla Satchell, a professor of microbiology-immunology at Northwestern, who leads the international team of scientists investigating the structure of the SARS-CoV-2 virus to understand how to stop it from replicating. “Some inhibitors were identified but never developed into drugs. The inhibitors that were developed for SARS now could be tested against this protein.”
The rapid spread of SARS-CoV-2 has raised questions about how this virus became so much more transmissible than the SARS and MERS coronaviruses. The scientists are mapping the proteins to address this issue.
Over the past two months, COVID-19 infected more than 80,000 people and caused at least 2,700 deaths. Although currently mainly concentrated in China, the virus is spreading worldwide and has been found in 46 countries. Millions of people are being quarantined, and the epidemic has impacted the world economy. There is no existing drug for this disease, but various treatment options, such as utilizing medicines effective in other viral ailments, are being attempted.
Godzik, Satchell, and Joachimiak — along with the entire center team — will map the structure of some of the 28 proteins in the virus in order to see where drugs can throw a chemical monkey wrench into its machinery. The proteins are folded globular structures with precisely defined functions and their “active sites” can be targeted with chemical compounds.
The first step is to clone and express the genes encoding the virus proteins, then grow the proteins as crystals in miniature ice cube-like trays. The consortium that will participate in this effort includes nine labs across eight institutions.
Above is a modified version of the Northwestern University news release written by Marla Paul.
Science 07 Jun 2019:
Vol. 364, Issue 6444, pp. 941-942
DOI: 10.1126/science.aaw8299
Precision medicine is at a crossroads. Progress toward its central goal, to address persistent health inequities, will depend on enrolling populations in research that have been historically underrepresented, thus eliminating longstanding exclusions from such research (1). Yet the history of ethical violations related to protocols for inclusion in biomedical research, as well as the continued misuse of research results (such as white nationalists looking to genetic ancestry to support claims of racial superiority), continue to engender mistrust among these populations (2). For precision medicine research (PMR) to achieve its goal, all people must believe that there is value in providing information about themselves and their families, and that their participation will translate into equitable distribution of benefits. This requires an ethics of inclusion that considers what constitutes inclusive practices in PMR, what goals and values are being furthered through efforts to enhance diversity, and who participates in adjudicating these questions. The early stages of PMR offer a critical window in which to intervene before research practices and their consequences become locked in (3).
Initiatives such as the All of Us program have set out to collect and analyze health information and biological samples from millions of people (1). At the same time, questions of trust in biomedical research persist. For example, although the recent assertions of white nationalists were eventually denounced by the American Society of Human Genetics (4), the misuse of ancestry testing may have already undermined public trust in genetic research.
There are also infamous failures in research that included historically underrepresented groups, including practices of deceit, as in the Tuskegee Syphilis Study, or the misuse of samples, as with the Havasupai tribe (5). Many people who are being asked to give their data and samples for PMR must not only reconcile such past research abuses, but also weigh future risks of potential misuse of their data.
To help assuage these concerns, ongoing PMR studies should open themselves up to research, conducted by social scientists and ethicists, that examines how their approaches enhance diversity and inclusion. Empirical studies are needed to account for how diversity is conceptualized and how goals of inclusion are operationalized throughout the life course of PMR studies. This is not limited to selection and recruitment of populations but extends to efforts to engage participants and communities, through data collection and measurement, and interpretations and applications of study findings. A commitment to transparency is an important step toward cultivating public trust in PMR’s mission and practices.
From Inclusion to Inclusive
The lack of diverse representation in precision medicine and other biomedical research is a well-known problem. For example, rare genetic variants may be overlooked—or their association with common, complex diseases can be misinterpreted—as a result of sampling bias in genetics research (6). Concentrating research efforts on samples with largely European ancestry has limited the ability of scientists to make generalizable inferences about the relationships among genes, lifestyle, environmental exposures, and disease risks, and thereby threatens the equitable translation of PMR for broad public health benefit (7).
However, recruiting for diverse research participation alone is not enough. As with any push for “diversity,” related questions arise about how to describe, define, measure, compare, and explain inferred similarities and differences among individuals and groups (8). In the face of ambivalence about how to represent population variation, there is ample evidence that researchers resort to using definitions of diversity that are heterogeneous, inconsistent, and sometimes competing (9). Varying approaches are not inherently problematic; depending on the scientific question, some measures may be more theoretically justified than others and, in many cases, a combination of measures can be leveraged to offer greater insight (10). For example, studies have shown that American adults who do not self-identify as white report better mental and physical health if they think others perceive them as white (11, 12).
The benefit of using multiple measures of race and ancestry also extends to genetic studies. In a study of hypertension in Puerto Rico, not only did classifications based on skin color and socioeconomic status better predict blood pressure than genetic ancestry, the inclusion of these sociocultural measures also revealed an association between a genetic polymorphism and hypertension that was otherwise hidden (13). Thus, practices that allow for a diversity of measurement approaches, when accompanied by a commitment to transparency about the rationales for chosen approaches, are likely to benefit PMR research more than striving for a single gold standard that would apply across all studies. These definitional and measurement issues are not merely semantic. They also are socially consequential to broader perceptions of PMR research and the potential to achieve its goals of inclusion.
Study Practices, Improve Outcomes
Given the uncertainty and complexities of the current, early phase of PMR, the time is ripe for empirical studies that enable assessment and modulation of research practices and scientific priorities in light of their social and ethical implications. Studying ongoing scientific practices in real time can help to anticipate unintended consequences that would limit researchers’ ability to meet diversity recruitment goals, address both social and biological causes of health disparities, and distribute the benefits of PMR equitably. We suggest at least two areas for empirical attention and potential intervention.
First, we need to understand how “upstream” decisions about how to characterize study populations and exposures influence “downstream” research findings of what are deemed causal factors. For example, when precision medicine researchers rely on self-identification with U.S. Census categories to characterize race and ethnicity, this tends to circumscribe their investigation of potential gene-environment interactions that may affect health. The convenience and routine nature of Census categories seemed to lead scientists to infer that the reasons for differences among groups were self-evident and required no additional exploration (9). The ripple effects of initial study design decisions go beyond issues of recruitment to shape other facets of research across the life course of a project, from community engagement and the return of results to the interpretation of study findings for human health.
Second, PMR studies are situated within an ecosystem of funding agencies, regulatory bodies, disciplines, and other scholars. This partly explains the use of varied terminology, different conceptual understandings and interpretations of research questions, and heterogeneous goals for inclusion. It also makes it important to explore how expectations related to funding and regulation influence research definitions of diversity and benchmarks for inclusion.
For example, who defines a diverse study population, and how might those definitions vary across different institutional actors? Who determines the metrics that constitute successful inclusion, and why? Within a research consortium, how are expectations for data sharing and harmonization reconciled with individual studies’ goals for recruitment and analysis? In complex research fields that include multiple investigators, organizations, and agendas, how are heterogeneous, perhaps even competing, priorities negotiated? To date, no studies have addressed these questions or investigated how decisions facilitate, or compromise, goals of diversity and inclusion.
The life course of individual studies and the ecosystems in which they reside cannot be easily separated and therefore must be studied in parallel to understand how meanings of diversity are shaped and how goals of inclusion are pursued. Empirically “studying the studies” will also be instrumental in creating mechanisms for transparency about how PMR is conducted and how trade-offs among competing goals are resolved. Establishing open lines of inquiry that study upstream practices may allow researchers to anticipate and address downstream decisions about how results can be interpreted and should be communicated, with a particular eye toward the consequences for communities recruited to augment diversity. Understanding how scientists negotiate the challenges and barriers to achieving diversity that go beyond fulfilling recruitment numbers is a critical step toward promoting meaningful inclusion in PMR.
Transparent Reflection, Cultivation of Trust
Emerging research on public perceptions of PMR suggests that although there is general support, questions of trust loom large. What we learn from studies that examine on-the-ground approaches aimed at enhancing diversity and inclusion, and how the research community reflects and responds with improvements in practices as needed, will play a key role in building a culture of openness that is critical for cultivating public trust.
Cultivating long-term, trusting relationships with participants underrepresented in biomedical research has been linked to a broad range of research practices. Some of these include the willingness of researchers to (i) address the effect of history and experience on marginalized groups’ trust in researchers and clinicians; (ii) engage concerns about potential group harms and risks of stigmatization and discrimination; (iii) develop relationships with participants and communities that are characterized by transparency, clear communication, and mutual commitment; and (iv) integrate participants’ values and expectations of responsible oversight beyond initial informed consent (14). These findings underscore the importance of multidisciplinary teams that include social scientists, ethicists, and policy-makers, who can identify and help to implement practices that respect the histories and concerns of diverse publics.
A commitment to an ethics of inclusion begins with a recognition that risks from the misuse of genetic and biomedical research are unevenly distributed. History makes plain that a multitude of research practices ranging from unnecessarily limited study populations and taken-for-granted data collection procedures to analytic and interpretive missteps can unintentionally bolster claims of racial superiority or inferiority and provoke group harm (15). Sustained commitment to transparency about the goals, limits, and potential uses of research is key to further cultivating trust and building long-term research relationships with populations underrepresented in biomedical studies.
As calls for increasing diversity and inclusion in PMR grow, funding and organizational pathways must be developed that integrate empirical studies of scientific practices and their rationales to determine how goals of inclusion and equity are being addressed and to identify where reform is required. In-depth, multidisciplinary empirical investigations of how diversity is defined, operationalized, and implemented can provide important insights and lessons learned for guiding emerging science, and in so doing, meet our ethical obligations to ensure transparency and meaningful inclusion.