

The drug efflux pump MDR1 promotes intrinsic and acquired resistance to PROTACs in cancer cells

Reporter: Stephen J. Williams, PhD.
Below is one of the first reports on the potential mechanisms of intrinsic and acquired resistance to PROTAC therapy in cancer cells.
Proteolysis-targeting chimeras (PROTACs) are a promising new class of drugs that selectively degrade cellular proteins of interest. PROTACs that target oncogene products are avidly being explored for cancer therapies, and several are currently in clinical trials. Drug resistance is a substantial challenge in clinical oncology, and resistance to PROTACs has been reported in several cancer cell models. Here, using proteomic analysis, we found intrinsic and acquired resistance mechanisms to PROTACs in cancer cell lines mediated by greater abundance or production of the drug efflux pump MDR1. PROTAC-resistant cells were resensitized to PROTACs by genetic ablation of ABCB1 (which encodes MDR1) or by coadministration of MDR1 inhibitors. In MDR1-overexpressing colorectal cancer cells, degraders targeting either the kinases MEK1/2 or the oncogenic mutant GTPase KRAS G12C synergized with the dual epidermal growth factor receptor (EGFR/ErbB)/MDR1 inhibitor lapatinib. Moreover, compared with single-agent therapies, combining MEK1/2 degraders with lapatinib improved growth inhibition of MDR1-overexpressing KRAS-mutant colorectal cancer xenografts in mice. Together, our findings suggest that concurrent blockade of MDR1 will likely be required with PROTACs to achieve durable protein degradation and therapeutic response in cancer.

INTRODUCTION

Proteolysis-targeting chimeras (PROTACs) have emerged as a revolutionary new class of drugs that use cancer cells’ own protein destruction machinery to selectively degrade essential tumor drivers (1). PROTACs are small molecules with two functional ends, wherein one end binds to the protein of interest, whereas the other binds to an E3 ubiquitin ligase (2, 3), bringing the ubiquitin ligase to the target protein, leading to its ubiquitination and subsequent degradation by the proteasome. PROTACs have enabled the development of drugs against previously “undruggable” targets and require neither catalytic activity nor high-affinity target binding to achieve target degradation (4). In addition, low doses of PROTACs can be highly effective at inducing degradation, which can reduce off-target toxicity associated with high dosing of traditional inhibitors (3). PROTACs have been developed for a variety of cancer targets, including oncogenic kinases (5), epigenetic proteins (6), and, recently, KRAS G12C proteins (7). PROTACs targeting the androgen receptor or estrogen receptor are avidly being evaluated in clinical trials for prostate cancer (NCT03888612) or breast cancer (NCT04072952), respectively.
However, PROTACs may not escape the overwhelming challenge of drug resistance that befalls so many cancer therapies (8). Resistance to PROTACs in cultured cells has been shown to involve genomic alterations in their E3 ligase targets, such as decreased expression of Cereblon (CRBN), von Hippel-Lindau (VHL), or Cullin2 (CUL2) (9–11). Up-regulation of MDR1 (multidrug resistance 1), the drug efflux pump encoded by ABCB1 and a member of the superfamily of adenosine 5′-triphosphate (ATP)–binding cassette (ABC) transporters, has been shown to confer resistance to many anticancer drugs, including chemotherapy agents, kinase inhibitors, and other targeted agents (12). Recently, PROTACs were shown to be substrates for MDR1 (1013), suggesting that drug efflux represents a potential limitation for degrader therapies. Here, using degraders (PROTACs) against bromodomain and extraterminal (BET) bromodomain (BBD) proteins and cyclin-dependent kinase 9 (CDK9) as a proof of concept, we applied proteomics to define acquired resistance mechanisms to PROTAC therapies in cancer cells after chronic exposure. Our study reveals a role for the drug efflux pump MDR1 in both acquired and intrinsic resistance to protein degraders in cancer cells and supports combination therapies involving PROTACs and MDR1 inhibitors to achieve durable protein degradation and therapeutic responses.

Fig. 1. Proteomic characterization of degrader-resistant cancer cell lines.
(A) Workflow for identifying protein targets up-regulated in degrader-resistant cancer cells. Single-run proteome analysis was performed, and changes in protein levels among parent and resistant cells were determined by LFQ. m/z, mass/charge ratio. (B and C) Cell viability assessed by CellTiter-Glo in parental and dBET6- or Thal SNS 032–resistant A1847 cells treated with increasing doses of dBET6 (B) or Thal SNS 032 (C) for 5 days. Data were analyzed as % of DMSO control, presented as means ± SD of three independent assays. Growth inhibitory 50% (GI50) values were determined using Prism software. (D to G) Immunoblotting for degrader targets and downstream signaling in parental A1847 cells and their derivative dBET6-R or Thal-R cells treated with increasing doses of dBET6 or Thal SNS 032 for 4 hours. The dBET6-R and Thal-R cells were continuously cultured in 500 nM PROTAC. Blots are representative, and densitometric analyses are means ± SD from three blots, each normalized to the loading control, GAPDH. DC50 values, quantitating either (E) the dose of dBET6 that reduces BRD2, BRD3, or BRD4 or (G) the dose of Thal SNS 032 that reduces CDK9 protein levels 50% of the DMSO control treatment, were determined with Prism software. Pol II, polymerase II. (H to K) Volcano plot of proteins with increased or reduced abundance in dBET6-R (H) or Thal-R (I) A1847 cells relative to parental cells. Differences in protein log2 LFQ intensities among degrader-resistant and parental cells were determined by paired t test permutation-based adjusted P values at FDR of <0.05 using Perseus software. The top 10 up-regulated proteins in each are shown in (J) and (K), respectively. FC, fold change. (L and M) ABCB1 log2 LFQ values in dBET6-R cells from (H) and Thal-R cells from (I) compared with those in parental A1847 cells. Data are presented as means ± SD from three independent assays. By paired t test permutation-based adjusted P values at FDR of <0.05 using Perseus software, ***P ≤ 0.001. 
(N) Cell viability assessed by CellTiter-Glo in parental and MZ1-resistant SUM159 cells treated with increasing doses of MZ1 for 5 days. Data were analyzed as % of DMSO control, presented as means of three independent assays. GI50 values were determined using Prism software. (O and P) Immunoblotting for degrader targets and downstream signaling in parental or MZ1-R SUM159 cells treated with increasing doses of MZ1 for 24 hours. The MZ1-R cells were continuously cultured in 500 nM MZ1. Blots are representative, and densitometric analyses are means ± SD from three blots, each normalized to the loading control, GAPDH. DC50 values were determined in Prism software. (Q and R) Top 10 up-regulated proteins (Q) and ABCB1 log2 LFQ values (R) in MZ1-R cells relative to parental SUM159 cells.
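Panels (H) to (M) compare log2 LFQ intensities between degrader-resistant and parental cells. The Python sketch below shows, for a single protein, how a paired log2 fold change (the x-axis of a volcano plot) and an exact sign-flip permutation P value could be computed. It is a simplified, hypothetical stand-in for the Perseus analysis described in the legend, which additionally controls the false discovery rate across all proteins; the intensity values are invented for illustration.

```python
import itertools
import math
import statistics


def log2_differences(resistant, parental):
    """Paired log2 differences (resistant minus parental) of LFQ intensities."""
    return [math.log2(r) - math.log2(p) for r, p in zip(resistant, parental)]


def log2_fold_change(resistant, parental):
    """Mean paired log2 fold change, i.e. the volcano-plot x-axis value."""
    return statistics.mean(log2_differences(resistant, parental))


def sign_flip_pvalue(resistant, parental):
    """Exact two-sided P value over all 2**n sign patterns of the paired differences.

    Under the null hypothesis each paired difference is equally likely to be
    positive or negative, so every sign pattern is an equally likely outcome.
    """
    diffs = log2_differences(resistant, parental)
    observed = abs(statistics.mean(diffs))
    patterns = list(itertools.product((1, -1), repeat=len(diffs)))
    extreme = sum(
        1
        for signs in patterns
        if abs(statistics.mean([s * d for s, d in zip(signs, diffs)])) >= observed - 1e-12
    )
    return extreme / len(patterns)


# Hypothetical LFQ intensities for one protein in three paired replicates
resistant = [8000.0, 8400.0, 7600.0]
parental = [1000.0, 1000.0, 1000.0]
print(round(log2_fold_change(resistant, parental), 3))
print(sign_flip_pvalue(resistant, parental))
```

With three paired replicates there are only 2³ = 8 sign patterns, so the smallest attainable two-sided P value is 2/8 = 0.25. That limitation is one reason proteome-wide workflows pool permutations across many proteins when estimating the FDR instead of testing each protein in isolation.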

Fig. 2. Chronic exposure to degraders induces MDR1 expression and drug efflux activity.
(A) ABCB1 mRNA levels in parental and degrader-resistant cell lines as determined by qRT-PCR. Data are means ± SD of three independent experiments. ***P ≤ 0.001 by Student’s t test. (B) Immunoblot analysis of MDR1 protein levels in parental and degrader-resistant cell lines. Blots are representative of three independent experiments. (C to E) Immunofluorescence (“IF”) microscopy of MDR1 protein levels in A1847 dBET6-R (C), SUM159 MZ1-R (D), and Thal-R A1847 cells (E) relative to parental cells. Nuclear staining by DAPI. Images are representative of three independent experiments. Scale bars, 100 μm. (F) Drug efflux activity in A1847 dBET6-R, SUM159 MZ1-R, and Thal-R A1847 cells relative to parental cells (Par.) using rhodamine 123 efflux assays. Bars are means ± SD of three independent experiments. ***P ≤ 0.001 by Student’s t test. (G) Intracellular dBET6 levels in parental or dBET-R A1847 cells transfected with a CRBN sensor and treated with increasing concentrations of dBET6. Intracellular dBET6 levels measured using the CRBN NanoBRET target engagement assay. Data were analyzed as % of DMSO control, presented as means ± SD of three independent assays. *P ≤ 0.05, **P ≤ 0.01, and ***P ≤ 0.001 by Student’s t test. (H and I) FISH analysis of representative drug-sensitive parental and drug-resistant A1847 (H) and SUM159 (I) cells using ABCB1 and control XCE 7 centromere probes. Images of interphase nuclei were captured with a Metasystems Metafer microscope workstation, and the raw images were extracted and processed to depict ABCB1 signals in magenta, centromere 7 signals in cyan, and DAPI-stained nuclei in blue. (J and K) CpG methylation status of the ABCB1 downstream promoter (coordinates: chr7:87,600,166-87,601,336) by bisulfite amplicon sequencing in parent and degrader-resistant A1847 (J) and SUM159 (K) cells.
Images depict the averaged percentage of methylation for each region of the promoter, where methylation status is depicted by color as follows: red, methylated; blue, unmethylated. Schematic of the ABCB1 gene with the location of individual CpG sites is shown. Graphs are representative of three independent experiments. (L and M) Immunoblot analysis of MDR1 protein levels after short-term exposure [for hours (h) or days (d) as indicated] to BET protein degraders dBET6 or MZ1 (100 nM) in A1847 (L) and SUM159 (M) cells, respectively. Blots are representative of three independent experiments. (N to P) Immunoblot analysis of MDR1 protein levels in A1847 and SUM159 cells after long-term exposure (7 to 30 days) to BET protein degraders dBET6 (N), Thal SNS 032 (O), or MZ1 (P), each at 500 nM. Blots are representative of three independent experiments. (Q and R) Immunoblot analysis of MDR1 protein levels in degrader-resistant A1847 (Q) and SUM159 (R) cells after PROTAC removal for 2 or 7 days. Blots are representative of three independent experiments.
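In bisulfite sequencing, unmethylated cytosines are converted and read as T after PCR, whereas methylated cytosines are protected and still read as C. The per-CpG methylation percentages in panels (J) and (K) can therefore be sketched as the fraction of reads retaining C at each site. The Python sketch below assumes simple per-site read counts; the counts are hypothetical, and real pipelines (for example, Bismark) also handle alignment, strand, and conversion-efficiency checks.

```python
def percent_methylation(c_reads: int, t_reads: int) -> float:
    """% methylation at one CpG: reads retaining C divided by all (C + T) reads."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this CpG site")
    return 100.0 * c_reads / total


def region_mean_methylation(site_counts):
    """Average % methylation over the CpG sites of one amplicon region."""
    values = [percent_methylation(c, t) for c, t in site_counts]
    return sum(values) / len(values)


# Hypothetical (C reads, T reads) at four CpG sites of one amplicon
site_counts = [(95, 5), (40, 60), (10, 90), (3, 97)]
print([percent_methylation(c, t) for c, t in site_counts])
print(region_mean_methylation(site_counts))
```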


Fig. 3. Blockade of MDR1 activity resensitizes degrader-resistant cells to PROTACs.
(A and B) Cell viability by CellTiter-Glo assay in parental and degrader-resistant A1847 (A) and SUM159 (B) cells transfected with control siRNA or siRNAs targeting ABCB1 and cultured for 120 hours. Data were analyzed as % of control, presented as means ± SD of three independent assays. ***P ≤ 0.001 by Student’s t test. (C and D) Immunoblot analysis of degrader targets after ABCB1 knockdown in parental and degrader-resistant A1847 (C) and SUM159 (D) cells. Blots are representative, and densitometric analyses using ImageJ are means ± SD of three blots, each normalized to the loading control, GAPDH. (E) Drug efflux activity, using the rhodamine 123 efflux assay, in degrader-resistant cells after MDR1 inhibition by tariquidar (0.1 μM). Data are means ± SD of three independent experiments. ***P ≤ 0.001 by Student’s t test. (F to H) Cell viability by CellTiter-Glo assay in parental and dBET6-R (F) or Thal-R (G) A1847 cells or MZ1-R SUM159 cells (H) treated with increasing concentrations of tariquidar. Data are % of DMSO control, presented as means ± SD of three independent assays. GI50 value determined with Prism software. (I to K) Immunoblot analysis of degrader targets after MDR1 inhibition (tariquidar, 0.1 μM for 24 hours) in parental and degrader-resistant A1847 cells (I and J) and SUM159 cells (K). Blots are representative, and densitometric analyses are means ± SD from three blots, each normalized to the loading control, GAPDH. (L and M) A 14-day colony formation assessed by crystal violet staining of (L) A1847 cells or (M) SUM159 cells treated with degrader (0.1 μM; dBET6 or MZ1, respectively) and MDR1 inhibitor tariquidar (0.1 μM). Images are representative of three biological replicates. (N) Immunoblotting for MDR1 in SUM159 cells stably expressing FLAG-MDR1 after selection with hygromycin. 
(O) Long-term 14-day colony formation assay of SUM159 cells expressing FLAG-MDR1 that were treated with DMSO, MZ1 (0.1 μM), or MZ1 and tariquidar (0.1 μM) for 14 days, assessed by crystal violet staining. Representative images of three biological replicates are shown. (P and Q) RT-PCR (P) and immunoblot (Q) analysis of ABCB1 mRNA and MDR1 protein levels, respectively, in parental or MZ1-R HCT116, OVCAR3, and MOLT4 cells.
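The GI50 values in these panels were obtained by curve fitting in Prism. As a rough substitute (not the nonlinear regression Prism performs), the Python sketch below normalizes luminescence to the DMSO control and then interpolates, on a log10 dose axis, the dose at which viability crosses 50%. The function names, dose series, and luminescence readings are all hypothetical.

```python
import math


def percent_of_control(signal: float, dmso_signal: float) -> float:
    """Luminescence of a treated well as % of the DMSO (vehicle) control."""
    return 100.0 * signal / dmso_signal


def gi50_log_interpolation(doses, viability_pct):
    """Dose at which viability first falls to 50% of control.

    Interpolates linearly on a log10 dose axis between the two tested doses
    that bracket 50%. Doses must be positive and in increasing order.
    Returns None if 50% is not crossed within the tested range.
    """
    points = list(zip(doses, viability_pct))
    for (d_lo, v_lo), (d_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 50.0 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 50.0) / (v_lo - v_hi)
            log_lo, log_hi = math.log10(d_lo), math.log10(d_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


# Hypothetical dose series (nM) and raw CellTiter-Glo luminescence values
doses = [1, 10, 100, 1000]
dmso = 50000.0
raw = [50000.0, 40000.0, 10000.0, 2500.0]
viability = [percent_of_control(x, dmso) for x in raw]
print(viability)
print(gi50_log_interpolation(doses, viability))
```

Log-linear interpolation uses only the two doses that bracket 50%, so it is sensitive to noise at those points; fitting a sigmoidal dose-response model to every dose, as curve-fitting software does, is generally more robust.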


Fig. 4. Overexpression of MDR1 conveys intrinsic resistance to degrader therapies in cancer cells.
(A) Frequency of ABCB1 mRNA overexpression in a panel of cancer cell lines, obtained from cBioPortal for Cancer Genomics using Z-score values of >1.2 for ABCB1 mRNA levels (30). (B) Immunoblot for MDR1 protein levels in a panel of 10 cancer cell lines. Blots are representative of three independent experiments. (C) Cell viability by CellTiter-Glo assay in cancer cell lines expressing high or low MDR1 protein levels and treated with Thal SNS 032 for 5 days. Data were analyzed as % of DMSO control, presented as means ± SD of three independent assays. GI50 values were determined with Prism software. (D to F) Immunoblot analysis of CDK9 in MDR1-low (D) or MDR1-high (E) cell lines after Thal SNS 032 treatment for 4 hours. Blots are representative, and densitometric analyses using ImageJ are means ± SD from three blots, each normalized to the loading control, GAPDH. DC50 value determined with Prism. (G and H) Immunoblotting of control and MDR1-knockdown DLD-1 cells treated for 4 hours with increasing concentrations of Thal SNS 032 [indicated in (H)]. Blots are representative, and densitometric analysis data are means ± SD from three blots, each normalized to the loading control, GAPDH. DC50 value determined with Prism. (I) Drug efflux activity using rhodamine 123 efflux assays in DLD-1 cells treated with DMSO or 0.1 μM tariquidar. Data are means ± SD of three independent experiments. ***P ≤ 0.001 by Student’s t test. (J) Intracellular Thal SNS 032 levels, using the CRBN NanoBRET target engagement assay, in MDR1-overexpressing DLD-1 cells treated with DMSO or 0.1 μM tariquidar and increasing doses of Thal SNS 032. Data are % of DMSO control, presented as means ± SD of three independent assays. **P ≤ 0.01 and ***P ≤ 0.001 by Student’s t test. (K to N) Immunoblotting in DLD-1 cells treated with increasing doses of Thal SNS 032 (K and L) or dBET6 (M and N) alone or with tariquidar (0.1 μM) for 4 hours. 
Blots are representative, and densitometric analyses are means ± SD from three blots, each normalized to the loading control, GAPDH. DC50 value of Thal SNS 032 for CDK9 reduction (L) or of dBET6 for BRD4 reduction (N) determined with Prism. (O to T) Bliss synergy scores based on cell viability by CellTiter-Glo assay, colony formation, and immunoblotting in DLD-1 cells treated with the indicated doses of Thal SNS 032 (O to Q) or dBET6 (R to T) alone or with tariquidar. Cells were treated for 14 days for colony formation assays and 24 hours for immunoblotting.
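Panel (A) counts cell lines whose ABCB1 mRNA Z-score exceeds 1.2. A minimal Python sketch of that thresholding is shown below, assuming Z-scores are computed against the mean and sample standard deviation of the whole panel. cBioPortal can instead compute Z-scores relative to a different reference population (for example, diploid samples), so its numbers need not match this simplified version; the expression values are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize expression values against the panel mean and sample SD."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def fraction_overexpressed(values, threshold=1.2):
    """Fraction of cell lines whose Z-score exceeds the threshold."""
    zs = z_scores(values)
    return sum(z > threshold for z in zs) / len(zs)


# Hypothetical ABCB1 expression values for a panel of ten cell lines
panel = [1.0] * 9 + [10.0]
print(fraction_overexpressed(panel))
```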


Fig. 5. Repurposing dual kinase/MDR1 inhibitors to overcome degrader resistance in cancer cells.
(A and B) Drug efflux activity by rhodamine 123 efflux assays in degrader-resistant [dBET-R (A) or Thal-R (B)] A1847 cells after treatment with tariquidar, RAD001, or lapatinib (each 2 μM). Data are means ± SD of three independent experiments. *P ≤ 0.05 by Student’s t test. (C and D) CellTiter-Glo assay for the cell viability of parental, dBET6-R, or Thal-R A1847 cells treated with increasing concentrations of RAD001 (C) or lapatinib (D). Data were analyzed as % of DMSO control, presented as means ± SD of three independent assays. GI50 values were determined with Prism software. (E to I) Immunoblot analysis of degrader targets in parental (E), dBET6-R (F and G), and Thal-R (H and I) A1847 cells treated with increasing concentrations of RAD001 or lapatinib for 4 hours. Blots are representative, and densitometric analyses are means ± SD from three blots, each normalized to the loading control, GAPDH. DC50 value of dBET6 for BRD4 reduction (G) or of Thal SNS 032 for CDK9 reduction (I) determined with Prism. (J) Immunoblotting for cleaved PARP in dBET6-R or Thal-R A1847 cells treated with RAD001, lapatinib, or tariquidar (each 2 μM) for 24 hours. Blots are representative of three independent blots. (K to N) Immunoblotting for BRD4 in DLD-1 cells treated with increasing doses of dBET6 alone or in combination with either RAD001 or lapatinib [each 2 μM (K and L)] or KU-0063794 or afatinib [each 2 μM (M and N)] for 4 hours. Blots are representative of three independent experiments and, in (L), are means ± SD from three blots, each normalized to the loading control, GAPDH. DC50 value for BRD4 reduction (L) determined in Prism. (O) Colony formation by DLD-1 cells treated with DMSO, dBET6 (0.1 μM), lapatinib (2 μM), afatinib (2 μM), RAD001 (2 μM), KU-0063794 (2 μM), or the combination of inhibitor and dBET6 for 14 days. Images representative of three independent assays. 
(P and Q) Immunoblotting for CDK9 in DLD-1 cells treated with increasing doses of Thal SNS 032 and/or RAD001 (2 μM) or lapatinib (2 μM) for 4 hours. Blots are representative, and densitometric analyses are means ± SD from three blots, each normalized to the loading control, GAPDH. DC50 value for CDK9 reduction determined with Prism (Q). (R) Colony formation in DLD-1 cells treated with DMSO, Thal SNS 032 (0.5 μM), lapatinib (2 μM), and/or RAD001 (2 μM) as indicated for 14 days.
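Most immunoblot panels report band densitometry normalized to GAPDH, and DC50 values are then read from the normalized dose series. The Python sketch below illustrates only the normalization step under simple assumptions: each target band is divided by the GAPDH band in the same lane, the ratio is expressed as a percentage of the first (DMSO) lane, and lanes are summarized as mean ± SD across replicate blots. The function names and intensities are hypothetical, and this is not the exact ImageJ or Prism workflow used in the study.

```python
import statistics


def normalize_blot(target, gapdh):
    """Target/GAPDH ratio per lane, expressed as % of lane 0 (DMSO control)."""
    ratios = [t / g for t, g in zip(target, gapdh)]
    return [100.0 * r / ratios[0] for r in ratios]


def summarize_replicates(normalized_blots):
    """Per-lane (mean, SD) across replicate blots with identical lane layouts."""
    lanes = zip(*normalized_blots)
    return [(statistics.mean(lane), statistics.stdev(lane)) for lane in lanes]


# Hypothetical band intensities: lane 0 = DMSO, then increasing degrader doses.
# Lanes 1 and 2 are under-loaded (lower GAPDH), which the normalization corrects.
target = [1000.0, 300.0, 100.0]
gapdh = [500.0, 250.0, 250.0]
print(normalize_blot(target, gapdh))

replicates = [[100.0, 60.0, 20.0], [100.0, 50.0, 30.0], [100.0, 70.0, 10.0]]
print(summarize_replicates(replicates))
```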


Fig. 6. Combining MEK1/2 degraders with lapatinib synergistically kills MDR1-overexpressing KRAS-mutant CRC cells and tumors.
(A and B) ABCB1 expression in KRAS-mutant CRC cell lines from cBioPortal (30) (A) and MDR1 abundance in select KRAS-mutant CRC cell lines (B). (C) Cell viability assessed by CellTiter-Glo in CRC cells treated with increasing doses of MS432 for 5 days, analyzed as % of DMSO control. GI50 value determined with Prism software. (D) Colony formation by CRC cells 14 days after treatment with 1 μM MS432. (E) MEK1/2 protein levels assessed by immunoblot in CRC lines SKCO1 (low MDR1) or LS513 (high MDR1) treated with increasing doses of MS432 for 4 hours. (F) Rhodamine 123 efflux in LS513 cells treated with DMSO, 2 μM tariquidar, or 2 μM lapatinib. (G and H) Immunoblotting analysis in LS513 cells treated with increasing doses of MS432 alone or in combination with tariquidar (0.1 μM) or lapatinib (5 μM) for 24 hours. DC50 value for MEK1 levels determined with Prism. (I) Immunoblotting in LS513 cells treated with DMSO, PD0325901 (0.01 μM), lapatinib (5 μM), or the combination for 48 hours. (J and K) Immunoblotting in LS513 cells treated either with DMSO, MS432 (1 μM), tariquidar (0.1 μM) (J), or lapatinib (5 μM) (K), alone or in combination. (L) Bliss synergy scores determined from cell viability assays (CellTiter-Glo) in LS513 cells treated with increasing concentrations of MS432, lapatinib, or the combination. (M and N) Colony formation by LS513 cells (M) and others (N) treated with DMSO, lapatinib (2 μM), MS432 (1 μM), or the combination for 14 days. (O and P) Immunoblotting in LS513 cells treated with increasing doses of MS934 alone (O) or combined with lapatinib (5 μM) (P) for 24 hours. (Q and R) Tumor volume of LS513 xenografts (Q) and the body weights of the tumor-bearing nude mice (R) treated with vehicle, MS934 (50 mg/kg), lapatinib (100 mg/kg), or the combination. n = 5 mice per treatment group. 
In (A) to (R), blots and images are representative of three independent experiments, and quantified data are means ± SD [SEM in (Q) and (R)] of three independent experiments; ***P ≤ 0.001 by Student’s t test.
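Panel (L) reports Bliss synergy scores. Under the Bliss independence model, if drugs A and B inhibit growth by fractions f_A and f_B and act independently, the expected combined inhibition is f_A + f_B - f_A·f_B; observed inhibition above that expectation suggests synergy. The Python sketch below implements this model. Averaging the excess over dose pairs and multiplying by 100 is one common way dose-matrix tools summarize a Bliss score, but treat it as an assumption rather than the study's exact scoring software; the viability values are hypothetical.

```python
def viability_to_inhibition(viability_pct: float) -> float:
    """Convert % viability (of DMSO control) to fractional inhibition in [0, 1]."""
    return 1.0 - viability_pct / 100.0


def bliss_expected(f_a: float, f_b: float) -> float:
    """Expected fractional inhibition if A and B act independently (Bliss)."""
    return f_a + f_b - f_a * f_b


def bliss_excess(f_a: float, f_b: float, f_ab: float) -> float:
    """Observed minus expected inhibition; > 0 suggests synergy, < 0 antagonism."""
    return f_ab - bliss_expected(f_a, f_b)


def mean_bliss_score(dose_pairs):
    """Mean Bliss excess x 100 over (f_a, f_b, f_ab) tuples from a dose matrix."""
    excesses = [bliss_excess(a, b, ab) for a, b, ab in dose_pairs]
    return 100.0 * sum(excesses) / len(excesses)


# Hypothetical single-agent and combination viabilities (% of DMSO) at one dose pair
f_a = viability_to_inhibition(70.0)   # drug A alone
f_b = viability_to_inhibition(80.0)   # drug B alone
f_ab = viability_to_inhibition(30.0)  # A + B together
print(round(bliss_expected(f_a, f_b), 3), round(bliss_excess(f_a, f_b, f_ab), 3))
print(round(mean_bliss_score([(f_a, f_b, f_ab), (0.1, 0.1, 0.19)]), 2))
```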


Fig. 7. Lapatinib treatment improves KRAS G12C degrader therapies in MDR1-overexpressing CRC cell lines.
(A and B) Colony formation by SW1463 (A) or SW837 (B) cells treated with DMSO, LC-2 (1 μM), or MRTX849 (1 μM) for 14 days. Images representative of three independent assays. (C to E) Immunoblotting in SW1463 cells (C and D) and SW837 cells (E) treated with DMSO, LC-2 (1 μM), tariquidar (0.1 μM) (C), or lapatinib (5 μM) (D and E) alone or in combination for 48 hours. Blots are representative of three independent experiments. (F and G) Bliss synergy scores based on CellTiter-Glo assay for the cell viability of SW1463 (F) or SW837 (G) cells treated with increasing concentrations of LC-2, lapatinib, or the combination. Data are means of three experiments ± SD. (H and I) Colony formation of SW1463 (H) or SW837 (I) cells treated as indicated (−, DMSO; LC-2, 1 μM; lapatinib, 2 μM; tariquidar, 0.1 μM) for 14 days. Images representative of three independent assays. (J) Rationale for combining lapatinib with MEK1/2 or KRAS G12C degraders in MDR1-overexpressing CRC cell lines. Simultaneous blockade of MDR1 and ErbB receptor signaling overcomes degrader resistance and ErbB receptor kinome reprogramming, resulting in sustained inhibition of KRAS effector signaling.

SOURCE

Other articles in this Open Access Scientific Journal on PROTAC therapy in cancer include:

Accelerating PROTAC drug discovery: Establishing a relationship between ubiquitination and target protein degradation

The Vibrant Philly Biotech Scene: Proteovant Therapeutics Using Artificial Intelligence and Machine Learning to Develop PROTACs

The Map of human proteins drawn by artificial intelligence and PROTAC (proteolysis targeting chimeras) Technology for Drug Discovery


The Human Genome Gets Fully Sequenced: A Simplistic Take on Century Long Effort


Curator: Stephen J. Williams, PhD

Article ID #295: The Human Genome Gets Fully Sequenced: A Simplistic Take on Century Long Effort. Published on 6/14/2022

WordCloud Image Produced by Adam Tubman

Ever since Rosalind Franklin’s painstaking work to deduce the structure of DNA and the concurrent work by Francis Crick and James Watson, who modeled the double-helical structure of DNA, DNA has been considered the basic unit of heredity and life, with the “Central Dogma” (DNA to RNA to protein) at its core. These discoveries, made in the mid-twentieth century, helped drive a transformational shift in biological experimentation: from protein isolation and characterization, to cloning protein-encoding genes, to characterizing how genes are expressed temporally, spatially, and contextually.

Rosalind Franklin, whose crystallographic data led to the determination of the structure of DNA, shown on a 1953 Time cover as Time Person of the Year.

Dr Francis Crick and James Watson in front of their model structure of DNA


Up to this point (the 1970s to the mid-1980s), genetic information was considered rather static. The goal was still to understand and characterize protein structure and function, whereas knowledge of the underlying genetic information mattered mainly for efforts such as linkage analysis of genetic defects and as a tool for the rapidly developing field of molecular biology. But the development of molecular biology tools, including DNA cloning, sequencing, and synthesis, gave scientists the idea that a complete record of the human genome might be possible and worth the effort.

How the Human Genome Project Expanded Our View of Genes, Genetic Material, and Biological Processes


From the Human Genome Project Information Archive

Source:  https://web.ornl.gov/sci/techresources/Human_Genome/project/hgp.shtml

History of the Human Genome Project

The Human Genome Project (HGP) refers to the international 13-year effort, formally begun in October 1990 and completed in 2003, to discover all the estimated 20,000-25,000 human genes and make them accessible for further biological study. Another project goal was to determine the complete sequence of the 3 billion DNA subunits (bases in the human genome). As part of the HGP, parallel studies were carried out on selected model organisms such as the bacterium E. coli and the mouse to help develop the technology and interpret human gene function. The DOE Human Genome Program and the NIH National Human Genome Research Institute (NHGRI) together sponsored the U.S. Human Genome Project.


Please see the following for goals, timelines, and funding for this project


History of the Project

Interestingly, several pieces of federal legislation are credited with enabling the funding of such a massive project, including:

Project Enabling Legislation

  • The Atomic Energy Act of 1946 (P.L. 79-585) provided the initial charter for a comprehensive program of research and development related to the utilization of fissionable and radioactive materials for medical, biological, and health purposes.
  • The Atomic Energy Act of 1954 (P.L. 83-706) further authorized the AEC “to conduct research on the biologic effects of ionizing radiation.”
  • The Energy Reorganization Act of 1974 (P.L. 93-438) provided that responsibilities of the Energy Research and Development Administration (ERDA) shall include “engaging in and supporting environmental, biomedical, physical, and safety research related to the development of energy resources and utilization technologies.”
  • The Federal Non-nuclear Energy Research and Development Act of 1974 (P.L. 93-577) authorized ERDA to conduct a comprehensive non-nuclear energy research, development, and demonstration program to include the environmental and social consequences of the various technologies.
  • The DOE Organization Act of 1977 (P.L. 95-91) mandated the Department “to assure incorporation of national environmental protection goals in the formulation and implementation of energy programs; and to advance the goal of restoring, protecting, and enhancing environmental quality, and assuring public health and safety,” and to conduct “a comprehensive program of research and development on the environmental effects of energy technology and program.”

It should also be emphasized that the project was funded not only through the NIH but also through the Department of Energy (DOE).

Project Sponsors

For a great read on Dr. Craig Venter, including interviews with the scientist, see Dr. Larry Bernstein’s excellent post The Human Genome Project.


By 2003, the project had yielded a wealth of information about the structure of DNA, genes, exons, and introns, giving more insight into the diversity of genetic material, the underlying protein-coding genes, and many gene-expression regulatory elements. However, much of the material dispersed between genes, then called “junk DNA,” remained uninvestigated, and as of 2003 little was known about its function. In addition, there were two other problems:

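  • The reference DNA came from only a few individuals: the public reference was assembled from clones of several anonymous donors, yielding a mosaic of haplotypes, while the competing Celera assembly, led by Craig Venter, was derived largely from Venter’s own DNA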
  • Multiple gaps in the DNA sequence existed, and needed to be filled in

It is important to note that transcriptomic and proteomic studies have revealed a tremendous diversity of proteins. Although only about 20,000 to 25,000 protein-coding genes exist, the human proteome contains about 600,000 proteoforms (owing to alternative splicing, posttranslational modifications, and so on).
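The gap between gene count and proteoform count can be made concrete with a quick back-of-the-envelope calculation using the figures above:

$$
\frac{600{,}000 \text{ proteoforms}}{25{,}000 \text{ genes}} = 24
\qquad\text{to}\qquad
\frac{600{,}000 \text{ proteoforms}}{20{,}000 \text{ genes}} = 30 \text{ proteoforms per gene}
$$

This is only an average; individual genes vary widely in how many proteoforms they give rise to.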

This expansion of the proteome, through alternative splicing into isoforms and gene duplication into paralogs, has been shown to have major effects on, for example, cellular signaling pathways (1).

However, it was recently reported that a complete, gapless sequence of a human genome has been assembled and verified for every chromosome except Y. This was the focus of a recent issue of the journal Science.

Source: https://www.science.org/doi/10.1126/science.abj6987

Abstract

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.


The current human reference genome was released by the Genome Reference Consortium (GRC) in 2013 and most recently patched in 2019 (GRCh38.p13) (1). This reference traces its origin to the publicly funded Human Genome Project (2) and has been continually improved over the past two decades. Unlike the competing Celera effort (3) and most modern sequencing projects based on “shotgun” sequence assembly (4), the GRC assembly was constructed from sequenced bacterial artificial chromosomes (BACs) that were ordered and oriented along the human genome by means of radiation hybrid, genetic linkage, and fingerprint maps. However, limitations of BAC cloning led to an underrepresentation of repetitive sequences, and the opportunistic assembly of BACs derived from multiple individuals resulted in a mosaic of haplotypes. As a result, several GRC assembly gaps are unsolvable because of incompatible structural polymorphisms on their flanks, and many other repetitive and polymorphic regions were left unfinished or incorrectly assembled (5).


Fig. 1. Summary of the complete T2T-CHM13 human genome assembly.
(A) Ideogram of T2T-CHM13v1.1 assembly features. For each chromosome (chr), the following information is provided from bottom to top: gaps and issues in GRCh38 fixed by CHM13 overlaid with the density of genes exclusive to CHM13 in red; segmental duplications (SDs) (42) and centromeric satellites (CenSat) (30); and CHM13 ancestry predictions (EUR, European; SAS, South Asian; EAS, East Asian; AMR, ad-mixed American). Bottom scale is measured in Mbp. (B and C) Additional (nonsyntenic) bases in the CHM13 assembly relative to GRCh38 per chromosome, with the acrocentrics highlighted in black (B) and by sequence type (C). (Note that the CenSat and SD annotations overlap.) RepMask, RepeatMasker. (D) Total nongap bases in UCSC reference genome releases dating back to September 2000 (hg4) and ending with T2T-CHM13 in 2021. Mt/Y/Ns, mitochondria, chrY, and gaps.

Note in Figure 1D that T2T-CHM13 adds more nongap sequence than any reference release of the preceding two decades.

Also very important is the ability to determine all the paralogs, isoforms, areas of potential epigenetic regulation, gene duplications, and transposable elements that exist within the human genome.

Analyses and resources

A number of companion studies were carried out to characterize the complete sequence of a human genome, including comprehensive analyses of centromeric satellites (30), segmental duplications (42), transcriptional (49) and epigenetic profiles (29), mobile elements (49), and variant calls (25). Up to 99% of the complete CHM13 genome can be confidently mapped with long-read sequencing, opening these regions of the genome to functional and variational analysis (23) (fig. S38 and table S14). We have produced a rich collection of annotations and omics datasets for CHM13—including RNA sequencing (RNA-seq) (30), Iso-seq (21), precision run-on sequencing (PRO-seq) (49), cleavage under targets and release using nuclease (CUT&RUN) (30), and ONT methylation (29) experiments—and have made these datasets available via a centralized University of California, Santa Cruz (UCSC), Assembly Hub genome browser (54).


To highlight the utility of these genetic and epigenetic resources mapped to a complete human genome, we provide the example of a segmentally duplicated region of the chromosome 4q subtelomere that is associated with facioscapulohumeral muscular dystrophy (FSHD) (55). This region includes FSHD region gene 1 (FRG1), FSHD region gene 2 (FRG2), and an intervening D4Z4 macrosatellite repeat containing the double homeobox 4 (DUX4) gene that has been implicated in the etiology of FSHD (56). Numerous duplications of this region throughout the genome have complicated past genetic analyses of FSHD.

The T2T-CHM13 assembly reveals 23 paralogs of FRG1 spread across all acrocentric chromosomes as well as chromosomes 9 and 20 (Fig. 5A). This gene appears to have undergone recent amplification in the great apes (57), and approximate locations of FRG1 paralogs were previously identified by FISH (58). However, only nine FRG1 paralogs are found in GRCh38, hampering sequence-based analysis.

Future of the human reference genome

The T2T-CHM13 assembly adds five full chromosome arms and more additional sequence than any genome reference release in the past 20 years (Fig. 1D). This 8% of the genome has not been overlooked because of a lack of importance but rather because of technological limitations. High-accuracy long-read sequencing has finally removed this technological barrier, enabling comprehensive studies of genomic variation across the entire human genome, which we expect to drive future discovery in human genomic health and disease. Such studies will necessarily require a complete and accurate human reference genome.

CHM13 lacks a Y chromosome, and homozygous Y-bearing CHMs are nonviable, so a different sample type will be required to complete this last remaining chromosome. However, given its haploid nature, it should be possible to assemble the Y chromosome from a male sample using the same methods described here and supplement the T2T-CHM13 reference assembly with a Y chromosome as needed.

Extending beyond the human reference genome, large-scale resequencing projects have revealed genomic variation across human populations. Our reanalyses of the 1KGP (25) and SGDP (42) datasets have already shown the advantages of T2T-CHM13, even for short-read analyses. However, these studies give only a glimpse of the extensive structural variation that lies within the most repetitive regions of the genome assembled here. Long-read resequencing studies are now needed to comprehensively survey polymorphic variation and reveal any phenotypic associations within these regions.

Although CHM13 represents a complete human haplotype, it does not capture the full diversity of human genetic variation. To address this bias, the Human Pangenome Reference Consortium (59) has joined with the T2T Consortium to build a collection of high-quality reference haplotypes from a diverse set of samples. Ideally, all genomes could be assembled at the quality achieved here, but automated T2T assembly of diploid genomes presents a difficult challenge that will require continued development. Until this goal is realized, and any human genome can be completely sequenced without error, the T2T-CHM13 assembly represents a more complete, representative, and accurate reference than GRCh38.

This paper was the focus of a TIME article and the basis for naming its lead authors to the TIME100 list of the most influential people of 2022.

From TIME

The Human Genome Is Finally Fully Sequenced

Source: https://time.com/6163452/human-genome-fully-sequenced/

The first human genome was mapped in 2001 as part of the Human Genome Project, but researchers knew it was neither complete nor completely accurate. Now, scientists have produced the most completely sequenced human genome to date, filling in gaps and correcting mistakes in the previous version.

The sequence is the most complete reference genome for any mammal so far. The findings from six new papers describing the genome, which were published in Science, should lead to a deeper understanding of human evolution and potentially reveal new targets for addressing a host of diseases.

A more precise human genome

“The Human Genome Project relied on DNA obtained through blood draws; that was the technology at the time,” says Adam Phillippy, head of genome informatics at the National Institutes of Health’s National Human Genome Research Institute (NHGRI) and senior author of one of the new papers. “The techniques at the time introduced errors and gaps that have persisted all of these years. It’s nice now to fill in those gaps and correct those mistakes.”

“We always knew there were parts missing, but I don’t think any of us appreciated how extensive they were, or how interesting,” says Michael Schatz, professor of computer science and biology at Johns Hopkins University and another senior author of the same paper.

The work is the result of the Telomere to Telomere consortium, which is supported by NHGRI and involves genetic and computational biology experts from dozens of institutes around the world. The group focused on filling in the 8% of the human genome that remained a genetic black hole from the first draft sequence. Since then, geneticists have been trying to add those missing portions bit by bit. The latest group of studies identifies about an entire chromosome’s worth of new sequences, representing 200 million more base pairs (the letters making up the genome) and 1,956 new genes.

NOTE: In 2001, many scientists postulated there were as many as 100,000 protein-coding human genes; we now understand there are about 20,000 to 25,000. This count does not, however, take into account the additional diversity generated by alternative splicing, gene duplications, SNPs, and chromosomal rearrangements.

Scientists were also able to sequence the long stretches of DNA that contained repeated sequences, which genetic experts originally thought were similar to copying errors and dismissed as so-called “junk DNA”. These repeated sequences, however, may play roles in certain human diseases. “Just because a sequence is repetitive doesn’t mean it’s junk,” says Eichler. He points out that critical genes are embedded in these repeated regions—genes that contribute to machinery that creates proteins, genes that dictate how cells divide and split their DNA evenly into their two daughter cells, and human-specific genes that might distinguish the human species from our closest evolutionary relatives, the primates. In one of the papers, for example, researchers found that primates have different numbers of copies of these repeated regions than humans, and that they appear in different parts of the genome.

“These are some of the most important functions that are essential to live, and for making us human,” says Eichler. “Clearly, if you get rid of these genes, you don’t live. That’s not junk to me.”

Deciphering what these repeated sections mean, if anything, and how the sequences of previously unsequenced regions like the centromeres will translate to new therapies or a better understanding of human disease, is just starting, says Deanna Church, a vice president at Inscripta, a genome engineering company, who wrote a commentary accompanying the scientific articles. Having the full sequence of a human genome is different from decoding it; she notes that currently, of people with suspected genetic disorders whose genomes are sequenced, about half can be traced to specific changes in their DNA. That means much of what the human genome does still remains a mystery.

The investigators in the Telomere to Telomere Consortium were named to the TIME100 list of the most influential people of 2022.

Michael Schatz, Karen Miga, Evan Eichler, and Adam Phillippy

Illustration by Brian Lutz for Time (Source Photos: Will Kirk—Johns Hopkins University; Nick Gonzales—UC Santa Cruz; Patrick Kehoe; National Human Genome Research Institute)

BY JENNIFER DOUDNA

MAY 23, 2022 6:08 AM EDT

Ever since the draft of the human genome became available in 2001, there has been a nagging question about the genome’s “dark matter”—the parts of the map that were missed the first time through, and what they contained. Now, thanks to Adam Phillippy, Karen Miga, Evan Eichler, Michael Schatz, and the entire Telomere-to-Telomere Consortium (T2T) of scientists that they led, we can see the full map of the human genomic landscape—and there’s much to explore.

In the scientific community, there wasn’t a consensus that mapping these missing parts was necessary. Some in the field felt there was already plenty to do using the data in hand. In addition, overcoming the technical challenges to getting the missing information wasn’t possible until recently. But the more we learn about the genome, the more we understand that every piece of the puzzle is meaningful.

I admire the T2T group’s willingness to grapple with the technical demands of this project and their persistence in expanding the genome map into uncharted territory. The complete human genome sequence is an invaluable resource that may provide new insights into the origin of diseases and how we can treat them. It also offers the most complete look yet at the genetic script underlying the very nature of who we are as human beings.

Doudna is a biochemist and winner of the 2020 Nobel Prize in Chemistry

Source: https://time.com/collection/100-most-influential-people-2022/6177818/evan-eichler-karen-miga-adam-phillippy-michael-schatz/

Other articles on the Human Genome Project and Junk DNA in this Open Access Scientific Journal include:

International Award for Human Genome Project

Cracking the Genome – Inside the Race to Unlock Human DNA – quotes in newspapers

The Human Genome Project

Junk DNA and Breast Cancer

A Perspective on Personalized Medicine

Additional References

  1. P. Scalia, A. Giordano, C. Martini, S. J. Williams, Isoform- and Paralog-Switching in IR-Signaling: When Diabetes Opens the Gates to Cancer. Biomolecules 10, (Nov 30, 2020).

Defective viral RNA sensing gene OAS1 linked to severe COVID-19

Reporter: Stephen J. Williams, Ph.D.

Source: https://www.science.org/doi/10.1126/science.abm3921

Defective viral RNA sensing linked to severe COVID-19

JOHN SCHOGGINS • SCIENCE • 28 Oct 2021 • Vol 374, Issue 6567 • pp. 535-536 • DOI: 10.1126/science.abm3921

Why do some people with COVID-19 get sicker than others? Maybe exposure to a particularly high dose of the causative virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), accounts for the difference. Perhaps deficiencies in diet, exercise, or sleep contribute to worse illness. Although many factors govern how sick people become, a key driver of the severity of COVID-19 appears to be genetic, which is common for other human viruses and infectious agents (1). On page 579 of this issue, Wickenhagen et al. (2) show that susceptibility to severe COVID-19 is associated with a single-nucleotide polymorphism (SNP) in the human gene 2′-5′-oligoadenylate synthetase 1 (OAS1).

The authors reasoned that SARS-CoV-2 should be inhibited by interferon-mediated antiviral responses, which are among the first cellular defense mechanisms produced in response to a viral infection. Interferons are a group of cytokines that induce the transcription of a large cadre of genes, many of which encode proteins with the potential to directly inhibit the invading virus. Wickenhagen et al. interrogated many hundreds of these putative antiviral proteins for their ability to suppress SARS-CoV-2 in cultured cells and found that OAS1 was particularly potent against SARS-CoV-2.

OAS1 is an enzyme that is activated in the presence of double-stranded RNA, which is scattered along an otherwise single-stranded SARS-CoV-2 genome because of an assortment of RNA hairpins and other secondary structures. Once activated, OAS1 catalyzes the polymerization of adenosine triphosphate (ATP) into a second messenger, 2′-5′-oligoadenylate. This then triggers the conversion of ribonuclease L (RNaseL) into its active form so that it can cleave viral RNA, effectively blunting viral replication (3). Wickenhagen et al. found that OAS1 is expressed in respiratory tissues of healthy donors and COVID-19 patients and that it interacts with a region of the SARS-CoV-2 genome that contains double-stranded RNA secondary structures (see the figure).

OAS1 exists predominantly as two isoforms in humans—a longer isoform (p46) and a shorter version (p42). Genetic variation dictates which isoform will be expressed. In humans, p46 is expressed in people who have a SNP that causes alternative splicing of the OAS1 messenger RNA (mRNA). This results in the utilization of a terminal exon that is not used to translate p42. Thus, the carboxyl terminus of the p46 OAS1 protein contains a distinct four–amino acid motif that forms a prenylation site. Prenylation is a posttranslational modification that targets proteins to membranes. In cell culture experiments, Wickenhagen et al. showed that only OAS1 p46, but not p42, could inhibit SARS-CoV-2. However, when the prenylation site of p46 was engineered into p42, this chimeric p42 protein was able to inhibit SARS-CoV-2, which strongly implicates a role for OAS1 specifically at membranes.

Why are membranes important? SARS-CoV-2, like all coronaviruses, co-opts cellular membranes at the endoplasmic reticulum to form double-membrane vesicles, in which the virus replicates its genome. Thus, membrane-bound OAS1 p46 may be specifically activated by RNA viruses that form membrane-bound vesicles for replication. Indeed, the unrelated cardiovirus A, which also forms vesicular membranous structures, was inhibited by OAS1. Conversely, other respiratory RNA viruses, such as human parainfluenza virus type 3 and human respiratory syncytial virus, which do not use membrane-tethered vesicles for replication, were not inhibited by p46.

Wickenhagen et al. examined a cohort of 499 COVID-19 patients hospitalized in the UK. Whereas all patients expressed OAS1, 42.5% of them did not express the antiviral p46 isoform. These patients were statistically more likely to have severe COVID-19 (be admitted to the intensive care unit). This suggests that OAS1 is an important antiviral factor in the control of SARS-CoV-2 infection and that its inability to activate RNaseL results in prolonged infections and severe disease, although other factors likely contribute. The authors also examined animals known to harbor different coronaviruses. They found evidence for prenylated OAS1 proteins in mice, cows, and camels. Notably, horseshoe bats, which are considered a possible reservoir for SARS-related coronaviruses (4), lack a prenylation motif in their OAS1 because of genomic changes that eliminated the critical four–amino acid motif. A horseshoe bat (Rhinolophus ferrumequinum) OAS1 was unable to inhibit SARS-CoV-2 infection in cell culture. Conversely, the black flying fox (Pteropus alecto)—a pteropid bat that is a reservoir for the Nipah and Hendra viruses, which can also infect humans—possesses a prenylated OAS1 that can inhibit SARS-CoV-2. These findings indicate that horseshoe bats may be genetically and evolutionarily primed to be optimal reservoir hosts for certain coronaviruses, like SARS-CoV-2.

Other studies have now shown that the p46 OAS1 variant, which resides in a genomic locus inherited from Neanderthals (5–7), correlates with protection from COVID-19 severity in various populations (8, 9). These findings mirror previous studies indicating that outcomes with West Nile virus (10) and hepatitis C virus (11) infection, both of which also use membrane vesicles for replication, are also associated with genetic variation at the human OAS1 locus. Another elegant functional study complements the findings of Wickenhagen et al. by also demonstrating that prenylated OAS1 inhibits multiple viruses, including SARS-CoV-2, and is associated with protection from severe COVID-19 in patients (12).

There is a growing body of evidence that provides critical understanding of how human genetic variation shapes the outcome of infectious diseases like COVID-19. In addition to OAS1, genetic variation in another viral RNA sensor, Toll-like receptor 7 (TLR7), is associated with severe COVID-19 (13–15). The effects appear to be exclusive to males because TLR7 is on the X chromosome; inherited deleterious mutations in TLR7 result in immune cells that fail to produce normal amounts of interferon, which correlates with more severe COVID-19. Our knowledge of the host cellular factors that control SARS-CoV-2 is rapidly increasing. These findings will undoubtedly open new avenues into SARS-CoV-2 antiviral immunity and may also be beneficial for the development of strategies to treat or prevent severe COVID-19.
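The cohort figures reported for Wickenhagen et al. (499 hospitalized patients, 42.5% lacking the p46 isoform) can be turned into approximate patient counts. This is a minimal arithmetic sketch; because 42.5% is a rounded percentage, the counts are approximate, and the helper name `split_cohort` is hypothetical.

```python
def split_cohort(total: int, fraction_without: float) -> tuple[int, int]:
    """Split a cohort into (patients lacking p46, patients expressing p46).

    The fraction is a rounded published percentage, so counts are approximate.
    """
    without_p46 = round(total * fraction_without)
    return without_p46, total - without_p46


without_p46, with_p46 = split_cohort(499, 0.425)
print(f"~{without_p46} patients lacked p46; ~{with_p46} expressed it")
# prints "~212 patients lacked p46; ~287 expressed it"
```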

References and Notes

  1. J. L. Casanova, Proc. Natl. Acad. Sci. U.S.A. 112, E7118 (2015).
  2. A. Wickenhagen et al., Science 374, eabj3624 (2021).
  3. H. Kristiansen, H. H. Gad, S. Eskildsen-Larsen, P. Despres, R. Hartmann, J. Interferon Cytokine Res. 31, 41 (2011).
  4. S. Lytras, W. Xia, J. Hughes, X. Jiang, D. L. Robertson, Science 373, 968 (2021).
  5. S. Zhou et al., Nat. Med. 27, 659 (2021).
  6. H. Zeberg, S. Pääbo, Proc. Natl. Acad. Sci. U.S.A. 118, e2026309118 (2021).
  7. F. L. Mendez, J. C. Watkins, M. F. Hammer, Mol. Biol. Evol. 30, 798 (2013).
  8. A. R. Banday et al., medRxiv (2021).
  9. E. Pairo-Castineira et al., Nature 591, 92 (2021).
  10. J. K. Lim et al., PLOS Pathog. 5, e1000321 (2009).
  11. M. K. El Awady et al., J. Gastroenterol. Hepatol. 26, 843 (2011).
  12. F. W. Soveg et al., eLife 10, e71047 (2021).
  13. T. Asano et al., Sci. Immunol. 6, eabl4348 (2021).
  14. C. Fallerini et al., eLife 10, e67569 (2021).
  15. C. I. van der Made et al., JAMA 324, 663 (2020).

For more on COVID-19, please see our Coronavirus Portal.

Science Policy Forum: Should we trust healthcare explanations from AI predictive systems?

Some in industry voice their concerns

Curator: Stephen J. Williams, PhD

Post on AI healthcare and explainable AI

In a Policy Forum article in Science, “Beware explanations from AI in health care,” Boris Babic, Sara Gerke, Theodoros Evgeniou, and Glenn Cohen discuss the caveats of relying on explainable versus interpretable artificial intelligence (AI) and machine learning (ML) algorithms to make complex health decisions. The FDA has already approved some AI/ML algorithms for analysis of medical images for diagnostic purposes. These algorithms, as well as issues arising from multi-center trials, have been discussed in prior posts on this site. The authors of this perspective article argue that the choice of algorithm type (explainable versus interpretable) may have far-reaching consequences in health care.

Summary

Artificial intelligence and machine learning (AI/ML) algorithms are increasingly developed in health care for diagnosis and treatment of a variety of medical conditions (1). However, despite the technical prowess of such systems, their adoption has been challenging, and whether and how much they will actually improve health care remains to be seen. A central reason for this is that the effectiveness of AI/ML-based medical devices depends largely on the behavioral characteristics of its users, who, for example, are often vulnerable to well-documented biases or algorithmic aversion (2). Many stakeholders increasingly identify the so-called black-box nature of predictive algorithms as the core source of users’ skepticism, lack of trust, and slow uptake (3, 4). As a result, lawmakers have been moving in the direction of requiring the availability of explanations for black-box algorithmic decisions (5). Indeed, a near-consensus is emerging in favor of explainable AI/ML among academics, governments, and civil society groups. Many are drawn to this approach to harness the accuracy benefits of noninterpretable AI/ML such as deep learning or neural nets while also supporting transparency, trust, and adoption. We argue that this consensus, at least as applied to health care, both overstates the benefits and undercounts the drawbacks of requiring black-box algorithms to be explainable.

Source: https://science.sciencemag.org/content/373/6552/284?_ga=2.166262518.995809660.1627762475-1953442883.1627762475

Types of AI/ML Algorithms: Explainable and Interpretable algorithms

  1.  Interpretable AI: A typical AI/ML task requires constructing algorithms that take vector inputs and generate an output related to an outcome (like diagnosing a cardiac event from an image). Generally, the algorithm has to be trained on past data with known outcomes. When an algorithm is called interpretable, this means that it uses a transparent or “white box” function that is easily understandable. An example is a linear function that relates the inputs to the outcome through a few simple parameters. Although such algorithms may not be as accurate as the more complex models behind explainable AI/ML, they are open, transparent, and easily understood by their operators.
  2. Explainable AI/ML: This approach starts from a “black box” model that depends on many complex parameters, takes that model’s predictions, and then uses a second, interpretable algorithm to approximate the outputs of the first model. The second algorithm is trained not on the original data but on the predictions of the black-box model. Because the underlying black box is more complex, this approach can be more accurate, but the black box itself remains difficult to understand. Many medical devices that use an AI/ML algorithm use this type; deep learning and neural networks are examples of the underlying black-box models.
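To make the distinction concrete, here is a minimal Python sketch of the explainable route: a stand-in “black box” (a hypothetical nonlinear risk function, not any real medical model) is queried for predictions, and an interpretable linear surrogate is then fit to those predictions rather than to observed outcomes. An interpretable model in the first sense would instead apply the same linear fit directly to observed outcomes. The function names, inputs, and coefficients are illustrative assumptions.

```python
import math


def black_box(years_smoking: float) -> float:
    """Hypothetical opaque model: a nonlinear risk score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(0.15 * years_smoking - 3.0)))


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(ys: list[float], fitted: list[float]) -> float:
    """Share of the variance in ys that the fitted values reproduce."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


xs = [float(x) for x in range(0, 41, 5)]   # hypothetical inputs: years spent smoking
bb_preds = [black_box(x) for x in xs]      # step 1: query the black box
intercept, slope = fit_line(xs, bb_preds)  # step 2: fit the surrogate to those predictions
surrogate = [intercept + slope * x for x in xs]

print(f"surrogate: risk ~ {intercept:.3f} + {slope:.4f} * years_smoking")
print(f"fidelity to the black box (R^2): {r_squared(bb_preds, surrogate):.3f}")
```

The R² value here measures fidelity, that is, how closely the surrogate tracks the black box; it says nothing about how accurate either model is for real patients, so a faithful surrogate can still inherit the black box’s errors.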

The purpose of both methodologies is to deal with the problem of opacity: AI predictions that come from a black box undermine trust in the AI.

For a deeper understanding of these two types of algorithms see here:

https://www.kdnuggets.com/2018/12/machine-learning-explainability-interpretability-ai.html

or https://www.bmc.com/blogs/machine-learning-interpretability-vs-explainability/

(a longer read but great explanation)

From the above blog post by Jonathan Johnson:

  • How interpretability is different from explainability
  • Why a model might need to be interpretable and/or explainable
  • Who is working to solve the black box problem—and how

What is interpretability?

Does Chipotle make your stomach hurt? Does loud noise accelerate hearing loss? Are women less aggressive than men? If a machine learning model can create a definition around these relationships, it is interpretable.

All models must start with a hypothesis. Human curiosity propels a being to intuit that one thing relates to another. “Hmm…multiple black people shot by policemen…seemingly out of proportion to other races…something might be systemic?” Explore.

People create internal models to interpret their surroundings. In the field of machine learning, these models can be tested and verified as either accurate or inaccurate representations of the world.

Interpretability means that the cause and effect can be determined.

What is explainability?

ML models are often called black-box models because they allow a pre-set number of empty parameters, or nodes, to be assigned values by the machine learning algorithm. Specifically, the back-propagation step is responsible for updating the weights based on its error function.
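As a rough illustration of what updating the weights “based on its error function” means, the sketch below trains a single sigmoid neuron on one hypothetical example using squared error. For a single neuron, back-propagation reduces to the chain rule; deeper networks repeat the same calculation layer by layer. The data, learning rate, and function names are assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, bias, x, target, lr=0.5):
    """One gradient-descent step for a single sigmoid neuron.

    Error: E = 0.5 * (y - target)**2 with y = sigmoid(w . x + b).
    Chain rule: dE/dw_i = (y - target) * y * (1 - y) * x_i.
    Returns the updated weights, updated bias, and the error before the update.
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    y = sigmoid(z)
    delta = (y - target) * y * (1.0 - y)  # dE/dz
    new_weights = [w - lr * delta * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * delta
    return new_weights, new_bias, 0.5 * (y - target) ** 2


weights, bias = [0.0, 0.0], 0.0
errors = []
for _ in range(50):
    weights, bias, err = train_step(weights, bias, x=[1.0, 2.0], target=1.0)
    errors.append(err)

print(f"error went from {errors[0]:.4f} to {errors[-1]:.4f}")
```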

To predict when a person might die—the fun gamble one might play when calculating a life insurance premium, and the strange bet a person makes against their own life when purchasing a life insurance package—a model will take in its inputs, and output a percent chance the given person has at living to age 80.
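One common way for a model to output a “percent chance” is to pass a weighted sum of its inputs through the logistic (sigmoid) function. The sketch below does this for three of the example inputs in this post; the coefficients and intercept are made-up assumptions, not values from any real actuarial or clinical model, and career category is left out because it would need a categorical encoding.

```python
import math

# Hypothetical coefficients; a real model would learn these from historical data.
INTERCEPT = 2.0
COEFFICIENTS = {"age": 0.01, "bmi": -0.05, "years_smoking": -0.06}


def chance_of_reaching_80(age: float, bmi: float, years_smoking: float) -> float:
    """Return a percent chance (0 to 100) from a logistic model."""
    z = (INTERCEPT
         + COEFFICIENTS["age"] * age
         + COEFFICIENTS["bmi"] * bmi
         + COEFFICIENTS["years_smoking"] * years_smoking)
    return 100.0 / (1.0 + math.exp(-z))


print(f"{chance_of_reaching_80(age=45, bmi=24, years_smoking=0):.1f}%")
print(f"{chance_of_reaching_80(age=45, bmi=24, years_smoking=20):.1f}%")
```

In a single-layer form like this the coefficients can be read directly; in a deep network the same inputs first pass through hidden layers, which is where the black box comes from.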

Below is an image of a neural network. The inputs are yellow; the outputs are orange. Like a rubric to an overall grade, explainability shows how much each of the parameters (all the blue nodes) contributes to the final decision.

In this neural network, the hidden layers (the two columns of blue dots) would be the black box.

For example, we have these data inputs:

  • Age
  • BMI score
  • Number of years spent smoking
  • Career category

If this model had high explainability, we’d be able to say, for instance:

  • The career category is about 40% important
  • The number of years spent smoking weighs in at 35% important
  • The age is 15% important
  • The BMI score is 10% important
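Importance figures like these are usually relative scores rescaled so that they sum to 100%. The sketch below shows only that final rescaling step. The raw scores are hypothetical and chosen to reproduce the percentages above; how they would be obtained (for example, by permutation importance or SHAP values) is outside its scope.

```python
def to_percent_importance(raw_scores: dict[str, float]) -> dict[str, float]:
    """Rescale importance magnitudes so they sum to 100."""
    total = sum(abs(score) for score in raw_scores.values())
    if total == 0:
        raise ValueError("all importance scores are zero")
    return {name: 100.0 * abs(score) / total for name, score in raw_scores.items()}


# Hypothetical raw importance scores for the four example inputs.
raw = {"career_category": 0.8, "years_smoking": 0.7, "age": 0.3, "bmi": 0.2}
for name, pct in to_percent_importance(raw).items():
    print(f"{name}: {pct:.0f}%")
```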

Explainability: important, not always necessary

Explainability becomes significant in the field of machine learning because, often, it is not apparent. Explainability is often unnecessary. A machine learning engineer can build a model without ever having considered the model’s explainability. It is an extra step in the building process—like wearing a seat belt while driving a car. It is unnecessary for the car to perform, but offers insurance when things crash.

The benefit a deep neural net offers to engineers is it creates a black box of parameters, like fake additional data points, that allow a model to base its decisions against. These fake data points go unknown to the engineer. The black box, or hidden layers, allow a model to make associations among the given data points to predict better results. For example, if we are deciding how long someone might have to live, and we use career data as an input, it is possible the model sorts the careers into high- and low-risk career options all on its own.

Perhaps we inspect a node and see it relates oil rig workers, underwater welders, and boat cooks to each other. It is possible the neural net makes connections between the lifespan of these individuals and puts a placeholder in the deep net to associate these. If we were to examine the individual nodes in the black box, we could note this clustering interprets water careers to be a high-risk job.

In the previous chart, each one of the lines connecting from the yellow dot to the blue dot can represent a signal, weighing the importance of that node in determining the overall score of the output.

  • If that signal is high, that node is significant to the model’s overall performance.
  • If that signal is low, the node is insignificant.

With this understanding, we can define explainability as:

Knowledge of what one node represents and how important it is to the model’s performance.
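One classic way to turn connection weights into per-input importance for a network with a single hidden layer is Garson’s algorithm: multiply each input-to-hidden weight by the matching hidden-to-output weight, work out each input’s share of every hidden node, and sum those shares. The sketch below is a minimal implementation under that assumption; the weight values are hypothetical, and because the method uses absolute values it ignores whether an input pushes the output up or down.

```python
def garson_importance(w_in_hidden: list[list[float]], w_hidden_out: list[float]) -> list[float]:
    """Relative importance of each input in a one-hidden-layer, single-output network.

    w_in_hidden[i][j]: weight from input i to hidden node j.
    w_hidden_out[j]:   weight from hidden node j to the output.
    """
    n_inputs, n_hidden = len(w_in_hidden), len(w_hidden_out)
    # Magnitude of the signal input i sends to the output through hidden node j.
    contrib = [[abs(w_in_hidden[i][j] * w_hidden_out[j]) for j in range(n_hidden)]
               for i in range(n_inputs)]
    shares = [0.0] * n_inputs
    for j in range(n_hidden):
        column_total = sum(contrib[i][j] for i in range(n_inputs))
        if column_total == 0:
            continue  # this hidden node carries no signal to the output
        for i in range(n_inputs):
            shares[i] += contrib[i][j] / column_total  # input i's share of node j
    total = sum(shares)
    if total == 0:
        raise ValueError("network passes no signal from inputs to output")
    return [s / total for s in shares]


inputs = ["age", "bmi", "years_smoking", "career_category"]
weights_in = [[0.2, -0.1], [0.1, 0.1], [0.9, 0.4], [0.5, -0.8]]  # hypothetical
weights_out = [1.2, -0.7]                                         # hypothetical
for name, share in zip(inputs, garson_importance(weights_in, weights_out)):
    print(f"{name}: {share:.0%}")
```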

So how does choice of these two different algorithms make a difference with respect to health care and medical decision making?

The authors argue: 

“Regulators like the FDA should focus on those aspects of the AI/ML system that directly bear on its safety and effectiveness – in particular, how does it perform in the hands of its intended users?”

Their suggestions include:

  • More extensive, more involved clinical trials
  • Giving individuals added flexibility when interacting with a model, for example by inputting their own test data
  • More interaction between users and model developers
  • Determining which situations call for interpretable AI versus explainable AI (for instance, predicting which patients will require dialysis after kidney damage)

Other articles on AI/ML in medicine and healthcare on this Open Access Journal include

Applying AI to Improve Interpretation of Medical Imaging

Real Time Coverage @BIOConvention #BIO2019: Machine Learning and Artificial Intelligence #AI: Realizing Precision Medicine One Patient at a Time

LIVE Day Three – World Medical Innovation Forum ARTIFICIAL INTELLIGENCE, Boston, MA USA, Monday, April 10, 2019

Cardiac MRI Imaging Breakthrough: The First AI-assisted Cardiac MRI Scan Solution, HeartVista Receives FDA 510(k) Clearance for One Click™ Cardiac MRI Package

From AAAS Science News on COVID19: New CRISPR based diagnostic may shorten testing time to 5 minutes

Reporter: Stephen J. Williams, Ph.D.

A new CRISPR-based diagnostic could shorten wait times for coronavirus tests.

New test detects coronavirus in just 5 minutes

By Robert F. Service, Oct. 8, 2020, 3:45 PM

Science’s COVID-19 reporting is supported by the Pulitzer Center and the Heising-Simons Foundation.

Researchers have used CRISPR gene-editing technology to come up with a test that detects the pandemic coronavirus in just 5 minutes. The diagnostic doesn’t require expensive lab equipment to run and could potentially be deployed at doctor’s offices, schools, and office buildings.

“It looks like they have a really rock-solid test,” says Max Wilson, a molecular biologist at the University of California (UC), Santa Barbara. “It’s really quite elegant.”

CRISPR diagnostics are just one way researchers are trying to speed coronavirus testing. The new test is the fastest CRISPR-based diagnostic yet. In May, for example, two teams reported creating CRISPR-based coronavirus tests that could detect the virus in about an hour, much faster than the 24 hours needed for conventional coronavirus diagnostic tests.

CRISPR tests work by identifying a sequence of RNA—about 20 RNA bases long—that is unique to SARS-CoV-2. They do so by creating a “guide” RNA that is complementary to the target RNA sequence and, thus, will bind to it in solution. When the guide binds to its target, the CRISPR tool’s Cas13 “scissors” enzyme turns on and cuts apart any nearby single-stranded RNA. These cuts release a separately introduced fluorescent particle in the test solution. When the sample is then hit with a burst of laser light, the released fluorescent particles light up, signaling the presence of the virus.

These initial CRISPR tests, however, required researchers to first amplify any potential viral RNA before running it through the diagnostic to increase their odds of spotting a signal. That added complexity, cost, and time, and put a strain on scarce chemical reagents.

Now, researchers led by Jennifer Doudna, who won a share of this year’s Nobel Prize in Chemistry yesterday for her co-discovery of CRISPR, report creating a novel CRISPR diagnostic that doesn’t amplify coronavirus RNA. Instead, Doudna and her colleagues spent months testing hundreds of guide RNAs to find multiple guides that work in tandem to increase the sensitivity of the test.
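The base-pairing rule behind a guide RNA can be sketched in a few lines: a guide that binds a target RNA carries the reverse complement of that target (A pairs with U, G with C, read in the opposite direction). The sequence below is a made-up 20-base string, not a real SARS-CoV-2 region, and the sketch ignores the rest of a real Cas13 guide (such as its direct-repeat handle) as well as the experimental screening of hundreds of guides that the team describes.

```python
RNA_PAIRS = str.maketrans("AUGC", "UACG")


def guide_spacer_for(target_rna: str) -> str:
    """Return the RNA sequence that base-pairs, antiparallel, with target_rna."""
    seq = target_rna.upper()
    if not seq or set(seq) - set("AUGC"):
        raise ValueError("expected a non-empty RNA sequence of A, U, G, C")
    return seq.translate(RNA_PAIRS)[::-1]


def pairs_with(guide: str, target: str) -> bool:
    """True if every base of guide pairs with the target read in reverse."""
    return len(guide) == len(target) and guide_spacer_for(target) == guide.upper()


target = "AUGGCUACGUUAGCCAUGGA"  # hypothetical 20-nt target region
spacer = guide_spacer_for(target)
print(spacer, pairs_with(spacer, target))
```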

In a new preprint, the researchers report that with a single guide RNA, they could detect as few as 100,000 viruses per microliter of solution. And if they add a second guide RNA, they can detect as few as 100 viruses per microliter.

That’s still not as good as the conventional coronavirus diagnostic setup, which uses expensive lab-based machines to track the virus down to one virus per microliter, says Melanie Ott, a virologist at UC San Francisco who helped lead the project with Doudna. However, she says, the new setup correctly identified all five positive clinical samples in a batch, in just 5 minutes per test, whereas the standard test can take 1 day or more to return results.
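The detection limits quoted in the article can be compared directly. A minimal sketch of that arithmetic, using only the figures reported above (viruses per microliter); the helper name `fold_difference` is illustrative.

```python
import math

# Detection limits quoted in the article, in viruses per microliter.
LIMIT_ONE_GUIDE = 100_000
LIMIT_TWO_GUIDES = 100
LIMIT_CONVENTIONAL = 1


def fold_difference(worse_limit: float, better_limit: float) -> float:
    """How many times more virus the less sensitive assay needs to give a signal."""
    return worse_limit / better_limit


gain = fold_difference(LIMIT_ONE_GUIDE, LIMIT_TWO_GUIDES)
gap = fold_difference(LIMIT_TWO_GUIDES, LIMIT_CONVENTIONAL)
print(f"second guide: {gain:.0f}-fold ({math.log10(gain):.0f} orders of magnitude)")
print(f"remaining gap to the conventional test: {gap:.0f}-fold")
```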

The new test has another key advantage, Wilson says: quantifying a sample’s amount of virus. When standard coronavirus tests amplify the virus’ genetic material in order to detect it, this changes the amount of genetic material present—and thus wipes out any chance of precisely quantifying just how much virus is in the sample.

By contrast, Ott’s and Doudna’s team found that the strength of the fluorescent signal was proportional to the amount of virus in their sample. That revealed not just whether a sample was positive, but also how much virus a patient had. That information can help doctors tailor treatment decisions to each patient’s condition, Wilson says.
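Because the signal scaled with the amount of virus, a sample’s viral load could in principle be read off a calibration line. The sketch below assumes a strictly proportional relationship with no background signal and uses invented calibration standards; it illustrates the idea rather than the team’s actual analysis, which would also need to account for background fluorescence and the assay’s linear range.

```python
def fit_proportional(concentrations: list[float], signals: list[float]) -> float:
    """Least-squares slope k for signal = k * concentration (a line through the origin)."""
    numerator = sum(c * s for c, s in zip(concentrations, signals))
    denominator = sum(c * c for c in concentrations)
    return numerator / denominator


def estimate_concentration(signal: float, k: float) -> float:
    """Invert the calibration line to estimate viruses per microliter."""
    return signal / k


# Hypothetical calibration standards (viruses per microliter) and their
# signals (arbitrary fluorescence units).
standards = [100.0, 1_000.0, 10_000.0, 100_000.0]
readings = [2.1, 19.8, 201.0, 1_995.0]
k = fit_proportional(standards, readings)
print(f"estimated load: {estimate_concentration(500.0, k):,.0f} viruses per microliter")
```

Ordinary least squares through the origin weights the highest standard most heavily; a real calibration spanning several orders of magnitude might use a weighted or log-scale fit instead.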

Doudna and Ott say they and their colleagues are now working to validate their test setup and are looking into how to commercialize it.

doi:10.1126/science.abf1752

Robert F. Service

Bob is a news reporter for Science in Portland, Oregon, covering chemistry, materials science, and energy stories.

Source: https://www.sciencemag.org/news/2020/10/new-test-detects-coronavirus-just-5-minutes

Other articles on CRISPR and COVID19 can be found on our Coronavirus Portal and the following articles:

The Nobel Prize in Chemistry 2020: Emmanuelle Charpentier & Jennifer A. Doudna
The University of California has a proud legacy of winning Nobel Prizes, 68 faculty and staff have been awarded 69 Nobel Prizes.
Toaster Sized Machine Detects COVID-19
Study with important implications when considering widespread serological testing, Ab protection against re-infection with SARS-CoV-2 and the durability of vaccine protection

Read Full Post »

Miniproteins against the COVID-19 Spike protein may be therapeutic

Reporter: Stephen J. Williams, PhD

Computer-designed proteins may protect against coronavirus

At a Glance

  • Researchers designed “miniproteins” that bound tightly to the SARS-CoV-2 spike protein and prevented the virus from infecting human cells in the lab.
  • More research is underway to test the most promising of the antiviral proteins.

 

 

 

 

 

 

 

An artist’s conception of computer-designed miniproteins (white) binding coronavirus spikes. UW Institute for Protein Design

The surface of SARS-CoV-2, the virus that causes COVID-19, is covered with spike proteins. These proteins latch onto human cells, allowing the virus to enter and infect them. The spike binds to ACE2 receptors on the cell surface. It then undergoes a structural change that allows it to fuse with the cell. Once inside, the virus can copy itself and produce more viruses.

Blocking entry of SARS-CoV-2 into human cells can prevent infection. Researchers are testing monoclonal antibody therapies that bind to the spike protein and neutralize the virus. But these antibodies, which are derived from immune system molecules, are large and not ideal for delivery through the nose. They’re also often not stable for long periods and usually require refrigeration.

Researchers led by Dr. David Baker of the University of Washington set out to design synthetic “miniproteins” that bind tightly to the coronavirus spike protein. Their study was funded in part by NIH’s National Institute of General Medical Sciences (NIGMS) and National Institute of Allergy and Infectious Diseases (NIAID). Findings appeared in Science on September 9, 2020.

The team used two strategies to create the antiviral miniproteins. First, they incorporated a segment of the ACE2 receptor into the small proteins. The researchers used a protein design tool they developed called Rosetta blueprint builder. This technology allowed them to custom build proteins and predict how they would bind to the receptor.

The second approach was to design miniproteins from scratch, which allowed for a greater range of possibilities. Using a large library of miniproteins, they identified designs that could potentially bind within a key part of the coronavirus spike called the receptor binding domain (RBD). In total, the team produced more than 100,000 miniproteins.

Next, the researchers tested how well the miniproteins bound to the RBD. The most promising candidates then underwent further testing and tweaking to improve binding.

Using cryo-electron microscopy, the team was able to build detailed pictures of how two of the miniproteins bound to the spike protein. The binding closely matched the predictions of the computational models.

Finally, the researchers tested whether three of the miniproteins could neutralize SARS-CoV-2. All protected lab-grown human cells from infection. Candidates LCB1 and LCB3 showed potent neutralizing ability. These were among the designs created from the miniprotein library. Tests suggested that these miniproteins may be more potent than the most effective antibody treatments reported to date.

“Although extensive clinical testing is still needed, we believe the best of these computer-generated antivirals are quite promising,” says Dr. Longxing Cao, the study’s first author. “They appear to block SARS-CoV-2 infection at least as well as monoclonal antibodies but are much easier to produce and far more stable, potentially eliminating the need for refrigeration.”

Notably, this study demonstrates the potential of computational models to quickly respond to future viral threats. With further development, researchers may be able to generate neutralizing designs within weeks of obtaining the genome of a new virus.

—by Erin Bryant

Source: https://www.nih.gov/news-events/nih-research-matters/computer-designed-proteins-may-protect-against-coronavirus

Original article in Science

De novo design of picomolar SARS-CoV-2 miniprotein inhibitors

 

  1. Longxing Cao
  2. Inna Goreshnik
  3. Brian Coventry
  4. James Brett Case
  5. Lauren Miller
  6. Lisa Kozodoy
  7. Rita E. Chen
  8. Lauren Carter
  9. Alexandra C. Walls
  10. Young-Jun Park
  11. Eva-Maria Strauch
  12. Lance Stewart
  13. Michael S. Diamond
  14. David Veesler
  15. David Baker

Science 09 Sep 2020: eabd9909
DOI: 10.1126/science.abd9909

Abstract

Targeting the interaction between the SARS-CoV-2 Spike protein and the human ACE2 receptor is a promising therapeutic strategy. We designed inhibitors using two de novo design approaches. Computer-generated scaffolds were either built around an ACE2 helix that interacts with the Spike receptor binding domain (RBD), or docked against the RBD to identify new binding modes, and their amino acid sequences designed to optimize target binding, folding and stability. Ten designs bound the RBD with affinities ranging from 100 pM to 10 nM, and blocked SARS-CoV-2 infection of Vero E6 cells with IC50 values between 24 pM and 35 nM; the most potent, with new binding modes, are 56- and 64-residue proteins (IC50 ~ 0.16 ng/ml). Cryo-electron microscopy structures of these minibinders in complex with the SARS-CoV-2 spike ectodomain trimer with all three RBDs bound are nearly identical to the computational models. These hyperstable minibinders provide starting points for SARS-CoV-2 therapeutics.
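The abstract gives the most potent minibinders’ IC50 as a mass concentration (~0.16 ng/ml) while the overall range is quoted in molar units (24 pM to 35 nM). A rough back-of-envelope conversion shows these are consistent. The sketch assumes an average mass of about 110 Da per amino-acid residue, which is a common approximation and not a value from the paper; the actual molecular weights of the designs will differ somewhat.

```python
# Rough average mass of one amino-acid residue in a protein (approximation, not from the paper).
AVG_RESIDUE_MASS_DA = 110.0


def ng_per_ml_to_pM(conc_ng_per_ml: float, n_residues: int) -> float:
    """Convert a protein concentration in ng/ml to picomolar using an estimated molecular weight."""
    mw_g_per_mol = n_residues * AVG_RESIDUE_MASS_DA   # Da is numerically g/mol
    grams_per_litre = conc_ng_per_ml * 1e-9 * 1e3     # ng -> g, and per ml -> per litre
    molar = grams_per_litre / mw_g_per_mol            # mol/L
    return molar * 1e12                               # mol/L -> pmol/L


for residues in (56, 64):
    print(f"{residues} residues: {ng_per_ml_to_pM(0.16, residues):.1f} pM")
# 56 residues: 26.0 pM
# 64 residues: 22.7 pM
```

Both estimates land in the low tens of picomolar, in line with the 24 pM lower bound reported for the most potent designs.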

 

RESEARCH ARTICLE


 

SARS-CoV-2 infection generally begins in the nasal cavity, with virus replicating there for several days before spreading to the lower respiratory tract (1). Delivery of a high concentration of a viral inhibitor into the nose and into the respiratory system generally might therefore provide prophylactic protection and/or therapeutic benefit for treatment of early infection, and could be particularly useful for healthcare workers and others coming into frequent contact with infected individuals. A number of monoclonal antibodies are in development as systemic treatments for COVID-19 (2–6), but these proteins are not ideal for intranasal delivery as antibodies are large and often not extremely stable molecules and the density of binding sites is low (two per 150 kDa antibody); antibody-dependent disease enhancement (7–9) is also a potential issue. High-affinity Spike protein binders that block the interaction with the human cellular receptor angiotensin-converting enzyme 2 (ACE2) (10) with enhanced stability and smaller sizes to maximize the density of inhibitory domains could have advantages over antibodies for direct delivery into the respiratory system through intranasal administration, nebulization or dry powder aerosol. We found previously that intranasal delivery of small proteins designed to bind tightly to the influenza hemagglutinin can provide both prophylactic and therapeutic protection in rodent models of lethal influenza infection (11).

Design strategy

We set out to design high-affinity protein minibinders to the SARS-CoV-2 Spike RBD that compete with ACE2 binding. We explored two strategies. First, we incorporated the ACE2 alpha-helix that makes the majority of the interactions with the RBD into small designed proteins that make additional interactions with the RBD to attain higher affinity (Fig. 1A). Second, we designed binders completely from scratch without relying on known RBD-binding interactions (Fig. 1B). An advantage of the second approach is that the range of possibilities for design is much larger, and so potentially a greater diversity of high-affinity binding modes can be identified. For the first approach, we used the Rosetta blueprint builder to generate miniproteins that incorporate the ACE2 helix (human ACE2 residues 23 to 46). For the second approach, we used RIF docking (12) and design using large miniprotein libraries (11) to generate binders to distinct regions of the RBD surface surrounding the ACE2 binding site (Fig. 1 and fig. S1).

 

 

 

 

 

 

 

 

 

 

 


Fig. 1 Overview of the computational design approaches.

(A) Design of helical proteins incorporating ACE2 helix. (B) Large scale de novo design of small helical scaffolds (top) followed by rotamer interaction field (RIF) docking to identify shape and chemically complementary binding modes.

For the full article, please go to Science at https://science.sciencemag.org/content/early/2020/09/08/science.abd9909

 


Live Notes, Real Time Conference Coverage 2020 AACR Virtual Meeting April 28, 2020 Symposium: New Drugs on the Horizon Part 3 12:30-1:25 PM

Reporter: Stephen J. Williams, PhD

New Drugs on the Horizon: Part 3
Introduction

Andrew J. Phillips, C4 Therapeutics

  • symposium brought by the AACR CICR; about 30 talk proposals were submitted and three talks were chosen
  • unfortunately the networking event is not possible but hope to see you soon in good health

ABBV-184: A novel survivin specific T cell receptor/CD3 bispecific therapeutic that targets both solid tumor and hematological malignancies

Edward B Reilly
AbbVie Inc. @abbvie

  • T-cell receptors (TCRs) can recognize intracellular targets, whereas antibodies recognize only the roughly 25% of potential targets that are extracellular
  • survivin is expressed in multiple cancers and correlates with poor survival and prognosis
  • CD3 bispecific TCR to survivin (antibody arm binds CD3 on T cells; TCR arm binds survivin on cancer cells presented in MHC Class A3)
  • ABBV-184 is effective in vivo in lung cancer models as a single agent
  • in humanized mouse tumor models, the CD3/survivin bispecific can recruit T cells into solid tumors; multiple immune cell types, including CD4+ and CD8+ T cells, were found to infiltrate the tumor
  • therapeutic window, as measured by cytokine release assays in tumor vs. normal cells, is very wide (>25-fold)
  • ABBV-184 does not bind platelets and has a good in vivo safety profile
  • first-in-human dose determination trial: used in vitro cancer cell assays to determine the first human dose
  • looking at AML and lung cancer indications
  • phase 1 trial is underway for safety and efficacy and determine phase 2 dose
  • survivin has very few mutations so they are not worried about a changing epitope of their target TCR peptide of choice

The discovery of TNO155: A first in class SHP2 inhibitor

Matthew J. LaMarche
Novartis @Novartis

  • SHP2 is an intracellular phosphatase upstream of the MEK/ERK pathway; it has two SH2 domains and a PTP domain
  • knockdown of SHP2 inhibits tumor growth and colony formation in soft agar
  • there are 55 TKIs but very few phosphatase inhibitors; the active catalytic site is difficult to target and inhibitors can be oxidized there, so they instead developed an allosteric inhibitor that binds the site where the three domains come together and stabilizes it
  • they produced a number of chemical scaffolds that would bind and stabilize this allosteric site
  • block the redox reaction by blocking the cysteine in the binding site
  • lead compound had phototoxicity; used SAR analysis to improve affinity and reduce phototoxic effects
  • it was very difficult to balance efficacy, binding properties, and toxicity by adjusting structures
  • TNO155 is their lead into trials
  • SHP2 is expressed in T cells, and they find it combines well with I/O therapy, with an uptick in CD8+ cells
  • TNO155 is very selective: no SHP1 inhibition and no cross-reactivity with other phosphatases; SHP2 can autoinhibit itself when its three domains come together and stabilize
  • they screened 1.5 million compounds and got a low hit rate, which is why they needed to chemically engineer and improve on the near-hit classes they found

Closing Remarks

 

Xiaojing Wang
Genentech, Inc. @genentech

Follow on Twitter at:

@pharma_BI

@AACR

@CureCancerNow

@pharmanews

@BiotechWorld

@HopkinsMedicine

#AACR20


A Compendium of Coronavirus Must Reads from AAAS journal Science

Curator: Stephen J. Williams, PhD

How does coronavirus kill? Clinicians trace a ferocious rampage through the body, from brain to toes

 

An invader’s impact

In serious cases, SARS-CoV-2 lands in the lungs and can do deep damage there. But the virus, or the body’s response to it, can injure many other organs. Scientists are just beginning to probe the scope and nature of that harm.
  1. Lungs: A cross section shows immune cells crowding an inflamed alveolus, whose walls break down during attack by the virus, diminishing oxygen uptake. Patients cough, fevers rise, and it takes more and more effort to breathe.
  2. Liver: Up to half of hospitalized patients have enzyme levels that signal a struggling liver. An immune system in overdrive and drugs given to fight the virus may be causing the damage.
  3. Kidneys: Kidney damage is common in severe cases and makes death more likely. The virus may attack the kidneys directly, or kidney failure may be part of whole-body events like plummeting blood pressure.
  4. Intestines: Patient reports and biopsy data suggest the virus can infect the lower gastrointestinal tract, which is rich in ACE2 receptors. Some 20% or more of patients have diarrhea.
  5. Brain: Some COVID-19 patients have strokes, seizures, mental confusion, and brain inflammation. Doctors are trying to understand which are directly caused by the virus.
  6. Eyes: Conjunctivitis, inflammation of the membrane that lines the front of the eye and inner eyelid, is more common in the sickest patients.
  7. Nose: Some patients lose their sense of smell. Scientists speculate that the virus may move up the nose’s nerve endings and damage cells.
  8. Heart and blood vessels: The virus enters cells, likely including those lining blood vessels, by binding to ACE2 receptors on the cell surface. Infection can also promote blood clots, heart attacks, and cardiac inflammation.
V. ALTOUNIAN/SCIENCE

Some clinicians suspect the driving force in many gravely ill patients’ downhill trajectories is a disastrous overreaction of the immune system known as a “cytokine storm,” which other viral infections are known to trigger. Cytokines are chemical signaling molecules that guide a healthy immune response; but in a cytokine storm, levels of certain cytokines soar far beyond what’s needed, and immune cells start to attack healthy tissues. Blood vessels leak, blood pressure drops, clots form, and catastrophic organ failure can ensue.


AAAS Science Podcast: Why some diseases are seasonal and some are not: Coronaviruses and more

Reporter: Stephen J. Williams, PhD

 

The following podcast from the American Association for the Advancement of Science (AAAS) discusses why some viruses are seasonal while others can spread in different seasons across the globe.

Please Play

https://play.google.com/music/m/Da3pxbfyuykjy3r7xe5rprupmdq?t=Why_some_diseases_come_and_go_with_the_seasons_and_how_to_develop_smarter_safer_chemicals-Science_Ma

For more articles on COVID19 and SARS-CoV-2 on this Open Access Online Journal please see

Coronavirus SARS-CoV-2 Portal

 


Diversity and Health Disparity Issues Need to be Addressed for GWAS and Precision Medicine Studies

Curator: Stephen J. Williams, PhD

 

 

From the POLICY FORUM ETHICS AND DIVERSITY Section of Science

Ethics of inclusion: Cultivate trust in precision medicine


Science  07 Jun 2019:
Vol. 364, Issue 6444, pp. 941-942
DOI: 10.1126/science.aaw8299

Precision medicine is at a crossroads. Progress toward its central goal, to address persistent health inequities, will depend on enrolling historically underrepresented populations in research, thus eliminating longstanding exclusions from such research (1). Yet the history of ethical violations related to protocols for inclusion in biomedical research, as well as the continued misuse of research results (such as white nationalists looking to genetic ancestry to support claims of racial superiority), continue to engender mistrust among these populations (2). For precision medicine research (PMR) to achieve its goal, all people must believe that there is value in providing information about themselves and their families, and that their participation will translate into equitable distribution of benefits. This requires an ethics of inclusion that considers what constitutes inclusive practices in PMR, what goals and values are being furthered through efforts to enhance diversity, and who participates in adjudicating these questions. The early stages of PMR offer a critical window in which to intervene before research practices and their consequences become locked in (3).

Initiatives such as the All of Us program have set out to collect and analyze health information and biological samples from millions of people (1). At the same time, questions of trust in biomedical research persist. For example, although the recent assertions of white nationalists were eventually denounced by the American Society of Human Genetics (4), the misuse of ancestry testing may have already undermined public trust in genetic research.

There are also infamous failures in research that included historically underrepresented groups, including practices of deceit, as in the Tuskegee Syphilis Study, or the misuse of samples, as with the Havasupai tribe (5). Many people who are being asked to give their data and samples for PMR must not only reconcile such past research abuses, but also weigh future risks of potential misuse of their data.

To help assuage these concerns, ongoing PMR studies should open themselves up to research, conducted by social scientists and ethicists, that examines how their approaches enhance diversity and inclusion. Empirical studies are needed to account for how diversity is conceptualized and how goals of inclusion are operationalized throughout the life course of PMR studies. This is not limited to selection and recruitment of populations but extends to efforts to engage participants and communities, through data collection and measurement, and interpretations and applications of study findings. A commitment to transparency is an important step toward cultivating public trust in PMR’s mission and practices.

From Inclusion to Inclusive

The lack of diverse representation in precision medicine and other biomedical research is a well-known problem. For example, rare genetic variants may be overlooked—or their association with common, complex diseases can be misinterpreted—as a result of sampling bias in genetics research (6). Concentrating research efforts on samples with largely European ancestry has limited the ability of scientists to make generalizable inferences about the relationships among genes, lifestyle, environmental exposures, and disease risks, and thereby threatens the equitable translation of PMR for broad public health benefit (7).

However, recruiting for diverse research participation alone is not enough. As with any push for “diversity,” related questions arise about how to describe, define, measure, compare, and explain inferred similarities and differences among individuals and groups (8). In the face of ambivalence about how to represent population variation, there is ample evidence that researchers resort to using definitions of diversity that are heterogeneous, inconsistent, and sometimes competing (9). Varying approaches are not inherently problematic; depending on the scientific question, some measures may be more theoretically justified than others and, in many cases, a combination of measures can be leveraged to offer greater insight (10). For example, studies have shown that American adults who do not self-identify as white report better mental and physical health if they think others perceive them as white (11, 12).

The benefit of using multiple measures of race and ancestry also extends to genetic studies. In a study of hypertension in Puerto Rico, not only did classifications based on skin color and socioeconomic status better predict blood pressure than genetic ancestry, the inclusion of these sociocultural measures also revealed an association between a genetic polymorphism and hypertension that was otherwise hidden (13). Thus, practices that allow for a diversity of measurement approaches, when accompanied by a commitment to transparency about the rationales for chosen approaches, are likely to benefit PMR research more than striving for a single gold standard that would apply across all studies. These definitional and measurement issues are not merely semantic. They also are socially consequential to broader perceptions of PMR research and the potential to achieve its goals of inclusion.

Study Practices, Improve Outcomes

Given the uncertainty and complexities of the current, early phase of PMR, the time is ripe for empirical studies that enable assessment and modulation of research practices and scientific priorities in light of their social and ethical implications. Studying ongoing scientific practices in real time can help to anticipate unintended consequences that would limit researchers’ ability to meet diversity recruitment goals, address both social and biological causes of health disparities, and distribute the benefits of PMR equitably. We suggest at least two areas for empirical attention and potential intervention.

First, we need to understand how “upstream” decisions about how to characterize study populations and exposures influence “downstream” research findings of what are deemed causal factors. For example, when precision medicine researchers rely on self-identification with U.S. Census categories to characterize race and ethnicity, this tends to circumscribe their investigation of potential gene-environment interactions that may affect health. The convenience and routine nature of Census categories seemed to lead scientists to infer that the reasons for differences among groups were self-evident and required no additional exploration (9). The ripple effects of initial study design decisions go beyond issues of recruitment to shape other facets of research across the life course of a project, from community engagement and the return of results to the interpretation of study findings for human health.

Second, PMR studies are situated within an ecosystem of funding agencies, regulatory bodies, disciplines, and other scholars. This partly explains the use of varied terminology, different conceptual understandings and interpretations of research questions, and heterogeneous goals for inclusion. It also makes it important to explore how expectations related to funding and regulation influence research definitions of diversity and benchmarks for inclusion.

For example, who defines a diverse study population, and how might those definitions vary across different institutional actors? Who determines the metrics that constitute successful inclusion, and why? Within a research consortium, how are expectations for data sharing and harmonization reconciled with individual studies’ goals for recruitment and analysis? In complex research fields that include multiple investigators, organizations, and agendas, how are heterogeneous, perhaps even competing, priorities negotiated? To date, no studies have addressed these questions or investigated how decisions facilitate, or compromise, goals of diversity and inclusion.

The life course of individual studies and the ecosystems in which they reside cannot be easily separated and therefore must be studied in parallel to understand how meanings of diversity are shaped and how goals of inclusion are pursued. Empirically “studying the studies” will also be instrumental in creating mechanisms for transparency about how PMR is conducted and how trade-offs among competing goals are resolved. Establishing open lines of inquiry that study upstream practices may allow researchers to anticipate and address downstream decisions about how results can be interpreted and should be communicated, with a particular eye toward the consequences for communities recruited to augment diversity. Understanding how scientists negotiate the challenges and barriers to achieving diversity that go beyond fulfilling recruitment numbers is a critical step toward promoting meaningful inclusion in PMR.

Transparent Reflection, Cultivation of Trust

Emerging research on public perceptions of PMR suggests that although there is general support, questions of trust loom large. What we learn from studies that examine on-the-ground approaches aimed at enhancing diversity and inclusion, and how the research community reflects and responds with improvements in practices as needed, will play a key role in building a culture of openness that is critical for cultivating public trust.

Cultivating long-term, trusting relationships with participants underrepresented in biomedical research has been linked to a broad range of research practices. Some of these include the willingness of researchers to (i) address the effect of history and experience on marginalized groups’ trust in researchers and clinicians; (ii) engage concerns about potential group harms and risks of stigmatization and discrimination; (iii) develop relationships with participants and communities that are characterized by transparency, clear communication, and mutual commitment; and (iv) integrate participants’ values and expectations of responsible oversight beyond initial informed consent (14). These findings underscore the importance of multidisciplinary teams that include social scientists, ethicists, and policy-makers, who can identify and help to implement practices that respect the histories and concerns of diverse publics.

A commitment to an ethics of inclusion begins with a recognition that risks from the misuse of genetic and biomedical research are unevenly distributed. History makes plain that a multitude of research practices ranging from unnecessarily limited study populations and taken-for-granted data collection procedures to analytic and interpretive missteps can unintentionally bolster claims of racial superiority or inferiority and provoke group harm (15). Sustained commitment to transparency about the goals, limits, and potential uses of research is key to further cultivating trust and building long-term research relationships with populations underrepresented in biomedical studies.

As calls for increasing diversity and inclusion in PMR grow, funding and organizational pathways must be developed that integrate empirical studies of scientific practices and their rationales to determine how goals of inclusion and equity are being addressed and to identify where reform is required. In-depth, multidisciplinary empirical investigations of how diversity is defined, operationalized, and implemented can provide important insights and lessons learned for guiding emerging science, and in so doing, meet our ethical obligations to ensure transparency and meaningful inclusion.

References and Notes

  1. C. P. Jones et al., Ethn. Dis. 18, 496 (2008).
  2. C. C. Gravlee, A. L. Non, C. J. Mulligan
  3. S. A. Kraft et al., Am. J. Bioeth. 18, 3 (2018).
  4. A. E. Shields et al., Am. Psychol. 60, 77 (2005).

