Funding, Deals & Partnerships: BIOLOGICS & MEDICAL DEVICES; BioMed e-Series; Medicine and Life Sciences Scientific Journal – http://PharmaceuticalIntelligence.com
Reporter and Original Article Co-Author: Amandeep Kaur, B.Sc., M.Sc.
Abstract Since its inception in late 2019, SARS-CoV-2 has evolved resulting in emergence of various variants in different countries. These variants have spread worldwide resulting in devastating second wave of COVID-19 pandemic in many countries including India since the beginning of 2021. To control this pandemic continuous mutational surveillance and genomic epidemiology of circulating strains is very important. In this study, we performed mutational analysis of the protein coding genes of SARS-CoV-2 strains (n=2000) collected during January 2021 to March 2021. Our data revealed the emergence of a new variant in West Bengal, India, which is characterized by the presence of 11 co-existing mutations including D614G, P681H and V1230L in S-glycoprotein. This new variant was identified in 70 out of 412 sequences submitted from West Bengal. Interestingly, among these 70 sequences, 16 sequences also harbored E484K in the S glycoprotein. Phylogenetic analysis revealed strains of this new variant emerged from GR clade (B.1.1) and formed a new cluster. We propose to name this variant as GRL or lineage B.1.1/S:V1230L due to the presence of V1230L in S glycoprotein along with GR clade specific mutations. Co-occurrence of P681H, previously observed in UK variant, and E484K, previously observed in South African variant and California variant, demonstrates the convergent evolution of SARS-CoV-2 mutation. V1230L, present within the transmembrane domain of S2 subunit of S glycoprotein, has not yet been reported from any country. Substitution of valine with more hydrophobic amino acid leucine at position 1230 of the transmembrane domain, having role in S protein binding to the viral envelope, could strengthen the interaction of S protein with the viral envelope and also increase the deposition of S protein to the viral envelope, and thus positively regulate virus infection. 
P681H and E484K mutations have already been demonstrated in favor of increased infectivity and immune invasion respectively. Therefore, the new variant having D614G, P681H, V1230L and E484K is expected to have better infectivity, transmissibility and immune invasion characteristics, which may pose additional threat along with B.1.617 in the ongoing COVID-19 pandemic in India.
Study: Emergence of a new SARS-CoV-2 variant from GR clade with a novel S glycoprotein mutation V1230L in West Bengal, India
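The counts quoted in the abstract can be turned into proportions with a few lines of Python. This is only an illustrative sketch: the helper names `parse_substitution` and `percent` are hypothetical, and the numbers are the ones reported above (70 of 412 West Bengal sequences, 16 of those 70 carrying E484K).

```python
import re

# Amino-acid substitution labels such as "D614G": reference residue,
# position in the protein, substituted residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label like 'V1230L' into (reference, position, alternate)."""
    match = SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Spike substitutions named in the abstract.
spike_mutations = ["D614G", "P681H", "V1230L", "E484K"]
for label in spike_mutations:
    print(label, parse_substitution(label))

# Counts taken from the abstract.
print("new variant among West Bengal sequences (%):", percent(70, 412))
print("E484K among new-variant sequences (%):", percent(16, 70))
```

In other words, the proposed variant accounts for roughly one in six of the West Bengal sequences in the data set, and between a fifth and a quarter of those also carry E484K.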
Systems Biology analysis of Transcription Networks, Artificial Intelligence, and High-End Computing Coming to Fruition in Personalized Oncology
Curator: Stephen J. Williams, Ph.D.
In the June 2020 issue of the journal Science, writer Roxanne Khamsi has an interesting article, “Computing Cancer’s Weak Spots; An algorithm to unmask tumors’ molecular linchpins is tested in patients”[1], describing some early successes in combining cancer genome sequencing with artificial intelligence algorithms to guide personalized clinical treatment decisions for various tumor types. In 2016, oncologist Amy Tiersten collaborated with systems biologist Andrea Califano and cell biologist Jose Silva at Mount Sinai Hospital to develop a systems biology approach, which indicated that the drug ruxolitinib, a JAK1/2 inhibitor that blocks STAT3 signaling, would be effective against the aggressively recurring, Herceptin-resistant breast tumor of one of her patients. Dr. Califano, instead of defining networks of driver mutations, focused on identifying a few transcription factors that act as ‘linchpins’ or master controllers of transcriptional networks within tumor cells, hoping in doing so to, in essence, ‘bottleneck’ the transcriptional machinery of potential oncogenic products. As Dr. Califano states
“targeting those master regulators and you will stop cancer in its tracks, no matter what mutation initially caused it.”
It is important to note that this approach also relies on the ability to profile tumors by RNA-seq in order to determine the underlying alterations that dictate which master regulators are pertinent in any one tumor. Given the wide heterogeneity within tumor samples, this sequencing effort may have to involve multiple biopsies (as discussed in earlier posts on tumor heterogeneity in renal cancer).
As stated in the article, Califano co-founded a company called DarwinHealth in 2015 to guide doctors by identifying the key transcription factors in a patient’s tumor and suggesting personalized therapeutics against those identified molecular targets (OncoTarget™). He had collaborated with the Jackson Laboratory and most recently Columbia University to conduct a $15 million, 3000-patient clinical trial. This was a bit of a stretch from his initial training as a physicist; in 1986, IBM hired him for some artificial intelligence projects. He then landed at Columbia in 2003 and has been working on identifying these transcriptional nodes that govern cancer survival and tumorigenicity. Dr. Califano had figured that the number of genetic mutations which could potentially be drivers was too vast:
A 2018 study which analyzed more than 9000 tumor samples reported over 1.5 million mutations[2]
and that it would be impossible to develop therapeutics against them all. He reasoned that one would instead have to identify the common connections between these pathways, or transcriptional nodes, and termed them master regulators.
A Pan-Cancer Analysis of Enhancer Expression in Nearly 9000 Patient Samples
Chen H, Li C, Peng X, et al. Cell. 2018;173(2):386-399.e12.
Abstract
The role of enhancers, a key class of non-coding regulatory DNA elements, in cancer development has increasingly been appreciated. Here, we present the detection and characterization of a large number of expressed enhancers in a genome-wide analysis of 8928 tumor samples across 33 cancer types using TCGA RNA-seq data. Compared with matched normal tissues, global enhancer activation was observed in most cancers. Across cancer types, global enhancer activity was positively associated with aneuploidy, but not mutation load, suggesting a hypothesis centered on “chromatin-state” to explain their interplay. Integrating eQTL, mRNA co-expression, and Hi-C data analysis, we developed a computational method to infer causal enhancer-gene interactions, revealing enhancers of clinically actionable genes. Having identified an enhancer ∼140 kb downstream of PD-L1, a major immunotherapy target, we validated it experimentally. This study provides a systematic view of enhancer activity in diverse tumor contexts and suggests the clinical implications of enhancers.
A diagram of how concentrating on these transcriptional linchpins or nodes may be more therapeutically advantageous as only one pharmacologic agent is needed versus multiple agents to inhibit the various upstream pathways:
VIPER Algorithm (Virtual Inference of Protein activity by Enriched Regulon Analysis)
The algorithm that Califano and DarwinHealth developed is a systems biology approach that uses a tumor’s RNA-Seq data to determine the controlling nodes of transcription. They have recently used the VIPER algorithm to look at RNA-Seq data from more than 10,000 tumor samples from TCGA and identified 407 transcription factor genes that acted as these linchpins across all tumor types. Only 20 to 25 of them were implicated in just one tumor type, so these potential nodes are common to many forms of cancer.
Other institutions like the Cold Spring Harbor Laboratory have been using VIPER in their patient tumor analysis. Linchpins for other tumor types have been found. For instance, VIPER identified the transcription factors IKZF1 and IKZF3 as linchpins in multiple myeloma. But currently approved therapeutics are hard to come by for targets which are transcription factors, as most of pharma has concentrated on inhibiting easier targets such as kinases and their associated activity. In general, developing transcription factor inhibitors is a more difficult undertaking for multiple reasons.
Identifying the multiple dysregulated oncoproteins that contribute to tumorigenesis in a given patient is crucial for developing personalized treatment plans. However, accurate inference of aberrant protein activity in biological samples is still challenging as genetic alterations are only partially predictive and direct measurements of protein activity are generally not feasible. To address this problem we introduce and experimentally validate a new algorithm, VIPER (Virtual Inference of Protein-activity by Enriched Regulon analysis), for the accurate assessment of protein activity from gene expression data. We use VIPER to evaluate the functional relevance of genetic alterations in regulatory proteins across all TCGA samples. In addition to accurately inferring aberrant protein activity induced by established mutations, we also identify a significant fraction of tumors with aberrant activity of druggable oncoproteins—despite a lack of mutations, and vice-versa. In vitro assays confirmed that VIPER-inferred protein activity outperforms mutational analysis in predicting sensitivity to targeted inhibitors.
Schematic overview of the VIPER algorithm From: Alvarez MJ, Shen Y, Giorgi FM, Lachmann A, Ding BB, Ye BH, Califano A: Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nature genetics 2016, 48(8):838-847.
(a) Molecular layers profiled by different technologies. Transcriptomics measures steady-state mRNA levels; Proteomics quantifies protein levels, including some defined post-translational isoforms; VIPER infers protein activity based on the protein’s regulon, reflecting the abundance of the active protein isoform, including post-translational modifications, proper subcellular localization and interaction with co-factors. (b) Representation of VIPER workflow. A regulatory model is generated from ARACNe-inferred context-specific interactome and Mode of Regulation computed from the correlation between regulator and target genes. Single-sample gene expression signatures are computed from genome-wide expression data, and transformed into regulatory protein activity profiles by the aREA algorithm. (c) Three possible scenarios for the aREA analysis, including increased, decreased or no change in protein activity. The gene expression signature and its absolute value (|GES|) are indicated by color scale bars, induced and repressed target genes according to the regulatory model are indicated by blue and red vertical lines. (d) Pleiotropy Correction is performed by evaluating whether the enrichment of a given regulon (R4) is driven by genes co-regulated by a second regulator (R4∩R1). (e) Benchmark results for VIPER analysis based on multiple-samples gene expression signatures (msVIPER) and single-sample gene expression signatures (VIPER). Boxplots show the accuracy (relative rank for the silenced protein), and the specificity (fraction of proteins inferred as differentially active at p < 0.05) for the 6 benchmark experiments (see Table 2). Different colors indicate different implementations of the aREA algorithm, including 2-tail (2T) and 3-tail (3T), Interaction Confidence (IC) and Pleiotropy Correction (PC).
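The core idea in panels (b) and (c), scoring a regulator by whether its induced targets sit at the top of the expression signature and its repressed targets at the bottom, can be illustrated with a toy calculation. The sketch below is not the aREA algorithm and not the VIPER package: it is a deliberately simplified rank-based stand-in, and the gene names, regulons, and function names are all hypothetical.

```python
from statistics import mean


def rank_scores(signature):
    """Map each gene to a rank-based score in (-1, 1).

    The most repressed gene gets a score near -1 and the most induced
    gene a score near +1, so only the ordering of the signature matters.
    """
    ordered = sorted(signature, key=signature.get)
    n = len(ordered)
    return {gene: 2 * (i + 1) / (n + 1) - 1 for i, gene in enumerate(ordered)}


def toy_activity(signature, regulon):
    """Average the signed rank scores of a regulator's targets.

    `regulon` maps target gene -> mode of regulation (+1 activated,
    -1 repressed). A positive result suggests increased activity of the
    regulator, a negative result decreased activity.
    """
    scores = rank_scores(signature)
    targets = [gene for gene in regulon if gene in scores]
    if not targets:
        raise ValueError("regulon shares no genes with the signature")
    return mean(regulon[gene] * scores[gene] for gene in targets)


# Hypothetical single-sample signature (e.g. z-scores against a reference).
signature = {"g1": -3.0, "g2": -2.0, "g3": -1.0, "g4": 1.0, "g5": 2.0, "g6": 3.0}

# Hypothetical regulons: regulator_a activates g5 and g6 and represses g1;
# regulator_b has the opposite relationship to the signature.
regulator_a = {"g5": +1, "g6": +1, "g1": -1}
regulator_b = {"g1": +1, "g2": +1, "g6": -1}

print("regulator_a activity:", toy_activity(signature, regulator_a))
print("regulator_b activity:", toy_activity(signature, regulator_b))
```

The published method additionally weights targets by interaction confidence, estimates statistical significance, and applies the pleiotropy correction described in panel (d); none of that is attempted here.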
Other articles from Andrea Califano on VIPER algorithm in cancer include:
Echeverria GV, Ge Z, Seth S, Zhang X, Jeter-Jones S, Zhou X, Cai S, Tu Y, McCoy A, Peoples M, Sun Y, Qiu H, Chang Q, Bristow C, Carugo A, Shao J, Ma X, Harris A, Mundi P, Lau R, Ramamoorthy V, Wu Y, Alvarez MJ, Califano A, Moulder SL, Symmans WF, Marszalek JR, Heffernan TP, Chang JT, Piwnica-Worms H. Sci Transl Med. 2019 Apr 17;11(488):eaav0936. doi: 10.1126/scitranslmed.aav0936. PMID: 30996079
Chen H, Li C, Peng X, Zhou Z, Weinstein JN, Liang H: A Pan-Cancer Analysis of Enhancer Expression in Nearly 9000 Patient Samples. Cell 2018, 173(2):386-399.e12.
Alvarez MJ, Shen Y, Giorgi FM, Lachmann A, Ding BB, Ye BH, Califano A: Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nature genetics 2016, 48(8):838-847.
Other articles of Note on this Open Access Online Journal Include:
Bioinformatic Tools for Cancer Mutational Analysis: COSMIC and Beyond, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 1: Next Generation Sequencing (NGS)
Bioinformatic Tools for Cancer Mutational Analysis: COSMIC and Beyond
Curator: Stephen J. Williams, Ph.D.
Updated 7/26/2019
Updated 04/27/2019
Signatures of Mutational Processes in Human Cancer (from COSMIC)
The genomic landscape of cancer. The COSMIC database is a fully curated and annotated database of recurrent genetic mutations found in various cancers (data taken from cancer sequencing projects). For an interactive map please go to the COSMIC database here: http://cancer.sanger.ac.uk/cosmic
Somatic mutations are present in all cells of the human body and occur throughout life. They are the consequence of multiple mutational processes, including the intrinsic slight infidelity of the DNA replication machinery, exogenous or endogenous mutagen exposures, enzymatic modification of DNA and defective DNA repair. Different mutational processes generate unique combinations of mutation types, termed “Mutational Signatures”.
The current set of mutational signatures is based on an analysis of 10,952 exomes and 1,048 whole-genomes across 40 distinct types of human cancer. These analyses are based on curated data that were generated by The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and a large set of freely available somatic mutations published in peer-reviewed journals. Complete details about the data sources will be provided in future releases of COSMIC.
The profile of each signature is displayed using the six substitution subtypes: C>A, C>G, C>T, T>A, T>C, and T>G (all substitutions are referred to by the pyrimidine of the mutated Watson–Crick base pair). Further, each of the substitutions is examined by incorporating information on the bases immediately 5’ and 3’ to each mutated base generating 96 possible mutation types (6 types of substitution ∗ 4 types of 5’ base ∗ 4 types of 3’ base). Mutational signatures are displayed and reported based on the observed trinucleotide frequency of the human genome, i.e., representing the relative proportions of mutations generated by each signature based on the actual trinucleotide frequencies of the reference human genome version GRCh37. Note that only validated mutational signatures have been included in the curated census of mutational signatures.
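The 96 mutation types follow directly from the counting described above (6 substitutions × 4 possible 5’ bases × 4 possible 3’ bases). A minimal sketch that enumerates them, and that expresses any observed substitution by the pyrimidine of the mutated base pair, is shown here; the function names and the `A[C>T]G` label format are illustrative choices, not something defined by COSMIC in this text.

```python
from itertools import product

BASES = "ACGT"
# The six substitution subtypes, written from the pyrimidine (C or T).
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(base):
    """Return the Watson-Crick complement of a single base."""
    return base.translate(COMPLEMENT)


def mutation_types():
    """Enumerate all trinucleotide mutation types, e.g. 'A[C>T]G'."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def to_pyrimidine_context(five, ref, alt, three):
    """Label a substitution by the pyrimidine of the mutated base pair.

    If the reference base is a purine (A or G), the event is rewritten on
    the opposite strand: every base is complemented and the 5' and 3'
    neighbours swap places.
    """
    if ref in "CT":
        return f"{five}[{ref}>{alt}]{three}"
    return (
        f"{complement(three)}[{complement(ref)}>{complement(alt)}]"
        f"{complement(five)}"
    )


types = mutation_types()
print(len(types))  # 6 * 4 * 4 mutation types
print(to_pyrimidine_context("A", "G", "T", "C"))  # G>T in an AGC context
```

A G>T change observed in the context AGC is therefore counted under G[C>A]T, which is why only six substitution subtypes are needed.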
Additional information is provided for each signature, including the cancer types in which the signature has been found, proposed aetiology for the mutational processes underlying the signature, other mutational features that are associated with each signature and information that may be relevant for better understanding of a particular mutational signature.
The set of signatures will be updated in the future. This will include incorporating additional mutation types (e.g., indels, structural rearrangements, and localized hypermutation such as kataegis) and cancer samples. With more cancer genome sequences and the additional statistical power this will bring, new signatures may be found, the profiles of current signatures may be further refined, signatures may split into component signatures and signatures may be found in cancer types in which they are currently not detected.
COSMIC v75 includes curations across GRIN2A, fusion pair TCF3-PBX1, and genomic data from 17 systematic screen publications. We are also beginning a reannotation of TCGA exome datasets using Sanger’s Cancer Genome Project analysis pipeline to ensure consistency; four studies are included in this release, to be expanded across the next few releases. The Cancer Gene Census now has a dedicated curator, Dr. Zbyslaw Sondka, who will be focused on expanding the Census, enhancing the evidence underpinning it, and developing improved expert-curated detail describing each gene’s impact in cancer. Finally, as we begin to streamline our ever-growing website, we have combined all information for each gene onto one page and simplified the layout and design to improve navigation.
Signature 1
Cancer types:
Signature 1 has been found in all cancer types and in most cancer samples.
Proposed aetiology:
Signature 1 is the result of an endogenous mutational process initiated by spontaneous deamination of 5-methylcytosine.
Additional mutational features:
Signature 1 is associated with small numbers of small insertions and deletions in most tissue types.
Comments:
The number of Signature 1 mutations correlates with age of cancer diagnosis.
Signature 2
Cancer types:
Signature 2 has been found in 22 cancer types, but most commonly in cervical and bladder cancers. In most of these 22 cancer types, Signature 2 is present in at least 10% of samples.
Proposed aetiology:
Signature 2 has been attributed to activity of the AID/APOBEC family of cytidine deaminases. On the basis of similarities in the sequence context of cytosine mutations caused by APOBEC enzymes in experimental systems, a role for APOBEC1, APOBEC3A and/or APOBEC3B in human cancer appears more likely than for other members of the family.
Additional mutational features:
Transcriptional strand bias of mutations has been observed in exons, but is not present or is weaker in introns.
Comments:
Signature 2 is usually found in the same samples as Signature 13. It has been proposed that activation of AID/APOBEC cytidine deaminases is due to viral infection, retrotransposon jumping or to tissue inflammation. Currently, there is limited evidence to support these hypotheses. A germline deletion polymorphism involving APOBEC3A and APOBEC3B is associated with the presence of large numbers of Signature 2 and 13 mutations and with predisposition to breast cancer. Mutations of similar patterns to Signatures 2 and 13 are commonly found in the phenomenon of local hypermutation present in some cancers, known as kataegis, potentially implicating AID/APOBEC enzymes in this process as well.
Signature 3
Cancer types:
Signature 3 has been found in breast, ovarian, and pancreatic cancers.
Proposed aetiology:
Signature 3 is associated with failure of DNA double-strand break-repair by homologous recombination.
Additional mutational features:
Signature 3 associates strongly with elevated numbers of large (longer than 3bp) insertions and deletions with overlapping microhomology at breakpoint junctions.
Comments:
Signature 3 is strongly associated with germline and somatic BRCA1 and BRCA2 mutations in breast, pancreatic, and ovarian cancers. In pancreatic cancer, responders to platinum therapy usually exhibit Signature 3 mutations.
Signature 4
Cancer types:
Signature 4 has been found in head and neck cancer, liver cancer, lung adenocarcinoma, lung squamous carcinoma, small cell lung carcinoma, and oesophageal cancer.
Proposed aetiology:
Signature 4 is associated with smoking and its profile is similar to the mutational pattern observed in experimental systems exposed to tobacco carcinogens (e.g., benzo[a]pyrene). Signature 4 is likely due to tobacco mutagens.
Additional mutational features:
Signature 4 exhibits transcriptional strand bias for C>A mutations, compatible with the notion that damage to guanine is repaired by transcription-coupled nucleotide excision repair. Signature 4 is also associated with CC>AA dinucleotide substitutions.
Comments:
Signature 29 is found in cancers associated with tobacco chewing and appears different from Signature 4.
Signature 5
Cancer types:
Signature 5 has been found in all cancer types and most cancer samples.
Proposed aetiology:
The aetiology of Signature 5 is unknown.
Additional mutational features:
Signature 5 exhibits transcriptional strand bias for T>C substitutions at ApTpN context.
Comments:
Signature 6
Cancer types:
Signature 6 has been found in 17 cancer types and is most common in colorectal and uterine cancers. In most other cancer types, Signature 6 is found in less than 3% of examined samples.
Proposed aetiology:
Signature 6 is associated with defective DNA mismatch repair and is found in microsatellite unstable tumours.
Additional mutational features:
Signature 6 is associated with high numbers of small (shorter than 3bp) insertions and deletions at mono/polynucleotide repeats.
Comments:
Signature 6 is one of four mutational signatures associated with defective DNA mismatch repair and is often found in the same samples as Signatures 15, 20, and 26.
Signature 7
Cancer types:
Signature 7 has been found predominantly in skin cancers and in cancers of the lip categorized as head and neck or oral squamous cancers.
Proposed aetiology:
Based on its prevalence in ultraviolet-exposed areas and the similarity of the mutational pattern to that observed in experimental systems exposed to ultraviolet light, Signature 7 is likely due to ultraviolet light exposure.
Additional mutational features:
Signature 7 is associated with large numbers of CC>TT dinucleotide mutations at dipyrimidines. Additionally, Signature 7 exhibits a strong transcriptional strand-bias indicating that mutations occur at pyrimidines (viz., by formation of pyrimidine-pyrimidine photodimers) and these mutations are being repaired by transcription-coupled nucleotide excision repair.
Comments:
Signature 8
Cancer types:
Signature 8 has been found in breast cancer and medulloblastoma.
Proposed aetiology:
The aetiology of Signature 8 remains unknown.
Additional mutational features:
Signature 8 exhibits weak strand bias for C>A substitutions and is associated with double nucleotide substitutions, notably CC>AA.
Comments:
Signature 9
Cancer types:
Signature 9 has been found in chronic lymphocytic leukaemias and malignant B-cell lymphomas.
Proposed aetiology:
Signature 9 is characterized by a pattern of mutations that has been attributed to polymerase η, which is implicated with the activity of AID during somatic hypermutation.
Additional mutational features:
Comments:
Chronic lymphocytic leukaemias that possess immunoglobulin gene hypermutation (IGHV-mutated) have elevated numbers of mutations attributed to Signature 9 compared to those that do not have immunoglobulin gene hypermutation.
Signature 10
Cancer types:
Signature 10 has been found in six cancer types, notably colorectal and uterine cancer, usually generating huge numbers of mutations in small subsets of samples.
Proposed aetiology:
It has been proposed that the mutational process underlying this signature is altered activity of the error-prone polymerase POLE. The presence of large numbers of Signature 10 mutations is associated with recurrent POLE somatic mutations, viz., Pro286Arg and Val411Leu.
Additional mutational features:
Signature 10 exhibits strand bias for C>A mutations at TpCpT context and T>G mutations at TpTpT context.
Comments:
Signature 10 is associated with some of the most mutated cancer samples. Samples exhibiting this mutational signature have been termed ultra-hypermutators.
Signature 11
Cancer types:
Signature 11 has been found in melanoma and glioblastoma.
Proposed aetiology:
Signature 11 exhibits a mutational pattern resembling that of alkylating agents. Patient histories have revealed an association between treatments with the alkylating agent temozolomide and Signature 11 mutations.
Additional mutational features:
Signature 11 exhibits a strong transcriptional strand-bias for C>T substitutions indicating that mutations occur on guanine and that these mutations are effectively repaired by transcription-coupled nucleotide excision repair.
Comments:
Signature 12
Cancer types:
Signature 12 has been found in liver cancer.
Proposed aetiology:
The aetiology of Signature 12 remains unknown.
Additional mutational features:
Signature 12 exhibits a strong transcriptional strand-bias for T>C substitutions.
Comments:
Signature 12 usually contributes a small percentage (<20%) of the mutations observed in a liver cancer sample.
Signature 13
Cancer types:
Signature 13 has been found in 22 cancer types and seems to be commonest in cervical and bladder cancers. In most of these 22 cancer types, Signature 13 is present in at least 10% of samples.
Proposed aetiology:
Signature 13 has been attributed to activity of the AID/APOBEC family of cytidine deaminases converting cytosine to uracil. On the basis of similarities in the sequence context of cytosine mutations caused by APOBEC enzymes in experimental systems, a role for APOBEC1, APOBEC3A and/or APOBEC3B in human cancer appears more likely than for other members of the family. Signature 13 causes predominantly C>G mutations. This may be due to generation of abasic sites after removal of uracil by base excision repair and replication over these abasic sites by REV1.
Additional mutational features:
Transcriptional strand bias of mutations has been observed in exons, but is not present or is weaker in introns.
Comments:
Signature 2 is usually found in the same samples as Signature 13. It has been proposed that activation of AID/APOBEC cytidine deaminases is due to viral infection, retrotransposon jumping or to tissue inflammation. Currently, there is limited evidence to support these hypotheses. A germline deletion polymorphism involving APOBEC3A and APOBEC3B is associated with the presence of large numbers of Signature 2 and 13 mutations and with predisposition to breast cancer. Mutations of similar patterns to Signatures 2 and 13 are commonly found in the phenomenon of local hypermutation present in some cancers, known as kataegis, potentially implicating AID/APOBEC enzymes in this process as well.
Signature 14
Cancer types:
Signature 14 has been observed in four uterine cancers and a single adult low-grade glioma sample.
Proposed aetiology:
The aetiology of Signature 14 remains unknown.
Additional mutational features:
Comments:
Signature 14 generates very high numbers of somatic mutations (>200 mutations per MB) in all samples in which it has been observed.
Signature 15
Cancer types:
Signature 15 has been found in several stomach cancers and a single small cell lung carcinoma.
Proposed aetiology:
Signature 15 is associated with defective DNA mismatch repair.
Additional mutational features:
Signature 15 is associated with high numbers of small (shorter than 3bp) insertions and deletions at mono/polynucleotide repeats.
Comments:
Signature 15 is one of four mutational signatures associated with defective DNA mismatch repair and is often found in the same samples as Signatures 6, 20, and 26.
Signature 16
Cancer types:
Signature 16 has been found in liver cancer.
Proposed aetiology:
The aetiology of Signature 16 remains unknown.
Additional mutational features:
Signature 16 exhibits an extremely strong transcriptional strand bias for T>C mutations at ApTpN context, with T>C mutations occurring almost exclusively on the transcribed strand.
Comments:
Signature 17
Cancer types:
Signature 17 has been found in oesophagus cancer, breast cancer, liver cancer, lung adenocarcinoma, B-cell lymphoma, stomach cancer and melanoma.
Proposed aetiology:
The aetiology of Signature 17 remains unknown.
Additional mutational features:
Comments:
Signature 18
Cancer types:
Signature 18 has been found commonly in neuroblastoma. Additionally, Signature 18 has been also observed in breast and stomach carcinomas.
Proposed aetiology:
The aetiology of Signature 18 remains unknown.
Additional mutational features:
Comments:
Signature 19
Cancer types:
Signature 19 has been found only in pilocytic astrocytoma.
Proposed aetiology:
The aetiology of Signature 19 remains unknown.
Additional mutational features:
Comments:
Signature 20
Cancer types:
Signature 20 has been found in stomach and breast cancers.
Proposed aetiology:
Signature 20 is believed to be associated with defective DNA mismatch repair.
Additional mutational features:
Signature 20 is associated with high numbers of small (shorter than 3bp) insertions and deletions at mono/polynucleotide repeats.
Comments:
Signature 20 is one of four mutational signatures associated with defective DNA mismatch repair and is often found in the same samples as Signatures 6, 15, and 26.
Signature 21
Cancer types:
Signature 21 has been found only in stomach cancer.
Proposed aetiology:
The aetiology of Signature 21 remains unknown.
Additional mutational features:
Comments:
Signature 21 is found only in four samples all generated by the same sequencing centre. The mutational pattern of Signature 21 is somewhat similar to the one of Signature 26. Additionally, Signature 21 is found only in samples that also have Signatures 15 and 20. As such, Signature 21 is probably also related to microsatellite unstable tumours.
Signature 22
Cancer types:
Signature 22 has been found in urothelial (renal pelvis) carcinoma and liver cancers.
Proposed aetiology:
Signature 22 has been found in cancer samples with known exposures to aristolochic acid. Additionally, the pattern of mutations exhibited by the signature is consistent with the one previously observed in experimental systems exposed to aristolochic acid.
Additional mutational features:
Signature 22 exhibits a very strong transcriptional strand bias for T>A mutations indicating adenine damage that is being repaired by transcription-coupled nucleotide excision repair.
Comments:
Signature 22 has a very high mutational burden in urothelial carcinoma; however, its mutational burden is much lower in liver cancers.
Signature 23
Cancer types:
Signature 23 has been found only in a single liver cancer sample.
Proposed aetiology:
The aetiology of Signature 23 remains unknown.
Additional mutational features:
Signature 23 exhibits very strong transcriptional strand bias for C>T mutations.
Comments:
Signature 24
Cancer types:
Signature 24 has been observed in a subset of liver cancers.
Proposed aetiology:
Signature 24 has been found in cancer samples with known exposures to aflatoxin. Additionally, the pattern of mutations exhibited by the signature is consistent with that previously observed in experimental systems exposed to aflatoxin.
Additional mutational features:
Signature 24 exhibits a very strong transcriptional strand bias for C>A mutations indicating guanine damage that is being repaired by transcription-coupled nucleotide excision repair.
Comments:
Signature 25
Cancer types:
Signature 25 has been observed in Hodgkin lymphomas.
Proposed aetiology:
The aetiology of Signature 25 remains unknown.
Additional mutational features:
Signature 25 exhibits transcriptional strand bias for T>A mutations.
Comments:
This signature has only been identified in Hodgkin’s cell lines. Data is not available from primary Hodgkin lymphomas.
Signature 26
Cancer types:
Signature 26 has been found in breast cancer, cervical cancer, stomach cancer and uterine carcinoma.
Proposed aetiology:
Signature 26 is believed to be associated with defective DNA mismatch repair.
Additional mutational features:
Signature 26 is associated with high numbers of small (shorter than 3bp) insertions and deletions at mono/polynucleotide repeats.
Comments:
Signature 26 is one of four mutational signatures associated with defective DNA mismatch repair and is often found in the same samples as Signatures 6, 15 and 20.
Signature 27
Cancer types:
Signature 27 has been observed in a subset of kidney clear cell carcinomas.
Proposed aetiology:
The aetiology of Signature 27 remains unknown.
Additional mutational features:
Signature 27 exhibits very strong transcriptional strand bias for T>A mutations. Signature 27 is associated with high numbers of small (shorter than 3bp) insertions and deletions at mono/polynucleotide repeats.
Comments:
Signature 28
Cancer types:
Signature 28 has been observed in a subset of stomach cancers.
Proposed aetiology:
The aetiology of Signature 28 remains unknown.
Additional mutational features:
Comments:
Signature 29
Cancer types:
Signature 29 has been observed only in gingivo-buccal oral squamous cell carcinoma.
Proposed aetiology:
Signature 29 has been found in cancer samples from individuals with a tobacco chewing habit.
Additional mutational features:
Signature 29 exhibits transcriptional strand bias for C>A mutations indicating guanine damage that is most likely repaired by transcription-coupled nucleotide excision repair. Signature 29 is also associated with CC>AA dinucleotide substitutions.
Comments:
The Signature 29 pattern of C>A mutations due to tobacco chewing appears different from the pattern of mutations due to tobacco smoking reflected by Signature 4.
Signature 30
Cancer types:
Signature 30 has been observed in a small subset of breast cancers.
Proposed aetiology:
The aetiology of Signature 30 remains unknown.
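Several of the entries above report transcriptional strand bias (for example, C>A in Signatures 24 and 29, and T>A in Signatures 25 and 27). As a rough illustration of how such a bias can be tallied, the sketch below refers each substitution to its pyrimidine base and records whether that pyrimidine lies on the transcribed or the untranscribed strand of the overlapping gene. The input layout and the function name are assumptions made for this example and do not reflect COSMIC's file format or its analysis method.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def strand_bias_counts(mutations):
    """Tally substitutions by class and by the strand carrying the pyrimidine.

    Each mutation is a tuple (ref, alt, gene_strand). ref and alt are bases on
    the forward (+) genome strand; gene_strand is '+' or '-' and names the
    coding strand of the overlapping gene. This layout is hypothetical.
    """
    counts = Counter()
    for ref, alt, gene_strand in mutations:
        if ref in "CT":
            # The pyrimidine reference base is on the forward strand.
            substitution = f"{ref}>{alt}"
            pyrimidine_strand = "+"
        else:
            # Re-express the change from the reverse strand (e.g. G>T becomes C>A).
            substitution = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
            pyrimidine_strand = "-"
        # The coding strand is the untranscribed strand; its partner is the template.
        if pyrimidine_strand == gene_strand:
            location = "untranscribed"
        else:
            location = "transcribed"
        counts[(substitution, location)] += 1
    return counts


example = [("G", "T", "+"), ("G", "T", "+"), ("C", "A", "+"), ("T", "A", "-")]
counts = strand_bias_counts(example)
```

A marked imbalance between the "transcribed" and "untranscribed" tallies for one substitution class is what the entries above describe as strand bias; a formal analysis would also test that imbalance statistically.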
Examples in the literature of deposits into or analysis from the COSMIC database
“analysis of exons representing 20,857 transcripts from 18,191 genes, we conclude that the genomic landscapes of breast and colorectal cancers are composed of a handful of commonly mutated gene “mountains” and a much larger number of gene “hills” that are mutated at low frequency.”
identified cellular pathways containing multiple mutated genes
analyzed a highly curated database (Metacore, GeneGo, Inc.) that includes human protein-protein interactions, signal transduction and metabolic pathways
There were 108 pathways that were found to be preferentially mutated in breast tumors. Many of the pathways involved phosphatidylinositol 3-kinase (PI3K) signaling
the cancer genome landscape consists of relief features (mutated genes) with heterogeneous heights (determined by CaMP scores). There are a few “mountains” representing individual CAN-genes mutated at high frequency. However, the landscapes contain a much larger number of “hills” representing the CAN-genes that are mutated at relatively low frequency. It is notable that this general genomic landscape (few gene mountains and many gene hills) is a common feature of both breast and colorectal tumors.
developed software to analyze multiple mutations and mutation frequencies available from Harvard Bioinformatics at
R Software for Cancer Mutation Analysis (download here)
CancerMutationAnalysis Version 1.0:
R package to reproduce the statistical analyses of the Sjoblom et al article and the associated Technical Comment. This package is built for reproducibility of the original results and not for flexibility. Future versions will be more general and define classes for the data types used. Further details are available in Working Paper 126.
CancerMutationAnalysis Version 2.0:
R package to reproduce the statistical analyses of the Wood et al article. Like its predecessor, this package is still built for reproducibility of the original results and not for flexibility. Further details are available in Working Paper 126.
Update 04/27/2019
Review 2018. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Z. Sondka et al. Nature Reviews. 2018.
The Catalogue of Somatic Mutations in Cancer (COSMIC) Cancer Gene Census (CGC) reevaluates the cancer genome landscape periodically and curates the findings into a database of genetic changes occurring in various tumor types. The 2018 CGC describes in detail the effect of 719 cancer-driving genes. The recent expansion includes functional and mechanistic descriptions of how each gene contributes to disease etiology, framed in terms of the cancer hallmarks described by Hanahan and Weinberg. These functional characteristics show the complexity of the cancer mutational landscape and genome and suggest “multiple cancer-related functions for many genes, which are often highly tissue-dependent or tumour stage-dependent.” The 2018 CGC also expands a second tier of genes, which broadens the list of cancer-related genes.
Criteria for curation of genes into CGC (curation process)
candidate genes are selected from published literature, conference abstracts, large cancer genome screens deposited in databases, and analysis of the current COSMIC database
COSMIC data are analyzed to determine presence of patterns of somatic mutations and frequency of such mutations in cancer
literature review to determine the role of the gene in cancer
Minimum evidence
– at least two publications from different groups showing increased mutation frequency in at least one type of cancer (PubMed)
– at least two publications from different groups showing experimental evidence of functional involvement in at least one hallmark of cancer in order to classify the mutant gene as oncogene, tumor suppressor, or fusion partner (like BCR-Abl)
independent assessment by at least two postdoctoral fellows
gene must be classified as either a Tier 1 or Tier 2 CGC gene
inclusion in database
continued curation efforts
definitions:
Tier 1 gene: genes which have strong evidence from both mutational and functional analysis as being involved in cancer
Tier 2 gene: genes with mutational patterns typical of cancer drivers but not functionally characterized, as well as genes with a published mechanistic description of involvement in cancer but without proof of somatic mutations in cancer
Tier 2 genes (719 genes): include 103 oncogenes, 181 tumor suppressors, 134 fusion partners and 31 with unknown function
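The tier definitions above can be read as a simple decision rule. The sketch below is a deliberately simplified illustration, assuming that each line of evidence is reduced to a count of independent groups reporting it and that the "at least two publications from different groups" minimum applies to both lines. The function and parameter names are hypothetical; the actual CGC process relies on expert curation and independent assessment, not an automated rule.

```python
def classify_cgc_tier(mutation_groups, functional_groups, min_groups=2):
    """Return 'Tier 1', 'Tier 2', or None under a simplified reading of the criteria.

    mutation_groups: independent groups reporting increased mutation frequency.
    functional_groups: independent groups reporting functional (hallmark) evidence.
    min_groups: assumed minimum number of independent groups per evidence line.
    """
    has_mutational = mutation_groups >= min_groups
    has_functional = functional_groups >= min_groups
    if has_mutational and has_functional:
        return "Tier 1"  # strong evidence from both mutational and functional analysis
    if has_mutational or has_functional:
        return "Tier 2"  # only one of the two lines of evidence is established
    return None  # not enough evidence for inclusion under this simplified rule


tier_both = classify_cgc_tier(3, 2)
tier_mutational_only = classify_cgc_tier(2, 0)
tier_neither = classify_cgc_tier(1, 1)
```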
Updated 7/26/2019
The COSMIC database is undergoing an extensive update and reannotation, in order to ensure standardisation and modernisation across COSMIC data. This will substantially improve the identification of unique variants that may have been described at the genome, transcript and/or protein level. The introduction of a Genomic Identifier, along with complete annotation across multiple, high quality Ensembl transcripts and improved compliance with current HGVS syntax, will enable variant matching both within COSMIC and across other bioinformatic datasets.
As a result of these updates there will be significant changes in the upcoming releases as we work through this process. The first stage of this work was the introduction of improved HGVS syntax compliance in our May release. The majority of the changes will be reflected in COSMIC v90, which will be released in late August or early September, and the remaining changes will be introduced over the next few releases.
The significant changes in v90 include:
Updated genes, transcripts and proteins from Ensembl release 93 on both the GRCh37 and GRCh38 assemblies.
Full reannotation of COSMIC variants with known genomic coordinates using Ensembl’s Variant Effect Predictor (VEP). This provides accurate and standardised annotation uniformly across all relevant transcripts and genes that include the genomic location of the variant.
New stable genomic identifiers (COSV) that indicate the definitive position of the variant on the genome. These unique identifiers allow variants to be mapped between GRCh37 and GRCh38 assemblies and displayed on a selection of transcripts.
Updated cross-reference links between COSMIC genes and other widely-used databases such as HGNC, RefSeq, Uniprot and CCDS.
Complete standardised representation of COSMIC variants, following the most recent HGVS recommendations, where possible.
Remapping of gene fusions on the updated transcripts on both the GRCh37 and GRCh38 assemblies, along with the genomic coordinates for the breakpoint positions.
Reduced redundancy of mutations. Duplicate variants have been merged into one representative variant.
Key points for you
COSMIC variants have been annotated on all relevant Ensembl transcripts across both the GRCh37 and GRCh38 assemblies from Ensembl release 93. New genomic identifiers (e.g. COSV56056643) are used, which refer to the variant change at the genomic level rather than gene, transcript or protein level and can thus be used universally. Existing COSM IDs will continue to be supported and will now be referred to as legacy identifiers e.g. COSM476. The legacy identifiers (COSM) are still searchable. In the case of mutations without genomic coordinates, hence without a COSV identifier, COSM identifiers will continue to be used.
All relevant Ensembl transcripts in COSMIC (which have been selected based on Ensembl canonical classification and on the quality of the dataset to include only GENCODE basic transcripts) will now have both accession and version numbers, so that the exact transcript is known, ensuring reproducibility. This also provides transparency and clarity as the data are updated.
How these changes will be reflected in the download files
As we are now mapping all variants on all relevant Ensembl transcripts, the number of rows in the majority of variant download files has increased significantly. In the download files, additional columns are provided including the legacy identifier (COSM) and the new genomic identifier (COSV). An internal mutation identifier is also provided to uniquely represent each mutation, on a specific transcript, on a given assembly build. The accession and version number for each transcript are included. File descriptions for each of the download files will be available from the downloads page for clarity. We have included an example of the new columns below.
For example: COSMIC Complete Mutation Data (Targeted screens)
[17:Q] Mutation Id – An internal mutation identifier to uniquely represent each mutation on a specific transcript on a given assembly build.
[18:R] Genomic Mutation Id – Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release.
[19:S] Legacy Mutation Id – Legacy mutation identifier (COSM) that will represent existing COSM mutation identifiers.
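Because each variant is now listed once per relevant transcript, pipelines that count variants may need to collapse rows on the stable identifier. The sketch below is one way to do that with the Python standard library, assuming a tab-separated file whose header uses the three column names exactly as listed above (the header spelling in the real download files may differ). The sample rows and identifiers are synthetic placeholders, not COSMIC data. Rows with no COSV value fall back to the legacy COSM identifier, as described for mutations without genomic coordinates.

```python
import csv
import io
from collections import defaultdict

# Synthetic placeholder rows; identifiers are invented for illustration only.
SAMPLE_TSV = (
    "Mutation Id\tGenomic Mutation Id\tLegacy Mutation Id\n"
    "101\tCOSV00000001\tCOSM0000001\n"
    "102\tCOSV00000001\tCOSM0000001\n"
    "103\t\tCOSM0000002\n"
)


def group_by_stable_id(handle):
    """Map each stable identifier to the per-transcript mutation ids under it.

    The COSV identifier is used when present; otherwise the legacy COSM
    identifier is used as the key.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = row["Genomic Mutation Id"] or row["Legacy Mutation Id"]
        groups[key].append(row["Mutation Id"])
    return dict(groups)


groups = group_by_stable_id(io.StringIO(SAMPLE_TSV))
```

With the placeholder rows, the two transcript-level rows sharing one COSV collapse into a single entry, while the row lacking a COSV is keyed by its COSM identifier.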
We will shortly have some sample data that can be downloaded in the new table structure, to give you real data to manipulate and integrate; this will be available on the variant updates page.
How this affects you
We are aware that many of the changes we are making will affect integration into your pipelines and analytical platforms. By giving you advance notice of the changes, we hope much of this can be mitigated, and the end result of having clean, standardised data will be well worth any disruption. The variant updates page on the COSMIC website will provide a central point for this information and further technical details of the changes that we are making to COSMIC.
summarizes the clinical importance of five new lung cancer genome sequencing projects. These studies have identified genetic and epigenetic alterations in hundreds of lung tumors, of which some alterations could be taken advantage of using currently approved medications.
The reports, all published this month, included genomic information on more than 400 lung tumors. In addition to confirming genetic alterations previously tied to lung cancer, the studies identified other changes that may play a role in the disease.
“All of these studies say that lung cancers are genomically complex and genomically diverse,” said Dr. Matthew Meyerson of Harvard Medical School and the Dana-Farber Cancer Institute, who co-led several of the studies, including a large-scale analysis of squamous cell lung cancer by The Cancer Genome Atlas (TCGA) Research Network.
Some genes, Dr. Meyerson noted, were inactivated through different mechanisms in different tumors. He cautioned that little is known about alterations in DNA sequences that do not encode genes, which is most of the human genome.
Four of the papers are summarized below, with the first described in detail, as the Nature paper used a multi-‘omics strategy to evaluate expression, mutation, and signaling pathway activation in a large cohort of lung tumors. A literature informatics analysis is given for one of the papers. Please note that links on GENE names usually refer to the GeneCard entry.
Paper 1. Comprehensive genomic characterization of squamous cell lung cancers[1]
The Cancer Genome Atlas Research Network Project just reported, in the journal Nature, the results of their comprehensive profiling of 230 resected lung adenocarcinomas. The multi-center teams employed analyses of
microRNA
Whole Exome Sequencing including
Exome mutation analysis
Gene copy number
Splicing alteration
Methylation
Proteomic analysis
Summary:
Some very interesting overall findings came out of this analysis including:
High rates of somatic mutations including activating mutations in common oncogenes
Newly described loss of function MGA mutations
Sex differences in EGFR and RBM10 mutations
Driver roles for NF1, MET, ERBB2 and RIT1 identified in certain tumors
differential mutational pattern based on smoking history
splicing alterations driven by somatic genomic changes
MAPK and PI3K pathway activation identified by proteomics not explained by mutational analysis = UNEXPLAINED MECHANISM of PATHWAY ACTIVATION
However, given the plethora of data, and in light of similar study results recently released, there appears to be a great need for additional mining of this CGAP dataset. Therefore I attempted to curate some of the findings, along with other recent news relevant to the surprising findings relating to biomarker analysis.
Makeup of tumor samples
230 lung adenocarcinoma specimens were categorized by:
Subtype
33% acinar
25% solid
14% micro-papillary
9% papillary
8% unclassified
5% lepidic
4% invasive mucinous
Gender
Smoking status
81% of patients reported past or present smoking
The authors note that TCGA samples were combined with previous data for analysis purpose.
A detailed description of Methodology and the location of deposited data are given at the following addresses:
Gender and Smoking Habits Show different mutational patterns
WES mutational analysis
a) smoking status
– There was a strong correlation of cytosine-to-adenine (C>A) nucleotide transversions with past or present smoking. In fact, smoking history separated tumors into transversion-high (past and present smokers) and transversion-low (never smokers) groups, corroborating previous results.
→ Genes with mutations enriched in each group:
Transversion High: TP53, KRAS, STK11, KEAP1, SMARCA4, RBM10
Transversion Low: EGFR, RB1, PIK3CA
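To make the transversion-high versus transversion-low grouping concrete, the sketch below computes the fraction of a tumor's single-nucleotide variants that are C>A transversions, counting G>T on the opposite strand as the same event. The cutoff used to assign a group is a hypothetical placeholder; the criterion actually applied in the study is not stated in this summary, and the function names are illustrative only.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def c_to_a_fraction(snvs):
    """Fraction of SNVs that are C>A transversions (G>T is counted as C>A)."""
    if not snvs:
        return 0.0
    hits = 0
    for ref, alt in snvs:
        if ref in "AG":
            # Express purine-referenced changes from the complementary strand.
            ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        if (ref, alt) == ("C", "A"):
            hits += 1
    return hits / len(snvs)


def transversion_group(snvs, cutoff=0.3):
    """Assign a group label; the 0.3 cutoff is an arbitrary illustrative value."""
    if c_to_a_fraction(snvs) >= cutoff:
        return "transversion high"
    return "transversion low"


smoker_like = [("C", "A"), ("G", "T"), ("C", "T"), ("A", "G")]
never_smoker_like = [("C", "T"), ("C", "T"), ("T", "C"), ("G", "A")]
```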
b) Gender
Although gender differences in mutational profiles have been reported, the study found a minimal number of significantly mutated genes correlated with gender. Notably:
EGFR mutations enriched in female cohort
RBM10 loss of function mutations enriched in male cohort
Although the study did not analyze the gender differences with smoking patterns, it was noted that RBM10 mutations among males were more prevalent in the transversion high group.
Whole exome Sequencing and copy number analysis reveal Unique, Candidate Driver Genes
Whole exome sequencing revealed that 62% of tumors contained mutations (either point or indel) in known cancer driver genes such as:
KRAS, EGFR, BRAF, ERBB2
However, the authors looked at the WES data from the oncogene-negative tumors and found unique mutations not seen in the tumors containing canonical oncogenic mutations.
Unique potential driver mutations were found in
TP53, KEAP1, NF1, and RIT1
The genomics and expression data were backed up by a proteomics analysis of three pathways:
MAPK pathway
mTOR
PI3K pathway
…. showing significant activation of all three pathways HOWEVER the analysis suggested that activation of signaling pathways COULD NOT be deduced from DNA sequencing alone. Phospho-proteomic analysis was required to determine the full extent of pathway modification.
For example, many tumors lacked an obvious mutation which could explain mTOR or MAPK activation.
Altered cell signaling pathways included:
Increased MAPK signaling due to activating KRAS
Higher mTOR due to inactivating STK11 leading to increased proliferation, translation
Pathway analysis of mutations revealed alterations in multiple cellular pathways including:
Reduced oxidative stress response
Nucleosome remodeling
RNA splicing
Cell cycle progression
Histone methylation
Summary:
Authors noted some interesting conclusions including:
MET and ERBB2 amplification and mutations in NF1 and RIT1 may be unique driver events in lung adenocarcinoma
Possible new drug development could be targeted to the RTK/RAS/RAF pathway
MYC pathway as another important target
Cluster analysis using a multimodal omics approach identifies some tumors based on single-gene driver events, while other tumors have multiple driver mutational events (TUMOR HETEROGENEITY)
Paper 2. A Genomics-Based Classification of Human Lung Tumors[2]
3,726 point mutations and more than 90 indels in the coding sequence
Smokers with lung cancer show 10× as many point mutations as never-smokers
Novel lung cancer genes, including DACH1, CFTR, RELN, ABCB5, and HGF were identified
Tumor samples from males showed a high frequency of MYCBP2 mutations; MYCBP2 is involved in transcriptional regulation of MYC.
Variant allele frequency analysis revealed that 10/17 tumors were at least biclonal while 7/17 tumors were monoclonal, showing that the majority of tumors displayed tumor heterogeneity
Novel pathway alterations in lung cancer include cell-cycle and JAK-STAT pathways
14 fusion proteins found, including ROS1-ALK fusion. ROS1-ALK fusions have been frequently found in lung cancer and are indicative of poor prognosis[4].
Novel metabolic enzyme fusions
Alterations were identified in 54 genes for which targeted drugs are available. Druggable mutant targets include: AURKC, BRAF, HGF, EGFR, ERBB4, FGFR1, MET, JAK2, JAK3, HDAC2, HDAC6, HDAC9, BIRC6, ITGB1, ITGB3, MMP2, PRKCB, PIK3CG, TERT, KRAS, MMP14
Table. Validated Gene-Fusions Obtained from Ref-Seq Data
Note: Gene columns contain links for GeneCard while Gene function links are to the gene’s GO (Gene Ontology) function.
There is recent literature on the importance of the EML4-ALK fusion protein in lung cancer. EML4-ALK positive lung tumors were found to be less chemosensitive to cytotoxic therapy[5] and these tumor cells may exhibit an epitope rendering these tumors amenable to immunotherapy[6]. In addition, inhibition of the PI3K pathway has sensitized EML4-ALK fusion positive tumors to ALK-targeted therapy[7]. EML4-ALK fusion positive tumors show dependence on the HSP90 chaperone, suggesting this cohort of patients might benefit from the new HSP90 inhibitors recently being developed[8].
Table. Significantly mutated genes (point mutations, insertions/deletions) with associated function.
Table. Literature Analysis of pathways containing significantly altered genes in NSCLC reveal putative targets and risk factors, linkage between other tumor types, and research areas for further investigation.
Note: Significantly mutated genes, obtained from WES, were subjected to pathway analysis (KEGG Pathway Analysis) in order to see which pathways contained significantly altered gene networks. Each pathway term was then used for a PubMed literature search together with the terms “lung cancer”, “gene”, and “NOT review” to determine the frequency of literature coverage for each pathway in lung cancer. Links are to the PubMed search results.
KEGG pathway Name
# of PubMed entries containing Pathway Name, Gene AND Lung Cancer
A few interesting genetic risk factors and possible additional targets for NSCLC were deduced from analysis of the above table of literature including HIF1-α, mIR-31, UBQLN1, ACE, mIR-193a, SRSF1. In addition, glioma, melanoma, colorectal, and prostate and lung cancer share many validated mutations, and possibly similar tumor driver mutations.
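The literature search described in the note to the table above can be approximated programmatically. The sketch below only builds a request URL for the NCBI E-utilities ESearch endpoint and does not contact the network. The exact query string used for the original table is not given, so the way the terms are combined here (quoted pathway name, "lung cancer", gene, and exclusion of the Review publication type) is an assumption, as is the function name.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint for querying PubMed.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_count_url(pathway_name):
    """Build an ESearch URL that asks PubMed for a hit count for one pathway.

    The term combines the pathway name with "lung cancer" and "gene" and
    excludes reviews; this phrasing is an assumed reconstruction.
    """
    term = (
        f'"{pathway_name}" AND "lung cancer" AND gene '
        "NOT review[Publication Type]"
    )
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    return ESEARCH + "?" + urlencode(params)


url = pubmed_count_url("MAPK signaling pathway")
```

Fetching the URL returns an XML document whose count element gives the number of matching PubMed records. Looping over the KEGG pathway names would reproduce a coverage table of the kind described, subject to the query assumptions above.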
Paper 4. Mapping the Hallmarks of Lung Adenocarcinoma with Massively Parallel Sequencing[9]
Exome and genome characterization of somatic alterations in 183 lung adenocarcinomas
12 somatic mutations/megabase
U2AF1, RBM10, and ARID1A are among newly identified recurrently mutated genes
Structural variants include activating in-frame fusion of EGFR
Epigenetic and RNA deregulation proposed as a potential lung adenocarcinoma hallmark
Summary
Lung adenocarcinoma, the most common subtype of non-small cell lung cancer, is responsible for more than 500,000 deaths per year worldwide. Here, we report exome and genome sequences of 183 lung adenocarcinoma tumor/normal DNA pairs. These analyses revealed a mean exonic somatic mutation rate of 12.0 events/megabase and identified the majority of genes previously reported as significantly mutated in lung adenocarcinoma. In addition, we identified statistically recurrent somatic mutations in the splicing factor gene U2AF1 and truncating mutations affecting RBM10 and ARID1A. Analysis of nucleotide context-specific mutation signatures grouped the sample set into distinct clusters that correlated with smoking history and alterations of reported lung adenocarcinoma genes. Whole-genome sequence analysis revealed frequent structural rearrangements, including in-frame exonic alterations within EGFR and SIK2 kinases. The candidate genes identified in this study are attractive targets for biological characterization and therapeutic targeting of lung adenocarcinoma.
Paper 5. Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer[10]
Highlights
Whole exome and transcriptome (RNASeq) sequencing of 29 small-cell lung carcinomas
High mutation rate: 7.4 protein-changing mutations/million base pairs
Inactivating mutations in TP53 and RB1
Functional mutations in CREBBP, EP300, MLL, PTEN, SLIT2, EPHA7, FGFR1 (determined by literature and database mining)
The mutational spectrum seen in human data is also present in a Tp53-/- Rb1-/- mouse lung tumor model
Curator Graphical Summary of Interesting Findings From the Above Studies
The above figure represents themes and findings resulting from the aforementioned studies, including questions which will be addressed in future posts on this site.
UPDATED 10/10/2021
The following article uses RNASeq to screen lung adenocarcinomas for fusion proteins in patients with either low or high tumor mutational burden. Findings included the presence of MET fusion proteins in addition to other fusion proteins, irrespective of whether tumors were driver negative by DNASeq screening.
High Yield of RNA Sequencing for Targetable Kinase Fusions in Lung Adenocarcinomas with No Mitogenic Driver Alteration Detected by DNA Sequencing and Low Tumor Mutation Burden
Ryma Benayed, Michael Offin, Kerry Mullaney, Purvil Sukhadia, Kelly Rios, Patrice Desmeules, Ryan Ptashkin, Helen Won, Jason Chang, Darragh Halpenny, Alison M. Schram, Charles M. Rudin, David M. Hyman, Maria E. Arcila, Michael F. Berger, Ahmet Zehir, Mark G. Kris, Alexander Drilon and Marc Ladanyi
Purpose: Targeted next-generation sequencing of DNA has become more widely used in the management of patients with lung adenocarcinoma; however, no clear mitogenic driver alteration is found in some cases. We evaluated the incremental benefit of targeted RNA sequencing (RNAseq) in the identification of gene fusions and MET exon 14 (METex14) alterations in DNA sequencing (DNAseq) driver–negative lung cancers.
Experimental Design: Lung cancers driver negative by MSK-IMPACT underwent further analysis using a custom RNAseq panel (MSK-Fusion). Tumor mutation burden (TMB) was assessed as a potential prioritization criterion for targeted RNAseq.
Results: As part of prospective clinical genomic testing, we profiled 2,522 lung adenocarcinomas using MSK-IMPACT, which identified 195 (7.7%) fusions and 119 (4.7%) METex14 alterations. Among 275 driver-negative cases with available tissue, 254 (92%) had sufficient material for RNAseq. A previously undetected alteration was identified in 14% (36/254) of cases, 33 of which were actionable (27 in-frame fusions, 6 METex14). Of these 33 patients, 10 then received matched targeted therapy, which achieved clinical benefit in 8 (80%). In the 32% (81/254) of DNAseq driver–negative cases with low TMB [0–5 mutations/Megabase (mut/Mb)], 25 (31%) were positive for previously undetected gene fusions on RNAseq, whereas, in 151 cases with TMB >5 mut/Mb, only 7% were positive for fusions (P < 0.0001).
Conclusions: Targeted RNAseq assays should be used in all cases that appear driver negative by DNAseq assays to ensure comprehensive detection of actionable gene rearrangements. Furthermore, we observed a significant enrichment for fusions in DNAseq driver–negative samples with low TMB, supporting the prioritization of such cases for additional RNAseq.
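The percentages quoted in the Results can be re-derived from the counts given there. The short sketch below does only that arithmetic. The dictionary labels are informal descriptions added for this example, and every numerator and denominator is taken from the abstract text above.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# (numerator, denominator) pairs as stated in the Results paragraph.
reported = {
    "fusions among profiled tumors": (195, 2522),
    "METex14 among profiled tumors": (119, 2522),
    "driver-negative cases with sufficient material": (254, 275),
    "previously undetected alterations": (36, 254),
    "actionable alterations": (33, 254),
    "low-TMB share of driver-negative cases": (81, 254),
    "fusion-positive among low-TMB cases": (25, 81),
    "clinical benefit on matched therapy": (8, 10),
}

recomputed = {label: pct(n, d) for label, (n, d) in reported.items()}

for label, value in recomputed.items():
    print(f"{label}: {value}%")
```

Each recomputed value agrees with the rounded figure quoted in the abstract (for example, 36/254 gives 14.2%, reported as 14%).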
Translational Relevance
Inhibitors targeting kinase fusions have shown dramatic and durable responses in lung cancer patients, making their comprehensive detection critical. Here, we evaluated the incremental benefit of targeted RNA sequencing (RNAseq) in the identification of gene fusions in patients where no clear mitogenic driver alteration is found by DNA sequencing (DNAseq)–based panel testing. We found actionable alterations (kinase fusions or MET exon 14 skipping) in 13% of cases apparently driver negative by previous DNAseq testing. Among the driver-negative samples tested by RNAseq, those with low tumor mutation burden (TMB) were significantly enriched for gene fusions when compared with the ones with higher TMB. In a clinical setting, such patients should be prioritized for RNAseq. Thus, a rational, algorithmic approach to the use of targeted RNA-based next-generation sequencing (NGS) to complement large panel DNA-based NGS testing can be highly effective in comprehensively uncovering targetable gene fusions or oncogenic isoforms not just in lung cancer but also more generally across different tumor types.
Wake Up and Smell the Fusions: Single-Modality Molecular Testing Misses Drivers
by Kurtis D. Davies and Dara L. Aisner
Abstract
Multitarget assays have become common in clinical molecular diagnostic laboratories. However, all assays, no matter how well designed, have inherent gaps due to technical and biological limitations. In some clinical cases, testing by multiple methodologies is needed to address these gaps and ensure the most accurate molecular diagnoses.
In this issue of Clinical Cancer Research, Benayed and colleagues illustrate the growing need to consider multiple molecular testing methodologies for certain clinical specimens (1). The rapidly expanding list of actionable molecular alterations across cancer types has resulted in the wide adoption of multitarget testing approaches, particularly those based on next-generation sequencing (NGS). NGS-based assays are commonly viewed as “one-stop shops” to detect a vast array of molecular variants. However, as Benayed and colleagues discuss, even well-designed and highly vetted NGS assays have inherent gaps that, under certain circumstances, are ideally addressed by analyzing the sample using an alternative approach.
In the article, the authors examined a cohort of lung adenocarcinoma patient samples that had been deemed “driver-negative” via MSK-IMPACT, an FDA-cleared test that is widely considered by experts in the field to be one of the best examples of a DNA-based large gene panel NGS assay (2). Of 589 driver-negative cases, 254 had additional material amenable for a different approach: RNA-based NGS designed specifically for gene fusion and oncogenic gene isoform detection. After accounting for quality control failures, 232 samples were successfully sequenced, and, among these, 36 samples (representing an astonishing 15.5% of tested cases) were found to be positive for a driver gene fusion or oncogenic isoform that had not been detected by DNA-based NGS. The real-world value derived from this orthogonal testing schema was more than theoretical, with 8 of 10 (80%) patients demonstrating clinical benefit when treated according to the alteration identified via the RNA-based approach.
To detect gene rearrangements that lead to oncogenic gene fusions (and to detect mutations and insertions/deletions that lead to MET exon 14 skipping), MSK-IMPACT employs hybrid capture-based enrichment of selected intronic regions from genomic DNA. While this approach has proven to be successful in a variety of settings, there are associated limitations that were determined in this study to underlie the discrepancies between MSK-IMPACT and the RNA-based assay. First, some introns that are involved in clinically actionable rearrangement events are very large, thus requiring substantial sequencing capital that can represent a disproportionate fraction of the assay. Despite the ability via NGS to perform sequencing at a large scale, this sequencing capacity is still finite, and thus decisions must be made to sacrifice coverage of certain large genomic regions to ensure sufficient sequencing depth for other desired genomic targets. In the case of MSK-IMPACT (and most other DNA-based NGS assays), certain important introns in NTRK3 and NRG1 are not included in covered content, simply because they are too large (>90 Kb each). The second primary problem with DNA-based analysis of introns is that they often contain highly repetitive elements that are extremely difficult to assess via NGS due to their recurring presence across the genome. Attempts to sequence these regions are largely unfruitful because any sequencing data obtained cannot be specifically aligned/mapped to the desired targeted region of the genome (3). This is particularly true for intron 31 of ROS1, because it contains two repetitive long interspersed nuclear elements, and many DNA-based assays, including MSK-IMPACT, poorly cover this intron (4). In this study by Benayed and colleagues, the most common discrepant alteration was fusion involving ROS1, which accounted for 10 of 36 (28%) cases. 
At least six of these, those that demonstrated fusion to ROS1 exon 32, were likely directly explained by incomplete intron 31 sequencing. RNA-based analysis is able to overcome the above described limitations owing to the simple fact that sequencing is focused on exons post-splicing and the need to sequence introns is entirely avoided (Fig. 1).
Figure 1. Schematic representation of underlying genomic complexities that can lead to false-negative gene fusion results in DNA-based NGS analysis. In some cases, RNA-based approaches may overcome the limitations of DNA-based testing.
Lack of sufficient intronic coverage could not account for all of the discrepancies between DNA-based and RNA-based analysis however. Six samples in the cohort were found to be positive for MET exon 14 skipping based on RNA. In five of these, genomic alterations in MET introns 13 or 14 were observed, however they did not conform to canonical splice site alterations and thus were not initially called (although this was addressed by bioinformatics updates). In RNA-based testing, however, determination of exon skipping is simplified such that, regardless of the specific genomic alteration that interferes with splicing, absence of the exon in the transcript is directly observed (5). In another two of the discrepant cases, tumor purity was observed to be low in the sample, meaning that the expected variant allele frequency (VAF) for a genomic event would also likely be low, potentially below detectable levels. However, overexpression of the fusions at the transcript level was theorized to compensate for low VAF (Fig. 1). Additional explanations for discordant findings between the assays included sample-specific poor sequencing in selected introns and complex rearrangements that hindered proper capture (Fig. 1).
The take home message from Benayed and colleagues is simply this: there is no perfect assay that will detect 100% of the potential actionable alterations in patient samples. Even an extremely well designed, thoroughly vetted, and FDA-cleared assay such as MSK-IMPACT will have inherent and unavoidable “holes” due to intrinsic limitations. The solution to this dilemma, as adeptly described by Benayed and colleagues, is additional testing using a different approach. While in an ideal world every clinical tumor sample would be tested by multiple modalities to ensure the most comprehensive clinical assessment, the reality is that these samples are often scant and testing is fiscally burdensome (and often not reimbursed). Therefore, algorithms to determine which samples should be reflexed to secondary assays after testing with a primary assay are critical for maximizing benefit. In this study, the first algorithmic step was lack of an identified driver (because activated oncogenic drivers tend to exist exclusively of each other), which amounted to 23% of samples tested with the primary assay. In addition, the authors found a significantly higher rate of actionable gene fusions in samples with a low (<5 mut/Mb) tumor mutational burden, meaning that this metric, which was derived from the primary assay, could also be used to help inform decision making regarding additional testing. While this scenario is somewhat specific to lung cancer, similar approaches could be prescribed on a cancer type–specific basis.
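The two-step reflex logic described above (first, no driver identified by the primary DNA assay; second, low tumor mutational burden) can be written down as a small decision function. This is a minimal sketch under stated assumptions: the function name, the returned labels, and the treatment of the boundary value are all hypothetical. The abstract defines low TMB as 0–5 mut/Mb while the commentary says below 5 mut/Mb, and the sketch follows the abstract by treating exactly 5 as low.

```python
def rnaseq_reflex_priority(driver_found_by_dna, tmb_mut_per_mb, low_tmb_cutoff=5.0):
    """Suggest whether a sample is reflexed to RNA-based testing.

    driver_found_by_dna: True if the primary DNA assay identified a driver.
    tmb_mut_per_mb: tumor mutation burden from the primary assay (mut/Mb).
    low_tmb_cutoff: assumed inclusive upper bound for 'low' TMB.
    """
    if driver_found_by_dna:
        # Oncogenic drivers tend to be mutually exclusive, so no reflex is suggested.
        return "no reflex"
    if tmb_mut_per_mb <= low_tmb_cutoff:
        # Driver-negative and low TMB: the group most enriched for fusions.
        return "reflex, high priority"
    # Driver-negative with higher TMB: still eligible, lower expected yield.
    return "reflex"


decisions = [
    rnaseq_reflex_priority(True, 2.0),
    rnaseq_reflex_priority(False, 3.5),
    rnaseq_reflex_priority(False, 12.0),
]
```

Keeping driver-negative cases with higher TMB eligible reflects the study's conclusion that RNAseq should be used in all apparently driver-negative cases, with TMB serving only to prioritize when tissue or budget is limited.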
These findings should be considered a “wake-up call” for oncologists regarding the ordering and interpretation of molecular testing. It is clear from these and other published findings that advanced molecular analysis has limitations that require nuanced technical understanding. As this arena evolves, it is critical for oncologists (and trainees) to gain an increased comprehension of how to identify when the “gaps” in a test might be most clinically relevant. This requires a level of technical cognizance that has not previously been expected of clinical practitioners, yet it is underscored by the reality that opportunities for effective targeted therapy can and will be missed if the treating oncologist is unaware of how best to identify patients for whom additional testing is warranted. This study also highlights the mantra of “no test is perfect,” regardless of the prestige of the testing institution, the number of past tests performed, or regulatory status. NGS, despite its benefits, is not all-encompassing. It is only through the adaptability of laboratories in applying knowledge such as that provided by Benayed and colleagues that advances in laboratory medicine can be quickly deployed to maximize benefits for oncology patients.
Govindan R, Ding L, Griffith M, Subramanian J, Dees ND, Kanchi KL, Maher CA, Fulton R, Fulton L, Wallis J et al: Genomic landscape of non-small cell lung cancer in smokers and never-smokers. Cell 2012, 150(6):1121-1134.
Takeuchi K, Soda M, Togashi Y, Suzuki R, Sakata S, Hatano S, Asaka R, Hamanaka W, Ninomiya H, Uehara H et al: RET, ROS1 and ALK fusions in lung cancer. Nature medicine 2012, 18(3):378-381.
Morodomi Y, Takenoyama M, Inamasu E, Toyozawa R, Kojo M, Toyokawa G, Shiraishi Y, Takenaka T, Hirai F, Yamaguchi M et al: Non-small cell lung cancer patients with EML4-ALK fusion gene are insensitive to cytotoxic chemotherapy. Anticancer research 2014, 34(7):3825-3830.
Yoshimura M, Tada Y, Ofuzi K, Yamamoto M, Nakatsura T: Identification of a novel HLA-A 02:01-restricted cytotoxic T lymphocyte epitope derived from the EML4-ALK fusion gene. Oncology reports 2014, 32(1):33-39.
Workman P, van Montfort R: EML4-ALK fusions: propelling cancer but creating exploitable chaperone dependence. Cancer discovery 2014, 4(6):642-645.
Imielinski M, Berger AH, Hammerman PS, Hernandez B, Pugh TJ, Hodis E, Cho J, Suh J, Capelletti M, Sivachenko A et al: Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 2012, 150(6):1107-1120.
Peifer M, Fernandez-Cuesta L, Sos ML, George J, Seidel D, Kasper LH, Plenker D, Leenders F, Sun R, Zander T et al: Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer. Nature genetics 2012, 44(10):1104-1110.
Other posts on this site which refer to Lung Cancer and Cancer Genome Sequencing include: