Cancer Mutations Across the Landscape

Curator: Larry H. Bernstein, MD, FCAP

This is an up-to-date article about the significance of mutations found in 12 major types of cancer.

Mutational landscape and significance across 12 major cancer types

Cyriac Kandoth1*, Michael D. McLellan1*, Fabio Vandin2, Kai Ye1,3, Beifang Niu1, Charles Lu1, et al.

1The Genome Institute, Washington University in St Louis, Missouri 63108, USA. 2Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA. 3Department of Genetics, Washington University in St Louis, Missouri 63108, USA. 4Department of Medicine, Washington University in St Louis, Missouri 63108, USA. 5Siteman Cancer Center, Washington University in St Louis, Missouri 63108, USA. 6Department of Mathematics, Washington University in St Louis, Missouri 63108, USA.

Nature 17 Oct 2013; 502

The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to

  1. tissues of origin,
  2. environmental/carcinogen influences, and
  3. DNA repair defects.

Using the integrated data sets, we identified 127 significantly mutated genes from

  1. well-known cellular processes in cancer (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control), and
  2. emerging ones (for example, histone, histone modification, splicing, metabolism and proteolysis).

The average number of mutations in these significantly mutated genes varies across tumour types;

  1. most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small.
  2. Mutations in transcriptional factors/regulators show tissue specificity, whereas
  3. histone modifiers are often mutated across several cancer types.

Clinical association analysis identifies genes having a significant effect on survival, and

  • investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis.

Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.


The advancement of DNA sequencing technologies now enables the processing of thousands of tumours of many types for systematic mutation discovery. This expansion of scope, coupled with appreciable progress in algorithms1–5, has led directly to characterization of significant functional mutations, genes and pathways6–18. Cancer encompasses more than 100 related diseases19, making it crucial to understand the commonalities and differences among various types and subtypes. TCGA was founded to address these needs, and its large data sets are providing unprecedented opportunities for systematic, integrated analysis.

We performed a systematic analysis of 3,281 tumours from 12 cancer types to investigate underlying mechanisms of cancer initiation and progression. We describe variable mutation frequencies and contexts and their associations with environmental factors and defects in DNA repair. We identify 127 significantly mutated genes (SMGs) from diverse signalling and enzymatic processes. The finding of a TP53-driven breast, head and neck, and ovarian cancer cluster with a dearth of other mutations in SMGs suggests common therapeutic strategies might be applied for these tumours. We determined interactions among mutations and correlated mutations in BAP1, FBXW7 and TP53 with detrimental phenotypes across several cancer types. The subclonal structure and transcription status of underlying somatic mutations reveal the trajectory of tumour progression in patients with cancer.

Standardization of mutation data

Stringent filters (Methods) were applied to ensure high quality mutation calls for 12 cancer types: breast adenocarcinoma (BRCA), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), uterine corpus endometrial carcinoma (UCEC), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), colon and rectal carcinoma (COAD, READ), bladder urothelial carcinoma (BLCA), kidney renal clear cell carcinoma (KIRC), ovarian serous carcinoma (OV) and acute myeloid leukaemia (LAML; conventionally called AML) (Supplementary Table 1). A total of 617,354 somatic mutations, consisting of

  • 398,750 missense,
  • 145,488 silent,
  • 36,443 nonsense,
  • 9,778 splice site,
  • 7,693 non-coding RNA,
  • 523 non-stop/readthrough,
  • 15,141 frameshift insertions/deletions (indels) and
  • 3,538 inframe indels,

were included for downstream analyses (Supplementary Table 2).
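As a quick consistency check, the eight class counts listed above can be summed and compared with the stated total. The sketch below uses only the figures quoted in this section; the helper name `percent_of_total` is introduced here for illustration.

```python
# Somatic mutation counts by class, copied from the list above.
counts = {
    "missense": 398_750,
    "silent": 145_488,
    "nonsense": 36_443,
    "splice site": 9_778,
    "non-coding RNA": 7_693,
    "non-stop/readthrough": 523,
    "frameshift indel": 15_141,
    "inframe indel": 3_538,
}

total = sum(counts.values())


def percent_of_total(label):
    """Share of all included mutations that fall in one class, in percent."""
    return 100.0 * counts[label] / total


for label, n in counts.items():
    print(f"{label:>22}: {n:>7,} ({percent_of_total(label):5.2f}%)")
print(f"{'total':>22}: {total:>7,}")
```

The classes add up to the 617,354 mutations reported, and missense changes account for roughly two thirds of them.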

Distinct mutation frequencies and sequence context

Figure 1a shows that AML has the lowest median mutation frequency and LUSC the highest (0.28 and 8.15 mutations per megabase (Mb), respectively). Besides AML, all types average over 1 mutation per Mb, substantially higher than in pediatric tumours20. Clustering21 illustrates that

  • mutation frequencies for KIRC, BRCA, OV and AML are normally distributed within a single cluster, whereas
  • other types have several clusters (for example, 5 and 6 clusters in UCEC and COAD/ READ, respectively) (Fig. 1a and Supplementary Table 3a, b).
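To make the unit concrete, here is a minimal sketch of how a per-tumour mutation frequency in mutations per Mb and a per-type median could be computed. The tumour names, mutation counts and covered-base figures are hypothetical, and the excerpt does not say how the authors defined the covered territory, so the denominator is an assumption.

```python
from statistics import median


def mutations_per_mb(n_mutations, covered_bases):
    """Mutation frequency in mutations per megabase of covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1_000_000)


# Hypothetical tumours: (somatic mutation count, bases with adequate coverage).
tumours = {
    "T1": (9, 30_000_000),
    "T2": (45, 30_000_000),
    "T3": (240, 32_000_000),
}

freqs = {name: mutations_per_mb(n, bases) for name, (n, bases) in tumours.items()}
median_freq = median(freqs.values())

for name, freq in freqs.items():
    print(f"{name}: {freq:.2f} mutations/Mb")
print(f"median: {median_freq:.2f} mutations/Mb")
```

Normalizing by covered bases rather than by a fixed exome size keeps tumours with different sequencing coverage comparable.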

In UCEC, the largest patient cluster has a frequency of approximately 1.5 mutations per Mb, and

  • the cluster with the highest frequency is more than 150 times greater.

Multiple clusters suggest that factors other than age contribute to development in these tumours14,16. Indeed,

  • there is a significant correlation between high mutation frequency and DNA repair pathway genes (for example, PRKDC, TP53 and MSH6) (Supplementary Table 3c). Notably,
  • PRKDC mutations are associated with high frequency in BLCA, COAD/READ, LUAD and UCEC, whereas
  • TP53 mutations are associated with higher frequencies in AML, BLCA, BRCA, HNSC, LUAD, LUSC and UCEC (all P < 0.05).

Mutations in POLQ and POLE associate with high frequencies in multiple cancer types; POLE association in UCEC is consistent with previous observations14.

Comparison of spectra across the 12 types (Fig. 1b and Supplementary Table 3d) reveals that LUSC and LUAD contain increased C>A transversions, a signature of cigarette smoke exposure10. Sequence context analysis across 12 types revealed

  • the largest difference being in C>T transitions and C>G transversions (Fig. 1c).
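The six transition/transversion categories are conventionally obtained by reporting every substitution on the strand where the reference base is a pyrimidine, so that G>T is counted as C>A, and so on. The following is a minimal sketch of that convention, not the authors' pipeline; the function names and the toy calls are made up for illustration.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITIONS = {"C>T", "T>C"}  # the other four classes are transversions


def substitution_class(ref, alt):
    """Collapse a single-base substitution into one of six classes:
    C>A, C>G, C>T, T>A, T>C or T>G."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: report on the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return substitution_class(ref, alt) in TRANSITIONS


# Toy (reference, alternate) calls, invented for this example.
toy_calls = [("C", "A"), ("G", "T"), ("C", "T"), ("A", "G"), ("G", "A")]
spectrum = Counter(substitution_class(r, a) for r, a in toy_calls)
print(spectrum)
```

Dividing each count in `spectrum` by the number of calls gives the kind of per-type proportions that a mutation spectrum plot compares.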

The frequency of thymine 1 bp (base pair) upstream of C>G transversions is markedly higher in BLCA, BRCA and HNSC than in other cancer types (Extended Data Fig. 1). GBM, AML, COAD/READ and UCEC have similar contexts in that

  • the proportions of guanine 1 base downstream of C>T transitions are between 59% and 67%, substantially higher than the approximately 40% in other cancer types.

Higher frequencies of transition mutations at CpG in gastrointestinal tumours, including colorectal, were previously reported22. We found three additional cancer types (GBM, AML and UCEC) clustered in the C>T mutation at CpG, consistent with previous findings of

  • aberrant DNA methylation in endometrial cancer23 and glioblastoma24.

BLCA has a unique signature for C>T transitions compared to the other types (enriched for TC) (Extended Data Fig. 1).
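The context statistics above (thymine 1 bp upstream, guanine 1 base downstream) depend on reading the flanking bases on the same strand as the mutated pyrimidine. A small sketch of that bookkeeping follows, assuming 0-based positions on a forward-strand reference; the sequence, the variants and both function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrimidine_context(sequence, pos, ref):
    """Return (upstream, downstream) bases around a variant, read 5'->3'
    on the strand where the reference base is a pyrimidine (C or T)."""
    sequence = sequence.upper()
    if not 0 < pos < len(sequence) - 1:
        raise ValueError("variant needs one flanking base on each side")
    if sequence[pos] != ref:
        raise ValueError("reference base does not match the sequence")
    left, right = sequence[pos - 1], sequence[pos + 1]
    if ref in ("C", "T"):
        return left, right
    # Purine reference: flip to the opposite strand, which reverses the order.
    return COMPLEMENT[right], COMPLEMENT[left]


def fraction_with_downstream_g(sequence, variants):
    """Fraction of C>T transitions (G>A on the forward strand included)
    that have a guanine immediately downstream, i.e. sit in a CpG."""
    hits = total = 0
    for pos, ref, alt in variants:
        if (ref, alt) not in (("C", "T"), ("G", "A")):
            continue
        total += 1
        _, downstream = pyrimidine_context(sequence, pos, ref)
        hits += downstream == "G"
    return hits / total if total else float("nan")


# Toy forward-strand sequence and (position, ref, alt) calls, invented here.
toy_seq = "ACGTTCAGCGA"
toy_variants = [(1, "C", "T"), (5, "C", "T"), (9, "G", "A"), (7, "G", "A")]
cpg_fraction = fraction_with_downstream_g(toy_seq, toy_variants)
print(cpg_fraction)
```

The same helper can be used to tally the upstream base, for example to look for the thymine enrichment reported for BLCA.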

Significantly mutated genes

Genes under positive selection, either in individual or multiple tumour types, tend to display higher mutation frequencies above background. Our statistical analysis3, guided by expression data and curation (Methods), identified 127 such genes (SMGs; Supplementary Table 4). These SMGs are involved in a wide range of cellular processes, broadly classified into 20 categories (Fig. 2), including

  • transcription factors/regulators, histone modifiers, genome integrity, receptor tyrosine kinase signalling, cell cycle, mitogen-activated protein kinases (MAPK) signalling, phosphatidylinositol-3-OH kinase (PI(3)K) signalling, Wnt/β-catenin signalling, histones, ubiquitin-mediated proteolysis, and splicing (Fig. 2).

The identification of MAPK, PI(3)K and Wnt/β-catenin signalling pathways is consistent with classical cancer studies. Notably, newer categories (for example, splicing, transcription regulators, metabolism, proteolysis and histones) emerge as exciting guides for the development of new therapeutic targets. Genes categorized as histone modifiers (Z = 0.57), PI(3)K signalling (Z = 1.03), and genome integrity (Z = 0.66) all relate to more than one cancer type, whereas

  • transcription factor/regulator (Z = 0.40), TGF-β signalling (Z = 0.66), and Wnt/β-catenin signalling (Z = 0.55) genes tend to associate with single types (Methods).

Notably, 3,053 out of 3,281 total samples (93%) across the Pan-Cancer collection had at least one non-synonymous mutation in at least one SMG. The average number of point mutations and small indels in these genes varies across tumour types, with the highest (~6 mutations per tumour) in UCEC, LUAD and LUSC, and the lowest (~2 mutations per tumour) in AML, BRCA, KIRC and OV. This suggests that the numbers of both cancer-related genes (only 127 identified in this study) and cooperating driver mutations required during oncogenesis are small (most cases only had 2–6) (Fig. 3), although large-scale structural rearrangements were not included in this analysis.

Common mutations

The most frequently mutated gene in the Pan-Cancer cohort is TP53 (42% of samples). Its mutations predominate in serous ovarian (95%) and serous endometrial carcinomas (89%) (Fig. 2). TP53 mutations are also associated with basal subtype breast tumours. PIK3CA is the second most commonly mutated gene, occurring frequently (>10%) in most cancer types except OV, KIRC, LUAD and AML. PIK3CA mutations were frequent in UCEC (52%) and BRCA (33.6%), being specifically enriched in luminal subtype tumours. Tumours lacking PIK3CA mutations often had mutations in PIK3R1, with the highest occurrences in UCEC (31%) and GBM (11%) (Fig. 2).

Many cancer types carried mutations in chromatin re-modelling genes. In particular, histone-lysine N-methyltransferase genes (MLL2 (also known as KMT2D), MLL3 (KMT2C) and MLL4 (KMT2B)) cluster in bladder, lung and endometrial cancers, whereas the lysine (K)-specific demethylase KDM5C is prevalently mutated in KIRC (7%). Mutations in ARID1A are frequent in BLCA, UCEC, LUAD and LUSC, whereas mutations in ARID5B predominate in UCEC (10%) (Fig. 2).

Fig. 1.  Distribution of mutation frequencies across 12 cancer types.

Dashed grey and solid white lines denote average across cancer types and median for each type, respectively. b, Mutation spectrum of six transition (Ti) and transversion (Tv) categories for each cancer type. c, Hierarchically clustered mutation context (defined by the proportion of A, T, C and G nucleotides within ±2bp of variant site) for six mutation categories. Cancer types correspond to colours in a. Colour denotes degree of correlation: yellow (r = 0.75) and red (r = 1).

Fig. 2. The 127 SMGs from 20 cellular processes in cancer identified in the 12 cancer types and Pan-Cancer, with the highest percentage for each gene among the 12 types indicated (not shown)

Fig. 3.  Distribution of mutations in 127 SMGs across Pan-Cancer cohort.

Box plot displays median numbers of non-synonymous mutations, with outliers shown as dots. In total, 3,210 tumours were used for this analysis (hypermutators excluded).

Fig. 4. Unsupervised clustering based on mutation status of SMGs. Tumours having no mutation or more than 500 mutations were excluded. A mutation status matrix was constructed for 2,611 tumours. Major clusters of mutations detected in UCEC, COAD, GBM, AML, KIRC, OV and BRCA were highlighted. Complete gene list shown in Extended Data Fig. 3 (not shown).

Fig. 5. Driver initiation and progression mutations and tumour clonal architecture.

Survival Analysis

We examined which genes correlate with survival using the Cox proportional hazards model, first analysing individual cancer types using age and gender as covariates; an average of 2 genes (range: 0–4) with mutation frequency ≥2% were significant (P ≤ 0.05) in each type (Supplementary Table 10a and Extended Data Fig. 6). KDM6A and ARID1A mutations correlate with better survival in BLCA (P = 0.03, hazard ratio (HR) = 0.36, 95% confidence interval (CI): 0.14–0.92) and UCEC (P = 0.03, HR = 0.11, 95% CI: 0.01–0.84), respectively, but mutations in SETBP1, recently identified with worse prognosis in atypical chronic myeloid leukaemia (aCML)31, have a significant detrimental effect in HNSC (P = 0.006, HR = 3.21, 95% CI: 1.39–7.44). BAP1 strongly correlates with poor survival (P = 0.00079, HR = 2.17, 95% CI: 1.38–3.41) in KIRC. Conversely, BRCA2 mutations (P = 0.02, HR = 0.31, 95% CI: 0.12–0.85) associate with better survival in ovarian cancer, consistent with previous reports32,33; BRCA1 mutations showed positive correlation with better survival, but did not reach significance here.

We extended our survival analysis across cancer types, restricting our attention to the subset of 97 SMGs whose mutations appeared in ≥2% of patients having survival data in ≥2 tumour types. Taking type, age and gender as covariates, we found 7 significant genes: BAP1, DNMT3A, HGF, KDM5C, FBXW7, BRCA2 and TP53 (Extended Data Table 1). In particular, BAP1 was highly significant (P = 0.00013, HR = 2.20, 95% CI: 1.47–3.29, more than 53 mutated tumours out of 888 total), with mutations associating with detrimental outcome in four tumour types and notable associations in KIRC (P = 0.00079), consistent with a recent report28, and in UCEC (P = 0.066). Mutations in several other genes are detrimental, including DNMT3A (HR = 1.59), previously identified with poor prognosis in AML34, and KDM5C (HR = 1.63), FBXW7 (HR = 1.57) and TP53 (HR = 1.19). TP53 has significant associations with poor outcome in KIRC (P = 0.012), AML (P = 0.0007) and HNSC (P = 0.00007). Conversely, BRCA2 (P = 0.05, HR = 0.62, 95% CI: 0.38 to 0.99) correlates with survival benefit in six types, including OV and UCEC (Supplementary Table 10a, b). IDH1 mutations are associated with improved prognosis across the Pan-Cancer set (HR = 0.67, P = 0.16) and also in GBM (HR = 0.42, P = 0.09) (Supplementary Table 10a, b), consistent with previous work35.
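The hazard ratios, confidence intervals and P values quoted in this section are linked: on the log scale a 95% CI spans about ±1.96 standard errors, so a P value can be approximately recovered from the HR and its interval. The sketch below assumes a two-sided Wald test with a normal approximation; the excerpt does not state which test the authors used, and the function name is introduced here for illustration.

```python
import math


def wald_p_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided P value from a hazard ratio and its 95% CI.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE), which is the
    usual construction for Cox model coefficients (an assumption here).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Values quoted above for BAP1 in KIRC and in the Pan-Cancer analysis.
z_kirc, p_kirc = wald_p_from_hr(2.17, 1.38, 3.41)
z_pan, p_pan = wald_p_from_hr(2.20, 1.47, 3.29)
print(f"BAP1 in KIRC:    z = {z_kirc:.2f}, P = {p_kirc:.5f}")
print(f"BAP1 Pan-Cancer: z = {z_pan:.2f}, P = {p_pan:.5f}")
```

Under these assumptions both recomputed values land close to the reported P = 0.00079 and P = 0.00013, a useful sanity check when reading tables of this kind. Small differences are expected because the published HR and CI are rounded.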

Driver mutations and tumour clonal architecture

To understand the temporal order of somatic events, we analysed the variant allele fraction (VAF) distribution of mutations in SMGs across AML, BRCA and UCEC (Fig. 5a and Supplementary Table 11a) and other tumour types (Extended Data Fig. 7). To minimize the effect of copy number alterations, we focused on mutations in copy neutral segments. Mutations in TP53 have higher VAFs on average in all three cancer types, suggesting early appearance during tumorigenesis.
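For readers unfamiliar with the quantity, a VAF is the fraction of sequencing reads at a site that carry the variant allele. The sketch below also converts a VAF into an approximate fraction of tumour cells carrying the mutation, under the simplifying assumptions of a heterozygous mutation at a diploid, copy-neutral locus and a known tumour purity. The conversion is a common back-of-envelope rule, not the method used in the paper, and the read counts are hypothetical.

```python
def variant_allele_fraction(alt_reads, ref_reads):
    """Fraction of reads at a site that support the variant allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def cancer_cell_fraction(vaf, purity):
    """Approximate fraction of tumour cells carrying the mutation.

    Assumes a heterozygous mutation at a diploid, copy-neutral locus, so a
    fully clonal mutation has an expected VAF of purity / 2.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    return min(1.0, 2.0 * vaf / purity)


# Hypothetical read counts for two mutations in a tumour of 80% purity.
early_vaf = variant_allele_fraction(alt_reads=40, ref_reads=60)
late_vaf = variant_allele_fraction(alt_reads=10, ref_reads=90)
print(early_vaf, cancer_cell_fraction(early_vaf, purity=0.8))
print(late_vaf, cancer_cell_fraction(late_vaf, purity=0.8))
```

A mutation present in every tumour cell is a candidate founding event, while one found in only a fraction of cells points to a later subclone. Loss of heterozygosity breaks the heterozygous assumption and inflates the VAF.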

It is worth noting that copy neutral loss of heterozygosity is commonly found in classical tumour suppressors such as TP53, BRCA1, BRCA2 and PTEN, leading to increased VAFs in these genes. In AML, DNMT3A (permutation test P = 0), RUNX1 (P = 0.0003) and SMC3 (P = 0.05) have significantly higher VAFs than average among SMGs (Fig. 5a and Supplementary Table 11b). In breast cancer, AKT1, CBFB, MAP2K4, ARID1A, FOXA1 and PIK3CA have relatively high average VAFs. For endometrial cancer, multiple SMGs (for example, PIK3CA, PIK3R1, PTEN, FOXA2 and ARID1A) have similar median VAFs. Conversely, KRAS and/or NRAS mutations tend to have lower VAFs in all three tumour types (Fig. 5a), suggesting NRAS (for example, P = 0 in AML) and KRAS (for example, P = 0.02 in BRCA) have a progression role in a subset of AML, BRCA and UCEC tumours. For all three cancer types, we clearly observed a shift towards higher expression VAFs in SMGs versus non-SMGs, most apparent in BRCA and UCEC (Extended Data Fig. 8a and Methods).
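The permutation-test P values quoted above compare the VAFs of one gene's mutations with those of the other SMGs. A minimal sketch of such a test follows, assuming a one-sided comparison of mean VAFs with random relabelling; the exact statistic and number of permutations used by the authors are not given in this excerpt, and the VAF values are invented.

```python
import random
from statistics import mean


def permutation_p_value(group, others, n_perm=2000, seed=0):
    """One-sided permutation test: is mean(group) higher than mean(others)?

    Labels are shuffled n_perm times; the P value is the share of shuffles
    whose difference in means is at least as large as the observed one.
    The +1 terms keep the estimate strictly above zero.
    """
    rng = random.Random(seed)
    observed = mean(group) - mean(others)
    pooled = list(group) + list(others)
    k = len(group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented VAFs: one gene with high VAFs versus mutations in other SMGs.
gene_vafs = [0.46, 0.48, 0.45, 0.50, 0.47]
other_vafs = [0.12, 0.30, 0.22, 0.18, 0.25, 0.10, 0.28, 0.15,
              0.20, 0.27, 0.14, 0.19, 0.23, 0.11, 0.26]
p_high = permutation_p_value(gene_vafs, other_vafs)
print(p_high)
```

A reported "P = 0" most plausibly means that no permutation matched the observed difference when the P value was computed without the +1 correction used here.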

Previous analysis using whole-genome sequencing (WGS) detected subclones in approximately 50% of AML cases15,36,37; however, analysis is difficult using AML exome owing to its relatively few coding mutations. Using 50 AML WGS cases, sciClone ( genome/sciclone) detected DNMT3A mutations in the founding clone for 100% (8 out of 8) of cases and NRAS mutations in the subclone for 75% (3 out of 4) of cases (Extended Data Fig. 8b). Among 304 and 160 of BRCA and UCEC tumours, respectively, with enough coding mutations for clustering, 35% BRCA and 44% UCEC tumours contained subclones. Our analysis provides the lower bound for tumour heterogeneity, because only coding mutations were used for clustering. In BRCA, 95% (62 out of 65) of cases contained PIK3CA mutations in the founding clone, whereas 33% (3 out of 9) of cases had MLL3 mutations in the subclone. Similar patterns were found in UCEC tumours, with 96% (65 out of 68) and 95% (62 out of 65) of tumours containing PIK3CA and PTEN mutations, respectively, in the founding clone, and 9% (2 out of 22) of KRAS and 14% (1 out of 7) of NRAS mutations in the subclone (Extended Data Fig. 8b and Supplementary Table 12).

Mutation context (-2 to +2 bp) was calculated for each somatic variant in each mutation category, and hierarchical clustering was then performed using the pairwise mutation context correlation across all cancer types. The mutational significance in cancer (MuSiC)3 package was used to identify significant genes for both individual tumour types and the Pan-Cancer collective. The R function ‘hclust’ was used for complete-linkage hierarchical clustering across mutations and samples, and Dendrix30 was used to identify sets of approximately mutually exclusive mutations. Cross-cancer survival analysis was based on the Cox proportional hazards model, as implemented in the R package ‘survival’ ( packages/survival/), and the sciClone algorithm generated mutation clusters using point mutations from copy number neutral segments. A complete description of the materials and methods used to generate this data set and its results is provided in the Methods.
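The context clustering described above can be illustrated with a small pure-Python stand-in for R's ‘hclust’: profiles are compared by Pearson correlation, converted to a distance of 1 − r, and merged by complete linkage (the distance between two clusters is their largest pairwise distance). This is a sketch under those assumptions rather than the authors' code, and the profile names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance 1 - r and complete linkage.

    Repeatedly merges the two clusters whose farthest members are closest,
    until n_clusters remain. Returns a list of lists of profile names.
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(dist[frozenset((a, b))]
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Invented profiles: proportion of A, C, G, T at one flanking position.
toy_profiles = {
    "type_1": [0.10, 0.12, 0.65, 0.13],
    "type_2": [0.12, 0.14, 0.60, 0.14],
    "type_3": [0.40, 0.20, 0.25, 0.15],
    "type_4": [0.42, 0.18, 0.24, 0.16],
}
toy_clusters = complete_linkage(toy_profiles, n_clusters=2)
print(toy_clusters)
```

Complete linkage tends to produce compact clusters because a merge is only as good as its worst pair, which suits a question like "which cancer types share a context signature".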

References (20 of 38)

  1. Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311–317 (2012).
  2. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
  3. Dees, N. D. et al. MuSiC: Identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).
  4. Roth, A. et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics 28, 907–913 (2012).
  5. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnol. 31, 213–219 (2013).
  6. Jones, S. et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 321, 1801–1806 (2008).
  7. Parsons, D. W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807–1812 (2008).
  8. Sjöblom, T. et al. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274 (2006).
  9. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).
  10. Ding, L. et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–1075 (2008).
  11. Wood, L. D. et al. The genomic landscapes of human breast and colorectal cancers. Science 318, 1108–1113 (2007).
  12. The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011).
  13. The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
  14. Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).
  15. The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
  16. The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
  17. Ellis, M. J. et al. Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature 486, 353–360 (2012).
  18. The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).
  19. Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000).
  20. Downing, J. R. et al. The Pediatric Cancer Genome Project. Nature Genet. 44, 619–622 (2012).
