
Cancer Mutations Across the Landscape

Curator: Larry H. Bernstein, MD, FCAP

This is an up-to-date article about the significance of mutations found in 12 major types of cancer.


Word Cloud by Daniel Menzin

UPDATED 4/24/2020  The genomic landscape of pediatric cancers: Curation of WES/WGS studies shows need for more data

Mutational landscape and significance across 12 major cancer types

Cyriac Kandoth¹*, Michael D. McLellan¹*, Fabio Vandin², Kai Ye¹,³, Beifang Niu¹, Charles Lu¹, et al.

¹The Genome Institute, Washington University in St Louis, Missouri 63108, USA.
²Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA.
³Department of Genetics, Washington University in St Louis, Missouri 63108, USA.
⁴Department of Medicine, Washington University in St Louis, Missouri 63108, USA.
⁵Siteman Cancer Center, Washington University in St Louis, Missouri 63108, USA.
⁶Department of Mathematics, Washington University in St Louis, Missouri 63108, USA.

NATURE 17 Oct 2013; 502. http://dx.doi.org/10.1038/nature12634

The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to

  1. tissues of origin,
  2. environmental/carcinogen influences, and
  3. DNA repair defects.

Using the integrated data sets, we identified 127 significantly mutated genes from well-known and emerging cellular processes in cancer:

  1. well-known processes (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control);
  2. emerging processes (for example, histone, histone modification, splicing, metabolism and proteolysis).

The average number of mutations in these significantly mutated genes varies across tumour types. Most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types.

Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis.

Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.


The advancement of DNA sequencing technologies now enables the processing of thousands of tumours of many types for systematic mutation discovery. This expansion of scope, coupled with appreciable progress in algorithms1–5, has led directly to characterization of significant functional mutations, genes and pathways6–18. Cancer encompasses more than 100 related diseases19, making it crucial to understand the commonalities and differences among various types and subtypes. TCGA was founded to address these needs, and its large data sets are providing unprecedented opportunities for systematic, integrated analysis.

We performed a systematic analysis of 3,281 tumours from 12 cancer types to investigate underlying mechanisms of cancer initiation and progression. We describe variable mutation frequencies and contexts and their associations with environmental factors and defects in DNA repair. We identify 127 significantly mutated genes (SMGs) from diverse signalling and enzymatic processes. The finding of a TP53-driven breast, head and neck, and ovarian cancer cluster with a dearth of other mutations in SMGs suggests that common therapeutic strategies might be applied to these tumours. We determined interactions among mutations and correlated mutations in BAP1, FBXW7 and TP53 with detrimental phenotypes across several cancer types. The subclonal structure and transcription status of underlying somatic mutations reveal the trajectory of tumour progression in patients with cancer.

Standardization of mutation data

Stringent filters (Methods) were applied to ensure high-quality mutation calls for 12 cancer types: breast adenocarcinoma (BRCA), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), uterine corpus endometrial carcinoma (UCEC), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), colon and rectal carcinoma (COAD, READ), bladder urothelial carcinoma (BLCA), kidney renal clear cell carcinoma (KIRC), ovarian serous carcinoma (OV) and acute myeloid leukaemia (LAML; conventionally called AML) (Supplementary Table 1). A total of 617,354 somatic mutations, consisting of

  • 398,750 missense,
  • 145,488 silent,
  • 36,443 nonsense,
  • 9,778 splice site,
  • 7,693 non-coding RNA,
  • 523 non-stop/readthrough,
  • 15,141 frameshift insertions/deletions (indels) and
  • 3,538 inframe indels,

were included for downstream analyses (Supplementary Table 2).
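As a quick consistency check, the category counts listed above sum exactly to the reported total. The short Python sketch below tallies them and prints each category's share of all calls. The dictionary is simply a transcription of the list, and the variable names are illustrative.

```python
# Somatic mutation counts by category, transcribed from the list above.
mutation_counts = {
    "missense": 398_750,
    "silent": 145_488,
    "nonsense": 36_443,
    "splice site": 9_778,
    "non-coding RNA": 7_693,
    "non-stop/readthrough": 523,
    "frameshift indel": 15_141,
    "inframe indel": 3_538,
}

total = sum(mutation_counts.values())
print(f"total = {total:,}")  # total = 617,354

# Each category's share of all calls, largest first.
for category, count in sorted(mutation_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:>22}: {count:>8,} ({count / total:6.2%})")
```

Missense calls alone make up about 65% of the total (398,750 / 617,354 ≈ 0.646).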

Distinct mutation frequencies and sequence context

Figure 1a shows that AML has the lowest median mutation frequency and LUSC the highest (0.28 and 8.15 mutations per megabase (Mb), respectively). Besides AML, all types average over 1 mutation per Mb, substantially higher than in pediatric tumours20. Clustering21 illustrates that

  • mutation frequencies for KIRC, BRCA, OV and AML are normally distributed within a single cluster, whereas
  • other types have several clusters (for example, 5 and 6 clusters in UCEC and COAD/READ, respectively) (Fig. 1a and Supplementary Table 3a, b).
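Mutation frequency here is a count of somatic mutations normalised to megabases of sequenced territory, summarised per cancer type by the median. The minimal sketch below shows that normalisation. The tumour records and the 30 Mb target size are hypothetical values chosen for illustration, not TCGA data. The paper's own definition of covered territory is given in its Methods.

```python
from statistics import median


def mutations_per_mb(n_mutations: int, covered_bases: int) -> float:
    """Normalise a raw somatic mutation count by covered bases, in mutations per Mb."""
    return n_mutations / (covered_bases / 1_000_000)


# Hypothetical tumours: (cancer type, somatic mutations, covered bases).
TARGET = 30_000_000  # assumed ~30 Mb exome target, for illustration only
tumours = [
    ("AML", 6, TARGET), ("AML", 9, TARGET), ("AML", 12, TARGET),
    ("LUSC", 210, TARGET), ("LUSC", 250, TARGET), ("LUSC", 300, TARGET),
]

# Group per-tumour frequencies by cancer type.
by_type: dict[str, list[float]] = {}
for cancer_type, n_mut, covered in tumours:
    by_type.setdefault(cancer_type, []).append(mutations_per_mb(n_mut, covered))

for cancer_type, freqs in by_type.items():
    print(cancer_type, round(median(freqs), 2))
```

The median is used rather than the mean because a few hypermutated tumours would otherwise dominate a type's summary.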

In UCEC, the largest patient cluster has a frequency of approximately 1.5 mutations per Mb, and the cluster with the highest frequency has a rate more than 150 times greater.

Multiple clusters suggest that factors other than age contribute to development in these tumours14,16. Indeed, there is a significant correlation between high mutation frequency and mutations in DNA repair pathway genes (for example, PRKDC, TP53 and MSH6) (Supplementary Table 3c). Notably,

  • PRKDC mutations are associated with high frequency in BLCA, COAD/READ, LUAD and UCEC, whereas
  • TP53 mutations are associated with higher frequencies in AML, BLCA, BRCA, HNSC, LUAD, LUSC and UCEC (all P < 0.05).

Mutations in POLQ and POLE associate with high frequencies in multiple cancer types; POLE association in UCEC is consistent with previous observations14.

Comparison of spectra across the 12 types (Fig. 1b and Supplementary Table 3d) reveals that LUSC and LUAD contain increased C>A transversions, a signature of cigarette smoke exposure10. Sequence context analysis across the 12 types revealed that the largest differences were in C>T transitions and C>G transversions (Fig. 1c).

The frequency of thymine 1 base pair (bp) upstream of C>G transversions is markedly higher in BLCA, BRCA and HNSC than in other cancer types (Extended Data Fig. 1). GBM, AML, COAD/READ and UCEC have similar contexts, in that the proportions of guanine 1 base downstream of C>T transitions are between 59% and 67%, substantially higher than the approximately 40% in other cancer types.
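Context statistics like those above can be computed from a 5-bp reference window (−2 to +2) around each variant. Examples are the share of C>T transitions with a guanine 1 bp downstream (a CpG site) and the share of C>G transversions with a thymine 1 bp upstream. The sketch below rests on two assumptions: each call is given as reference base, alternate base and window on the same strand, and purine-reference calls (G or A) are collapsed onto the pyrimidine strand by reverse complementing. That collapsing step is a common convention, not a detail stated in the text. The variants are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def to_pyrimidine(ref: str, alt: str, window: str) -> tuple[str, str, str]:
    """Report a variant on the strand where the reference base is C or T.

    `window` is the 5-bp reference sequence from -2 to +2, so window[2] must
    equal `ref`. A G>A call, for example, becomes C>T with the window
    reverse-complemented (assumed convention).
    """
    if window[2] != ref:
        raise ValueError("centre of window must match the reference base")
    if ref in "GA":
        return ref.translate(COMPLEMENT), alt.translate(COMPLEMENT), revcomp(window)
    return ref, alt, window


def base_fraction(variants, mut_class: str, offset: int, base: str) -> float:
    """Fraction of `mut_class` variants (e.g. "C>T") with `base` at `offset` (-2..+2)."""
    hits = total = 0
    for ref, alt, window in variants:
        r, a, w = to_pyrimidine(ref, alt, window)
        if f"{r}>{a}" != mut_class:
            continue
        total += 1
        if w[2 + offset] == base:
            hits += 1
    return hits / total if total else float("nan")


# Hypothetical calls: (ref, alt, 5-bp reference window on the same strand).
variants = [
    ("C", "T", "AACGT"),  # C>T at a CpG (G at +1)
    ("G", "A", "ACGTT"),  # G>A, i.e. C>T at a CpG on the opposite strand
    ("C", "T", "TTCAG"),  # C>T, not at a CpG
    ("C", "G", "ATCGA"),  # C>G with a thymine at -1
]
print(base_fraction(variants, "C>T", +1, "G"))  # share of C>T at CpG
print(base_fraction(variants, "C>G", -1, "T"))  # share of C>G with a 5' thymine
```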

Higher frequencies of transition mutations at CpG in gastrointestinal tumours, including colorectal, were previously reported22. We found three additional cancer types (GBM, AML and UCEC) that cluster together on C>T mutations at CpG sites, consistent with previous findings of aberrant DNA methylation in endometrial cancer23 and glioblastoma24.

BLCA has a unique signature for C>T transitions compared with the other types (enriched for TC, that is, a thymine immediately upstream of the mutated cytosine) (Extended Data Fig. 1).

Significantly mutated genes

Genes under positive selection, either in individual or multiple tumour types, tend to display mutation frequencies above the background rate. Our statistical analysis3, guided by expression data and curation (Methods), identified 127 such genes (SMGs; Supplementary Table 4). These SMGs are involved in a wide range of cellular processes, broadly classified into 20 categories (Fig. 2), including

  • transcription factors/regulators, histone modifiers, genome integrity, receptor tyrosine kinase signalling, cell cycle, mitogen-activated protein kinases (MAPK) signalling, phosphatidylinositol-3-OH kinase (PI(3)K) signalling, Wnt/β-catenin signalling, histones, ubiquitin-mediated proteolysis, and splicing.
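The core idea behind SMG detection is that a gene under positive selection carries more mutations than the background rate predicts. A deliberately simplified test can illustrate it. The sketch below is not MuSiC, the package the authors used, which is considerably more elaborate. It assumes a single flat per-base background rate and a Poisson count model. The gene length, cohort size and rate are hypothetical.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Adequate for the small expected counts of a single gene; for large lam a
    library routine such as scipy.stats.poisson.sf is preferable.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def excess_mutation_pvalue(observed: int, background_per_bp: float,
                           coding_bp: int, n_tumours: int) -> float:
    """One-sided P-value that a gene has more mutations than a flat background predicts."""
    expected = background_per_bp * coding_bp * n_tumours
    return poisson_upper_tail(observed, expected)


# Hypothetical gene: 1,200 coding bp, 500 tumours, background 1.5 mutations per Mb.
rate = 1.5e-6
print(excess_mutation_pvalue(1, rate, 1_200, 500))   # 1 observed vs 0.9 expected
print(excess_mutation_pvalue(15, rate, 1_200, 500))  # 15 observed vs 0.9 expected
```

A real analysis would also vary the background by mutation category and sequence context and correct for testing thousands of genes. The text notes that the authors further guided their calls with expression data and curation.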

The identification of MAPK, PI(3)K and Wnt/β-catenin signalling pathways is consistent with classical cancer studies. Notably, newer categories (for example, splicing, transcription regulators, metabolism, proteolysis and histones) emerge as exciting guides for the development of new therapeutic targets. Genes categorized as histone modifiers (Z = 0.57), PI(3)K signalling (Z = 1.03), and genome integrity (Z = 0.66) all relate to more than one cancer type, whereas transcription factor/regulator (Z = 0.40), TGF-β signalling (Z = 0.66), and Wnt/β-catenin signalling (Z = 0.55) genes tend to associate with single types (Methods).

Notably, 3,053 out of 3,281 total samples (93%) across the Pan-Cancer collection had at least one non-synonymous mutation in at least one SMG. The average number of point mutations and small indels in these genes varies across tumour types, with the highest (~6 mutations per tumour) in UCEC, LUAD and LUSC, and the lowest (~2 mutations per tumour) in AML, BRCA, KIRC and OV. This suggests that the numbers of both cancer-related genes (only 127 identified in this study) and cooperating driver mutations required during oncogenesis are small (most cases had only 2–6) (Fig. 3), although large-scale structural rearrangements were not included in this analysis.
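The 93% figure is the fraction of samples with at least one non-synonymous call in any SMG. The minimal sketch below computes that fraction from per-sample mutation lists. Which mutation classes count as "non-synonymous" is an assumption here, since the text does not list them. The toy cohort is hypothetical.

```python
# Mutation classes treated as non-synonymous here (an assumption; the text
# does not list which classes were counted).
NON_SYNONYMOUS = {
    "missense", "nonsense", "splice site", "non-stop/readthrough",
    "frameshift indel", "inframe indel",
}


def smg_hit_fraction(calls_by_sample, smgs):
    """Return (samples hit, total samples, fraction) for >= 1 non-synonymous SMG call."""
    hit = sum(
        any(gene in smgs and cls in NON_SYNONYMOUS for gene, cls in calls)
        for calls in calls_by_sample.values()
    )
    return hit, len(calls_by_sample), hit / len(calls_by_sample)


# Toy cohort (hypothetical): sample -> list of (gene, mutation class).
cohort = {
    "S1": [("TP53", "missense")],
    "S2": [("TTN", "missense"), ("PIK3CA", "silent")],  # SMG hit is silent only
    "S3": [("PIK3CA", "missense")],
}
print(smg_hit_fraction(cohort, {"TP53", "PIK3CA"}))

# The headline figure from the text: 3,053 of 3,281 samples.
print(f"{3_053 / 3_281:.1%}")  # 93.1%, reported as 93%
```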

Common mutations

The most frequently mutated gene in the Pan-Cancer cohort is TP53 (42% of samples). Its mutations predominate in serous ovarian (95%) and serous endometrial carcinomas (89%) (Fig. 2). TP53 mutations are also associated with basal subtype breast tumours. PIK3CA is the second most commonly mutated gene, occurring frequently (>10%) in most cancer types except OV, KIRC, LUAD and AML. PIK3CA mutations were frequent in UCEC (52%) and BRCA (33.6%), and in BRCA they were specifically enriched in luminal subtype tumours. Tumours lacking PIK3CA mutations often had mutations in PIK3R1, with the highest occurrences in UCEC (31%) and GBM (11%) (Fig. 2).

Many cancer types carried mutations in chromatin remodelling genes. In particular, histone-lysine N-methyltransferase genes (MLL2 (also known as KMT2D), MLL3 (KMT2C) and MLL4 (KMT2B)) cluster in bladder, lung and endometrial cancers, whereas the lysine (K)-specific demethylase KDM5C is prevalently mutated in KIRC (7%). Mutations in ARID1A are frequent in BLCA, UCEC, LUAD and LUSC, whereas mutations in ARID5B predominate in UCEC (10%) (Fig. 2).

Fig. 1. Distribution of mutation frequencies across 12 cancer types.


Dashed grey and solid white lines denote average across cancer types and median for each type, respectively. b, Mutation spectrum of six transition (Ti) and transversion (Tv) categories for each cancer type. c, Hierarchically clustered mutation context (defined by the proportion of A, T, C and G nucleotides within ±2 bp of variant site) for six mutation categories. Cancer types correspond to colours in a. Colour denotes degree of correlation: yellow (r = 0.75) and red (r = 1).

Fig. 2. The 127 SMGs from 20 cellular processes in cancer. Percentages of samples mutated in individual tumour types and Pan-Cancer are shown, with the highest percentage for each gene among the 12 types indicated (not shown).

Fig. 3. Distribution of mutations in 127 SMGs across Pan-Cancer cohort.


Box plot displays median numbers of non-synonymous mutations, with outliers shown as dots. In total, 3,210 tumours were used for this analysis (hypermutators excluded).

Fig. 4. Unsupervised clustering based on mutation status of SMGs. Tumours having no mutation or more than 500 mutations were excluded. A mutation status matrix was constructed for 2,611 tumours. Major clusters of mutations detected in UCEC, COAD, GBM, AML, KIRC, OV and BRCA are highlighted.
Complete gene list shown in Extended Data Fig. 3. (not shown)
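The clustering in Fig. 4 starts from a binary tumour-by-SMG mutation status matrix after excluding some tumours. The sketch below builds such a matrix under our reading of the exclusion rule. We read "no mutation" as no SMG mutation and "more than 500 mutations" as the tumour's total somatic count. Both readings are assumptions, and the toy input is hypothetical.

```python
def mutation_status_matrix(mutated_genes_by_sample, smgs, max_mutations=500):
    """Binary tumour-by-SMG matrix (1 = gene mutated in that tumour).

    mutated_genes_by_sample maps a sample to the genes hit by each of its
    somatic mutations (one entry per mutation). Samples with no SMG mutation,
    or with more than `max_mutations` mutations in total, are dropped.
    """
    genes = sorted(smgs)
    kept_samples, rows = [], []
    for sample in sorted(mutated_genes_by_sample):
        calls = mutated_genes_by_sample[sample]
        hit = {g for g in calls if g in smgs}
        if not hit or len(calls) > max_mutations:
            continue
        kept_samples.append(sample)
        rows.append([1 if g in hit else 0 for g in genes])
    return kept_samples, genes, rows


toy = {
    "S1": ["TP53", "TTN"],
    "S2": ["TTN"],                     # no SMG mutation -> excluded
    "S3": ["PIK3CA"] + ["TTN"] * 600,  # hypermutated -> excluded
    "S4": ["PIK3CA", "TP53"],
}
samples, genes, matrix = mutation_status_matrix(toy, {"TP53", "PIK3CA"})
for s, row in zip(samples, matrix):
    print(s, dict(zip(genes, row)))
```

Rows of this matrix are what a hierarchical clustering step would then group.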

Fig. 5. Driver initiation and progression mutations and tumour clonal architecture.


Survival analysis

We examined which genes correlate with survival using the Cox proportional hazards model, first analysing individual cancer types using age and gender as covariates; an average of 2 genes (range: 0–4) with mutation frequency ≥2% were significant (P ≤ 0.05) in each type (Supplementary Table 10a and Extended Data Fig. 6). KDM6A and ARID1A mutations correlate with better survival in BLCA (P = 0.03, hazard ratio (HR) = 0.36, 95% confidence interval (CI): 0.14–0.92) and UCEC (P = 0.03, HR = 0.11, 95% CI: 0.01–0.84), respectively, but mutations in SETBP1, recently associated with worse prognosis in atypical chronic myeloid leukaemia (aCML)31, have a significant detrimental effect in HNSC (P = 0.006, HR = 3.21, 95% CI: 1.39–7.44). BAP1 mutations strongly correlate with poor survival (P = 0.00079, HR = 2.17, 95% CI: 1.38–3.41) in KIRC. Conversely, BRCA2 mutations (P = 0.02, HR = 0.31, 95% CI: 0.12–0.85) associate with better survival in ovarian cancer, consistent with previous reports32,33; BRCA1 mutations also trended towards better survival but did not reach significance here.
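Reported hazard ratios, confidence intervals and P-values from a Cox model are linked. If the 95% CI is a Wald interval on the log scale, then HR = exp(β) and CI = exp(β ± 1.96·SE). The sketch below uses that assumed relationship to back out SE, z and an approximate two-sided P from the published figures, which is a useful sanity check. It does not refit the Cox model; that requires patient-level data, and the authors used the R survival package for it.

```python
import math


def wald_from_hr_ci(hr: float, lower: float, upper: float, z_crit: float = 1.959964):
    """Back-calculate log-HR, SE, z and two-sided P from a reported HR and 95% CI.

    Assumes the CI was built on the log scale as exp(beta +/- z_crit * SE),
    the usual Wald interval for a Cox model coefficient.
    """
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return beta, se, z, p


# Figures quoted in the paragraph above.
for label, hr, lo, hi in [
    ("BAP1 in KIRC", 2.17, 1.38, 3.41),
    ("SETBP1 in HNSC", 3.21, 1.39, 7.44),
    ("BRCA2 in OV", 0.31, 0.12, 0.85),
]:
    beta, se, z, p = wald_from_hr_ci(hr, lo, hi)
    print(f"{label}: beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, P~{p:.2g}")
```

For BAP1 in KIRC this recovers P ≈ 0.0008, in line with the reported 0.00079. Small differences are expected because the published bounds are rounded.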

We extended our survival analysis across cancer types, restricting our attention to the subset of 97 SMGs whose mutations appeared in ≥2% of patients having survival data in ≥2 tumour types. Taking type, age and gender as covariates, we found 7 significant genes: BAP1, DNMT3A, HGF, KDM5C, FBXW7, BRCA2 and TP53 (Extended Data Table 1). In particular, BAP1 was highly significant (P = 0.00013, HR = 2.20, 95% CI: 1.47–3.29, more than 53 mutated tumours out of 888 total), with mutations associating with detrimental outcome in four tumour types and notable associations in KIRC (P = 0.00079), consistent with a recent report28, and in UCEC (P = 0.066). Mutations in several other genes are detrimental, including DNMT3A (HR = 1.59), previously associated with poor prognosis in AML34, and KDM5C (HR = 1.63), FBXW7 (HR = 1.57) and TP53 (HR = 1.19). TP53 has significant associations with poor outcome in KIRC (P = 0.012), AML (P = 0.0007) and HNSC (P = 0.00007). Conversely, BRCA2 (P = 0.05, HR = 0.62, 95% CI: 0.38–0.99) correlates with survival benefit in six types, including OV and UCEC (Supplementary Table 10a, b). IDH1 mutations are associated with improved prognosis across the Pan-Cancer set (HR = 0.67, P = 0.16) and also in GBM (HR = 0.42, P = 0.09) (Supplementary Table 10a, b), consistent with previous work35.
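The cross-cancer analysis first restricts genes to those mutated in ≥2% of survival-annotated patients in at least two tumour types. The sketch below implements that filter as we read it: the 2% threshold is applied per tumour type, and a gene must pass in at least two types. That reading is an assumption, and the gene names and counts are hypothetical.

```python
def eligible_genes(mutated_with_survival, patients_with_survival,
                   min_fraction=0.02, min_types=2):
    """Genes mutated in >= min_fraction of survival-annotated patients in >= min_types types.

    mutated_with_survival: {gene: {tumour_type: mutated patients with survival data}}
    patients_with_survival: {tumour_type: patients with survival data}
    """
    selected = []
    for gene, per_type in mutated_with_survival.items():
        qualifying_types = [
            t for t, n_mut in per_type.items()
            if patients_with_survival.get(t, 0) > 0
            and n_mut / patients_with_survival[t] >= min_fraction
        ]
        if len(qualifying_types) >= min_types:
            selected.append(gene)
    return sorted(selected)


# Hypothetical counts, for illustration only.
patients = {"KIRC": 400, "UCEC": 200, "BRCA": 800}
mutated = {
    "GENE_A": {"KIRC": 40, "UCEC": 6, "BRCA": 5},  # 10%, 3%, 0.6% -> passes in 2 types
    "GENE_B": {"KIRC": 20, "BRCA": 10},            # 5%, 1.25% -> passes in 1 type
}
print(eligible_genes(mutated, patients))  # ['GENE_A']
```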

Driver mutations and tumour clonal architecture

To understand the temporal order of somatic events, we analysed the variant allele fraction (VAF) distribution of mutations in SMGs across AML, BRCA and UCEC (Fig. 5a and Supplementary Table 11a) and other tumour types (Extended Data Fig. 7). To minimize the effect of copy number alterations, we focused on mutations in copy neutral segments. Mutations in TP53 have higher VAFs on average in all three cancer types, suggesting early appearance during tumorigenesis.

It is worth noting that copy neutral loss of heterozygosity is commonly found in classical tumour suppressors such as TP53, BRCA1, BRCA2 and PTEN, leading to increased VAFs in these genes. In AML, DNMT3A (permutation test P = 0), RUNX1 (P = 0.0003) and SMC3 (P = 0.05) have significantly higher VAFs than average among SMGs (Fig. 5a and Supplementary Table 11b). In breast cancer, AKT1, CBFB, MAP2K4, ARID1A, FOXA1 and PIK3CA have relatively high average VAFs. For endometrial cancer, multiple SMGs (for example, PIK3CA, PIK3R1, PTEN, FOXA2 and ARID1A) have similar median VAFs. Conversely, KRAS and/or NRAS mutations tend to have lower VAFs in all three tumour types (Fig. 5a), suggesting NRAS (for example, P = 0 in AML) and KRAS (for example, P = 0.02 in BRCA) have a progression role in a subset of AML, BRCA and UCEC tumours. For all three cancer types, we clearly observed a shift towards higher expression VAFs in SMGs versus non-SMGs, most apparent in BRCA and UCEC (Extended Data Fig. 8a and Methods).
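VAF is the fraction of sequencing reads that support the variant allele. The paragraph above compares each gene's VAFs against the average among SMGs using a permutation test. The sketch below is a minimal one-sided version under stated assumptions. It resamples the same number of VAFs without replacement from the pooled SMG mutations, and it estimates P as (exceedances + 1) / (permutations + 1). That estimator is our choice and never returns exactly zero; the "P = 0" values in the text presumably mean no permutation exceeded the observed mean. The VAF pool is hypothetical.

```python
import random
from statistics import mean


def vaf(alt_reads: int, ref_reads: int) -> float:
    """Variant allele fraction: share of reads supporting the variant allele."""
    return alt_reads / (alt_reads + ref_reads)


def permutation_p_higher(gene_vafs, pooled_vafs, n_perm=2_000, seed=0):
    """One-sided permutation P-value that a gene's mean VAF exceeds the pool's.

    Each permutation draws len(gene_vafs) values without replacement from
    pooled_vafs; P is estimated as (exceedances + 1) / (n_perm + 1).
    """
    rng = random.Random(seed)
    observed = mean(gene_vafs)
    k = len(gene_vafs)
    exceed = sum(mean(rng.sample(pooled_vafs, k)) >= observed for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)


# Hypothetical pool: 50 mutations with VAFs 0.10-0.20 plus 5 high-VAF mutations in one gene.
gene_x = [0.45, 0.46, 0.47, 0.48, 0.50]
pool = [0.10 + 0.002 * i for i in range(50)] + gene_x
print(vaf(30, 70))                         # 30 of 100 reads support the variant
print(permutation_p_higher(gene_x, pool))  # small P: gene_x VAFs are unusually high
```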

Previous analysis using whole-genome sequencing (WGS) detected subclones in approximately 50% of AML cases15,36,37; however, analysis is difficult with AML exome data owing to the relatively few coding mutations in AML. Using 50 AML WGS cases, sciClone (http://github.com/genome/sciclone) detected DNMT3A mutations in the founding clone for 100% (8 out of 8) of cases and NRAS mutations in the subclone for 75% (3 out of 4) of cases (Extended Data Fig. 8b). Among 304 BRCA and 160 UCEC tumours with enough coding mutations for clustering, 35% of BRCA and 44% of UCEC tumours contained subclones. Our analysis provides a lower bound for tumour heterogeneity, because only coding mutations were used for clustering. In BRCA, 95% (62 out of 65) of cases contained PIK3CA mutations in the founding clone, whereas 33% (3 out of 9) of cases had MLL3 mutations in the subclone. Similar patterns were found in UCEC tumours, with 96% (65 out of 68) and 95% (62 out of 65) of tumours containing PIK3CA and PTEN mutations, respectively, in the founding clone, and 9% (2 out of 22) of KRAS and 14% (1 out of 7) of NRAS mutations in the subclone (Extended Data Fig. 8b and Supplementary Table 12).
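The founding clone versus subclone distinction rests on how VAF relates to the fraction of cancer cells carrying a mutation. At a copy-neutral diploid locus, a heterozygous mutation present in a fraction CCF of cancer cells, in a sample of tumour purity p, has an expected VAF of p·CCF/2. The sketch below inverts that relation and applies a crude threshold. It is an illustrative stand-in only: sciClone clusters VAFs with a Bayesian mixture model rather than a fixed cutoff. The 0.8 cutoff, the purity and the VAFs are hypothetical.

```python
def cancer_cell_fraction(vaf: float, purity: float) -> float:
    """Approximate fraction of cancer cells carrying a heterozygous mutation.

    Assumes a copy-neutral, diploid locus: expected VAF = purity * CCF / 2,
    so CCF = 2 * VAF / purity, capped at 1.
    """
    return min(1.0, 2 * vaf / purity)


def label_mutations(vafs, purity, clonal_cutoff=0.8):
    """Crude clonal vs subclonal split by a CCF threshold (illustrative, not sciClone)."""
    return {
        gene: "founding clone" if cancer_cell_fraction(v, purity) >= clonal_cutoff else "subclone"
        for gene, v in vafs.items()
    }


# Hypothetical VAFs in a tumour of 70% purity.
print(label_mutations({"PIK3CA": 0.34, "KRAS": 0.12}, purity=0.7))
```

A model-based method avoids the arbitrary cutoff by letting the VAF distribution itself determine how many clusters exist and where their centres lie.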

Mutation context (-2 to +2 bp) was calculated for each somatic variant in each mutation category, and hierarchical clustering was then performed using the pairwise mutation context correlation across all cancer types. The mutational significance in cancer (MuSiC)3 package was used to identify significant genes for both individual tumour types and the Pan-Cancer collective. The R function ‘hclust’ was used for complete-linkage hierarchical clustering across mutations and samples, and Dendrix30 was used to identify sets of approximately mutually exclusive mutations. Cross-cancer survival analysis was based on the Cox proportional hazards model, as implemented in the R package ‘survival’ (http://cran.r-project.org/web/packages/survival/), and the sciClone algorithm (http://github.com/genome/sciclone) generated mutation clusters using point mutations from copy-number-neutral segments. A complete description of the materials and methods used to generate this data set and its results is provided in the Methods.
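Complete-linkage clustering on pairwise context correlations can be re-expressed in plain Python for illustration. The authors used R's hclust. The sketch below assumes the distance between two cancer types is 1 − Pearson r of their context profiles; the text says only that clustering used the pairwise correlation, so the exact transformation is an assumption. The short profiles are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage on distance 1 - Pearson r.

    profiles: {label: context vector}. Returns merges as
    (members_a, members_b, distance), closest pair first.
    """
    labels = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(labels, 2)}
    clusters = [frozenset([lab]) for lab in labels]
    merges = []
    while len(clusters) > 1:
        # Complete linkage: cluster distance = largest pairwise member distance.
        best = None
        for c1, c2 in combinations(clusters, 2):
            d = max(dist[frozenset((a, b))] for a in c1 for b in c2)
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((sorted(c1), sorted(c2), d))
    return merges


# Hypothetical base-proportion profiles (e.g. A/C/G/T at one flanking position).
profiles = {
    "type_A": [0.60, 0.10, 0.20, 0.10],
    "type_B": [0.58, 0.12, 0.20, 0.10],
    "type_C": [0.10, 0.20, 0.30, 0.40],
}
for merge in complete_linkage(profiles):
    print(merge)
```

Complete linkage merges two clusters only when every pair of members is close, which tends to give compact groups. That property suits the goal of grouping cancer types with similar mutation contexts.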

References (20 of 38)

  1. Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311–317 (2012).
  2. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
  3. Dees, N. D. et al. MuSiC: Identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).
  4. Roth, A. et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics 28, 907–913 (2012).
  5. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnol. 31, 213–219 (2013).
  6. Jones, S. et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 321, 1801–1806 (2008).
  7. Parsons, D. W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807–1812 (2008).
  8. Sjöblom, T. et al. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274 (2006).
  9. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).
  10. Ding, L. et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–1075 (2008).
  11. Wood, L. D. et al. The genomic landscapes of human breast and colorectal cancers. Science 318, 1108–1113 (2007).
  12. The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011).
  13. The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
  14. Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).
  15. The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
  16. The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
  17. Ellis, M. J. et al. Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature 486, 353–360 (2012).
  18. The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).
  19. Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000).
  20. Downing, J. R. et al. The Pediatric Cancer Genome Project. Nature Genet. 44, 619–622 (2012).

UPDATED 4/24/2020  The genomic landscape of pediatric cancers: Curation of WES/WGS studies shows need for more data

The genomic landscape of pediatric cancers: Implications for diagnosis and treatment


SCIENCE 15 MAR 2019: 1170–1175

Source: https://science.sciencemag.org/content/363/6432/1170


The past decade has witnessed a major increase in our understanding of the genetic underpinnings of childhood cancer. Genomic sequencing studies have highlighted key differences between pediatric and adult cancers. Whereas many adult cancers are characterized by a high number of somatic mutations, pediatric cancers typically have few somatic mutations but a higher prevalence of germline alterations in cancer predisposition genes. Also noteworthy is the remarkable heterogeneity in the types of genetic alterations that likely drive the growth of pediatric cancers, including copy number alterations, gene fusions, enhancer hijacking events, and chromoplexy. Because most studies have genetically profiled pediatric cancers only at diagnosis, the mechanisms underlying tumor progression, therapy resistance, and metastasis remain poorly understood. We discuss evidence that points to a need for more integrative approaches aimed at identifying driver events in pediatric cancers at both diagnosis and relapse. We also provide an overview of key aspects of germline predisposition for cancer in this age group.

Approximately 300,000 children from infancy to age 14 are diagnosed with cancer worldwide every year (1). Some of the cancer types affecting the pediatric population are also seen in adolescents and young adults (AYA), but it has become increasingly clear that cancers in the latter age group have unique biological characteristics that can affect prognosis and therapy (2). Pediatric and AYA cancer patients present with a heterogeneous set of diseases that can be broadly subclassified as leukemias, brain tumors, and non–central nervous system (CNS) solid tumors. These subgroups contain numerous distinct clinical entities, many of which are still poorly characterized from a molecular standpoint.

Recent large-scale genomic analyses have increased our understanding of the genetic drivers of pediatric cancer and have helped to identify new clinically relevant subtypes. These studies have also underscored the distinct nature of the genetic alterations in pediatric and AYA cancers versus adult cancers. Of particular note, the number of somatic mutations in most pediatric cancers is substantially lower than that in adult cancers (3, 4). Exceptions are tumors in children who carry germline mutations that compromise repair of DNA damage (5). For many pediatric cancers, driver events are conditioned on the developmental stage in which the tumor arises. For example, a mutation occurring in one developmental compartment (e.g., a muscle stem cell) may lead to cancer, whereas the same mutation in another compartment does not (6). Pediatric cancer genomes are also characterized by specific patterns of copy number alterations and structural alterations [chromoplexy (7), chromothripsis (8)] that are prognostic indicators in several cancer subtypes. Gene fusion events have long been recognized as oncogenic drivers in many pediatric cancers; however, advanced sequencing technologies have revealed that the number of fusion partners is greater than previously thought, and that previously undetected gene rearrangements may also function as drivers. Finally, germline mutations in a wide spectrum of genes that predispose to cancer appear to play a greater role in pediatric cancer than previously appreciated (9, 10).

Somatic alterations in pediatric cancers

Genome landscape studies

Early large-scale sequencing studies of pediatric cancers identified novel driver genes while also underscoring the overall low mutational burden (11–14). Whole exome sequencing studies of Wilms tumor, T-cell acute lymphoblastic leukemia (T-ALL), and acute myeloid leukemia (AML) identified some recurring alterations, such as

  • FLT3-ITD
  • WT1
  • NUP98-NSD1 gene fusion

however, many of the driver genes were subtype-specific. Other fusion events were detected by RNA-seq, such as

  • EWS-FLI1
  • BCR-ABL
  • MYB-QKI

as well as multiple epigenetic events such as DNA methylation changes.


  1. E. Steliarova-Foucher, M. Colombet, L. A. G. Ries, F. Moreno, A. Dolya, F. Bray, P. Hesseling, H. Y. Shin, C. A. Stiller, IICC-3 contributors, International incidence of childhood cancer, 2001-10: A population-based registry study. Lancet Oncol. 18, 719–731 (2017). doi:10.1016/S1470-2045(17)30186-9 pmid:28410997
  2. J. V. Tricoli, D. G. Blair, C. K. Anders, W. A. Bleyer, L. A. Boardman, J. Khan, S. Kummar, B. Hayes-Lattin, S. P. Hunger, M. Merchant, N. L. Seibel, M. Thurin, C. L. Willman, Biologic and clinical characteristics of adolescent and young adult cancers: Acute lymphoblastic leukemia, colorectal cancer, breast cancer, melanoma, and sarcoma. Cancer 122, 1017–1028 (2016). doi:10.1002/cncr.29871 pmid:26849082
  3. M. S. Lawrence, P. Stojanov, P. Polak, G. V. Kryukov, K. Cibulskis, A. Sivachenko, S. L. Carter, C. Stewart, C. H. Mermel, S. A. Roberts, A. Kiezun, P. S. Hammerman, A. McKenna, Y. Drier, L. Zou, A. H. Ramos, T. J. Pugh, N. Stransky, E. Helman, J. Kim, C. Sougnez, L. Ambrogio, E. Nickerson, E. Shefler, M. L. Cortés, D. Auclair, G. Saksena, D. Voet, M. Noble, D. DiCara, P. Lin, L. Lichtenstein, D. I. Heiman, T. Fennell, M. Imielinski, B. Hernandez, E. Hodis, S. Baca, A. M. Dulak, J. Lohr, D.-A. Landau, C. J. Wu, J. Melendez-Zajgla, A. Hidalgo-Miranda, A. Koren, S. A. McCarroll, J. Mora, B. Crompton, R. Onofrio, M. Parkin, W. Winckler, K. Ardlie, S. B. Gabriel, C. W. M. Roberts, J. A. Biegel, K. Stegmaier, A. J. Bass, L. A. Garraway, M. Meyerson, T. R. Golub, D. A. Gordenin, S. Sunyaev, E. S. Lander, G. Getz, Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013). doi:10.1038/nature12213 pmid:23770567
  4. B. Vogelstein, N. Papadopoulos, V. E. Velculescu, S. Zhou, L. A. Diaz Jr., K. W. Kinzler, Cancer genome landscapes. Science 339, 1546–1558 (2013). doi:10.1126/science.1235122 pmid:23539594
  5. B. B. Campbell, N. Light, D. Fabrizio, M. Zatzman, F. Fuligni, R. de Borja, S. Davidson, M. Edwards, J. A. Elvin, K. P. Hodel, W. J. Zahurancik, Z. Suo, T. Lipman, K. Wimmer, C. P. Kratz, D. C. Bowers, T. W. Laetsch, G. P. Dunn, T. M. Johanns, M. R. Grimmer, I. V. Smirnov, V. Larouche, D. Samuel, A. Bronsema, M. Osborn, D. Stearns, P. Raman, K. A. Cole, P. B. Storm, M. Yalon, E. Opocher, G. Mason, G. A. Thomas, M. Sabel, B. George, D. S. Ziegler, S. Lindhorst, V. M. Issai, S. Constantini, H. Toledano, R. Elhasid, R. Farah, R. Dvir, P. Dirks, A. Huang, M. A. Galati, J. Chung, V. Ramaswamy, M. S. Irwin, M. Aronson, C. Durno, M. D. Taylor, G. Rechavi, J. M. Maris, E. Bouffet, C. Hawkins, J. F. Costello, M. S. Meyn, Z. F. Pursell, D. Malkin, U. Tabori, A. Shlien, Comprehensive Analysis of Hypermutation in Human Cancer. Cell 171, 1042–1056.e10 (2017). doi:10.1016/j.cell.2017.09.048 pmid:29056344
  6. X. Chen, A. Pappo, M. A. Dyer, Pediatric solid tumor genomics and developmental pliancy. Oncogene 34, 5207–5215 (2015). doi:10.1038/onc.2014.474 pmid:25639868
  7. S. C. Baca, D. Prandi, M. S. Lawrence, J. M. Mosquera, A. Romanel, Y. Drier, K. Park, N. Kitabayashi, T. Y. MacDonald, M. Ghandi, E. Van Allen, G. V. Kryukov, A. Sboner, J.-P. Theurillat, T. D. Soong, E. Nickerson, D. Auclair, A. Tewari, H. Beltran, R. C. Onofrio, G. Boysen, C. Guiducci, C. E. Barbieri, K. Cibulskis, A. Sivachenko, S. L. Carter, G. Saksena, D. Voet, A. H. Ramos, W. Winckler, M. Cipicchio, K. Ardlie, P. W. Kantoff, M. F. Berger, S. B. Gabriel, T. R. Golub, M. Meyerson, E. S. Lander, O. Elemento, G. Getz, F. Demichelis, M. A. Rubin, L. A. Garraway, Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013). doi:10.1016/j.cell.2013.03.021 pmid:23622249
  8. P. J. Stephens, C. D. Greenman, B. Fu, F. Yang, G. R. Bignell, L. J. Mudie, E. D. Pleasance, K. W. Lau, D. Beare, L. A. Stebbings, S. McLaren, M.-L. Lin, D. J. McBride, I. Varela, S. Nik-Zainal, C. Leroy, M. Jia, A. Menzies, A. P. Butler, J. W. Teague, M. A. Quail, J. Burton, H. Swerdlow, N. P. Carter, L. A. Morsberger, C. Iacobuzio-Donahue, G. A. Follows, A. R. Green, A. M. Flanagan, M. R. Stratton, P. A. Futreal, P. J. Campbell, Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011). doi:10.1016/j.cell.2010.11.055 pmid:21215367
  9. D. W. Parsons, A. Roy, Y. Yang, T. Wang, S. Scollon, K. Bergstrom, R. A. Kerstein, S. Gutierrez, A. K. Petersen, A. Bavle, F. Y. Lin, D. H. López-Terrada, F. A. Monzon, M. J. Hicks, K. W. Eldin, N. M. Quintanilla, A. M. Adesina, C. A. Mohila, W. Whitehead, A. Jea, S. A. Vasudevan, J. G. Nuchtern, U. Ramamurthy, A. L. McGuire, S. G. Hilsenbeck, J. G. Reid, D. M. Muzny, D. A. Wheeler, S. L. Berg, M. M. Chintagumpala, C. M. Eng, R. A. Gibbs, S. E. Plon, Diagnostic Yield of Clinical Tumor and Germline Whole-Exome Sequencing for Children With Solid Tumors. JAMA Oncol. 2, 616 (2016). doi:10.1001/jamaoncol.2015.5699 pmid:26822237
  10. J. Zhang, M. F. Walsh, G. Wu, M. N. Edmonson, T. A. Gruber, J. Easton, D. Hedges, X. Ma, X. Zhou, D. A. Yergeau, M. R. Wilkinson, B. Vadodaria, X. Chen, R. B. McGee, S. Hines-Dowell, R. Nuccio, E. Quinn, S. A. Shurtleff, M. Rusch, A. Patel, J. B. Becksfort, S. Wang, M. S. Weaver, L. Ding, E. R. Mardis, R. K. Wilson, A. Gajjar, D. W. Ellison, A. S. Pappo, C.-H. Pui, K. E. Nichols, J. R. Downing, Germline Mutations in Predisposition Genes in Pediatric Cancer. N. Engl. J. Med. 373, 2336–2346 (2015). doi:10.1056/NEJMoa1508054 pmid:26580448
  11. T. J. Pugh, O. Morozova, E. F. Attiyeh, S. Asgharzadeh, J. S. Wei, D. Auclair, S. L. Carter, K. Cibulskis, M. Hanna, A. Kiezun, J. Kim, M. S. Lawrence, L. Lichenstein, A. McKenna, C. S. Pedamallu, A. H. Ramos, E. Shefler, A. Sivachenko, C. Sougnez, C. Stewart, A. Ally, I. Birol, R. Chiu, R. D. Corbett, M. Hirst, S. D. Jackman, B. Kamoh, A. H. Khodabakshi, M. Krzywinski, A. Lo, R. A. Moore, K. L. Mungall, J. Qian, A. Tam, N. Thiessen, Y. Zhao, K. A. Cole, M. Diamond, S. J. Diskin, Y. P. Mosse, A. C. Wood, L. Ji, R. Sposto, T. Badgett, W. B. London, Y. Moyer, J. M. Gastier-Foster, M. A. Smith, J. M. Guidry Auvil, D. S. Gerhard, M. D. Hogarty, S. J. M. Jones, E. S. Lander, S. B. Gabriel, G. Getz, R. C. Seeger, J. Khan, M. A. Marra, M. Meyerson, J. M. Maris, The genetic landscape of high-risk neuroblastoma. Nat. Genet. 45, 279–284 (2013). doi:10.1038/ng.2529 pmid:23334666
  12. J. R. Downing, R. K. Wilson, J. Zhang, E. R. Mardis, C.-H. Pui, L. Ding, T. J. Ley, W. E. Evans, The Pediatric Cancer Genome Project. Nat. Genet. 44, 619–622 (2012). doi:10.1038/ng.2287 pmid:22641210
  13. St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project, Somatic histone H3 alterations in pediatric diffuse intrinsic pontine gliomas and non-brainstem glioblastomas. Nat. Genet. 44, 251–253 (2012). doi:10.1038/ng.1102 pmid:22286216
  14. J. Zhang, L. Ding, L. Holmfeldt, G. Wu, S. L. Heatley, D. Payne-Turner, J. Easton, X. Chen, J. Wang, M. Rusch, C. Lu, S.-C. Chen, L. Wei, J. R. Collins-Underwood, J. Ma, K. G. Roberts, S. B. Pounds, A. Ulyanov, J. Becksfort, P. Gupta, R. Huether, R. W. Kriwacki, M. Parker, D. J. McGoldrick, D. Zhao, D. Alford, S. Espy, K. C. Bobba, G. Song, D. Pei, C. Cheng, S. Roberts, M. I. Barbato, D. Campana, E. Coustan-Smith, S. A. Shurtleff, S. C. Raimondi, M. Kleppe, J. Cools, K. A. Shimano, M. L. Hermiston, S. Doulatov, K. Eppert, E. Laurenti, F. Notta, J. E. Dick, G. Basso, S. P. Hunger, M. L. Loh, M. Devidas, B. Wood, S. Winter, K. P. Dunsmore, R. S. Fulton, L. L. Fulton, X. Hong, C. C. Harris, D. J. Dooling, K. Ochoa, K. J. Johnson, J. C. Obenauer, W. E. Evans, C.-H. Pui, C. W. Naeve, T. J. Ley, E. R. Mardis, R. K. Wilson, J. R. Downing, C. G. Mullighan, The genetic basis of early T-cell precursor acute lymphoblastic leukaemia. Nature 481, 157–163 (2012). doi:10.1038/nature10725 pmid:22237106


Reporter: Aviva Lev-Ari, PhD, RN

Genetic Basis of Complex Human Diseases: Dan Koboldt’s Advice to Next-Generation Sequencing Neophytes

Word Cloud by Daniel Menzin

UPDATED 3/27/2013

The Exome is Not Enough

March 27, 2013

Dan Koboldt at MassGenomics explains why exome sequencing often fails to identify causal variants, even in Mendelian disorders: there is “the very plausible possibility that a noncoding functional variant is responsible.”

Koboldt, the analysis manager in the human genetics group at the Genome Institute at Washington University, says that researchers shouldn’t overlook the importance of noncoding functional variants, which require a suite of technologies to detect, including RNA-seq, ChIP-seq, DNase sequencing and footprinting, bisulfite sequencing, and chromosome conformation capture.

“These types of experiments generate a wealth of data about regulatory activity in genomes,” he says. “While studying each of these independently is certainly informative, integrative analysis will be required to elucidate how all of these different regulatory mechanisms work together.”

While this effort will require “robust statistical models, substantial computing resources, and productive collaboration among research groups,” the end result “will be a far more complete understanding of how the genome works,” he says.


Dan Koboldt works as a staff scientist in the Human Genetics group of the Genome Institute at Washington University in St. Louis. There, he works with scientists, physicians, programmers, and data analysts to understand the genetic basis of complex human diseases such as cancer, vision disorders, and metabolic syndromes through next-gen sequencing analysis. He received bachelor’s degrees in Computer Science and French from the University of Missouri-Columbia, and a master’s degree in Biology from Washington University.

Dan has worked in the field of human genetics since 2003, when he joined the lab of Raymond E. Miller, which played a role in the International HapMap Project and later the genetic map of C. briggsae, a model organism related to C. elegans.

Disclaimer: The views expressed on this site, including blog posts and static pages, do not necessarily reflect the opinions of the Genome Institute at Washington University, the Washington University School of Medicine, or Washington University in St. Louis.

Before diving in with both feet, next-generation sequencing neophytes might want to take a gander at a post by Dan Koboldt at MassGenomics where he describes his 10 commandments for good next-gen sequencing.

In his post, Koboldt breaks up his instructions into four categories: analysis, publications, data sharing and submissions, and research ethics and cost.

His list includes some oft-repeated warnings. For example, he cautions against reinventing the wheel when it comes to developing analysis software, and, for pity’s sake, don’t invent any more words that end in “ome” or “omics.”

Some other no-nos, according to Koboldt, include publishing results before they’ve been vetted properly, testing new methods on simulated data only, and taking “unfair advantage of submitted data.”

He also admonishes newcomers to think a little bit about the cost of analysis, without which “your sequencing data, your $1,000 genome, is about as useful as a chocolate teapot,” and to have a care for the privacy of their study participants’ samples and data.

Ten Commandments for Next-Gen Sequencing

Just as the reach of next-generation sequencing has continued to grow, in both research and clinical realms, so too has the community of NGS users. Some have been around since the early days. The days of 454 and Solexa sequencing. Since then, the field has matured at an astonishing pace. Many standards were established to help everyone make sense of this flood of data. The recent democratization of sequencing has made next-gen sequencing available to just about anyone.

And yet, there have been growing pains. With great power comes great responsibility. To help some of the newcomers into the field, I’ve drafted these ten commandments for next-gen sequencing.

NGS Analysis

1. Thou shalt not reinvent the wheel. In spite of rapid technological advances, NGS is not a new field. Most of the current “workhorse” technologies have been on the market for a couple of years or more. As such, we have a plethora of short read aligners, de novo assemblers, variant callers, and other tools already. Even so, there is a great temptation for bioinformaticians to write their own “custom scripts” to perform these tasks. There’s a new “Applications Note” every day with some tool that claims to do something new or better.

Can you really write an aligner that’s better than BWA? More importantly, do we need one? Unless you have some compelling reason to develop something new (as we did when we developed SomaticSniper and VarScan), take advantage of what’s already out there.
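In practice, following commandment 1 usually takes only a few lines of glue code around an established tool. The sketch below assumes BWA is installed and that the reference has already been indexed with `bwa index`; the helper names and file names are hypothetical, and only the standard `bwa mem -t <threads> <ref> <reads1> [<reads2>]` invocation is used.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def bwa_mem_command(reference: str, read1: str, read2: Optional[str] = None, threads: int = 4) -> List[str]:
    """Build a BWA-MEM command line (hypothetical helper; the reference must already be indexed)."""
    cmd = ["bwa", "mem", "-t", str(threads), reference, read1]
    if read2 is not None:
        cmd.append(read2)  # mate file for paired-end reads
    return cmd


def align(reference: str, read1: str, read2: Optional[str] = None,
          threads: int = 4, out_sam: str = "aln.sam") -> str:
    """Run BWA-MEM and write its SAM output to `out_sam`."""
    if shutil.which("bwa") is None:
        raise RuntimeError("bwa is not on PATH; install it instead of writing a new aligner")
    with open(out_sam, "w") as sam:
        subprocess.run(bwa_mem_command(reference, read1, read2, threads), stdout=sam, check=True)
    return out_sam


if __name__ == "__main__":
    # Show the command that would be run for a hypothetical paired-end sample.
    print(shlex.join(bwa_mem_command("ref.fa", "sample_1.fq", "sample_2.fq", threads=8)))
```

Keeping the command construction separate from execution makes the pipeline easy to log and test without running the aligner itself.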

2. Thou shalt not coin any new term ending with “ome” or “omics”. We have enough of these already, to the point where it’s getting ridiculous. Genome, transcriptome, and proteome are obvious applications of this nomenclature. Epigenome, sure. But the metabolome, interactome, and various other “ome” words are starting to detract from the naming system. The ones we need have already been coined. Don’t give in to the temptation.

3. Thou shalt follow thy field’s conventions for jargon. Technical terms, acronyms, and abbreviations are inherent to research. We need them both for precision and brevity. We get into trouble when people feel the need to create their own acronyms even though a suitable one already exists. Is there a significant difference between next-generation sequencing (NGS), high-throughput sequencing (HTS), and massively parallel sequencing (MPS)?

Widely accepted terms provide something of a standard, and they should be used whenever possible. Insertion/deletion variants are indels, not InDels, INDELs, or DIPs. Structural variants are SVs, not SVars or GVs. We don’t need any more acronyms!

NGS Publications

These commandments address behaviors that get on my nerves, both as a blogger and a peer reviewer.

4. Thou shalt not publish by press release. This is a disturbing trend that seems to happen more and more frequently in our field: the announcement of “discoveries” before they have been accepted for publication. Peer review is the required vetting process for scientific research. Yes, it takes time and yes, your competitors are probably on the verge of the same discovery. That doesn’t mean you get to skip ahead and claim credit by putting out a press release.

There are already examples of how this can come back to bite you. When the reviewers trash your manuscript, or (gasp) you learn that a mistake was made, it looks bad. It reflects poorly on the researchers and the institution, both in the field and in the eyes of the public.

5. Thou shalt not rely only on simulated data. Often when I read a paper on a new method or algorithm, they showcase it using simulated data. This often serves a noble purpose, such as knowing the “correct” answer and demonstrating that your approach can find it. Even so, you’d better apply it to some real data too. Simulations simply can’t replicate the true randomness of nature and the crap-that-can-go-wrong reality of next-gen sequencing. There’s plenty of freely available data out there; go get some of it.
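The "correct answer" that simulations provide is usually used to score a method. A minimal sketch of that scoring follows, assuming variants are compared by exact (chromosome, position, ref, alt) tuples; the function name and the toy variants are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# A variant keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def evaluate_calls(calls: Iterable[Variant], truth: Iterable[Variant]) -> Dict[str, float]:
    """Score a caller against the variants that were deliberately placed in simulated reads."""
    call_set: Set[Variant] = set(calls)
    truth_set: Set[Variant] = set(truth)
    tp = len(call_set & truth_set)  # simulated variants that were recovered
    fp = len(call_set - truth_set)  # calls that were never simulated
    fn = len(truth_set - call_set)  # simulated variants that were missed
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if truth_set else float("nan"),
        "precision": tp / (tp + fp) if call_set else float("nan"),
    }


truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
calls = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")}
print(evaluate_calls(calls, truth))
```

Exact tuple matching is deliberately simple: the same indel can be written at different positions, so real benchmarks normalize variant representation first. On real data no such truth set exists, which is why good results on simulations alone say little about how a method copes with real sequencing errors.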

6. Thou shalt obtain enough samples. One consequence of the rapid growth of our field (and accompanying drop in sequencing costs) is that small sample numbers no longer impress anyone. They don’t impress me, and they certainly don’t impress the statisticians upstairs. The novelty of exome or even whole-genome sequencing has long worn off. Now, high-profile studies must back their findings with statistically significant results, and that usually means finding a cohort of hundreds (or thousands) of patients with which to extend your findings.

This new reality may not be entirely bad news, because it surely will foster collaboration between groups that might otherwise not be able to publish individually.
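A rough binomial model shows why cohort size matters for statistical significance. The sketch assumes, hypothetically, that each sample has a 2% chance of carrying a background (passenger) mutation in a given gene, and asks how likely it is to see that gene mutated in 10% of a cohort by chance alone. Real significance tests, such as MuSiC, also account for gene length, sequence context, and per-sample mutation rates; this is only an illustration.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): probability of k or more mutated samples by chance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


BACKGROUND_RATE = 0.02             # hypothetical per-sample background mutation probability
GENOME_WIDE_ALPHA = 0.05 / 20_000  # Bonferroni correction over roughly 20,000 genes

for n in (10, 100, 500):
    k = max(1, n // 10)            # gene mutated in 10% of the cohort
    p_value = binomial_tail(k, n, BACKGROUND_RATE)
    print(f"{k}/{n} mutated: P = {p_value:.1e}, genome-wide significant: {p_value < GENOME_WIDE_ALPHA}")
```

Under these assumptions the same 10% mutation frequency is unremarkable in 10 samples, suggestive but short of the genome-wide threshold in 100, and far beyond it in 500.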

Data Sharing and Submissions

7. Thou shalt withhold no data. With some exceptions, sequencing datasets are meant to be shared. Certain institutions, such as large-scale sequencing centers in the U.S., are mandated by their funding agencies to deposit data generated using public funds on a timely basis following its generation. Since the usual deposition site is dbGaP, this means that IRB approvals and dbGaP certification letters must be in hand before sequencing can begin.

Any researchers who plan to publish their findings based on sequencing datasets will have to submit them to public repositories before publication. This is not optional. It is not “something we should do when we get around to it after the paper goes out.” It is required to reproduce the work, so it should really be done before a manuscript is submitted. Consider this excerpt from Nature’s publication guidelines:

Data sets must be made freely available to readers from the date of publication, and must be provided to editors and peer-reviewers at submission, for the purposes of evaluating the manuscript.

For the following types of data set, submission to a community-endorsed, public repository is mandatory. Accession numbers must be provided in the paper.

The policies go on to list various types of sequencing data:

  • DNA and RNA sequences
  • DNA sequencing data (traces for capillary electrophoresis and short reads for next-generation sequencing)
  • Deep sequencing data
  • Epitopes, functional domains, genetic markers, or haplotypes.

Every journal should have a similar policy; most top-tier journals already do. Editors and referees need to enforce this submission requirement by rejecting any manuscripts that do not include the submission accession numbers.
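Enforcing the accession-number requirement could start with a crude desk check. The sketch below is hypothetical: it recognizes a few common formats (dbGaP study IDs beginning with `phs`, SRA/ENA/DDBJ study, experiment, and run IDs such as `SRP`, `SRX`, and `SRR`, and GEO series `GSE`), which are illustrative rather than exhaustive, and it can only confirm that an identifier appears, not that the deposit is complete or accessible.

```python
import re
from typing import Dict, List

# Illustrative accession formats only; each repository issues other kinds of IDs too.
ACCESSION_PATTERNS = {
    "dbGaP study": re.compile(r"\bphs\d{6}(?:\.v\d+\.p\d+)?\b"),
    "SRA/ENA/DDBJ": re.compile(r"\b[SED]R[PXR]\d{6,}\b"),  # study (P), experiment (X), run (R)
    "GEO series": re.compile(r"\bGSE\d+\b"),
}


def find_accessions(text: str) -> Dict[str, List[str]]:
    """Collect accession-like identifiers in a manuscript, grouped by repository."""
    found = {}
    for repository, pattern in ACCESSION_PATTERNS.items():
        hits = sorted(set(pattern.findall(text)))
        if hits:
            found[repository] = hits
    return found


def passes_data_check(text: str) -> bool:
    """Hypothetical editorial rule: a manuscript must cite at least one accession number."""
    return bool(find_accessions(text))


manuscript = "Sequence data were deposited in dbGaP (phs000001.v1.p1) and SRA (SRP000001)."
print(find_accessions(manuscript), passes_data_check(manuscript))
```

The accession IDs in the example manuscript text are placeholders, not real studies.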

8. Thou shalt not take unfair advantage of submitted data. Many investigators are concerned about data sharing (especially when mandated upon generation, not publication) for fear of being scooped. This is a valid concern. When you submit your data to a public repository, others can find it and (if they meet the requirements) use it. Personally, I think most of these fears are not justified. I mean, have you ever tried to get data out of dbGaP? The time it takes for someone to find, request, obtain, and use submitted data should allow the producers of the data to write it up.

Large-scale efforts to which substantial resources have been devoted — such as the Cancer Genome Atlas — have additional safeguards in place. Their data use policy states that, for a given cancer type, submitted data can’t be used until the “marker paper” has been published. This is a good rule of thumb for the NGS community, and something that journal editors (and referees) haven’t always enforced.

Just because you can scoop someone doesn’t mean that you should. It’s not only bad karma, but bad for your reputation. Scientists have long memories. They will likely review your manuscript or grant proposal sometime in the future. When that happens, you want to be the person who took the high road.

Research Ethics and Cost

9. Thou shalt not discount the cost of analysis. It’s true that since the advent of NGS technology, the cost of sequencing has plummeted. The cost of analysis, however, has not. And making sense of genomic data — alignment, quality control, variant calling, annotation, interpretation — is a daunting task indeed. It takes computational resources as well as expertise. This infrastructure is not free; in fact, it can be more expensive than the sequencing itself. 

Without analysis, your sequencing data, your $1,000 genome, is about as useful as a chocolate teapot.

10. Thou shalt honor thy patients and their samples. Earlier this month, I wrote about how supposedly anonymous individuals from the CEPH collection were identified using a combination of genetic markers and online databases. It is a simple fact that we can no longer guarantee a sequenced sample’s anonymity. That simple fact, combined with our growing ability to interpret the possible consequences of an individual genome, means a great deal of risk for study volunteers.
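A short calculation shows why anonymity is hard to promise: an exact multi-marker genotype is very rarely shared by chance. This is not a reconstruction of how the CEPH participants were identified. It assumes Hardy-Weinberg proportions, independent (unlinked) loci, and unrelated individuals, and the 40-SNP profile and allele frequencies are hypothetical.

```python
from typing import Sequence


def genotype_frequency(p: float, copies: int) -> float:
    """Hardy-Weinberg frequency of carrying 0, 1, or 2 copies of an allele with frequency p."""
    q = 1.0 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}[copies]


def random_match_probability(allele_freqs: Sequence[float], genotypes: Sequence[int]) -> float:
    """Chance that an unrelated person carries this exact genotype at every (independent) locus."""
    prob = 1.0
    for p, copies in zip(allele_freqs, genotypes):
        prob *= genotype_frequency(p, copies)
    return prob


freqs = [0.5] * 40      # hypothetical: 40 common, unlinked SNPs
profile = [1] * 40      # heterozygous at every one of them
WORLD_POPULATION = 8e9  # rough figure, for scale only

rmp = random_match_probability(freqs, profile)
print(f"match probability {rmp:.1e}; expected chance matches worldwide {rmp * WORLD_POPULATION:.3f}")
```

Under these assumptions, a few dozen common markers already single out one person, and a whole genome carries millions of such markers.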

We must safeguard the privacy of study participants — and find ways to protect them from privacy violations and/or discrimination — if we want their continued cooperation.

This means obtaining good consent documents and ensuring that they’re all correct before sequencing begins. It also means adhering to the data use policies those consents specify. As I’ve written before, samples are the new commodity in our field. Anyone can rent time on a sequencer. If you don’t make an effort to treat your samples right, someone else will.



Dan Koboldt’s Publications

Bose R, Kavuri SM, Searleman AC, Shen W, Shen D, Koboldt DC, Monsey J, Goel N, Aronson AB, Li S, Ma CX, Ding L, Mardis ER, & Ellis MJ (2013). Activating HER2 mutations in HER2 gene amplification negative breast cancer. Cancer discovery PMID: 23220880

The 1000 Genomes Project Consortium (2012). An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65. DOI: 10.1038/nature11632

Cancer Genome Atlas Network (2012). Comprehensive molecular portraits of human breast tumours. Nature, 490 (7418), 61-70 PMID: 23000897

Ellis MJ, Ding L, Shen D, Luo J, Suman VJ, Wallis JW, Van Tine BA, Hoog J, Goiffon RJ, Goldstein TC, Ng S, Lin L, Crowder R, Snider J, Ballman K, Weber J, Chen K, Koboldt DC, Kandoth C, Schierding WS, McMichael JF, Miller CA, Lu C, Harris CC, McLellan MD, Wendl MC, DeSchryver K, Allred DC, Esserman L, Unzeitig G, Margenthaler J, Babiera GV, Marcom PK, Guenther JM, Leitch M, Hunt K, Olson J, Tao Y, Maher CA, Fulton LL, Fulton RS, Harrison M, Oberkfell B, Du F, Demeter R, Vickery TL, Elhammali A, Piwnica-Worms H, McDonald S, Watson M, Dooling DJ, Ota D, Chang LW, Bose R, Ley TJ, Piwnica-Worms D, Stuart JM, Wilson RK, & Mardis ER (2012). Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature, 486 (7403), 353-60 PMID: 22722193

Welch JS, Ley TJ, Link DC, Miller CA, Larson DE, Koboldt DC, Wartman LD, Lamprecht TL, Liu F, Xia J, Kandoth C, Fulton RS, McLellan MD, Dooling DJ, Wallis JW, Chen K, Harris CC, Schmidt HK, Kalicki-Veizer JM, Lu C, Zhang Q, Lin L, O’Laughlin MD, McMichael JF, Delehaunty KD, Fulton LA, Magrini VJ, McGrath SD, Demeter RT, Vickery TL, Hundal J, Cook LL, Swift GW, Reed JP, Alldredge PA, Wylie TN, Walker JR, Watson MA, Heath SE, Shannon WD, Varghese N, Nagarajan R, Payton JE, Baty JD, Kulkarni S, Klco JM, Tomasson MH, Westervelt P, Walter MJ, Graubert TA, DiPersio JF, Ding L, Mardis ER, & Wilson RK (2012). The origin and evolution of mutations in acute myeloid leukemia. Cell, 150 (2), 264-78 PMID: 22817890

Cancer Genome Atlas Network (2012). Comprehensive molecular characterization of human colon and rectal cancer. Nature, 487(7407), 330-7 PMID: 22810696

Dees ND, Zhang Q, Kandoth C, Wendl MC, Schierding W, Koboldt DC, Mooney TB, Callaway MB, Dooling D, Mardis ER, Wilson RK, & Ding L (2012). MuSiC: identifying mutational significance in cancer genomes. Genome research, 22 (8), 1589-98 PMID: 22759861

Walter MJ, Shen D, Ding L, Shao J, Koboldt DC, Chen K, Larson DE, McLellan MD, Dooling D, Abbott R, Fulton R, Magrini V, Schmidt H, Kalicki-Veizer J, O’Laughlin M, Fan X, Grillot M, Witowski S, Heath S, Frater JL, Eades W, Tomasson M, Westervelt P, DiPersio JF, Link DC, Mardis ER, Ley TJ, Wilson RK, & Graubert TA (2012). Clonal architecture of secondary acute myeloid leukemia. The New England journal of medicine, 366(12), 1090-8 PMID: 22417201

Matsushita H, Vesely MD, Koboldt DC, Rickert CG, Uppaluri R, Magrini VJ, Arthur CD, White JM, Chen YS, Shea LK, Hundal J, Wendl MC, Demeter R, Wylie T, Allison JP, Smyth MJ, Old LJ, Mardis ER, & Schreiber RD (2012). Cancer exome analysis reveals a T-cell-dependent mechanism of cancer immunoediting. Nature, 482 (7385), 400-4 PMID: 22318521

Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, & Wilson RK (2012). VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research PMID: 22300766

Koboldt DC, Larson DE, Chen K, Ding L, & Wilson RK (2012). Massively parallel sequencing approaches for characterization of structural variation. Methods in molecular biology (Clifton, N.J.), 838, 369-84 PMID: 22228022

Graubert TA, Shen D, Ding L, Okeyo-Owuor T, Lunn CL, Shao J, Krysiak K, Harris CC, Koboldt DC, Larson DE, McLellan MD, Dooling DJ, Abbott RM, Fulton RS, Schmidt H, Kalicki-Veizer J, O’Laughlin M, Grillot M, Baty J, Heath S, Frater JL, Nasim T, Link DC, Tomasson MH, Westervelt P, DiPersio JF, Mardis ER, Ley TJ, Wilson RK, & Walter MJ (2011). Recurrent mutations in the U2AF1 splicing factor in myelodysplastic syndromes. Nature genetics, 44 (1), 53-7 PMID: 22158538

Larson DE, Harris CC, Chen K, Koboldt DC, Abbott TE, Dooling DJ, Ley TJ, Mardis ER, Wilson RK, & Ding L. (2011). SomaticSniper: Identification of Somatic Point Mutations in Whole Genome Sequencing Data. Bioinformatics, Online : doi: 10.1093/bioinformatics/btr665

Cancer Genome Atlas Research Network (2011). Integrated genomic analyses of ovarian carcinoma. Nature, 474 (7353), 609-15 PMID: 21720365

Marth GT, Yu F, Indap AR, Garimella K, et al & the 1000 Genomes Project (2011). The functional spectrum of low-frequency coding variation. Genome biology, 12 (9) PMID: 21917140

Ross JA, Koboldt DC, Staisch JE, Chamberlin HM, Gupta BP, Miller RD, Baird SE, & Haag ES (2011). Caenorhabditis briggsae recombinant inbred line genotypes reveal inter-strain incompatibility and the evolution of recombination. PLoS genetics, 7 (7) PMID: 21779179

Bowne SJ, Humphries MM, Sullivan LS, Kenna PF, Tam LC, Kiang AS, Campbell M, Weinstock GM, Koboldt DC, Ding L, Fulton RS, Sodergren EJ, et al (2011). A dominant mutation in RPE65 identified by whole-exome sequencing causes retinitis pigmentosa with choroidal involvement. European journal of human genetics : EJHG, 19 (10) PMID: 21938004

Link DC, Schuettpelz LG, Shen D, Wang J, Walter MJ, Kulkarni S, Payton JE, Ivanovich J, Goodfellow PJ, Le Beau M, Koboldt DC, Dooling DJ, Fulton RS, et al (2011). Identification of a novel TP53 cancer susceptibility mutation through whole-genome sequencing of a patient with therapy-related AML. JAMA : the journal of the American Medical Association, 305 (15), 1568-76 PMID: 21505135

Ley T, Ding L, Walter M, McLellan M, Lamprecht T, Larson D, Kandoth C, Payton J, Baty J, Welch J, Harris C, Lichti C, Townsend R, Fulton R, Dooling D, Koboldt D, et al. (2010). DNMT3A Mutations in Acute Myeloid Leukemia. New England Journal of Medicine DOI: 10.1056/NEJMoa1005143

Ding L, Wendl MC, Koboldt DC, & Mardis ER (2010). Analysis of next-generation genomic data in cancer: accomplishments and challenges. Human Molecular Genetics, 19 (R2):R188-96. PMID: 20843826

Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, Tsalenko A, Sampas N, Bruhn L, Shendure J, 1000 Genomes Project, & Eichler EE (2010). Diversity of human copy number variation and multicopy genes. Science (New York, N.Y.), 330 (6004), 641-6 PMID: 21030649

The 1000 Genomes Project Consortium (2010). A map of human genome variation from population-scale sequencing. Nature, 467(7319), 1061-1073 DOI: 10.1038/nature09534

Bowne SJ, Sullivan LS, Koboldt DC, Ding L, Fulton R, Abbott RM, Sodergren EJ, Birch DG, Wheaton DH, Heckenlively JR, Liu Q, Pierce EA, Weinstock GM, & Daiger SP (2010). Identification of Disease-Causing Mutations in Autosomal Dominant Retinitis Pigmentosa (adRP) Using Next-Generation DNA Sequencing. Investigative ophthalmology & visual science PMID: 20861475

Fehniger, T., Wylie, T., Germino, E., Leong, J., Magrini, V., Koul, S., Keppel, C., Schneider, S., Koboldt, D., Sullivan, R., Heinz, M., Crosby, S., Nagarajan, R., Ramsingh, G., Link, D., Ley, T., & Mardis, E. (2010). Next-generation sequencing identifies the natural killer cell microRNA transcriptome Genome Research DOI: 10.1101/gr.107995.110

Ramsingh G, Koboldt DC, Trissal M, Chiappinelli KB, Wylie T, Koul S, Chang LW, Nagarajan R, Fehniger TA, Goodfellow P, Magrini V, Wilson RK, Ding L, Ley TJ, Mardis ER, & Link DC (2010). Complete characterization of the microRNAome in a patient with acute myeloid leukemia. Blood PMID: 20876853

Koboldt DC, Ding L, Mardis ER & Wilson RK. (2010). Challenges of sequencing human genomes. Briefings in Bioinformatics DOI: 10.1093/bib/bbq016

Ding L, Ellis MJ, Li S, Larson DE, Chen K, Wallis JW, Harris CC, McLellan MD, Fulton RS, Fulton LL, Abbott RM, Hoog J, Dooling DJ, Koboldt DC, et al. (2010). Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature, 464 (7291), 999-1005 PMID: 20393555

Koboldt DC and Miller RD (2010). Identification of polymorphic markers for genetic mapping. Genomics: Essential Methods, In Press.

Koboldt DC, Staisch J, Thillainathan B, Haines K, Baird SE, Chamberlin HM, Haag ES, Miller RD, & Gupta BP (2010). A toolkit for rapid gene mapping in the nematode Caenorhabditis briggsae. BMC genomics, 11 (1) PMID: 20385026

Voora D, Koboldt DC, King CR, Lenzini PA, Eby CS, Porche-Sorbet R, Deych E, Crankshaw M, Milligan PE, McLeod HL, Patel SR, Cavallari LH, Ridker PM, Grice GR, Miller RD, & Gage BF (2010). A polymorphism in the VKORC1 regulator calumenin predicts higher warfarin dose requirements in African Americans. Clinical pharmacology and therapeutics, 87 (4), 445-51 PMID: 20200517

Zhang Q, Ding L, Larson DE, Koboldt DC, McLellan MD, Chen K, Shi X, Kraja A, et al (2009). CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics (Oxford, England) PMID: 20031968

Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K, Koboldt DC, et al (2009). Recurring mutations found by sequencing an acute myeloid leukemia genome. The New England journal of medicine, 361(11), 1058-66 PMID: 19657110

Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, Weinstock GM, Wilson RK, & Ding L (2009). VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics (Oxford, England), 25 (17), 2283-5 PMID: 19542151

Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, Dooling D, Dunford-Shore BH, McGrath S, Hickenbotham M, Cook L, Abbott R, Larson DE, Koboldt DC, et al (2008). DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature, 456 (7218), 66-72 PMID: 18987736

Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, et al (2008). Somatic mutations affect key pathways in lung adenocarcinoma. Nature, 455 (7216), 1069-75 PMID: 18948947

Cancer Genome Atlas Research Network (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455 (7216), 1061-8 PMID: 18772890

International HapMap Consortium (2007). A second generation human haplotype map of over 3.1 million SNPs. Nature, 449 (7164), 851-61 PMID: 17943122

Sabeti PC, Varilly P, Fry B, et al (2007). Genome-wide detection and characterization of positive selection in human populations. Nature, 449 (7164), 913-8 PMID: 17943131

Hillier LW, Miller RD, Baird SE, Chinwalla A, Fulton LA, Koboldt DC, & Waterston RH (2007). Comparison of C. elegans and C. briggsae genome sequences reveals extensive conservation of chromosome organization and synteny. PLoS biology, 5 (7) PMID: 17608563

Stanley SL Jr, Frey SE, Taillon-Miller P, Guo J, Miller RD, Koboldt DC, Elashoff M, Christensen R, Saccone NL, & Belshe RB (2007). The immunogenetics of smallpox vaccination. The Journal of infectious diseases, 196 (2), 212-9 PMID: 17570108

Koboldt DC, Miller RD, & Kwok PY (2006). Distribution of human SNPs and its effect on high-throughput genotyping. Human mutation, 27(3), 249-54 PMID: 16425292

The International HapMap Consortium (2005). A haplotype map of the human genome. Nature, 437 (7063), 1299-1320 PMID: 16255080

Miller RD, Phillips MS, et al (2005). High-density single-nucleotide polymorphism maps of the human genome. Genomics, 86 (2), 117-26 PMID: 15961272

Other Writing by Dan Koboldt

Dan Koboldt is also the author of Get Your Baby to Sleep, a resource to help new parents whose baby won’t sleep with advice on establishing healthy baby sleep habits and handling baby sleep problems. He contributes to The Best of Twins and In Search of Whitetails blogs as well.



Other related articles on this Open Access Online Scientific Journal:

“Genome in a Bottle”: NIST’s new metrics for Clinical Human Genome Sequencing


DNA – The Next-Generation Storage Media for Digital Information


How Genome Sequencing is Revolutionizing Clinical Diagnostics


NGS Market: Trends and Development for Genotype-Phenotype Associations Research


What is the Future for Genomics in Clinical Medicine?


Genomically Guided Treatment after CLIA Approval: to be offered by Weill Cornell Precision Medicine Institute


Inaugural Genomics in Medicine – The Conference Program, 2/11-12/2013, San Francisco, CA


GSK for Personalized Medicine using Cancer Drugs needs Alacris systems biology model to determine the in silico effect of the inhibitor in its “virtual clinical trial”


arrayMap: Genomic Feature Mining of Cancer Entities of Copy Number Abnormalities (CNAs) Data


NGS Cardiovascular Diagnostics: Long-QT Genes Sequenced – A Potential Replacement for Molecular Pathology


Speeding Up Genome Analysis: MIT Algorithms for Direct Computation on Compressed Genomic Datasets


Clinical Genetics, Personalized Medicine, Molecular Diagnostics, Consumer-targeted DNA – Consumer Genetics Conference (CGC) – October 3-5, 2012, Seaport Hotel, Boston, MA


“CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics” lays out the multivariate, systems-level analytical tools that have moved the science forward to the point of clinical application.



Reported by Venkat S. Karra, Ph.D.

A series of proteins in blood could form the basis of a future test for Alzheimer’s disease, say scientists in the US. They used proteomics to identify proteins expressed at different levels in the blood of patients with Alzheimer’s disease or mild cognitive impairment compared with healthy controls. The results are described in Neurology.


Four plasma analytes remained after cross-checking against the findings of the Alzheimer’s Disease Neuroimaging Initiative (ADNI). They are apolipoprotein E, B-type natriuretic peptide, C-reactive protein and pancreatic polypeptide. Their levels also correlated with the cerebrospinal fluid contents of beta-amyloid proteins, which have been associated with the onset of Alzheimer’s disease. It is still too early to say for sure that a blood test based on these proteins would work. One of the next steps should be to confirm the link between the biomarkers in blood and cerebrospinal fluid.

source: spectroscopynow


Sunitinib brings Adult Acute Lymphoblastic Leukemia (ALL) to Remission – RNA Sequencing – FLT3 Receptor Blockade

Curator: Aviva Lev-Ari, PhD, RN


Word Cloud by Daniel Menzin

Updated 11/13/2013

Pazopanib versus Sunitinib in Renal Cancer

N Engl J Med 2013; 369:1968-1970. November 14, 2013. DOI: 10.1056/NEJMc1311795


To the Editor:

Cancer treatments are expensive. The estimation of the total cost can be challenging because of several factors such as efficacy, toxicity, and the costs and duration of supportive care and end-of-life care. Motzer et al. (Aug. 22 issue)[1] report similar efficacy but a favorable safety and quality-of-life profile and less medical resource utilization with pazopanib as compared with sunitinib in first-line therapy for metastatic renal cancer. Since oncology is becoming an increasingly value-based specialty, we wanted to highlight another important aspect of this trial. Pazopanib appears to be favorable not only in terms of safety and quality of life, but also in terms of overall cost. A 30-day supply of pazopanib (at a dose of 800 mg daily) ranges from $3,500 to $8,556, whereas a 30-day supply of sunitinib (at a dose of 50 mg daily) ranges from $4,500 to $13,559.[2] The total cost of pazopanib during the median progression-free survival of 8.4 months is $29,400 to $71,870, and the total cost of sunitinib during the median progression-free survival of 9.5 months is $42,750 to $127,454. Less toxicity and less medical resource utilization with pazopanib will most likely further lower the overall costs of treatment with this agent. Comparative-effectiveness trials hold great promise for maximizing patient safety, improving treatment outcomes, and reducing costs.
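The letter's cost ranges appear to come from multiplying the price of a 30-day supply by the median progression-free survival in months. The sketch below reproduces that arithmetic, assuming one 30-day supply per month of therapy and counting drug acquisition only; the dictionary and function names are illustrative. With these inputs the pazopanib range and the lower sunitinib figure match the letter, while the upper sunitinib figure comes out near $128,800 rather than $127,454 (the quoted value corresponds to roughly 9.4 months), so that number may rest on a slightly different assumption.

```python
# Prices (USD per 30-day supply) and median PFS (months) as quoted in the letter.
REGIMENS = {
    "pazopanib 800 mg daily": {"price_range": (3_500, 8_556), "median_pfs": 8.4},
    "sunitinib 50 mg daily": {"price_range": (4_500, 13_559), "median_pfs": 9.5},
}


def cost_over_median_pfs(price_low: float, price_high: float, months: float):
    """Drug cost over the median PFS, assuming one 30-day supply per month of therapy."""
    return price_low * months, price_high * months


for regimen, data in REGIMENS.items():
    low, high = cost_over_median_pfs(*data["price_range"], data["median_pfs"])
    print(f"{regimen}: ${low:,.0f} to ${high:,.0f}")
```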

Ryan Ramaekers, M.D.
Mark Tharnish, Pharm.D.
M. Sitki Copur, M.D.
Saint Francis Cancer Treatment Center, Grand Island, NE

No potential conflict of interest relevant to this letter was reported.


To the Editor:

Motzer et al. report a combined analysis of two open-label noninferiority trials (927 patients in the original trial and 183 patients in a second trial), each of which compared pazopanib with sunitinib with respect to progression-free survival in renal-cell carcinoma. Quality-of-life outcomes were subjective.

Analysis of noninferiority trials is notoriously difficult.[1,2] The authors’ analysis of the trials, which was open-label because of the different administration schedules of the drugs, presents problems in interpreting progression-free survival and quality of life. The studies define disease progression differently. The larger study defined progression-free survival according to independent review. The protocol for the smaller study states that progression-free survival “will be summarized . . . based on the investigator assessment.” Inference from subjective outcomes in unmasked trials (e.g., quality of life in both studies and progression-free survival in the smaller study and therefore in the combined analysis) is subject to well-known bias. Moreover, the article does not state how many of the 379 participants (34%) who discontinued the intervention before death or disease progression (see Fig. S2 in the Supplementary Appendix, available with the full text of the article at NEJM.org) were assessed for progression-free survival. A fair comparison must use rigorous methods to handle missing data.[3] Since the article did not deal appropriately with missing data, its conclusions regarding noninferiority are uninterpretable.

Janet Wittes, Ph.D.
Statistics Collaborative, Washington, DC

Dr. Wittes reports that her company, Statistics Collaborative, has consulting agreements with both GlaxoSmithKline and Pfizer, the manufacturers of the drugs discussed in the article by Motzer et al. In addition, Statistics Collaborative has contracts with several other companies that produce drugs for patients with cancer. No other potential conflict of interest relevant to this letter was reported.


To the Editor:

Motzer et al. state that “the results of the progression-free survival analysis in the per-protocol population were consistent with the results of the primary analysis.” However, in the per-protocol analysis the predefined margin of noninferiority (<1.25) was not met: the upper limit of the confidence interval (1.255) was clearly above the defined threshold.1 In a noninferiority trial, the use of the intention-to-treat population is generally nonconservative,2 the full analysis set and the per-protocol analysis set are considered to have equal importance, and analyses of the two populations should lead to similar conclusions for a robust interpretation.3 Thus, it is surprising that the authors did not reach, or discuss, the same conclusions as those of the French National Authority for Health4: “serious doubt exists about the noninferiority result of pazopanib compared to sunitinib” and “the clinical significance of the noninferiority threshold defined in the protocol was an efficacy loss of 2.2 months in the median progression-free survival. This is too large for patients.”
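The disagreement turns on a simple decision rule. On the hazard-ratio scale, noninferiority is shown only when the upper confidence limit lies below the prespecified margin. The sketch below applies that rule to the 1.255 upper limit quoted above. It also converts the 1.25 margin into months of median progression-free survival. That conversion rests on two assumptions:

* survival is exponential (constant hazard), so medians scale as 1/HR;
* the sunitinib median is 11 months. The letter does not state this figure; it was chosen here because it reproduces the 2.2-month loss attributed to the French authority.

The function names are hypothetical.

```python
NONINFERIORITY_MARGIN = 1.25  # hazard-ratio margin (pazopanib vs sunitinib)


def noninferior(ci_upper, margin=NONINFERIORITY_MARGIN):
    """Noninferiority holds only if the upper confidence limit of the
    hazard ratio lies strictly below the prespecified margin."""
    return ci_upper < margin


def median_loss_months(reference_median, hazard_ratio):
    """Months of median PFS lost at a given hazard ratio, assuming
    exponential survival, under which the median scales as 1 / HR."""
    return reference_median * (1 - 1 / hazard_ratio)


# Per-protocol upper limit quoted by Casper et al.
print(noninferior(1.255))
# 11 months is an assumed sunitinib median, not a figure from the letter
print(round(median_loss_months(11.0, NONINFERIORITY_MARGIN), 2))
```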

Jochen Casper, M.D.
Silke Schumann-Binarsch, M.D.
Claus-Henning Köhne, M.D.
Klinikum Oldenburg, Oldenburg, Germany

Dr. Casper reports receiving consulting fees from Bayer, Novartis, and Pfizer and speaking fees from Novartis and Pfizer. No other potential conflict of interest relevant to this letter was reported.


The authors reply: In reply to Ramaekers et al.: we agree that decisions regarding the provision of health care include economic evaluations to identify treatments that provide the best clinical benefit at an acceptable cost.

To clarify a point in the letter by Wittes: the primary end point of this phase 3 trial was progression-free survival evaluated by an independent review committee; these data were assessed for all 1110 patients from both trials. This is specified in the protocol. The consistency of the quality-of-life results with the observed differences in the safety profiles for the two drugs speaks to the absence of bias in the quality-of-life outcome. The number of patients in whom follow-up ended before progression was assessed by the independent review committee was balanced between the two groups: 156 patients in the pazopanib group (28%) and 168 patients in the sunitinib group (30%). To Wittes’s final point regarding rigorous methods to handle missing data: the algorithm for assigning disease-progression and censoring dates followed the Guidance for Industry of the Food and Drug Administration1 and is included in the protocol of our article.

In reply to Casper et al.: there is no consensus regarding whether the per-protocol population is more conservative than the intention-to-treat population for the noninferiority analysis.2,3 Reviews of noninferiority trials indicate that the per-protocol population is not generally more conservative than the intention-to-treat population, and there are scenarios in which the per-protocol analysis itself could introduce bias.3 A systematic review indicated that more than 70% of published findings from noninferiority trials in oncology show results in only the intention-to-treat population and not in the per-protocol population.4 Our phase 3 trial had a single primary analysis in the intention-to-treat population, with the per-protocol population included as a key sensitivity analysis, as supported by Fleming et al.5 No formal hypothesis testing was planned for the per-protocol population, nor was the trial powered for it. Consistency of the point estimates was desired to show an absence of bias due to the analysis population, and this absence of bias was shown by the consistency of the hazard ratios (1.07 in the per-protocol analysis vs. 1.05 in the primary analysis). Because the per-protocol comparison was underpowered, it is inappropriate for Casper et al. to interpret an upper bound that barely exceeded 1.25 in our per-protocol analysis as an indication of inconsistency of results across the two populations. The noninferiority margin was selected in consultation with oncology experts, and justification of the margin is in the protocol.
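Both letters turn on a 95% confidence interval for a hazard ratio. Such intervals are usually Wald intervals on the log scale, HR·exp(±z·SE). The letters do not say how the interval was computed, so that form is an assumption here. Under it, the per-protocol estimate of 1.07 and its upper limit of 1.255 imply a standard error for log(HR). The sketch computes that standard error. It also computes how small the standard error would have to be for the same point estimate to clear the 1.25 margin. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def implied_log_se(hr, upper, z=Z_975):
    """SE of log(HR) implied by a point estimate and the upper limit of a
    Wald interval computed on the log scale (assumed interval form)."""
    return (math.log(upper) - math.log(hr)) / z


def wald_upper(hr, se, z=Z_975):
    """Upper limit of a log-scale Wald interval: HR * exp(z * SE)."""
    return hr * math.exp(z * se)


def max_se_to_clear(hr, margin, z=Z_975):
    """Largest SE of log(HR) that keeps the upper limit at or below the margin."""
    return (math.log(margin) - math.log(hr)) / z


se_pp = implied_log_se(1.07, 1.255)        # per-protocol figures from the letters
se_needed = max_se_to_clear(1.07, 1.25)    # same point estimate, margin 1.25
print(round(se_pp, 3), round(se_needed, 3))
```

The two values come out at roughly 0.081 and 0.079. A slightly more precise per-protocol estimate, with the same point estimate, would have kept the upper limit below 1.25. This is the sense in which the bound "barely exceeded" the margin.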

Robert J. Motzer, M.D.
Memorial Sloan-Kettering Cancer Center, New York, NY

Lauren McCann, Ph.D.
Keith Deen, M.S.
GlaxoSmithKline, Collegeville, PA

Since publication of their article, the authors report no further potential conflict of interest.


  1. Food and Drug Administration. Guidance for industry: clinical trial endpoints for the approval of cancer drugs and biologics. May 2007 (http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm071590.pdf).
  2. Jones B, Jarvis P, Lewis JA, Ebbutt AF. Trials to assess equivalence: the importance of rigorous methods. BMJ 1996;313:36-39. [Erratum, BMJ 1996;313:550.]
  3. Brittain E, Lin D. A comparison of intent-to-treat and per-protocol results in antibiotic non-inferiority trials. Stat Med 2005;24:1-10.
  4. Tanaka S, Kinjo Y, Kataoka Y, Yoshimura K, Teramukai S. Statistical issues and recommendations for noninferiority trials in oncology: a systematic review. Clin Cancer Res 2012;18:1837-1847.
  5. Fleming TR, Odem-Davis K, Rothmann MD, Li Shen Y. Some essential considerations in the design and conduct of non-inferiority trials. Clin Trials 2011;8:432-439.

Original Article Published on 7/9/2012

On July 6, 2012, The New York Times reported on a new approach that brought adult acute lymphoblastic leukemia (ALL) into REMISSION. The approach was based on DNA and RNA sequencing and on a drug for kidney cancer.



Dr. Lukas Wartman is a cancer researcher specializing in leukemia. He suspected he had leukemia, the very disease he had devoted his medical career to studying.

After years of treatment and two relapses of ALL, he had exhausted all conventional approaches to his disease. At Washington University in St. Louis, his colleagues in the lab used genome sequencing techniques to decode Dr. Wartman’s genetic information and determine the genetic cause of his ALL. The team found an overactive gene, FLT3, on chromosome 13. It was targeted with Pfizer’s sunitinib (Sutent), a drug for advanced kidney cancer.

Within days of starting the drug, his blood samples were free of ALL. Pfizer, the drug’s maker, had turned down Dr. Wartman’s request for the drug under its compassionate-use program. It did so even though he explained that his entire salary was only enough to pay for 7 1/2 months of Sutent (sunitinib). Pfizer eventually gave him the drug. He does not know why, but he suspects it was the plea of his nurse practitioner, Stephanie Bauer, NP.

Identifying the genetic cause of his ALL, a breakthrough in understanding and treating ALL in other patients, involved the following steps:


  1. Two tissue samples were taken from Dr. Wartman: bone marrow and skin cells.
  2. DNA and RNA, the two types of genetic material tested, were extracted from his cells.
  3. DNA sequencing showed genetic mutations possibly related to his ALL, but none seemed treatable. However, RNA sequencing revealed that a normal gene, FLT3, on chromosome 13, was overactive in his leukemia cells.
  4. The FLT3 gene helps create new white blood cells in the bone marrow. Dr. Wartman’s bone marrow cells were covered with an extreme number of FLT3 receptors, which possibly caused the growth of his leukemia.
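The two-sample design in the steps above mirrors a common tumor-normal strategy. Variants found in the leukemic marrow but not in the skin are treated as somatic, and RNA levels are compared to flag overactive genes. The toy Python sketch below assumes simple variant tuples and normalized expression values. Every number and every gene other than FLT3 is made up for illustration and does not come from Dr. Wartman’s case. The comparison sample for expression is also an assumption, because the post does not say what the leukemia RNA was compared against.

```python
import math

# Hypothetical toy inputs, NOT data from Dr. Wartman's case.
# Variants are (chromosome, position, ref, alt) tuples.
marrow_variants = {("chr2", 1000, "A", "G"), ("chr17", 2000, "C", "T"),
                   ("chr1", 3000, "G", "A")}
skin_variants = {("chr1", 3000, "G", "A")}  # inherited variant, also in skin

# Normalized expression in leukemia cells vs an assumed comparison sample.
leukemia_expr = {"FLT3": 5000.0, "GENE_X": 110.0, "GENE_Y": 40.0}
reference_expr = {"FLT3": 50.0, "GENE_X": 100.0, "GENE_Y": 45.0}


def somatic_variants(tumor, normal):
    """Variants present in the tumor sample but absent from the matched normal."""
    return tumor - normal


def overexpressed(tumor_expr, ref_expr, min_log2_fc=3.0, pseudocount=1.0):
    """Genes whose log2 fold change (tumor vs reference) meets the threshold.

    The pseudocount avoids division by zero for genes absent from the reference.
    """
    hits = {}
    for gene, t in tumor_expr.items():
        r = ref_expr.get(gene, 0.0)
        fc = math.log2((t + pseudocount) / (r + pseudocount))
        if fc >= min_log2_fc:
            hits[gene] = round(fc, 2)
    return hits


print(sorted(somatic_variants(marrow_variants, skin_variants)))
print(overexpressed(leukemia_expr, reference_expr))
```

The threshold of a log2 fold change of 3 (eightfold) is an arbitrary choice for the sketch. Real pipelines use statistical models of read counts rather than a single cutoff.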

TREATMENT – Receptor Blockade 

Sunitinib, a drug used for kidney cancer treatment and known to block the FLT3 receptor, was given to Dr. Wartman. Two weeks after Dr. Wartman began taking the drug, tests revealed that his leukemia was in remission.


Pfizer NOW has a NEW market for sunitinib: all patients diagnosed with adult acute lymphoblastic leukemia (ALL) in whom an overactive FLT3 gene on chromosome 13 is found.

NEW TREATMENT OPTIONS FOR Adult acute lymphoblastic leukemia (ALL)

Thus, any patient diagnosed with ALL needs to be tested first ONLY for an overactive FLT3 gene on chromosome 13, rather than undergoing sequencing of the entire genome. If FLT3 is not found to be overactive, THEN proceed with sequencing of the patient’s entire genome. IF another gene is found to be overactive, FIND A DRUG FOR RECEPTOR BLOCKADE.
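The tiered testing proposed above can be sketched as a short decision function. This is a hypothetical illustration of the post’s proposed flow, not a clinical protocol. The function name, the boolean FLT3 test result, and the sequencing callback are all assumptions. The callback runs only when the targeted test is negative, which is the saving the proposal relies on.

```python
from typing import Callable, Optional


def triage(flt3_overactive: bool,
           sequence_whole_genome: Callable[[], Optional[str]]) -> str:
    """Next step under the tiered testing proposed in the post (hypothetical).

    flt3_overactive: result of a targeted FLT3 test.
    sequence_whole_genome: stand-in for whole-genome sequencing. It is called
    only if the FLT3 test is negative and returns an overactive gene or None.
    """
    if flt3_overactive:
        return "consider an FLT3-blocking drug"
    gene = sequence_whole_genome()
    if gene is not None:
        return f"find a drug that blocks {gene}"
    return "no overactive target found"


# Usage: a fake sequencing step that records each time it is called.
calls = []


def fake_sequencing() -> Optional[str]:
    calls.append("wgs")
    return "GENE_X"  # hypothetical gene name


print(triage(True, fake_sequencing))   # FLT3 positive: sequencing is skipped
print(triage(False, fake_sequencing))  # FLT3 negative: sequencing runs once
print(calls)
```

Passing sequencing in as a callable, rather than as a precomputed result, makes the cost-saving order explicit. The expensive step is simply never invoked for FLT3-positive patients.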


Thus, the remission Dr. Wartman achieved after two relapses is significant for Pfizer and for any patient diagnosed with adult ALL. For Pfizer, it could open a market for sunitinib beyond kidney cancer.

I recommend that readers click on the links and follow the public’s reactions to this article in The New York Times.


Read HUNDREDS of Comments by Cancer Patients and the readers of The New York Times Health Section

