
Bioinformatic Tools for Cancer Mutational Analysis: COSMIC and Beyond, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 1: Next Generation Sequencing (NGS)


Curator: Stephen J. Williams, Ph.D.

Updated 7/26/2019

Updated 04/27/2019

Signatures of Mutational Processes in Human Cancer (from COSMIC)

From The COSMIC Database


The genomic landscape of cancer. The COSMIC database is a fully curated and annotated database of recurrent genetic mutations found in various cancers (data taken from cancer sequencing projects). For an interactive map, please go to the COSMIC database here: http://cancer.sanger.ac.uk/cosmic



Somatic mutations are present in all cells of the human body and occur throughout life. They are the consequence of multiple mutational processes, including the intrinsic slight infidelity of the DNA replication machinery, exogenous or endogenous mutagen exposures, enzymatic modification of DNA and defective DNA repair. Different mutational processes generate unique combinations of mutation types, termed “Mutational Signatures”.

In the past few years, large-scale analyses have revealed many mutational signatures across the spectrum of human cancer types [Nik-Zainal S. et al., Cell (2012); Alexandrov L.B. et al., Cell Reports (2013); Alexandrov L.B. et al., Nature (2013); Helleday T. et al., Nat Rev Genet (2014); Alexandrov L.B. and Stratton M.R., Curr Opin Genet Dev (2014)]. However, as the number of mutational signatures grows, the need for a curated census of signatures has become apparent. Here, we deliver such a resource by providing the profiles of, and additional information about, known mutational signatures.

The current set of mutational signatures is based on an analysis of 10,952 exomes and 1,048 whole-genomes across 40 distinct types of human cancer. These analyses are based on curated data that were generated by The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and a large set of freely available somatic mutations published in peer-reviewed journals. Complete details about the data sources will be provided in future releases of COSMIC.

The profile of each signature is displayed using the six substitution subtypes: C>A, C>G, C>T, T>A, T>C, and T>G (all substitutions are referred to by the pyrimidine of the mutated Watson–Crick base pair). Further, each of the substitutions is examined by incorporating information on the bases immediately 5’ and 3’ to each mutated base generating 96 possible mutation types (6 types of substitution ∗ 4 types of 5’ base ∗ 4 types of 3’ base). Mutational signatures are displayed and reported based on the observed trinucleotide frequency of the human genome, i.e., representing the relative proportions of mutations generated by each signature based on the actual trinucleotide frequencies of the reference human genome version GRCh37. Note that only validated mutational signatures have been included in the curated census of mutational signatures.
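To make the 96-type classification concrete, here is a minimal Python sketch. It is not COSMIC's own code; the function names and the `A[C>T]G` label format are assumptions chosen for illustration. It enumerates the 6 × 4 × 4 mutation types and shows how a substitution reported at a purine can be re-expressed from the pyrimidine of the base pair by taking the reverse complement of the trinucleotide.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# The six substitution subtypes, written from the pyrimidine (C or T) of the base pair.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def mutation_types():
    """Return the 96 types as labels such as 'A[C>T]G' (5' base, substitution, 3' base)."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def to_pyrimidine_context(five, ref, alt, three):
    """Express a substitution from the pyrimidine strand (label format is an assumption).

    If the reference base is a purine (A or G), use the reverse complement:
    complement ref and alt, and swap and complement the two flanking bases.
    """
    if ref in "CT":
        return f"{five}[{ref}>{alt}]{three}"
    return f"{COMPLEMENT[three]}[{COMPLEMENT[ref]}>{COMPLEMENT[alt]}]{COMPLEMENT[five]}"


types = mutation_types()
print(len(types))  # 6 * 4 * 4 = 96
print(to_pyrimidine_context("T", "G", "T", "A"))  # T[C>A]A
```

A G>T change in the context TGA is therefore counted as C>A in the context TCA, which is why only six substitution subtypes are needed rather than twelve.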

Additional information is provided for each signature, including the cancer types in which the signature has been found, proposed aetiology for the mutational processes underlying the signature, other mutational features that are associated with each signature and information that may be relevant for better understanding of a particular mutational signature.

The set of signatures will be updated in the future. This will include incorporating additional mutation types (e.g., indels, structural rearrangements, and localized hypermutation such as kataegis) and cancer samples. With more cancer genome sequences and the additional statistical power this will bring, new signatures may be found, the profiles of current signatures may be further refined, signatures may split into component signatures and signatures may be found in cancer types in which they are currently not detected.

See their COSMIC tutorial page here for instructional videos

Updated News: COSMIC v75 – 24th November 2015

COSMIC v75 includes curations across GRIN2A, fusion pair TCF3-PBX1, and genomic data from 17 systematic screen publications. We are also beginning a reannotation of TCGA exome datasets using Sanger’s Cancer Genome Project analysis pipeline to ensure consistency; four studies are included in this release, to be expanded across the next few releases. The Cancer Gene Census now has a dedicated curator, Dr. Zbyslaw Sondka, who will be focused on expanding the Census, enhancing the evidence underpinning it, and developing improved expert-curated detail describing each gene’s impact in cancer. Finally, as we begin to streamline our ever-growing website, we have combined all information for each gene onto one page and simplified the layout and design to improve navigation.

Mutational signatures across human cancer

Patterns of mutational signatures

 COSMIC database identifies 30 mutational signatures in human cancer

Please go to the COSMIC site to see a larger .png of the mutational signatures

Signature 1

Cancer types:

Signature 1 has been found in all cancer types and in most cancer samples.

Proposed aetiology:

Signature 1 is the result of an endogenous mutational process initiated by spontaneous deamination of 5-methylcytosine.

Additional mutational features:

Signature 1 is associated with small numbers of small insertions and deletions in most tissue types.


The number of Signature 1 mutations correlates with age of cancer diagnosis.

Signature 2

Cancer types:

Signature 2 has been found in 22 cancer types, but most commonly in cervical and bladder cancers. In most of these 22 cancer types, Signature 2 is present in at least 10% of samples.

Proposed aetiology:

Signature 2 has been attributed to activity of the AID/APOBEC family of cytidine deaminases. On the basis of similarities in the sequence context of cytosine mutations caused by APOBEC enzymes in experimental systems, a role for APOBEC1, APOBEC3A and/or APOBEC3B in human cancer appears more likely than for other members of the family.

Additional mutational features:

Transcriptional strand bias of mutations has been observed in exons, but is not present or is weaker in introns.


Signature 2 is usually found in the same samples as Signature 13. It has been proposed that activation of AID/APOBEC cytidine deaminases is due to viral infection, retrotransposon jumping or to tissue inflammation. Currently, there is limited evidence to support these hypotheses. A germline deletion polymorphism involving APOBEC3A and APOBEC3B is associated with the presence of large numbers of Signature 2 and 13 mutations and with predisposition to breast cancer. Mutations of similar patterns to Signatures 2 and 13 are commonly found in the phenomenon of local hypermutation present in some cancers, known as kataegis, potentially implicating AID/APOBEC enzymes in this process as well.

Signature 3

Cancer types:

Signature 3 has been found in breast, ovarian, and pancreatic cancers.

Proposed aetiology:

Signature 3 is associated with failure of DNA double-strand break-repair by homologous recombination.

Additional mutational features:

Signature 3 associates strongly with elevated numbers of large (longer than 3bp) insertions and deletions with overlapping microhomology at breakpoint junctions.


Signature 3 is strongly associated with germline and somatic BRCA1 and BRCA2 mutations in breast, pancreatic, and ovarian cancers. In pancreatic cancer, responders to platinum therapy usually exhibit Signature 3 mutations.

Signature 4

Cancer types:

Signature 4 has been found in head and neck cancer, liver cancer, lung adenocarcinoma, lung squamous carcinoma, small cell lung carcinoma, and oesophageal cancer.

Proposed aetiology:

Signature 4 is associated with smoking and its profile is similar to the mutational pattern observed in experimental systems exposed to tobacco carcinogens (e.g., benzo[a]pyrene). Signature 4 is likely due to tobacco mutagens.

Additional mutational features:

Signature 4 exhibits transcriptional strand bias for C>A mutations, compatible with the notion that damage to guanine is repaired by transcription-coupled nucleotide excision repair. Signature 4 is also associated with CC>AA dinucleotide substitutions.


Signature 29 is found in cancers associated with tobacco chewing and appears different from Signature 4.

Signature 5

Cancer types:

Signature 5 has been found in all cancer types and most cancer samples.

Proposed aetiology:

The aetiology of Signature 5 is unknown.

Additional mutational features:

Signature 5 exhibits transcriptional strand bias for T>C substitutions at ApTpN context.


Signature 6

Cancer types:

Signature 6 has been found in 17 cancer types and is most common in colorectal and uterine cancers. In most other cancer types, Signature 6 is found in less than 3% of examined samples.

Proposed aetiology:

Signature 6 is associated with defective DNA mismatch repair and is found in microsatellite unstable tumours.

Additional mutational features:

Signature 6 is associated with high numbers of small (shorter than 3bp) insertions and deletions at mono/polynucleotide repeats.


Signature 6 is one of four mutational signatures associated with defective DNA mismatch repair and is often found in the same samples as Signatures 15, 20, and 26.

Signature 7

Cancer types:

Signature 7 has been found predominantly in skin cancers and in cancers of the lip categorized as head and neck or oral squamous cancers.

Proposed aetiology:

Based on its prevalence in ultraviolet exposed areas and the similarity of the mutational pattern to that observed in experimental systems exposed to ultraviolet light, Signature 7 is likely due to ultraviolet light exposure.

Additional mutational features:

Signature 7 is associated with large numbers of CC>TT dinucleotide mutations at dipyrimidines. Additionally, Signature 7 exhibits a strong transcriptional strand-bias indicating that mutations occur at pyrimidines (viz., by formation of pyrimidine-pyrimidine photodimers) and these mutations are being repaired by transcription-coupled nucleotide excision repair.


Signature 8

Cancer types:

Signature 8 has been found in breast cancer and medulloblastoma.

Proposed aetiology:

The aetiology of Signature 8 remains unknown.

Additional mutational features:

Signature 8 exhibits weak strand bias for C>A substitutions and is associated with double nucleotide substitutions, notably CC>AA.


Signature 9

Cancer types:

Signature 9 has been found in chronic lymphocytic leukaemias and malignant B-cell lymphomas.

Proposed aetiology:

Signature 9 is characterized by a pattern of mutations that has been attributed to polymerase η, which is implicated in the activity of AID during somatic hypermutation.

Additional mutational features:


Chronic lymphocytic leukaemias that possess immunoglobulin gene hypermutation (IGHV-mutated) have elevated numbers of mutations attributed to Signature 9 compared to those that do not have immunoglobulin gene hypermutation.

Signature 10

Cancer types:

Signature 10 has been found in six cancer types, notably colorectal and uterine cancer, usually generating huge numbers of mutations in small subsets of samples.

Proposed aetiology:

It has been proposed that the mutational process underlying this signature is altered activity of the error-prone polymerase POLE. The presence of large numbers of Signature 10 mutations is associated with recurrent POLE somatic mutations, viz., Pro286Arg and Val411Leu.

Additional mutational features:

Signature 10 exhibits strand bias for C>A mutations at TpCpT context and T>G mutations at TpTpT context.


Signature 10 is associated with some of the most mutated cancer samples. Samples exhibiting this mutational signature have been termed ultra-hypermutators.

Signature 11

Cancer types:

Signature 11 has been found in melanoma and glioblastoma.

Proposed aetiology:

Signature 11 exhibits a mutational pattern resembling that of alkylating agents. Patient histories have revealed an association between treatments with the alkylating agent temozolomide and Signature 11 mutations.

Additional mutational features:

Signature 11 exhibits a strong transcriptional strand-bias for C>T substitutions indicating that mutations occur on guanine and that these mutations are effectively repaired by transcription-coupled nucleotide excision repair.


Signature 12

Cancer types:

Signature 12 has been found in liver cancer.

Proposed aetiology:

The aetiology of Signature 12 remains unknown.

Additional mutational features:

Signature 12 exhibits a strong transcriptional strand-bias for T>C substitutions.


Signature 12 usually contributes a small percentage (<20%) of the mutations observed in a liver cancer sample.

Signature 13

Cancer types:

Signature 13 has been found in 22 cancer types and seems to be commonest in cervical and bladder cancers. In most of these 22 cancer types, Signature 13 is present in at least 10% of samples.

Proposed aetiology:

Signature 13 has been attributed to activity of the AID/APOBEC family of cytidine deaminases converting cytosine to uracil. On the basis of similarities in the sequence context of cytosine mutations caused by APOBEC enzymes in experimental systems, a role for APOBEC1, APOBEC3A and/or APOBEC3B in human cancer appears more likely than for other members of the family. Signature 13 causes predominantly C>G mutations. This may be due to generation of abasic sites after removal of uracil by base excision repair and replication over these abasic sites by REV1.

Additional mutational features:

Transcriptional strand bias of mutations has been observed in exons, but is not present or is weaker in introns.


Signature 2 is usually found in the same samples as Signature 13. It has been proposed that activation of AID/APOBEC cytidine deaminases is due to viral infection, retrotransposon jumping or to tissue inflammation. Currently, there is limited evidence to support these hypotheses. A germline deletion polymorphism involving APOBEC3A and APOBEC3B is associated with the presence of large numbers of Signature 2 and 13 mutations and with predisposition to breast cancer. Mutations of similar patterns to Signatures 2 and 13 are commonly found in the phenomenon of local hypermutation present in some cancers, known as kataegis, potentially implicating AID/APOBEC enzymes in this process as well.

Signature 14

Cancer types:

Signature 14 has been observed in four uterine cancers and a single adult low-grade glioma sample.

Proposed aetiology:

The aetiology of Signature 14 remains unknown.

Additional mutational features:


Signature 14 generates very high numbers of somatic mutations (>200 mutations per MB) in all samples in which it has been observed.

Signature 15

Cancer types:

Signature 15 has been found in several stomach cancers and a single small cell lung carcinoma.

Proposed aetiology:

Signature 15 is associated with defective DNA mismatch repair.

Additional mutational features:

Signature 15 is associated with high numbers of small (shorter than 3bp) insertions and deletions at mono/polynucleotide repeats.


Signature 15 is one of four mutational signatures associated with defective DNA mismatch repair and is often found in the same samples as Signatures 6, 20, and 26.

Signature 16

Cancer types:

Signature 16 has been found in liver cancer.

Proposed aetiology:

The aetiology of Signature 16 remains unknown.

Additional mutational features:

Signature 16 exhibits an extremely strong transcriptional strand bias for T>C mutations at ApTpN context, with T>C mutations occurring almost exclusively on the transcribed strand.


Signature 17

Cancer types:

Signature 17 has been found in oesophagus cancer, breast cancer, liver cancer, lung adenocarcinoma, B-cell lymphoma, stomach cancer and melanoma.

Proposed aetiology:

The aetiology of Signature 17 remains unknown.

Additional mutational features:


Signature 18

Cancer types:

Signature 18 has been found commonly in neuroblastoma. Additionally, Signature 18 has been also observed in breast and stomach carcinomas.

Proposed aetiology:

The aetiology of Signature 18 remains unknown.

Additional mutational features:


Signature 19

Cancer types:

Signature 19 has been found only in pilocytic astrocytoma.

Proposed aetiology:

The aetiology of Signature 19 remains unknown.

Additional mutational features:


Signature 20

Cancer types:

Signature 20 has been found in stomach and breast cancers.

Proposed aetiology:

Signature 20 is believed to be associated with defective DNA mismatch repair.

Additional mutational features:

Signature 20 is associated with high numbers of small (shorter than 3bp) insertions and deletions at mono/polynucleotide repeats.


Signature 20 is one of four mutational signatures associated with defective DNA mismatch repair and is often found in the same samples as Signatures 6, 15, and 26.

Signature 21

Cancer types:

Signature 21 has been found only in stomach cancer.

Proposed aetiology:

The aetiology of Signature 21 remains unknown.

Additional mutational features:


Signature 21 is found only in four samples, all generated by the same sequencing centre. The mutational pattern of Signature 21 is somewhat similar to that of Signature 26. Additionally, Signature 21 is found only in samples that also have Signatures 15 and 20. As such, Signature 21 is probably also related to microsatellite unstable tumours.

Signature 22

Cancer types:

Signature 22 has been found in urothelial (renal pelvis) carcinoma and liver cancers.

Proposed aetiology:

Signature 22 has been found in cancer samples with known exposures to aristolochic acid. Additionally, the pattern of mutations exhibited by the signature is consistent with the one previously observed in experimental systems exposed to aristolochic acid.

Additional mutational features:

Signature 22 exhibits a very strong transcriptional strand bias for T>A mutations indicating adenine damage that is being repaired by transcription-coupled nucleotide excision repair.


Signature 22 has a very high mutational burden in urothelial carcinoma; however, its mutational burden is much lower in liver cancers.

Signature 23

Cancer types:

Signature 23 has been found only in a single liver cancer sample.

Proposed aetiology:

The aetiology of Signature 23 remains unknown.

Additional mutational features:

Signature 23 exhibits very strong transcriptional strand bias for C>T mutations.


Signature 24

Cancer types:

Signature 24 has been observed in a subset of liver cancers.

Proposed aetiology:

Signature 24 has been found in cancer samples with known exposures to aflatoxin. Additionally, the pattern of mutations exhibited by the signature is consistent with that previously observed in experimental systems exposed to aflatoxin.

Additional mutational features:

Signature 24 exhibits a very strong transcriptional strand bias for C>A mutations indicating guanine damage that is being repaired by transcription-coupled nucleotide excision repair.


Signature 25

Cancer types:

Signature 25 has been observed in Hodgkin lymphomas.

Proposed aetiology:

The aetiology of Signature 25 remains unknown.

Additional mutational features:

Signature 25 exhibits transcriptional strand bias for T>A mutations.


This signature has only been identified in Hodgkin’s cell lines. Data is not available from primary Hodgkin lymphomas.

Signature 26

Cancer types:

Signature 26 has been found in breast cancer, cervical cancer, stomach cancer and uterine carcinoma.

Proposed aetiology:

Signature 26 is believed to be associated with defective DNA mismatch repair.

Additional mutational features:

Signature 26 is associated with high numbers of small (shorter than 3bp) insertions and deletions at mono/polynucleotide repeats.


Signature 26 is one of four mutational signatures associated with defective DNA mismatch repair and is often found in the same samples as Signatures 6, 15 and 20.

Signature 27

Cancer types:

Signature 27 has been observed in a subset of kidney clear cell carcinomas.

Proposed aetiology:

The aetiology of Signature 27 remains unknown.

Additional mutational features:

Signature 27 exhibits very strong transcriptional strand bias for T>A mutations. Signature 27 is associated with high numbers of small (shorter than 3bp) insertions and deletions at mono/polynucleotide repeats.


Signature 28

Cancer types:

Signature 28 has been observed in a subset of stomach cancers.

Proposed aetiology:

The aetiology of Signature 28 remains unknown.

Additional mutational features:


Signature 29

Cancer types:

Signature 29 has been observed only in gingivo-buccal oral squamous cell carcinoma.

Proposed aetiology:

Signature 29 has been found in cancer samples from individuals with a tobacco chewing habit.

Additional mutational features:

Signature 29 exhibits transcriptional strand bias for C>A mutations indicating guanine damage that is most likely repaired by transcription-coupled nucleotide excision repair. Signature 29 is also associated with CC>AA dinucleotide substitutions.


The Signature 29 pattern of C>A mutations due to tobacco chewing appears different from the pattern of mutations due to tobacco smoking reflected by Signature 4.

Signature 30

Cancer types:

Signature 30 has been observed in a small subset of breast cancers.

Proposed aetiology:

The aetiology of Signature 30 remains unknown.



Examples in the literature of deposits into or analysis from the COSMIC database

The Genomic Landscapes of Human Breast and Colorectal Cancers, from Wood et al., Science 2007; 318(5853): 1108-1113

“analysis of exons representing 20,857 transcripts from 18,191 genes, we conclude that the genomic landscapes of breast and colorectal cancers are composed of a handful of commonly mutated gene “mountains” and a much larger number of gene “hills” that are mutated at low frequency.”

  • found cellular pathways containing multiple mutated genes
  • analyzed a highly curated database (Metacore, GeneGo, Inc.) that includes human protein-protein interactions, signal transduction and metabolic pathways
  • There were 108 pathways that were found to be preferentially mutated in breast tumors. Many of the pathways involved phosphatidylinositol 3-kinase (PI3K) signaling
  • the cancer genome landscape consists of relief features (mutated genes) with heterogeneous heights (determined by CaMP scores). There are a few “mountains” representing individual CAN-genes mutated at high frequency. However, the landscapes contain a much larger number of “hills” representing the CAN-genes that are mutated at relatively low frequency. It is notable that this general genomic landscape (few gene mountains and many gene hills) is a common feature of both breast and colorectal tumors.
  • developed software to analyze multiple mutations and mutation frequencies available from Harvard Bioinformatics at





R Software for Cancer Mutation Analysis (download here)


CancerMutationAnalysis Version 1.0:

R package to reproduce the statistical analyses of the Sjoblom et al article and the associated Technical Comment. This package is built for reproducibility of the original results and not for flexibility. Future versions will be more general and define classes for the data types used. Further details are available in Working Paper 126.

CancerMutationAnalysis Version 2.0:

R package to reproduce the statistical analyses of the Wood et al article. Like its predecessor, this package is still built for reproducibility of the original results and not for flexibility. Further details are available in Working Paper 126.










Update 04/27/2019

Review 2018. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Z. Sondka et al. Nature Reviews Cancer. 2018.

The Catalogue of Somatic Mutations in Cancer (COSMIC) Cancer Gene Census (CGC) periodically reevaluates the cancer genome landscape and curates the findings into a database of genetic changes occurring in various tumor types. The 2018 CGC describes in detail the effect of 719 cancer-driving genes. The recent expansion includes functional and mechanistic descriptions of how each gene contributes to disease etiology, framed in terms of the cancer hallmarks described by Hanahan and Weinberg. These functional characteristics show the complexity of the cancer mutational landscape and genome and suggest “multiple cancer-related functions for many genes, which are often highly tissue-dependent or tumour stage-dependent.” The 2018 CGC also expands a second tier of genes, lengthening the list of cancer-related genes.

Criteria for curation of genes into CGC (curation process)

  • candidate genes are selected from published literature, conference abstracts, large cancer genome screens deposited in databases, and analysis of the current COSMIC database
  • COSMIC data are analyzed to determine the presence of patterns of somatic mutations and the frequency of such mutations in cancer
  • literature review to determine the role of the gene in cancer
  • Minimum evidence

– at least two publications from different groups showing increased mutation frequency in at least one type of cancer (PubMed)

–  at least two publications from different groups showing experimental evidence of functional involvement in at least one hallmark of cancer in order to classify the mutant gene as oncogene, tumor suppressor, or fusion partner (like BCR-ABL)

  • independent assessment by at least two postdoctoral fellows
  • gene must be classified as either a Tier 1 or Tier 2 CGC gene
  • inclusion in database
  • continued curation efforts


Tier 1 gene: genes which have strong evidence from both mutational and functional analysis as being involved in cancer

Tier 2 gene: genes with mutational patterns typical of cancer drivers but not functionally characterized, as well as genes with a published mechanistic description of involvement in cancer but without proof of somatic mutations in cancer
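The tier definitions above can be read as a simple decision rule. The Python sketch below is only one possible reading and not the CGC curators' actual procedure: the `GeneEvidence` class, its field names, and the use of the two-publication minimum as the cutoff for each kind of evidence are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    """Hypothetical summary of the evidence gathered for one candidate gene."""
    mutation_publications: int    # independent groups reporting increased mutation frequency
    functional_publications: int  # independent groups reporting involvement in a cancer hallmark


def classify_tier(evidence, minimum=2):
    """Return 'Tier 1', 'Tier 2', or None (assumed rule, not the official CGC logic)."""
    has_mutational = evidence.mutation_publications >= minimum
    has_functional = evidence.functional_publications >= minimum
    if has_mutational and has_functional:
        return "Tier 1"  # strong evidence from both mutational and functional analysis
    if has_mutational or has_functional:
        return "Tier 2"  # only one of the two lines of evidence is established
    return None          # not enough evidence for inclusion


print(classify_tier(GeneEvidence(mutation_publications=3, functional_publications=2)))  # Tier 1
print(classify_tier(GeneEvidence(mutation_publications=2, functional_publications=0)))  # Tier 2
```

In practice the assessment is made by expert curators reading the literature, so a rule like this could at most serve as a first-pass filter.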

Current Status of Tier 1 and Tier 2 genes in CGC

Tier 1 genes (574 genes): include 79 oncogenes, 140 tumor suppressor genes, 93 fusion partners

Tier 2 genes (719 genes): include 103 oncogenes, 181 tumor suppressors, 134 fusion partners and 31 with unknown function

Updated 7/26/2019

The COSMIC database is undergoing an extensive update and reannotation, in order to ensure standardisation and modernisation across COSMIC data. This will substantially improve the identification of unique variants that may have been described at the genome, transcript and/or protein level. The introduction of a Genomic Identifier, along with complete annotation across multiple, high quality Ensembl transcripts and improved compliance with current HGVS syntax, will enable variant matching both within COSMIC and across other bioinformatic datasets.

As a result of these updates there will be significant changes in the upcoming releases as we work through this process. The first stage of this work was the introduction of improved HGVS syntax compliance in our May release. The majority of the changes will be reflected in COSMIC v90, which will be released in late August or early September, and the remaining changes will be introduced over the next few releases.

The significant changes in v90 include:

  • Updated genes, transcripts and proteins from Ensembl release 93 on both the GRCh37 and GRCh38 assemblies.
  • Full reannotation of COSMIC variants with known genomic coordinates using Ensembl’s Variant Effect Predictor (VEP). This provides accurate and standardised annotation uniformly across all relevant transcripts and genes that include the genomic location of the variant.
  • New stable genomic identifiers (COSV) that indicate the definitive position of the variant on the genome. These unique identifiers allow variants to be mapped between GRCh37 and GRCh38 assemblies and displayed on a selection of transcripts.
  • Updated cross-reference links between COSMIC genes and other widely-used databases such as HGNC, RefSeq, Uniprot and CCDS.
  • Complete standardised representation of COSMIC variants, following the most recent HGVS recommendations, where possible.
  • Remapping of gene fusions on the updated transcripts on both the GRCh37 and GRCh38 assemblies, along with the genomic coordinates for the breakpoint positions.
  • Reduced redundancy of mutations. Duplicate variants have been merged into one representative variant.

Key points for you

COSMIC variants have been annotated on all relevant Ensembl transcripts across both the GRCh37 and GRCh38 assemblies from Ensembl release 93. New genomic identifiers (e.g. COSV56056643) are used, which refer to the variant change at the genomic level rather than at the gene, transcript or protein level and can thus be used universally. Existing COSM IDs will continue to be supported and will now be referred to as legacy identifiers, e.g. COSM476. The legacy identifiers (COSM) are still searchable. In the case of mutations without genomic coordinates, and hence without a COSV identifier, COSM identifiers will continue to be used.
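For pipelines that will receive a mix of old and new identifiers, a small helper can tell the two apart. This is a sketch in Python, and the patterns are an assumption inferred only from the two examples quoted above (a `COSV` or `COSM` prefix followed by digits); the helper name `identifier_kind` is hypothetical and not part of any COSMIC tool.

```python
import re

# Assumed formats, based only on the examples COSV56056643 and COSM476.
COSV_PATTERN = re.compile(r"^COSV\d+$")  # new genomic identifier
COSM_PATTERN = re.compile(r"^COSM\d+$")  # legacy mutation identifier


def identifier_kind(identifier):
    """Classify a COSMIC mutation identifier as 'genomic', 'legacy', or 'unknown'."""
    if COSV_PATTERN.match(identifier):
        return "genomic"
    if COSM_PATTERN.match(identifier):
        return "legacy"
    return "unknown"


print(identifier_kind("COSV56056643"))  # genomic
print(identifier_kind("COSM476"))       # legacy
```

Because mutations without genomic coordinates keep only a COSM identifier, downstream code should probably accept both kinds rather than reject legacy IDs.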

All relevant Ensembl transcripts in COSMIC (which have been selected based on Ensembl canonical classification and on the quality of the dataset to include only GENCODE basic transcripts) will now have both accession and version numbers, so that the exact transcript is known, ensuring reproducibility. This also provides transparency and clarity as the data are updated.

How these changes will be reflected in the download files

As we are now mapping all variants on all relevant Ensembl transcripts, the number of rows in the majority of variant download files has increased significantly. In the download files, additional columns are provided including the legacy identifier (COSM) and the new genomic identifier (COSV). An internal mutation identifier is also provided to uniquely represent each mutation, on a specific transcript, on a given assembly build. The accession and version number for each transcript are included. File descriptions for each of the download files will be available from the downloads page for clarity. We have included an example of the new columns below.

For example: COSMIC Complete Mutation Data (Targeted screens)

  • [17:Q] Mutation Id – An internal mutation identifier to uniquely represent each mutation on a specific transcript on a given assembly build.
  • [18:R] Genomic Mutation Id – Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release.
  • [19:S] Legacy Mutation Id – Legacy mutation identifier (COSM) that will represent existing COSM mutation identifiers.
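As a rough illustration of how the three identifier columns listed above might be pulled out of a download file, here is a Python sketch using only the standard library. It assumes the file is tab-separated with a single header row and that the columns sit at positions 17 to 19 as stated; the sample row, the placeholder column names `col1` to `col16`, and the internal mutation id value are made up for the example and do not come from a real COSMIC file.

```python
import csv
import io

# Hypothetical excerpt: 16 placeholder columns followed by the three identifier columns.
header = [f"col{i}" for i in range(1, 17)] + [
    "Mutation Id", "Genomic Mutation Id", "Legacy Mutation Id"
]
row = ["."] * 16 + ["12345", "COSV56056643", "COSM476"]  # illustrative values only
sample = "\t".join(header) + "\n" + "\t".join(row) + "\n"


def read_identifiers(handle):
    """Yield the identifier columns from columns 17-19 (1-based) of each data row."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    for fields in reader:
        yield {
            "mutation_id": fields[16],  # column 17 (Q): internal, per transcript and build
            "genomic_id": fields[17],   # column 18 (R): COSV
            "legacy_id": fields[18],    # column 19 (S): COSM
        }


records = list(read_identifiers(io.StringIO(sample)))
print(records[0]["genomic_id"])  # COSV56056643
```

Since each variant is now repeated once per transcript, grouping rows by the genomic identifier is one way to collapse the file back to one entry per variant.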

We will shortly have some sample data that can be downloaded in the new table structure, to give you real data to manipulate and integrate. This will be available on the variant updates page.

How this affects you

We are aware that many of the changes we are making will affect integration into your pipelines and analytical platforms. By giving you advance notice of the changes, we hope much of this can be mitigated, and the end result of having clean, standardised data will be well worth any disruption. The variant updates page on the COSMIC website will provide a central point for this information and further technical details of the changes that we are making to COSMIC.

Kind Regards,
Wellcome Sanger Institute
Wellcome Genome Campus,
Hinxton CB10 1SA











Loss of Gene Islands May Promote a Cancer Cell’s Survival, Proliferation and Evolution: A new Hypothesis (and second paper validating model) on Oncogenesis from the Elledge Laboratory

Writer, Curator: Stephen J. Williams, Ph.D.

It is well established that a critical event in the transformation of a cell to the malignant state involves the mutation of hosts of oncogenes and tumor suppressor genes, which, in turn, leave the cell unable to properly control its proliferation.  On a genomic scale, these mutations can result in gene amplifications, loss of heterozygosity (LOH), and epigenetic changes resulting in tumorigenesis.  The “two-hit hypothesis” of Dr. Al Knudson of Fox Chase Cancer Center[1], initially put forward to explain the progression of retinoblastoma in children, proposes that two mutations in the same gene are required for tumorigenesis, indicating a recessive disease.

(Excerpts from a great article explaining the two-hit hypothesis are given at the end of this post).

And, although many tumor genomes display haploinsufficient tumor suppressor genes and fit the two-hit model quite nicely, recent data show that most tumors display recurrent hemizygous deletions within their genomes.  Tumors display numerous recurrent hemizygous focal deletions that seem to contain no known tumor suppressor genes. For instance, a recent analysis of over three thousand tumors, including breast, bladder, pancreatic, ovarian and gastric cancers, found an average of more than 10 deletions per tumor and 82 regions of recurrent focal deletion.

It has been proposed that this large number of hemizygous deletions may result from:

  • a recessive tumor suppressor gene requiring mutation or silencing of the second allele
  • recurrent deletion at fragile sites (unstable genomic regions)
  • single-copy loss providing a selective advantage regardless of the other allele

Note: some definitions of hemizygosity are given below.  In general at any locus, each parental chromosome can have 3 deletion states:

  1. wild type
  2. large deletion
  3. small deletion

Hemizygous deletions involve only one allele, unlike the loss of both alleles seen with a classic tumor suppressor such as TP53.

To see whether only one mutated allele of a tumor suppressor gene may be a causal event for tumorigenesis, Dr. Nicole Solimini and colleagues, from Dr. Stephen Elledge’s lab at Harvard, proposed a hypothesis they termed the cancer gene island model, after analyzing the regions of these hemizygous deletions for cancer-related genes[2].  Dr. Solimini and colleagues analyzed whole-genome sequence data for 526 tumors in the COSMIC database, comparing them to a list generated from the Cancer Gene Census for homozygous loss-of-function mutations (mutations which result in a termination codon or frame-shift mutation; these produce a premature stop in the protein or an altered sequence leading to a nonfunctional protein).

Results of this analysis revealed:

  1. tumors have a wide range of deletions per tumor (most high-grade epithelial tumors, such as ovarian, bladder, pancreatic, and esophageal adenocarcinomas, had 10-14 deletions per tumor)
  2. tumors exhibited a wide range (2-16) of loss-of-function mutations
  3. ONLY 14 of 82 recurrent deletions contained a known tumor suppressor gene, and these were low-frequency events
  4. most recurrent cancer deletions do not contain putative tumor suppressor genes.

Therefore, as the authors suggest, an alternative to the two-hit hypothesis may account for the selective growth advantage conferred by these types of deletions. They divide the genes affected by these low-frequency hemizygous mutations into two general classes:

  1. STOP genes: suppressors of tumor growth and proliferation
  2. GO genes: growth enhancers and oncogenes

Identifying potential STOP genes

To identify the STOP and GO genes the authors performed a primary screen of an shRNA library in telomerase (hTERT) immortalized human mammary epithelial cells using increased PROLIFERATION as a screening endpoint to determine STOP genes and decreased proliferation and lethality (essential genes) to determine possible GO genes. An initial screen identified 3582 possible STOP genes.  Using further screens and higher stringency criteria which focused on:

  • Only genes which increased proliferation in independent triplicate screens
  • Validated by competition assays
  • Were enriched more than four fold in three independent shRNA screens

the authors were able to focus on and validate 878 genes to determine the molecular pathways involved in proliferation.

These genes were involved in cell cycle regulation, apoptosis, and autophagy (which will be discussed in further posts).

To further validate that these putative STOP genes are relevant in human cancer, the list of validated STOP genes found in the screen was compared to the list of loss-of-function mutations in the 526 tumors in the COSMIC database. Surprisingly, the validated STOP gene list was significantly enriched for known and possibly NOVEL tumor suppressor genes, especially those with loss-of-function and deletion mutations, and these genes also clustered in gene deletions in cancer.  This not only validated the authors’ model system and method but suggests that hemizygous deletions in multiple STOP genes may contribute to tumorigenesis, as the function of the majority of STOP genes is to restrain tumorigenesis.

A few key conclusions from this study offer strength to an alternative view of oncogenesis NAMELY:

  • Loss of multiple STOP genes per deletion optimize a cancer cell’s proliferative capacity
  • Cancer cells display an insignificant loss of GO genes, minimizing negative impacts on cellular fitness
  • Haploinsufficiency in multiple STOP genes can result in similar alteration of function similar to complete loss of both alleles of
  • Cancer evolution may result from selection of hemizygous loss of high number of STOP and low number of GO genes
  • Leads to a CANCER GENE ISLAND model where there is a clonal evolution of transformed cells due to selective pressures

A link to the supplemental data containing STOP and GO genes found in validation screens and KEGG analysis can be found at the following link:


A link to an interview with the authors, originally posted on Harvard’s site can be found here.

Cumulative Haploinsufficiency and Triplosensitivity Drive Aneuploidy Patterns and Shape the Cancer Genome; a new paper from the Elledge group in the journal Cell


A concern of the authors was the extent to which gene silencing could affect their model in tumors.  The validation of the model was performed in cancer cell lines and compared to tumor genome sequences in publicly available databases; however, a follow-up paper by the same group shows that haploinsufficiency has a greater impact on the cancer genome than these studies had suggested.

In a follow-up paper by the Elledge group in the journal Cell[3], Theresa Davoli and colleagues, after analyzing 8,200 tumor-normal pairs, show there are many more cancer driver genes than had once been predicted.  In addition, the distribution and potency of STOP genes, oncogenes, and essential genes (GO) contribute to the complex picture of aneuploidy seen in many sporadic tumors.  Taking these and their previous findings together, the authors proposed that haploinsufficiency plays a crucial role in shaping the cancer genome.

Hemizygosity and Haploinsufficiency

Below are a few definitions from Wikipedia:

Zygosity is the degree of similarity of the alleles for a trait in an organism.

Most eukaryotes have two matching sets of chromosomes; that is, they are diploid. Diploid organisms have the same loci on each of their two sets of homologous chromosomes, except that the sequences at these loci may differ between the two chromosomes in a matching pair and that a few chromosomes may be mismatched as part of a chromosomal sex-determination system. If both alleles of a diploid organism are the same, the organism is homozygous at that locus. If they are different, the organism is heterozygous at that locus. If one allele is missing, it is hemizygous, and, if both alleles are missing, it is nullizygous.

Haploinsufficiency occurs when a diploid organism has only a single functional copy of a gene (with the other copy inactivated by mutation) and the single functional copy does not produce enough of a gene product (typically a protein) to bring about a wild-type condition, leading to an abnormal or diseased state. It is responsible for some but not all autosomal dominant disorders.

Al Knudson and The “Two-Hit Hypothesis” of Cancer

Excerpt from a Scientist article by Eugene Russo about Dr. Knudson’s Two hit Hypothesis;

for full article please follow the link http://www.the-scientist.com/?articles.view/articleNo/19649/title/-Two-Hit–Hypothesis/

The “two-hit” hypothesis was, according to many, among the more significant milestones in that rapid evolution of biomedical science. The theory explains the relationship between the hereditary and nonhereditary, or sporadic, forms of retinoblastoma, a rare cancer affecting one in 20,000 children. Years prior to the age of gene cloning, Knudson’s 1971 paper proposed that individuals will develop cancer of the retina if they either inherit one mutated retinoblastoma (Rb) gene and incur a second mutation (possibly environmentally induced) after conception, or if they incur two mutations or hits after conception.3 If only one Rb gene functions normally, the cancer is suppressed. Knudson dubbed these preventive genes anti-oncogenes; other scientists renamed them tumor suppressors.

When first introduced, the “two-hit” hypothesis garnered more interest from geneticists than from cancer researchers. Cancer researchers thought “even if it’s right, it may not have much significance for the world of cancer,” Knudson recalls. “But I had been taught from the early days that very often we learn fundamental things from unusual cases.” Knudson’s initial motivation for the model: a desire to understand the relationship between nonhereditary forms of cancer and the much rarer hereditary forms. He also hoped to elucidate the mechanism by which common cancers, such as those of the breast, stomach, and colon, become more prevalent with age.

According to the then-accepted somatic mutation theory, the more mutations, the greater the risk of cancer. But this didn’t jibe with Knudson’s own studies on childhood cancers, which suggested that, in the case of cancers such as retinoblastoma, disease onset peaks in early childhood. Knudson set out to determine the smallest number of cancer-inducing events necessary to cause cancer and the role of these events in hereditary vs. nonhereditary cancers. Based on existing data on cancer cases and some mathematical deduction, Knudson came up with the “two-hit” hypothesis.

Not until 1986, when researchers at the Whitehead Institute for Biomedical Research in Cambridge, Mass., cloned the Rb gene, would there be solid evidence to back up Knudson’s pathogenesis paradigm.4 “Even with the cloning of the gene, it wasn’t clear how general it would be,” says Knudson. There are, it turns out, several two-hit lesions, including polyposis, neurofibromatosis, and basal cell carcinoma syndrome. Other cancers show only some correspondence with the two-hit model. In the case of Wilms’ tumor, for example, the model accounts for about 15 percent of the cancer incidence; the remaining cases seem to be more complicated.


His seminal paper on the two-hit hypothesis[1]

A.G. Knudson, “Mutation and cancer: statistical study of retinoblastoma,” Proceedings of the National Academy of Sciences, 68:820-3, 1971.

The two-hit hypothesis proposed by A.G. Knudson.  A description, with video, of Dr. Knudson’s talk at AACR can be found at the following link (photo credited to A.G. Knudson and Fox Chase Cancer Center): http://www.fccc.edu/research/research-awards/knudson/index.html


1. Knudson AG, Jr.: Mutation and cancer: statistical study of retinoblastoma. Proceedings of the National Academy of Sciences of the United States of America 1971, 68(4):820-823.

2. Solimini NL, Xu Q, Mermel CH, Liang AC, Schlabach MR, Luo J, Burrows AE, Anselmo AN, Bredemeyer AL, Li MZ et al: Recurrent hemizygous deletions in cancers may optimize proliferative potential. Science 2012, 337(6090):104-109.

3. Davoli T, Xu AW, Mengwasser KE, Sack LM, Yoon JC, Park PJ, Elledge SJ: Cumulative Haploinsufficiency and Triplosensitivity Drive Aneuploidy Patterns and Shape the Cancer Genome. Cell 2013, 155(4):948-962.

Other papers on this site on CANCER and MUTATION include:

Cancer Mutations Across the Landscape

Salivary Gland Cancer – Adenoid Cystic Carcinoma: Mutation Patterns: Exome- and Genome-Sequencing @ Memorial Sloan-Kettering Cancer Center

Whole exome somatic mutations analysis of malignant melanoma contributes to the development of personalized cancer therapy for this disease

Breast Cancer and Mitochondrial Mutations

Winning Over Cancer Progression: New Oncology Drugs to Suppress Passengers Mutations vs. Driver Mutations

Hold on. Mutations in Cancer do good.

Rewriting the Mathematics of Tumor Growth; Teams Use Math Models to Sort Drivers from Passengers

How mobile elements in “Junk” DNA promote cancer. Part 1: Transposon-mediated tumorigenesis.


Cancer Mutations Across the Landscape

Curator: Larry H. Bernstein, MD, FCAP

This is an up-to-date article about the significance of mutations found in 12 major types of cancer.

Cancer Mutations Across the Landscape

Word Cloud by Daniel Menzin

UPDATED 4/24/2020  The genomic landscape of pediatric cancers: Curation of WES/WGS studies shows need for more data

Mutational landscape and significance across 12 major cancer types

Cyriac Kandoth1*, Michael D. McLellan1*, Fabio Vandin2, Kai Ye1,3, Beifang Niu1, Charles Lu1, et al.

1The Genome Institute, Washington University in St Louis, Missouri 63108, USA. 2Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA. 3Department of Genetics, Washington University in St Louis, Missouri 63108, USA. 4Department of Medicine, Washington University in St Louis, Missouri 63108, USA. 5Siteman Cancer Center, Washington University in St Louis, Missouri 63108, USA. 6Department of Mathematics, Washington University in St Louis, Missouri 63108, USA.

Nature 17 Oct 2013; 502. http://dx.doi.org/10.1038/nature12634

The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects.

Using the integrated data sets, we identified 127 significantly mutated genes from well-known and emerging cellular processes in cancer:

  1. well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control)
  2. emerging (for example, histone, histone modification, splicing, metabolism and proteolysis)

The average number of mutations in these significantly mutated genes varies across tumour types;

  1. most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small.
  2. Mutations in transcriptional factors/regulators show tissue specificity, whereas
  3. histone modifiers are often mutated across several cancer types.

Clinical association analysis identifies genes having a significant effect on survival, and

  • investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis.

Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment


The advancement of DNA sequencing technologies now enables the processing of thousands of tumours of many types for systematic mutation discovery. This expansion of scope, coupled with appreciable progress in algorithms1–5, has led directly to characterization of significant functional mutations, genes and pathways6–18. Cancer encompasses more than 100 related diseases19, making it crucial to understand the commonalities and differences among various types and subtypes. TCGA was founded to address these needs, and its large data sets are providing unprecedented opportunities for systematic, integrated analysis.

We performed a systematic analysis of 3,281 tumours from 12 cancer types to investigate underlying mechanisms of cancer initiation and progression. We describe variable mutation frequencies and contexts and their associations with environmental factors and defects in DNA repair. We identify 127 significantly mutated genes (SMGs) from diverse signalling and enzymatic processes. The finding of a TP53-driven breast, head and neck, and ovarian cancer cluster with a dearth of other mutations in SMGs suggests common therapeutic strategies might be applied for these tumours. We determined interactions among mutations and correlated mutations in BAP1, FBXW7 and TP53 with detrimental phenotypes across several cancer types. The subclonal structure and transcription status of underlying somatic mutations reveal the trajectory of tumour progression in patients with cancer.

Standardization of mutation data

Stringent filters (Methods) were applied to ensure high quality mutation calls for 12 cancer types: breast adenocarcinoma (BRCA), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), uterine corpus endometrial carcinoma (UCEC), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), colon and rectal carcinoma (COAD, READ), bladder urothelial carcinoma (BLCA), kidney renal clear cell carcinoma (KIRC), ovarian serous carcinoma (OV) and acute myeloid leukaemia (LAML; conventionally called AML) (Supplementary Table 1). A total of 617,354 somatic mutations, consisting of

  • 398,750 missense,
  • 145,488 silent,
  • 36,443 nonsense,
  • 9,778 splice site,
  • 7,693 non-coding RNA,
  • 523 non-stop/readthrough,
  • 15,141 frameshift insertions/deletions (indels) and
  • 3,538 inframe indels,

were included for downstream analyses (Supplementary Table 2).
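As a quick arithmetic check, the eight category counts listed above do add up to the stated total of 617,354. The short sketch below (an illustration added here, not part of the paper's pipeline) recomputes the total and the share of each category.

```python
# Category counts as listed in the excerpt above.
counts = {
    "missense": 398_750,
    "silent": 145_488,
    "nonsense": 36_443,
    "splice site": 9_778,
    "non-coding RNA": 7_693,
    "non-stop/readthrough": 523,
    "frameshift indel": 15_141,
    "inframe indel": 3_538,
}

total = sum(counts.values())
print(total)  # 617354

# Share of each category, largest first.
shares = {name: n / total for name, n in counts.items()}
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:22s} {share:6.1%}")
```

Missense changes make up roughly two thirds of the calls, with silent mutations the next largest group.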

Distinct mutation frequencies and sequence context

Figure 1a shows that AML has the lowest median mutation frequency and LUSC the highest (0.28 and 8.15 mutations per megabase (Mb), respectively). Besides AML, all types average over 1 mutation per Mb, substantially higher than in pediatric tumours20. Clustering21 illustrates that

  • mutation frequencies for KIRC, BRCA, OV and AML are normally distributed within a single cluster, whereas
  • other types have several clusters (for example, 5 and 6 clusters in UCEC and COAD/ READ, respectively) (Fig. 1a and Supplementary Table 3a, b).

In UCEC, the largest patient cluster has a frequency of approximately 1.5 mutations per Mb, and

  • the cluster with the highest frequency is more than 150 times greater.

Multiple clusters suggest that factors other than age contribute to development in these tumours14,16. Indeed,

  • there is a significant correlation between high mutation frequency and DNA repair pathway genes (for example, PRKDC, TP53 and MSH6) (Supplementary Table 3c). Notably,
  • PRKDC mutations are associated with high frequency in BLCA, COAD/READ, LUAD and UCEC, whereas
  • TP53 mutations are related with higher frequencies in AML, BLCA, BRCA, HNSC, LUAD, LUSC and UCEC (all P < 0.05).

Mutations in POLQ and POLE associate with high frequencies in multiple cancer types; POLE association in UCEC is consistent with previous observations14.

Comparison of spectra across the 12 types (Fig. 1b and Supplementary Table 3d) reveals that LUSC and LUAD contain increased C>A transversions, a signature of cigarette smoke exposure10. Sequence context analysis across 12 types revealed

  • the largest difference being in C>T transitions and C>G transversions (Fig. 1c).

The frequency of thymine 1-bp (base pair) upstream of C>G transversions is markedly higher in BLCA, BRCA and HNSC than in other cancer types (Extended Data Fig. 1). GBM, AML, COAD/READ and UCEC have similar contexts in that

  • the proportions of guanine 1 base downstream of C>T transitions are between 59% and 67%, substantially higher than the approximately 40% in other cancer types.

Higher frequencies of transition mutations at CpG in gastrointestinal tumours, including colorectal, were previously reported22. We found three additional cancer types (GBM, AML and UCEC) clustered in the C>T mutation at CpG, consistent with previous findings of

  • aberrant DNA methylation in endometrial cancer23 and glioblastoma24.

BLCA has a unique signature for C>T transitions compared to the other types (enriched for TC) (Extended Data Fig. 1).

Significantly mutated genes

Genes under positive selection, either in individual or multiple tumour types, tend to display higher mutation frequencies above background. Our statistical analysis3, guided by expression data and curation (Methods), identified 127 such genes (SMGs; Supplementary Table 4). These SMGs are involved in a wide range of cellular processes, broadly classified into 20 categories (Fig. 2), including

  • transcription factors/regulators, histone modifiers, genome integrity, receptor tyrosine kinase signalling, cell cycle, mitogen-activated protein kinases (MAPK) signalling, phosphatidylinositol-3-OH kinase (PI(3)K) signalling, Wnt/β-catenin signalling, histones, ubiquitin-mediated proteolysis, and splicing (Fig. 2).

The identification of MAPK, PI(3)K and Wnt/β-catenin signalling pathways is consistent with classical cancer studies. Notably, newer categories (for example, splicing, transcription regulators, metabolism, proteolysis and histones) emerge as exciting guides for the development of new therapeutic targets. Genes categorized as histone modifiers (Z = 0.57), PI(3)K signalling (Z = 1.03), and genome integrity (Z = 0.66) all relate to more than one cancer type, whereas

  • transcription factor/regulator (Z = 0.40), TGF-β signalling (Z = 0.66), and Wnt/β-catenin signalling (Z = 0.55) genes tend to associate with single types (Methods).

Notably, 3,053 out of 3,281 total samples (93%) across the Pan-Cancer collection had at least one non-synonymous mutation in at least one SMG. The average number of point mutations and small indels in these genes varies across tumour types, with the highest (~6 mutations per tumour) in UCEC, LUAD and LUSC, and the lowest (~2 mutations per tumour) in AML, BRCA, KIRC and OV. This suggests that the numbers of both cancer-related genes (only 127 identified in this study) and cooperating driver mutations required during oncogenesis are small (most cases only had 2–6) (Fig. 3), although large-scale structural rearrangements were not included in this analysis.

Common mutations

The most frequently mutated gene in the Pan-Cancer cohort is TP53 (42% of samples). Its mutations predominate in serous ovarian (95%) and serous endometrial carcinomas (89%) (Fig. 2). TP53 mutations are also associated with basal subtype breast tumours. PIK3CA is the second most commonly mutated gene, occurring frequently (>10%) in most cancer types except OV, KIRC, LUAD and AML. PIK3CA mutations were frequent in UCEC (52%) and BRCA (33.6%), being specifically enriched in luminal subtype tumours. Tumours lacking PIK3CA mutations often had mutations in PIK3R1, with the highest occurrences in UCEC (31%) and GBM (11%) (Fig. 2).

Many cancer types carried mutations in chromatin re-modelling genes. In particular, histone-lysine N-methyltransferase genes (MLL2 (also known as KMT2D), MLL3 (KMT2C) and MLL4 (KMT2B)) cluster in bladder, lung and endometrial cancers, whereas the lysine (K)-specific demethylase KDM5C is prevalently mutated in KIRC (7%). Mutations in ARID1A are frequent in BLCA, UCEC, LUAD and LUSC, whereas mutations in ARID5B predominate in UCEC (10%) (Fig. 2).

Fig. 1. Distribution of mutation frequencies across 12 cancer types.


Dashed grey and solid white lines denote average across cancer types and median for each type, respectively. b, Mutation spectrum of six transition (Ti) and transversion (Tv) categories for each cancer type. c, Hierarchically clustered mutation context (defined by the proportion of A, T, C and G nucleotides within ±2bp of variant site) for six mutation categories. Cancer types correspond to colours in a. Colour denotes degree of correlation: yellow (r = 0.75) and red (r = 1).

Fig. 2. The 127 SMGs from 20 cellular processes in cancer identified in the 12 cancer types and Pan-Cancer, with the highest percentage for each gene among the 12 types (not shown)

Fig. 3. Distribution of mutations in 127 SMGs across Pan-Cancer cohort.


Box plot displays median numbers of non-synonymous mutations, with outliers shown as dots. In total, 3,210 tumours were used for this analysis (hypermutators excluded).

Figure 4 | Unsupervised clustering based on mutation status of SMGs. Tumours having no mutation or more than 500 mutations were excluded. A mutation status matrix was constructed for 2,611 tumours. Major clusters of mutations detected in UCEC, COAD, GBM, AML, KIRC, OV and BRCA were highlighted.
Complete gene list shown in Extended Data Fig. 3.  (not shown)

Fig. 5. Driver initiation and progression mutations and tumour clonal mutation is in the subclone


Survival Analysis

We examined which genes correlate with survival using the Cox proportional hazards model, first analysing individual cancer types using age and gender as covariates; an average of 2 genes (range: 0–4) with mutation frequency ≥2% were significant (P ≤ 0.05) in each type (Supplementary Table 10a and Extended Data Fig. 6). KDM6A and ARID1A mutations correlate with better survival in BLCA (P = 0.03, hazard ratio (HR) = 0.36, 95% confidence interval (CI): 0.14–0.92) and UCEC (P = 0.03, HR = 0.11, 95% CI: 0.01–0.84), respectively, but mutations in SETBP1, recently identified with worse prognosis in atypical chronic myeloid leukaemia (aCML)31, have a significant detrimental effect in HNSC (P = 0.006, HR = 3.21, 95% CI: 1.39–7.44). BAP1 strongly correlates with poor survival (P = 0.00079, HR = 2.17, 95% CI: 1.38–3.41) in KIRC. Conversely, BRCA2 mutations (P = 0.02, HR = 0.31, 95% CI: 0.12–0.85) associate with better survival in ovarian cancer, consistent with previous reports32,33; BRCA1 mutations showed positive correlation with better survival, but did not reach significance here.
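The hazard ratios and intervals quoted above can be sanity-checked against each other. The sketch below assumes the intervals are Wald-type, that is exp(β ± 1.96·SE), which makes them symmetric on the log scale; the excerpt does not state how the intervals were computed, so treat this as an assumption. Under it, the HR should be close to the geometric mean of the two bounds. The helper names are invented for this illustration.

```python
from math import log, sqrt

def hr_from_ci(lower, upper):
    """Point estimate implied by a log-symmetric interval: the geometric mean."""
    return sqrt(lower * upper)

def se_from_ci(lower, upper, z=1.96):
    """Standard error of the log hazard ratio implied by a 95% Wald interval."""
    return (log(upper) - log(lower)) / (2 * z)

# (reported HR, CI lower, CI upper), copied from the paragraph above.
reported = {
    "KDM6A in BLCA": (0.36, 0.14, 0.92),
    "SETBP1 in HNSC": (3.21, 1.39, 7.44),
    "BAP1 in KIRC": (2.17, 1.38, 3.41),
    "BRCA2 in OV": (0.31, 0.12, 0.85),
}

implied = {name: hr_from_ci(lo, hi) for name, (_, lo, hi) in reported.items()}
for name, (hr, lo, hi) in reported.items():
    print(f"{name}: reported HR {hr}, implied {implied[name]:.2f}, "
          f"SE(log HR) {se_from_ci(lo, hi):.2f}")
```

For these four genes the implied and reported values agree to within rounding. An HR below 1 corresponds to better survival for mutation carriers and an HR above 1 to worse survival.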

We extended our survival analysis across cancer types, restricting our attention to the subset of 97 SMGs whose mutations appeared in ≥2% of patients having survival data in ≥2 tumour types. Taking type, age and gender as covariates, we found 7 significant genes: BAP1, DNMT3A, HGF, KDM5C, FBXW7, BRCA2 and TP53 (Extended Data Table 1).  In particular, BAP1 was highly significant (P = 0.00013, HR = 2.20, 95% CI: 1.47–3.29, more than 53 mutated tumours out of 888 total), with mutations associating with detrimental outcome in four tumour types and notable associations in KIRC (P = 0.00079), consistent with a recent report28, and in UCEC (P = 0.066). Mutations in several other genes are detrimental, including DNMT3A (HR = 1.59), previously identified with poor prognosis in AML34, and KDM5C (HR = 1.63), FBXW7 (HR = 1.57) and TP53 (HR = 1.19). TP53 has significant associations with poor outcome in KIRC (P = 0.012), AML (P = 0.0007) and HNSC (P = 0.00007). Conversely, BRCA2 (P = 0.05, HR = 0.62, 95% CI: 0.38 to 0.99) correlates with survival benefit in six types, including OV and UCEC (Supplementary Table 10a, b). IDH1 mutations are associated with improved prognosis across the Pan-Cancer set (HR = 0.67, P = 0.16) and also in GBM (HR = 0.42, P = 0.09) (Supplementary Table 10a, b), consistent with previous work.35

 Driver mutations and tumour clonal architecture

To understand the temporal order of somatic events, we analysed the variant allele fraction (VAF) distribution of mutations in SMGs across AML, BRCA and UCEC (Fig. 5a and Supplementary Table 11a) and other tumour types (Extended Data Fig. 7). To minimize the effect of copy number alterations, we focused on mutations in copy neutral segments. Mutations in TP53 have higher VAFs on average in all three cancer types, suggesting early appearance during tumorigenesis.

It is worth noting that copy neutral loss of heterozygosity is commonly found in classical tumour suppressors such as TP53, BRCA1, BRCA2 and PTEN, leading to increased VAFs in these genes. In AML, DNMT3A (permutation test P = 0), RUNX1 (P = 0.0003) and SMC3 (P = 0.05) have significantly higher VAFs than average among SMGs (Fig. 5a and Supplementary Table 11b). In breast cancer, AKT1, CBFB, MAP2K4, ARID1A, FOXA1 and PIK3CA have relatively high average VAFs. For endometrial cancer, multiple SMGs (for example, PIK3CA, PIK3R1, PTEN, FOXA2 and ARID1A) have similar median VAFs. Conversely, KRAS and/or NRAS mutations tend to have lower VAFs in all three tumour types (Fig. 5a), suggesting NRAS (for example, P = 0 in AML) and KRAS (for example, P = 0.02 in BRCA) have a progression role in a subset of AML, BRCA and UCEC tumours. For all three cancer types, we clearly observed a shift towards higher expression VAFs in SMGs versus non-SMGs, most apparent in BRCA and UCEC (Extended Data Fig. 8a and Methods).
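To make the VAF comparison concrete, the sketch below computes a variant allele fraction from read counts and runs a simple one-sided permutation test of whether one gene's mutations have a higher mean VAF than the pooled SMG mutations. The excerpt does not describe the exact permutation scheme used in the paper, so this is one plausible version under stated assumptions; the function names and the VAF values are invented for illustration.

```python
import random

def vaf(alt_reads, total_reads):
    """Variant allele fraction: reads supporting the variant / all reads at the site."""
    return alt_reads / total_reads

def permutation_p_value(gene_vafs, other_vafs, n_perm=10_000, seed=0):
    """One-sided permutation test for a higher mean VAF in the gene of interest.

    Repeatedly draws len(gene_vafs) values from the pooled VAFs and counts how
    often the random mean is at least as large as the observed mean.
    """
    rng = random.Random(seed)
    pooled = list(gene_vafs) + list(other_vafs)
    k = len(gene_vafs)
    observed = sum(gene_vafs) / k
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(pooled, k)
        if sum(sample) / k >= observed:
            hits += 1
    return hits / n_perm

# Made-up VAFs: a founding-clone-like gene versus other SMG mutations.
gene = [0.48, 0.46, 0.45, 0.47]
others = [0.21, 0.30, 0.18, 0.25, 0.33, 0.12, 0.40, 0.28, 0.22, 0.35, 0.15, 0.27]

p = permutation_p_value(gene, others)
print(vaf(12, 48), p)
```

With a finite number of permutations the estimate can come out as exactly 0 when no random draw matches the observed mean, which is one way a reported value such as "P = 0" can arise.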

Previous analysis using whole-genome sequencing (WGS) detected subclones in approximately 50% of AML cases15,36,37; however, analysis is difficult using AML exome owing to its relatively few coding mutations. Using 50 AML WGS cases, sciClone (http://github.com/genome/sciclone) detected DNMT3A mutations in the founding clone for 100% (8 out of 8) of cases and NRAS mutations in the subclone for 75% (3 out of 4) of cases (Extended Data Fig. 8b). Among 304 and 160 of BRCA and UCEC tumours, respectively, with enough coding mutations for clustering, 35% BRCA and 44% UCEC tumours contained subclones. Our analysis provides the lower bound for tumour heterogeneity, because only coding mutations were used for clustering. In BRCA, 95% (62 out of 65) of cases contained PIK3CA mutations in the founding clone, whereas 33% (3 out of 9) of cases had MLL3 mutations in the subclone. Similar patterns were found in UCEC tumours, with 96% (65 out of 68) and 95% (62 out of 65) of tumours containing PIK3CA and PTEN mutations, respectively, in the founding clone, and 9% (2 out of 22) of KRAS and 14% (1 out of 7) of NRAS mutations in the subclone (Extended Data Fig. 8b and Supplementary Table 12).

Mutation context (-2 to +2 bp) was calculated for each somatic variant in each mutation category, and hierarchical clustering was then performed using the pairwise mutation context correlation across all cancer types. The mutational significance in cancer (MuSiC)3 package was used to identify significant genes for both individual tumour types and the Pan-Cancer collective. An R function ‘hclust’ was used for complete-linkage hierarchical clustering across mutations and samples, and Dendrix30 was used to identify sets of approximately mutually exclusive mutations. Cross-cancer survival analysis was based on the Cox proportional hazards model, as implemented in the R package ‘survival’ (http://cran.r-project.org/web/packages/survival/), and the sciClone algorithm (http://github.com/genome/sciclone) generated mutation clusters using point mutations from copy number neutral segments. A complete description of the materials and methods used to generate this data set and its results is provided in the Methods.
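The mutation context step can be sketched as follows: for one mutation category in one cancer type, tabulate the proportion of A, C, G and T at each flanking position from -2 to +2, then correlate the resulting profiles between cancer types; the correlation matrix is what would be passed to hierarchical clustering. This is a minimal Python illustration, not the authors' code (they used R). Excluding the variant base itself from the profile is an assumption, and the 5-mers are made up.

```python
from math import sqrt

BASES = "ACGT"
OFFSETS = (-2, -1, 1, 2)  # flanking positions; the variant base (offset 0) is skipped

def context_profile(kmers):
    """Proportion of each base at each flanking position.

    kmers: 5-base strings centred on the variant site (index 2).
    Returns 16 values ordered by offset, then by base (A, C, G, T).
    """
    counts = {(off, base): 0 for off in OFFSETS for base in BASES}
    for kmer in kmers:
        for off in OFFSETS:
            counts[(off, kmer[2 + off])] += 1
    n = len(kmers)
    return [counts[(off, base)] / n for off in OFFSETS for base in BASES]

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up C>T contexts for two hypothetical tumour types.
type_a = ["ATCGA", "TTCGG", "GACGT", "CTCGA"]   # G always follows the C (CpG-like)
type_b = ["ATCAA", "TTCTG", "GTCAT", "CTCCA"]   # T always precedes the C

profile_a = context_profile(type_a)
profile_b = context_profile(type_b)
r_ab = pearson(profile_a, profile_b)
print(round(r_ab, 3))
```

Types whose profiles correlate strongly, for example those sharing an excess of guanine immediately downstream of C>T transitions, would end up in the same cluster.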

References (20 of 38)

  1. Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311–317 (2012).
  2. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
  3. Dees, N. D. et al. MuSiC: Identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).
  4. Roth, A. et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics 28, 907–913 (2012).
  5. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnol. 31, 213–219 (2013).
  6. Jones, S. et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 321, 1801–1806 (2008).
  7. Parsons, D. W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807–1812 (2008).
  8. Sjöblom, T. et al. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274 (2006).
  9. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).
  10. Ding, L. et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–1075 (2008).
  11. Wood, L. D. et al. The genomic landscapes of human breast and colorectal cancers. Science 318, 1108–1113 (2007).
  12. The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011).
  13. The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
  14. Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).
  15. The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
  16. The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
  17. Ellis, M. J. et al. Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature 486, 353–360 (2012).
  18. The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).
  19. Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000).
  20. Downing, J. R. et al. The Pediatric Cancer Genome Project. Nature Genet. 44, 619–622 (2012).

UPDATED 4/24/2020  The genomic landscape of pediatric cancers: Curation of WES/WGS studies shows need for more data

The genomic landscape of pediatric cancers: Implications for diagnosis and treatment


SCIENCE 15 MAR 2019 : 1170-1175

Source: https://science.sciencemag.org/content/363/6432/1170


The past decade has witnessed a major increase in our understanding of the genetic underpinnings of childhood cancer.  Genomic sequencing studies have highlighted key differences between pediatric and adult cancers.  Whereas many adult cancers are characterized by a high number of somatic mutations, pediatric cancers typically have few somatic mutations but a higher prevalence of germline alterations in cancer predisposition genes.  Also noteworthy is the remarkable heterogeneity in the types of genetic alterations that likely drive the growth of pediatric cancers, including copy number alterations, gene fusions, enhancer hijacking events, and chromoplexy.  Because most studies have genetically profiled pediatric cancers only at diagnosis, the mechanisms underlying tumor progression, therapy resistance, and metastasis remain poorly understood.  We discuss evidence that points to a need for more integrative approaches aimed at identifying driver events in pediatric cancers at both diagnosis and relapse.  We also provide an overview of key aspects of germline predisposition for cancer in this age group.

Approximately 300,000 children from infancy to age 14 are diagnosed with cancer worldwide every year (1). Some of the cancer types affecting the pediatric population are also seen in adolescents and young adults (AYA), but it has become increasingly clear that cancers in the latter age group have unique biological characteristics that can affect prognosis and therapy (2). Pediatric and AYA cancer patients present with a heterogeneous set of diseases that can be broadly subclassified as leukemias, brain tumors, and non–central nervous system (CNS) solid tumors. These subgroups contain numerous distinct clinical entities, many of which are still poorly characterized from a molecular standpoint.

Recent large-scale genomic analyses have increased our understanding of the genetic drivers of pediatric cancer and have helped to identify new clinically relevant subtypes. These studies have also underscored the distinct nature of the genetic alterations in pediatric and AYA cancers versus adult cancers. Of particular note, the number of somatic mutations in most pediatric cancers is substantially lower than that in adult cancers (3, 4). Exceptions are tumors in children who carry germline mutations that compromise repair of DNA damage (5). For many pediatric cancers, driver events are conditioned on the developmental stage in which the tumor arises. For example, a mutation occurring in one developmental compartment (e.g., a muscle stem cell) may lead to cancer, whereas the same mutation in another compartment does not (6). Pediatric cancer genomes are also characterized by specific patterns of copy number alterations and structural alterations [chromoplexy (7), chromothripsis (8)] that are prognostic indicators in several cancer subtypes. Gene fusion events have long been recognized as oncogenic drivers in many pediatric cancers; however, advanced sequencing technologies have revealed that the number of fusion partners is greater than previously thought, and that previously undetected gene rearrangements may also function as drivers. Finally, germline mutations in a wide spectrum of genes that predispose to cancer appear to play a greater role in pediatric cancer than previously appreciated (9, 10).

Somatic alterations in pediatric cancers

Genome landscape studies

Early large-scale sequencing studies of pediatric cancers identified novel driver genes while also underscoring the overall low mutational burden (11–14).  Whole exome sequencing studies of Wilms tumor, T-cell acute lymphoblastic leukemia (T-ALL), and acute myeloid leukemia (AML) identified some recurring alterations such as

  • FLT3-ITD
  • WT1
  • NUP98-NSD1 gene fusion

however, many of the driver genes were subtype specific.  Other fusion events were seen (by RNA-Seq) such as

  • EWS-FLI1
  • BCR-ABL
  • MYB-QKI

as well as multiple epigenetic events such as methylation changes.


  1. E. Steliarova-Foucher, M. Colombet, L. A. G. Ries, F. Moreno, A. Dolya, F. Bray, P. Hesseling, H. Y. Shin, C. A. Stiller, IICC-3 contributors, International incidence of childhood cancer, 2001-10: A population-based registry study. Lancet Oncol. 18, 719–731 (2017). doi:10.1016/S1470-2045(17)30186-9 pmid:28410997
  2. J. V. Tricoli, D. G. Blair, C. K. Anders, W. A. Bleyer, L. A. Boardman, J. Khan, S. Kummar, B. Hayes-Lattin, S. P. Hunger, M. Merchant, N. L. Seibel, M. Thurin, C. L. Willman, Biologic and clinical characteristics of adolescent and young adult cancers: Acute lymphoblastic leukemia, colorectal cancer, breast cancer, melanoma, and sarcoma. Cancer 122, 1017–1028 (2016). doi:10.1002/cncr.29871 pmid:26849082
  3. M. S. Lawrence, P. Stojanov, P. Polak, G. V. Kryukov, K. Cibulskis, A. Sivachenko, S. L. Carter, C. Stewart, C. H. Mermel, S. A. Roberts, A. Kiezun, P. S. Hammerman, A. McKenna, Y. Drier, L. Zou, A. H. Ramos, T. J. Pugh, N. Stransky, E. Helman, J. Kim, C. Sougnez, L. Ambrogio, E. Nickerson, E. Shefler, M. L. Cortés, D. Auclair, G. Saksena, D. Voet, M. Noble, D. DiCara, P. Lin, L. Lichtenstein, D. I. Heiman, T. Fennell, M. Imielinski, B. Hernandez, E. Hodis, S. Baca, A. M. Dulak, J. Lohr, D.-A. Landau, C. J. Wu, J. Melendez-Zajgla, A. Hidalgo-Miranda, A. Koren, S. A. McCarroll, J. Mora, B. Crompton, R. Onofrio, M. Parkin, W. Winckler, K. Ardlie, S. B. Gabriel, C. W. M. Roberts, J. A. Biegel, K. Stegmaier, A. J. Bass, L. A. Garraway, M. Meyerson, T. R. Golub, D. A. Gordenin, S. Sunyaev, E. S. Lander, G. Getz, Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013). doi:10.1038/nature12213 pmid:23770567
  4. B. Vogelstein, N. Papadopoulos, V. E. Velculescu, S. Zhou, L. A. Diaz Jr., K. W. Kinzler, Cancer genome landscapes. Science 339, 1546–1558 (2013). doi:10.1126/science.1235122 pmid:23539594
  5. B. B. Campbell, N. Light, D. Fabrizio, M. Zatzman, F. Fuligni, R. de Borja, S. Davidson, M. Edwards, J. A. Elvin, K. P. Hodel, W. J. Zahurancik, Z. Suo, T. Lipman, K. Wimmer, C. P. Kratz, D. C. Bowers, T. W. Laetsch, G. P. Dunn, T. M. Johanns, M. R. Grimmer, I. V. Smirnov, V. Larouche, D. Samuel, A. Bronsema, M. Osborn, D. Stearns, P. Raman, K. A. Cole, P. B. Storm, M. Yalon, E. Opocher, G. Mason, G. A. Thomas, M. Sabel, B. George, D. S. Ziegler, S. Lindhorst, V. M. Issai, S. Constantini, H. Toledano, R. Elhasid, R. Farah, R. Dvir, P. Dirks, A. Huang, M. A. Galati, J. Chung, V. Ramaswamy, M. S. Irwin, M. Aronson, C. Durno, M. D. Taylor, G. Rechavi, J. M. Maris, E. Bouffet, C. Hawkins, J. F. Costello, M. S. Meyn, Z. F. Pursell, D. Malkin, U. Tabori, A. Shlien, Comprehensive Analysis of Hypermutation in Human Cancer. Cell 171, 1042–1056.e10 (2017). doi:10.1016/j.cell.2017.09.048 pmid:29056344
  6. X. Chen, A. Pappo, M. A. Dyer, Pediatric solid tumor genomics and developmental pliancy. Oncogene 34, 5207–5215 (2015). doi:10.1038/onc.2014.474 pmid:25639868
  7. S. C. Baca, D. Prandi, M. S. Lawrence, J. M. Mosquera, A. Romanel, Y. Drier, K. Park, N. Kitabayashi, T. Y. MacDonald, M. Ghandi, E. Van Allen, G. V. Kryukov, A. Sboner, J.-P. Theurillat, T. D. Soong, E. Nickerson, D. Auclair, A. Tewari, H. Beltran, R. C. Onofrio, G. Boysen, C. Guiducci, C. E. Barbieri, K. Cibulskis, A. Sivachenko, S. L. Carter, G. Saksena, D. Voet, A. H. Ramos, W. Winckler, M. Cipicchio, K. Ardlie, P. W. Kantoff, M. F. Berger, S. B. Gabriel, T. R. Golub, M. Meyerson, E. S. Lander, O. Elemento, G. Getz, F. Demichelis, M. A. Rubin, L. A. Garraway, Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013). doi:10.1016/j.cell.2013.03.021 pmid:23622249
  8. P. J. Stephens, C. D. Greenman, B. Fu, F. Yang, G. R. Bignell, L. J. Mudie, E. D. Pleasance, K. W. Lau, D. Beare, L. A. Stebbings, S. McLaren, M.-L. Lin, D. J. McBride, I. Varela, S. Nik-Zainal, C. Leroy, M. Jia, A. Menzies, A. P. Butler, J. W. Teague, M. A. Quail, J. Burton, H. Swerdlow, N. P. Carter, L. A. Morsberger, C. Iacobuzio-Donahue, G. A. Follows, A. R. Green, A. M. Flanagan, M. R. Stratton, P. A. Futreal, P. J. Campbell, Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011). doi:10.1016/j.cell.2010.11.055 pmid:21215367
  9. D. W. Parsons, A. Roy, Y. Yang, T. Wang, S. Scollon, K. Bergstrom, R. A. Kerstein, S. Gutierrez, A. K. Petersen, A. Bavle, F. Y. Lin, D. H. López-Terrada, F. A. Monzon, M. J. Hicks, K. W. Eldin, N. M. Quintanilla, A. M. Adesina, C. A. Mohila, W. Whitehead, A. Jea, S. A. Vasudevan, J. G. Nuchtern, U. Ramamurthy, A. L. McGuire, S. G. Hilsenbeck, J. G. Reid, D. M. Muzny, D. A. Wheeler, S. L. Berg, M. M. Chintagumpala, C. M. Eng, R. A. Gibbs, S. E. Plon, Diagnostic Yield of Clinical Tumor and Germline Whole-Exome Sequencing for Children With Solid Tumors. JAMA Oncol. 2, 616 (2016). doi:10.1001/jamaoncol.2015.5699 pmid:26822237
  10. J. Zhang, M. F. Walsh, G. Wu, M. N. Edmonson, T. A. Gruber, J. Easton, D. Hedges, X. Ma, X. Zhou, D. A. Yergeau, M. R. Wilkinson, B. Vadodaria, X. Chen, R. B. McGee, S. Hines-Dowell, R. Nuccio, E. Quinn, S. A. Shurtleff, M. Rusch, A. Patel, J. B. Becksfort, S. Wang, M. S. Weaver, L. Ding, E. R. Mardis, R. K. Wilson, A. Gajjar, D. W. Ellison, A. S. Pappo, C.-H. Pui, K. E. Nichols, J. R. Downing, Germline Mutations in Predisposition Genes in Pediatric Cancer. N. Engl. J. Med. 373, 2336–2346 (2015). doi:10.1056/NEJMoa1508054 pmid:26580448
  11. T. J. Pugh, O. Morozova, E. F. Attiyeh, S. Asgharzadeh, J. S. Wei, D. Auclair, S. L. Carter, K. Cibulskis, M. Hanna, A. Kiezun, J. Kim, M. S. Lawrence, L. Lichenstein, A. McKenna, C. S. Pedamallu, A. H. Ramos, E. Shefler, A. Sivachenko, C. Sougnez, C. Stewart, A. Ally, I. Birol, R. Chiu, R. D. Corbett, M. Hirst, S. D. Jackman, B. Kamoh, A. H. Khodabakshi, M. Krzywinski, A. Lo, R. A. Moore, K. L. Mungall, J. Qian, A. Tam, N. Thiessen, Y. Zhao, K. A. Cole, M. Diamond, S. J. Diskin, Y. P. Mosse, A. C. Wood, L. Ji, R. Sposto, T. Badgett, W. B. London, Y. Moyer, J. M. Gastier-Foster, M. A. Smith, J. M. Guidry Auvil, D. S. Gerhard, M. D. Hogarty, S. J. M. Jones, E. S. Lander, S. B. Gabriel, G. Getz, R. C. Seeger, J. Khan, M. A. Marra, M. Meyerson, J. M. Maris, The genetic landscape of high-risk neuroblastoma. Nat. Genet. 45, 279–284 (2013). doi:10.1038/ng.2529 pmid:23334666
  12. J. R. Downing, R. K. Wilson, J. Zhang, E. R. Mardis, C.-H. Pui, L. Ding, T. J. Ley, W. E. Evans, The Pediatric Cancer Genome Project. Nat. Genet. 44, 619–622 (2012). doi:10.1038/ng.2287 pmid:22641210
  13. St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project, Somatic histone H3 alterations in pediatric diffuse intrinsic pontine gliomas and non-brainstem glioblastomas. Nat. Genet. 44, 251–253 (2012). doi:10.1038/ng.1102 pmid:22286216
  14. J. Zhang, L. Ding, L. Holmfeldt, G. Wu, S. L. Heatley, D. Payne-Turner, J. Easton, X. Chen, J. Wang, M. Rusch, C. Lu, S.-C. Chen, L. Wei, J. R. Collins-Underwood, J. Ma, K. G. Roberts, S. B. Pounds, A. Ulyanov, J. Becksfort, P. Gupta, R. Huether, R. W. Kriwacki, M. Parker, D. J. McGoldrick, D. Zhao, D. Alford, S. Espy, K. C. Bobba, G. Song, D. Pei, C. Cheng, S. Roberts, M. I. Barbato, D. Campana, E. Coustan-Smith, S. A. Shurtleff, S. C. Raimondi, M. Kleppe, J. Cools, K. A. Shimano, M. L. Hermiston, S. Doulatov, K. Eppert, E. Laurenti, F. Notta, J. E. Dick, G. Basso, S. P. Hunger, M. L. Loh, M. Devidas, B. Wood, S. Winter, K. P. Dunsmore, R. S. Fulton, L. L. Fulton, X. Hong, C. C. Harris, D. J. Dooling, K. Ochoa, K. J. Johnson, J. C. Obenauer, W. E. Evans, C.-H. Pui, C. W. Naeve, T. J. Ley, E. R. Mardis, R. K. Wilson, J. R. Downing, C. G. Mullighan, The genetic basis of early T-cell precursor acute lymphoblastic leukaemia. Nature 481, 157–163 (2012). doi:10.1038/nature10725 pmid:22237106


Larry H Bernstein, MD, FCAP
Pharmaceutical Intelligence

UPDATED 4/23/2020:  New Design for Phase 1 pediatric oncology trials to expedite dose escalation studies.

Clinical Trials Revisited


Cancer Clinical Trials of Tomorrow

Advances in genomics and cancer biology will alter the design of human cancer studies

By Tomasz M. Beer | April 1, 2013   The Scientist
We stand on the cusp of significant change in the fundamental structure of cancer clinical trials, as the emphasis begins to shift from large-scale studies of relatively unselected patients to smaller studies testing more narrowly targeted therapies in molecularly characterized populations.
The previous (and still current) generation of trials established the cancer treatment standards used today. Trials that demonstrated the value of combination chemotherapy in the adjuvant treatment of breast cancer are an excellent example. Meticulous development of treatment regimens through Phase 1 and Phase 2 trials, followed by large-scale comparisons of the new regimens to established treatment protocols, have defined the modern practice of oncology for the last 4 decades. Future cancer clinical trials will be very different from those of the past, adopting a more personalized, sometimes called “precision,” approach.
It is, of course, not entirely true that past clinical trials did not include efforts to target treatments to the right patients. Where possible, targeted therapies are already being implemented. Using the presence of endocrine receptors to guide endocrine therapy for breast cancer was one of the first forays into molecular selection of patients. Unfortunately, the ability to select subgroups of patients for study has been severely curtailed by a still-limited knowledge of human cancer biology.
This is rapidly changing, however, thanks to advances in genomics and comprehensive cancer biology research over the last decade. Large-scale efforts, such as The Cancer Genome Atlas, are comprehensively defining many of the crucial molecular characteristics of human malignancies by illuminating genetic alterations that are clinically and biologically important, and which, by virtue of their functional roles, are viable targets for cancer treatment. At the same time, the ability to design small-molecule inhibitors of specific cancer targets is rapidly accelerating. In 2011, two new agents exemplified the power of these trends: crizotinib was approved for the treatment of lung cancers that harbor a specific mutation in the ALK gene, and vemurafenib was approved for the treatment of melanomas with a specific BRAF mutation. In both cases, the drugs were approved along with companion diagnostic tests that identify patients with the target mutation, who are therefore likely to benefit from treatment.

Smaller, more precise trials ahead

Clinical trials are being transformed by these trends. It will not happen overnight, as the knowledge of cancer biology and the availability of targeted agents are uneven. Unselected populations of patients will still be studied, but it is inevitable that there will be a rise in the number of trials that incorporate molecular tumor testing prior to treatment, with treatment selection informed by the molecular features of each individual’s cancer. Such personalized trials have the potential to yield better outcomes by increasing the probability of response and to employ less toxic therapies by increasingly targeting cancer-specific functions, rather than normal proliferative functions.
To the extent that targeted therapies will prove more effective when given to selected patients, clinical trials should get dramatically smaller. Trial size is largely driven by how effective the treatment is expected to be, so fewer participants are needed when the therapeutic benefit is larger. But the promise of smaller trials will not be universal; for example, when two targeted agents are compared to one another in the same molecularly selected population, the differences in efficacy may be small and larger trials will be required.
Furthermore, smaller trials may not necessarily move faster or be easier to complete, as they will require the “right patients,” who may be hard to find. Many of the mutations that represent promising targets are present in a minority of tumors. Today, molecular characterization of tumors is often done as part of the screening process for each trial. Many, and sometimes most, of the patients prove ineligible, making this approach frustrating and difficult to carry out. A better avenue of attack would be to make comprehensive molecular characterization of tumors a routine part of establishing a patient’s eligibility for a range of therapies. With the plummeting cost of genomic analysis, one can envision a day in the near future when a complete cancer genome (and perhaps other molecular evaluations) becomes a standard component of an initial diagnostic evaluation. Patients will be armed with molecular information about their own tumors, and thus able to make more-informed decisions about standard and investigational therapies that match the mutations driving their cancer.
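
The relationship between expected benefit, trial size and screening burden described in this section can be sketched numerically. The Python example below uses the standard normal-approximation formula for comparing two response proportions; it is only an illustration, and the response rates, significance level, power and mutation prevalence are hypothetical values that do not come from the article.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect a difference between two response rates."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)


def patients_to_screen(n_needed, prevalence):
    """Patients to screen, on average, to find n_needed carriers of a target mutation."""
    return ceil(n_needed / prevalence)


if __name__ == "__main__":
    # Hypothetical response rates: a large benefit versus a small one.
    large_effect = patients_per_arm(0.20, 0.60)
    small_effect = patients_per_arm(0.20, 0.30)
    print(large_effect, small_effect)
    # Hypothetical target mutation present in 25% of tumors.
    print(patients_to_screen(2 * large_effect, 0.25))
```

Under these assumptions, the trial with the larger expected benefit needs far fewer patients per arm, while a low mutation prevalence multiplies the number of patients who must be screened to fill it.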

New challenges

The road to personalized and targeted treatment strategies will offer new challenges. For rare targets that are present in a minority of cases across many different types of cancers, one will have to consider clinical trials that include a number of different cancers. There are many design pitfalls to such trials, chiefly the additional clinical and molecular heterogeneity introduced by the inclusion of more than one cancer type. Despite these challenges, it will inevitably make sense in some settings to select patients who share a particular tumor biology, regardless of the tissue of origin.
Another major challenge is how to combine targeted therapies to improve clinical outcomes. To date, targeted therapies have not been able to cure advanced solid tumors. Clinical benefits, while sometimes quite impressive when compared to marginally effective treatments, still fall far short. It stands to reason that redundant survival and growth pathways enable tumors to overcome therapies that inhibit a single target. The simultaneous inhibition of relevant redundant pathways may yield dramatically better results, but will also dramatically increase the complexity of molecularly personalized clinical trials.
As approaches to cancer treatment advance, there will need to be continual engagement with patients and with cancer survivors. Fewer than 5 percent of adult cancer patients participate in a clinical trial. To carry out meaningful clinical trials in the future, that number must increase. This will be most important for treatments that target relatively rare mutations; a large number of potential volunteers will have to be screened to identify a sufficient number who harbor the relevant target. To succeed, we must partner with a much larger fraction of cancer patients.
Designing and executing future cancer clinical trials will not be easy, but physician-scientists are armed with a fast-growing body of omics-informed knowledge with which to surmount these hurdles.
Tomasz M. Beer is deputy director of the Knight Cancer Institute and a professor of medicine at Oregon Health & Science University in Portland. He is the coauthor of Cancer Clinical Trials: A Commonsense Guide to Experimental Cancer Therapies and Clinical Trials. Written for people living with cancer, the book is accompanied by a blog (www.cancer-clinical-trials.com) that seeks to disseminate knowledge about clinical trials.



UPDATED 4/23/2020:  New Design for Phase 1 pediatric oncology trials to expedite dose escalation studies.



Ushering in the next generation of precision trials for pediatric cancer

Steven G. DuBois, Laura B. Corson, Kimberly Stegmaier, Katherine A. Janeway

Science  15 Mar 2019:Vol. 363, Issue 6432, pp. 1175-1181 DOI: 10.1126/science.aaw4153



Cancer treatment decisions are increasingly based on the genomic profile of the patient’s tumor, a strategy called “precision oncology.” Over the past few years, a growing number of clinical trials and case reports have provided evidence that precision oncology is an effective approach for at least some children with cancer. Here, we review key factors influencing pediatric drug development in the era of precision oncology. We describe an emerging regulatory framework that is accelerating the pace of clinical trials in children as well as design challenges that are specific to trials that involve young cancer patients. Last, we discuss new drug development approaches for pediatric cancers whose growth relies on proteins that are difficult to target therapeutically, such as transcription factors.

Some terms from the bibliography:

3+3 design: A commonly used rule-based design for phase 1 clinical trials in which patients are enrolled in cohorts of three patients, and decisions to increase or decrease the dose level for the next three participants are based on toxicities observed in those three patients.


Basket trial: A precision oncology trial design in which patients with many different cancer types are enrolled, the tumor is tested for a set of biomarkers of interest, and then patients are assigned to one of several clinical trial subprotocols based on the presence of a biomarker corresponding to a particular molecularly targeted therapy.


Bayesian model–based trial designs: A broad class of trial designs that use data known before the trial as well as data obtained during the conduct of the trial to adapt trial parameters as more information becomes available.

Continual reassessment method: One example of a Bayesian model–based trial design in which an initial mathematical model of the relationship between drug dose and probability of unacceptable toxicity is continually updated as new information becomes available to assign subsequent patients to a dose anticipated to have an unacceptable toxicity rate below a set rate.

First-in-child trial: The first clinical trial of a specific agent to include a pediatric population, traditionally considered patients <18 years of age.


Rolling 6 design: A variation of the 3+3 design in which up to six participants may be enrolled to a dosing cohort before enrollment pauses to assess toxicity.

Safety run-in: An initial component of a phase 2 or phase 3 trial in which a small group of patients are treated with a previously untested regimen to evaluate toxicity before opening the trial to a larger group of participants.

Umbrella trial: A precision oncology trial design in which patients with a specific cancer type are enrolled, tumor is tested for a set of biomarkers of interest, and then patients are assigned to one of several clinical trial subprotocols based on the presence of a biomarker corresponding to a particular molecularly targeted therapy.
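
The continual reassessment method defined in the glossary above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a validated trial design: it assumes a one-parameter power model (toxicity at dose i equals `skeleton[i] ** exp(a)`), a normal prior on `a`, and a simple grid approximation of the posterior. The skeleton values, prior standard deviation, target rate and function name are all hypothetical.

```python
from math import exp


def crm_next_dose(skeleton, observations, target=0.25, prior_sd=1.16):
    """Return (next dose index, posterior mean toxicity per dose).

    observations is a list of (dose_index, had_toxicity) pairs.
    """
    # Grid over the single model parameter a.
    grid = [-4 + 8 * k / 400 for k in range(401)]
    weights = []
    for a in grid:
        prior = exp(-a * a / (2 * prior_sd ** 2))  # unnormalised normal prior
        likelihood = 1.0
        for dose, tox in observations:
            p = skeleton[dose] ** exp(a)
            likelihood *= p if tox else (1 - p)
        weights.append(prior * likelihood)
    total = sum(weights)
    # Posterior mean toxicity probability at each dose level.
    posterior_tox = [
        sum(w * skeleton[i] ** exp(a) for w, a in zip(weights, grid)) / total
        for i in range(len(skeleton))
    ]
    # Highest dose whose estimated toxicity stays at or below the target rate.
    allowed = [i for i, p in enumerate(posterior_tox) if p <= target]
    next_dose = max(allowed) if allowed else 0  # a real design might stop the trial instead
    return next_dose, posterior_tox


if __name__ == "__main__":
    toy_skeleton = [0.05, 0.10, 0.20, 0.35]  # hypothetical prior toxicity guesses
    print(crm_next_dose(toy_skeleton, [(1, False), (1, False), (2, True)]))
```

Each new observation reshapes the posterior over the model parameter, so the recommended dose can move up or down after every patient rather than only after a full cohort.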


In this review article, DuBois et al. describe new paradigms for pediatric precision oncology trial design, how these designs contrast with the older models, and how they differ from the design of such trials in adults.  As the genomic landscape of pediatric tumors becomes clearer (1, 2), the authors note two emerging themes:

  1. Pediatric cancers harbor certain genomic mutations rarely seen in adult cancers
  2. Pediatric cancers share some genomic alterations and mutational gene signatures with adult tumors

However, only a small number of pediatric clinical trials investigate whether specific genetic mutations predict the outcome of a given personalized therapy.

            Thus, there is an urgent need for precision clinical trials in pediatric cancers.

Several reviews have described numerous ongoing and recently completed trials; however, most are phase 1 dose escalation trials, including basket trials and umbrella trials, that are based on previous data from adult trials using the same precision drug.  For example, pediatric trials of the TRK inhibitor larotrectinib in tumors harboring an NTRK fusion gene, and a pediatric crizotinib trial for pediatric glioblastomas having an ALK fusion protein, have shown great success, yet most of the early phase 1 work was based on adults or carried out in a way that does not take advantage of the new regulatory framework designed to expedite new drugs for adult precision medicines.

Speeding up the early phase trials in pediatric cancers: new trial design paradigms

Dose escalation phase 1 trials have traditionally been the starting point for clinical development of new pediatric anticancer drugs; however, these first-in-child trials have lagged their adult counterparts by many years.  These trials relied on the standard 3+3 or rolling 6 trial design, with doses escalated until a pediatric MTD (maximum tolerated dose) was reached.  In recent years, new precision medicine pediatric trial designs have been adopted to expedite the process, based on the fundamental shift in thinking that many new oncology agents will not have a true MTD when tested in adults.
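
For readers unfamiliar with the rule-based escalation mentioned above, here is a small Python sketch of a conventional 3+3 decision rule. The specific thresholds (escalate after 0 of 3 toxicities, expand to six patients after 1 of 3, stop escalating after 2 or more) follow the rule as it is commonly described and are an assumption here, since the review's glossary does not spell them out; the function name and the term "DLT" (dose-limiting toxicity) are likewise introduced only for illustration.

```python
def three_plus_three_decision(n_treated, n_dlt):
    """Return the next action at the current dose level under a conventional 3+3 rule."""
    if n_treated == 3:
        if n_dlt == 0:
            return "escalate"      # no toxicity in the first cohort: go up one level
        if n_dlt == 1:
            return "expand"        # enrol three more patients at the same dose
        return "de-escalate"       # two or more DLTs: this dose is too toxic
    if n_treated == 6:
        return "escalate" if n_dlt <= 1 else "de-escalate"
    raise ValueError("3+3 decisions are made after 3 or 6 evaluable patients")


if __name__ == "__main__":
    # Hypothetical cohort outcomes: (patients treated, patients with a DLT).
    for cohort in [(3, 0), (3, 1), (6, 1), (6, 2)]:
        print(cohort, three_plus_three_decision(*cohort))
```

The rule uses only the toxicity counts at the current dose level, which is why it is simple to run but slow to reach a recommended dose compared with model-based designs.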

Doses in phase 1 trials for targeted therapies like those in precision medicine are usually escalated based on considerations other than toxicity, like pharmacodynamics or biomarker analysis.  A pediatric phase 1 dose escalation trial may require more subjects than an adult trial.  But although these newer approaches to early-phase trial design more efficiently establish a pediatric dose, they do little to advance our understanding of which patients are most likely to benefit from a new therapy.

Thus there is a need for good biomarkers to be included early on in these initial trial designs.  For example, Dana Farber’s first-in-child clinical trial NCT03654716, a Phase 1 Study of the Dual MDM2/MDMX Inhibitor ALRN-6924 in Pediatric Cancer (as a possible treatment for resistant (refractory) solid tumor, brain tumor, lymphoma or leukemia), is reducing the time children wait for entry into a trial: unselected patients can enroll, and the biomarker, increased MDM2 expression, is used to determine which patients go on to phase 2 dose escalation. In other cases, such as the NCI Children’s Oncology Group basket trials, formal phase 1 trial design has been completely supplanted; these trials instead incorporate molecularly targeted therapies at doses based on adult doses but adjusted for patient size.  The use of combinations with traditional therapies in trial design is also helping to speed up enrollment.  The authors also suggest that tumor profiling is pertinent but should be built into the trial design so that the costs to patients can be covered by trial funds.


Fig. 1. Evolution of precision trials for pediatric cancer.

Illustration: Kellie Holoski/Science

Source: Ushering in the next generation of precision trials for pediatric cancer BY STEVEN G. DUBOIS, LAURA B. CORSON, KIMBERLY STEGMAIER, KATHERINE A. JANEWAY SCIENCE 15 MAR 2019 : 1175-1181 https://science.sciencemag.org/content/363/6432/1175


  1. S. N. Gröbner, B. C. Worst, J. Weischenfeldt, I. Buchhalter, K. Kleinheinz, V. A. Rudneva, P. D. Johann, G. P. Balasubramanian, M. Segura-Wang, S. Brabetz, S. Bender, B. Hutter, D. Sturm, E. Pfaff, D. Hübschmann, G. Zipprich, M. Heinold, J. Eils, C. Lawerenz, S. Erkek, S. Lambo, S. Waszak, C. Blattmann, A. Borkhardt, M. Kuhlen, A. Eggert, S. Fulda, M. Gessler, J. Wegert, R. Kappler, D. Baumhoer, S. Burdach, R. Kirschner-Schwabe, U. Kontny, A. E. Kulozik, D. Lohmann, S. Hettmer, C. Eckert, S. Bielack, M. Nathrath, C. Niemeyer, G. H. Richter, J. Schulte, R. Siebert, F. Westermann, J. J. Molenaar, G. Vassal, H. Witt, B. Burkhardt, C. P. Kratz, O. Witt, C. M. van Tilburg, C. M. Kramm, G. Fleischhack, U. Dirksen, S. Rutkowski, M. Frühwald, K. von Hoff, S. Wolf, T. Klingebiel, E. Koscielniak, P. Landgraf, J. Koster, A. C. Resnick, J. Zhang, Y. Liu, X. Zhou, A. J. Waanders, D. A. Zwijnenburg, P. Raman, B. Brors, U. D. Weber, P. A. Northcott, K. W. Pajtler, M. Kool, R. M. Piro, J. O. Korbel, M. Schlesner, R. Eils, D. T. W. Jones, P. Lichter, L. Chavez, M. Zapatka, S. M. Pfister, ICGC PedBrain-Seq Project, ICGC MMML-Seq Project, The landscape of genomic alterations across childhood cancers. Nature 555, 321–327 (2018). doi:10.1038/nature25480 pmid:29489754
  2. X. Ma, Y. Liu, Y. Liu, L. B. Alexandrov, M. N. Edmonson, C. Gawad, X. Zhou, Y. Li, M. C. Rusch, J. Easton, R. Huether, V. Gonzalez-Pena, M. R. Wilkinson, L. C. Hermida, S. Davis, E. Sioson, S. Pounds, X. Cao, R. E. Ries, Z. Wang, X. Chen, L. Dong, S. J. Diskin, M. A. Smith, J. M. Guidry Auvil, P. S. Meltzer, C. C. Lau, E. J. Perlman, J. M. Maris, S. Meshinchi, S. P. Hunger, D. S. Gerhard, J. Zhang, Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature 555, 371–376 (2018). doi:10.1038/nature25795 pmid:29489755


Read Full Post »

Curator: Aviva Lev-Ari, PhD, RN

New Institute for Precision Medicine Created at Weill Cornell Medical College and NewYork-Presbyterian Hospital


NEW YORK (Jan. 31, 2013) — Recognizing that medicine is not “one size fits all,” Weill Cornell Medical College and NewYork-Presbyterian Hospital have created the pioneering Institute for Precision Medicine at Weill Cornell and NewYork-Presbyterian/Weill Cornell Medical Center. This new, cutting-edge translational medicine research hub will explore the new frontier of precision medicine, offering optimal targeted, individualized treatment based on each patient’s genetic profile. The institute’s new genomic research discoveries will help develop novel, personalized medical therapies to be tested in innovative clinical trials, while also building a comprehensive biobank to improve research and patient care.


The Institute for Precision Medicine will be led by Dr. Mark Rubin, a renowned pathologist and prostate cancer expert who uses whole genomic sequencing in his laboratory to investigate DNA mutations that lead to disease, particularly prostate cancer. Dr. Rubin currently serves as vice chair for experimental pathology, director of Translational Research Laboratory Services, the Homer T. Hirst III Professor of Oncology, professor of pathology and laboratory medicine and professor of pathology in urology at Weill Cornell and is a pathologist at NewYork-Presbyterian/Weill Cornell.

Dr. Rubin and his team seek to replace the traditional one-size-fits-all medicine paradigm with one that focuses on targeted, individualized patient care using a patient’s own genetic profile and medical history. Physician-scientists at the institute will seek to precisely identify the genetic influencers of a patient’s specific illness — such as cancer, cardiovascular disease, neurodegenerative disease and others — and use this genetic information to design a more-effective course of treatment that targets those specific contributing factors. Also, genomic analyses of tumor tissue will enable researchers to help patients with advanced disease and no current treatment options, as well as to isolate the causes of drug resistance in patients who stop responding to treatments, redirecting them to more successful therapies.

Preventive precision medicine will also be a key initiative at the institute, allowing physician-scientists to help identify a patient’s risk of diseases and take necessary steps to aid in its prevention through medical treatment and/or lifestyle modification. In addition, the Institute for Precision Medicine will leverage an arsenal of innovative genomic sequencing, biobanking and bioinformatics technology to transform the existing paradigm for diagnosing and treating patients.

“This institute will revolutionize the way we treat disease, linking cutting-edge research and next-generation sequencing in the laboratory to the patient’s bedside,” Dr. Rubin says. “We will use advanced technology and the collective wealth of knowledge from our clinicians, basic scientists, pathologists, molecular biologists and computational biologists to pinpoint the molecular underpinnings of disease — information that will spur the discovery of novel treatments and therapies. It’s an exciting time to be involved in precision medicine and I look forward to advancing this game-changing field of medicine.”

“Precision medicine is the future of medicine, and its application will help countless patients,” says Dr. Laurie H. Glimcher, the Stephen and Suzanne Weiss Dean of Weill Cornell Medical College. “The Institute for Precision Medicine, with Dr. Rubin’s expertise and strong leadership, will accelerate our understanding of the human genome, provide key insights into the causes of disease and enable our physician-scientists to translate this knowledge from the lab to the clinical setting to help deliver personalized treatments to the sickest of our patients.”

Three main resources will facilitate the institute’s groundbreaking precision medicine work:

  • genomics sequencing,
  • biobanking and
  • bioinformatics.

Weill Cornell and NewYork-Presbyterian will invest in state-of-the-art technology to conduct sequencing, a more expansive biobank for all patient specimens and tissue samples and dedicated bioinformaticians who will closely analyze patient data, searching for genetic mutations and other abnormalities to identify and target with treatment.

“The Institute for Precision Medicine will enable our doctors to tailor effective treatments for individual patients and also predict the diseases that are likely to affect a patient long before they develop,” says Dr. Steven J. Corwin, CEO of NewYork-Presbyterian Hospital. “By harnessing the full potential of our enhanced understanding of the human genome, and extending its reach into the clinical realm, the institute will transform patient care at NewYork-Presbyterian/Weill Cornell Medical Center and beyond.”

Dr. Rubin, the institute’s inaugural director, is a board-certified pathologist and physician-scientist with specific expertise in genitourinary pathology and an internationally recognized leader in prostate cancer genomics and biomarker research. His groundbreaking research investigating molecular biomarkers distinguishing indolent from aggressive disease has led to landmark discoveries that revolutionized the understanding of prostate cancer’s molecular underpinnings. This includes co-discovering two of the most common mutations in prostate cancer,

  • the TMPRSS2-ETS rearrangements and 
  • SPOP mutations.

Dr. Rubin is one of the “Dream Team” principal investigators of a multi-institutional $10 million grant from Stand Up 2 Cancer (SU2C) and the Prostate Cancer Foundation, addressing patients with advanced prostate cancer through a multi-phase approach employing next generation sequencing to help inform the direction of future clinical trials. Additionally, Dr. Rubin serves as a co-principal investigator on the National Cancer Institute’s (NCI) Early Detection Research Network (EDRN) Biomarker Discovery Laboratory and worked for many years as part of the NCI Prostate Cancer Specialized Programs of Research Excellence (SPORE).

Dr. Rubin has authored more than 275 peer-reviewed publications, predominantly in prostate cancer, and holds multiple NCI-funded grants in prostate cancer genomics and biomarker development. He is a member of the World Health Organization Prostate Cancer Tumor Classification and the Prostate TCGA (The Cancer Genome Atlas) Working Group. He serves as an ad hoc reviewer for multiple publications including Nature, Science, Cancer Cell, Cancer Discovery and the New England Journal of Medicine. Dr. Rubin also serves as the chair of the EDRN Prostate Cancer Working Group and is a member of the EDRN Steering Committee. He is active in the NCI/NHGRI-sponsored TCGA, serving on the Prostate Cancer Working Group, and he is an external advisor for the Canadian International Cancer Genome Consortium (ICGC). He served on the NCI Cancer Biomarker Study Section for five years and as an ad hoc reviewer for other NCI and international granting organizations.

Dr. Rubin is the recipient of the Arthur Purdy Stout Society of Surgical Pathologists Annual Prize (2003), the Young Investigator Award (2004) given by the United States and Canadian Academy of Pathology and the Huggins Medal (2012), the highest award bestowed by the Society of Urologic Oncology. Finally, Dr. Rubin was a co-team leader with his long-term collaborator, Arul M. Chinnaiyan (University of Michigan), for the first annual American Association for Cancer Research Team Science Award (2007) in recognition of their groundbreaking work on TMPRSS2-ETS fusion prostate cancer.


Clinical Laboratory Improvement Amendments (CLIA)

The Centers for Medicare & Medicaid Services (CMS) regulates all laboratory testing (except research) performed on humans in the U.S. through the Clinical Laboratory Improvement Amendments (CLIA). In total, CLIA covers approximately 225,000 laboratory entities. The Division of Laboratory Services, within the Survey and Certification Group, under the Office of Clinical Standards and Quality (OCSQ) has the responsibility for implementing the CLIA Program.

The objective of the CLIA program is to ensure quality laboratory testing. Although all clinical laboratories must be properly certified to receive Medicare or Medicaid payments, CLIA has no direct Medicare or Medicaid program responsibilities.

For the following information, refer to the downloads/links listed below:

  • For additional information about a particular laboratory, contact the appropriate State Agency or Regional Office CLIA contact (refer to the State Agency or Regional Office CLIA link found on the left-hand navigation pane);
  • Information about direct access testing (DAT) and the CLIA regulations is included in the Direct Access Testing download;
  • OIG reports relating to CLIA;
  • Guidance for Coordination of CLIA Activities Among CMS Central Office, CMS Regional Offices, State Agencies (including State with Licensure Requirements), Accreditation Organizations and States with CMS Approved State Laboratory Programs is contained in the Partners in Laboratory Oversight download;
  • Quality control (QC) highlights from the regulations published in the Federal Register on January 24, 2003 are listed under the QC Highlights download;
  • Micro sample pipetting information for laboratories;
  • Information on alternative (non-traditional) laboratory is contained in the Special Alert download;
  • Identifying Best Practices in Laboratory Medicine – a Battelle Project for the Centers for Disease Control and Prevention (CDC); and
  • FDA Safety Tip for laboratories on how workload should be calculated when using currently FDA-approved semi-automated gynecologic cytology screening devices.

For specific information about the quality assurance guidelines for testing using the rapid HIV-1 antibody tests waived under CLIA, refer to the CDC Division of Laboratory Systems website listed under the related links outside CMS section below.

Complaint Reporting

To report a complaint about a laboratory, contact the appropriate State Agency that is found on the State Agency & Regional Office CLIA Contacts page located in the left navigation bar in this section.


New Weill Cornell Precision Medicine Institute Plans to Offer Genomically Guided Treatment after CLIA Approval

February 06, 2013

Through a newly created Institute for Precision Medicine, Weill Cornell Medical College and New York Presbyterian Hospital plan to begin offering targeted, individualized treatment informed by patients’ genomes.

The institute first plans to guide treatment decisions for cancer patients using their genomic data, and then broaden the effort to those with common illnesses, such as cardiovascular disease and neurodegenerative disorders.

The new institute is currently awaiting regulatory approval from CLIA and New York State, according to its leader, Mark Rubin, a professor of pathology at Weill Cornell.

With that approval in hand, the center will begin using genome sequencing and other tools to inform treatment strategies for patients – first focusing on cancer, and then eventually broadening to other disease areas, he said.

While Rubin did not detail how the institute will recruit patients, he said the center plans to see cancer patients who can benefit from single-gene tests or other molecular diagnostics to inform treatment decisions, those with advanced diseases without treatment options, and patients who stop responding to standard treatments and could be redirected to other therapies.

“For some patients, there are very clear indications of whether they need a specific targeted therapy. Those are pretty straightforward,” Rubin said.

“And then there is emerging data that sequencing, either exome or whole-genome, can provide insight on which treatments cancer patients might need that are not considered standard treatments,” he said.

Insights from advanced sequencing technologies are also changing how researchers study patients, sometimes facilitating N-of-1 trials. “There are a few examples where treatments have been implemented and shown to be effective in a clinical trial of one,” Rubin said, “where they are the only person on the trial because of their specific mutations.”

He said the institute plans to be agnostic in terms of what technologies it uses for sequencing, but currently it relies on Illumina technology.

“We will have a number of different approaches, but the key is to do as best as possible in the clinical setting so that the results can be used in the management of patients,” Rubin said.

According to Rubin, the institute aims to find the optimal ways to collect genomic data, analyze it, and store it. As the center gears up and sees larger numbers of patients, Rubin said it also plans to use data and samples it collects to support larger retrospective or prospective studies, for which the institute is considering partnering with the New York Genome Center.

But not all patients the institute sees may require large-scale genome sequencing, Rubin reiterated.

“It may turn out that the most efficient way to determine if someone has a certain mutation, like EGFR, is to run the single-gene test up front. That’s not going to change for some types of disease,” he noted. “So, what I see our role being is developing these more complex approaches, which could be whole-genome sequencing, or using multiplex panels of genes.”

The institute will focus its efforts first on cancer patients because the development of genomically targeted therapies is relatively accelerated compared to other disease areas, so the potential to match a mutation in a cancer patient’s genome to a potential treatment may be greater than for those with other illnesses. But according to Rubin, the institute does plan to expand to other populations, like cardiovascular conditions, neurodegenerative diseases, and possibly infectious diseases.

In addition, he said the Institute is also discussing how it might use prognostic genomic information to look at disease risk, with the potential to inform early interventional treatment decision making.

“Because we are a hospital that sees well patients being followed by their doctors, that’s something we’re contemplating as pilot,” he said.

“We don’t have a plan in mind yet, but those types of studies are probably very important in specific disease entities, for patients at risk for a particular constitutional disease … or you could imagine we might screen large numbers of our patient population to look for risk factors that may not have been identified yet,” he said.

Several commercial and academic groups have recently begun offering clinical cancer sequencing and other genomic analyses to potentially guide therapeutic decision making.

Foundation Medicine, for example, sells a test that sequences the exons of nearly 200 genes known to be mutated in solid tumors and provides a report informing doctors of actionable mutations.

Other firms, like Caris Life Sciences, provide reports to doctors and patients matching gene-expression or sequence data to potentially actionable therapies.

The University of Michigan and the International Genomics Consortium announced last fall that they were creating a non-profit company called Paradigm to provide a targeted-sequencing-based diagnostic service to guide personalized treatment for cancer and, eventually, other diseases (PGx Reporter 7/5/2012).

Though the new Weill Cornell institute plans to start fairly narrowly, Rubin’s overall goals for the center may put it in a position to offer a potentially more comprehensive service than many of these other groups.

“The most challenging part right now is for us to understand that sequencing is just a test, something that may or may not be useful in itself. It’s in working in the clinical setting that we are going to really define it,” Rubin said.

The institute, in the early days of its operations, is still in a learning phase, but “expectations are high” for the effort to succeed, according to Rubin. “Our job is to live up to the promise [of sequencing] to help identify novel targets for patients who may not have any choices with respect to treatment,” Rubin said. “And also to make discoveries that may be useful for a larger population.”

Molika Ashford is a GenomeWeb contributing editor and covers personalized medicine and molecular diagnostics.


Read Full Post »

Reporter: Aviva Lev-Ari, PhD, RN

arrayMap: A Reference Resource for Genomic Copy Number Imbalances in Human Malignancies

Haoyang Cai#, Nitin Kumar#, Michael Baudis*

Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland

Abstract

Background


The delineation of genomic copy number abnormalities (CNAs) from cancer samples has been instrumental for the identification of tumor suppressor genes and oncogenes and has proven useful for clinical marker detection. An increasing number of projects have mapped CNAs using high-resolution microarray-based techniques. So far, no single resource provides a global collection of readily accessible oncogenomic array data.

Methodology/Principal Findings

We here present arrayMap, a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The arrayMap database provides a platform for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data. To date, the resource incorporates more than 40,000 arrays in 224 cancer types extracted from several resources, including the NCBI’s Gene Expression Omnibus (GEO), EBI’s ArrayExpress (AE), The Cancer Genome Atlas (TCGA), publication supplements and direct submissions. For the majority of the included datasets, probe level and integrated visualization facilitate gene level and genome wide data review. Results from multi-case selections can be connected to downstream data analysis and visualization tools.


Conclusions/Significance

To our knowledge, no data source currently provides an extensive collection of high-resolution oncogenomic CNA data that could readily be used for genomic feature mining across a representative range of cancer entities. arrayMap represents our effort to provide a long-term platform for oncogenomic CNA data, independent of specific platform considerations or specific project dependence. The online database can be accessed at http://www.arraymap.org.

Citation: Cai H, Kumar N, Baudis M (2012) arrayMap: A Reference Resource for Genomic Copy Number Imbalances in Human Malignancies. PLoS ONE 7(5): e36944. doi:10.1371/journal.pone.0036944

Editor: Ying Xu, University of Georgia, United States of America

Received: January 10, 2012; Accepted: April 16, 2012; Published: May 18, 2012

Copyright: © 2012 Cai et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: HC is supported through a personal grant from the China Scholarship Council. NK and MB had received support through the Krebsliga Schweiz and the University of Zurich’s Research Priority Program Systems Biology. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

* E-mail: michael.baudis@imls.uzh.ch

# These authors contributed equally to this work.

Introduction

Genomic copy number abnormalities (CNAs) are a relevant feature in the development of basically all forms of human malignancies [1]. Many genomic imbalances are recurrent and display tumor-specific patterns [2],[3]. It is believed that these genomic instabilities reveal mutations in tumor suppressor genes and oncogenes which eventually result in a clone of fully malignant cells. Investigation of CNA hot spots (chromosomal loci frequently involved in CNA) has proven to be an effective methodology to identify novel cancer-causing genes [4][5]. On a systems level, CNA data along with expression or somatic mutation data is used to detect pathways altered in cancers and to deduce functional relevance of pathway members [6][7]. Since many CNAs have been attributed to specific tumor types or clinical risk profiles, in some entities copy number profiling is employed to characterize different biological as well as clinical subtypes with implications for treatment and individual prognosis. Subtype-associated CNA regions are used to predict causative genes, furthering understanding of biological differences and leading to discovery of new therapeutic targets [8][9].

Throughout the last two decades, molecular-cytogenetic techniques have been applied to scan genomic copy number profiles in virtually all types of human neoplasias. For whole genome analysis, these techniques predominantly consist of chromosomal and array comparative genomic hybridization (CGH), including CNA detection by cDNA and single nucleotide polymorphism (SNP) arrays [10][12]. While chromosomal CGH has a limited spatial resolution of several megabases, the resolution of recent array-based technologies (aCGH) is limited mainly by cost/benefit considerations rather than by technical obstacles. In this article, we use the terms “array CGH” and “aCGH” for all technical variants of whole genome copy number arrays. This includes, for example, single color arrays for which regional copy number normalization is performed through bioinformatics procedures applied to external references and internal data distribution.

The flood of new insights into structural genomic changes in health and disease has led to an increased interest in genomic data sets in genetic and cancer research. Several systematic studies of CNAs across many cancer types have been performed [13][14]. These efforts attempt a more complete understanding of functional effect of CNAs in the context of cancer.

The exponential increase of high resolution CNA datasets offers new challenges and opportunities for large-scale genomic data mining, data modeling and functional data integration. Several online resources have been developed, focusing on different aspects of data content as well as representation [6][15][19]. An overview of some of the prominent examples is given in Table 1. In principle, these databases facilitate access and utilization of CNA data. However, they are limited to specific aCGH platforms and/or single institutions as well as limited disease categories, or, as in the cases of GEO [15] and EBI ArrayExpress [16], mainly serve as raw data repositories. To the best of our knowledge, no single data source yet provides an extensive collection of high resolution oncogenomic CNA data that could readily be used for genomic feature mining across a representative range of cancer entities.

Table 1. Prominent online resources of genomic data.


Here we present “arrayMap”, a web-based reference database for genomic copy number data sets in cancer. We have generated a pipeline to accumulate and process oncogenomic array data into a unified and structured format. The resource incorporates associated histopathological and clinical information where accessible.

So far, arrayMap contains more than 40,000 arrays on 224 cancer types from five main data sources: NCBI GEO, EBI ArrayExpress, The Cancer Genome Atlas, publication supplements and user submitted data. Samples of interest can be browsed, visualized and analyzed via an intuitive interface. Computational tools are provided for biostatistical data analysis such as CNA clustering for case specific or for subset data and basic clinical correlations. arrayMap is publicly available at www.arraymap.org.

Results

Data Content

Our combination of both “top-down” (publication driven) as well as “bottom-up” (array data driven) approaches allowed us to identify a comprehensive set of accessible aCGH based cancer CNA data sets and to estimate the ratio of accessible data of the overall published/deposited data.

As the main result of the array data driven approach, we extracted 495 series comprising 32002 arrays, generated on 237 platforms, from NCBI’s GEO. Among those, raw data files of approximately 29000 whole genome arrays were suitable for inclusion into our data processing pipeline. When reviewing the content of AE, we found that the majority of AE cancer genome data sets had also been submitted to GEO. At the time of writing, 11 datasets including 712 arrays not present in GEO had been processed based on AE specific series. Detailed information on the GEO/AE data sets is provided in Table S1.

The top-down procedure was based on our group’s continuous monitoring of cancer related articles utilizing genome copy number screening approaches, as established for our “Progenetix” project (www.progenetix.org [19]). The census date for the literature based data collection was August 15, 2011. At this point, we had identified 931 articles discussing a total of 53213 genomic cancer CNA profiles based on aCGH techniques. Of these, 8728 cases from 199 articles had so far been extracted from publication related sources (e.g. supplementary data tables), annotated and made accessible through Progenetix. These data included cases for which only supervised information but no probe data was available (e.g. author annotated Golden Path or cytogenetic CNA regions). Literature based data sets containing probe specific data, or for which the respective data were provided to us by the authors (640 samples), were included into our arrayMap data processing pipeline.
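The counts quoted above imply rough coverage ratios, which is the "ratio of accessible data" the authors mention. The short calculation below is only an illustrative sketch: the numbers are taken from the text, while the variable names and the `percent` helper are introduced here for illustration and are not part of arrayMap.

```python
# Counts quoted in the text above.
geo_arrays_total = 32002       # arrays extracted from GEO series
geo_arrays_usable = 29000      # approximate number suitable for the pipeline
articles_total = 931           # articles identified by the census date
articles_extracted = 199       # articles with data extracted so far
profiles_total = 53213         # CNA profiles discussed in those articles
profiles_extracted = 8728      # profiles made accessible through Progenetix


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


coverage = {
    "GEO arrays usable": percent(geo_arrays_usable, geo_arrays_total),
    "articles extracted": percent(articles_extracted, articles_total),
    "profiles extracted": percent(profiles_extracted, profiles_total),
}

for label, value in coverage.items():
    print(f"{label}: {value}%")
```

On these figures, roughly one in six published aCGH cancer profiles had been extracted at the census date, whereas most GEO-deposited arrays were usable.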

The data content of arrayMap is summarized in Table 2. Current numbers on the website will include changes based on ongoing annotation efforts (i.e. addition of data sets, removal of low quality arrays).

Table 2. aCGH data integrated in arrayMap.


As a by-product of our data collection and annotation efforts, we are able to provide estimates of content and trends for the platform usage and cancer entity coverage for the majority of published data. According to the assigned ICD-O 3 (International Classification of Diseases for Oncology, 3rd Edition) code and descriptive diagnostic text, breast carcinoma predominates as the single largest clinical entity with 6459 arrays. Table S2 presents sample sets in arrayMap classified by ICD-O code.

The most widely available array CGH platforms are either based on large insert clones (BAC/P1 arrays) or based on shorter single-stranded DNA molecules (oligonucleotide arrays), which may or may not include single-nucleotide polymorphism specific probe sequences (SNP arrays). Also, although designed for gene expression profiling, cDNA arrays were used by several laboratories for measuring genomic copy number changes. Although all these platforms are considered suitable for whole genome CNA analysis, their probe densities and other parameters can affect specific features of the analysis results [20][23]. Table S3 lists the general platform types and corresponding overall numbers of the data registered in arrayMap.

In reviewing the technical platform composition, two related trends become apparent (Figure 1). Originally developed in groups with expertise in molecular cytogenetics and cancer genome analysis, printed large insert clone arrays (BAC/P1) were the first whole genome CNA screening tools with a spatial resolution surpassing that of chromosomal CGH. Other groups re-employed cDNA arrays, developed for expression screening, for genomic hybridizations. However, over the last years one can observe the overwhelming use of various industrially produced oligonucleotide array platforms, which compensate for their low single probe fidelity through a probe density 1–3 orders of magnitude higher than is common for BAC/P1 arrays. Another reason for the success of oligonucleotide arrays is the integration of SNP specific probes, which in principle allows the same experiments to be used for genetic association studies and for the evaluation of copy number neutral loss of heterozygosity regions [12][24][25].

Figure 1. Distribution of resolutions and techniques of GEO platforms.

Each point represents a genomic array. The Y axis is labeled with probe number in log scale. The X axis denotes the time sequence of array data generation. From left to right are years from 2001 to 2011.



Data Access and Usage Scenarios

Based on our experience from the Progenetix project, a strong emphasis was put on a user friendly data interface. Here, we followed a “dual user type” scenario: Users without bioinformatics background should be able to intuitively visualize core data features as well as to perform standard analysis procedures, while for bioinformaticians the formatted database content should be accessible to use with their analysis tools of choice.

Query interface.

Data browsing in arrayMap is based on two types of query methods: search by experimental series metadata and search by sample features.

In the series query form, users can search by specifying (i) descriptive diagnosis text; (ii) disease classification (ICD-O 3 code(s)); (iii) disease locus (ICD topography code(s)); (iv) PubMed ID; (v) technique(s); (vi) series ID. For sample specific queries, additional features are available: sample ID; platform ID or description; and single or combined regional CNAs. Users can input gene name(s) in the “regional CNA” search field. When at least two characters are entered into the field, suggestions based on a HUGO gene list are displayed for selection. Gene selections will be converted to genomic locations.
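The gene-name suggestion behaviour described above (suggestions appear once at least two characters are typed) can be sketched as a simple prefix lookup. This is a minimal illustration, not arrayMap's implementation: the function name `suggest`, the `limit` parameter and the eight-symbol list are assumptions made here, and the real interface is described as drawing on a full HUGO gene list.

```python
from typing import List

# Small illustrative subset of gene symbols (assumption); the interface
# described in the text uses a complete HUGO gene list instead.
GENE_SYMBOLS = ["CDK4", "CDK6", "CDKN2A", "CDKN2B", "EGFR", "MTAP", "MYC", "TP53"]

MIN_CHARS = 2  # the text states that suggestions start after two characters


def suggest(prefix: str, symbols: List[str] = GENE_SYMBOLS, limit: int = 10) -> List[str]:
    """Return up to `limit` symbols starting with `prefix` (case-insensitive)."""
    prefix = prefix.strip().upper()
    if len(prefix) < MIN_CHARS:
        return []  # too short: show no suggestions yet
    return sorted(s for s in symbols if s.upper().startswith(prefix))[:limit]


print(suggest("c"))     # one character only, so nothing is suggested
print(suggest("cdkn"))  # symbols sharing the CDKN prefix
```

Converting a selected symbol to a genomic location would additionally need a coordinate table for the chosen genome build, which is not shown here.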

In the results table, associated array information is displayed. A number of links to additional and/or outside data is provided, according to the information available: the corresponding PubMed entries; the original GEO/AE accession display page for more complete information; the case and publication entries on the Progenetix website for further analysis; and importantly the array specific data visualization page.

Data download options.

On pages resulting from sample queries or sample data processing, users are presented with options to download sample data based on the current query’s return. Currently, three different file types are offered: JSON files, tab separated feature files and segments list files. These files enable bioinformaticians to perform further analyses with their tools of choice. In particular, the JSON format can be used for direct database import (e.g. MongoDB), can be parsed by common libraries (e.g. JSON.pm), or can be read into web applications.
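As a rough illustration of working with such a JSON download, the sketch below parses a small JSON document and flattens it into tab separated rows. The field names (`sample_id`, `segments`, `chro`, `start`, `stop`, `value`, `probes`) and all values are assumptions invented for this example; they are not arrayMap's documented schema, so a real download should be inspected before adapting the code.

```python
import json

# Hypothetical export; field names and values are assumptions for illustration.
EXAMPLE_JSON = """
[
  {"sample_id": "sample-1",
   "segments": [
     {"chro": "9", "start": 21600000, "stop": 22400000, "value": -1.2, "probes": 25},
     {"chro": "7", "start": 54000000, "stop": 56000000, "value": 0.8, "probes": 40}
   ]}
]
"""


def segments_to_rows(samples):
    """Flatten sample records into (sample_id, chro, start, stop, value, probes) tuples."""
    rows = []
    for sample in samples:
        for seg in sample.get("segments", []):
            rows.append((sample["sample_id"], seg["chro"], seg["start"],
                         seg["stop"], seg["value"], seg["probes"]))
    return rows


samples = json.loads(EXAMPLE_JSON)
rows = segments_to_rows(samples)

# Render the rows as tab separated text, similar in spirit to a segments list file.
tsv_lines = ["\t".join(str(field) for field in row) for row in rows]
print("\n".join(tsv_lines))
```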

Array probe data visualization.

In the array plot interface, original plots of genomic array data sets can be searched and visualized (Figure S1). Default threshold parameters, which were either provided with the data or assigned during the initial visualization, will be loaded. In single array visualization, the general view of probe distribution and post-thresholding segmentation results are displayed for the whole genome as well as for each individual chromosome. If multiple arrays are retrieved, users can select sample data for downstream analysis procedures. Figure S2 shows a screenshot of single array visualization.

Users can segment the raw data values and re-plot the results after revising the following parameters:

  • Golden path edition, default HG18/NCBI Build 36. This is still the commonly used version of the human reference genome assembly. At the moment, probe coordinates from all platforms have been remapped to HG18. For the near future, we intend to allow for a selection of updated genome editions.
  • Chromosomes to plot, default 1 to 22. Single or all chromosomes can be selected for re-plotting. To avoid gender bias, most platforms were designed without probes on chromosomes X and Y.
  • Loss/gain thresholds. Cut-offs from which a segment is considered a genomic loss or gain. The optimum thresholds may vary between platforms.
  • Region size in kb. Sets a filter to remove CNA below (e.g. probable noise) or above (e.g. exclude non-focal CNA) a certain size range.
  • Minimal probe numbers for segments. This parameter can be used to limit the minimal number of probes required for a segment to be considered (e.g. to remove aberrant segmentation due to probe level noise). Empirical examples would be values of 2–3 for high quality BAC arrays and 6–10 for Affymetrix SNP 6 arrays.
  • Plot region. Single genomic region to be plotted, overriding the chromosome selection above. When selected, plots with this region will be generated for all current arrays. This is valuable to e.g. display the CNA status and copy number transition points for specific genes of interest (Figure S3).
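
The interplay of the threshold, size and probe-count parameters listed above can be illustrated with a minimal Python sketch. This is not arrayMap's own code: the `Segment` record, the `call_cna` function and its argument names are hypothetical, and the default cut-offs of ±0.15 are simply the log2 values quoted for Figure S1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical segment record (not arrayMap's file format)."""
    chrom: str
    start: int    # base position
    end: int      # base position
    log2: float   # segment log2 ratio
    probes: int   # number of probes supporting the segment


def call_cna(
    segments: List[Segment],
    gain: float = 0.15,            # assumed default, taken from Figure S1
    loss: float = -0.15,           # assumed default, taken from Figure S1
    min_kb: float = 0.0,
    max_kb: Optional[float] = None,
    min_probes: int = 2,
) -> List[Tuple[Segment, int]]:
    """Return (segment, call) pairs; call is 1 for a gain and -1 for a loss."""
    calls = []
    for seg in segments:
        size_kb = (seg.end - seg.start) / 1000.0
        if seg.probes < min_probes:
            continue  # too few probes: likely probe level noise
        if size_kb < min_kb:
            continue  # below the region size filter
        if max_kb is not None and size_kb > max_kb:
            continue  # above the region size filter (non-focal)
        if seg.log2 > gain:
            calls.append((seg, 1))
        elif seg.log2 < loss:
            calls.append((seg, -1))
    return calls
```

Setting `max_kb` keeps only focal events, while raising `min_probes` (e.g. to 6–10 for dense SNP arrays, as suggested above) suppresses spurious short segments.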
Zoom-in visualization of focal CNA.

Figure 2 shows the visualization of focal genomic imbalances, e.g. to identify genes of interest targeted by focal CNA. The whole genome view of GSM535547 (human high grade glioma sample analyzed by Agilent Human Genome CGH Microarray 244A) shows a small regional deletion in chromosome 9p21. When plotting the approximate locus of the deletion (specified as chr9:21600000-22400000), genes, probes and chromosome bands in this zoomed in region are shown. Two genes, MTAP and CDKN2A can be seen as being localized in a potential homozygously deleted region. The focal deletion of these known tumor suppressor genes [26][27] points to their specific involvement in the glioblastoma sample analyzed here.

Figure 2. Zoom-in visualization of focal CNA.

(A) GSM535547 (human high grade glioma, Agilent CGH 244A) shows high quality probe hybridization signal. CNAs are easy to distinguish. (B) When zooming in on the whole of chromosome 9, an approximately 80 Mb deletion is displayed, with two breakpoints located in the p and q arms, respectively. In addition, a small regional deletion in 9p21 is clearly visible. Color bars in the lower region of the panel represent 848 genes located on chromosome 9. (C) Zoom-in on the potential homozygously deleted region in 9p21, obtained by specifying the exact region: chr9:21600000-22400000. The zoomed-in plot shows probes, the chromosome band and two tumor suppressor genes, MTAP and CDKN2A. Gene names and locations are shown on mouse hover and link to the UCSC genome browser for additional information.


Querying compound CNA.

The concept of focal CNA detection can be integrated with a global search for arrays containing gene specific regional imbalances. As an example, we demonstrate the search for arrays displaying imbalances in 4 gene loci associated with glioblastoma: EGFR, a transmembrane receptor and proto-oncogene [28]; PTEN, a tumor suppressor gene [29]; ASPM, frequently overexpressed in glioblastoma relative to normal brain tissue [30]; and CDKN2A (see above). In the “Search Samples” form, the “Match (Multiple) Regions & Types” field can be used to specify the genomic regions of those four genes including the expected CNA type: EGFR (chr7:55054219-55242524:1), PTEN (chr10:89613175-89718511:-1), ASPM (chr1:195319885-195382287:1) and CDKN2A (chr9:21957751-21984490:-1). When the query was executed, these regions were matched against the whole database, and cases with imbalances overlapping all of these regions were returned. When excluding controls and “worst quality” datasets, 303 out of 42421 arrays were identified as matching all four CNA regions. In addition to glioblastoma, several other types of cancer cases were among the results, including e.g. neuroblastomas, breast carcinomas, melanomas and lung carcinomas, which is in accordance with some previous observations [31]–[34]. CNA and associated data of those cases can be processed by online tools for further analysis and visualization (Figure S4) or downloaded for offline processing.
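
The matching logic of such a compound query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `parse_region` and `matches_all` are hypothetical helper names, a sample is represented as a plain list of `(chrom, start, end, type)` tuples, and "match" is taken to mean any overlap with a segment of the requested CNA type (1 for gain, -1 for loss). The actual arrayMap query runs server side and may differ.

```python
from typing import List, Tuple

# (chromosome, start, end, CNA type); type is 1 for gain, -1 for loss
Region = Tuple[str, int, int, int]


def parse_region(text: str) -> Region:
    """Parse e.g. 'chr7:55054219-55242524:1' into ('7', 55054219, 55242524, 1)."""
    chrom, span, cna_type = text.split(":")
    start, end = span.split("-")
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return chrom, int(start), int(end), int(cna_type)


def matches_all(sample_segments: List[Region], regions: List[Region]) -> bool:
    """True if every query region is overlapped by a segment of the same type."""
    for chrom, start, end, cna_type in regions:
        hit = any(
            s_chrom == chrom
            and s_type == cna_type
            and s_start <= end
            and s_end >= start
            for s_chrom, s_start, s_end, s_type in sample_segments
        )
        if not hit:
            return False
    return True


# The four glioblastoma-associated loci from the text.
QUERY = [
    parse_region(r)
    for r in (
        "chr7:55054219-55242524:1",     # EGFR gain
        "chr10:89613175-89718511:-1",   # PTEN loss
        "chr1:195319885-195382287:1",   # ASPM gain
        "chr9:21957751-21984490:-1",    # CDKN2A loss
    )
]
```

Filtering a collection of samples then reduces to keeping those for which `matches_all(segments, QUERY)` is true.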

Copy number profiling of selected cancer entities.

One aim of arrayMap is to allow researchers to conveniently perform aCGH meta-analysis across different platforms. By selecting a single or several cancer entities e.g. based on their ICD entity codes or diagnostic keywords, users are able to generate disease specific CNA frequency profiles or to compare profiles of different cancer types.

As an example, we used ICD-O code 9440/3 (glioblastoma, NOS) to query the database. 1478 arrays from 25 publications were returned and passed to our suite of online analysis tools. Chromosomal ideograms and histograms were generated representing the frequency of copy number aberrations identified over the whole dataset (Figure 3A). In the overall aberration profile, the most common genomic imbalances included whole chromosome 7 gain and chromosome 10 loss, as well as focal gains e.g. on bands 1q21 and 17q21. In our example dataset, a prominent focal deletion hot-spot was centered around 9p21.3 (921 of 1478 arrays, 62.31%) which had been discussed previously [35]. The distribution of CNAs over the individual arrays was visualized through a matrix plot (Figure 3B). As additional information to the frequency histograms, this form of visualization facilitates e.g. the detection of CNA patterns among individual arrays as well as the concordance of individual CNAs (e.g. here the arm-level changes in chromosome 7 and 10).
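
As a rough illustration of how such a frequency profile can be derived from per-array CNA calls, the following Python sketch counts, for each fixed-size genomic bin, the percentage of arrays with a gain or a loss touching that bin. The function name `cna_frequencies`, the tuple layout and the half-open coordinate convention are assumptions made for this example; the 1 Mb bin size follows the interval size mentioned for Figure S5.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Call = Tuple[str, int, int, int]  # (chrom, start, end, call); call is 1 or -1


def cna_frequencies(
    samples: List[List[Call]], bin_size: int = 1_000_000
) -> Dict[Tuple[str, int], Tuple[float, float]]:
    """Map (chrom, bin index) to (gain %, loss %) over all arrays.

    Segment ends are treated as exclusive (an assumption of this sketch).
    Each array contributes at most once per bin and direction.
    """
    gains = defaultdict(int)
    losses = defaultdict(int)
    for segments in samples:
        gained, lost = set(), set()
        for chrom, start, end, call in segments:
            first_bin = start // bin_size
            last_bin = (end - 1) // bin_size
            for b in range(first_bin, last_bin + 1):
                (gained if call > 0 else lost).add((chrom, b))
        for key in gained:
            gains[key] += 1
        for key in lost:
            losses[key] += 1
    n = len(samples)
    keys = set(gains) | set(losses)
    return {k: (100.0 * gains[k] / n, 100.0 * losses[k] / n) for k in keys}
```

The 9p21.3 figure quoted above is the same kind of ratio: 921 arrays with a deletion out of 1478 arrays in total.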

Figure 3. Copy number profiling of glioblastoma.

(A) Chromosomal ideogram and histogram showing the frequency of copy number aberrations. Percentage values correspond to gains (yellow) and losses (blue) identified over the whole dataset. The most frequent imbalances include gain of chromosome 7 and loss of chromosome 10 and of 9p21.3. (B) Matrix plot of 1478 glioblastoma cases. The Y axis represents individual samples. The distribution of genomic copy number imbalances reveals the individual aberration patterns of glioblastoma. (C) Heatmap of regional CNA frequencies for 1478 arrays. The intensity of green and red color components correlates to the relative gain and loss frequencies, respectively. If the dataset contains cancer subtypes, cancers with similar CNA frequency profiles will be clustered together, such that differences between subtypes will be revealed (e.g. see Figure S4H).



In the matrix plot, clicking on a segment opens the related view in the UCSC genome browser [36], giving detailed information on this genomic region (SVG plot only). The plot order of arrays can be re-sorted according to ICD morphology, ICD topography, clinical group or PubMed ID, which can be helpful in associating CNA patterns with external classification categories. For the selected classification criterion (default: ICD morphology), regional CNA frequencies for cases matching the different values are visualized through a heatmap (Figure 3C); this feature is especially useful when comparing a number of different primary classification criteria.

An Overall Genomic Copy Number Profile of Cancer

Our high quality core dataset in arrayMap was used to generate an overall cancer copy number aberration profile based on 29,137 arrays (Figure 4). This data represented 177 cancer types according to ICD-O 3 code, 59 of which contained more than 50 arrays. Overall, one of the most common genomic alterations is copy-number gain of chromosome band 8q24, which is found in 30% of total samples. According to the COSMIC [37] database, the most significant cancer gene in this region is MYC, a well-documented oncogene that codes for a transcription factor believed to regulate the expression of 15% of all genes, including genes involved in cell division, growth, and apoptosis [38][39]. Other common imbalances observed in at least 25% of oncogenomic arrays included gains of regions on e.g. 17q21 (29%) and 1q21 (33%), and losses of regions on e.g. 8p23 (32%) and 9p21 (25%), including focal deletions of the CDKN2A/B locus (Figure 2).

Figure 4. The overall cancer copy number aberration profile, based on 29,137 arrays.

This plot represents 177 cancer types according to ICD-O 3 code. Percentage values on the Y axis correspond to the numbers of gains (green) and losses (red) over the whole dataset.



While the overall CNA frequency distribution points towards DNA features targeted in multiple entities, this information is insufficient for deriving molecular mechanisms associated with specific cancer types. The genomic heterogeneity of different neoplasias is reflected in the varying patterns of regional CNA frequencies. Based on our core dataset, we have generated a heatmap-style visualization of frequency profiles for all ICD-O entities containing more than 50 arrays (Figure S5). The striking patterning of the CNA profiles indicates the non-random occurrence of CNAs, and should be seen as an invitation to explore e.g. CNA similarities shared by separate histopathological entities, as a way to transpose knowledge about pathophysiological mechanisms.

Discussion

arrayMap was developed to facilitate the progress of oncogenomic research. Our aim is to provide high-quality genomic copy number profiles of human tumors, along with a set of tools for accessing and analyzing CNA data. The service has been implemented with a straightforward web interface, including search options for CNA features and clinical annotation data. All assembled datasets are processed into platform independent segmentation and, for the vast majority of arrays, probe level data files, and are presented in consistent formats. Importantly, the direct access to precomputed probe level data plots supports a rapid evaluation of experiments for features of interest. As a curated database using standardized annotation schemes (e.g. ICD classification), arrayMap facilitates the exploration of cancer type specific CNA data, as well as the statistical association of genomic features to clinical parameters.

arrayMap is a dynamic database that is being continuously expanded and improved. We will review existing and newly published articles to update the database periodically. Over the past decade, we have witnessed a rapidly increasing number of aCGH publications, which gives us sufficient evidence to anticipate that cases will continue to be deposited in our database at a high rate. Although arrayMap is not a user driven repository, we welcome and support users interested in using the site for as yet undisclosed data, if they agree to data sharing upon publication.

Although, in contrast to the continuous data from expression analysis, copy number analysis explores discrete value spaces (countable number of DNA copies, for segments defined by genomic base positions), interpretation of the data can vary due to different low level (e.g. signal/background correction) and higher level (e.g. segmentation algorithms, regional or size based filtering) procedures. In that respect, we have to emphasize that the results of our data processing and annotation procedures are open to scrutiny. We encourage a critical review of individual results, and are open for suggestions regarding improved processing procedures for specific platforms.

In this paper, we have provided example scenarios of using arrayMap on different levels, i.e. locus centric and for entity profiling. We believe that systematic analyses will help researchers to discover features which are indiscernible in individual studies, and thus bring new insights for the understanding of disease pathology and the development of new therapeutic approaches [40]–[43]. We expect that researchers will integrate arrayMap data with their own analysis efforts, e.g. to increase sample size or for result verification purposes. We hope that this database will promote further evolution of microarray data meta-analysis. ArrayMap provides access to more than 200 tumor types, which makes it suitable for research across cancer entities. Furthermore, normal sample controls are of vital importance for studies of genomic imbalances. ArrayMap includes more than 3000 normal samples from healthy individuals or from normal tissues of cancer patients. These data could be integrated as a reference dataset, e.g. to account for copy number variation data superimposed on the tumor profiling results.

In the near future, with the continuous accumulation of very high resolution CNA data from genomic arrays and next-generation sequencing experiments, it will become possible to integrate these data into systems biology methods to elucidate effects of genomic instability, and describe the results from more perspectives. Envisioned examples would be e.g. the identification of genes that are involved in metastasis and treatment response; identification of chromosomal breakpoints distribution in cancer; and modeling functional networks in cancer by systems biology approaches.

Methods

Dataset Collection

Raw experimental data from a variety of platforms and repositories were extracted and converted to a uniform format suited to our reanalysis and visualization system. After a series of parsing procedures, the called copy number data is stored in arrayMap. The flowchart of arrayMap data collection and analysis is shown in Figure 5. Five main data sources are integrated into arrayMap:


Figure 5. The flowchart of arrayMap data collection and analysis procedures.

Publicly available raw data or segmented data was collected from the respective data sources. Files were re-processed by distinct procedures, according to the different data types. Probe coordinates were remapped to the most commonly encountered human reference genome assembly (NCBI Build 36/hg18). All probe specific ratios were converted to log2 values. Thresholds for genomic gain and loss were obtained from the original publications or series annotations; if not available, empirical thresholds were assigned. A minimum of 2 probes was required for calling a CNA segment, with higher values used on high-density arrays and/or in cases of excessive probe level noise. Processed probe and segment information was converted to uniform formats and stored in per-sample text files, which are accessed through the arrayMap web applications.



For extracting appropriate data Series from GEO/AE, two basic criteria have to be fulfilled. First, the raw data has to be from human malignancies analyzed by BAC, cDNA, aCGH or oligonucleotide arrays. Second, the array platform must be genome wide, with the optional omission of the sex chromosomes. Chromosome or region specific arrays were excluded because they were not able to reveal the whole genomic profile of the respective cancer. Associated clinical data was extracted if available.


Segmentation data with available clinical information was extracted and incorporated into the database. Due to data sharing restrictions, TCGA data is an exception in that, so far, no probe level data is incorporated into arrayMap. This exception was accepted since users will be able to access individual TCGA datasets through the project's web portal at http://tcga-data.nci.nih.gov/tcga/.


Many aCGH datasets can be found in the text or supplementary files of publications. In order to collect data from publications, we relied on our Progenetix project's setup. Data in Progenetix is manually curated. The collection strategies are:

  • literature mining using complex search parameters through PubMed
  • identification of called aCGH data, in GP annotation or tabular format (article, supplementary tables)
  • evaluation of supplementary files for probe specific data tables
  • follow-up on article links outs, to repository entries or referenced datasets
User submission.

User submitted data was provided in a number of formats which were converted to the standard format as described. Although we accept and support private datasets, we insist on integration of at least the genomic and core clinical data (e.g. disease classifiers) upon publication of the dataset's analysis results.

Dataset Analysis

Probe remapping.

A pipeline has been generated for determining the genomic positions of the tens to hundreds of thousands of array probes with reference to a common genome Golden Path edition. For each array platform, the genome positions of probes were remapped to the current commonly used version of the human reference genome assembly (NCBI Build 36.1/hg18). Specific mapping procedures were employed for different types of probes. BAC clones were first remapped according to the clone set information of the Sanger/DECIPHER database [44]. If the probe position was not available there, the UCSC Genome annotation database [36] (release hg18) was used as a fallback. After these two steps, a mean of 98% of the BAC clones were remapped. For IMAGE clone sets, only the UCSC Genome annotation database was used. The average remapping rate of IMAGE clones was 91%. Affymetrix raw CEL data files were analyzed based on hg18 library files, so the output segments have hg18 coordinates. A summary of the percentage of mapped probes is given in Table 3. The mapping details for each platform can be found in Table S4.
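
The two-step lookup described for BAC clones (a primary annotation source with a fallback) and the resulting remapping rate can be sketched as follows. The function `remap_probes`, the dictionary-based lookup tables and the position tuple layout are hypothetical stand-ins for illustration; they do not reflect the actual formats of the Sanger/DECIPHER or UCSC resources.

```python
from typing import Dict, Iterable, Tuple

Position = Tuple[str, int, int]  # (chromosome, start, end) on hg18


def remap_probes(
    probe_ids: Iterable[str],
    primary: Dict[str, Position],   # e.g. clone set annotation (assumed layout)
    fallback: Dict[str, Position],  # e.g. genome annotation table (assumed layout)
) -> Tuple[Dict[str, Position], float]:
    """Return the remapped positions and the percentage of probes remapped."""
    ids = list(probe_ids)
    positions: Dict[str, Position] = {}
    for pid in ids:
        pos = primary.get(pid)
        if pos is None:
            pos = fallback.get(pid)  # only consulted if the primary lookup fails
        if pos is not None:
            positions[pid] = pos
    rate = 100.0 * len(positions) / len(ids) if ids else 0.0
    return positions, rate
```

Probes found in neither table are simply dropped, which is what lowers the per-platform remapping rate below 100%.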

Table 3. Percentage of remapped probes according to platform types.


Probe signal normalization.

The array data available was given in a variety of formats, most frequently as log2 ratio of probe hybridization intensity. In order to make data from different platforms directly comparable, all other types of normalized values were converted to log2. For dye swap experiments, reference/tumor intensity ratios were “reversed” to represent a tumor/reference value. For some two-color arrays for which only raw signal intensities were provided, the normalized log2 ratio for each probe was calculated by

$$\log_2 \mathrm{ratio} = \log_2\left(\frac{T - T_{b}}{R - R_{b}}\right)$$

where $T$ and $T_{b}$ represent tumor sample intensity and tumor channel background intensity respectively, and $R$ and $R_{b}$ represent reference sample intensity and reference channel background intensity respectively. If multiple instances of the same clone exist, the average signal intensity of that clone was used.
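
A minimal Python sketch of this background-corrected log2 ratio is given below. It assumes the ratio is formed from background-subtracted tumor and reference intensities, and that replicate spots of a clone are averaged per channel before the ratio is taken; both the function names and the order of averaging are assumptions of this example rather than details confirmed by the text.

```python
import math
from typing import List, Optional, Tuple

# (tumor, tumor background, reference, reference background) for one spot
Spot = Tuple[float, float, float, float]


def log2_ratio(
    tumor: float, tumor_bg: float, ref: float, ref_bg: float
) -> Optional[float]:
    """Background-corrected log2(tumor/reference); None if not computable."""
    t = tumor - tumor_bg
    r = ref - ref_bg
    if t <= 0 or r <= 0:
        return None  # signal at or below background: ratio undefined
    return math.log2(t / r)


def clone_log2_ratio(spots: List[Spot]) -> Optional[float]:
    """Average replicate spots of one clone per channel, then take the ratio."""
    if not spots:
        return None
    n = len(spots)
    tumor, tumor_bg, ref, ref_bg = (sum(s[i] for s in spots) / n for i in range(4))
    return log2_ratio(tumor, tumor_bg, ref, ref_bg)
```

For a dye swap experiment, the same function can be called with the channels exchanged, which flips the sign of the result.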

Calling gains and losses from normalized log2 ratios is an important step in identifying copy number imbalances. For each re-analyzable dataset, related publications were explored to obtain original threshold descriptions. If this information was not available, empirical thresholds were assigned and the resulting CNA calls were visually compared with probe value plots. Processing method and threshold information for each array are provided in Table S5.

Affymetrix genotyping arrays.

For the widely used Affymetrix Genome-Wide SNP arrays, raw CEL files were downloaded and underwent a massive re-analysis using the R package aroma.affymetrix [45] with the CRMA v2 method [46]. During the processing step, approximately 50 normal sample arrays were employed as a reference set for each array type to reduce the noise level. Normal tissue arrays from different labs were extracted and used to build the reference dataset. In order to obtain high quality reference arrays, we excluded arrays which contain segments greater than 3 megabases, since copy number variations are generally smaller than 3 megabases. The list of normal tissue reference arrays is given in Table S6.

Quality control.

In our review of array data deposited in GEO or collected from publication supplements we encountered a large number of individual data sets with insufficient or limited probe quality. Also, for samples of unprocessed raw data (e.g. Affymetrix CEL files), we found that QC measures reported previously (e.g. call rate [47], NUSE [48], RLE [48]) only had a limited accuracy for detection of arrays with inadequate probe level data. Currently, the most viable strategy for quality assessment of processed, heterogeneous copy number arrays is the visual inspection of probe plotting and segmentation results by an experienced researcher. For the first arrayMap edition we generated a quality classification system, which contains a total of 4 categories based on inspections of genome-wide array plots:

  • Excellent. Probe signal distribution differs clearly between normal regions and imbalanced regions. The signal baseline is distinct and unique, so that the segmentation thresholds appear realistic. Chromosomal changes are clearly visible.
  • Good. Good quality in general. The probe signal may contain some noise, but at a tolerable level. Chromosomal changes are distinguishable.
  • Hypersegmented. Serrated distribution of probe signal intensities, causing dozens of separate peaks and discontinuous segments. Chromosomal changes number up to several hundred and are smaller than 5 megabases.
  • Noisy. Probe signal intensities are highly but evenly scattered, with a high standard deviation, making it impossible to differentiate copy number changes.

Depending on the intended research purpose this basic classification system can be used for a pre-analysis triage of copy number data. Applying stringent review criteria we identified a core dataset with “excellent” quality arrays accounting for approximately 60 percent of total arrays. We are currently working on a platform independent quality assessment system for genomic arrays, which will be implemented in future versions of the arrayMap resource.

Associated data.

For arrayMap, data is stored with separate datasets for each array. This is in contrast to the Progenetix database, for which technical replicates, where available, are combined into case specific CNA profiles. In arrayMap, technical replicates are assigned an identical case identifier to facilitate downstream statistical procedures including e.g. clinical data correlations. The assignment of the correct diagnostic entity to each sample is an essential step in generating a binding between genomic and associated data points. At the same time, to ensure annotation consistency and make the retrieval process more efficient, the following data points were manually collected for all CNA profiles from GEO/ArrayExpress and published papers, if available:

  • Descriptive diagnostic text, as available through the original source
  • Diagnostic classification according to the International Classification of Diseases in Oncology (ICD-O 3, morphology with code)
  • Tumor locus according to ICD (ICD topography with code)
  • Source of material (e.g. primary tumor, cell line, metastasis)
  • Clinical parameters where available, including age, gender, grade, clinical stage (TNM coded), recurrence/progression, time to recurrence/progression, death and followup
Web Server.

An online interface to the arrayMap database was created using Perl common gateway interface (CGI) and R scripts running on Mac OS X Server. Sample and series data is stored using a MongoDB database engine (http://www.mongodb.org). Precomputed array plots are stored as flat files, mostly in both SVG and PNG versions. The online release of the service has been optimized to be compatible with major browsers supporting current web standards (CSS2, HTML5, XML with inline SVG; e.g. Safari >= 3.0, Firefox >= 3.0, Internet Explorer >= 9, Google Chrome) with limited fallback support. Dynamic graphics provided in the array plot module were implemented as server side services by technologies including XML/XHTML, JavaScript, SVG and HTML5 Canvas.

For the future, we intend a quarterly database content revision to ensure inclusion of newly published articles and GEO/AE entries. Archived versions of the sample annotations will be made available upon special request. Additional feature and small data updates will be performed as deemed necessary. The “News” page of Progenetix/arrayMap will be used for feature and content announcements.

Supporting Information

Figure S1.

Array data sets visualization. Original plots and optimized parameters for GSE21530, which contains 8 intimal sarcoma samples hybridized on the Agilent CGH Microarray 244A platform. The normalized probe signal log2 ratios and post-thresholding segmentation results for each array are displayed. Genomic alterations are represented by horizontal green (gain) and red (loss) lines. Alterations are defined here as regions with log2 ratio >0.15 or <−0.15. Simplified schemas of CNAs link to the UCSC genome browser for further review.


Figure S2.

Screenshot of single array visualization. ArrayMap plots for GSM630977 (acute myelogenous leukemia). Besides the whole genome view, subviews of each chromosome are displayed as well. From these plots, different kinds of genetic variation events are clearly revealed, e.g. massive genomic rearrangement in chromosome 6; arm-level gain of chromosome 8q and 3MB focal change around 1p31.3. Through the “Plot Array Data” interface, users can segment the raw data values and re-plot the results with customized parameters.


Figure S3.

Plot single genomic region. In the “Plot Array Data” interface, input the precise location (chr5:1100000-1400000) in “Plot Region” field. Plots with this region were generated for all 8 arrays in the current series (GSE21530). In this region, there are 5 genes which are shown schematically as colored boxes. CNA status and copy number transition points for these genes are displayed.


Figure S4.

Compound CNA query. (A) Four gene loci associated with glioblastoma (EGFR, PTEN, ASPM and CDKN2A) were inserted into the “Match (Multiple) Regions & Types” field. 303 out of 42421 arrays were returned. (B) Classification information of these 303 arrays was displayed and can be selected for the following analysis. (C) Statistical and plot parameters can be customized. Associated data was processed by online tools, and returned results included: (D) chromosomal ideogram and (E) histogram, showing the frequency of copy number aberrations; (F) matrix plot revealing the aberration pattern of selected arrays; (G) array classification tree generated by hierarchical Ward clustering, in which arrays with similar CNA frequencies share a tree branch; (H) heatmap of CNA frequencies clustered by clinical group.


Figure S5.

Heatmap of frequency profiles for 59 cancer types. Heatmap visualization of frequency profiles for all ICD-O entities containing more than 50 arrays in our core dataset. Region specific gain/loss frequencies were mapped to 1 Mb intervals. The intensity of colors (green: gains; red: losses) corresponds to the relative frequency of CNAs for each interval.


Table S1.

Entities extracted from NCBI GEO and EBI ArrayExpress.


Table S2.

Cancer entities grouped by ICD-O code.


Table S3.

Platform type distribution in arrayMap.


Table S4.

Probe remapping rate for platforms.


Table S5.

Processing method and threshold for calling genomic gains and losses.


Table S6.

Normal tissue reference arrays for Affymetrix platforms.


Acknowledgments

We want to thank Christian von Mering, Homayoun Bagheri, Henrik Bengtsson and Nuria Lopez-Bigas for helpful discussions.

Author Contributions

Conceived and designed the experiments: HC NK MB. Performed the experiments: HC MB. Analyzed the data: HC NK MB. Contributed reagents/materials/analysis tools: HC NK MB. Wrote the paper: HC MB.

References

  1. Stallings RL (2007) Are chromosomal imbalances important in cancer? Trends in genetics : TIG 23: 278–283. doi: 10.1016/j.tig.2007.03.009
  2. Myllykangas S, Himberg J, Böhling T, Nagy B, Hollmén J, et al. (2006) DNA copy number amplification profiling of human neoplasms. Oncogene 25: 7324–7332.
  3. Weir BA, Woo MS, Getz G, Perner S, Ding L, et al. (2007) Characterizing the cancer genome in lung adenocarcinoma. Nature 450: 893–898.
  4. Wiedemeyer R, Brennan C, Heffernan TP, Xiao Y, Mahoney J, et al. (2008) Feedback circuit among INK4 tumor suppressors constrains human glioblastoma development. Cancer cell 13: 355–364.
  5. Mullighan CG, Goorha S, Radtke I, Miller CB, Coustan-Smith E, et al. (2007) Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 446: 758–764.
  6. Cancer Genome Atlas Research Network (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455: 1061–1068.
  7. Kan Z, Jaiswal BS, Stinson J, Janakiraman V, Bhatt D, et al. (2010) Diverse somatic mutation patterns and pathway alterations in human cancers. Nature 466: 869–873.
  8. Bergamaschi A, Kim YH, Wang P, Sørlie T, Hernandez-Boussard T, et al. (2006) Distinct patterns of DNA copy number alteration are associated with different clinicopathological features and gene expression subtypes of breast cancer. Genes, chromosomes & cancer 45: 1033–1040.
  9. Hu X, Stern HM, Ge L, O’Brien C, Haydu L, et al. (2009) Genetic alterations and oncogenic pathways associated with breast cancer subtypes. Molecular cancer research : MCR 7: 511–522.
  10. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, et al. (1992) Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science (New York, NY) 258: 818–821.
  11. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, et al. (1999) Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nature genetics 23: 41–46.
  12. Bignell GR, Huang J, Greshock J, Watt S, Butler A, et al. (2004) High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome research 14: 287–295.
  13. Baudis M (2007) Genomic imbalances in 5918 malignant epithelial tumors: an explorative meta-analysis of chromosomal CGH data. BMC cancer 7: 226.
  14. Alloza E, Al-Shahrour F, Cigudosa JC, Dopazo J (2011) A large scale survey reveals that chromosomal copy-number alterations significantly affect gene modules involved in cancer initiation and progression. BMC medical genomics 4: 37.
  15. Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, et al. (2011) NCBI GEO: archive for functional genomics data sets–10 years on. Nucleic acids research 39: D1005–10.
  16. Parkinson H, Sarkans U, Kolesnikov N, Abeygunawardena N, Burdett T, et al. (2010) ArrayExpress update–an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucleic acids research 39: D1002–D1004.
  17. Scheinin I, Myllykangas S, Borze I, Böhling T, Knuutila S, et al. (2008) CanGEM: mining gene copy number changes in cancer. Nucleic acids research 36: D830–5.
  18. Cao Q, Zhou M, Wang X, Meyer CA, Zhang Y, et al. (2011) CaSNP: a database for interrogating copy number alterations of cancer genome from SNP array data. Nucleic acids research 39: D968–74.
  19. Baudis M, Cleary ML (2001) Progenetix.net: an online repository for molecular cytogenetic aberration data. Bioinformatics (Oxford, England) 17: 1228–1229.
  20. Baumbusch LO, Aarøe J, Johansen FE, Hicks J, Sun H, et al. (2008) Comparison of the Agilent, ROMA/NimbleGen and Illumina platforms for classification of copy number alterations in human breast tumors. BMC genomics 9: 379.
  21. Curtis C, Lynch AG, Dunning MJ, Spiteri I, Marioni JC, et al. (2009) The pitfalls of platform comparison: DNA copy number array technologies assessed. BMC genomics 10: 588.
  22. Greshock J, Feng B, Nogueira C, Ivanova E, Perna I, et al. (2007) A comparison of DNA copy number profiling platforms. Cancer research 67: 10173–10180.
  23. Bengtsson H, Ray A, Spellman P, Speed TP (2009) A single-sample method for normalizing and combining full-resolution copy numbers from multiple platforms, labs and analysis methods. Bioinformatics (Oxford, England) 25: 861–867.
  24. Heinrichs S, Look T (2007) Identification of structural aberrations in cancer by SNP array analysis. Genome biology. pp. 1–5.
  25. Carter NP (2007) Methods and strategies for analyzing copy number variation using DNA microarrays. Nature genetics 39: S16–S21.
  26. Lubin M, Lubin A (2009) Selective killing of tumors deficient in methylthioadenosine phosphorylase: a novel strategy. PloS one 4: e5735.
  27. Krasinskas AM, Bartlett DL, Cieply K, Dacic S (2010) CDKN2A and MTAP deletions in peritoneal mesotheliomas are correlated with loss of p16 protein expression and poor survival. Modern pathology : an official journal of the United States and Canadian Academy of Pathology, Inc 23: 531–538.
  28. Smith JS, Tachibana I, Passe SM, Huntley BK, Borell TJ, et al. (2001) PTEN mutation, EGFR amplification, and outcome in patients with anaplastic astrocytoma and glioblastoma multiforme. Journal of the National Cancer Institute 93: 1246–1256.
  29. Li J (1997) PTEN, a Putative Protein Tyrosine Phosphatase Gene Mutated in Human Brain, Breast, and Prostate Cancer. Science (New York, NY) 275: 1943–1947.
  30. Horvath S, Zhang B, Carlson M, Lu KV, Zhu S, et al. (2006) Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a molecular target. Proceedings of the National Academy of Sciences of the United States of America 103: 17402–17407.
  31. Zhang W, Zhu J, Bai J, Jiang H, Liu F, et al. (2010) Comparison of the inhibitory effects of three transcriptional variants of CDKN2A in human lung cancer cell line A549. Journal of experimental & clinical cancer research : CR 29: 74.
  32. van der Rhee JI, Krijnen P, Gruis NA, de Snoo FA, Vasen HFA, et al. (2011) Clinical and histologic characteristics of malignant melanoma in families with a germline mutation in CDKN2A. Journal of the American Academy of Dermatology.
  33. Bourdeaut F, Isidor B, Ferrand S, Thomas C, Moreau A, et al. (2011) Homozygous PTEN deletion in neuroblastoma arising in a child with Cowden syndrome. American journal of medical genetics Part A 155: 1763–1766.
  34. Jin K, Kong X, Shah T, Penet MF, Wildes F, et al. (2011) Breast Cancer Special Feature: The HOXB7 protein renders breast cancer cells resistant to tamoxifen through activation of the EGFR pathway. Proceedings of the National Academy of Sciences of the United States of America.
  35. Wiltshire RN, Rasheed BK, Friedman HS, Friedman AH, Bigner SH (2000) Comparative genetic patterns of glioblastoma multiforme: potential diagnostic tool for tumor classification. Neurooncology 2: 164–173.
  36. Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, et al. (2011) The UCSC Genome Browser database: update 2011. Nucleic acids research 39: D876–82.
  37. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, et al. (2011) COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic acids research 39: D945–50.
  38. Gearhart J, Pashos EE, Prasad MK (2007) Pluripotency redux–advances in stem-cell research. The New England journal of medicine 357: 1469–1472.
  39. Dalla-Favera R, Bregni M, Erikson J, Patterson D, Gallo RC, et al. (1982) Human c-myc onc gene is located on the region of chromosome 8 that is translocated in Burkitt lymphoma cells. Proceedings of the National Academy of Sciences of the United States of America Vol. 79: 7824–7827.
  40. Climent J, Dimitrow P, Fridlyand J, Palacios J, Siebert R, et al. (2007) Deletion of chromosome 11q predicts response to anthracycline-based chemotherapy in early breast cancer. Cancer research 67: 818–826.
  41. Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, et al. (2006) Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer cell 10: 529–541.
  42. Stevens KN, Fredericksen Z, Vachon CM, Wang X, Margolin S, et al. (2012) 19p13.1 is a triple negative-specific breast cancer susceptibility locus. Cancer research.
  43. Park NI, Rogan PK, Tarnowski HE, Knoll JHM (2012) Structural and genic characterization of stable genomic regions in breast cancer: Relevance to chemotherapy. Molecular oncology.
  44. Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, et al. (2009) DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. The American Journal of Human Genetics 84: 524–533.
  45. Bengtsson H, Simpson K, Bullard J, Hansen K (2008) aroma.affymetrix: A genetic framework in R for analyzing small to very large Affymetrix data sets in bounded memory. Tech Report #745 Department of Statistics, University of California, Berkeley.
  46. Bengtsson H, Wirapati P, Speed TP (2009) A single-array preprocessing method for estimating fullresolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6. Bioinformatics (Oxford, England) 25: 2149–2156.
  47. Laurie CC, Doheny KF, Mirel DB, Pugh EW, Bierut LJ, et al. (2010) Quality control and quality assurance in genotypic data for genome-wide association studies. Genetic Epidemiology 34: 591–602.
  48. F C, AL A, SA K, TP S, VL SM (2005) NUSE and RLE: Quality assessment of oligonucleotide microarray data to quantify systemic variation. 2005 Meeting of the Federation of Clinical Immunology Societies Boston, MA.




How Mobile Elements in “Junk” DNA Promote Cancer – Part 1: Transposon-mediated Tumorigenesis

Author, Writer and Curator: Stephen J. Williams, Ph.D.


Word Cloud by Daniel Menzin


Landscape of Somatic Retrotransposition in Human Cancers. Science (2012); Vol. 337:967-971. (1)

Sequencing of the human genome through large programs such as The Cancer Genome Atlas (TCGA) and the Encyclopedia of DNA Elements (ENCODE) consortium, together with considerable bioinformatics efforts led by the National Center for Biotechnology Information (NCBI), has unlocked a myriad of as yet unclassified genes (for a good review see (2)).  The ENCODE project encompasses 32 institutions worldwide which, so far, have generated 1640 data sets, initially relying on microarray platforms but now moving to more cost-effective sequencing technology.  Initially the ENCODE project focused on three types of cells: an immature white blood cell line (GM12878), a leukemic line (K562), and an approved human embryonic stem cell line (H1-hESC).  The analysis was rapidly expanded to another 140 cell types.  DNA sequencing revealed 20,687 known coding regions, with hints of 50 more.  Another 11,224 DNA stretches were classified as pseudogenes.  The ENCODE project also reveals that many genes encode an RNA rather than a protein product, the so-called regulatory RNAs.

However, some of the most recent and interesting results focus on the noncoding regions of the human genome, previously discarded as uninteresting or “junk” DNA.  Only 2% of the human genome contains coding regions, while the remaining 98%, the noncoding part of the genome, is actually found to be highly active “with about 4 million constantly communicating switches” (3).  Some of these “switches” in the noncoding portion contain small, repetitive elements which are mobile throughout the genome, and can control gene expression and/or predispose to disease such as cancer.  These mobile elements, found in almost all organisms, are classified as transposable elements (TE), inserting themselves into far-reaching regions of the genome.  Retrotransposons are capable of generating new insertions through RNA intermediates.  These transposable elements are normally kept immobile by epigenetic mechanisms (4-6); however, some TEs can escape epigenetic repression and insert into new areas of the genome, a process described as insertional mutagenesis because it can lead to the gene alterations seen in disease (7).  In addition, insertional mutagenesis can lead to the transformation of cells and, as described in Post 2, act as a model system to determine drivers of oncogenesis.  Insertional mutagenesis is distinct from other mechanisms of genetic alteration and rearrangement seen in cancer, such as the recombination and fusion of gene fragments seen with the Philadelphia chromosome and the BCR/ABL fusion protein (8).  The mechanism of transposition and its putative effects leading to mutagenesis are described in the following figure:


Figure.  Insertional mutagenesis based on a transposon-mediated mechanism.  A) The basic structure of a transposon contains a gene/sequence flanked by two inverted repeats (IR) and/or direct repeats (DR).  An enzyme, the transposase (red hexagon), binds and cuts at the IR/DR, and the transposon is pasted at another site in the DNA that contains an insertion site.  B) Multiple transpositions may result in oncogenic events by inserting in promoters, leading to altered expression of genes driving oncogenesis, or by inserting within coding regions and inactivating tumor suppressors or activating oncogenes.  Deep sequencing of the resultant tumor genomes (based on nested PCR from the IR/DRs) may reveal common insertion sites (CIS), from which oncogenic mutations can be identified.
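
The idea of a common insertion site in panel B can be illustrated with a short sketch. The Python below groups insertion positions from several tumors into fixed-size windows and reports the windows hit in a minimum number of independent tumors. The function name, window size, threshold, tumor labels and coordinates are all hypothetical and chosen only for illustration; published CIS analyses generally rely on statistical models rather than a simple count.

```python
from collections import defaultdict


def common_insertion_sites(insertions, window=10_000, min_tumors=3):
    """Return windows hit by insertions in at least `min_tumors` distinct tumors.

    insertions: iterable of (tumor_id, chromosome, position) tuples.
    Each result is (chromosome, window_start, window_end, n_tumors).
    """
    tumors_per_window = defaultdict(set)
    for tumor_id, chrom, pos in insertions:
        # Integer division assigns each position to a fixed-size window.
        tumors_per_window[(chrom, pos // window)].add(tumor_id)

    hits = []
    for (chrom, idx), tumors in tumors_per_window.items():
        # Count distinct tumors, so repeated hits in one tumor count once.
        if len(tumors) >= min_tumors:
            hits.append((chrom, idx * window, (idx + 1) * window, len(tumors)))
    return sorted(hits)


# Hypothetical insertion calls (not real data).
example_insertions = [
    ("T1", "chr9", 21_971_200),
    ("T2", "chr9", 21_974_850),
    ("T3", "chr9", 21_978_010),
    ("T3", "chr9", 21_979_500),  # second hit in the same tumor
    ("T1", "chr2", 5_000_123),
    ("T4", "chr17", 7_577_000),
]

cis = common_insertion_sites(example_insertions)
print(cis)
```

In this toy input only the chr9 window is reported, because it is the only window hit in three different tumors. Fixed windows can split a cluster that straddles a boundary, which is one reason real analyses use sliding or kernel-based approaches.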

In a bioinformatics study, Eunjung Lee et al. (1), in collaboration with The Cancer Genome Atlas Research Network, analyzed 43 high-coverage whole-genome sequencing datasets from five cancer types to determine transposable element insertion sites.  Using a novel computational method, the authors identified 194 high-confidence somatic TE insertion sites present in cancers of epithelial origin such as colorectal, prostate and ovarian, but not in brain or blood cancers.  Sixty-four of the 194 detected somatic TE insertions were located within 62 annotated genes.  Genes with TE insertions in colon cancers commonly have high mutation rates, and the enriched genes were associated with cell adhesion functions (CDH12, ROBO2, NRXN3, FPR2, COL1A1, NEGR1, NTM and CTNNA2) or tumor suppressor functions (NELL1, ROBO2, DBC1, and PARK2).  None of the somatic events were located within coding regions; the TE sequences were detected in untranslated regions (UTR) or intronic regions.  Previous studies had shown that insertion in these regions (UTR or intronic) can disrupt gene expression (9).  Interestingly, most of the genes with insertion sites were down-regulated.  Recent papers have shown that local changes in the methylation status of transposable elements can drive retrotransposition (10,11); indeed, the authors found that somatic insertions are biased toward the hypomethylated regions in cancer cell DNA.  The authors also confirmed that the insertion sites were unique to cancer and were somatic rather than germline in origin (germline: inherited and present in every cell of the body) by analyzing 44 normal genomes (41 normal blood samples from cancer patients and three healthy individuals).
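
The somatic-versus-germline check described above can be sketched as a comparison between two sets of insertion calls: a tumor insertion is kept as somatic only if no insertion is seen near the same position in the normal genomes. This is a simplified illustration and not the authors' computational method; the function name, the 100 bp tolerance and all coordinates are hypothetical.

```python
def somatic_insertions(tumor_calls, normal_calls, tolerance=100):
    """Keep tumor insertion calls with no normal call within `tolerance` bp.

    Each call is a (chromosome, position) tuple.
    """
    # Index normal calls by chromosome for quick lookup.
    normal_by_chrom = {}
    for chrom, pos in normal_calls:
        normal_by_chrom.setdefault(chrom, []).append(pos)

    somatic = []
    for chrom, pos in tumor_calls:
        seen_in_normal = any(
            abs(pos - normal_pos) <= tolerance
            for normal_pos in normal_by_chrom.get(chrom, [])
        )
        if not seen_in_normal:
            somatic.append((chrom, pos))
    return somatic


# Hypothetical calls (not real data).
tumor = [("chr1", 1_000_500), ("chr3", 42_000), ("chr5", 770_000)]
normal = [("chr1", 1_000_450), ("chr5", 900_000)]

result = somatic_insertions(tumor, normal)
print(result)
```

Here the chr1 call is discarded because a normal-genome call lies 50 bp away, while the chr3 and chr5 calls are retained. A tolerance is used instead of exact matching because breakpoint positions estimated from sequencing reads are rarely identical between samples.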

The authors conclude:

“that some TE insertions provide a selective advantage during tumorigenesis, rather than being merely passenger events that precede clonal expansion(1).”

The authors also suggest that more bioinformatics studies, which utilize the expansive genomic and epigenetic databases, could determine the functional consequences of such transposable elements in cancer.  The following Post will describe how the use of transposon-mediated insertional mutagenesis is leading to discoveries of the drivers (main genetic events) leading to oncogenesis.

1.            Lee, E., Iskow, R., Yang, L., Gokcumen, O., Haseley, P., Luquette, L. J., 3rd, Lohr, J. G., Harris, C. C., Ding, L., Wilson, R. K., Wheeler, D. A., Gibbs, R. A., Kucherlapati, R., Lee, C., Kharchenko, P. V., and Park, P. J. (2012) Science 337, 967-971

2.            Pennisi, E. (2012) Science 337, 1159, 1161

3.            Park, A. (2012) Don’t Trash These Genes. “Junk” DNA may lead to valuable cures. In Time, Time, Inc., New York, N.Y.

4.            Maksakova, I. A., Mager, D. L., and Reiss, D. (2008) Cellular and molecular life sciences : CMLS 65, 3329-3347

5.            Slotkin, R. K., and Martienssen, R. (2007) Nature reviews. Genetics 8, 272-285

6.            Yang, N., and Kazazian, H. H., Jr. (2006) Nature structural & molecular biology 13, 763-771

7.            Hancks, D. C., and Kazazian, H. H., Jr. (2012) Current opinion in genetics & development 22, 191-203

8.            Sattler, M., and Griffin, J. D. (2001) International journal of hematology 73, 278-291

9.            Han, J. S., Szak, S. T., and Boeke, J. D. (2004) Nature 429, 268-274

10.          Reichmann, J., Crichton, J. H., Madej, M. J., Taggart, M., Gautier, P., Garcia-Perez, J. L., Meehan, R. R., and Adams, I. R. (2012) PLoS computational biology 8, e1002486

11.          Byun, H. M., Heo, K., Mitchell, K. J., and Yang, A. S. (2012) Journal of biomedical science 19, 13

Other research papers on ENCODE and Cancer were published on this Scientific Web site as follows:

Expanding the Genetic Alphabet and linking the genome to the metabolome

Junk DNA codes for valuable miRNAs: non-coding DNA controls Diabetes

ENCODE Findings as Consortium

Reveals from ENCODE project will invite high synergistic collaborations to discover specific targets

ENCODE: the key to unlocking the secrets of complex genetic diseases

Impact of evolutionary selection on functional regions: The imprint of evolutionary selection on ENCODE regulatory elements is manifested between species and within human populations

Metabolite Identification Combining Genetic and Metabolic Information: Genetic association links unknown metabolites to functionally related genes

Advances in Separations Technology for the “OMICs” and Clarification of Therapeutic Targets

Commentary on Dr. Baker’s post “Junk DNA codes for valuable miRNAs: non-coding DNA controls Diabetes”

Cancer Genomics – Leading the Way by Cancer Genomics Program at UC Santa Cruz


Cancer Genomics – Leading the Way by Cancer Genomics Program at UC Santa Cruz

Reporter: Aviva Lev-Ari, PhD, RN

UPDATED ON 6/17/2013

UCSC Designing Social Network-type Model for Analyzing Cancer Data

June 17, 2013

NEW YORK (GenomeWeb News) – Seeking to make the masses of cancer sequence data that is being generated more useful for researchers, investigators at University of California, Santa Cruz, plan to use a $3.5 million grant from the National Cancer Institute to create a new platform for organizing and accessing these data.

The UCSC group plans to create a method for making the raw sequence information in repositories like the university’s Cancer Genomics Hub more useful for investigators seeking to make clinical predictions about how cancer mutations respond to drugs, for example.

The aim of the project will be to develop a new database called the Biomedical Evidence Graph, or BMEG, which will use a graph database structure, like Facebook does, to enable swift access to complex and interconnected datasets.

Principal investigator Joshua Stuart, a UCSC associate professor of engineering, likened the difficulty for many investigators of using raw sequence data to average computer users trying to work directly with binary code.

“Your web browser doesn’t understand zeros and ones. There are layers and layers of software programs between that and what you see on a web page. We need to do the same thing for DNA sequences to reach the higher levels of interpretation needed for scientific discovery,” Stuart said in a statement.

Stuart said that a platform similar to what social networks like Facebook use offers a “natural way” to represent data from tumor samples based upon the connections between their molecular profiles.
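
To make the graph idea concrete, here is a minimal in-memory sketch in Python. It is not BMEG's schema or API, which the article does not describe; the class, the edge labels (`has_mutation`, `targeted_by`) and the sample, gene and drug names are all hypothetical. It only shows why a graph layout makes "which other samples share a molecular feature with this one" a short traversal.

```python
from collections import defaultdict


class EvidenceGraph:
    """Tiny undirected graph: nodes are strings, edges carry a label."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, source, label, target):
        # Store the edge in both directions so it can be walked either way.
        self.edges[source].add((label, target))
        self.edges[target].add((label, source))

    def neighbors(self, node, label=None):
        """Nodes connected to `node`, optionally restricted to one edge label."""
        return sorted(
            target
            for edge_label, target in self.edges[node]
            if label is None or edge_label == label
        )

    def samples_sharing_feature(self, sample):
        """Other samples linked to any mutated gene this sample is linked to."""
        shared = set()
        for gene in self.neighbors(sample, "has_mutation"):
            shared.update(self.neighbors(gene, "has_mutation"))
        shared.discard(sample)
        return sorted(shared)


# Hypothetical nodes and edges (not real data).
graph = EvidenceGraph()
graph.add_edge("sample_A", "has_mutation", "gene_X")
graph.add_edge("sample_B", "has_mutation", "gene_X")
graph.add_edge("sample_C", "has_mutation", "gene_Y")
graph.add_edge("gene_X", "targeted_by", "drug_1")

related = graph.samples_sharing_feature("sample_A")
print(related)
print(graph.neighbors("gene_X", "targeted_by"))
```

In this toy graph, sample_A and sample_B are connected through gene_X, which in turn is connected to a drug, so a two-hop walk links a sample to a candidate therapy without any table joins.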

CGHub, which launched last year to house data from The Cancer Genome Atlas consortium and similar projects, holds thousands of genome sequences from individual patients and access is highly controlled and limited to approved projects.

BMEG, however, will not require such security because it will host higher-level data from analyses of the raw genome sequencing. This will enable a broader group of investigators to use and analyze these datasets without having to download massive files to their computers.

“TCGA researchers have built a lot of great tools for data analysis, and we need to get those installed in the BMEG so the rest of the world can engage in that higher level analysis,” Stuart said. “The idea is to build a shared knowledge base and create a playground where lots of researchers can interact, test their algorithms, and compare results.”

The BMEG will be located with the CGHub servers at the San Diego Supercomputer Center, and investigators will be able to run their analyses as apps on the BMEG, UCSC said.



Five3, maker of cancer genomics software, takes off from UCSC labs

October 29, 2012

A group from the University of California, Santa Cruz (UCSC), has embarked on a new project to commercialize cancer genomics software through a new startup company called Five3 Genomics. The company has attracted a few of the biggest names in genomics and biotech to serve as advisers.

Recent software applications have enabled scientists to analyze cancer genomic data to track molecular changes in cells, spotting some of the triggers that cause tumors to grow. Led by CEO and co-founder Steve Benz, Five3 Genomics plans to sell its cancer genomics software to healthcare companies and pharmaceutical firms. Drugmakers could use the company’s software to discover new targets for cancer therapies, while hospitals could use the technology to put patients on existing drugs that home in on the molecular triggers of their cancer.

Benz and his fellow co-founders have a crack group of bioinformatics and biotech experts to help guide their startup. They’ve called on their UCSC mentors, David Haussler and Joshua Stuart. Haussler’s lab has participated in some of the most pioneering efforts in genomics over the past couple of decades, including the Human Genome Project that raced to decode an entire human genome. Also, Dr. Patrick Soon-Shiong, who has made billions of dollars in biotech, is serving as a scientific adviser.

“We’re working with academic collaborators to build out the platform and starting conversations with pharmaceutical companies and insurance companies,” Benz, who recently wrapped up his doctorate at UCSC, told the Santa Cruz Sentinel newspaper. “It’s a great opportunity to be able to take this technology and commercialize it so that it can be used to help patients.”



UCSC grad students launch cancer genomics company in Santa Cruz

By Sentinel Staff Report

Santa Cruz Sentinel

Posted:   10/24/2012 04:11:37 PM PDT

SANTA CRUZ — The co-founders of Five3 Genomics, a new biotech company based in Santa Cruz, are former graduate students in the Baskin School of Engineering at UC Santa Cruz, where they helped develop innovative cancer genomics software.

Their company, which has signed a license agreement with UCSC, offers software and services for cancer researchers, pharmaceutical companies, and health-care organizations. Its goal is to provide the data processing and analysis required for personalized cancer therapy, in which treatments are matched to the specific genetic aberrations found in an individual patient’s cancer cells.

“We’re working with academic collaborators to build out the platform and starting conversations with pharmaceutical companies and insurance companies,” said CEO Steve Benz, who completed his doctorate in bioinformatics this year. “It’s a great opportunity to be able to take this technology and commercialize it so that it can be used to help patients.”

In addition to Benz, the co-founders of Five3 Genomics include Chief Technical Officer Zachary Sanborn and Chief Scientific Officer Charles Vaske. All three of them worked as graduate students with UC Santa Cruz bioinformatics experts David Haussler and Joshua Stuart, who are doing pioneering work in the field of cancer genomics. Haussler, a professor of biomolecular engineering and Howard Hughes Medical Institute investigator, said that Benz, Sanborn, and Vaske were “brilliant grad students.”

“Working at UCSC they were exposed to the cutting edge in computational genomics,” Haussler said. “They played a key role in developing our cancer genomics program.”

Vaske, who earned his doctorate in 2009, and Benz were lead developers of a software program from Stuart’s lab called Paradigm. Stuart, a professor of biomolecular engineering, has been a close collaborator with Haussler on cancer genomics projects, including The Cancer Genome Atlas funded by the National Institutes of Health and two cancer research “Dream Teams” funded by Stand Up To Cancer and other organizations.

Paradigm, one of the core technologies for Five3 Genomics, is used to understand which molecular pathways are affected by the genetic changes in a patient’s cancer cells. This information can be used in a clinical setting to guide therapeutic decisions and by pharmaceutical companies to identify new targets for drug development.

“On the pharmaceutical side, we can provide indications for new uses for drugs that are already out there, as well as identify targets for new drugs,” Benz said.

Sanborn, who will finish his doctorate this year, worked in Haussler’s lab on a DNA sequence analysis program called BamBam, which is used to identify the genetic changes in cancer cells. Sanborn and Benz also contributed to the development of the UCSC Cancer Genome Browser in Haussler’s lab.

The scientific advisers for Five3 Genomics include Haussler and Stuart, as well as Dr. Patrick Soon-Shiong, a surgeon, medical researcher, and biotechnology entrepreneur, and Dr. Margaret Tempero, deputy director and director of research programs at the UCSF Helen Diller Family Comprehensive Cancer Center.

“It’s particularly gratifying to see this UCSC research transition to a commercial product, so these cutting-edge techniques can begin to benefit the public as quickly as possible,” said Bruce Margon, vice chancellor for research at UCSC.



Biotech billionaire’s supercomputer cuts cancer analysis to 47 seconds

October 4, 2012

Dr. Patrick Soon-Shiong, a surgeon and biotech mogul, has spotlighted a supercomputer-based system and network to rapidly transfer and analyze cancer genetic data in mere seconds as opposed to the weeks or months of previous approaches. The supercomputer crunches genetic data from a tumor and returns results on abnormalities in 47 seconds, and the high-speed fiber-optic network Soon-Shiong has championed transfers samples in just shy of 18 seconds, according to an announcement Wednesday.

Soon-Shiong’s L.A.-based company NantHealth has joined forces with Verizon, Intel, Hewlett-Packard, Blue Shield of California and other players to advance a national system to enable rapid sharing of genomic information among cancer doctors, aiding physicians in making the right call on treatments for patients based on the characteristics of their tumors. It’s a big deal because lack of such information contributes to misdiagnoses.

Via NantHealth and other vehicles, Soon-Shiong has worked on integrating a variety of digital technologies to revolutionize scientific research and medicine. As Reuters reports, he’s poured more than $400 million from his estimated fortune of more than $7 billion into building the fiber-optic network. His nonprofit is working on connecting sequencing centers, medical research hubs and hospitals to the network to create an infrastructure for these groups to share data from big science endeavors such as The Cancer Genome Atlas.

Soon-Shiong built most of his fortune with the sales of Abraxis BioScience to Celgene ($CELG) in 2010 for $2.9 billion and APP Pharmaceuticals to Germany’s Fresenius two years earlier for billions. (Abraxis developed Celgene’s anti-cancer drug Abraxane.) He’s now reportedly the richest man in Los Angeles, where he owns a piece of the NBA’s Los Angeles Lakers and has been connected with efforts to bring an NFL franchise back to the city.



Bringing genomic medicine into clinical practice by placing supercomputers in the hands of physicians at point of care

WASHINGTON -- Dr. Patrick Soon-Shiong, Chairman of NantHealth and the Chan Soon-Shiong Institute for Advanced Health, announced a revolutionary advance in cancer treatment that will reduce the necessary time for analysis from 8 weeks to an unprecedented 47 seconds per patient. For the first time, oncologists can compare virtually every known treatment option on the basis of genetics, risk, and cost – before treatment begins, not after.

Alongside Senator Bill Frist, MD, of the Bipartisan Policy Center and J. Michael McGinnis, MD of the Institute of Medicine and Doctors Helping Doctors, Dr. Soon-Shiong reported on the successful real-time analysis of the largest collection of tumor genomes in the United States, of 6,017 cancer genomes from 3,022 patients with 19 different cancer types, in the record time of 69 hours. Genomic analysis has taken an average of 8 to 10 weeks to complete. That delay leads not just to less efficient, more costly care, but sometimes to the wrong course of treatment altogether – and, thus, higher mortality. “Incorrect care that leads to loss of life is unacceptable,” said Dr. Soon-Shiong, “and from today onward, it will no longer be necessary.”

Oncologists currently prescribe a course of cancer treatment based on the anatomical location of the cancer. Yet a patient with breast cancer could benefit from the positive results discovered from a patient with lung cancer, if the underlying molecular pathways involving both cancers were the same. The inability to utilize genomic sequencing to guide treatment has been due to the inability to convert a patient’s DNA into actionable information in actionable time.

But by collaborating with Blue Shield of California, the Chan Soon-Shiong Institute for Advanced Health, the National LambdaRail, Doctors Helping Doctors, Verizon, Bank of America, AT&T, Intel, and Hewlett-Packard, NantHealth has built a supercomputer-based high-speed fiber network that will not only provide thousands of oncology practices with life-saving information, but do so in exponentially faster time. “Doctors will finally be able to provide higher-quality treatment in a dramatically more efficient, effective, and affordable manner,” says Dr. Soon-Shiong.

“It currently takes approximately two months and tens of thousands of dollars to perform the sequencing and analysis of a single cancer patient’s genome. We can’t reduce the cost of care and improve outcomes in cancer if we don’t have the capability to know the right treatment for the right patient before treatment begins. We needed a national supercomputing infrastructure that brings genomic medicine into clinical practice. By placing supercomputers in the hands of physicians, that need is now a reality,” said Dr. Soon-Shiong.

Accuracy will also be radically improved. Among NantHealth’s partner oncologists utilizing its fact-based software platform (eviti – http://www.eviti.com), the number of cases where doctors have made incorrect recommendations has dropped from 32% to virtually zero. “With this patient-centered, fact-based approach to collecting and analyzing data, millions more patients will have a better chance of beating cancer,” Dr. Soon-Shiong emphasized. Over the past 12 months over 2,000 oncology practices representing 8,000 oncologists and nurses have successfully installed and utilized this fact-based (eviti) software platform, positively impacting thousands of cancer patients’ lives.


In July 2012, NantWorks’ scientific team (Five3 Genomics – http://www.Five3Genomics.com) collected 6,017 tumor and germline exomes, representing 3,022 cancer patients with 19 unique cancer types. The sample collection included: 999 breast cancer; 1,156 kidney and bladder cancer; 985 gastrointestinal cancer; 744 brain cancer; 745 lung cancer; 670 ovarian, uterine and cervical cancer; 436 head and neck cancer; 177 prostate cancer; 70 melanoma cancer; and 35 blood tumor samples.

This massive amount of data totaled 96,512 gigabytes and was successfully transferred and processed via our supercomputing, high-speed fiber network in 69 hours. This overall transfer speed represents a stream of one sample every 17.4 seconds, and the supercomputer analysis for genetic and protein alterations between the tumor and normal sample completed every 47 seconds per patient.

Given the nation’s estimated cancer rate of 1.8 million new cases in 2012, this infrastructure now brings the capability of analyzing 5,000 patients per day.
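
The figures quoted in the release can be cross-checked with simple arithmetic. The Python below uses only numbers stated above; reading the 69 hours as the sum of transfer time and analysis time is an assumption made here, since the release does not say how the total was derived.

```python
# Figures as stated in the release.
samples = 6_017                 # tumor and germline exomes
patients = 3_022
transfer_s_per_sample = 17.4    # seconds to transfer one sample
analysis_s_per_patient = 47     # seconds to analyze one patient
total_gb = 96_512
new_cases_per_year = 1_800_000

# Assumption: transfer and analysis ran one after the other.
transfer_hours = samples * transfer_s_per_sample / 3600
analysis_hours = patients * analysis_s_per_patient / 3600
combined_hours = transfer_hours + analysis_hours

gb_per_sample = total_gb / samples
new_cases_per_day = new_cases_per_year / 365

print(f"transfer: {transfer_hours:.1f} h, analysis: {analysis_hours:.1f} h")
print(f"combined: {combined_hours:.1f} h")
print(f"average sample size: {gb_per_sample:.1f} GB")
print(f"new cases per day: {new_cases_per_day:.0f}")
```

Under that assumption the combined time comes to roughly 68.5 hours, close to the quoted 69 hours, and 1.8 million new cases a year works out to a little under 5,000 per day, which matches the stated daily capacity.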

He noted that medicine has continued to make dramatic advances, but the delivery of medicine has lagged far behind, stuck in a world where information is trapped, patterns get missed, and patients suffer. Powered by advanced supercomputing technology and wireless mobile health, the network has become one of the country’s fastest genomic platforms, with connectivity to over 8,000 practicing oncologists and nurses. “This revolution in healthcare is long overdue – converging 21st century medical science with 21st century technology,” Dr. Soon-Shiong concluded.

Through NantHealth’s genomic analysis network, doctors can finally make cancer treatment more efficient, more effective, and more affordable for more patients. And with public and private partners equally as committed to reshaping the way doctors deliver healthcare and treat cancer, there are no limits to what this health information breakthrough might lead to for all cancer patients.

A network of major cancer centers, including those at City of Hope, John Wayne Cancer Institute, and Methodist Hospital in Houston, has contributed to this collection of over 6,000 genomes, which also included the entire collection of exome samples from The Cancer Genome Atlas.

About NantWorks

The core mission of NantWorks, LLC, is to converge a wide range of technologies to accelerate scientific discoveries, enhance research and improve healthcare treatment and outcomes. Founded and led by Dr. Patrick Soon-Shiong, NantWorks is building an integrated fact-based, genomically-informed, personalized approach to the delivery of care and the development of next generation diagnostics and therapeutics. For more information, see http://www.nantworks.com.


NantWorks, LLC
Jen Hodson



Research cache in works

by Emily Gersema – Jan. 28, 2012 01:29 PM

The Republic | azcentral.com

Supercomputing supports genetic, cancer research in Arizona: compare patient cases to tailor care

A massive building near Phoenix Sky Harbor International Airport is now home to a supercomputer that one day is expected to store clinical-research reports, medical records and the decoded genetic makeup of millions of patients and their cancers.

Having this vault of medical information is a dream for doctors, specialists and researchers who are trying to tailor medical care to the individual needs of their cancer patients. Despite huge advances in research and medicine, doctors have no one-stop shop for up-to-date clinical-trial results, other medical cases and genetic maps of their patients.

With access to this massive library, cancer doctors potentially could specify with precision the dosages of medicines, chemotherapy and radiation therapy for their patients by comparing those cases to those of other patients with similar genetic makeups and similar cancers.

In effect, this supercomputer could be a gateway to personalized medical care, as its creator, billionaire scientist Patrick Soon-Shiong, envisions it. His staff at CSS Institute for Advanced Health in California, which owns the project, and supporters of personalized medicine said the vault also could help reduce doctor error in the diagnosis and treatment of patients.

Better treatments and more accurate diagnoses could help lower the cost of medical care and enable patients to get treatment at home instead of at the hospital, they said.

The presence of the supercomputer could put Phoenix on the cutting edge of medical research and treatment. The path to these potential medical breakthroughs, however, is fraught with privacy concerns. Patient advocates fear the project could open a pathway to exploitation if patient information isn’t confidential. They want assurances that the institute would require patient consent to obtain records, the records would be kept private and the project would be under close regulatory oversight.

The engine: A supercomputer

While the word “supercomputer” evokes an image of a giant computer, the machine located in the Phoenix storage site resembles a large herd of smaller computers that have been linked to one another.

“It used to be a one big monolithic thing,” said Anoj Willy, of the CSS Institute. “But now what we’re able to do is take lots of general-purpose computers and band them to create a big, superprocessing engine.”

The CSS Institute project, which involves equipment and products from Hewlett-Packard and Intel Corp., is in its earliest stages, Willy said. The institute plans to focus data collection on genetic research and cancer.

The endeavor would create at least 50 jobs with annual salaries of about $75,000. Soon-Shiong also would invest at least $200 million in development, construction, machinery and equipment to build the electronic-data-storage facility.

The institute is in the process of signing agreements with various institutions that have been sequencing genomes — the maps of DNA strands that make up living things.

Bob Peirce, senior vice president of Soon-Shiong’s Nant Holdings in Los Angeles, said that while scientists have made strides in human genomic sequencing, the maps of these sequences are scattered at different sites around the world, depending on which institution decoded them.

Researchers have not yet decoded the whole human genome, Peirce said. They have each decoded snippets.

The lack of a complete map and a one-stop shop for the genomic information for doctors and researchers impedes their progress in personalized medical treatment, he said.

This means genomic sequences currently aren’t “relevant to the average patient or the average doctor,” Peirce said.

Creating a complete map of the human genome would require a massive, computerized data center, like the one being built by Soon-Shiong in Phoenix, to decode what scientists estimate are 3 billion base pairs of DNA.
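For a rough sense of scale, the arithmetic behind that figure can be sketched as below. This is only a back-of-envelope estimate and not a description of the Phoenix facility: it assumes each base is stored in 2 bits with no quality scores, read redundancy or metadata, and that each patient contributes one tumor and one normal genome. The function names and constants are made up for illustration.

```python
# Back-of-envelope storage estimate; every constant here is an assumption.
BASE_PAIRS_PER_GENOME = 3_000_000_000  # approximate figure quoted in the article
BITS_PER_BASE = 2                      # A, C, G, T fit in 2 bits (no quality scores)


def genome_bytes(base_pairs: int = BASE_PAIRS_PER_GENOME,
                 bits_per_base: int = BITS_PER_BASE) -> float:
    """Raw size in bytes of one genome stored at a fixed number of bits per base."""
    return base_pairs * bits_per_base / 8


def cohort_bytes(patients: int, genomes_per_patient: int = 2) -> float:
    """Raw bytes for a cohort; 2 genomes per patient assumes tumor plus normal."""
    return patients * genomes_per_patient * genome_bytes()


if __name__ == "__main__":
    print(f"One genome: {genome_bytes() / 1e6:.0f} MB")
    print(f"One million patients: {cohort_bytes(1_000_000) / 1e15:.1f} PB")
```

Under these assumptions a single genome is about 750 MB and a million patients come to roughly 1.5 PB. Real sequencing output is considerably larger, because raw reads cover each position many times and carry quality scores.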

In addition, Soon-Shiong wants the supercomputer and its data centers, including one planned for Scottsdale, to aid in mapping the genetic makeup of individual patients’ cancerous cells.

“We need to be in a position where we can analyze the genome of the cancer and determine the genome of the host patient (to treat them),” Peirce said.

Peirce offered assurances that the data would be highly secured to guard against hackers. The data could be accessed by people who are deemed “authorized users,” he said, which could include the patients themselves who are trying to monitor their conditions and care. The institute has been working with a “chief technical officer,” who worked at the Pentagon, on securing the data centers and information they contain, Peirce said. He declined to name the officer.

The concern: Privacy

Edward Abrahams, president of the Personalized Medicine Coalition, a non-profit group in Washington, D.C., said researchers are on the cusp of creating medical care tailored to each person’s needs, and they can reach that with a supercomputer.

But they are faced with several challenges. Chief among them is patient privacy, he said.

The federal Health Insurance Portability and Accountability Act guards patient privacy, but its reach is limited. Patient information is kept private within the realm of health care — at the doctor’s office, the hospital and with the patient’s insurance company, said Bob Gellman, a privacy expert in Washington, D.C.

“An institution like this (CSS Institute) is not covered by health-privacy laws,” Gellman said. “It’s not a health-care provider. It’s not an insurer.”

Gellman said a worst-case scenario would involve a patient sharing genetic information with a company or organization, only to have it misused or exploited by another party.

“The information when it sat in the health-care system — when it sat in your doctor’s office — had all kinds of protections,” Gellman said. “But if you give the information with your consent to somebody else, then someone could just go to that third party and say, ‘Give me all your information.’ “

In that scenario, the records and data are out of the patient’s control and are unprotected.

Individuals trying to solve the health problems of their autistic children, for example, may want to participate.

“That may be a perfectly rational decision.” Gellman said. “But for people who don’t know or aren’t aware of that (institution’s) motivation … you might agree to give this information, and 20 years later, you’re in litigation with somebody or you’re applying for a job and it comes up.”


Read more: http://www.azcentral.com/arizonarepublic/business/articles/2012/01/26/20120126medical-research-cache-in-works.html?nclick_check=1#ixzz2AjfTgdsf


Cancer Research targets human genome breakthrough with supercomputer

Platform Computing LSF integrated with genetic sequencing technology

By Antony Savvas | Published: 16:04 GMT, 09 December 11 | Computerworld UK

A new supercomputing workload management system is aiding scientific work by Cancer Research and the Cambridge Research Institute’s human genome project.

Cancer Research UK is using Platform Computing’s LSF software to improve cluster efficiency and reduce IT costs on the CRI genome research.

By integrating Platform LSF with a new advanced genetic sequencing platform, the institute has already gained greater insight into genetic cancer mutations that will lead to scientific breakthroughs in the areas of cancer diagnosis, treatment and prevention, said Cancer Research.

“Platform LSF gives us the means to produce and manage a wealth of gene sequencing data that we could only have dreamed about previously,” said Peter Maccallum, head of IT and scientific computing at Cancer Research UK in Cambridge. “This has already lead to tangible published work looking into breast cancer, and is proving its worth in helping our researchers further the understanding of how cancers progress.”

Prior to implementing Platform LSF, CRI’s 21 research groups employed separate computing resources in separate locations, which drove up server costs, reduced utilisation rates and increased server maintenance.

By orchestrating workloads and managing CRI’s research applications in a single data centre, Platform LSF has enabled CRI to save approximately £50,000 by removing hardware and maintenance duplication across each location, while increasing the amount of data processed. Cancer Research says the institute can now direct more computing resources directly to its research teams “to use in a more timely and cost efficient manner”.

CRI has already saved the equivalent in man hours of one full-time employee by integrating Platform LSF, says Cancer Research. As a result, the institute plans to scale Platform LSF internally by adding more servers as compute requirements increase.

CRI is also collaborating with Platform Computing to architecturally support cross-organisation systems for HPC (high performance computing) clusters, that will enable CRI to collaborate with other research organisations in order to meet the growing demand for genomics research.

In other recent medical technology news, scientists at Cambridge University are developing a computer system that can read vast amounts of scientific literature, make rapid connections between facts and develop hypotheses. Cambridge University said most biomedical scientists cannot keep on top of reading all of the publications in their field, let alone an adjacent field. As a first step to solving the problem, Cambridge has developed its CRAB text-mining tool.




Reporter: Aviva Lev-Ari, PhD, RN

Word Cloud By Danielle Smolyar

Methylation Subtypes and Large-Scale Epigenetic Alterations in Gastric Cancer

  1. Hermioni Zouridis (1,*,†)
  2. Niantao Deng (1,2,*)
  3. Tatiana Ivanova (1)
  4. Yansong Zhu (1)
  5. Bernice Wong (3)
  6. Dan Huang (4)
  7. Yong Hui Wu (1,5)
  8. Yingting Wu (6,7)
  9. Iain Beehuat Tan (2,8)
  10. Natalia Liem (9)
  11. Veena Gopalakrishnan (1)
  12. Qin Luo (1)
  13. Jeanie Wu (5)
  14. Minghui Lee (5)
  15. Wei Peng Yong (9,10)
  16. Liang Kee Goh (1)
  17. Bin Tean Teh (1,3,4)
  18. Steve Rozen (6,11) and
  19. Patrick Tan (1,5,9,12,‡)

Author Affiliations

  1. Cancer and Stem Cell Biology Program, Duke-NUS Graduate Medical School, 8 College Road, Singapore 169857, Singapore.
  2. NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, 5 Lower Kent Ridge Road, Singapore 119074, Singapore.
  3. National Cancer Centre Singapore–Van Andel Research Institute Translational Research Laboratory, Department of Medical Sciences, National Cancer Centre, 11 Hospital Drive, Singapore 169610, Singapore.
  4. Laboratory of Cancer Genetics, Van Andel Research Institute, Grand Rapids, MI 49503, USA.
  5. Cellular and Molecular Research, National Cancer Centre, Singapore 169610, Singapore.
  6. Neuroscience and Behavioural Disorders, Duke-NUS Graduate Medical School, Singapore 169857, Singapore.
  7. Singapore-MIT Alliance, National University of Singapore, Singapore 119074, Singapore.
  8. Division of Medical Oncology, National Cancer Centre, Singapore 169610, Singapore.
  9. Cancer Science Institute of Singapore, National University of Singapore, Singapore 119074, Singapore.
  10. National Cancer Institute Singapore, National University Hospital, Singapore 119228, Singapore.
  11. Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, NC 27710, USA.
  12. Genome Institute of Singapore, 60 Biopolis Street, Genome 02-01, Singapore 138672, Singapore.

Author Notes

  • * These authors contributed equally to this work.
  • † Present address: LabConnect, LLC, 2910 First Avenue South, Suite 200, Seattle, WA 98134, USA.
  • ‡ To whom correspondence should be addressed. E-mail: gmstanp@duke-nus.edu.sg


Epigenetic alterations are fundamental hallmarks of cancer genomes. We surveyed the landscape of DNA methylation alterations in gastric cancer by analyzing genome-wide CG dinucleotide (CpG) methylation profiles of 240 gastric cancers (203 tumors and 37 cell lines) and 94 matched normal gastric tissues. Cancer-specific epigenetic alterations were observed in 44% of CpGs, comprising both tumor hyper- and hypomethylation. Twenty-five percent of the methylation alterations were significantly associated with changes in tumor gene expression. Whereas most methylation-expression correlations were negative, several positively correlated methylation-expression interactions were also observed, associated with CpG sites exhibiting atypical transcription start site distances and gene body localization. Methylation clustering of the tumors revealed a CpG island methylator phenotype (CIMP) subgroup associated with widespread hypermethylation, young patient age, and adverse patient outcome in a disease stage–independent manner. CIMP cell lines displayed sensitivity to 5-aza-2′-deoxycytidine, a clinically approved demethylating drug. We also identified long-range regions of epigenetic silencing (LRESs) in CIMP tumors. Combined analysis of the methylation, gene expression, and drug treatment data suggests that certain LRESs may silence specific genes within the region, rather than all genes. Finally, we discovered regions of long-range tumor hypomethylation, associated with increased chromosomal instability. Our results provide insights into the epigenetic impact of environmental and biological agents on gastric epithelial cells, which may contribute to cancer.

Sci Transl Med 17 October 2012: Vol. 4, Issue 156, p. 156ra140. DOI: 10.1126/scitranslmed.3004504

Methylation-based Stomach Cancer Subtypes

October 17, 2012

NEW YORK (GenomeWeb News) – A new study in Science Translational Medicine is highlighting the epigenetic subtypes that exist within stomach cancer.

“Our results strongly demonstrate that gastric cancer is not one disease but a conglomerate of multiple diseases, each with a different underlying biology and hallmark features,” senior author Patrick Tan, a cancer researcher with the Duke-National University of Singapore Graduate Medical School, said in a statement.

“If gastric cancer is the result of multiple interacting factors, including both environmental factors and host genetic factors, we need better ways to diagnose and treat it,” added Tan, who is also affiliated with Singapore’s National Cancer Centre and the Genome Institute of Singapore.

Tan and colleagues based in Singapore and the US did array-based DNA methylation analyses on more than 200 gastric tumors and dozens of gastric cancer lines. Their subsequent analyses of these methylation profiles indicated that stomach cancers have many stretches of sequence with higher or lower levels of methylation compared with nearly 100 matched normal stomach samples.

Within the tumor and cell lines, the analysis revealed subsets of gastric cancer with distinct methylation profiles that appear to be prognostically important.

In particular, a group of tumors known as CIMP (CpG island methylator phenotype) tumors, which show excess methylation at some cytosine and guanine-rich regions of the genome, tended to turn up in younger gastric cancer patients and those with poor outcomes.

On the other hand, results of the study also hint that the pronounced methylation shifts in these CIMP gastric cancers could also render them more vulnerable to demethylating compounds.

“Gastric cancer is a heterogenous disease with individual patients often displaying markedly different responses to the same treatment,” Tan said. “Improving gastric cancer clinical outcomes will require molecular approaches capable of subdividing patients into biologically similar subgroups, and designing subtype-specific therapies for each group.”

Previous genomic studies have started to unravel the range of somatic mutations and other genetic alterations that can contribute to gastric adenocarcinoma, the researchers noted. Less is known about the epigenetic features of the often deadly disease, which is especially common in some Asian populations, though some studies have identified specific genes with unusual epigenetic profiles in gastric cancer.

In an effort to more fully understand the epigenetic features of stomach cancer, Tan and his colleagues used Illumina Infinium arrays to profile cytosine methylation patterns in tumor samples from 203 individuals with gastric cancer, along with matched normal stomach tissue samples for 94 of the patients.

Using a similar strategy, the group also measured genome-wide methylation patterns in 37 stomach cancer cell lines.

When they compared methylation profiles across the samples, the researchers saw that some 44 percent of the CpG sites tested had higher- or lower-than-usual cytosine methylation levels that were specific to the stomach cancer. Around a quarter of these seemed to coincide with either jumps or — more frequently — dips in gene expression in the tumors, they reported.
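The kind of tumor-versus-normal comparison described above can be sketched roughly as follows. This is a simplified illustration and not the study's method: it assumes each CpG site has a beta value (fraction methylated, 0 to 1) per sample, and it labels a site by a fixed difference in means rather than by a statistical test. The 0.2 cutoff, the function names and the probe names are all hypothetical, and the numbers are made up.

```python
from statistics import mean

# Hypothetical cutoff: minimum absolute difference in mean beta value
# for a site to count as altered. Not taken from the study.
DELTA_BETA_CUTOFF = 0.2


def classify_site(tumor_betas, normal_betas, cutoff=DELTA_BETA_CUTOFF):
    """Label one CpG site by comparing mean tumor and mean normal beta values."""
    delta = mean(tumor_betas) - mean(normal_betas)
    if delta >= cutoff:
        return "hypermethylated"
    if delta <= -cutoff:
        return "hypomethylated"
    return "unchanged"


def altered_fraction(sites, cutoff=DELTA_BETA_CUTOFF):
    """Fraction of sites labelled hyper- or hypomethylated.

    `sites` maps a probe name to a (tumor_betas, normal_betas) pair.
    """
    labels = [classify_site(t, n, cutoff) for t, n in sites.values()]
    return sum(label != "unchanged" for label in labels) / len(labels)


if __name__ == "__main__":
    # Made-up beta values for three made-up probes; not data from the study.
    toy_sites = {
        "probe_a": ([0.80, 0.90, 0.85], [0.20, 0.30, 0.25]),
        "probe_b": ([0.10, 0.15, 0.20], [0.60, 0.70, 0.65]),
        "probe_c": ([0.50, 0.55, 0.45], [0.50, 0.50, 0.50]),
    }
    for name, (tumor, normal) in toy_sites.items():
        print(name, classify_site(tumor, normal))
    print(f"altered fraction: {altered_fraction(toy_sites):.2f}")
```

A percentage such as the 44 percent reported above is the analogue of `altered_fraction` computed over all probes on the array, with a proper significance test in place of the fixed cutoff.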

A subset of the tumors had especially high levels of CpG island methylation, the team found. Follow-up analyses indicated that these tumors — which comprise an apparent CIMP sub-group of the stomach cancer — were more commonly found in young patients and/or those with poor survival outcomes.

Over-represented amongst the genes in highly methylated regions of CIMP tumors were genes implicated in stem cell-related processes, researchers noted, as were sites recognized by the histone regulating Polycomb repressive complex.

“Taken collectively,” they wrote, “these results suggest that CIMP tumors may represent a clinically and biologically distinct sub-group of gastric cancers.”

Moreover, in one of its follow-up experiments the team found that it was possible to curb the proliferation of seven gastric cancer-derived cell lines in the CIMP sub-group using a demethylating drug called 5-aza-2′-deoxycytidine, or 5-Aza-dC — an effect they did not see in 10 non-CIMP cell lines treated with the drug.

Based on findings from their methylation and gene expression profiling in gastric cancer so far, the study authors argued that an improved appreciation of the methylome-based sub-types present in the disease might aid future efforts to improve stomach cancer diagnosis and treatment options.

“[A]dditional work will focus on developing simple diagnostic tests to detect gastric cancer at earlier stages, plus drugs and drug targets that might exhibit high potency against different molecular subtypes of disease,” Tan said in a statement.


Expanding the Genetic Alphabet and Linking the Genome to the Metabolome

The citric acid cycle, also known as the tricarboxylic acid cycle (TCA cycle) or the Krebs cycle. Produced at WikiPathways. (Photo credit: Wikipedia)


Reporter & Curator: Larry Bernstein, MD, FCAP

Unlocking the diversity of genomic expression within tumorigenesis and “tailoring” of therapeutic options

1. Reshaping the DNA landscape between diseases and within diseases by the linking of DNA to treatments

In The New York Times of 9/24/2012, Gina Kolata reports on four types of breast cancer and on the reshaping of breast cancer treatment based on the finding of genetically distinct types, each with common “cluster” features that drive many cancers. The discoveries were published online in the journal Nature on Sunday (9/23). The study is considered the first comprehensive genetic analysis of breast cancer and has been called a roadmap to future breast cancer treatments. I consider that if this is a landmark study in cancer genomics leading to personalized drug management of patients, it is also a fitting of treatment to measurable “combinatorial feature sets” that tie into population biodiversity with respect to known conditions. The researchers caution that it will take years to establish transformative treatments, clearly because within the genetic types there are subsets that have a bearing on treatment “tailoring”. In addition, there is growing evidence that the Watson-Crick model of the gene is itself being modified by an expansion of the alphabet used to construct the DNA library, which will open opportunities to explain some of what has been considered junk DNA and which may carry essential information with respect to metabolic pathways and pathway regulation. The breast cancer study is tied to the “Cancer Genome Atlas” Project, already reported. It is expected that this work will tie into building maps of genetic changes in common cancers, such as breast, colon, and lung. What is not explicit, I presume, is a closely related concept: that the translational challenge is tied to the suppression of key proteomic processes involved in manipulating the metabolome.

Saha S. Impact of evolutionary selection on functional regions: The imprint of evolutionary selection on ENCODE regulatory elements is manifested between species and within human populations. 9/12/2012. PharmaceuticalIntelligence.Wordpress.com

Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, Shen EH, Ng L, et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature  Sept 14-20, 2012

Sarkar A. Prediction of Nucleosome Positioning and Occupancy Using a Statistical Mechanics Model. 9/12/2012. PharmaceuticalIntelligence.WordPress.com

Heijden et al.   Connecting nucleosome positions with free energy landscapes. (Proc Natl Acad Sci U S A. 2012, Aug 20 [Epub ahead of print]).  http://www.ncbi.nlm.nih.gov/pubmed/22908247

2. Fiddling with an expanded genetic alphabet – greater flexibility in design of treatment (pharmaneogenesis?)

Diagram of DNA polymerase extending a DNA strand and proof-reading. (Photo credit: Wikipedia)

A clear indication of this emerging remodeling of the genetic alphabet is a new study, led by scientists at The Scripps Research Institute, that appeared in the June 3, 2012 issue of Nature Chemical Biology. It indicates that the genetic code as we know it may be expanded to include synthetic and unnatural sequence pairing (Study Suggests Expanding the Genetic Alphabet May Be Easier than Previously Thought, Genome). The authors infer that the genetic instructions for living organisms, which are composed of four bases (C, G, A and T), are open to unnatural letters. An expanded “DNA alphabet” could carry more information than natural DNA, potentially coding for a much wider range of molecules and enabling a variety of powerful applications. Applying this would further expand the translation of portions of DNA into new transcriptional proteins that are heretofore unknown but have metabolic relevance and therapeutic potential. The existence of such pairing in nature has been studied in eukaryotes for at least a decade, and may have a role in biodiversity. The investigators show how a previously identified pair of artificial DNA bases can go through the DNA replication process almost as efficiently as the four natural bases. This could as well be translated into human diversity, and human diseases.
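The claim that an expanded alphabet “could carry more information” can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: it treats each position as an independent choice among equally likely letters, and it assumes exactly one unnatural pair is added, giving six letters. The function names are hypothetical. Counting three-letter words says nothing about whether any cell could actually read them.

```python
import math


def bits_per_base(alphabet_size: int) -> float:
    """Maximum information per position, in bits, for an alphabet of this size."""
    return math.log2(alphabet_size)


def triplet_count(alphabet_size: int) -> int:
    """Number of distinct three-letter words (codon-like triplets) over the alphabet."""
    return alphabet_size ** 3


if __name__ == "__main__":
    for size in (4, 6):  # 4 natural bases; 6 assumes one added unnatural pair
        print(size, round(bits_per_base(size), 3), triplet_count(size))
```

With four letters each position holds 2 bits and there are 64 possible triplets; with six letters each position holds about 2.585 bits and there are 216.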

The Romesberg laboratory, which collaborated on the new study, has been trying to find a way to extend the DNA alphabet since the late 1990s. In 2008, the lab developed the efficiently replicating bases NaM and 5SICS, which come together as a complementary base pair within the DNA helix, much as, in normal DNA, the base adenine (A) pairs with thymine (T), and cytosine (C) pairs with guanine (G). It had been clear that their chemical structures lack the ability to form the hydrogen bonds that join natural base pairs in DNA. Such bonds had been thought to be an absolute requirement for successful DNA replication, but that is not the case because other bonds can be in play.
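The pairing rule described above, with one extra pair sitting alongside A-T and C-G, can be written as a small lookup table. This is a minimal sketch of the bookkeeping only, not a model of the chemistry: the one-letter symbols "X" and "Y" are hypothetical stand-ins for NaM and 5SICS, and the function name is made up for illustration.

```python
# Pairing table: four natural bases plus one unnatural pair.
# "X" and "Y" are hypothetical one-letter stand-ins for NaM and 5SICS.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C", "X": "Y", "Y": "X"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    try:
        return "".join(PAIRS[base] for base in reversed(strand.upper()))
    except KeyError as err:
        # Any symbol outside the six-letter table is rejected.
        raise ValueError(f"unknown base: {err.args[0]}") from None


if __name__ == "__main__":
    print(reverse_complement("ACXGT"))
```

Because every letter has exactly one partner, applying the function twice returns the original strand, which is the property replication relies on.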

The data strongly suggested that NaM and 5SICS do not even approximate the edge-to-edge geometry of natural base pairs—termed the Watson-Crick geometry, after the co-discoverers of the DNA double-helix. Instead, they join in a looser, overlapping, “intercalated” fashion that resembles a ‘mispair.’ In test after test, the NaM-5SICS pair was efficiently replicable even though it appeared that the DNA polymerase didn’t recognize it. Their structural data showed that the NaM-5SICS pair maintain an abnormal, intercalated structure within double-helix DNA—but remarkably adopt the normal, edge-to-edge, “Watson-Crick” positioning when gripped by the polymerase during the crucial moments of DNA replication. NaM and 5SICS, lacking hydrogen bonds, are held together in the DNA double-helix by “hydrophobic” forces, which cause certain molecular structures (like those found in oil) to be repelled by water molecules, and thus to cling together in a watery medium.

The finding suggests that NaM-5SICS and potentially other hydrophobically bound base pairs could be used to extend the DNA alphabet, and that evolution's choice of the existing four-letter DNA alphabet, on this planet, may have left room for life based on other genetic systems.

3.  Studies that consider a DNA triplet model that includes one or more NATURAL nucleosides and looks closely allied to the formation of the disulfide bond and oxidation reduction reaction.

This independent work is being conducted based on a similar concept. John Berger, founder of Triplex DNA, has commented on this. He emphasizes sulfur as the most important element for understanding the evolution of metabolic pathways in the human transcriptome. It is a combination of sulfur 34 and sulphur 32 ATMU. S34 is element 16 + fluorine, while S32 is element 16 + phosphorus. The cysteine-cystine bond is the bridge and controller between inorganic chemistry (fluorine) and organic chemistry (phosphorus). He uses a dual spelling, sulfphur, to combine the two when referring to the master catalyst of oxidation-reduction reactions. There are various isotopic alleles (please note the duality principle, which is nature's most important pattern). Sulfphur is methionine, S-adenosylmethionine, cysteine, cystine, taurine, glutathione, acetyl coenzyme A, biotin, lipoic acid, H2S, H2SO4, HSO3-, cytochromes, thioredoxin, ferredoxins, purple sulfphur anaerobic bacteria (prokaryotes), hydrocarbons, green sulfphur bacteria, garlic, penicillin and many antibiotics; hundreds of CSN drugs for parasites and fungi antagonists. These are but a few names which come to mind. It is at the heart of the Krebs cycle of oxidative phosphorylation, i.e. ATP. It is also a second pathway to purine metabolism and nucleic acids. It literally is the key enzymes between RNA and DNA, i.e., SH thiol bond oxidized to SS (DNA) cysteine through thioredoxins, ferredoxins, and nitrogenase. The immune system is founded upon sulfphur compounds and processes. Photosynthesis Fe4S4 to Fe2S3 absorbs the entire electromagnetic spectrum which is filtered by the Allen belt some 75 miles above earth. Look up Chromatium vinosum or Allochromatium species. There is reasonable evidence it is the first symbiotic species of sulfphur anaerobic bacteria (Fe4S4) with high potential mvolts which drives photosynthesis while making glucose with H2S.
He envisions a sulfphur control map to automate human metabolism with exact timing sequences, at specific three-dimensional coordinates on Bravais crystalline lattices. He proposes adding the inosine-xanthosine family to the current 5-nucleotide genetic code. Finally, he adds, the expanded genetic code is populated with “synthetic nucleosides and nucleotides” with all kinds of customized functional side groups, which often reshape nature’s allosteric and physicochemical properties. The inosine family is nature’s natural evolutionary partner with the adenosine and guanosine families in purine synthesis de novo, salvage, and catabolic degradation. Inosine has three major enzymes (IMPDH1, 2 & 3 for purine ring closure, HGPRT for purine salvage, and xanthine oxidase and xanthine dehydrogenase).

DNA replication or DNA synthesis is the process of copying a double-stranded DNA molecule. This process is paramount to all life as we know it. (Photo credit: Wikipedia)

3. Nutritional regulation of gene expression,  an essential role of sulfur, and metabolic control 

Finally, the research carried out for decades by Yves Ingenbleek and the late Vernon Young warrants mention. According to their work, sulfur is again tagged as essential for health. Sulfur (S) is the seventh most abundant element measurable in human tissues and its provision is mainly ensured by the intake of methionine (Met) found in plant and animal proteins. Met is endowed with unique functional properties: it controls the ribosomal initiation of protein synthesis, governs a myriad of major metabolic and catalytic activities and may be subjected to reversible redox processes that contribute to safeguarding protein integrity.

Diets with inadequate amounts of methionine (Met) lead to overt or subclinical protein malnutrition, with serious morbid consequences. The result is a reduction in the size of lean body mass (LBM), best identified by the serial measurement of plasma transthyretin (TTR), which is seen with unachieved replenishment (chronic malnutrition, strict veganism) or excessive losses (trauma, burns, inflammatory diseases). This status is accompanied by a rise in homocysteine and a concomitant fall in methionine. The ratio of S to N is quite invariant, but dependent on source. The S:N ratio is typically 1:20 for plant sources and 1:14.5 for animal protein sources. The key enzyme involved in the control of Met in man is cystathionine-β-synthase, which declines with inadequate dietary provision of S, and the loss is not compensated by cobalamin for CH3- transfer.
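To see what the two S:N ratios imply, the comparison can be worked through as below. This is a small arithmetic sketch, not part of the cited work: the text does not say whether the ratios are by mass or by mole, so the code only compares the two sources at equal nitrogen intake, which gives the same answer either way. The names are hypothetical.

```python
# S:N ratios quoted in the text, written as sulfur per unit nitrogen.
# Whether the ratio is by mass or by mole is not stated; the comparison
# below is a ratio of ratios, so it does not depend on that choice.
S_PER_N = {"plant": 1 / 20, "animal": 1 / 14.5}


def sulfur_for_nitrogen(nitrogen: float, source: str) -> float:
    """Sulfur supplied alongside a given amount of nitrogen, in the ratio's units."""
    return nitrogen * S_PER_N[source]


def plant_to_animal_sulfur() -> float:
    """Sulfur from plant protein relative to animal protein at equal nitrogen intake."""
    return S_PER_N["plant"] / S_PER_N["animal"]


if __name__ == "__main__":
    print(f"{plant_to_animal_sulfur():.3f}")
```

At equal nitrogen intake, plant protein at 1:20 supplies 14.5/20, or about 72.5 percent, of the sulfur supplied by animal protein at 1:14.5.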

As a result of the disordered metabolic state from inadequate sulfur intake (the S:N ratio is lower in plants than in animals), the transsulfuration pathway is depressed at cystathionine-β-synthase (CβS) level triggering the upstream sequestration of homocysteine (Hcy) in biological fluids and promoting its conversion to Met. They both stimulate comparable remethylation reactions from homocysteine (Hcy), indicating that Met homeostasis benefits from high metabolic priority. Maintenance of beneficial Met homeostasis is counterpoised by the drop of cysteine (Cys) and glutathione (GSH) values downstream to CβS causing reducing molecules implicated in the regulation of the 3 desulfuration pathways

4. The effect on accretion of LBM of protein malnutrition and/or the inflammatory state: in closer focus

Hepatic synthesis is influenced by nutritional and inflammatory circumstances working concomitantly, and liver production of TTR integrates the dietary and stressful components of any disease spectrum. Thus we have a depletion of visceral transport proteins made by the liver and fat-free weight loss secondary to protein catabolism. This is most accurately reflected by TTR, which is a rapid-turnover protein; it is also involved in transport, is essential for thyroid function (thyroxine-binding prealbumin), and is tied to retinol-binding protein. Furthermore, protein accretion is dependent on a sulfonation reaction with 2 ATP. Consequently, kwashiorkor is associated with thyroid goiter, as the pituitary-thyroid axis is a major sulfonation target. With this in mind, it is not surprising that TTR is the sole plasma protein whose evolutionary patterns closely follow the shape outlined by LBM fluctuations. Serial measurement of TTR therefore provides unequaled information on the alterations affecting overall protein nutritional status. Recent advances in TTR physiopathology emphasize the detecting power and preventive role played by the protein in hyper-homocysteinemic states.

Individuals submitted to N-restricted regimens are basically able to maintain N homeostasis until very late in the starvation process. The N balance study, however, provides only an overall estimate of N gains and losses and fails to identify the tissue sites and specific interorgan fluxes involved. Using vastly improved methods, the LBM has been measured in its components. The LBM of the reference man contains 98% of total body potassium (TBK) and the bulk of total body sulfur (TBS). TBK and TBS reach equal intracellular amounts (140 g each) and share distribution patterns (half in skeletal muscle (SM) and half in the rest of cell mass). The body content of K and S largely exceeds that of magnesium (19 g), iron (4.2 g) and zinc (2.3 g).

Total body nitrogen (TBN) and TBK are highly correlated in healthy subjects, and both parameters manifest an age-dependent curvilinear decline with an accelerated decrease after 65 years. SM undergoes a 15% reduction in size per decade, an involutive process. The trend toward sarcopenia is more marked and rapid in elderly men than in elderly women, decreasing strength and functional capacity. The downward SM slope may be somewhat prevented by physical training or accelerated by supranormal cytokine status, as reported in apparently healthy aged persons suffering low-grade inflammation or in critically ill patients whose muscle mass undergoes proteolysis.

5.  The results of the events described are:

  • Declining generation of hydrogen sulfide (H2S) from enzymatic sources and in the non-enzymatic reduction of elemental S to H2S.
  • The biogenesis of H2S via non-enzymatic reduction is further inhibited in areas where earth’s crust is depleted in elemental sulfur (S8) and sulfate oxyanions.
  • Elemental S operates as co-factor of several (apo)enzymes critically involved in the control of oxidative processes.

The combination of protein and sulfur dietary deficiencies constitutes a novel clinical entity threatening plant-eating population groups. These groups have a defective production of the reductants Cys, GSH and H2S, which explains the persistence of an oxidative burden.

6. The clinical entity increases the risk of developing:

  • cardiovascular diseases (CVD) and
  • stroke

in plant-eating populations regardless of Framingham criteria and vitamin-B status.

Met molecules supplied by dietary proteins undergo transmethylation, releasing Hcy, which:

  • either enters the Hcy → Met remethylation (RM) pathways or
  • is committed to transsulfuration decay.

Impairment of CβS activity, as described in protein malnutrition, entails supranormal accumulation of Hcy in body fluids, stimulation of RM activity and maintenance of Met homeostasis. The data show that combined protein and S deficiencies work in concert to deplete Cys, GSH and H2S from their body reserves, hence preventing these reducing molecules from properly countering the oxidative stress imposed by hyperhomocysteinemia.

Although unrecognized up to now, the nutritional disorder is one of the commonest worldwide, reaching top prevalence in populated regions of Southeastern Asia. Increased risk of hyperhomocysteinemia and oxidative stress may also affect individuals suffering from intestinal malabsorption or westernized communities having adopted vegan dietary lifestyles.

Ingenbleek Y. Hyperhomocysteinemia is a biomarker of sulfur-deficiency in human morbidities. Open Clin. Chem. J. 2009 ; 2 : 49-60.

7. The dysfunctional metabolism in transitional cell transformation

A third development is also important and possibly related. The transition a cell goes through in becoming cancerous tends to be driven by changes to the cell’s DNA. But that is not the whole story. Large-scale techniques for studying the metabolic processes going on in cancer cells are being applied at Oxford, UK, in collaboration with Japanese workers. This thread will extend our insight into the metabolome. Otto Warburg, the pioneer in respiration studies, pointed out in the early 1900s that most cancer cells get the energy they need predominantly through a high utilization of glucose with lower respiration (the metabolic process that breaks down glucose to release energy). This helps the cancer cells deal with the low oxygen levels that tend to be present in a tumor. The tissue reverts to a metabolic profile of anaerobiosis. Studies of the genetic basis of cancer and of dysfunctional metabolism in cancer cells are complementary. Tomoyoshi Soga’s large lab in Japan has been at the forefront of developing the technology for metabolomics research over the past couple of decades (metabolomics being the ugly-sounding term used to describe research that studies all metabolic processes at once, just as genomics is the study of the entire genome).

Their results have led to the idea that some metabolic compounds, or metabolites, when they accumulate in cells, can cause changes to metabolic processes and set cells off on a path towards cancer. The collaborators have published a perspective article in the journal Frontiers in Molecular and Cellular Oncology that proposes fumarate as such an ‘oncometabolite’. Fumarate is a standard compound involved in cellular metabolism. The researchers summarize work that shows how accumulation of fumarate, when an enzyme goes wrong, affects various biological pathways in the cell. It shifts the balance of metabolic processes and disrupts the cell in ways that could favor development of cancer. This is of particular interest because fumarate is the intermediate in the TCA cycle that is converted to malate.

Animation of the structure of a section of DNA. The bases lie horizontally between the two spiraling strands. (Photo credit: Wikipedia)

The Keio group is able to label glucose or glutamine, basic biological sources of fuel for cells, and track the pathways cells use to burn up the fuel.  As these studies proceed, they could profile the metabolites in a cohort of tumor samples and matched normal tissue. This would produce a dataset of the concentrations of hundreds of different metabolites in each group. Statistical approaches could suggest which metabolic pathways were abnormal. These would then be the subject of experiments targeting the pathways to confirm the relationship between changed metabolism and uncontrolled growth of the cancer cells.
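The tumor versus matched-normal comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used by the Keio or Oxford groups: the metabolite names, the concentrations, and the function names `paired_log2_changes` and `rank_metabolites` are all hypothetical, and the statistic used here (a paired t statistic on log2 ratios) is just one of many approaches that could flag candidate metabolites for follow-up experiments.

```python
import math
import statistics


def paired_log2_changes(tumor, normal):
    """For each metabolite, return (mean log2 tumor/normal ratio, paired t statistic).

    `tumor` and `normal` map a metabolite name to a list of concentrations,
    where position i in both lists belongs to the same patient.
    """
    results = {}
    for metabolite, tumor_values in tumor.items():
        normal_values = normal[metabolite]
        # One log2 ratio per patient (tumor vs. that patient's normal tissue).
        diffs = [math.log2(t / n) for t, n in zip(tumor_values, normal_values)]
        mean_diff = statistics.mean(diffs)
        sd = statistics.stdev(diffs)
        if sd == 0:
            # All patients show the same ratio: the t statistic is undefined,
            # so report 0 for "no change" and infinity otherwise.
            t_stat = 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
        else:
            t_stat = mean_diff / (sd / math.sqrt(len(diffs)))
        results[metabolite] = (mean_diff, t_stat)
    return results


def rank_metabolites(results):
    """Order metabolite names by the absolute size of the paired t statistic."""
    return sorted(results, key=lambda name: abs(results[name][1]), reverse=True)


# Hypothetical concentrations (arbitrary units) for four patients.
tumor = {
    "fumarate": [8.0, 16.0, 8.0, 16.0],
    "malate": [1.0, 2.0, 1.0, 2.0],
}
normal = {
    "fumarate": [2.0, 4.0, 4.0, 4.0],
    "malate": [1.0, 2.0, 2.0, 1.0],
}

results = paired_log2_changes(tumor, normal)
ranking = rank_metabolites(results)
for name in ranking:
    mean_diff, t_stat = results[name]
    print(f"{name}: mean log2 change = {mean_diff:.2f}, t = {t_stat:.2f}")
```

In a real dataset with hundreds of metabolites, the ranking would need a correction for multiple testing, and the top-ranked metabolites would then be mapped to pathways before designing the confirmatory experiments.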
