First challenge to make use of the new NCI Cloud Pilots – Somatic Mutation Challenge – RNA: Best algorithms for detecting all of the abnormal RNA molecules in a cancer cell
Reporter: Aviva Lev-Ari, PhD, RN
Genomic rearrangements in cancer cells produce fusion transcripts, which may give rise to chimeric protein products not present in normal cells. In addition, cancer cells can express alternate forms of encoded messages that give rise to protein variants different from normal tissue. These chimeras and protein variants can serve as robust diagnostic markers or drug targets. Moreover, ongoing research efforts are beginning to unveil the potential clinical relevance of these variant RNA products. Increasing the “alterome” of tumors by fully characterizing their RNA landscapes will expand our understanding of cancer mechanisms, provide new biomarkers and reveal possible new RNA-based therapeutics, thus improving personalized patient treatment.
“Predicting RNA species in a cancer cell is a particularly challenging task,” says Josh Stuart, Professor at the UC Santa Cruz Genomics Institute and one of the challenge leaders. “RNA expression reflects much of the deranged complexity of the underlying cancer cell DNA and then adds another level of derangement on top of that.”
The goal of the SMC-RNA Challenge is to identify the best methods for detecting rearrangements in RNA sequencing (RNA-seq) data. Sub-challenges are focused on detecting and quantifying mRNA fusions and isoforms. Methods will be evaluated with both in silico and spiked-in data.
Two key questions that will be addressed are:
1) What is the best way to estimate the abundances of a set of known RNA isoforms? and
2) What is the best way to predict the presence of novel gene fusions?
Both of these questions will involve in silico generated and wet lab spiked-in RNA sequencing data.
We are launching the ICGC-TCGA DREAM Somatic Mutation Calling RNA Challenge (SMC-RNA), a community-based collaborative competition of researchers from across the world. We will rigorously assess the accuracy of methods to perform two key tasks in cancer RNA-Seq data analysis: quantifying known isoforms and detecting novel fusion transcripts. We will generate RNA-Seq data for a synthetic-based challenge. Since synthetic data may not capture the complexity of real human tumours, we will also introduce a phase in which teams make predictions on real human tumours. Challenge organizers will perform retrospective experimental validation on predictions to create a gold standard. Validation will employ a combination of long-read sequencing and target-capture approaches. The SMC-RNA Challenge will analyze a couple of dozen samples created to have known alterations representing different tumor types, allowing confidence that the winning methods will be generalizable across the broad range of human cancers.
The ICGC-TCGA DREAM Somatic Mutation Calling – RNA Challenge (SMC-RNA) is an international effort to improve standard methods for identifying cancer-associated rearrangements in RNA sequencing (RNA-seq) data. Leaders of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) cancer genomics projects are joining with Sage Bionetworks and IBM-DREAM to initiate this innovative open crowd-sourced Challenge [1-3].
Why is RNA biology important in cancer?
While only a small fraction of the genome encodes proteins, the majority is either transcribed or has putative regulatory functions, with the consequence that cellular functions are extensively regulated at the RNA level. The regulation of RNA, and its dramatic dysregulation in cancer cells, occurs in multiple ways. RNA abundances may be altered, and these alterations have served as the basis for clinically important prognostic biomarkers. Genomic rearrangements in cancer cells produce fusion transcripts, which may give rise to chimeric protein products not present in normal cells. These can serve as robust diagnostic markers (e.g. TMPRSS2-ERG in prostate cancer) or drug targets (e.g. BCR-ABL in CML). Ongoing research efforts are beginning to unveil the potential clinical relevance of aberrant processing of RNA in cancer, such as defects in alternative splicing. To fully document the molecular differences in transcription between tumor cells and their normal counterparts, an assortment of computational methods is needed. Increasing the “alterome” of tumors by fully characterizing their RNA landscapes will expand our understanding of cancer mechanisms, provide new biomarkers and reveal possible new RNA-based therapeutics, improving personalized patient treatment.
What is RNA Sequencing?
RNA-seq uses next-generation sequencing techniques to sequence RNA. It allows the transcriptome to be sequenced at high coverage and provides raw read counts that can be used to assess expression levels and to elucidate other biologically relevant information. RNA is reverse transcribed into cDNA and then sequenced at high depth by high-throughput technologies, such as Illumina HiSeq, Roche 454, and PacBio. After sequencing, reads can be assembled de novo or aligned to a reference genome or a reference transcriptome. RNA-seq has some advantages over microarrays, including requiring no prior knowledge of the transcriptome and providing an unbiased expression analysis. In comparison, microarrays, the current gold standard for RNA analysis, require probe design and have been found to contain more bias in low-intensity genes. Key challenges in RNA-seq include biases introduced during RNA fragmentation, cDNA fragmentation, and library preparation, as well as potential PCR artifacts that skew expression levels and reads that align to multiple locations in a reference genome. Due to these and other influences, detecting and quantifying transcriptional isoforms and fusion genes remains a challenging set of problems, and agreement among competing methods for interpreting RNA-Seq results continues to be poor.
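As a rough illustration of how raw read counts can be turned into expression levels, the sketch below converts counts into transcripts per million (TPM), one commonly used length-normalized unit. The transcript names, counts, and lengths are made up for the example, and real quantification tools also have to handle the biases and multi-mapping reads described above, which this sketch ignores.

```python
def tpm(counts, lengths):
    """Convert raw read counts to transcripts per million (TPM).

    counts  -- dict mapping transcript name to number of assigned reads
    lengths -- dict mapping transcript name to transcript length in bases
    """
    # Step 1: reads per base, so long transcripts are not favoured.
    rates = {name: counts[name] / lengths[name] for name in counts}
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    # Step 2: rescale so that all values sum to one million.
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Hypothetical toy data: two isoforms with equal read counts but different lengths.
counts = {"isoform_A": 100, "isoform_B": 100, "isoform_C": 0}
lengths = {"isoform_A": 1000, "isoform_B": 2000, "isoform_C": 1500}

values = tpm(counts, lengths)
for name, value in sorted(values.items()):
    print(f"{name}\t{value:.1f}")
```

In this toy case isoform_A ends up with twice the TPM of isoform_B, because the same number of reads is spread over a transcript half as long.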
Scientific Rationale: Gene fusions occur when two genes are joined at the DNA level and may be due to an oncogenic event. A fusion may also occur at the RNA level, where two transcripts are ligated. Gene fusions have an important role in the initial steps of tumorigenesis. Specifically, gene fusions have been found to be the driver mutations in neoplasia and have been linked to various tumour subtypes. An increasing number of gene fusions are being recognized as important diagnostic and prognostic parameters in malignant haematological disorders and childhood sarcomas. Gene fusions occur in all malignancies and account for 20% of human cancer morbidity.
Scientific Rationale: Isoforms are alternative expressions of a gene formed from splicing during post-transcriptional processing. Dysregulation of alternative splicing occurs in every category of Hanahan and Weinberg’s hallmarks of cancer. Modifications in splicing may occur due to mutations of cis-acting splicing elements, trans-acting regulators, and microRNAs. Moreover, cancer cell lines, regardless of their tissue of origin, can be effectively discriminated from non-cancer cell lines at the isoform level, but not at the gene level. An isoform signature, rather than a gene signature, could therefore be used to distinguish cancer cells from normal cells.
The goal of this Challenge is to use a crowd-based competition to identify the optimal methods for detecting (and quantifying) mRNA fusions and isoforms from RNA-seq data.
Sub-Challenge 1: Quantify Known Isoforms
Can algorithms estimate the levels of a set of provided isoforms?
1a: In silico simulated data challenge
1b: Spike-in data challenge on real data from long-reads and hybrid capture.
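One way to picture how estimated isoform levels might be compared against known (simulated or spiked-in) levels is a rank correlation. The sketch below computes a Spearman correlation in plain Python. The choice of metric and the abundance values are assumptions made for illustration; the Challenge's actual scoring method is not described here.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(estimated, truth):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(estimated), average_ranks(truth))


# Hypothetical abundances for four isoforms (arbitrary units).
truth = [10.0, 50.0, 0.5, 200.0]
estimated = [12.0, 40.0, 1.0, 180.0]
score = spearman(estimated, truth)
print(f"Spearman correlation: {score:.3f}")
```

A rank-based measure rewards getting the ordering of isoforms right even when absolute values differ, which is one reason it is a plausible choice for this kind of comparison.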
Sub-Challenge 2: Detect Gene Fusions
Can algorithms predict the presence of gene fusions?
2a: In silico simulated gene fusions.
2b: Spike-in long-read data.
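To make the fusion detection question concrete, the sketch below scores a list of predicted fusions against a set of known fusions using precision, recall, and F1. Representing a fusion as an ordered pair of gene names, the exact-match rule, and the example predictions are all assumptions for illustration; the Challenge's own evaluation procedure is not specified here.

```python
def score_fusions(predicted, truth):
    """Return (precision, recall, f1) for predicted versus known fusions.

    Each fusion is an ordered (upstream_gene, downstream_gene) tuple, and a
    prediction counts as correct only if it matches a known fusion exactly.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Known fusions in a hypothetical sample (gene pairs mentioned in the text above).
truth = [("TMPRSS2", "ERG"), ("BCR", "ABL1")]
# Hypothetical caller output: one correct call and one false positive.
predicted = [("TMPRSS2", "ERG"), ("GENE_X", "GENE_Y")]

precision, recall, f1 = score_fusions(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting precision and recall together shows whether a method finds the known fusions and whether it does so without flooding the output with false calls.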
- All participants will be invited as consortium co-authors on Challenge marker papers
- Winners will receive travel awards & speaking invitations at the next DREAM conference or Sage Congress
- New methods will be considered for co-publication with the Challenge marker papers by our publishing partner
- Other rewards will be announced as they are determined.
Challenge Organizers / Scientific Advisory Board
- Kyle Ellrott, Oregon Health Sciences University
- Josh Stuart, University of California, Santa Cruz
- Paul C. Boutros, Ontario Institute for Cancer Research
- Paul Spellman, Oregon Health Sciences University
- Christopher Maher, Washington University
- Stephen Friend, Sage Bionetworks
- Thea Norman, Sage Bionetworks
- Gustavo Stolovitzky, IBM, DREAM
- International Cancer Genome Consortium et al. International network of cancer genome projects. Nature 464, 993–998 (2010). http://icgc.org/
- The Cancer Genome Atlas (TCGA). http://cancergenome.nih.gov/
- Dialogue for Reverse Engineering Assessments and Methods (DREAM). http://www.the-dream-project.org/
- Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics, 10(1), 57-63.
- Han, Y., Gao, S., Muegge, K., Zhang, W., & Zhou, B. (2015). Advanced Applications of RNA Sequencing and Challenges. Bioinformatics and Biology Insights, 9(Suppl 1), 29.
- Robinson, D., Wang, J., & Storey, J. (2015). A nested parallel experiment demonstrates differences in intensity-dependence between RNA-seq and microarrays. Nucleic Acids Research, 43(20).
- Mertens, F., Johansson, B., Fioretos, T., & Mitelman, F. (2015). The emerging complexity of gene fusions in cancer. Nature Reviews Cancer, 15(6), 371-381.
- Liu, S., & Cheng, C. (2013). Alternative RNA splicing and cancer. Wiley Interdisciplinary Reviews: RNA, 4(5), 547-566.