
Complex rearrangements and oncogene amplification revealed by long-read DNA and RNA sequencing of a breast cancer cell line
Reporter: Stephen J. Williams, PhD
In a Genome Research report, Marie Nattestad et al. [1] sequenced the SK-BR-3 breast cancer cell line with a long-read, single-molecule sequencing protocol to develop one of the most detailed maps of structural variation in a cancer genome to date. The authors detected over 20,000 variants with this sequencing modality, most of which would have been missed by short-read sequencing. In addition, they found a complex sequence of nested duplications and translocations surrounding the ERBB2 (HER2) oncogene, while full-length transcriptomic analysis revealed novel gene fusions within the nested genomic variants. The authors suggest that combining long-read genome and transcriptome sequencing gives more comprehensive coverage of tumor gene variants and “sheds new light on the complex mechanisms involved in cancer genome evolution.”
Genomic instability is a hallmark of cancer [2] and leads to numerous genetic variations, such as:
- Copy number variations
- Chromosomal alterations
- Gene fusions
- Deletions
- Gene duplications
- Insertions
- Translocations
Efforts such as The Cancer Genome Atlas [3] and the International Cancer Genome Consortium (2010) use short-read sequencing technology to detect and analyze thousands of commonly occurring mutations. However, short-read technology has high false-positive and false-negative rates (as high as 50% [4]) for detecting less common structural variations. In addition, short reads cannot detect variations in close proximity to each other or on the same molecule, and therefore underestimate the number of variations.
Methods: The authors used a long-read sequencing technology from Pacific Biosciences (SMRT) to analyze the mutational and structural variation in the SK-BR-3 breast cancer cell line. A split-read and within-read mapping approach was used to detect variants of different types and sizes. In general, long reads align more reliably than short reads, resulting in higher-quality mapping. Transcriptomic analysis was performed using Iso-Seq.
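To make the split-read idea concrete, here is a minimal sketch in Python. It is not the authors' pipeline or the Sniffles algorithm: the `Segment` fields, the `split_read_junctions` function, the 10 kb threshold, and the coordinates are all assumptions made up for illustration, and strand orientation is ignored. The point is only that when one long read aligns as several pieces, consecutive pieces that land on different chromosomes or far apart on the same chromosome imply a breakpoint.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned piece of a single long read (hypothetical fields)."""
    chrom: str
    ref_start: int   # start on the reference
    ref_end: int     # end on the reference
    read_start: int  # offset of this piece within the read


def split_read_junctions(segments, min_distance=10_000):
    """Return breakpoints implied by consecutive pieces of one read.

    min_distance is an assumed cutoff for calling a same-chromosome
    jump "long-range"; smaller gaps are left to within-read calling.
    """
    ordered = sorted(segments, key=lambda s: s.read_start)
    junctions = []
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom != right.chrom:
            kind = "interchromosomal"
        elif abs(right.ref_start - left.ref_end) >= min_distance:
            kind = "long-range"
        else:
            continue
        junctions.append(
            (left.chrom, left.ref_end, right.chrom, right.ref_start, kind)
        )
    return junctions


# Invented example: one read whose two halves map to different chromosomes.
read = [
    Segment("chr17", 1_000_000, 1_006_000, 0),
    Segment("chr8", 5_000_000, 5_004_000, 6_000),
]
print(split_read_junctions(read))
```

A real caller would also cluster junctions across many reads and use strand information to tell deletions, duplications, inversions and translocations apart.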
Results: Using SMRT long-read sequencing, the authors obtained 71.9-fold sequencing coverage of the SK-BR-3 genome with an average read length of 9.8 kb.
A few notes:
- The most amplified regions were the locus spanning the ERBB2 oncogene (33.6 copies), the MYC locus (38 copies), the EGFR locus (7 copies) and BCAS1 (16.8 copies)
- The locus 8q24.12, which contains the SNTB1 gene, had the highest amplification at 69.2 copies
- Long-read sequencing showed more insertions than deletions, which suggests that the human reference genome underestimates the lengths of low-complexity regions
- Found 1,493 long-range variants, 603 of which were between different chromosomes
- Using Iso-Seq in conjunction with the long-read platform, they detected 1,692,379 isoforms (93%) mapping to the reference genome and 53 putative gene fusions (for 39 of which they found genomic evidence)
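As a quick check on the proportions behind two of the notes above, the short Python snippet below turns the reported counts into percentages. The counts come from the notes; the variable names and the `percent` helper are just illustrative.

```python
# Counts taken from the notes above.
variants_total = 1493        # variants in the 1,493 set
between_chromosomes = 603    # of those, joining different chromosomes
putative_fusions = 53        # gene fusions seen in Iso-Seq
fusions_with_dna = 39        # of those, with genomic evidence


def percent(part, whole):
    """Percentage of whole represented by part, to one decimal place."""
    return round(100 * part / whole, 1)


print(percent(between_chromosomes, variants_total))  # 40.4
print(percent(fusions_with_dna, putative_fusions))   # 73.6
```

So roughly two in five of those variants connect different chromosomes, and about three quarters of the RNA-level fusions have supporting DNA evidence.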
A table of the gene fusions, modified from the paper, is given below:
Table 1. Gene fusions with RNA evidence from Iso-Seq and DNA evidence from SMRT DNA sequencing, where the genomic path is found using SplitThreader from Sniffles variant calls. Note: the link in the table for each gene is its GeneCard.
# | Gene 1 | Gene 2 | SplitThreader path: distance (bp) | SplitThreader path: number of variants | SplitThreader path: chromosomes in path | Previously observed in references
--- | --- | --- | --- | --- | --- | ---
1 | KLHDC2 | SNTB1 | 9837 | 3 | 14, 17, 8 | Asmann et al. (2011) as only a 2-hop fusion
2 | CYTH1 | EIF3H | 8654 | 2 | 17, 8 | Edgren et al. (2011); Kim and Salzberg (2011); RNA only, not observed as 2-hop
3 | CPNE1 | PREX1 | 1777 | 2 | 20 | Found and validated as 2-hop by Chen et al. (2013)
4 | GSDMB | TATDN1 | 0 | 1 | 17, 8 | Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011)
5 | LINC00536 | PVT1 | 0 | 1 | 8 | No
6 | MTBP | SAMD12 | 0 | 1 | 8 | Validated by Edgren et al. (2011)
7 | LRRFIP2 | SUMF1 | 0 | 1 | 3 | Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011)
8 | FBXL7 | TRIO | 0 | 1 | 5 | No
9 | ATAD5 | TLK2 | 0 | 1 | 17 | No
10 | DHX35 | ITCH | 0 | 1 | 20 | Validated by Edgren et al. (2011)
11 | LMCD1-AS1 | MECOM | 0 | 1 | 3 | No
12 | PHF20 | RP4-723E3.1 | 0 | 1 | 20 | No
13 | RAD51B | SEMA6D | 0 | 1 | 14, 15 | No
14 | STAU1 | TOX2 | 0 | 1 | 20 | No
15 | TBC1D31 | ZNF704 | 0 | 1 | 8 | Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011); Chen et al. (2013)
SplitThreader found two different paths for the RAD51B-SEMA6D gene fusion and for the LINC00536-PVT1 gene fusion. Number of Iso-Seq reads refers to full-length HQ-filtered reads. Alignments of SMRT DNA sequence reads supporting each of these gene fusions are shown in Supplemental Note S2.
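For readers who want to slice Table 1 themselves, the following Python sketch transcribes it into a small data structure and tallies a few properties. The `Fusion` type and its field names are my own, not from the paper, and `previously_observed` is simply `False` wherever the table says "No".

```python
from typing import NamedTuple, Tuple


class Fusion(NamedTuple):
    gene_a: str
    gene_b: str
    distance_bp: int
    n_variants: int
    chromosomes: Tuple[str, ...]
    previously_observed: bool


# Rows transcribed from Table 1.
fusions = [
    Fusion("KLHDC2", "SNTB1", 9837, 3, ("14", "17", "8"), True),
    Fusion("CYTH1", "EIF3H", 8654, 2, ("17", "8"), True),
    Fusion("CPNE1", "PREX1", 1777, 2, ("20",), True),
    Fusion("GSDMB", "TATDN1", 0, 1, ("17", "8"), True),
    Fusion("LINC00536", "PVT1", 0, 1, ("8",), False),
    Fusion("MTBP", "SAMD12", 0, 1, ("8",), True),
    Fusion("LRRFIP2", "SUMF1", 0, 1, ("3",), True),
    Fusion("FBXL7", "TRIO", 0, 1, ("5",), False),
    Fusion("ATAD5", "TLK2", 0, 1, ("17",), False),
    Fusion("DHX35", "ITCH", 0, 1, ("20",), True),
    Fusion("LMCD1-AS1", "MECOM", 0, 1, ("3",), False),
    Fusion("PHF20", "RP4-723E3.1", 0, 1, ("20",), False),
    Fusion("RAD51B", "SEMA6D", 0, 1, ("14", "15"), False),
    Fusion("STAU1", "TOX2", 0, 1, ("20",), False),
    Fusion("TBC1D31", "ZNF704", 0, 1, ("8",), True),
]

# Fusions whose genomic path needs more than one variant ("multi-hop").
multi_hop = [f for f in fusions if f.n_variants > 1]
# Fusions not reported in the earlier references listed in the table.
novel = [f for f in fusions if not f.previously_observed]
# Fusions whose path touches more than one chromosome.
interchromosomal = [f for f in fusions if len(set(f.chromosomes)) > 1]

print(len(fusions), len(multi_hop), len(novel), len(interchromosomal))
for f in multi_hop:
    print(f"{f.gene_a}-{f.gene_b}: {f.n_variants} variants, {f.distance_bp} bp")
```

Only three of the fifteen fusions need a path through more than one variant, which is exactly the kind of event that is hard to reconstruct without reads long enough to span the intermediate segments.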
References
1. Nattestad M, Goodwin S, Ng K, Baslan T, Sedlazeck FJ, Rescheneder P, Garvin T, Fang H, Gurtowski J, Hutton E et al: Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line. Genome Research 2018, 28(8):1126-1135.
2. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100(1):57-70.
3. Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, Xie M, Zhang Q, McMichael JF, Wyczalkowski MA et al: Mutational landscape and significance across 12 major cancer types. Nature 2013, 502(7471):333-339.
4. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH et al: An integrated map of structural variation in 2,504 human genomes. Nature 2015, 526(7571):75-81.