Archive for the ‘Next Generation Sequencing (NGS)’ Category

Complex rearrangements and oncogene amplification revealed by long-read DNA and RNA sequencing of a breast cancer cell line

Reporter: Stephen J. Williams, PhD

In a Genome Research report by Marie Nattestad et al. [1], the SK-BR-3 breast cancer cell line was sequenced using a long-read single-molecule sequencing protocol in order to develop one of the most detailed maps of structural variations in a cancer genome to date.  The authors detected over 20,000 variants with this sequencing modality, most of which would have been missed by short-read sequencing.  In addition, a complex sequence of nested duplications and translocations was found surrounding the ERBB2 (HER2) locus, while full-length transcriptomic analysis revealed novel gene fusions within the nested genomic variants.  The authors suggest that combining long-read genome and transcriptome sequencing yields more comprehensive coverage of tumor gene variants and “sheds new light on the complex mechanisms involved in cancer genome evolution.”

Genomic instability is a hallmark of cancer [2] and leads to numerous genetic variations such as:

  • Copy number variations
  • Chromosomal alterations
  • Gene fusions
  • Deletions
  • Gene duplications
  • Insertions
  • Translocations

Efforts such as The Cancer Genome Atlas [3] and the International Cancer Genome Consortium (2010) use short-read sequencing technology to detect and analyze thousands of commonly occurring mutations. However, short-read technology has high false-positive and false-negative rates for detecting less common structural variations (as high as 50% [4]). In addition, short reads cannot detect variations in close proximity to each other or on the same molecule, and therefore underestimate the number of variants.

Methods:  The authors used a long-read sequencing technology from Pacific Biosciences (SMRT) to analyze the mutational and structural variation in the SK-BR-3 breast cancer cell line.  A split-read and within-read mapping approach was used to detect variants of different types and sizes.  In general, long reads have better alignment qualities than short reads, resulting in higher-quality mapping. Transcriptomic analysis was performed using Iso-Seq.
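To make the split-read idea concrete, here is a minimal Python sketch. It is not the pipeline the authors ran; it only illustrates how two consecutive aligned pieces of one long read can hint at a variant type. The `Segment` type, the `classify_split` function and the 50 bp `min_gap` threshold are hypothetical names and values chosen for illustration, and real callers use far more evidence (multiple supporting reads, mapping quality, within-read insertions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a long read (hypothetical, simplified record)."""
    chrom: str
    ref_start: int  # 0-based start on the reference
    ref_end: int    # end on the reference (exclusive)
    strand: str     # "+" or "-"


def classify_split(first: Segment, second: Segment, min_gap: int = 50) -> str:
    """Guess a variant type from two consecutive segments given in read order."""
    if first.chrom != second.chrom:
        return "translocation"
    if first.strand != second.strand:
        return "inversion"
    # Same chromosome and strand: measure the jump along the reference.
    # On the minus strand, read order runs toward lower reference coordinates.
    if first.strand == "+":
        gap = second.ref_start - first.ref_end
    else:
        gap = first.ref_start - second.ref_end
    if gap >= min_gap:
        return "deletion"      # reference bases skipped by the read
    if gap <= -min_gap:
        return "duplication"   # read jumps back over bases it already covered
    return "no_call"


if __name__ == "__main__":
    a = Segment("17", 1_000, 5_000, "+")
    b = Segment("8", 200, 4_000, "+")
    print(classify_split(a, b))
```

A read whose two pieces land on chromosomes 17 and 8, as in the usage lines, is reported as a candidate translocation; short reads rarely span both sides of such a junction with enough sequence to align each piece confidently.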

Results: Using the SMRT long-read sequencing technology from Pacific Biosciences, the authors obtained 71.9× (71.9-fold) sequencing coverage of the SK-BR-3 genome with an average read length of 9.8 kb.
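As a back-of-the-envelope check on what these numbers imply, the sketch below relates fold coverage, mean read length and genome size. It assumes the 71.9 figure is fold coverage and uses a round 3.1 Gb genome size; that genome size is an assumption for illustration (SK-BR-3 is highly rearranged, so its true DNA content differs), and the function names are hypothetical.

```python
import math

ASSUMED_GENOME_SIZE_BP = 3_100_000_000  # assumption: round haploid human genome size


def fold_coverage(n_reads: int, mean_read_len_bp: float, genome_size_bp: int) -> float:
    """Average coverage = total sequenced bases / genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def reads_needed(coverage: float, mean_read_len_bp: float, genome_size_bp: int) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(coverage * genome_size_bp / mean_read_len_bp)


if __name__ == "__main__":
    n = reads_needed(71.9, 9_800, ASSUMED_GENOME_SIZE_BP)
    print(f"about {n / 1e6:.1f} million reads of 9.8 kb")
```

Under these assumptions the run corresponds to roughly 22.7 million reads, which gives a feel for the scale of the data set rather than an exact count from the paper.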

A few notes:

  1. The most amplified regions surround the ERBB2 oncogene (33.6 copies), the MYC locus (38 copies), the EGFR locus (7 copies) and BCAS1 (16.8 copies)
  2. The locus 8q24.12, which contains the SNTB1 gene, had the highest amplification at 69.2 copies
  3. Long-read sequencing showed more insertions than deletions, which suggests that the lengths of low-complexity regions are underestimated in the human reference genome
  4. The authors found 1,493 long read variants, 603 of which were between different chromosomes
  5. Using Iso-Seq in conjunction with the long-read platform, they detected 1,692,379 isoforms (93%) mapping to the reference genome and 53 putative gene fusions (for 39 of which they found genomic evidence)

A table modified from the paper on the gene fusions is given below:

Table 1. Gene fusions with RNA evidence from Iso-Seq and DNA evidence from SMRT DNA sequencing where the genomic path is found using SplitThreader from Sniffles variant calls. In the original table, each gene name links to its GeneCards entry.

                        SplitThreader path
#   Genes               Distance  Number of variants  Chromosomes in path  Previously observed in references
1   KLHDC2 SNTB1        9837      3                   14|17|8              Asmann et al. (2011) as only a 2-hop fusion
2   CYTH1 EIF3H         8654      2                   17|8                 Edgren et al. (2011); Kim and Salzberg (2011); RNA only, not observed as 2-hop
3   CPNE1 PREX1         1777      2                   20                   Found and validated as 2-hop by Chen et al. (2013)
4   GSDMB TATDN1        0         1                   17|8                 Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011)
5   LINC00536 PVT1      0         1                   8                    No
6   MTBP SAMD12         0         1                   8                    Validated by Edgren et al. (2011)
7   LRRFIP2 SUMF1       0         1                   3                    Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011)
8   FBXL7 TRIO          0         1                   5                    No
9   ATAD5 TLK2          0         1                   17                   No
10  DHX35 ITCH          0         1                   20                   Validated by Edgren et al. (2011)
11  LMCD1-AS1 MECOM     0         1                   3                    No
12  PHF20 RP4-723E3.1   0         1                   20                   No
13  RAD51B SEMA6D       0         1                   14|15                No
14  STAU1 TOX2          0         1                   20                   No
15  TBC1D31 ZNF704      0         1                   8                    Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011); Chen et al. (2013)


SplitThreader found two different paths for the RAD51B-SEMA6D gene fusion and for the LINC00536-PVT1 gene fusion. Number of Iso-Seq reads refers to full-length HQ-filtered reads. Alignments of SMRT DNA sequence reads supporting each of these gene fusions are shown in Supplemental Note S2.
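For readers who want to query Table 1 rather than read it, the rows can be transcribed into a small Python structure. This is only a convenience sketch: the `Fusion` record and the helper names are hypothetical, the values are copied from the table above, and "previously reported" simply means the last column is anything other than "No".

```python
from typing import NamedTuple, Tuple


class Fusion(NamedTuple):
    gene_a: str
    gene_b: str
    distance_bp: int
    n_variants: int               # variants in the SplitThreader path
    chromosomes: Tuple[str, ...]  # chromosomes visited by the path
    previously_reported: bool


# Rows transcribed from Table 1.
FUSIONS = [
    Fusion("KLHDC2", "SNTB1", 9837, 3, ("14", "17", "8"), True),
    Fusion("CYTH1", "EIF3H", 8654, 2, ("17", "8"), True),
    Fusion("CPNE1", "PREX1", 1777, 2, ("20",), True),
    Fusion("GSDMB", "TATDN1", 0, 1, ("17", "8"), True),
    Fusion("LINC00536", "PVT1", 0, 1, ("8",), False),
    Fusion("MTBP", "SAMD12", 0, 1, ("8",), True),
    Fusion("LRRFIP2", "SUMF1", 0, 1, ("3",), True),
    Fusion("FBXL7", "TRIO", 0, 1, ("5",), False),
    Fusion("ATAD5", "TLK2", 0, 1, ("17",), False),
    Fusion("DHX35", "ITCH", 0, 1, ("20",), True),
    Fusion("LMCD1-AS1", "MECOM", 0, 1, ("3",), False),
    Fusion("PHF20", "RP4-723E3.1", 0, 1, ("20",), False),
    Fusion("RAD51B", "SEMA6D", 0, 1, ("14", "15"), False),
    Fusion("STAU1", "TOX2", 0, 1, ("20",), False),
    Fusion("TBC1D31", "ZNF704", 0, 1, ("8",), True),
]


def multi_variant_fusions():
    """Fusions whose genomic path needs more than one variant."""
    return [f for f in FUSIONS if f.n_variants > 1]


def novel_fusions():
    """Fusions marked 'No' in the 'Previously observed' column."""
    return [f for f in FUSIONS if not f.previously_reported]


def interchromosomal_fusions():
    """Fusions whose path touches more than one chromosome."""
    return [f for f in FUSIONS if len(f.chromosomes) > 1]


if __name__ == "__main__":
    print(len(multi_variant_fusions()), len(novel_fusions()), len(interchromosomal_fusions()))
```

Counting this way shows that three of the fifteen listed fusions are explained only by chaining two or three variants, which is the kind of event a single split read cannot describe on its own.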




  1. Nattestad M, Goodwin S, Ng K, Baslan T, Sedlazeck FJ, Rescheneder P, Garvin T, Fang H, Gurtowski J, Hutton E et al: Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line. Genome Research 2018, 28(8):1126-1135.
  2. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100(1):57-70.
  3. Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, Xie M, Zhang Q, McMichael JF, Wyczalkowski MA et al: Mutational landscape and significance across 12 major cancer types. Nature 2013, 502(7471):333-339.
  4. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH et al: An integrated map of structural variation in 2,504 human genomes. Nature 2015, 526(7571):75-81.


Other articles on Cancer Genome Sequencing in this Open Access Journal Include:


International Cancer Genome Consortium Website has 71 Committed Cancer Genome Projects Ongoing

Loss of Gene Islands May Promote a Cancer Genome’s Evolution: A new Hypothesis on Oncogenesis

Identifying Aggressive Breast Cancers by Interpreting the Mathematical Patterns in the Cancer Genome

CancerBase.org – The Global HUB for Diagnoses, Genomes, Pathology Images: A Real-time Diagnosis and Therapy Mapping Service for Cancer Patients – Anonymized Medical Records accessible to



Narrative Building for the Future of LPBI Group: List of Talking Points


Exchange between Gail and Aviva


On Tuesday, June 25, 2019, 11:43:27 AM EDT, Aviva Lev-Ari <AvivaLev-Ari@alum.berkeley.edu> wrote:


HOW can we get  Kevin Landwher of terarecon.com to create a Podcast for LPBI Group IP Assets, including a section on our forthcoming Genomics, Volume 2 


In response to this question we are in discussion on POINTS #1,2,3,4


From: Gail Thornton <gailsthornton@yahoo.com>

Reply-To: Gail Thornton <gailsthornton@yahoo.com>

Date: Sunday, June 30, 2019 at 8:38 AM

To: Aviva Lev-Ari <aviva.lev-ari@comcast.net>

Cc: Aviva Lev-Ari <AvivaLev-Ari@alum.berkeley.edu>, Rick Mandahl <rmandahl@gmail.com>, Amnon Danzig <amnon.danzig@gmail.com>

Subject: Please AUDIT PODCAST —>>>>>>>> Beyond the Screen Episode 6: Next Generation AI Companies Providing Physicians a Starting Point in AI


These videos from terarecon.com typically focus on one topic (not many as you’ve described below). 

If there are too many topics proposed to this company, they will not be interested.

My recommendation is for you to finalize Genomics, volume 2, and let’s see the story we have about that specific topic.






On Saturday, June 29, 2019, 03:56:08 PM EDT, Aviva Lev-Ari <aviva.lev-ari@comcast.net> wrote:


POINT #1 for VIDEO coverage – Focus on Genomics, Volume 2

After 7/15, Prof. Feldman will be back in the US, starting to work on Part 5 in Genomics, Volume 2. We will Skype to discuss what to include in 5.1, 5.2, 5.3, 5.4

On 7/15, I am submitting my work on creation of Parts 1,2,3,4,6

Dr. Williams and Dr. Saha are working already on Part 7&8.

Below you have abbreviated eTOCs.

Go to URL of the Book to see what I placed already inside this book.

Dr. Williams and Prof. Feldman will compose 


Introduction to Volume 2

Volume Summary


Based on these four parts and the eTOCs you will have ample content for the video, which may start with the epitome of our book creation: Genomics Volume 2 (you interview the three Editors why it is Epitome)

POINT #2 or #3 or #4  for VIDEOs to Focus on coverage for Marketing LPBI Group

by DESCRIPTION of what was accomplished


  • Venture history/background
  • Venture milestones: all posts in the Journal with the Title
  • “We celebrate …..
  • 5-6 Titles like that, I may add two more
  • Site Statistics
  • Book articles cumulative views (Article Scoring System: Data Extract)
  • section on BioMed e-Series
  • section on List of Conference covered in Real Time
  • FIT Team input to Venture Valuation: top 5 or top 10 Factors in consensus 
  • the 3D graphs on Opportunity Maps: Gail, Rick, Amnon, Aviva – each explains their own outcome
  • section on Pipeline

Video on What is the Ideal Solution for the FUTURE of LPBI Group

  • Interviews with All FIT Members

For POINT #1:

To build the narrative for a VIDEO dedicated to Genomics, Volume Two and a Marketing campaign for it as a NEW BOOK on NGS, the Narrative will use content extracts to build a CASE for

Why GENOMICS Volume 2 – is the Epitome of all BioMed e-Series???????


forthcoming Genomics, Volume 2 



Aviva completed Parts 1,2,3,4,6, 

[5 is by Prof. Feldman] 

[7,8 are by Scientists on FIT]:

Latest in Genomics Methodologies for Therapeutics:

Gene Editing, NGS & BioInformatics,

Simulations and the Genome Ontology



Volume Two

Prof. Marcus W. Feldman, PhD, Editor

Prof. Stephen J. Williams, PhD, Editor


Aviva Lev-Ari, PhD, RN, Editor 


Abbreviated eTOCs

Part 1: NGS

1.1 The Science

1.2 Technologies and Methodologies

1.3 Clinical Aspects

1.4 Business and Legal


Part 2: CRISPR for Gene Editing and DNA Repair

2.1 The Science

2.2 Technologies and Methodologies

2.3 Clinical Aspects

2.4 Business and Legal


Part 3: AI in Medicine

3.1 The Science

3.2 Technologies and Methodologies

3.3 Clinical Aspects

3.4 Business and Legal

3.5 Latest in Machine Learning (ML) Algorithms harnessed for Medical Diagnosis: Pattern Recognition & Prediction of Disease Onset


Part 4: Single Cell Genomics

4.1 The Science

4.2 Technologies and Methodologies

4.3 Clinical Aspects

4.4 Business and Legal


Part 5: Evolution Biology Genomics Modeling @Feldman Lab, Stanford University – Written and Curated by Prof. Marc Feldman






Part 6: Simulation Modeling in Genomics

6.1   Mutation Analysis – Gene Encoding

6.2   Mitochondrial Variations

6.3   Variant Analysis

6.4   Variant Detection in Hereditary Cancer Genes

6.5   Immuno-Informatics

6.6   RNA Sequencing

6.7   Complex Insertions and Deletions

6.8   Evolutionary Biology

6.9   Simulation Programs

6.10  A comparison of tools for the simulation of genomic next-generation sequencing data


Part 7: Applications of Genomics: Genotypes, Phenotypes and Complex Diseases

7.1 Genome-wide associations with complex diseases (GWAS)

7.2 Non-coding DNA and phenotypes—including diseases like cancer

7.3 Epigenomic associations with phenotypes including cancer

7.4 Rare variants and diseases

7.5 Population-level genomics and the meaning of group differences

7.6 Targeting drugs for complex diseases


Part 8: Epigenomics and Genomic Regulation

8.1  Genomic controls on epigenomics

8.2  The ENCODE project and gene regulation

8.3  Small interfering RNAs and gene expression

8.4  Epigenomics in cancer

8.5  Environmental epigenomics


Simulation Tools of Genomic Next Generation Sequencing Data: Comparative Analysis & Genetic Simulation Resources

Reporting: Aviva Lev-Ari, PhD, RN



What is next generation sequencing?

Behjati S, Tarpey PS.

Arch Dis Child Educ Pract Ed. 2013 Dec;98(6):236-8. doi: 10.1136/archdischild-2013-304340. Epub 2013 Aug 28. Review.

Computational pan-genomics: status, promises and challenges.

Computational Pan-Genomics Consortium.

Brief Bioinform. 2018 Jan 1;19(1):118-135. doi: 10.1093/bib/bbw089. Review.

Tracking the NGS revolution: managing life science research on shared high-performance computing clusters.

Dahlö M, Scofield DG, Schaal W, Spjuth O.

Gigascience. 2018 May 1;7(5). doi: 10.1093/gigascience/giy028.


[Clinical Applications of Next-Generation Sequencing].

Rebollar-Vega RG, Arriaga-Canon C, de la Rosa-Velázquez IA.

Rev Invest Clin. 2018;70(4):153-157. doi: 10.24875/RIC.18002544.




Clinical Genomics: Challenges and Opportunities.

Vijay P, McIntyre AB, Mason CE, Greenfield JP, Li S.

Crit Rev Eukaryot Gene Expr. 2016;26(2):97-113. doi: 10.1615/CritRevEukaryotGeneExpr.2016015724. Review.

Next-generation sequencing in the clinic: promises and challenges.

Xuan J, Yu Y, Qing T, Guo L, Shi L.

Cancer Lett. 2013 Nov 1;340(2):284-95. doi: 10.1016/j.canlet.2012.11.025. Epub 2012 Nov 19. Review.

The Future of Whole-Genome Sequencing for Public Health and the Clinic.

Allard MW.

J Clin Microbiol. 2016 Aug;54(8):1946-8. doi: 10.1128/JCM.01082-16. Epub 2016 Jun 15.




Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines: A Joint Recommendation of the Association for Molecular Pathology and the College of American Pathologists.

Roy S, Coldren C, Karunamurthy A, Kip NS, Klee EW, Lincoln SE, Leon A, Pullambhatla M, Temple-Smolkin RL, Voelkerding KV, Wang C, Carter AB.

J Mol Diagn. 2018 Jan;20(1):4-27. doi: 10.1016/j.jmoldx.2017.11.003. Epub 2017 Nov 21. Review.



Next-Generation Sequencing and Mutational Analysis: Implications for Genes Encoding LINC Complex Proteins.

Nagy PL, Worman HJ.

Methods Mol Biol. 2018;1840:321-336. doi: 10.1007/978-1-4939-8691-0_22.


Genome-wide genetic marker discovery and genotyping using next-generation sequencing.

Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML.

Nat Rev Genet. 2011 Jun 17;12(7):499-510. doi: 10.1038/nrg3012. Review.



Best practices for evaluating mutation prediction methods.

Rogan PK, Zou GY.

Hum Mutat. 2013 Nov;34(11):1581-2. doi: 10.1002/humu.22401. Epub 2013 Sep 10. No abstract available.



mit-o-matic: a comprehensive computational pipeline for clinical evaluation of mitochondrial variations from next-generation sequencing datasets.

Vellarikkal SK, Dhiman H, Joshi K, Hasija Y, Sivasubbu S, Scaria V.

Hum Mutat. 2015 Apr;36(4):419-24. doi: 10.1002/humu.22767.



A survey of tools for variant analysis of next-generation genome sequencing data.

Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z.

Brief Bioinform. 2014 Mar;15(2):256-78. doi: 10.1093/bib/bbs086. Epub 2013 Jan 21.




Variant callers for next-generation sequencing data: a comparison study.

Liu X, Han S, Wang Z, Gelernter J, Yang BZ.

PLoS One. 2013 Sep 27;8(9):e75619. doi: 10.1371/journal.pone.0075619. eCollection 2013.


ICO amplicon NGS data analysis: a Web tool for variant detection in common high-risk hereditary cancer genes analyzed by amplicon GS Junior next-generation sequencing.

Lopez-Doriga A, Feliubadaló L, Menéndez M, Lopez-Doriga S, Morón-Duran FD, del Valle J, Tornero E, Montes E, Cuesta R, Campos O, Gómez C, Pineda M, González S, Moreno V, Capellá G, Lázaro C.

Hum Mutat. 2014 Mar;35(3):271-7.



Development and analytical validation of a 25-gene next generation sequencing panel that includes the BRCA1 and BRCA2 genes to assess hereditary cancer risk.

Judkins T, Leclair B, Bowles K, Gutin N, Trost J, McCulloch J, Bhatnagar S, Murray A, Craft J, Wardell B, Bastian M, Mitchell J, Chen J, Tran T, Williams D, Potter J, Jammulapati S, Perry M, Morris B, Roa B, Timms K.

BMC Cancer. 2015 Apr 2;15:215. doi: 10.1186/s12885-015-1224-y.

Clinical Applications of Next-Generation Sequencing in Cancer Diagnosis.

Sabour L, Sabour M, Ghorbian S.

Pathol Oncol Res. 2017 Apr;23(2):225-234. doi: 10.1007/s12253-016-0124-z. Epub 2016 Oct 8. Review.



Studying cancer genomics through next-generation DNA sequencing and bioinformatics.

Doyle MA, Li J, Doig K, Fellowes A, Wong SQ.

Methods Mol Biol. 2014;1168:83-98. doi: 10.1007/978-1-4939-0847-9_6. Review.



Immunoinformatics and epitope prediction in the age of genomic medicine.

Backert L, Kohlbacher O.

Genome Med. 2015 Nov 20;7:119. doi: 10.1186/s13073-015-0245-0. Review.

IgSimulator: a versatile immunosequencing simulator.

Safonova Y, Lapidus A, Lill J.

Bioinformatics. 2015 Oct 1;31(19):3213-5. doi: 10.1093/bioinformatics/btv326. Epub 2015 May 25.



Computational genomics tools for dissecting tumour-immune cell interactions.

Hackl H, Charoentong P, Finotello F, Trajanoski Z.

Nat Rev Genet. 2016 Jul 4;17(8):441-58. doi: 10.1038/nrg.2016.67. Review.



SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines.

Audoux J, Salson M, Grosset CF, Beaumeunier S, Holder JM, Commes T, Philippe N.

BMC Bioinformatics. 2017 Sep 29;18(1):428. doi: 10.1186/s12859-017-1831-5.




INDELseek: detection of complex insertions and deletions from next-generation sequencing data.

Au CH, Leung AY, Kwong A, Chan TL, Ma ES.

BMC Genomics. 2017 Jan 5;18(1):16. doi: 10.1186/s12864-016-3449-9.




The State of Software for Evolutionary Biology.

Darriba D, Flouri T, Stamatakis A.

Mol Biol Evol. 2018 May 1;35(5):1037-1046. doi: 10.1093/molbev/msy014. Review.


PMCID: PMC5224698
PMID: 27320129

Systematic review of next-generation sequencing simulators: computational tools, features and perspectives.

Zhao M, Liu D, Qu H.

Brief Funct Genomics. 2017 May 1;16(3):121-128. doi: 10.1093/bfgp/elw012. Review.



A comparison of tools for the simulation of genomic next-generation sequencing data

Online Summary

  1. There is a large number of tools for the simulation of genomic data for all currently available NGS platforms, with partially overlapped functionality. Here we review 23 of these tools, highlighting their distinct functionalities, requirements and potential applications.

  2. The parameterization of these simulators is often complex. The user may decide between using existing sets of parameter values, called profiles, or re-estimating them from their own data.

  3. Parameters that can be modulated in these simulations include the effects of the PCR amplification of the libraries, read features and quality scores, base call errors, variation of sequencing depth across the genomes and the introduction of genomic variants.

  4. Several types of genomic variants can be introduced in the simulated reads, such as SNPs, indels, inversions, translocations, copy-number variants and short-tandem repeats.

  5. Reads can be generated from single or multiple genomes, and with distinct ploidy levels. NGS data from metagenomic communities can be simulated given an “abundance profile” that reflects the proportion of taxa in a given sample.

  6. Many of the simulators have not been formally described and/or tested in dedicated publications. We encourage the formal publication of these tools and the realization of comprehensive, comparative benchmarkings.

  7. Choosing among the different genomic NGS simulators is not easy. Here we provide a guidance tree to help users choose a suitable tool for their specific interests.
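Point 5 of the summary mentions an "abundance profile" for metagenomic simulation. As a rough illustration of the idea, and not the method of any particular simulator in the review, the Python sketch below splits a total read budget across taxa in proportion to their abundances. The function name `reads_per_taxon` and the example taxa are hypothetical; rounding uses a largest-remainder rule so the counts always add up to the requested total.

```python
from typing import Dict


def reads_per_taxon(abundance: Dict[str, float], total_reads: int) -> Dict[str, int]:
    """Allocate total_reads across taxa proportionally to their abundance."""
    total = sum(abundance.values())
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    exact = {taxon: total_reads * a / total for taxon, a in abundance.items()}
    counts = {taxon: int(x) for taxon, x in exact.items()}  # round down first
    leftover = total_reads - sum(counts.values())
    # Hand the remaining reads to the taxa with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for taxon in by_remainder[:leftover]:
        counts[taxon] += 1
    return counts


if __name__ == "__main__":
    profile = {"taxon_A": 5, "taxon_B": 3, "taxon_C": 2}  # hypothetical community
    print(reads_per_taxon(profile, 1000))
```

A simulator would then draw that many reads from each taxon's reference genome; sampling with randomness instead of fixed proportions is an equally valid design when the goal is to mimic sampling noise.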


Computer simulation of genomic data has become increasingly popular for assessing and validating biological models or to gain understanding about specific datasets. Multiple computational tools for the simulation of next-generation sequencing (NGS) data have been developed in recent years, which could be used to compare existing and new NGS analytical pipelines. Here we review 23 of these tools, highlighting their distinct functionality, requirements and potential applications. We also provide a decision tree for the informed selection of an appropriate NGS simulation tool for the specific question at hand.

Image source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224698/

An overview of current NGS technologies

The most popular NGS technologies on the market are Illumina’s sequencing by synthesis, which is probably the most widely used platform at present, Roche’s 454 pyrosequencing (454), SOLiD sequencing-by-ligation (SOLiD), IonTorrent semiconductor sequencing (IonTorrent), Pacific Biosciences’s (PacBio) single molecule real-time sequencing, and Oxford Nanopore Technologies (Nanopore) single-cell DNA template strand sequencing. These strategies can differ, for example, regarding the type of reads they produce or the kind of sequencing errors they introduce (Table 1). Only two of the current technologies (Illumina and SOLiD) are capable of producing all three sequencing read types: single end, paired end and mate pair. Read length is also dependent on the machine and the kit used; in platforms like Illumina, SOLiD, or IonTorrent it is possible to specify the number of desired base pairs per read. According to the sequencing run type selected it is possible to obtain reads with maximum lengths of 75 bp (SOLiD), 300 bp (Illumina) or 400 bp (IonTorrent). On the other hand, in platforms like 454, Nanopore or PacBio, information is only given about the mean and maximum read length that can be obtained, with average lengths of 700 bp, 10 kb and 15 kb and maximum lengths of 1 kb, 10 kb and 15 kb, respectively. Error rates vary depending on the platform from <=1% in Illumina to ~30% in Nanopore. Further overviews and comparisons of NGS strategies can be found in the reviews cited in the original article.

Table 1

Main characteristics of current NGS technologies.
Run Type: an X marks each supported run type among single-read, paired-end and mate-pair.

Technology  Run Type  Maximum Read Length   Quality Scores  Error Rates
Illumina    X X X     300 bp                > Q30           0.0034 – 1%
SOLiD       X X X     75 bp                 > Q30           0.01 – 1%
IonTorrent  X X       400 bp                ~ Q20           1.78%
454         X X       ~700 bp (up to 1 Kb)  > Q20           1.07 – 1.7%
Nanopore    X         5.4 – 10 Kb           NAY             10 – 40%
PacBio      X         ~15 Kb (up to 40 Kb)  < Q10           5 – 10%
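The quality scores and error rates in Table 1 are linked by the standard Phred definition, Q = -10 · log10(p), where p is the probability that a base call is wrong. The short Python sketch below converts in both directions; the function names are chosen here for illustration. The table's two columns come from different sources, so they need not agree exactly with this formula.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score implied by an error probability (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30):
        print(f"Q{q}: {phred_to_error(q):.2%} expected error")
    print(f"1.78% error corresponds to about Q{error_to_phred(0.0178):.1f}")
```

Read this way, Q30 means one expected error per 1,000 bases, Q20 one per 100 and Q10 one per 10, and an error rate of 1.78% sits a little below Q20 at roughly Q17.5.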

Simulation parameters

The existing sequencing platforms use distinct protocols that result in datasets with different characteristics. Many of these attributes can be taken into account by the simulators (Fig. 2), although there is not a single tool that incorporates all possible variations. The main characteristics of the 23 simulators considered here are summarized in Tables 2 and 3. These tools differ in multiple aspects, such as sequencing technology, input requirements or output format, but maintain several common aspects. With some exceptions, all programs need a reference sequence, multiple parameter values indicating the characteristics of the sequencing experiment to be simulated (read length, error distribution, type of variation to be generated, if any, etc.) and/or a profile (a set of parameter values, conditions and/or data used for controlling the simulation), which can be provided by the simulator or estimated de novo from empirical data. The outcome will be aligned or unaligned reads in different standard file formats, such as FASTQ, FASTA or BAM. An overview of the NGS data simulation process is represented in Fig. 3. In the following sections we delve into the different steps involved.

Figure 2. General overview of the sequencing process and steps that can be parameterized in the simulations.

NGS simulators try to imitate the real sequencing process as closely as possible by considering all the steps that could influence the characteristics of the reads. a | NGS simulators do not take into account the effect of the different DNA extraction protocols in the resulting data. However, they can consider whether the sample we want to sequence includes one or more individuals, from the same or different organisms (e.g., pool-sequencing, metagenomics). Pools of related genomes can be simulated by replicating the reference sequence and introducing variants on the resulting genomes. Some tools can also simulate metagenomes with distinct taxa abundance. b | Simulators can try to mimic the length range of DNA fragmentation (empirically obtained by sonication or digestion protocols) or assume a fixed amplicon length. c | Library preparation involves ligating sequencing-platform-dependent adaptors and/or barcodes to the selected DNA fragments (inserts). Some simulators can control the insert size, and produce reads with adaptors/barcodes. d | Most NGS techniques include an amplification step for the preparation of libraries. Several simulators can take this step into account (for example, by introducing errors and/or chimaeras), with the possibility of specifying the number of reads per amplicon. e | Sequencing runs imply a decision about coverage, read length, read type (single-end, paired-end, mate-pair) and a given platform (with their specific errors and biases). Simulators exist for the different platforms, and they can use particular parameter profiles, often estimated from real data.

Figure 3. General overview of NGS simulation.

The simulation process begins with the input of a reference sequence (most cases) and simulation parameters. Some of the parameters can be given via a profile, that is estimated (by the simulator or other tools) from other reads or alignments. The outcome of this process may be reads (with or without quality information) or genome alignments in different formats.
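The process described above (reference sequence plus parameters in, reads out) can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not one of the 23 reviewed tools: the name `simulate_fastq` is hypothetical, reads are single-end with a fixed length, errors are uniform substitutions only, and every base gets the same Phred+33 quality character derived from the error rate.

```python
import math
import random
from typing import List


def simulate_fastq(reference: str, n_reads: int, read_len: int,
                   error_rate: float, seed: int = 0) -> List[str]:
    """Return n_reads FASTQ records sampled uniformly from reference."""
    if read_len > len(reference):
        raise ValueError("read_len exceeds reference length")
    rng = random.Random(seed)
    # One constant quality value derived from the error rate, clamped to Q0..Q40.
    q = 40 if error_rate <= 0 else max(0, min(40, round(-10 * math.log10(error_rate))))
    qual = chr(33 + q) * read_len
    records = []
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        read = list(reference[start:start + read_len])
        for j, base in enumerate(read):
            if rng.random() < error_rate:
                # Substitution error: swap in a different base.
                read[j] = rng.choice([b for b in "ACGT" if b != base])
        records.append(f"@sim_{i}_pos{start}\n{''.join(read)}\n+\n{qual}")
    return records


if __name__ == "__main__":
    toy_reference = "ACGTTGCA" * 25  # hypothetical 200 bp reference
    for record in simulate_fastq(toy_reference, 2, 30, error_rate=0.01):
        print(record)
```

Because the true origin of each read is stored in its header, the output can be used to score a mapper or variant caller against known answers, which is the benchmarking use the review describes. Real simulators replace the uniform error model with platform-specific profiles estimated from empirical data.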


NGS is having a big impact in a broad range of areas that benefit from genetic information, from medical genomics, phylogenetic and population genomics, to the reconstruction of ancient genomes, epigenomics and environmental barcoding. These applications include approaches such as de novo sequencing, resequencing, target sequencing or genome reduction methods. In all cases, caution is necessary in choosing a proper sequencing design and/or a reliable analytical approach for the specific biological question of interest. The simulation of NGS data can be extremely useful for planning experiments, testing hypotheses, benchmarking tools and evaluating particular results. Given a reference genome or dataset, for instance, one can play with an array of sequencing technologies to choose the best-suited technology and parameters for the particular goal, possibly optimizing time and costs. Yet, this is still not the standard practice and researchers often base their choices on practical considerations like technology and money availability. As shown throughout this Review, simulation of NGS data from known genomes or transcriptomes can be extremely useful when evaluating assembly, mapping, phasing or genotyping algorithms, exposing their advantages and drawbacks under different circumstances.

Altogether, current NGS simulators consider most, if not all, of the important features regarding the generation of NGS data. However, they are not problem-free. The different simulators are largely redundant, implementing the same or very similar procedures. In our opinion, many are poorly documented and can be difficult to use for non-experts, and some of them are no longer maintained. Most importantly, for the most part they have not been benchmarked or validated. Remarkably, among the 23 tools considered here, only 13 have been described in dedicated application notes, 3 have been mentioned as add-ons in the methods section of bigger articles, and 5 have never been referenced in a journal. Indeed, peer-reviewed publication of these tools in dedicated articles would be highly desirable. While this would not definitively guarantee quality, at least it would encourage authors to reach minimum standards in terms of validation, benchmarking, and documentation. Collaborative efforts like the Assemblathon or iEvo (http://www.ievobio.org/) might also be a source of inspiration. Meanwhile, we hope that the decision tree presented in Fig. 1 helps users make appropriate choices.

Serghei Mangul, Lana S. Martin, Brian L. Hill, Angela Ka-Mei Lam, Margaret G. Distler, Alex Zelikovsky, Eleazar Eskin, Jonathan Flint
Nat Commun. 2019; 10: 1393. Published online 2019 Mar 27. doi: 10.1038/s41467-019-09406-4
Ge Tan, Lennart Opitz, Ralph Schlapbach, Hubert Rehrauer
Sci Rep. 2019; 9: 2856. Published online 2019 Feb 27. doi: 10.1038/s41598-019-39076-7
Apostolos Dimitromanolakis, Jingxiong Xu, Agnieszka Krol, Laurent Briollais
BMC Bioinformatics. 2019; 20: 26. Published online 2019 Jan 15. doi: 10.1186/s12859-019-2611-1
Kathleen E. Lotterhos, Jason H. Moore, Ann E. Stapleton
PLoS Biol. 2018 Dec; 16(12): e3000070. Published online 2018 Dec 10. doi: 10.1371/journal.pbio.3000070
Hayley Cassidy, Randy Poelman, Marjolein Knoester, Coretta C. Van Leer-Buter, Hubert G. M. Niesters
Front Microbiol. 2018; 9: 2677. Published online 2018 Nov 13. doi: 10.3389/fmicb.2018.02677
Genetic Simulation Resources and the GSR Certification Program
Bo Peng, Man Chong Leong, Huann-Sheng Chen, Melissa Rotunno, Katy R Brignole, John Clarke, Leah E Mechanic
Bioinformatics. 2019 Feb 15; 35(4): 709–710. Published online 2018 Aug 7. doi: 10.1093/bioinformatics/bty666
Hadrien Gourlé, Oskar Karlsson-Lindsjö, Juliette Hayer, Erik Bongcam-Rudloff
Bioinformatics. 2019 Feb 1; 35(3): 521–522. Published online 2018 Jul 19. doi: 10.1093/bioinformatics/bty630
Ze-Gang Wei, Shao-Wu Zhang
BMC Bioinformatics. 2018; 19: 177. Published online 2018 May 22. doi: 10.1186/s12859-018-2208-0
Yu Li, Renmin Han, Chongwei Bi, Mo Li, Sheng Wang, Xin Gao
Bioinformatics. 2018 Sep 1; 34(17): 2899–2908. Published online 2018 Apr 6. doi: 10.1093/bioinformatics/bty223
Roberto Semeraro, Valerio Orlandini, Alberto Magi
PLoS One. 2018; 13(4): e0194472. Published online 2018 Apr 5. doi: 10.1371/journal.pone.0194472
Soroush Samadian, Jeff P. Bruce, Trevor J. Pugh
PLoS Comput Biol. 2018 Mar; 14(3): e1006080. Published online 2018 Mar 28. doi: 10.1371/journal.pcbi.1006080
Brandon J. Varela, David Lesbarrères, Roberto Ibáñez, David M. Green
Front Microbiol. 2018; 9: 298. Published online 2018 Feb 22. doi: 10.3389/fmicb.2018.00298
Fedor M. Naumenko, Irina I. Abnizova, Nathan Beka, Mikhail A. Genaev, Yuriy L. Orlov
BMC Genomics. 2018; 19(Suppl 3): 92. Published online 2018 Feb 9. doi: 10.1186/s12864-018-4475-6
Weizhi Song, Kerrin Steensen, Torsten Thomas
PeerJ. 2017; 5: e4015. Published online 2017 Nov 8. doi: 10.7717/peerj.4015
Haibao Tang, Ewen F. Kirkness, Christoph Lippert, William H. Biggs, Martin Fabani, Ernesto Guzman, Smriti Ramakrishnan, Victor Lavrenko, Boyko Kakaradov, Claire Hou, Barry Hicks, David Heckerman, Franz J. Och, C. Thomas Caskey, J. Craig Venter, Amalio Telenti
Am J Hum Genet. 2017 Nov 2; 101(5): 700–715. Published online 2017 Nov 2. doi: 10.1016/j.ajhg.2017.09.013
Minh Duc Cao, Devika Ganesamoorthy, Chenxi Zhou, Lachlan J M Coin
Bioinformatics. 2018 Mar 1; 34(5): 873–874. Published online 2017 Oct 28. doi: 10.1093/bioinformatics/btx691
Yair Motro, Jacob Moran-Gilad
Biomol Detect Quantif. 2017 Dec; 14: 1–6. Published online 2017 Oct 23. doi: 10.1016/j.bdq.2017.10.002
Jacquiline W Mugo, Ephifania Geza, Joel Defo, Samar S M Elsheikh, Gaston K Mazandu, Nicola J Mulder, Emile R Chimusa
Bioinformatics. 2017 Oct 1; 33(19): 2995–3002. Published online 2017 Jun 24. doi: 10.1093/bioinformatics/btx369
Ryan R. Wick, Louise M. Judd, Claire L. Gorrie, Kathryn E. Holt
PLoS Comput Biol. 2017 Jun; 13(6): e1005595. Published online 2017 Jun 8. doi: 10.1371/journal.pcbi.1005595
Chen Yang, Justin Chu, René L Warren, Inanç Birol
Gigascience. 2017 Apr; 6(4): 1–6. Published online 2017 Feb 24. doi: 10.1093/gigascience/gix010


Accelerating Clinical Next-Generation Sequencing: Navigating the Path to Reimbursement

Reporter: Aviva Lev-Ari, PhD, RN

Session at PMWC 2018 Silicon Valley



QIAGEN – International Leader in NGS and RNA Sequencing

Reporter: Aviva Lev-Ari, PhD, RN


The reader is encouraged to review all the products of QIAGEN on the company web site.

miRCURY Exosome Kits

For enrichment of exosomes and other extracellular vesicles from serum/plasma or cell/urine/CSF samples
  • Excellent recovery of exosomes and other extracellular vesicles
  • Easy and straightforward protocol that takes less than 2 hours
  • No ultracentrifugation or phenol/chloroform steps required
  • Fully compatible with the miRCURY LNA miRNA PCR System
  • Suited for a variety of applications, such as miRNA or RNA profiling

miRCURY Exosome Kits enable high-quality and scalable exosome isolation with an easy protocol that does not require special laboratory equipment. The miRCURY Exosome Serum/Plasma Kit is optimized for serum and plasma samples, while the miRCURY Exosome Cell/Urine/CSF Kit is designed for processing cell-conditioned media, urine and CSF samples. Both kits provide high exosomal recovery and seamless integration with different downstream assays.



QIAGEN – Product Profile


Four patents and one patent application on Nanopore Sequencing and methods of trapping a molecule in a nanopore, currently assigned to Genia, should be assigned to UCSC, claims a Lawsuit by The Regents of the University of California

Reporter: Aviva Lev-Ari, PhD, RN


The university claims that while at UCSC Roger Chen’s research focused on nanopore sequencing, and that he along with others developed technology that became the basis of patent applications filed by the university. However, when Chen left the university in 2008 and cofounded Genia, he was awarded patents for technology developed while he was at UCSC, but those patents were assigned to Genia and not the university, according to the suit.

In the suit, the university notes four patents and one patent application assigned to Genia that it claims should be assigned to UCSC: US Patent Nos. 8,324,914; 8,461,854; 9,041,420; and 9,377,437; and US Patent Application 15/079,322. The patents and patent application all relate to nanopore sequencing and specifically to methods of trapping a molecule in a nanopore and characterizing it based on the electrical stimulus required to move the molecule through the pore.

Genia was founded in 2009, and in 2014, Roche acquired the startup for $125 million in cash and up to $225 million in milestone payments. Earlier this year, the company published a proof-of-principle study of its technology in the Proceedings of the National Academy of Sciences.

Roche’s head of sequencing solutions, Neil Gunn, said that Roche would announce a commercialization timeline in 2017.

It’s unclear how the lawsuit will impact that commercialization, but Mick Watson, director of ARK-Genomics at the Roslin Institute in the UK, speculated in a blog post that if the suit is decided in favor of UCSC, it could result in a very large settlement and potentially even the end of Genia.







A New Computational Method illuminates the Heterogeneity and Evolutionary Histories of cells within a Tumor

Reporter: Aviva Lev-Ari, PhD, RN


Start Quote

Numerous computational approaches aimed at inferring tumor phylogenies from single or multi-region bulk sequencing data have recently been proposed. Most of these methods utilize the variant allele fraction or cancer cell fraction for somatic single-nucleotide variants restricted to diploid regions to infer a two-state perfect phylogeny, assuming an infinite-site model such that each site can mutate only once and persists. In practice, convergent evolution could result in the acquisition of the same mutation more than once, thereby violating this assumption. Similarly, mutations could be lost due to loss of heterozygosity. Indeed, both single-nucleotide variants and copy number alterations arise during tumor evolution, and both the variant allele fraction and cancer cell fraction depend on the copy number state whose inference reciprocally relies on the relative ordering of these alterations such that joint analysis can help resolve their ancestral relationship (Figure 1). To tackle this outstanding problem, El-Kebir et al. (2016) formulated the multi-state perfect phylogeny mixture deconvolution problem to infer clonal genotypes, clonal fractions, and phylogenies by simultaneously modeling single-nucleotide variants and copy number alterations from multi-region sequencing of individual tumors. Based on this framework, they present SPRUCE (Somatic Phylogeny Reconstruction Using Combinatorial Enumeration), an algorithm designed for this task. This new approach uses the concept of a “character” to represent the status of a variant in the genome.

Commonly, binary characters have been used to represent single-nucleotide variants, that is, the variant is present or absent. In contrast, El-Kebir et al. use multi-state characters to represent copy number alterations, which may be present in zero, one, two, or more copies in the genome.

SPRUCE outperforms existing methods on simulated data, yielding higher recall rates under a variety of scenarios. Moreover, it is more robust to noise in variant allele frequency estimates, which is a significant feature of tumor genome sequencing data. Importantly, El-Kebir and colleagues demonstrate that there is often an ensemble of phylogenetic trees consistent with the underlying data. This uncertainty calls for caution in deriving definitive conclusions about the evolutionary process from a single solution.

End Quote
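
The two-state perfect phylogeny assumption mentioned in the quote can be illustrated with a small check. The sketch below is not SPRUCE and does not deconvolve bulk-sample mixtures; it assumes the clone genotypes are already known as a binary matrix (a simplification made here for illustration) and tests the classical condition that, with an unmutated root, the sets of clones carrying any two mutations must be nested or disjoint. The function name and the example matrices are hypothetical.

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """Return True if a binary clone-by-mutation matrix fits a two-state perfect phylogeny.

    matrix[i][j] == 1 means clone i carries mutation j. Under the infinite-sites
    assumption (each mutation arises once and is never lost) and an unmutated
    root, the carrier sets of any two mutations must be nested or disjoint.
    """
    n_mutations = len(matrix[0]) if matrix else 0
    # For each mutation (column), collect the set of clones (rows) that carry it.
    carriers = [
        {i for i, row in enumerate(matrix) if row[j] == 1}
        for j in range(n_mutations)
    ]
    for a, b in combinations(carriers, 2):
        overlap = a & b
        # A partial overlap means neither set contains the other: a conflict.
        if overlap and overlap != a and overlap != b:
            return False
    return True


# Hypothetical example: mutations accumulate along a single lineage.
consistent = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]

# Hypothetical example: two mutations each occur alone and also together,
# which would require one of them to arise twice or to be lost.
conflicting = [
    [1, 0],
    [0, 1],
    [1, 1],
]

print(admits_perfect_phylogeny(consistent))
print(admits_perfect_phylogeny(conflicting))
```

The second matrix is the kind of pattern that convergent evolution or loss of heterozygosity could produce, which is why the quoted authors argue for models that go beyond binary characters.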


From Original Paper

Inferring Tumor Phylogenies from Multi-region Sequencing

Zheng Hu1,2 and Christina Curtis1,2,*

1Departments of Medicine and Genetics

2Stanford Cancer Institute

Stanford University School of Medicine, Stanford, CA 94305, USA

*Correspondence: cncurtis@stanford.edu



Crowdsourcing Genetic Data Yields Discovery of DNA loci associated with Major Depressive Disorder (MDD) in European Descendants


Reporter: Kelly Perlman, Life Sciences Student and Research Assistant, McGill University


UPDATED on 11/24/2019

Can AI help diagnose depression? It’s a long shot

At the moment, machine intelligence is just as subjective as human intelligence

Alejandra Canales


Researchers from Pfizer Global Research and Development, 23andMe, and the Massachusetts General Hospital have published a study in Nature Genetics, pinpointing 15 genetic loci associated with the risk of developing major depressive disorder (MDD) in individuals of European ancestry. Evidence from previous research suggests that MDD is heritable, but the details of the specific gene correlates are unclear. The identification of loci where single nucleotide polymorphisms (SNPs) related to MDD exist could provide better insight into the neurobiology of depression, and therefore better treatment options.

23andMe, a private biotechnology company situated in California, offers a DNA genotyping service in which consumers send in a saliva swab for testing, and later receive a report listing the findings of the analysis related to ancestry, physical and behavioral traits, along with risk of inheriting certain diseases. The participants of this study had agreed to provide the results of their genetic testing for scientific research.

The results of 75,607 participants with self-reported diagnoses of depression were compared to the results of 231,747 participants reporting having never experienced depression. These data were combined with the results of previously published MDD genome-wide association studies (GWAS). To test whether these results could be replicated, another set of results from 23andMe was analyzed, which comprised 45,773 MDD subjects and 106,354 controls.

After the joint analysis, 17 SNPs were identified at 15 different loci. Tissue and gene enrichment assays showed that the genes that were over-expressed in the CNS were related to functions including neurodevelopment, histone methylation, neurogenesis and synaptic modification.

The team then created a weighted genetic risk score (GRS) in which they compared the 17 SNPs with factors including medication use, comorbid diseases and behavioral phenotypes, all of which were correlated with the GRS. Of note, the GRS was very highly correlated with age of onset of MDD.
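
As a rough illustration of what a weighted genetic risk score is, the sketch below sums each person's risk-allele count (0, 1, or 2) per SNP, multiplied by a per-SNP weight. This post does not describe the exact weighting used in the study; a common choice is the per-allele effect size from the GWAS, and that is assumed here. The SNP labels, weights, and genotype are made-up placeholders, not values from the paper.

```python
def weighted_genetic_risk_score(allele_counts, weights):
    """Sum of (risk-allele count x weight) over a fixed set of SNPs.

    allele_counts: dict mapping SNP label -> number of risk alleles (0, 1, or 2)
    weights:       dict mapping SNP label -> per-allele weight (assumed effect size)
    """
    if set(allele_counts) != set(weights):
        raise ValueError("allele_counts and weights must cover the same SNPs")
    for snp, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"invalid risk-allele count for {snp}: {count}")
    return sum(weights[snp] * allele_counts[snp] for snp in weights)


# Placeholder weights and one hypothetical participant's genotype.
example_weights = {"snp_a": 0.05, "snp_b": 0.03, "snp_c": 0.04}
example_genotype = {"snp_a": 2, "snp_b": 0, "snp_c": 1}

example_score = weighted_genetic_risk_score(example_genotype, example_weights)
print(round(example_score, 4))
```

A score of this kind, computed per participant, is what can then be correlated with traits such as age of onset.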

The crowdsourcing of genetic data proves to be an efficient and powerful tool for large-scale MDD studies. Pooling large subject databases together is essential in order to account for the heterogeneous nature of the disease. Despite not being able to precisely assess each subject’s disease phenotype, scientists can make more rapid headway by collaborating with biotechnology companies in the quest to better understand the biological mechanisms of depression. Ron Perlis, M.D., M.Sc., of the Massachusetts General Hospital and co-author of this paper explained that “finding genes associated with depression should help make clear that this is a brain disease, which we hope will decrease the stigma still associated with these kinds of illnesses”.


Details on specific significant genes:









Hyde, C. L., Nagle, M. W., Tian, C., Chen, X., Paciga, S. A., Wendland, J. R., . . . Winslow, A. R. (2016). Identification of 15 genetic loci associated with risk of major depression in individuals of European descent. Nature Genetics. doi:10.1038/ng.3623

Major Depressive Disorder Loci Discovered in Large GWAS Enabled by 23andMe Participants’ Data. (2016, August 01). Retrieved August 09, 2016, from https://www.genomeweb.com/microarrays-multiplexing/major-depressive-disorder-loci-discovered-large-gwas-enabled-23andme



Using Online Mendelian Inheritance in Man (OMIM) database and the Human Genome Mutation Database (HGMD) Pro 2015.2 for Quantification of the growth in gene-disease and variant-disease associations

Reporter: Aviva Lev-Ari, PhD, RN


Reanalysis of Clinical Exome Data Over Time Could Yield New Diagnoses

NEW YORK (GenomeWeb) – Clinical exomes that are re-evaluated in a systematic way could yield new diagnoses and prove useful to clinicians, according to a study published yesterday in Genetics in Medicine.

A team of researchers from Stanford University set out to examine whether nondiagnostic clinical exomes could provide new information for patients if they were re-examined with current bioinformatics software and knowledge of disease-related variants as presented in the literature.

Clinical exome sequencing yields no diagnosis for about 75 percent of patients evaluated for possible Mendelian disorders, wrote senior author Gill Bejerano and his colleagues. But a reanalysis of exome and phenotypic data from 40 such individuals using current methods identified a definitive diagnosis for four of them — 10 percent — the team said.

In these cases, the causative variant was de novo and found in a relevant autosomal-dominant disease gene. At the time these exomes were first sequenced, the researchers wrote, the existing literature on these causative genes was either “weak, nonexistent, or not readily located.” When the exomes were re-examined by his team, Bejerano noted, the supporting literature was more robust.



At ACMG, Researchers Report Data Re-Analysis, Matchmaking Boosts Solved Exome Cases

In addition to re-analyzing exome data, the researchers have been working on establishing causality for novel candidate disease genes through patient matches. For this, the team has been using the GeneMatcher website, which allows them to find other clinicians and researchers around the world who have patients, or animal models, with mutations in the same genes as their own patients. Through an API developed by the Matchmaker Exchange project, GeneMatcher submitters can also query the PhenomeCentral and Decipher databases. As of March, more than 4,000 genes had been submitted to GeneMatcher from more than 1,300 submitters in 48 countries, and 1,900 matches had been made, Sobreira reported.

Her team has so far submitted data from 104 families, involving 280 genes, and has had 314 matches so far, involving 113 genes. Several cases have been successes, meaning the researchers could establish that a candidate gene is indeed disease causing, and several others are pending, both from Hopkins and from other groups. The total number of solved cases tracing their success to GeneMatcher is currently unknown, Sobreira said, but the organizers are planning to survey submitters about their success rate in the near future.







New NGS Guidances for Laboratory Developed Tests (LDT): FDA’s Liz Mansfield on Audio Podcast

Reporter: Aviva Lev-Ari, PhD, RN


FDA’s Liz Mansfield on New NGS Guidances

Liz Mansfield, Deputy Office Director for Personalized Medicine at the FDA


0:00 A boss who gets it

1:53 The unique challenge of regulating NGS

7:00 How does the new guidance relate to the recent LDT guidance?

12:02 “We’d like to finalize the LDT guidance.”

On July 6th, as part of the President’s Precision Medicine Initiative, the FDA issued two new draft guidances for the oversight of next gen sequencing (NGS) tests. The first guidance is for using NGS testing to diagnose germline diseases. In the second, the FDA lists guidelines for building and using genetic variant databases.

To help us understand just what the guidance is and what led to its release, we’re joined by Liz Mansfield, the Deputy Office Director for Personalized Medicine at the FDA.

It’s unusual for the FDA to issue guidance around a single technology, but Liz says that NGS is “transformative” and is eclipsing so many of the older technologies. The biggest challenge is that NGS is a technology used for discovery and has the power to test for so many things at once.

How does the new NGS guidance relate to the much talked about guidance on LDTs that came out a couple years ago? And does the new guidance represent a more incremental, step by step approach for the FDA in dealing with the explosion of today’s molecular testing field?

“No, it’s not an attempt to break down into smaller bites the issue on LDTs. It’s to address this particular technology, regardless of who the developer is,” says Liz.

The two guidances are for very specific purposes, and Liz anticipates that further NGS guidances will be issued in the future, for example guidelines for dealing with somatic rather than germline mutations.

Here is a link to the new FDA Guidelines:

Click to access ucm509837.pdf

Use of Public Human Genetic Variant Databases to Support Clinical Validity for Next Generation Sequencing (NGS)-Based In Vitro Diagnostics


Click to access ucm509838.pdf

Use of Standards in FDA Regulatory Oversight of Next Generation Sequencing (NGS)-Based In Vitro Diagnostics (IVDs) Used for Diagnosing Germline Diseases


Source: Mendelspod email, “FDA’s Liz Mansfield on New NGS Guidances,” Tuesday, July 19, 2016

