
Archive for the ‘Next Generation Sequencing (NGS)’ Category


Complex rearrangements and oncogene amplification revealed by long-read DNA and RNA sequencing of a breast cancer cell line

Reporter: Stephen J. Williams, PhD

In a Genome Research report by Marie Nattestad et al. [1], the SK-BR-3 breast cancer cell line was sequenced using a long-read single-molecule sequencing protocol to develop one of the most detailed maps of structural variation in a cancer genome to date.  The authors detected over 20,000 variants with this sequencing modality, most of which would have been missed by short-read sequencing.  In addition, they found a complex sequence of nested duplications and translocations surrounding the ERBB2 (HER2) locus, while full-length transcriptomic analysis revealed novel gene fusions within the nested genomic variants.  The authors suggest that combining long-read genome and transcriptome sequencing gives more comprehensive coverage of tumor gene variants and “sheds new light on the complex mechanisms involved in cancer genome evolution.”

Genomic instability is a hallmark of cancer [2] and leads to numerous genetic variations, such as:

  • Copy number variations
  • Chromosomal alterations
  • Gene fusions
  • Deletions
  • Gene duplications
  • Insertions
  • Translocations

Efforts such as The Cancer Genome Atlas [3] and the International Cancer Genome Consortium (2010) use short-read sequencing technology to detect and analyze thousands of commonly occurring mutations. However, short-read technology has high false-positive and false-negative rates (as high as 50% [4]) for detecting less common structural variations. In addition, short reads cannot detect variations in close proximity to each other or on the same molecule, and therefore underestimate the number of variants.

Methods:  The authors used a long-read sequencing technology from Pacific Biosciences (SMRT) to analyze the mutational and structural variation in the SK-BR-3 breast cancer cell line.  A split-read and within-read mapping approach was used to detect variants of different types and sizes.  In general, long reads align with higher quality than short reads, resulting in higher-quality mapping. Transcriptomic analysis was performed using Iso-Seq.

Results: Using the SMRT long-read sequencing technology from Pacific Biosciences, the authors obtained 71.9× sequencing coverage of the SK-BR-3 genome with an average read length of 9.8 kb.

A few notes:

  1. The most amplified regions were around the locus spanning the ERBB2 oncogene (33.6 copies), the MYC locus (38 copies), the EGFR locus (7 copies) and BCAS1 (16.8 copies)
  2. The locus 8q24.12, which contains the SNTB1 gene, had the highest amplification at 69.2 copies
  3. Long-read sequencing showed more insertions than deletions, which suggests that the human reference genome underestimates the lengths of low-complexity regions
  4. The authors found 1,493 long read variants, 603 of which were between different chromosomes
  5. Using Iso-Seq in conjunction with the long-read platform, they detected 1,692,379 isoforms (93%) mapping to the reference genome and 53 putative gene fusions (39 of which had genomic evidence)
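The proportions implied by notes 4 and 5 are easy to check. The short Python sketch below uses only the counts quoted above; the variable names are illustrative and do not come from the paper.

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts quoted in the notes above
variants_total = 1493            # variants reported in note 4
variants_interchromosomal = 603  # of which between different chromosomes
fusions_total = 53               # putative gene fusions from Iso-Seq (note 5)
fusions_with_dna_evidence = 39   # of which had genomic evidence

interchromosomal_pct = percentage(variants_interchromosomal, variants_total)
fusion_dna_support_pct = percentage(fusions_with_dna_evidence, fusions_total)

print(f"Inter-chromosomal share of note-4 variants: {interchromosomal_pct}%")
print(f"Fusions with genomic evidence: {fusion_dna_support_pct}%")
```

In other words, roughly two in five of the variants in note 4 join different chromosomes, and about three quarters of the putative fusions are backed by DNA-level evidence.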

A table modified from the paper on the gene fusions is given below:

Table 1. Gene fusions with RNA evidence from Iso-Seq and DNA evidence from SMRT DNA sequencing, where the genomic path is found using SplitThreader from Sniffles variant calls. Note: in the original table, each gene name links to its GeneCards entry.

                        SplitThreader path
#   Genes               Distance (bp)  Number of variants  Chromosomes in path  Previously observed in references
1   KLHDC2 SNTB1        9837           3                   14|17|8              Asmann et al. (2011) as only a 2-hop fusion
2   CYTH1 EIF3H         8654           2                   17|8                 Edgren et al. (2011); Kim and Salzberg (2011); RNA only, not observed as 2-hop
3   CPNE1 PREX1         1777           2                   20                   Found and validated as 2-hop by Chen et al. 2013
4   GSDMB TATDN1        0              1                   17|8                 Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011)
5   LINC00536 PVT1      0              1                   8                    No
6   MTBP SAMD12         0              1                   8                    Validated by Edgren et al. (2011)
7   LRRFIP2 SUMF1       0              1                   3                    Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011)
8   FBXL7 TRIO          0              1                   5                    No
9   ATAD5 TLK2          0              1                   17                   No
10  DHX35 ITCH          0              1                   20                   Validated by Edgren et al. (2011)
11  LMCD1-AS1 MECOM     0              1                   3                    No
12  PHF20 RP4-723E3.1   0              1                   20                   No
13  RAD51B SEMA6D       0              1                   14|15                No
14  STAU1 TOX2          0              1                   20                   No
15  TBC1D31 ZNF704      0              1                   8                    Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011); Chen et al. (2013)

 

SplitThreader found two different paths for the RAD51B-SEMA6D gene fusion and for the LINC00536-PVT1 gene fusion. Number of Iso-Seq reads refers to full-length HQ-filtered reads. Alignments of SMRT DNA sequence reads supporting each of these gene fusions are shown in Supplemental Note S2.
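As a rough illustration of how the fusion table can be read, the Python sketch below re-enters its numeric columns and groups the fusions. The `Fusion` record and the two helper predicates are hypothetical names chosen for this example; treating any path with more than one variant as "multi-hop" is an assumption about how to read the "Number of variants" column, not a definition taken from the paper or from SplitThreader.

```python
from typing import NamedTuple, Tuple

class Fusion(NamedTuple):
    gene_a: str
    gene_b: str
    distance_bp: int
    n_variants: int
    chromosomes: Tuple[str, ...]

# Rows transcribed from the table above (reference column omitted)
FUSIONS = [
    Fusion("KLHDC2", "SNTB1", 9837, 3, ("14", "17", "8")),
    Fusion("CYTH1", "EIF3H", 8654, 2, ("17", "8")),
    Fusion("CPNE1", "PREX1", 1777, 2, ("20",)),
    Fusion("GSDMB", "TATDN1", 0, 1, ("17", "8")),
    Fusion("LINC00536", "PVT1", 0, 1, ("8",)),
    Fusion("MTBP", "SAMD12", 0, 1, ("8",)),
    Fusion("LRRFIP2", "SUMF1", 0, 1, ("3",)),
    Fusion("FBXL7", "TRIO", 0, 1, ("5",)),
    Fusion("ATAD5", "TLK2", 0, 1, ("17",)),
    Fusion("DHX35", "ITCH", 0, 1, ("20",)),
    Fusion("LMCD1-AS1", "MECOM", 0, 1, ("3",)),
    Fusion("PHF20", "RP4-723E3.1", 0, 1, ("20",)),
    Fusion("RAD51B", "SEMA6D", 0, 1, ("14", "15")),
    Fusion("STAU1", "TOX2", 0, 1, ("20",)),
    Fusion("TBC1D31", "ZNF704", 0, 1, ("8",)),
]

def is_multi_hop(fusion: Fusion) -> bool:
    """Assumption: a path built from more than one variant counts as multi-hop."""
    return fusion.n_variants > 1

def is_interchromosomal(fusion: Fusion) -> bool:
    """True when the path touches more than one chromosome."""
    return len(set(fusion.chromosomes)) > 1

multi_hop = [f for f in FUSIONS if is_multi_hop(f)]
interchromosomal = [f for f in FUSIONS if is_interchromosomal(f)]

print("Multi-hop fusions:", [f"{f.gene_a}-{f.gene_b}" for f in multi_hop])
print("Inter-chromosomal fusions:", [f"{f.gene_a}-{f.gene_b}" for f in interchromosomal])
```

Under these assumptions, only the first three rows need more than one variant to link the two genes, which is the kind of event a single short read is unlikely to span.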

 

 References

 

  1. Nattestad M, Goodwin S, Ng K, Baslan T, Sedlazeck FJ, Rescheneder P, Garvin T, Fang H, Gurtowski J, Hutton E et al: Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line. Genome research 2018, 28(8):1126-1135.
  2. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100(1):57-70.
  3. Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, Xie M, Zhang Q, McMichael JF, Wyczalkowski MA et al: Mutational landscape and significance across 12 major cancer types. Nature 2013, 502(7471):333-339.
  4. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH et al: An integrated map of structural variation in 2,504 human genomes. Nature 2015, 526(7571):75-81.

 

Other articles on Cancer Genome Sequencing in this Open Access Journal Include:

 

International Cancer Genome Consortium Website has 71 Committed Cancer Genome Projects Ongoing

Loss of Gene Islands May Promote a Cancer Genome’s Evolution: A new Hypothesis on Oncogenesis

Identifying Aggressive Breast Cancers by Interpreting the Mathematical Patterns in the Cancer Genome

CancerBase.org – The Global HUB for Diagnoses, Genomes, Pathology Images: A Real-time Diagnosis and Therapy Mapping Service for Cancer Patients – Anonymized Medical Records accessible to

 



Narrative Building for the Future of LPBI Group: List of Talking Points

 

Exchange between Gail and Aviva

 

On Tuesday, June 25, 2019, 11:43:27 AM EDT, Aviva Lev-Ari <AvivaLev-Ari@alum.berkeley.edu> wrote:

https://www.terarecon.com/blog/beyond-the-screen-episode-6-next-generation-ai-companies-providing-physicians-a-starting-point-in-ai?utm_campaign=AuntMinnie%20June%202019

HOW can we get  Kevin Landwher of terarecon.com to create a Podcast for LPBI Group IP Assets, including a section on our forthcoming Genomics, Volume 2 

https://pharmaceuticalintelligence.com/biomed-e-books/genomics-orientations-for-personalized-medicine/volume-two-genomics-methodologies-ngs-bioinformatics-simulations-and-the-genome-ontology/

In response to this question we are in discussion on POINTS #1,2,3,4

 

From: Gail Thornton <gailsthornton@yahoo.com>

Reply-To: Gail Thornton <gailsthornton@yahoo.com>

Date: Sunday, June 30, 2019 at 8:38 AM

To: Aviva Lev-Ari <aviva.lev-ari@comcast.net>

Cc: Aviva Lev-Ari <AvivaLev-Ari@alum.berkeley.edu>, Rick Mandahl <rmandahl@gmail.com>, Amnon Danzig <amnon.danzig@gmail.com>

Subject: Please AUDIT PODCAST —>>>>>>>> Beyond the Screen Episode 6: Next Generation AI Companies Providing Physicians a Starting Point in AI

Aviva:

These videos from terarecon.com typically focus on one topic (not many as you’ve described below). 

If there are too many topics proposed to this company, they will not be interested.

My recommendation is for you to finalize Genomics, volume 2, and let’s see the story we have about that specific topic.

Gali 

 


 

On Saturday, June 29, 2019, 03:56:08 PM EDT, Aviva Lev-Ari <aviva.lev-ari@comcast.net> wrote:

 

POINT #1 for VIDEO coverage – Focus on Genomics, Volume 2

After 7/15, Prof. Feldman will be back in the US, starting to work on Part 5 in Genomics, Volume 2. We will Skype to discuss what to include in 5.1, 5.2, 5.3, 5.4

On 7/15, I am submitting my work on creation of Parts 1,2,3,4,6

Dr. Williams and Dr. Saha are already working on Parts 7 & 8.

Below you have abbreviated eTOCs.

Go to URL of the Book to see what I placed already inside this book.

Dr. Williams and Prof. Feldman will compose 

Preface

Introduction to Volume 2

Volume Summary

Epilogue

Based on these four parts and the eTOCs you will have ample content for the video, which may start with the epitome of our book creation: Genomics Volume 2 (you interview the three Editors on why it is the Epitome)

POINT #2 or #3 or #4  for VIDEOs to Focus on coverage for Marketing LPBI Group

by DESCRIPTION of what was accomplished

 

  • Venture history/background
  • Venture milestones: all posts in the Journal with the Title
  • “We celebrate …..
  • 5-6 Titles like that, I may add two more
  • Site Statistics
  • Book articles cumulative views (Article Scoring System: Data Extract)
  • section on BioMed e-Series
  • section on List of Conferences covered in Real Time
  • FIT Team input to Venture Valuation: top 5 or top 10 Factors in consensus 
  • the 3D graphs on Opportunity Maps: Gail, Rick, Amnon, Aviva – each explains their own outcome
  • section on Pipeline

Video on What is the Ideal Solution for the FUTURE of LPBI Group

  • Interviews with All FIT Members

For POINT #1:

To build the narrative for a VIDEO dedicated to Genomics, Volume Two and a Marketing campaign for it as a NEW BOOK on NGS, the Narrative will use content extracts to build a CASE for

Why GENOMICS Volume 2 – is the Epitome of all BioMed e-Series???????

 

forthcoming Genomics, Volume 2 

https://pharmaceuticalintelligence.com/biomed-e-books/genomics-orientations-for-personalized-medicine/volume-two-genomics-methodologies-ngs-bioinformatics-simulations-and-the-genome-ontology/

 

Aviva completed Parts 1,2,3,4,6, 

[5 is by Prof. Feldman] 

[7,8 are by Scientists on FIT]:

Latest in Genomics Methodologies for Therapeutics:

Gene Editing, NGS & BioInformatics,

Simulations and the Genome Ontology

 

2019

Volume Two

Prof. Marcus W. Feldman, PhD, Editor

Prof. Stephen J. Williams, PhD, Editor

And

Aviva Lev-Ari, PhD, RN, Editor 

https://pharmaceuticalintelligence.com/biomed-e-books/genomics-orientations-for-personalized-medicine/volume-two-genomics-methodologies-ngs-bioinformatics-simulations-and-the-genome-ontology/

Abbreviated eTOCs

Part 1: NGS

1.1 The Science

1.2 Technologies and Methodologies

1.3 Clinical Aspects

1.4 Business and Legal

 

Part 2: CRISPR for Gene Editing and DNA Repair

2.1 The Science

2.2 Technologies and Methodologies

2.3 Clinical Aspects

2.4 Business and Legal

 

Part 3: AI in Medicine

3.1 The Science

3.2 Technologies and Methodologies

3.3 Clinical Aspects

3.4 Business and Legal

3.5 Latest in Machine Learning (ML) Algorithms harnessed for Medical Diagnosis: Pattern Recognition & Prediction of Disease Onset

 

Part 4: Single Cell Genomics

4.1 The Science

4.2 Technologies and Methodologies

4.3 Clinical Aspects

4.4 Business and Legal

 

Part 5: Evolution Biology Genomics Modeling @Feldman Lab, Stanford University – Written and Curated by Prof. Marc Feldman

5.1

5.2

5.3

5.4

 

Part 6: Simulation Modeling in Genomics

6.1   Mutation Analysis – Gene Encoding

6.2   Mitochondrial Variations

6.3   Variant Analysis

6.4   Variant Detection in Hereditary Cancer Genes

6.5   Immuno-Informatics

6.6   RNA Sequencing

6.7   Complex Insertions and Deletions

6.8   Evolutionary Biology

6.9   Simulation Programs

6.10  A comparison of tools for the simulation of genomic next-generation sequencing data

 

Part 7: Applications of Genomics: Genotypes, Phenotypes and Complex Diseases

7.1 Genome-wide associations with complex diseases (GWAS)

7.2 Non-coding DNA and phenotypes—including diseases like cancer

7.3 Epigenomic associations with phenotypes including cancer

7.4 Rare variants and diseases

7.5 Population-level genomics and the meaning of group differences

7.6 Targeting drugs for complex diseases

 

Part 8: Epigenomics and Genomic Regulation

8.1  Genomic controls on epigenomics

8.2  The ENCODE project and gene regulation

8.3  Small interfering RNAs and gene expression

8.4  Epigenomics in cancer

8.5  Environmental epigenomics



Simulation Tools of Genomic Next Generation Sequencing Data: Comparative Analysis & Genetic Simulation Resources

Reporting: Aviva Lev-Ari, PhD, RN

 

INTRODUCTION

What is next generation sequencing?

Behjati S, Tarpey PS.

Arch Dis Child Educ Pract Ed. 2013 Dec;98(6):236-8. doi: 10.1136/archdischild-2013-304340. Epub 2013 Aug 28. Review.

Computational pan-genomics: status, promises and challenges.

Computational Pan-Genomics Consortium.

Brief Bioinform. 2018 Jan 1;19(1):118-135. doi: 10.1093/bib/bbw089. Review.

Tracking the NGS revolution: managing life science research on shared high-performance computing clusters.

Dahlö M, Scofield DG, Schaal W, Spjuth O.

Gigascience. 2018 May 1;7(5). doi: 10.1093/gigascience/giy028.

NGS IN THE CLINIC

[Clinical Applications of Next-Generation Sequencing].

Rebollar-Vega RG, Arriaga-Canon C, de la Rosa-Velázquez IA.

Rev Invest Clin. 2018;70(4):153-157. doi: 10.24875/RIC.18002544.

PMID:
30067721

Free Article

 

Clinical Genomics: Challenges and Opportunities.

Vijay P, McIntyre AB, Mason CE, Greenfield JP, Li S.

Crit Rev Eukaryot Gene Expr. 2016;26(2):97-113. doi: 10.1615/CritRevEukaryotGeneExpr.2016015724. Review.

Next-generation sequencing in the clinic: promises and challenges.

Xuan J, Yu Y, Qing T, Guo L, Shi L.

Cancer Lett. 2013 Nov 1;340(2):284-95. doi: 10.1016/j.canlet.2012.11.025. Epub 2012 Nov 19. Review.

The Future of Whole-Genome Sequencing for Public Health and the Clinic.

Allard MW.

J Clin Microbiol. 2016 Aug;54(8):1946-8. doi: 10.1128/JCM.01082-16. Epub 2016 Jun 15.

PMID:
27307454

Free PMC Article

 

Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines: A Joint Recommendation of the Association for Molecular Pathology and the College of American Pathologists.

Roy S, Coldren C, Karunamurthy A, Kip NS, Klee EW, Lincoln SE, Leon A, Pullambhatla M, Temple-Smolkin RL, Voelkerding KV, Wang C, Carter AB.

J Mol Diagn. 2018 Jan;20(1):4-27. doi: 10.1016/j.jmoldx.2017.11.003. Epub 2017 Nov 21. Review.

PMID:
29154853

MUTATION ANALYSIS – GENE ENCODING

Next-Generation Sequencing and Mutational Analysis: Implications for Genes Encoding LINC Complex Proteins.

Nagy PL, Worman HJ.

Methods Mol Biol. 2018;1840:321-336. doi: 10.1007/978-1-4939-8691-0_22.

PMID:
30141054

Genome-wide genetic marker discovery and genotyping using next-generation sequencing.

Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML.

Nat Rev Genet. 2011 Jun 17;12(7):499-510. doi: 10.1038/nrg3012. Review.

PMID:
21681211

 

Best practices for evaluating mutation prediction methods.

Rogan PK, Zou GY.

Hum Mutat. 2013 Nov;34(11):1581-2. doi: 10.1002/humu.22401. Epub 2013 Sep 10. No abstract available.

PMID:
23955774

MITOCHONDRIAL VARIATIONS

mit-o-matic: a comprehensive computational pipeline for clinical evaluation of mitochondrial variations from next-generation sequencing datasets.

Vellarikkal SK, Dhiman H, Joshi K, Hasija Y, Sivasubbu S, Scaria V.

Hum Mutat. 2015 Apr;36(4):419-24. doi: 10.1002/humu.22767.

PMID:
25677119

VARIANT ANALYSIS

A survey of tools for variant analysis of next-generation genome sequencing data.

Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z.

Brief Bioinform. 2014 Mar;15(2):256-78. doi: 10.1093/bib/bbs086. Epub 2013 Jan 21.

PMID:
23341494

Free PMC Article

 

Variant callers for next-generation sequencing data: a comparison study.

Liu X, Han S, Wang Z, Gelernter J, Yang BZ.

PLoS One. 2013 Sep 27;8(9):e75619. doi: 10.1371/journal.pone.0075619. eCollection 2013.

VARIANT DETECTION IN HEREDITARY CANCER GENES

ICO amplicon NGS data analysis: a Web tool for variant detection in common high-risk hereditary cancer genes analyzed by amplicon GS Junior next-generation sequencing.

Lopez-Doriga A, Feliubadaló L, Menéndez M, Lopez-Doriga S, Morón-Duran FD, del Valle J, Tornero E, Montes E, Cuesta R, Campos O, Gómez C, Pineda M, González S, Moreno V, Capellá G, Lázaro C.

Hum Mutat. 2014 Mar;35(3):271-7.

PMID:
24227591

 

Development and analytical validation of a 25-gene next generation sequencing panel that includes the BRCA1 and BRCA2 genes to assess hereditary cancer risk.

Judkins T, Leclair B, Bowles K, Gutin N, Trost J, McCulloch J, Bhatnagar S, Murray A, Craft J, Wardell B, Bastian M, Mitchell J, Chen J, Tran T, Williams D, Potter J, Jammulapati S, Perry M, Morris B, Roa B, Timms K.

BMC Cancer. 2015 Apr 2;15:215. doi: 10.1186/s12885-015-1224-y.

Clinical Applications of Next-Generation Sequencing in Cancer Diagnosis.

Sabour L, Sabour M, Ghorbian S.

Pathol Oncol Res. 2017 Apr;23(2):225-234. doi: 10.1007/s12253-016-0124-z. Epub 2016 Oct 8. Review.

PMID:
27722982

 

Studying cancer genomics through next-generation DNA sequencing and bioinformatics.

Doyle MA, Li J, Doig K, Fellowes A, Wong SQ.

Methods Mol Biol. 2014;1168:83-98. doi: 10.1007/978-1-4939-0847-9_6. Review.

PMID:
24870132

IMMUNOINFORMATICS

Immunoinformatics and epitope prediction in the age of genomic medicine.

Backert L, Kohlbacher O.

Genome Med. 2015 Nov 20;7:119. doi: 10.1186/s13073-015-0245-0. Review.

IgSimulator: a versatile immunosequencing simulator.

Safonova Y, Lapidus A, Lill J.

Bioinformatics. 2015 Oct 1;31(19):3213-5. doi: 10.1093/bioinformatics/btv326. Epub 2015 May 25.

PMID:
26007226

 

Computational genomics tools for dissecting tumour-immune cell interactions.

Hackl H, Charoentong P, Finotello F, Trajanoski Z.

Nat Rev Genet. 2016 Jul 4;17(8):441-58. doi: 10.1038/nrg.2016.67. Review.

PMID:
27376489

RNA SEQUENCING

SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines.

Audoux J, Salson M, Grosset CF, Beaumeunier S, Holder JM, Commes T, Philippe N.

BMC Bioinformatics. 2017 Sep 29;18(1):428. doi: 10.1186/s12859-017-1831-5.

PMID:
28969586

Free PMC Article

COMPLEX INSERTIONS AND DELETIONS

INDELseek: detection of complex insertions and deletions from next-generation sequencing data.

Au CH, Leung AY, Kwong A, Chan TL, Ma ES.

BMC Genomics. 2017 Jan 5;18(1):16. doi: 10.1186/s12864-016-3449-9.

PMID:
28056804

Free PMC Article

EVOLUTIONARY BIOLOGY

The State of Software for Evolutionary Biology.

Darriba D, Flouri T, Stamatakis A.

Mol Biol Evol. 2018 May 1;35(5):1037-1046. doi: 10.1093/molbev/msy014. Review.

SIMULATION PROGRAMS

PMCID: PMC5224698
EMSID: EMS70941
PMID: 27320129

Systematic review of next-generation sequencing simulators: computational tools, features and perspectives.

Zhao M, Liu D, Qu H.

Brief Funct Genomics. 2017 May 1;16(3):121-128. doi: 10.1093/bfgp/elw012. Review.

PMID:
27069250

 

A comparison of tools for the simulation of genomic next-generation sequencing data

Online Summary

  1. There is a large number of tools for the simulation of genomic data for all currently available NGS platforms, with partially overlapped functionality. Here we review 23 of these tools, highlighting their distinct functionalities, requirements and potential applications.

  2. The parameterization of these simulators is often complex. The user may decide between using existing sets of parameter values, called profiles, or re-estimating them from their own data.

  3. Parameters that can be modulated in these simulations include the effects of the PCR amplification of the libraries, read features and quality scores, base call errors, variation of sequencing depth across the genomes and the introduction of genomic variants.

  4. Several types of genomic variants can be introduced in the simulated reads, such as SNPs, indels, inversions, translocations, copy-number variants and short-tandem repeats.

  5. Reads can be generated from single or multiple genomes, and with distinct ploidy levels. NGS data from metagenomic communities can be simulated given an “abundance profile” that reflects the proportion of taxa in a given sample.

  6. Many of the simulators have not been formally described and/or tested in dedicated publications. We encourage the formal publication of these tools and the realization of comprehensive, comparative benchmarkings.

  7. Choosing among the different genomic NGS simulators is not easy. Here we provide a guidance tree to help users choose a suitable tool for their specific interests.
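Point 5 above mentions an “abundance profile” for metagenomic simulation. A minimal Python sketch of the idea follows, assuming the profile is simply a mapping from taxon name to relative abundance; the taxon names and the function `reads_per_taxon` are hypothetical and do not correspond to any of the reviewed simulators.

```python
import random
from typing import Dict

def reads_per_taxon(abundance: Dict[str, float], total_reads: int, seed: int = 0) -> Dict[str, int]:
    """Assign each simulated read to a taxon, weighted by relative abundance."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    taxa = list(abundance)
    weights = [abundance[taxon] for taxon in taxa]
    counts = {taxon: 0 for taxon in taxa}
    for taxon in rng.choices(taxa, weights=weights, k=total_reads):
        counts[taxon] += 1
    return counts

# Hypothetical community: proportions of three taxa in the sample
profile = {"taxon_A": 0.6, "taxon_B": 0.3, "taxon_C": 0.1}
counts = reads_per_taxon(profile, total_reads=10_000)
print(counts)
```

A real simulator would then draw each read from the genome of the selected taxon; this sketch stops at the allocation step, which is the part the abundance profile controls.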

Abstract

Computer simulation of genomic data has become increasingly popular for assessing and validating biological models or to gain understanding about specific datasets. Multiple computational tools for the simulation of next-generation sequencing (NGS) data have been developed in recent years, which could be used to compare existing and new NGS analytical pipelines. Here we review 23 of these tools, highlighting their distinct functionality, requirements and potential applications. We also provide a decision tree for the informed selection of an appropriate NGS simulation tool for the specific question at hand.

Image source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224698/

An overview of current NGS technologies

The most popular NGS technologies on the market are Illumina’s sequencing by synthesis, which is probably the most widely used platform at present, Roche’s 454 pyrosequencing (454), SOLiD sequencing-by-ligation (SOLiD), IonTorrent semiconductor sequencing (IonTorrent), Pacific Biosciences’ (PacBio) single molecule real-time sequencing, and Oxford Nanopore Technologies (Nanopore) single-cell DNA template strand sequencing. These strategies can differ, for example, regarding the type of reads they produce or the kind of sequencing errors they introduce (Table 1). Only two of the current technologies (Illumina and SOLiD) are capable of producing all three sequencing read types: single end, paired end and mate pair. Read length is also dependent on the machine and the kit used; in platforms like Illumina, SOLiD, or IonTorrent it is possible to specify the number of desired base pairs per read. According to the sequencing run type selected it is possible to obtain reads with maximum lengths of 75 bp (SOLiD), 300 bp (Illumina) or 400 bp (IonTorrent). On the other hand, in platforms like 454, Nanopore or PacBio, information is only given about the mean and maximum read length that can be obtained, with average lengths of 700 bp, 10 kb and 15 kb and maximum lengths of 1 kb, 10 kb and 15 kb, respectively. Error rates vary depending on the platform from <=1% in Illumina to ~30% in Nanopore. Further overviews and comparisons of NGS strategies are cited in the source review.

Table 1

Main characteristics of current NGS technologies.

Technology  Run Types Marked (of Single-read, Paired-end, Mate-pair)  Maximum Read Length    Quality Scores  Error Rates
Illumina    X X X                                                     300 bp                 > Q30           0.0034 – 1%
SOLiD       X X X                                                     75 bp                  > Q30           0.01 – 1%
IonTorrent  X X                                                       400 bp                 ~ Q20           1.78%
454         X X                                                       ~700 bp (up to 1 Kb)   > Q20           1.07 – 1.7%
Nanopore    X                                                         5.4 – 10 Kb            NAY             10 – 40%
PacBio      X                                                         ~15 Kb (up to 40 Kb)   < Q10           5 – 10%
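The quality scores in Table 1 are Phred scores, which relate to the per-base error probability by the standard definition Q = -10 log10(p). That formula is general background rather than something stated in the excerpt; the Python sketch below simply converts between the two scales, with function names chosen for illustration.

```python
import math

def phred_to_error_probability(q: float) -> float:
    """Per-base error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p: float) -> float:
    """Phred quality score implied by a per-base error probability."""
    return -10 * math.log10(p)

for q in (10, 20, 30):
    print(f"Q{q}: error probability {phred_to_error_probability(q):.4f}")
```

Read this way, a Q30 threshold corresponds to about one error in 1,000 bases, Q20 to one in 100, and Q10 to one in 10, which is broadly in line with the error-rate column of the table.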

Simulation parameters

The existing sequencing platforms use distinct protocols that result in datasets with different characteristics. Many of these attributes can be taken into account by the simulators (Fig. 2), although there is not a single tool that incorporates all possible variations. The main characteristics of the 23 simulators considered here are summarized in Tables 2 and 3. These tools differ in multiple aspects, such as sequencing technology, input requirements or output format, but maintain several common aspects. With some exceptions, all programs need a reference sequence, multiple parameter values indicating the characteristics of the sequencing experiment to be simulated (read length, error distribution, type of variation to be generated, if any, etc.) and/or a profile (a set of parameter values, conditions and/or data used for controlling the simulation), which can be provided by the simulator or estimated de novo from empirical data. The outcome will be aligned or unaligned reads in different standard file formats, such as FASTQ, FASTA or BAM. An overview of the NGS data simulation process is represented in Fig. 3. In the following sections we delve into the different steps involved.
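To make the inputs and outputs described above concrete, here is a deliberately minimal read simulator in Python. It is a toy sketch and not a reimplementation of any of the 23 reviewed tools: it assumes a single reference string, a fixed read length, a uniform substitution error rate, and a constant placeholder quality string, whereas real simulators use platform-specific error and quality profiles.

```python
import random
from typing import List

BASES = "ACGT"

def simulate_reads(reference: str, n_reads: int, read_length: int,
                   error_rate: float, seed: int = 0) -> List[str]:
    """Return FASTQ records for single-end reads sampled uniformly from the reference."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        read = list(reference[start:start + read_length])
        for pos, base in enumerate(read):
            # Uniform substitution errors only (assumption; no indels)
            if rng.random() < error_rate:
                read[pos] = rng.choice([b for b in BASES if b != base])
        quality = "I" * read_length  # placeholder: Phred 40 in Phred+33 encoding
        records.append(f"@read_{i}_start_{start}\n{''.join(read)}\n+\n{quality}")
    return records

reference = "ACGTTGCAAGGCTTACCGATGCATGCCGTA"  # hypothetical reference sequence
fastq_records = simulate_reads(reference, n_reads=5, read_length=10, error_rate=0.01)
print("\n".join(fastq_records))
```

The four arguments map directly onto the inputs the review lists (reference sequence, number of reads, read length, error rate), and the output is unaligned reads in FASTQ format.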

Fig. 2. General overview of the sequencing process and steps that can be parameterized in the simulations.

NGS simulators try to imitate the real sequencing process as closely as possible by considering all the steps that could influence the characteristics of the reads. a | NGS simulators do not take into account the effect of the different DNA extraction protocols in the resulting data. However, they can consider whether the sample we want to sequence includes one or more individuals, from the same or different organisms (e.g., pool-sequencing, metagenomics). Pools of related genomes can be simulated by replicating the reference sequence and introducing variants on the resulting genomes. Some tools can also simulate metagenomes with distinct taxa abundance. b | Simulators can try to mimic the length range of DNA fragmentation (empirically obtained by sonication or digestion protocols) or assume a fixed amplicon length. c | Library preparation involves ligating sequencing-platform-dependent adaptors and/or barcodes to the selected DNA fragments (inserts). Some simulators can control the insert size, and produce reads with adaptors/barcodes. d | Most NGS techniques include an amplification step for the preparation of libraries. Several simulators can take this step into account (for example, by introducing errors and/or chimaeras), with the possibility of specifying the number of reads per amplicon. e | Sequencing runs imply a decision about coverage, read length, read type (single-end, paired-end, mate-pair) and a given platform (with their specific errors and biases). Simulators exist for the different platforms, and they can use particular parameter profiles, often estimated from real data.
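The coverage decision mentioned in step e reduces to simple arithmetic: mean coverage is roughly the number of reads times the read length, divided by the genome size. The Python sketch below applies that standard relationship; the genome size, read length and target coverage in the example are hypothetical values chosen for illustration.

```python
import math

def reads_needed(genome_size: int, read_length: int, coverage: float,
                 paired: bool = False) -> int:
    """Number of reads (or read pairs, if paired) needed for a target mean coverage."""
    bases_per_unit = read_length * (2 if paired else 1)
    return math.ceil(coverage * genome_size / bases_per_unit)

# Hypothetical experiment: 5 Mb genome, 150 bp reads, 30x target coverage
single_end = reads_needed(5_000_000, 150, 30)
paired_end = reads_needed(5_000_000, 150, 30, paired=True)
print(f"Single-end reads: {single_end}, read pairs: {paired_end}")
```

This is only the mean; as the text notes, real runs also show variation of sequencing depth across the genome, which simulators model separately.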

Fig. 3. General overview of NGS simulation.

The simulation process begins with the input of a reference sequence (most cases) and simulation parameters. Some of the parameters can be given via a profile, that is estimated (by the simulator or other tools) from other reads or alignments. The outcome of this process may be reads (with or without quality information) or genome alignments in different formats.
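As a rough idea of what estimating a profile from other reads can mean, the Python sketch below computes the mean quality score at each read position from FASTQ text. It assumes four-line FASTQ records and Phred+33 encoding; real simulators estimate much richer profiles (error types, quality distributions, insert sizes), so this is only one simple component, and the function name is hypothetical.

```python
from typing import List

def mean_quality_per_position(fastq_text: str, offset: int = 33) -> List[float]:
    """Mean Phred quality at each read position, from four-line FASTQ records."""
    lines = fastq_text.strip().splitlines()
    quality_lines = lines[3::4]  # every fourth line, starting at the first quality line
    totals: List[int] = []
    counts: List[int] = []
    for quality in quality_lines:
        for pos, char in enumerate(quality):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]

# Two toy reads of different lengths ('I' = Q40, '5' = Q20 in Phred+33)
example_fastq = "@r1\nACG\n+\nII5\n@r2\nAC\n+\nI5\n"
quality_profile = mean_quality_per_position(example_fastq)
print(quality_profile)
```

A simulator could then sample qualities for new reads around these per-position means, which is the sense in which a profile drives the simulation.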

CONCLUSIONS

NGS is having a big impact in a broad range of areas that benefit from genetic information, from medical genomics, phylogenetic and population genomics, to the reconstruction of ancient genomes, epigenomics and environmental barcoding. These applications include approaches such as de novo sequencing, resequencing, target sequencing or genome reduction methods. In all cases, caution is necessary in choosing a proper sequencing design and/or a reliable analytical approach for the specific biological question of interest. The simulation of NGS data can be extremely useful for planning experiments, testing hypotheses, benchmarking tools and evaluating particular results. Given a reference genome or dataset, for instance, one can play with an array of sequencing technologies to choose the best-suited technology and parameters for the particular goal, possibly optimizing time and costs. Yet, this is still not the standard practice and researchers often base their choices on practical considerations like technology and money availability. As shown throughout this Review, simulation of NGS data from known genomes or transcriptomes can be extremely useful when evaluating assembly, mapping, phasing or genotyping algorithms, exposing their advantages and drawbacks under different circumstances.

Altogether, current NGS simulators consider most, if not all, of the important features regarding the generation of NGS data. However, they are not problem-free. The different simulators are largely redundant, implementing the same or very similar procedures. In our opinion, many are poorly documented and can be difficult to use for non-experts, and some of them are no longer maintained. Most importantly, for the most part they have not been benchmarked or validated. Remarkably, among the 23 tools considered here, only 13 have been described in dedicated application notes, 3 have been mentioned as add-ons in the methods section of bigger articles, and 5 have never been referenced in a journal. Indeed, peer-reviewed publication of these tools in dedicated articles would be highly desirable. While this would not definitively guarantee quality, at least it would encourage authors to reach minimum standards in terms of validation, benchmarking, and documentation. Collaborative efforts like the Assemblathon or iEvo (http://www.ievobio.org/) might also be a source of inspiration. Meanwhile, we hope that the decision tree presented in Fig. 1 helps users make appropriate choices.




Accelerating Clinical Next-Generation Sequencing: Navigating the Path to Reimbursement

Reporter: Aviva Lev-Ari, PhD, RN

Session at PMWC 2018 Silicon Valley

http://www.pmwcintl.com/sessionthemes-accelerating-clinical-next-generation-sequencing-2018sv/



QIAGEN – International Leader in NGS and RNA Sequencing

Reporter: Aviva Lev-Ari, PhD, RN

 

The reader is encouraged to review all the products of QIAGEN on the company web site.

miRCURY Exosome Kits

For enrichment of exosomes and other extracellular vesicles from serum/plasma or cell/urine/CSF samples
  • Excellent recovery of exosomes and other extracellular vesicles
  • Easy and straightforward protocol that takes less than 2 hours
  • No ultracentrifugation or phenol/chloroform steps required
  • Fully compatible with the miRCURY LNA miRNA PCR System
  • Suited for a variety of applications, such as miRNA or RNA profiling

miRCURY Exosome Kits enable high-quality and scalable exosome isolation with an easy protocol that does not require special laboratory equipment. The miRCURY Exosome Serum/Plasma Kit is optimized for serum and plasma samples, while the miRCURY Exosome Cell/Urine/CSF Kit is designed for processing cell-conditioned media, urine and CSF samples. Both kits provide high exosomal recovery and seamless integration with different downstream assays.

SOURCE

https://www.qiagen.com/us/shop/sample-technologies/tumor-cells-and-exosomes/mircury-exosome-kits/#orderinginformation

QIAGEN – Product Profile



Four patents and one patent application on nanopore sequencing and methods of trapping a molecule in a nanopore, assigned to Genia, are claimed in a lawsuit by The Regents of the University of California to belong to UCSC

Reporter: Aviva Lev-Ari, PhD, RN

 

The university claims that Roger Chen’s research at UCSC focused on nanopore sequencing, and that he, along with others, developed technology that became the basis of patent applications filed by the university. After Chen left the university in 2008 and cofounded Genia, he was awarded patents for technology developed while he was at UCSC, but those patents were assigned to Genia and not the university, according to the suit.

In the suit, the university notes four patents and one patent application assigned to Genia that it claims should be assigned to UCSC: US Patent Nos. 8,324,914; 8,461,854; 9,041,420; and 9,377,437; and US Patent Application 15/079,322. The patents and the patent application all relate to nanopore sequencing, and specifically to methods of trapping a molecule in a nanopore and characterizing it based on the electrical stimulus required to move the molecule through the pore.

Genia was founded in 2009, and in 2014, Roche acquired the startup for $125 million in cash and up to $225 million in milestone payments. Earlier this year, the company published a proof-of-principle study of its technology in the Proceedings of the National Academy of Sciences.

Roche’s head of sequencing solutions, Neil Gunn, said that Roche would announce a commercialization timeline in 2017.

It’s unclear how the lawsuit will impact that commercialization, but Mick Watson, director of ARK-Genomics at the Roslin Institute in the UK, speculated in a blog post that if the suit is decided in favor of UCSC, it could result in a very large settlement and potentially even the end of Genia.

 

SOURCE

https://www.genomeweb.com/sequencing/university-california-files-suit-against-genia-cofounder

http://www.opiniomics.org/university-of-california-makes-legal-move-against-roger-chen-and-genia/

 



A New Computational Method Illuminates the Heterogeneity and Evolutionary Histories of Cells within a Tumor

Reporter: Aviva Lev-Ari, PhD, RN

 

Start Quote

Numerous computational approaches aimed at inferring tumor phylogenies from single or multi-region bulk sequencing data have recently been proposed. Most of these methods utilize the variant allele fraction or cancer cell fraction for somatic single-nucleotide variants restricted to diploid regions to infer a two-state perfect phylogeny, assuming an infinite-site model such that each site can mutate only once and persists. In practice, convergent evolution could result in the acquisition of the same mutation more than once, thereby violating this assumption. Similarly, mutations could be lost due to loss of heterozygosity. Indeed, both single-nucleotide variants and copy number alterations arise during tumor evolution, and both the variant allele fraction and cancer cell fraction depend on the copy number state whose inference reciprocally relies on the relative ordering of these alterations such that joint analysis can help resolve their ancestral relationship (Figure 1). To tackle this outstanding problem, El-Kebir et al. (2016) formulated the multi-state perfect phylogeny mixture deconvolution problem to infer clonal genotypes, clonal fractions, and phylogenies by simultaneously modeling single-nucleotide variants and copy number alterations from multi-region sequencing of individual tumors. Based on this framework, they present SPRUCE (Somatic Phylogeny Reconstruction Using Combinatorial Enumeration), an algorithm designed for this task. This new approach uses the concept of a “character” to represent the status of a variant in the genome.

Commonly, binary characters have been used to represent single-nucleotide variants— that is, the variant is present or absent. In contrast, El-Kebir et al. use multi-state characters to represent copy number alterations, which may be present in zero, one, two, or more copies in the genome.

SPRUCE outperforms existing methods on simulated data, yielding higher recall rates under a variety of scenarios. Moreover, it is more robust to noise in variant allele frequency estimates, which is a significant feature of tumor genome sequencing data. Importantly, El-Kebir and colleagues demonstrate that there is often an ensemble of phylogenetic trees consistent with the underlying data. This uncertainty calls for caution in deriving definitive conclusions about the evolutionary process from a single solution.

End Quote
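
To make the two-state perfect phylogeny assumption in the quoted passage concrete, here is a minimal Python sketch. It is not SPRUCE and not the authors' code. It assumes the binary genotype of each clone is already known (in practice these have to be deconvolved from bulk variant allele fractions) and that the root is a mutation-free normal cell. The function name `admits_perfect_phylogeny` and the toy matrices are hypothetical. Under the infinite-sites model each mutation arises once and is never lost, so for any two mutations the sets of clones carrying them must be nested or disjoint.

```python
from itertools import combinations


def admits_perfect_phylogeny(genotypes):
    """Check a binary clone-by-mutation matrix against a rooted two-state perfect phylogeny.

    genotypes: list of rows, one per clone; entry 1 means the clone carries the mutation.
    Assumption: the root is mutation-free (all zeros), i.e. a normal ancestral cell.
    """
    if not genotypes:
        return True
    n_mutations = len(genotypes[0])
    # For each mutation, collect the set of clones that carry it.
    carriers = [
        {clone for clone, row in enumerate(genotypes) if row[m] == 1}
        for m in range(n_mutations)
    ]
    # Infinite sites: any two carrier sets must be nested or disjoint.
    for a, b in combinations(carriers, 2):
        overlap = a & b
        if overlap and overlap != a and overlap != b:
            return False
    return True


# Linear evolution: each clone adds one mutation to its parent.
consistent = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]

# One clone has only A, one has only B, one has both:
# A or B must have arisen twice (or been lost), which breaks the assumption.
convergent = [
    [1, 0],
    [0, 1],
    [1, 1],
]

print(admits_perfect_phylogeny(consistent))  # True
print(admits_perfect_phylogeny(convergent))  # False
```

The second matrix is the kind of conflict that convergent evolution or loss of heterozygosity can produce. Copy number states need more than two values per character, which is why the quoted passage turns to multi-state characters.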

 

From Original Paper

Inferring Tumor Phylogenies from Multi-region Sequencing

Zheng Hu (1,2) and Christina Curtis (1,2,*)

(1) Departments of Medicine and Genetics

(2) Stanford Cancer Institute

Stanford University School of Medicine, Stanford, CA 94305, USA

*Correspondence: cncurtis@stanford.edu

http://dx.doi.org/10.1016/j.cels.2016.07.007

