Posts Tagged ‘Bioinformatics methodologies’


Bioinformatics Tool Review: Genome Variant Analysis Tools

Curator: Stephen J. Williams, Ph.D.

 

The following post will be an ongoing curation of reviews of gene variant bioinformatic software.

 

The Ensembl Variant Effect Predictor.

McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, Flicek P, Cunningham F.

Genome Biol. 2016 Jun 6;17(1):122. doi: 10.1186/s13059-016-0974-4.

Author information

1 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. wm2@ebi.ac.uk.

2 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

3 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. fiona@ebi.ac.uk.

Abstract

The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.

 

Rare diseases can be difficult to diagnose because of their low incidence and the incomplete penetrance of implicated alleles; however, variant analysis of whole genome sequencing (WGS) data can identify the underlying genetic events responsible for the disease (Nature, 2015).  Many WGS association studies nevertheless require a large cohort to achieve enough statistical power for interpretation (see post and here).  To this end, major sequencing projects have been initiated worldwide.

A more thorough curation of sequencing projects can be seen in the following post:

Icelandic Population Genomic Study Results by deCODE Genetics come to Fruition: Curation of Current genomic studies

 

Although sequencing costs have fallen dramatically over the years, the cost of determining the functional consequences of the resulting variants remains high. Thorough basic research studies must be conducted to validate the interpretation of variant data with respect to the underlying disease, because only a small fraction of the variants found in a genome sequencing project are expected to have a functional effect on a protein.  Correctly annotating sequences and variants, and identifying the correct corresponding reference genes or transcripts in GENCODE or RefSeq, remain significant challenges in deciding which sequenced variants are potentially functional.

To address these challenges, the authors developed the Ensembl Variant Effect Predictor (VEP), a software suite that annotates and analyzes most types of genomic variation in coding and non-coding regions of the genome.

Summary of Features

  • Annotation: VEP can annotate two broad categories of genomic variants
    • Sequence variants with specific and defined changes: indels, base substitutions, SNVs, tandem repeats
    • Larger structural variants (> 50 nucleotides)
  • Species and assembly/genomic database support: VEP can analyze data from any species with an assembled genome sequence and an annotated gene set. VEP supports chromosome assemblies such as the latest GRCh38, FASTA sequence files, transcripts from RefSeq, and user-derived sequences
  • Transcript Annotation: VEP includes a wide variety of gene and transcript related information including NCBI Gene ID, Gene Symbol, Transcript ID, NCBI RefSeq ID, exon/intron information, and cross reference to other databases such as UniProt
  • Protein Annotation: Protein-related fields include Protein ID, RefSeq ID, SwissProt, UniParc ID, reference codons and amino acids, SIFT pathogenicity score, protein domains
  • Noncoding Annotation: VEP reports variants in noncoding regions including genomic regulatory regions, intronic regions, and transcription factor binding motifs. Data from ENCODE, BLUEPRINT, and the NIH Roadmap Epigenomics project are used for primary annotation.  Plugins for the Perl code are also available to link other databases which annotate noncoding sequence features.
  • Frequency, phenotype, and citation annotation: VEP searches Ensembl databases containing a large amount of germline variant information and checks variants against the dbSNP single nucleotide polymorphism database. VEP integrates with mutational databases such as COSMIC and the Human Gene Mutation Database, and with structural and copy number variants from the Database of Genomic Variants.  Allele frequencies are reported from 1000 Genomes and NHLBI, and VEP integrates with PubMed for literature annotation.  Phenotype information comes from OMIM, Orphanet, and GWAS, and clinical information on variants comes from ClinVar.
  • Flexible Input and Output Formats: VEP supports the input data format called “variant call format” or VCF, a standard in next-gen sequencing. VEP can also process variant identifiers from other database formats.  Output formats are tab-delimited and give the user choices in presentation of results (HTML or text based)
  • Choice of user interface
    • Online tool (VEP Web): simple point and click; incorporates Instant VEP functionality and copy and paste features. Results can be stored online in cloud storage on Ensembl.
    • VEP script: VEP is available as a downloadable Perl script (see below for link) and can process large amounts of data rapidly. This interface is highly flexible, with the ability to integrate multiple plugins available from Ensembl and GitHub.  Because users can alter the Perl code and add plugins and code functions, any feature of VEP can be modified.
    • VEP REST API: provides robust programmatic access from any programming language and returns basic variant annotation. Can make use of external plugins.
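
As a rough illustration of the VCF input and REST interface described in the list above, the following Python sketch turns a single-nucleotide VCF record into the region notation accepted by the public Ensembl REST server (the `vep/:species/region` endpoint at rest.ensembl.org) and builds the request URL. The helper names (`vcf_snv_to_region`, `vep_region_url`, `fetch_vep`) are hypothetical and not part of VEP, the example coordinates are purely illustrative, and indels and multi-allelic records are deliberately left out. It is a minimal sketch, not the authors' implementation.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"  # public Ensembl REST server


def vcf_snv_to_region(vcf_line):
    """Convert one VCF data line describing an SNV into VEP region notation.

    Assumption: only single-base REF/ALT records are handled; indels and
    multi-allelic records need different coordinate handling.
    """
    chrom, pos, _id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    bases = {"A", "C", "G", "T"}
    if ref.upper() not in bases or alt.upper() not in bases:
        raise ValueError("this sketch handles single-nucleotide variants only")
    # Ensembl uses chromosome names without the "chr" prefix.
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    # Region notation: chromosome:start-end:strand/alternate_allele
    return f"{chrom}:{pos}-{pos}:1/{alt.upper()}"


def vep_region_url(region, species="human"):
    """Build the GET URL for the VEP region endpoint."""
    return f"{ENSEMBL_REST}/vep/{species}/region/{region}"


def fetch_vep(url):
    """Send the request and decode the JSON reply (requires network access)."""
    request = urllib.request.Request(
        url,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative VCF record: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
    line = "chr9\t22125503\t.\tG\tC\t.\t.\t."
    print(vep_region_url(vcf_snv_to_region(line)))
```

Calling `fetch_vep` on the printed URL should return a list of annotation records as JSON; the exact fields depend on the Ensembl release and the options passed, so they are best inspected rather than assumed.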

 

 

Watch Video on VEP Instructional Webinar: https://youtu.be/7Fs7MHfXjWk

Watch Video on VEP Web Version training on How to Analyze Your Sequence in VEP

 

 

Availability of data and materials

The dataset supporting the conclusions of this article is available from Illumina’s Platinum Genomes [93] and using the Ensembl release 75 gene set. Pre-built data sets are available for all Ensembl and Ensembl Genomes species [94]. They can also be downloaded automatically during set up whilst installing the VEP.

 

References

Large-scale discovery of novel genetic causes of developmental disorders.

Deciphering Developmental Disorders Study.

Nature. 2015 Mar 12;519(7542):223-8. doi: 10.1038/nature14135. PMID: 25533962

Other articles related to Genomics and Bioinformatics in this online Open Access Journal include:

Finding the Genetic Links in Common Disease: Caveats of Whole Genome Sequencing Studies

 

Large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes

 

US Personalized Cancer Genome Sequencing Market Outlook 2018 –

 

Icelandic Population Genomic Study Results by deCODE Genetics come to Fruition: Curation of Current genomic studies

 

 



Recognitions for Contributions in Genomics by Dan David Prize Awards

Reporter: Aviva Lev-Ari, PhD, RN

 

The Source for this List is a Search for “Genomics” on the Dan David Prize website

http://www.dandavidprize.org/component/finder/search?q=Genomics&Itemid=101

This is a compilation of all Dan David Prizes awarded in the Field of Genomics

When Will Genomics Cure Cancer?
A conversation with the biogeneticist ERIC S. LANDER [2012 laureate] about how genetic advances are transforming medical treatment “Eric S. Lander, one of the leaders of the Human Genome Project, a map of the 3 billion letters of DNA that make up a…

J. Craig Venter
Founder, Chairman, and President of the J. Craig Venter Institute, Rockville, MD and La Jolla, CA, USA and CEO of Synthetic Genomics Inc., La Jolla, CA, USA.

David Botstein
Anthony B. Evnin Professor of Genomics; Director, Lewis-Sigler Institute for Integrative Genomics; Director, Certificate Program in Quantitative and Computational Biology, Princeton University, Princeton, NJ, USA.

Laureates Announced 2012
Dan David Prize 2012 Laureates Announced: Robert Conquest, Martin Gilbert – for Biography/History; William Kentridge – for Plastic Arts; David Botstein, Craig Venter, Eric Lander – for Genome Research. Tel Aviv (February 27, 2012) – The international Dan…

Cutting Edge Genomic Research in the World’s First Carbon-Neutral Laboratory Facility
J. CRAIG VENTER, 2012 laureate, is Founder, Chairman, and President of the J. Craig Venter Institute, Rockville, MD and La Jolla, CA, USA and CEO of Synthetic Genomics Inc., La Jolla, CA, USA. “One of our quests is to help solve two troubling issues —…

Prof. David Haussler
Prof. David Haussler is a Distinguished Professor of Biomolecular Engineering at the University of California, Santa Cruz, and Scientific Director of the UC Santa Cruz Genomics Institute.

Eric Lander
Founding Director, Broad Institute Harvard and MIT and director of its Genome Biology Program, Cambridge, MA, USA.

Future – Bioinformatics
Bioinformatics is a field in which mathematics, statistics, and computer algorithms are harnessed towards novel biological discoveries. Bioinformatics methodologies have revolutionized biology, by making it more quantitative and less descriptive….

J. CRAIG VENTER – Life at the Speed of Light
The Dawn of an Era In his NEW BOOK ‘Life at the Speed of Light: From the Double Helix to the Dawn of Digital Life’ J. CRAIG VENTER, 2012 laureate, explains the coming era of discovery (see Wired interview below). What is the significance of Venter’s…

From the Press : Hebrew
The Marker, June 14, 2012 – Dan David Prize: The Next Generation Calcalist, June 14, 2012 – Dan David Prize Awarded: Thoughts of Creating Life, Boycotting Scientists, Protests, Entrepreneurs and Ceremonies Ma’ariv, June 12, 2012 – Who Attended the Dan…

Gary Ruvkun
Professor of Genetics, Department of Molecular Biology, Massachusetts General Hospital, Harvard University. Gary Ruvkun has made a major contribution to the future of human health with the discovery of conserved hormonal signaling pathways with…

Selected Fields 2012
Past – HISTORY / BIOGRAPHY Biography is an important sub-discipline of history. Every progressive society makes room for achievement and excellence. Since ancient times, this has been done by immortalizing the names of heroes, role models and…

Prof. Michael S. Waterman
Prof. Michael S. Waterman is Professor of Biological Sciences, of Mathematics, of Computer Science, Department of Biological Sciences, University of Southern California.

 SOURCE
Other related articles published in this Open Access Online Scientific Journal include the following:

2013 Genomics: The Era Beyond the Sequencing of the Human Genome: Francis Collins, Craig Venter, Eric Lander, et al.

Curator: Aviva Lev-Ari, PhD, RN

https://pharmaceuticalintelligence.com/2013/02/11/2013-genomics-the-era-beyond-the-sequencing-human-genome-francis-collins-craig-venter-eric-lander-et-al/
