Genome Analysis Toolkit (GATK) the Industry Standard will govern the New Tools in Biomedical Research by the Collaboration of Broad Institute and Intel

April 6, 2016 by 2012pharmaceutical

Curator: Aviva Lev-Ari, PhD, RN

In 2015 we published the following Genomics-related e-Books on Amazon.com

Metabolic Genomics and Pharmaceutics, on Amazon since 7/21/2015

http://www.amazon.com/dp/B012BB0ZF0

Cancer Biology & Genomics for Disease Diagnosis, on Amazon since 8/11/2015

http://www.amazon.com/dp/B013RVYR2K

Genomics Orientations for Personalized Medicine, on Amazon since 11/23/2015

http://www.amazon.com/dp/B018DHBUO6

Milestones in Physiology: Discoveries in Medicine, Genomics and Therapeutics, on Amazon.com since 12/27/2015

http://www.amazon.com/dp/B019VH97LU

Cardiovascular, Volume Two: Cardiovascular Original Research: Cases in Methodology Design for Content Co-Curation, on Amazon since 11/30/2015

http://www.amazon.com/dp/B018Q5MCN8

Cardiovascular Diseases, Volume Three: Etiologies of Cardiovascular Diseases: Epigenetics, Genetics and Genomics, on Amazon since 11/29/2015

http://www.amazon.com/dp/B018PNHJ84

Cardiovascular Diseases, Volume Four: Regenerative and Translational Medicine: The Therapeutics Promise for Cardiovascular Diseases, on Amazon since 12/26/2015

http://www.amazon.com/dp/B019UM909A

In 2016 we are working on e-Publishing

Series C: e-Books on Cancer & Oncology

Volume 2: Cancer Therapies: Metabolic, Genomics, Interventional, Immunotherapy and Nanotechnology in Therapy Delivery

Authors, Curators and Editors:

Larry H Bernstein, MD, FCAP and Stephen J Williams, PhD

and

Series B: e-Books on Genomics & Medicine

Volume 2: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS & BioInformatics, Simulations and the Genome Ontology

Editors: Stephen J Williams, PhD and TBA

Work-in-Progress

We plan to conduct several interviews with Intel and Broad Institute. Contents of these interviews will be included in this forthcoming Volume on NGS, Series B Volume 2.

In this e-Book we plan to cover BioIT efforts at the following Research Centers:

Collaborative Cancer Cloud – aka Intel and OHSU’s precision medicine network
Oregon Health & Science University (OHSU)
Dana-Farber Cancer Institute (DFCI)
Ontario Institute for Cancer Research (OICR)
Cloudera

Our Open Access Online Scientific Journal, PharmaceuticalIntelligence.com, had the following site statistics as of 4/8/2016, 11:50 AM EST:

940,549 e-Readers and
4,470 Scientific Articles, most of them classified in more than one of the
485 research categories that make up the Journal Ontology

790 Articles on Cancer Biology and Novel Therapeutics

732 Articles on Genome Biology

580 Articles on Personalized & Precision Medicine

345 Articles on Genomic Testing

508 Articles on BioMarkers and Diagnostics

400 Articles on Computational Biology

110 Articles on BioIT and Big Data

We have covered CHI’s BioIT conference in Boston for several years in a row:

http://pharmaceuticalintelligence.com/press-coverage/

This morning, 4/6/2016, the Broad Institute and Intel announced that they’re working together to develop new tools aimed at accelerating biomedical research. The tools are integrated with GATK (the industry-standard Genome Analysis Toolkit) to simplify and speed up genomic research, and are already being used across collective data sets from Oregon Health & Science University (OHSU), Dana-Farber Cancer Institute (DFCI) and the Ontario Institute for Cancer Research (OICR), as part of Intel and OHSU’s precision medicine network called the Collaborative Cancer Cloud. See full press release below.

The tools include:

Cromwell – designed to launch genomic pipelines on private or public clouds in a portable and reproducible manner. Broad is working with Intel to extend Cromwell’s capabilities to support multiple input languages and execute on multiple back ends simultaneously, enabling researchers to run jobs anywhere.
GenomicsDB – a novel way to store vast amounts of patient variant data and to perform fast processing with unprecedented scalability. Built and optimized for the management of genomic variant data, GenomicsDB runs on top of an array database system optimized for sparse data called ‘TileDB,’ developed by MIT and Intel researchers.

Also announced at BioIT World, the Broad Institute has teamed up with

AWS,
Cloudera,
Google,
IBM,
Intel, and
Microsoft

to enable cloud-based access to the GATK. The GATK Best Practices pipeline will be available to users of cloud service providers through a software-as-a-service (SaaS) mechanism, expanding access beyond traditional desktop solutions in hopes of fueling new insights into disease and treatment. The pipeline is optimized to work with both Cromwell and GenomicsDB, with a primary focus on variant discovery and genotyping.

Cloudera, Broad Institute Collaborate on the Next Generation of the Genome Analysis Toolkit

Built on Cloudera Enterprise with Apache Spark as the Bioinformatics Standard, GATK4 Designed to Speed Genomic Research

PALO ALTO, Calif., April 06, 2016 — Cloudera, the global provider of the fastest, easiest, and most secure data management and analytics platform built on Apache Hadoop and the latest open source technologies, today announced a collaboration with the Broad Institute of MIT and Harvard, the world’s leading biomedical and genomic research center. The two organizations are working together this year to advance the development of Broad’s next generation Genome Analysis Toolkit, GATK4.

Cloudera Enterprise accelerates life sciences research and drug discovery by putting real-time data into the hands of the clinicians, researchers, and providers focused on personalizing the patient experience. Building the fourth generation of GATK (GATK4) on Cloudera Enterprise and utilizing the Spark distributed computing framework to speed research, the Broad Institute is facilitating better understanding of genomic sequencing, resulting in faster data exploration and ultimately empowering better clinical decisions.

Since the Human Genome Project produced the first draft sequence of the human genome in 2000, the cost of sequencing has dropped exponentially, from around $100 million USD per genome to around $1,000 USD today. Over the same period, we have seen massive growth in the storage and processing capabilities of big data technologies like Hadoop.

“This lower cost of genome sequencing and advancement in big data technologies means that we can afford to sequence the genome of patients very broadly and produce datasets that have never been available before,” said Shawn Dolley, industry leader of life sciences at Cloudera. “Building the next generation toolkit on Spark greatly accelerates in-memory computations and facilitates parallelism. Cloudera Enterprise expedites round-trips to access and compute data for data discovery, translating into significant reductions in R&D time. This will have a very meaningful scientific upside.”

Presently there are more than 31,000 registered users of the GATK. Broad Institute is working with collaborators to develop cloud-hosted options to expand access and facilitate usage of genome analysis tools for even more powerful insights and decision-making. Users could also more easily create best-practice pipelines and avoid duplicating infrastructures.

“Utilizing the Spark computing framework on Cloudera Enterprise gives us the ability to implement tools that were not possible in GATK3 due to their computational complexity,” said Dr. Eric Banks, senior director of Data Sciences and Data Engineering at Broad and a creator of the GATK software package. “On Cloudera Enterprise, we can now run analysis of genomic data two orders of magnitude faster than in previous versions of GATK, enabling faster iterative analysis for propelling genomic innovation.”

About Cloudera

Cloudera delivers the modern data management and analytics platform built on Apache Hadoop and the latest open source technologies. The world’s leading organizations trust Cloudera to help solve their most challenging business problems with Cloudera Enterprise, the fastest, easiest and most secure data platform available for the modern world. Our customers efficiently capture, store, process and analyze vast amounts of data, empowering them to use advanced analytics to drive business decisions quickly, flexibly and at lower cost than has been possible before. To ensure our customers are successful, we offer comprehensive support, training and professional services. Learn more at cloudera.com.

SOURCE

http://www.cloudera.com/about-cloudera/press-center/press-releases/2016-04-06-Cloudera-Broad-Institute-Collaborate-on-the-Next-Generation-of-the-Genome-Analysis-Toolkit.html

Broad Institute, Intel work together to develop tools to accelerate biomedical research

CAMBRIDGE, Mass. April 6, 2016 – Intel Corporation and the Broad Institute of MIT and Harvard will announce today at the Bio-IT World Conference & Expo that they are co-developing new tools, and advancing fundamental capabilities, so large genomic workflows can run at cloud scale.

Today Broad Institute is also announcing collaborations with cloud providers to enable cloud-based access to its Genome Analysis Toolkit (GATK) software package. This is expected to expand access to the GATK Best Practices pipeline. The new tools Broad is developing with Intel aim to simplify the execution of large genomic workflows such as GATK, and to improve the storage, scalability, and processing of genomic data. This has the potential to not only speed variant detection and biomarker discovery, but enable discoveries that would not have been detected with smaller cohorts.

Broad’s workflow execution engine, called “Cromwell”, is designed to launch genomic pipelines on private or public clouds in a portable and reproducible manner. Broad is working with Intel to extend Cromwell’s capabilities to support multiple input languages and execute on multiple back ends simultaneously, enabling researchers to run jobs anywhere.

This integrated workflow engine has built-in intelligence capable of finding the optimal way to execute tasks, the most appropriate hardware resources to run those tasks on, and methods to avoid redundant steps. “Orchestrating genomic workflows at cloud scale is complex,” said Dr. Eric Banks, Senior Director of Data Sciences and Data Engineering at Broad and a creator of the GATK software package. “We wanted to simplify the execution of common genomic data types like reads and variants and to create an environment that allows any researcher to do this at scale in an easy-to-use way.”
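One way to picture the "avoid redundant steps" idea described above is a runner that remembers each task's result under a fingerprint of the task name and its inputs, and reuses that result when the same task is requested again. The Python sketch below is illustrative only: `CachingRunner`, `count_variants`, and the toy genotype list are hypothetical names and data introduced here, and this is not Cromwell's actual implementation or API.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class CachingRunner:
    """Toy task runner that skips a task when identical inputs were seen before."""

    def __init__(self) -> None:
        self._cache: Dict[str, Any] = {}
        self.executed = 0  # how many tasks actually ran (cache misses)

    @staticmethod
    def _key(task_name: str, inputs: Dict[str, Any]) -> str:
        # Fingerprint the task name plus its inputs in a stable order.
        payload = json.dumps({"task": task_name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, task_name: str, func: Callable[..., Any], inputs: Dict[str, Any]) -> Any:
        key = self._key(task_name, inputs)
        if key not in self._cache:
            # First time we see this task with these inputs: do the work.
            self._cache[key] = func(**inputs)
            self.executed += 1
        return self._cache[key]


def count_variants(calls):
    """Hypothetical task: count genotype calls that differ from the reference."""
    return sum(1 for call in calls if call != "0/0")


runner = CachingRunner()
task_inputs = {"calls": ["0/0", "0/1", "1/1"]}
first = runner.run("count_variants", count_variants, task_inputs)
second = runner.run("count_variants", count_variants, task_inputs)  # served from cache

print(first, second, runner.executed)
```

The second call returns the stored result without running the task again, so `runner.executed` stays at 1. A production engine would also need to fingerprint file contents and tool versions, which this sketch leaves out.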

Another area of joint innovation is in the processing and storing of genomic variant datasets, which often consist of large, sparse data matrices. Gene sequence variation data is commonly stored as text files for bioinformatics. The declining cost of DNA sequencing has driven an increase in the volume of genomic data sets that researchers want to incorporate, making it increasingly difficult to jointly analyze large volumes of data from text files. Large scale reads and writes of variant call data, joint genotyping, or variant recalibration require next-generation databases that are built and optimized for genomic data.

Broad and Intel are collaborating on a faster, more flexible, and scalable solution. ‘GenomicsDB’ is a novel way to store vast amounts of patient variant data and to perform fast processing with unprecedented scalability. Built and optimized for the management of genomic variant data, GenomicsDB runs on top of an array database system optimized for sparse data called ‘TileDB.’

TileDB was developed by MIT and Intel researchers working at the Intel Science and Technology Center for Big Data, which is based at MIT’s Computer Science and Artificial Intelligence Lab. GenomicsDB is now used in the Broad’s production pipeline running on an Intel® Xeon® processor based cloud environment to perform joint genotyping.
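The point about variant data forming large, sparse matrices can be illustrated with a small Python sketch that stores only the cells that differ from the reference genotype. Everything here is an assumption made for illustration: the `SparseVariantMatrix` class, the `"0/0"` reference notation, and the sample and position indices are hypothetical, and the code does not reflect how GenomicsDB or TileDB actually lay out or query data.

```python
from typing import Dict, Tuple


class SparseVariantMatrix:
    """Toy samples-by-positions matrix that stores only non-reference calls."""

    def __init__(self, n_samples: int, n_positions: int, reference: str = "0/0") -> None:
        self.n_samples = n_samples
        self.n_positions = n_positions
        self.reference = reference
        # Only cells that differ from the reference genotype are kept.
        self._cells: Dict[Tuple[int, int], str] = {}

    def set(self, sample: int, position: int, genotype: str) -> None:
        if not (0 <= sample < self.n_samples and 0 <= position < self.n_positions):
            raise IndexError("cell outside the matrix")
        if genotype == self.reference:
            # Reference calls are implicit, so drop any stored value.
            self._cells.pop((sample, position), None)
        else:
            self._cells[(sample, position)] = genotype

    def get(self, sample: int, position: int) -> str:
        # Missing cells fall back to the reference genotype.
        return self._cells.get((sample, position), self.reference)

    def stored_cells(self) -> int:
        return len(self._cells)

    def density(self) -> float:
        # Fraction of the full matrix that is actually stored.
        return len(self._cells) / (self.n_samples * self.n_positions)


matrix = SparseVariantMatrix(n_samples=1_000, n_positions=1_000_000)
matrix.set(0, 12_345, "0/1")
matrix.set(7, 12_345, "1/1")
matrix.set(7, 999_999, "0/1")

print(matrix.get(7, 12_345))   # a stored variant call
print(matrix.get(3, 500))      # an implicit reference call
print(matrix.stored_cells(), matrix.density())
```

In this toy case the matrix has one billion logical cells but stores only three of them, which is the kind of saving a sparse layout aims for when most samples match the reference at most positions.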

“The time it now takes to perform the variant discovery process went from eight days to 18 hours,” Banks said. “However, that’s with 100 whole genomes. We routinely process projects with thousands of samples, so that speedup itself is truly transformative. We recently needed to abandon our attempt to run variant discovery on an eight thousand sample project because we estimated it would take 90 days without GenomicsDB. With GenomicsDB, however, it should take under a week. This means we can say ‘yes’ to our researchers far more often, on far more ambitious projects.”
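As a rough check on the figures quoted above, the implied speedups can be worked out directly. The short Python sketch below uses only the numbers in the quote; treating "under a week" as exactly seven days is an assumption, so the second figure is a lower bound, and the function and variable names are introduced here for illustration.

```python
HOURS_PER_DAY = 24


def speedup(before_hours: float, after_hours: float) -> float:
    """Return how many times faster the new runtime is than the old one."""
    return before_hours / after_hours


# 100 whole genomes: eight days before, 18 hours after.
small_project = speedup(8 * HOURS_PER_DAY, 18)

# 8,000 samples: an estimated 90 days before, "under a week" after.
# Using a full seven days makes this a lower bound on the speedup.
large_project = speedup(90 * HOURS_PER_DAY, 7 * HOURS_PER_DAY)

print(f"100 genomes: about {small_project:.1f}x faster")
print(f"8,000 samples: at least {large_project:.1f}x faster")
```

Both cases work out to a speedup of roughly an order of magnitude: about 10.7x for the 100-genome run and at least 12.9x for the 8,000-sample estimate.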

“With the integration of these two tools into the genomic pipeline that we are running on a cloud environment, the orchestration and execution of the workflow is not only simplified but significantly accelerated,” said Ben Neale, an institute member at the Broad Institute’s Stanley Center for Psychiatric Research and the Broad’s Program in Medical and Population Genetics. “We are excited that the research community will be able to start testing GenomicsDB and Cromwell.” Intel is releasing TileDB and GenomicsDB as open source tools today.

Engineers building the ‘Collaborative Cancer Cloud’, a precision medicine network including Oregon Health Sciences University (OHSU), Dana-Farber Cancer Institute (DFCI), and Ontario Institute for Cancer Research (OICR) are already using these tools across their collective data sets. Long-term goals are to expand upon these tools to enable joint genotyping with other large genomic research centers in a federated and secure model, regardless of the location of data.

Broad will continue to work with Intel on next-generation computing technologies that address the size, speed, security and scalability challenges associated with large scale genomic sequencing data and analytics. “The progress that we’re seeing in our development work with Broad represents another step in the moonshot goal of taming cancer and other maladies,” said Eric Dishman, Intel Vice President, Health and Life Sciences. “Harnessing and analyzing massive amounts of genomic data may eventually be a key factor in enabling people around the world to live longer, healthier lives.”

Original announcement took place on 3/20/2014

http://www.bio-itworld.com/2014/3/20/broad-intel-announce-speed-improvements-gatk-powered-by-intel-optimizations.html

Intel’s precision medicine goal of “All in One Day by 2020”

April 6, 2016

Intel partners with Broad Institute to help beat cancer

Intel’s Collaborative Cancer Cloud is helping medical researchers analyse vast amounts of genomic data

http://www.alphr.com/intel/1003133/intel-partners-with-broad-institute-to-help-beat-cancer

August 19, 2015

Intel’s Cancer Cloud uses Big Data to fight disease

Intel is helping medical researchers to share important patient data

ABOUT OHSU

Dana-Farber Cancer Institute and Ontario Institute for Cancer Research join Collaborative Cancer Cloud

03/31/16 Portland, Ore.

Two leading cancer centers join effort to securely share genomic, imaging and clinical data to better understand the root causes of cancer and accelerate potentially lifesaving discoveries

The Knight Cancer Institute at Oregon Health & Science University and Intel Corporation are expanding participation in the Collaborative Cancer Cloud, a distributed precision medicine analytics platform, to include Dana-Farber Cancer Institute and Ontario Institute for Cancer Research. The institutions will join the OHSU Knight Cancer Institute in leveraging Intel’s technology to securely share and analyze their collectively large amounts of data, while preserving the privacy and security of patient data at each site.

The Collaborative Cancer Cloud combines Intel technologies and bioscience advancements to enable solutions that make it easier, faster and more affordable for developers, researchers and clinicians to determine how hundreds, even thousands of genes interact to drive disease in individual patients. The cancer cloud is designed to scale to unprecedented volumes of data and allows for secure, aggregated computation across distributed sites without loss of local control of the data, ensuring an institution’s ability to maintain proper custody of its datasets and protecting patient privacy and any institutional intellectual property that may result.

Engineers and scientists have come together to form a dynamic, new type of team that is developing novel hardware and software technologies optimized for current precision medicine analytics but engineered for longevity – with inherent flexibility to work with future computer platforms, data standards and analytics solutions.

The Collaborative Cancer Cloud’s unique technical capabilities are what attracted the two research institutions to join the collaboration.

“Through Dana-Farber’s ‘Profile’ project, we have created one of the world’s largest databases of genetic abnormalities that drive cancer, with over 15,000 genetic profiles of patients’ tumors, adding about 400 each month to the database,” said Barrett Rollins, M.D., chief scientific officer, Dana-Farber Cancer Institute. “We are excited to be part of the Collaborative Cancer Cloud and are convinced that the innovative data sharing structure developed by Intel and our academic partners will accelerate the delivery of better treatments to our patients.”

https://www.ohsu.edu/xd/about/news_events/news/2016/03-31-dana-farber-cancer-insti.cfm

Leaders in Pharmaceutical Business Intelligence Group, LLC, Doing Business As LPBI Group, Newton, MA