Track 4 Bioinformatics: Utilizing Massive Quantities of –omic Information across Research Initiatives @ BioIT World, April 29 – May 1, 2014 Seaport World Trade Center, Boston, MA
Reporter: Aviva Lev-Ari, PhD, RN
Bioinformatics for Big Data
10:50 Chairperson’s Remarks
Les Mara, Founder, Databiology, Ltd.
11:00 Data Management Best Practices for Genomics Service Providers
Vas Vasiliadis, Director, Products, Computation Institute,
University of Chicago and Argonne National Laboratory
Genomics research teams in academia and industry are increasingly limited at all stages of their work by large and unwieldy datasets, poor integration between the computing facilities they use for analysis, and difficulty in sharing analysis results with their customers and collaborators. We will discuss issues with current approaches and describe emerging best practices for managing genomics data through its lifecycle.
Vas in REAL TIME
The Computation Institute @ University of Chicago delivers solutions to non-profit entities, at scale and in an affordable way. “I have nothing to say on Big Data.” An NAS survey put a 57.7% figure on the average time researchers spend on research overhead, and it will get worse. Research data management is morphing into better, industrially robust approaches; commercial start-ups are the role model. All functions of an enterprise are now available as applications for small business.
- Highly scalable, invisible
- High performance
- In genomics, tools like shipping a hard drive are giving way to new ways to develop research infrastructure:
- Dropbox does not scale; Amazon Web Services is the cloud
- Security in sharing across campuses: with InCommon, cross-domain software access constraints are mitigated
- Identity provision for multiple identities – an identity hub, where the association is done one time; group hubs, i.e., CI Connect at UChicago, give access to systems at other campuses – connecting science to cycles of data. The network is not utilized efficiently – tools were not designed for that; FTP and firewalls were designed for data, not Big Data
- Science DMZ – carve out real estate for science data transfer, and monitor the transfer
- Reproducibility, provenance, public mandates
- Data publication services: VIVO, figshare, Fedora, DuraCloud, DOI identification; store, preserve, curation workflow
- Search for discovery: faceted search; browse distributed data, access it locally – automation required; outsourcing; delivery through SaaS
- We are all on the cloud
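The managed-transfer model described above (replacing FTP and shipped hard drives) can be sketched as a simple task document submitted to a transfer service. This is a hypothetical sketch, not the actual service API; the endpoint names, field names, and paths are all illustrative assumptions.

```python
import json

def make_transfer_task(src_endpoint, dst_endpoint, items):
    """Compose a transfer-task document in the style of a managed
    transfer service. All field names here are illustrative."""
    return {
        "source_endpoint": src_endpoint,
        "destination_endpoint": dst_endpoint,
        "sync_level": "checksum",      # re-verify file integrity after transfer
        "notify_on_succeeded": True,   # fire-and-forget: the service monitors it
        "data": [
            {"source_path": s, "destination_path": d} for s, d in items
        ],
    }

# Hypothetical endpoint names for a campus-to-campus transfer
task = make_transfer_task(
    "uchicago#sequencing-core",
    "collaborator#analysis-cluster",
    [("/raw/run42/sample.fastq.gz", "/ingest/sample.fastq.gz")],
)
print(json.dumps(task, indent=2))
```

The point of the design is that the researcher declares *what* to move and the service handles retries, integrity checks, and monitoring, rather than babysitting an FTP session through a firewall.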
11:30 NGS Analysis to Drug Discovery: Impact of High-Performance Computing in Life Sciences
Bhanu Rekepalli, Ph.D., Assistant Professor and Research Scientist, Joint Institute for Computational Sciences, The University of Tennessee, Oak Ridge National Laboratory
We are working with small-cluster-based applications most widely used by the scientific community on the world’s premier supercomputers. We incorporated these parallel applications into science gateways with user-friendly, web-based portals. Learn how the research at UTK-ORNL will help to bridge the gap between the rate of big data generation in life sciences and the speed and ease at which biologists and pharmacists can study this data.
Bhanu in REAL TIME
Cost per genome is going down – from $100,000 in 2011 toward $1,000
- Solutions:
- architecture
- parallel informatics
- SW modules
- web-based gateway
- XSEDE.org, sponsored by NSF, serving all NSF-sponsored research
- LCF applications: astrophysics, bioinformatics, CFD; highly scalable wrappers for analysis; BLAST scaling results in biology
- Next-generation supercomputers: Xeon/Phi
NICS Informatics Science gateway – PoPLAR Portal for Parallel Scaling Life Sciences Applications & Research
- automated workflows
- Smithsonian Institution: generating genomes for all living entities, with BGI
- Titan genomic data analysis – Everglades ecosystem, sequenced
- University of South Carolina has great computing infrastructure
- Supercomputer: KRAKEN
- 5-10 proteins modeled on supercomputers for novel drug discovery
- Vascular tree system for heart transplant – visualization and modeling
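The "highly scalable wrappers" approach mentioned above typically starts by splitting a query set across workers so that a tool like BLAST can run in parallel. A minimal sketch of that partitioning step, assuming a plain FASTA input (the function name and round-robin strategy are my own illustration, not the PoPLAR implementation):

```python
def split_fasta(fasta_text, n_chunks):
    """Round-robin FASTA records across n_chunks so each parallel
    worker receives a similar number of query sequences."""
    records, current = [], []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if current:
                records.append("\n".join(current))
            current = [line]
        elif line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)   # distribute records evenly
    return ["\n".join(c) for c in chunks]

fasta = ">q1\nMKV\n>q2\nGHT\n>q3\nLLA\n"
parts = split_fasta(fasta, 2)
# parts[0] holds q1 and q3; parts[1] holds q2
```

Each chunk would then be handed to one node's BLAST instance; the wrapper's job is scheduling these chunks and merging the per-chunk outputs.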
12:00 pm The Future of Biobank Informatics
Bruce Pharr, Vice President, Product Marketing, Laboratory Systems, Remedy Informatics
As biobanks become increasingly essential to basic, translational, and clinical research for genetic studies and personalized medicine, biobank informatics must address areas from biospecimen tracking, privacy protection, and quality management to pre-analytical and clinical collection/identification of study data elements. This presentation will examine specific requirements for third-generation biobanks and how biobank informatics will meet those requirements.
Bruce Pharr in REAL TIME
Flexible Standardization
Biobank use of informatics began in the 1980s with biospecimens. A 1999 RAND study found 307M biospecimens in US biobanks, growing at 20M per year.
2nd-Gen Biobanks
2005 – 3rd-Gen Biobanks – 15,000 studies on cancer biospecimens; consent of donors is a must.
Biobank workflow – patient, procedure, specimen acquisition, storage, processing, distribution, analysis
Building Registries – Mosaic Platform
- Specimen Track BMS,
- Mosaic Ontology: application and Engine
1. standardize specimen requirement
Registries set up the storage: administrator dashboard vs. user dashboard
2. Interoperability
3. Quality analysis
4. Informed Consent
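The requirements above (specimen tracking through the acquisition-to-analysis lifecycle, with informed consent as a gate on distribution) can be sketched as a minimal data model. This is an illustrative sketch only; the class, field names, and stage labels are assumptions, not the Mosaic platform's schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Specimen:
    """Minimal specimen record covering the tracked lifecycle stages:
    acquisition, storage, processing, distribution, analysis."""
    specimen_id: str
    donor_id: str
    consent_on_file: bool
    acquired: date
    stage: str = "storage"

def release_for_study(specimen: Specimen) -> bool:
    """A specimen may only move to distribution if donor consent
    is recorded - consent acts as a hard gate, not a warning."""
    if not specimen.consent_on_file:
        return False
    specimen.stage = "distribution"
    return True

s = Specimen("SP-0001", "D-042", consent_on_file=True, acquired=date(2014, 4, 29))
release_for_study(s)
```

Making consent a precondition in code, rather than a checkbox in a report, is one way a third-generation biobank can enforce the "consent is a must" requirement.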
12:15 Learn How YarcData’s Graph Analytics Appliance Makes It Easy to Use Big Data in Life Sciences
Ted Slater, Senior Solutions Architect, Life Sciences, YarcData, a division of Cray
YarcData, a division of Cray, offers high performance solutions for big data graph analytics at scale, finally giving researchers the power to leverage all the data they need to stratify patients, discover new drug targets, accelerate NGS analysis, predict biomarkers, and better understand diseases and their treatments.
12:40 Luncheon Presentation I
The Role of Portals for Managing Biostatistics Projects at a CRO
Les Jordan, Director, Life Sciences IT Consulting, Quintiles
This session will focus on how portals and other tools are used within Quintiles and at other pharmas to manage projects within the biostatistics department.
1:10 Luncheon Presentation II (Sponsorship Opportunity Available) or Lunch on Your Own
1:50 Chairperson’s Remarks
Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC
Sabrina Molinaro, Ph.D., Head of Epidemiology, Institute of Clinical Physiology, National Research Council (CNR), Italy
1:55 Integration of Multi-Omic Data Using Linked Data Technologies
Aleksandar Milosavljevic, Ph.D., Professor, Human Genetics; Co-Director,
Program in Structural & Computational Biology and Molecular Biophysics;
Co-Director, Computational and Integrative Biomedical Research Center,
Baylor College of Medicine
By virtue of programmatic interoperability (uniform REST APIs), Genboree servers enable virtual integration of multi-omic data that is distributed across multiple physical locations. Linked Data technologies of the Semantic Web provide an additional “logical” layer of integration by enabling distributed queries across the distributed data and by bringing multi-omic data into the context of pathways and other background knowledge required for data interpretation.
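The distributed queries described here are what SPARQL 1.1 calls federated queries: one query pulls triples from several endpoints via SERVICE clauses. A minimal sketch of composing such a query, assuming placeholder endpoint URLs and gene URIs (these are not Genboree's actual endpoints):

```python
def federated_query(endpoints, gene_uri):
    """Compose a SPARQL 1.1 federated query that gathers all triples
    about one gene from several endpoints in a single query.
    Endpoint URLs and the gene URI are illustrative placeholders."""
    services = "\n".join(
        f"  SERVICE <{url}> {{ <{gene_uri}> ?p{i} ?o{i} . }}"
        for i, url in enumerate(endpoints)
    )
    return "SELECT * WHERE {\n" + services + "\n}"

q = federated_query(
    ["http://example.org/methylation/sparql",
     "http://example.org/expression/sparql"],
    "http://example.org/gene/TP53",
)
print(q)
```

The data stays at its physical location; only the query travels, which is what makes the "logical" integration layer possible without copying multi-omic datasets to one site.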
2:25 Building Open Source Semantic Web-Based Biomedical Content Repositories to Facilitate and Speed Up Discovery and Research
Bhanu Bahl, Ph.D., Director, Clinical and Translational Science Centre,
Harvard Medical School
Douglas MacFadden, CIO, Harvard Catalyst at Harvard Medical School
Eagle-i, the open source network at Harvard, provides a state-of-the-art informatics platform for the discovery of research resources.