
Tweets & Retweets by @pharma_BI and @AVIVA1950 at #BioIT20, 19th Annual Bio-IT World 2020 Conference, October 6-8, 2020 in Boston


Virtual Conference coverage in Real Time: Aviva Lev-Ari, PhD, RN


An amazing conference ended at 2 PM on October 8, 2020

e-Proceedings 19th Annual Bio-IT World 2020 Conference, October 6-8, 2020 Boston



Review Tweets and Retweets


#BioIT20 Plenary Keynote: a cutting-edge, innovative approach to #Science. #Game On: How #AI, #CitizenScience and #HumanComputation are facilitating the next leap forward in #Genomics and in #Biology, and maybe in #PrecisionMedicine in the future @pharma_BI @AVIVA1950 pic.twitter.com/L52qktkeYc


NIH Office of Data Science Strategy

We’ve made progress with #FAIRData, but we still have a ways to go and our future is bright. #BioIT20 #NIHData



Aviva Lev-Ari


Driving Scientific Discovery with Data / Digitization: great ideas shared by moderator Timothy Gardner

#CEO inspiration from history: Total Quality implementation is key for BioScience data. #AI won’t solve the problem, #Data #Quality will


Rob Lalonde

My #BioIT20 talk, “#Bioinformatics in the #Cloud Age,” is tomorrow at 3:30pm. I discuss cloud migration trends in life sciences and #HPC. Join us! A panel with


follows the talk.


Jean Marois

My team is participating in Bio-IT World Virtual 2020, October 6-8. Join me! Use discount code 20NUA to save 20%! invt.io/1tdbae9s8lp


I’m going to Bio-IT World 2020, Oct 6-8, from home! It’s a virtual event. Join me!

NIH Office of Data Science Strategy

One of the challenges we face today: we need an algorithm that can search across the 36+ PB of Sequence Read Archive (SRA) data now in the cloud. Imagine what we could do! #BioIT20 #NIHdata #SRAdata



NCBI Staff

NCBI’s virtual #BioIT20 booth will open in 15 minutes. There, you can watch videos, grab some flyers and even speak with an expert! bio-itworld.pathable.co/organizations/ The booth will close at 4:15 PM, but we’ll be back tomorrow, Oct 7 and Thursday, Oct 8 at 9AM.


Happening soon at #BioIT20: Join our faculty inventor Professor Rich Head’s invited talk “CompBio: An Augmented Intelligence System for Comprehensive Interpretation of Biological Data.”

Wendy Anne Warr

This was a good discussion
Cambridge Innovation
RT percayai: We’ve put together what’s sure to be a thought-provoking discussion group for #BioIT20 “Why Current Approaches Using #AI in #…

Cambridge Innovation

RT VishakhaSharma_: Excited to speak and moderate a panel on Emerging #AI technologies bioitworld #BioIT20

Titian Software

Meet Titian at #BioIT20 on 6-8th October and discover the latest research, science and solutions for exploring the world of precision medicine and the technologies that are powering it: bit.ly/2GjCj4B




Thanks for joining us, Wendy! You’ve done a great job summing up key points from the discussion. #BioIT20

Aviva Lev-Ari

#NIHhealthInitiative #BioItWorld20

Outstanding Plenary Keynote on #DataScience

CONNECTED DATA ECOSYSTEM: FAIR (Findable, Accessible, Interoperable, Reusable)




e-Proceedings 19th Annual Bio-IT World 2020 Conference, October 6-8, 2020 Boston




October 6, 2020

  • Susan Gregurick


    Associate Director for Data Science

  • Connected Data Ecosystem – Project is FAIR
  • Data shareable
  • NIH agenda on data: diverse data sets, including MRI images and images of cells, organs, and communities
  • Share images and link them to tables
  • METADATA for 34 PB to enable search: moving data to the cloud for large-scale analysis
  • Sequence Read Archive (SRA) – DNA seq.
  • COVID-19 sequence data from around the world in SRA in the cloud, enabled by partnerships
  • Open Science – enhance SW tools for making research cloud-ready
  • NIH has 12 Centers: Genomics, Neuro-imaging
  • SCH – Smart & Connected Health
  • IT, Sensor system hardware, effective usability, medical interpretation, Transformative data Science
  • Cancer, Alzheimer’s, Genomics, Medical Imaging, Brain circuits
  • Coding it Forward: students join NIH virtually from home through the Civic Digital Fellowship
  • COVID-19: repositories of data for researchers:
  1. Treatment for Interventions
  2. Long term Sequelae
  3. Clinical platforms: BioData Catalyst, All of Us, ADSO, National COVID Cohort Collaborative (N3C)
  4. Across platforms: workflow after the August deployment of RAS (Researcher Auth Service): passports for researchers to access data faster, privacy-preserving tokens, interoperability across clinical COVID databases
  5. Linking super-rich metadata to other new data sources is a challenging issue to solve across studies

Scott Parker

Sinequa Corp

Director of Product Marketing

  • Disconnect between R&D & IT
  • Intelligent search applications for sensitive information: Sinequa is a leader
  • A single shared index: cost per document goes down and productivity increases

Rebecca Baker


Dir HEAL Initiative

  • HEAL (Helping to End Addiction Long-term) Initiative: 20 NIH institutes and centers collaborating on studies
  • National overdose deaths from opioid drugs, including synthetic fentanyl
  • Heroin, Cocaine, Methamphetamine
  • Overdoses increased during the COVID-19 pandemic
  • Increase in drug use overall and 67% of Fentanyl
  • Chronic pain: 25 million people have severe daily pain and can’t go to work
  • $500 Million/year Sustained Research Investment 25+ HEAL Research Programs
  • HEAL Initiative: pain management, translating research, new prevention, enhanced outcomes for affected newborns, novel medication options, pre-clinical translational research in pain management
  • Improving treatments for opioid misuse & addiction
  • Many people with opioid use disorder do not receive treatment: justice community, collaborative, ER, pregnant mothers
  • Medication-based treatment: patients do not stay long enough to achieve long-term recovery
  • People experience pain differently (muscular, neurological): biomarkers, endpoints, signatures; test non-addictive treatments for specific pains
  • Pain control: balancing the risks of long-term opioid therapy
  • HEAL research: exposure to opioids in utero affects infants’ brain growth; babies are born with withdrawal syndromes
  • Diversity of Data under HEAL Initiative –>> Harmonize the data
  • Common Data Elements in HEAL Clinical Research in Pain Management
  • CORE CDE & Supplemental CDE
  • Making HEAL Data FAIR: Findable, Accessible, Interoperable, Reusable
  • Link HEAL data with community studies; predict behaviours
  • Data sharing made available to the public
  • HEAL Data Lifecycle
  • Effect of changes in dosage: if the data are not collected, then we are not able to explore the relationships
  • Use the data to advance research beyond the current understanding of the problem
  • #NIHhealthInitiative


Ari Berman

BioTeam Inc

Chief Executive Officer

  • Distributed Questions from the Audience to the speakers

10:00 AM – 11:25 AM EDT on Tuesday, October 6

How to Hold on to Your Knowledge in an Agile World

Etzard Stolte

Roche Pharma

Global Head

October 7, 2020

The Chicagoland COVID-19 Commons: A Regional Data Commons Powering Research to Support Public Health Efforts

  • Matthew Trunnell

    VP & Chief Data Officer

9:00 AM – 9:20 AM EDT on Wednesday, October 7

  • Seattle & COVID – samples from Seattle Flu Study
  • Public health practice vs. research on data from human subjects: avoid diluting the control
  • Chicagoland COVID-19 Data Commons – in Chicago
  1. Neighborhood level in Chicago
  2. common data model
  3. Powers predictive modeling efforts: case rate, total confirmed cases, deaths
  4. Legal agreement of the Consortium
  5. https://chicagoland.pandemic
  • Commons: resources held in common, not-for-profit
  • Data commons: cloud-based software platforms that co-locate data, computing infrastructure, and applications
  • Level 1: Basic; Level 2: Repeatable; Level 3: Governance; Level 4: Interoperability; Level 5: Sustainable
  • COVID-19 data commons: public health authorities collect data that are not available to the research community
  • The research community needs access to public health authorities
  • Regional COVID-19 data commons, rationale: public health decisions are LOCAL, specific to the region
  • Fundraising in the communities
  • Data 1: Clinical data from health care: summary of incidence; signals of ethnic dependencies and comorbidities
  1. Safe Harbor (HIPAA): removal of 18 identifiers
  2. Expert Determination
  • Data 2: Public data: environmental
  • Data 3: Resident-reported data on iPhones, with multiple languages supported: early reports of people feeling unwell
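Safe Harbor de-identification works by stripping identifier fields before data are shared. A minimal Python sketch of the idea, assuming each record is a dictionary with hypothetical field names; it is deliberately simplified (it does not handle ZIP codes, or the extra rule that a date whose year reveals an age over 89 must also be aggregated) and is not the Chicagoland commons' actual pipeline:

```python
from datetime import date

# Hypothetical field names standing in for the direct-identifier categories
# of the HIPAA Safe Harbor method (names, contact details, record numbers...).
DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "photo",
}


def safe_harbor_sketch(record: dict) -> dict:
    """Return a de-identified copy of `record` (simplified illustration).

    - direct identifier fields are dropped
    - any date is reduced to its year
    - ages over 89 are aggregated into a single "90+" category
    """
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # remove the identifier entirely
        if isinstance(value, date):  # also matches datetime objects
            out[key] = value.year
        elif key == "age" and isinstance(value, int) and value > 89:
            out[key] = "90+"
        else:
            out[key] = value
    return out


record = {"name": "Jane Doe", "age": 93,
          "admit_date": date(2020, 4, 2), "diagnosis": "COVID-19"}
print(safe_harbor_sketch(record))
# {'age': '90+', 'admit_date': 2020, 'diagnosis': 'COVID-19'}
```

The alternative route listed above, Expert Determination, replaces fixed rules like these with a statistician's assessment of re-identification risk.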

CompBio: An Augmented Intelligence System for Comprehensive Interpretation of Biological Data

Richard Head

Washington Univ

Prof & Dir Genome Technology Access Ctr

9:20 AM – 9:40 AM EDT on Wednesday, October 7

  • Formatting, data scrubbing
  • Replace data fabric with simplified version
  • Create a “Memory Model”: machine learning classifies patterns
  • Dimensions are the variables
  • Hyper-dimensional: ingestion of abstracts and articles
  • Example: IL6: aggregate memories to create a NORMALIZED aggregate memory
  • Relationships explored
  • Complex Knowledge Patterns Generated by the PCMM: Compared Utilization
  • Augmented AI System: Combination PCMM with AI
  • Literature mining CompBio
  • Evidence of Utility: PCMM – Accepted or Published Research Leveraging PCMM Applications
  • Example 1: Cell metabolism with CompBio: a person formulates the hypothesis
  • Example 2: RNA-Seq analysis of a rare mutational subtype of GBM
  1. Hypothesis –>> BioExplorer –>> Multiple relations revealed
  • Example 3: Animal Models to Human Disease: CompBio – Crohn’s Assertion Engine

Summary – Augmented AI Platform for Biological Discovery

  • PCMM: Memory model, hyperdimensional
  • AAI Infrastructure
  • Knowledge map libraries
  • In development: medical discoveries

PercayAI Team: commercial development

Kingdom Capital


Precision Cancer Medicine

  • Jeffrey Rosenfeld

    Rutgers Univ

    Asst Prof

9:40 AM – 10:00 AM EDT on Wednesday, October 7


  1. Hereditary cancer sequencing – BRCA
  2. Tumor cancer sequencing
  • Panel sizes (500-1000x): the bigger the panel, the more computational time and the more data that need to be investigated
  1. Hotspot Panels,
  2. Gene Panels,
  3. Exomes
  • Cell free DNA Testing – Liquid biopsy
  1. Apoptosis
  2. Necrosis
  • FoundationOne
  • Patient results: ALL mutations found, mutation burden
  • Gene EGFR – no mutation
  • For every mutation, which therapy is recommended among approved drugs
  • Clinical Trials for the mutations
  • VARIANTS of unknown significance
  • WORKFLOW: many MDs send a sample and get a 38-page report
  • Genomic Classification and Prognosis in AML: Mutations subset and therapies available
  • Paradigm Shift in Classification
  1. 2013 – Lung Adenocarcinoma
  2. 2011 – another cancer
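Mutation burden in a report like this is commonly expressed as somatic mutations per megabase of sequenced territory, which is one reason panel size matters. A minimal sketch, assuming that definition; real assays restrict which variant classes are counted, and the helper name is hypothetical:

```python
def tumor_mutational_burden(somatic_mutations: int, panel_megabases: float) -> float:
    """Somatic mutations per megabase of sequenced territory (hypothetical helper)."""
    if panel_megabases <= 0:
        raise ValueError("panel size must be a positive number of megabases")
    return somatic_mutations / panel_megabases


# e.g. 30 qualifying somatic mutations found by a 1.5 Mb gene panel
print(tumor_mutational_burden(30, 1.5))  # 20.0
```

A hotspot panel covers far less territory than an exome, so the same tumor can yield a much noisier estimate on a small panel.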


mTOR System: A Database for Systems-Level Biomarker Discovery in Cancer

  • Iman Tavassoly – CANCELLED

    C2i Genomics

    Physician Scientist

10:20 AM – 10:40 AM EDT on Wednesday, October 7

mTOR system is a database I have designed for exploring biomarkers and systems-level data related to mTOR pathway in cancer. This database consists of different layers of molecular markers and quantitative parameters assigned to them through a current mathematical model. This database is an example of merging systems-level data with mathematical models for precision oncology.

FAIR and the (Tr)end of Data Lakes

  • Kees Van Bochove

    The Hyve

    Founder & Owner

10:20 AM – 10:40 AM EDT on Wednesday, October 7

Normalizing Regulatory Data Using Natural Language Processing (NLP)

  • Qais Hatim, Dr.


    Visiting Assoc

David Milward


Senior Director, NLP Technology

10:40 AM – 11:00 AM EDT on Wednesday, October 7

  • ML focus on Disease
  • NLP: different words with the same meaning, different expressions with the same meaning, grammar & meaning
  • Normalizes output
  1. Disease
  2. Genes
  3. Dates
  4. Mutations
  • Transform unstructured data into structured data
  • Identifying gaps in adverse event labelling: pain and opioids
  • Improve drug safety
  • ChemAxon

Supplemental Approval Letters

Coding for Adverse events: “derived values of possible interest”

  • Use of prominent terminologies at the FDA: UNII; translation into an ANSI thesaurus standard
  • Matching to the Variation found within Real Text: synonyms
  • Using ML for Normalization in Disease Context
  • Deep Learning PRE-TRAINING APPROACH for annotated data = supervised learning
  • A set of rules to handle overlapping entities
  • normalized the amp extracted from concepts
  • BERN and terminologies: BioBERT, PubMed Central, PubMed articles
  • NER – Named Entity Recognition
  • Evaluation of the Approach


NLP, ML, hybrid methods, Terminology + ML methods
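The terminology side of this hybrid approach (matching synonyms in real text to one normalized concept, plus a rule for overlapping entities) can be sketched as a greedy longest-match dictionary lookup. This is an illustration only, not the speakers' ML pipeline; the mini-terminology is hypothetical:

```python
# Hypothetical mini-terminology: surface forms (synonyms) -> normalized concept.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "mi": "myocardial infarction",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
    "blood pressure": "blood pressure",
}
MAX_LEN = max(len(term.split()) for term in SYNONYMS)


def normalize(text: str) -> list[tuple[str, str]]:
    """Greedy longest-match lookup. Overlap rule: when candidate spans
    overlap, the longest one wins ('high blood pressure' beats 'blood pressure')."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    found, i = [], 0
    while i < len(tokens):
        # try the longest possible span starting at token i first
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in SYNONYMS:
                found.append((span, SYNONYMS[span]))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no term starts here; move on one token
    return found


print(normalize("Patient had a heart attack and high blood pressure."))
# [('heart attack', 'myocardial infarction'), ('high blood pressure', 'hypertension')]
```

In a hybrid system, an ML tagger such as a BERT-based NER model would propose the spans, and a lookup like this would map them onto the controlled terminology.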

Building an Artificial Intelligence-Based Vaccine Discovery System: Applications in Infectious Diseases & Personalized Neoantigen-Related Immunotherapy for Treatment of Cancers

  • Kamal Rawal

    Amity Univ

    Assoc Prof

10:40 AM – 11:00 AM EDT on Wednesday, October 7

  • Classification of proteins
  • Data Collection
  • Feature selection: most important features out of 1,447
  • Deep learning model: Vaxi-DL: layers, compilation
  • Strategy against model overfitting
  • Balancing imbalanced data
  • Hyperparameter tuning: settings of the model that are not learned from the data
  • Stratified K-Fold Training and Validation
  • Ensembling approach: combine many weak classifiers to create a STRONG classifier
  • ROC Curve: Ensemble by Consensus
  • Before and after calibration
  • Benchmarking the system: Vaxi-DL Ensemble by Average vs by Consensus
  • System developed: enter a protein, get results
  • Rare disease CHARGE Syndrome was used for validation
  • Application to COVID-19 – Methodology
  • Application to cancer: prediction of immunogenic peptides, i.e., which peptides can be used as antigens
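Stratified K-fold validation and the two ensembling rules benchmarked above (by average vs. by consensus) can behave differently on the same model outputs. A minimal sketch, not Vaxi-DL's code; treating "consensus" as a simple majority vote and the 0.5 threshold are assumptions:

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Split sample indices into k folds that keep roughly the same
    class proportions as the full label list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)  # deal each class round-robin
    return folds


def ensemble_by_average(probs, threshold=0.5):
    """Average the models' predicted probabilities, then threshold."""
    return 1 if sum(probs) / len(probs) >= threshold else 0


def ensemble_by_consensus(probs, threshold=0.5):
    """Majority vote over each model's own thresholded call (assumed rule)."""
    votes = sum(1 for p in probs if p >= threshold)
    return 1 if votes > len(probs) / 2 else 0


# One confident model and two lukewarm ones disagree under the two rules:
print(ensemble_by_average([0.9, 0.4, 0.45]))    # 1 (mean is about 0.58)
print(ensemble_by_consensus([0.9, 0.4, 0.45]))  # 0 (only 1 of 3 votes)
```

Averaging lets one confident model dominate, while consensus requires agreement, which is why calibration (the "before and after" step above) matters when probabilities are averaged.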


Using GPU Computing to Evaluate Variant Calling Strategies

  • George Vacek

    NVIDIA Corp

    Sequencing Strategic Development

  • Eriks Sasha Paegle

    Dell EMC

    Senior Business Development Manager

11:15 AM – 11:30 AM EDT on Wednesday, October 7

  • NVIDIA: 100 Genomes cohort generated at the New York Genome Center, NHGRI
  • NVIDIA Parabricks: Azure mentioned
  • Dell EMC: test environment: Dell Technologies Cloud Storage for Multi-Cloud: resources across GCP, AWS, Azure in Northern Virginia regions
  • Multi-cloud ease of use: without multi-cloud vs. with Faction multi-cloud
  • Ease of use
  • Deep Averaging Network (DAN)
  • NVIDIA CLARA PARABRICKS TOOLKIT: short & long reads, deep learning, data analytics, ML
  • Reference applications: a host of customized applications, 3rd-party apps, libraries
  • GPU (Genomics PUs): drop-in tools for somatic pipelines: Clara Parabricks v3.5
  • Partnership of NVIDIA and PetaGene announced at BioIT20: NGS data compression
  • PetaGene technology allows lossless compression to reduce storage costs
  • Project with the Sanger Institute: optimizing Mutographs identification
  • Completed a run in 24 hours instead of 31 days
  • Parabricks is a joint project of Dell EMC and NVIDIA

PLENARY KEYNOTE: Game On: How AI, Citizen Science, and Human Computation Are Facilitating the Next Leap Forward

12:30 PM – 1:55 PM EDT on Wednesday, October 7

  • Allison Proffitt

    BioIT World & Diagnostics World

    Editorial Dir

Seth Cooper

Northeastern Univ

Asst Prof

  • Foldit – Scientific discovery using video games in the domain of protein structures and folding
  • Combine Human with machine
  • Score based on competition among players for higher score and collaboration in groups
  • Problem: chemistry gives input
  • Puzzle available for one week on the Internet; games ongoing
  • Solution analysis: continually IMPROVE the structure of protein folding
  • Foldit tutorials offered online
  • Player accomplishments: articles by scientists
  • Algorithm discovery and development
  • Electron Density fitting
  • Enzyme re-design
  • De novo protein design: players named as authors on a paper, part of the scientific process
  • Future work: coronavirus spike protein
  • Small molecule design
  • narrative
  • virtual reality – 3D protein structure for manipulation
  • http://Fold.it/Educator Mode
  • http://Fold.it/standalone
  • http://fold.it/
  • seth.cooper@gmail.com

Lee Lancashire, CIO

Cohen Veterans Bioscience – not for profit – advancing Brain health

  • Biotyping and stratification
  • Biomarkers
  • Omics data
  • All meet in the commons (Brain Commons): clinician, geneticist, scientist, bioinformatician; RStudio, Python, JupyterHub
  • Multidimensional Biomarkers in Multiple Sclerosis


Pietro Michelucci

Human Computation Institute


  • Why machines can’t tackle AI problems on their own, and AI can’t do precision medicine on its own
  • Young people more than others: N of 1, precision medicine
  • Scandinavians and Russians are immune
  • AI & precision medicine: can’t solve the complexity of messy data vs big data
  • Messy data: heterogeneous, multidimensional, too many combinations to explore; select which combinations to explore vs. let the machine generate all the combinations, analyze them all, and discover PATTERNS
  • Causal vs spurious
  • Logical reasoning, right-brain abstraction, and shortcuts: the human brain does these routinely
  • Humans do better on context: not all information is in the pixels, e.g., context
  • #ADS – SBIR suspected the hypothesis to be tested
  • Improving crowd wisdom methods: 20 inputs from different people PLUS the machine
  • Combining crowd answers with the machine: faster, with improved accuracy
  • Machine has no intuition – machine bias of Human and of machine is similar
  • Wisdom of Crowd: Bootstrapping hybrid Intelligence: CIVIUM
  • bit.ly/civiumintro
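Combining crowd answers with a machine's answer can be pictured as a weighted vote. A minimal sketch, assuming each of the crowd inputs counts once and the machine's answer counts with a hypothetical fixed weight; this is not the Human Computation Institute's method, and practical systems would learn or calibrate such weights:

```python
from collections import Counter


def combine_crowd_and_machine(crowd_labels, machine_label, machine_weight=3.0):
    """Weighted vote: each crowd answer counts 1, the machine's answer
    counts `machine_weight` (hypothetical value). Returns the label with
    the highest total weight."""
    scores = Counter(crowd_labels)
    scores[machine_label] += machine_weight
    return max(scores, key=scores.get)


# 20 crowd inputs split 9 vs 11; the machine tips a close call...
print(combine_crowd_and_machine(["A"] * 9 + ["B"] * 11, "A"))  # A
# ...but cannot overturn a clear crowd majority.
print(combine_crowd_and_machine(["A"] * 5 + ["B"] * 15, "A"))  # B
```

The design choice is the weight: too low and the machine adds nothing, too high and the crowd's context-based judgments are ignored.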



Jerome Waldispuehl

McGill Univ

Assoc Prof

  • Tools for visualization of nucleotides
  • http://phylo.cs.mcgill.ca
  • GAME: Phylo DNA Puzzles: Goal 202, Score, Top Score
  • Whole-genome multiple alignment
  • Phylo: 350,000 participants, 1MM solutions Improve 40 to 95% computer alignments
  • education & science outreach – reach out to the Public
  • Borderlands Science + game designers: 1MM participants 50MM solutions
  • Joint initiative with a major science project
  • Improvement of 16S rRNA
  • MMOS company in Science games
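Phylo players improve a multiple sequence alignment by pushing its score up. As an illustration only, a sum-of-pairs score can be computed column by column as below; the match, mismatch, and gap values are hypothetical and are not Phylo's actual scoring scheme:

```python
from itertools import combinations

# Hypothetical scoring scheme (not Phylo's): match +1, mismatch -1, gap -2.
MATCH, MISMATCH, GAP = 1, -1, -2


def sum_of_pairs(alignment):
    """Score a multiple alignment (equal-length rows, '-' = gap) by summing
    pairwise scores over every column; gap-gap pairs score 0."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += GAP
            elif a == b:
                total += MATCH
            else:
                total += MISMATCH
    return total


print(sum_of_pairs(["ACGT", "ACGT", "AC-T"]))  # 6
```

Each player move (shifting a block, inserting a gap) changes this kind of score, so the game can give immediate feedback while the human explores moves a heuristic aligner might skip.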

Towards AI-Guided Cell Profiling of Drugs with Automated High-Content Imaging

Ola Spjuth

Uppsala Univ


2:10 PM – 2:30 PM EDT on Wednesday, October 7

  • Accelerate drug discovery using AI automation in collaboration with AstraZeneca
  • Closed-loop (autonomous) experimentation
  • collect the best data at the minimal cost
  • Active learning: query active learning model
  • Exploitation [best predictions from given data] vs Exploration
  • Automation in Life Science: micro-plate, stack of micro-plates
  • Robot scientist: comes up with hypotheses and conducts research
  • high-throughput biology: Robots vs Disease
  • Cell painting: Imaging with multiplexed dyes: genetic or chemical perturbations
  • classify images into biological mechanisms
  • combinations of toxicants
  • A discovery engine: Toxicity, Efficacy, mechanisms combinations
  • Automating our cell-based lab: fixed setup
  • Open source lab automation suite: Github https://github.com/pharmbio/imagedb
  • Dealing with large scale data [TensorFlow]
  • STACKn.com – AI modeling Life cycle
  • HASTE: Hierarchical Analysis of Spatial and Temporal
  • https://pharmb.io
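The exploitation vs. exploration trade-off in closed-loop active learning can be sketched as choosing the next batch of wells to run. A minimal sketch, assuming a model that returns a (predicted mean, uncertainty) pair per candidate; the function, candidate names, and numbers are hypothetical, not the Pharmb.io code:

```python
def select_next_experiments(candidates, predict, batch_size, explore_fraction=0.5):
    """Pick a batch from `candidates` using `predict(x) -> (mean, std)`.

    Part of the batch exploits (highest predicted mean), the rest explores
    (highest uncertainty among those not yet chosen). Note: Python's round()
    rounds halves to even when splitting the batch.
    """
    scored = [(x, *predict(x)) for x in candidates]
    n_explore = round(batch_size * explore_fraction)
    n_exploit = batch_size - n_explore
    by_mean = sorted(scored, key=lambda s: s[1], reverse=True)
    chosen = [s[0] for s in by_mean[:n_exploit]]          # exploitation
    remaining = [s for s in scored if s[0] not in chosen]
    by_std = sorted(remaining, key=lambda s: s[2], reverse=True)
    chosen += [s[0] for s in by_std[:n_explore]]          # exploration
    return chosen


# Hypothetical model outputs: compound -> (predicted effect, uncertainty)
model = {"c1": (0.9, 0.05), "c2": (0.7, 0.30), "c3": (0.2, 0.40), "c4": (0.8, 0.10)}
print(select_next_experiments(list(model), model.get, 2))       # ['c1', 'c3']
print(select_next_experiments(list(model), model.get, 2, 0.0))  # ['c1', 'c4']
```

After each batch is imaged and analyzed, the model is retrained on the new results and the loop repeats; the explore fraction controls how much of the "minimal cost" budget goes to reducing uncertainty rather than confirming good predictions.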

Advanced Imaging and AI Technologies Providing New Image and Data Analysis Challenges and Opportunities

Richard Goodwin


Dir & Head of Imaging & AI

2:30 PM – 2:50 PM EDT on Wednesday, October 7

AstraZeneca is empowering its scientists to see the complexity of a disease in unprecedented detail to enable effective development and selection of new medicines. This is enabled through the use of an extensive range of cutting-edge imaging technologies that support studies into the efficacy and safety of drugs through the R&D pipeline. This presentation will introduce the range of novel in vivo and ex vivo imaging technologies employed, describe the data challenges associated with scaling up the use of molecular imaging technologies, and address the new data integration and mining challenges. Novel computational methods are required for large cohort imaging studies that involve tissue based multi-omics analysis, which integrate spatial relationships in unprecedented detail.

  • Small molecule – not suitable for complex diseases
  • focus on quality vs quantity
  • compound for commercial value
  • right safety
  • Imaging supports R&D: Molecular, medical, big data and AI
  • convergence of ML for decision making
  • Spatial imaging: morphology
  • Multiplex imaging like MRI
  • Multimodal analysis: tissue data and in vivo data for a holistic understanding of drug delivery
  • Spatial transcriptomics and proteomics: imaging platforms in R&D
  • AZ investing in imaging technologies already impacting projects: AI-empowered imaging delivering subcellular resolution
  • Mass Spec Imaging (MSI): ex vivo imaging techniques, spatial distribution of molecules
  • cartography of cancer: Drug metabolite distribution – NEW understanding of disease and drug distribution in tissue
  • DATA: digitization, integration, analysis, exploration
  • Digital pathology and beyond: AI image analysis; AI outperforms pathologists and radiologists
  • Data volume and dimensionality challenge and opportunity
  • Data volume and dimensionality: complete image
  • AZ Oncology – disease is understood for drug discovery using Imaging technology

PANEL: Framework and Approach to Unlock the Potential of Quantum Computing in Drug Discovery

  • Brian Martin

    AbbVie Inc

    Research Fellow & Head

Philipp Harbach

Merck KGaA

Head of In Silico Research in Germany

  • chemistry and manufacturing with QC – end user in Pharmaceutical
  • Merck’s VC arm asks experts in Merck to guide Merck’s investment in QC
  • 50 people across three areas at Merck (Pharmaceutics, Animal Health, Diagnostics)

Celia Merzbacher

SRI Intl

Assoc Dir Quantum Economic Dev Consortium (QEDC)

  • Methodology from Pistoia to be used in QC
  • QC R&D developed in parallel
  • Simulation of all the components is possible

John Wise

Pistoia Alliance Inc (2007)

We are a global, not-for-profit members’ organization working to lower barriers to innovation in life science and healthcare R&D through pre-competitive collaboration.


  • How Pharmaceutical Industry can benefit from quantum computing
  • 9 of 10 big Pharma are members of the Pistoia Alliance
  • IP created on specifications


Zahid Tharia

Pistoia Alliance Inc


  • Barriers to adoption of quantum computing (QC) in Pharma are staff training and skills in the IT aspects of QC

3:10 PM – 4:00 PM EDT on Wednesday, October 7

In 2019, major life sciences companies mobilized to form a pre-competitive, collaborative quantum computing working group (QuPharm) and delineate a framework and approach to accelerate realizing the potential of quantum acceleration in drug discovery. Learn from industry thought leaders on how to valuate and map problems into quantum algorithms, set up organizations to enable and scale quantum computing pilots and establish effective cross-industry, tech, and start-up collaborations.

Session Wrap-Up Panel Discussion

Etzard Stolte, PhD

Roche Pharma

Global Head

  • no official policy
  • In 2020 it became important to be mentioned by management as a potential use in automation
  • Continual updates needed: it is manual, and a disillusion without a business case
  • Roche tries to commoditize AI tools such as classifiers and automation

Samiul Hasan


Scientific Analytics and Visualization Director

  • AI is perceived as having potential to take off on its own
  • POC – demonstrate the value
  • Proof of Concept – Semantic report – a story vs one off
  • demonstration of value is needed and is continuous



Bin Li

Millennium The Takeda Oncology Co

Dir Computational Biology & Translational Medicine

  • ML community at Takeda
  • Positive to have; not much success yet, not used much yet
  • some models are pretty good do not need improvement

Jens Hoefkens


Industry Principal Director

  • Future of AI as support to the Human intuition vs replacement of humans
  • automation like pathology classification
  • Machine and Human working together – not as maker of decisions in clinical settings
  • The POC cycle prevents conversion to production
  • Where is the highest value for production, and deploy at scale
  • AI assistance to sift genomics data
  • BERT (Google technology) for term extraction, to make sense of data and assist the user
  • ML
  • RPA (robotic process automation): concept extraction; 80% accuracy needed by scientists

4:00 PM – 4:20 PM EDT on Wednesday, October 7

October 8, 2020

Trends from the Trenches

Kevin Davies, PhD

CRISPR Journal

Exec VP & Exec Editor

Timothy Cutts

Wellcome Sanger Institute


  • Collaborations with scientists in sub-Saharan Africa
  • Pay for data analysis: ownership issues
  • In the UK, 6 labs for the entire country all send their data to the Wellcome Sanger Institute for analysis
  • Metadata is the problem: coordinating the metadata sent by each of the 6 labs created problems


  • Cindy Crowninshield

    Cambridge Healthtech Institute

    Executive Event Director

Vivien Bonazzi

Deloitte Consulting LLP

Managing Dir & Chief Biomedical Data Scientist

  • How organizations use bioscience data
  • Data Ecosystem: Hardware and software: Cloud and other options
  • Operationalize the two trends:
  1. Platforms: end-to-end solutions resulting in SILOS; systems are native: data ingestion
  2. Data Commons: Open arch, open source – integration and interdependence issues
  • Biomedical Agencies in NIH various Organizations in the Private sector: Sharing data must be more effective
  • IT, Data Science, Management – COVID – reduced barriers
  • Leadership: Different voices from different people
  • Data strategies & governance: not the whole but small pieces; incentives to share data

Chris Dagdigian

BioTeam Inc

Sr Dir

  • 10th Anniversary to Trends from the Trenches
  • IT infrastructure changes
  • Research IT:
  1. Genomics & BioInformatics
  2. Image-based data acquisition and analysis: CryoEM, 3D microscopy, fMRI image analysis
  3. ML and AI – GPU FPGAs, neural processors: Drive in organizations: bottom up
  4. Chemistry & Molecular Dynamics
  5. Storage and exploitation of data for insights
  6. 2020 Hype vs Reality
  7. Scientific Data: managing and understanding, data movement, federated/access
  8. Big Data: data storage, management & governance standards vs human curated data
  9. IT needs guidance and decisions from Science Team
  10. Culture change for joint management by Science & IT: data fidelity, attribution, allocation top down
  11. NERSC file system quotas & purging overview; Silos & So
  12. Petabytes of open access data, collaborative research resources: data-rich environments
  13. Data Lakes: Gen3 Data Commons
  14. Data hygiene: metadata is on the Science side vs IT
  15. Biased Data: Model & Data Bias
  • Failed Predictions:
  1. Compilers matter again – not True
  2. CPU benchmarking is back – WRONG
  3. AMD vs Intel, arm64, or both
  4. Policy driven auto-tiering storage – wrong, USER self-service for tiering, movement and archive decision. Let researchers tier/move/archive based on Project, Experiment or Group
  5. Single storage namespace – Wrong: Data intensive science: scientists must do some IT jobs themselves

Kjiersten Fagnan

Lawrence Berkeley Natl Lab


  • Genome Project of DOE
  • Data management with other agencies
  • COVID: Collaborations, breaking down barriers, small labs and big labs ALL generate data and sharing
  • Collaboration is needed regardless of COVID, but it does not happen
  • If it is too big, one lab can’t handle it all
  • Funding and training do not support collaborations, because the next round of funding depends on individual publications, which requires silos
  • Data cleaning and data management: standards are annoying and painful, and not needed to publish results as soon as possible; they are needed so that someone else will be able to use the data
  • Facebook has hundreds of curators; the curation of scientific data requires the same hundreds of curators, who are SCIENTISTS and data scientists

Matthew Trunnell

Pandemic Response Commons, Seattle

VP & Chief Data Officer

  • Data commons for intra- and inter-mural data sharing
  • ML is needed for Data commons
  • Progress in FAIRness: NIH efforts driven by Susan Gregurick across all NIH centers
  • A large amount of B-to-B data sharing, e.g., Uber sharing data with the jurisdictions where it operates
  • Snowflake: new cloud technology
  • COVID plays the role of an accelerator
  • Cancer vs COVID – transfer knowledge from COVID to Cancer

9:00 AM – 10:40 AM EDT on Thursday, October 8

The “Trends from the Trenches” will celebrate its 10th Anniversary at Bio-IT! Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, and cloud that are involved in supporting data-intensive science. In 2020, Chris will give the “Trends from the Trenches” presentation in its original “state-of-the-state address” followed by guest speakers giving podium talks on relevant topics. An interactive Q&A moderated discussion with the audience follows. Come prepared with your questions and commentary for this informative and lively session.


  • Project vs enterprise – Sequencing for internal research vs for clients’ data
  • Tension in governmental agencies – no robust solutions: IT, Science, Management
  • different Use cases need different infrastructure: HW & SW: Storage and data exploration
  • Data Lakes: rule base, enterprising – training is an issue in organizations
  • Management, scientists, IT in enterprises: terabytes of storage, budget issues, conversations on the limits of what IT can offer, putting more burden on the scientists for triage and quotas; business and scientific value
  • New capabilities in organizations: hands-on, tactical data management; not IT but data engineering
  • Citizen Science: privacy vs plants and microbes – no privacy issues
  • Incentives need be changed for Data Citations in addition to Papers
  • Curation Citations as Authorship citation
  • Data sharing in Cancer: GEN3 – NCI Data Commons, Data Governance and Data Permission (Access) – NCI does work in data commons – much data outside this space
  • EBI – in UK Sanger Institute has the infrastructure in one place
  • Migrating Project based Data structure: that involves scientist decisions that should not be a quota (storage is full)  in the IT space
  • Human to Human communications vs tools for data migration
  • Which Organizations get the data curation and annotation well: Subject matter from day 1 – hard to teach vs data engineering skills; TEAM as a solving is critical in Biomedical space no incentives
  • BBC – Meta tagging system is outstanding
  • NCAST TRANSLATOR – across organizations
  • Changing incentives – MORE organizations will do that task better
  • Common metadata across domains with predict uses of data in the Future – collaboration of CS to create in the science organization tagging like in BBC
  • Chris Anderson

    Clinical OMICs

    Editor in Chief

Ian Fore


Sr Biomedical Informatics Program Mgr

  • NCI – Cancer Data Commons – concierge services to organization on data services

Ravi Madduri – CVD large cohort

Univ of Chicago



  • Lara Mangravite

    Sage Bionetworks


  • Kees Van Bochove

    The Hyve

    Founder & Owner

11:10 AM – 11:30 AM EDT on Thursday, October 8


BREAKOUT: Driving Scientific Discovery with Data / Digitization

  • Timothy Gardner

    Riffyn Inc


11:35 AM – 12:00 PM EDT on Thursday, October 8


PLENARY KEYNOTE – 12:00 PM – 1:25 PM EDT on Thursday, October 8

Robert Green

Brigham & Womens Hospital

Co-founder of Genome Medicine

Prof & Dir G2P Research

  • Combining data to rapidly analyze COVID-19 patients
  • Identify biomarkers for vulnerability
  • Preventive genomics: Angelina Jolie’s mastectomy as a preventive clinical decision
  • Patients’ access to their own genomic data
  • Population screening to predict risks
  • Genetic testing to consumers: preventive genomics; genotyping/sequencing and labs/care providers are conflated
  • Genetic testing to consumers: costs and benefits are unclear
  1. Diagnosis of unsuspected genetic disease
  2. Stratification for surveillance
  3. Which pieces of the puzzle need to be brought to bear in patient care
  4. Categories and reporting criteria: gene-disease validity vs variant pathogenicity → clinic
  5. MedSeq Project: 10MM randomized study; in one arm all genome information was shared with the patient, in the other arm only selected genome data: 100 patients, 20% carry a monogenic condition; polygenic risk scores
  6. CAD, high-cholesterol biomarker, AFib, DM2; 52% women, 48% men
  7. No high-risk errors by PCPs discussing and disclosing the sequencing results
  8. Filtering the results: indication-based testing vs screening
  9. BabySeq Project: sequencing infants to prevent disease; 11% carry a mutation in a gene for a monogenic condition, such as an abnormally narrowed aorta
  10. MDR: monogenic disease risk
  11. MilSeq Project: US Air Force active-duty military personnel
  12. 5, 8, 10: all polygenic studies
  13. Polygenic risk scores: high risk
  14. Classification needs to be repeated every few years (2 years; re-sequence) due to changes in health and to new discoveries in curated data, which keeps improving
  • Risk-benefit and utility: Partners Biobank return of genomic results
  • No interest in knowing among the public; NCCN criteria on chart review: 20%
  • Brigham Preventive Genomics via telemedicine: first in the country
  • APC mutation after colonoscopy: obstruction diagnosed
  • @robertgreen
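The polygenic risk scores mentioned in these notes are not given a formula in the talk. A common formulation is a weighted sum of risk-allele counts, PRS = Σ_i β_i · g_i, where g_i ∈ {0, 1, 2} is the number of risk alleles carried at variant i and β_i is its per-allele effect size from a GWAS. The Python sketch below is a minimal illustration under stated assumptions: the variant IDs and weights are hypothetical, untyped variants are simply skipped, and the top-5% "high risk" cutoff is arbitrary. It is not the scoring method used in MedSeq or by the speaker.

```python
from typing import Dict, Sequence


def polygenic_risk_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages.

    dosages: variant ID -> number of risk alleles carried (0, 1 or 2)
    weights: variant ID -> per-allele effect size (e.g., a log odds ratio)
    Variants with a weight but no genotype are skipped in this sketch.
    """
    score = 0.0
    for variant, beta in weights.items():
        dosage = dosages.get(variant)
        if dosage is None:
            continue  # not genotyped; real pipelines may impute instead
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid dosage {dosage!r} for {variant}")
        score += beta * dosage
    return score


def percentile_rank(score: float, reference_scores: Sequence[float]) -> float:
    """Percentage of reference-population scores strictly below the given score."""
    if not reference_scores:
        raise ValueError("reference distribution is empty")
    below = sum(1 for s in reference_scores if s < score)
    return 100.0 * below / len(reference_scores)


def is_high_risk(score: float, reference_scores: Sequence[float], cutoff: float = 95.0) -> bool:
    """Flag scores at or above a hypothetical percentile cutoff (default: top 5%)."""
    return percentile_rank(score, reference_scores) >= cutoff


if __name__ == "__main__":
    # Hypothetical variants and effect sizes, for illustration only.
    weights = {"var1": 0.2, "var2": -0.1, "var3": 0.5}
    person = {"var1": 2, "var2": 1}  # var3 not genotyped for this person
    print(polygenic_risk_score(person, weights))
```

Reporting the score as a percentile of a reference population, rather than as a raw number, is one way to make "high risk" comparable across scores built from different variant panels.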


Juergen Klenk

Deloitte Consulting LLP


  • Bradykinin hypothesis for COVID-19
  • Liberate the data: people, data, risk


Natalija Jovanovic


Chief Digital Officer

  • AI in Pharma
  • Vaccine-preventable diseases: produce 1 billion vaccines a year
  1. Reduction of incidence: pertussis, 92% reduction
  • Manage the risk profile
  • Scientific mechanisms translatable to machines
  1. Highly automated, ingestible data for AI
  2. Digital is about people: good data, good algorithms, good GUI

Vivien Bonazzi

Deloitte Consulting LLP

Managing Dir & Chief Biomedical Data Scientist

12:00 PM – 1:25 PM EDT on Thursday, October 8

12:00 Organizer’s Remarks

Cindy Crowninshield, RDN, LDN, Executive Event Director, Cambridge Healthtech Institute

12:05 Keynote Introduction

Juergen A. Klenk, PhD, Principal, Deloitte Consulting LLP

12:15 Toward Preventive Genomics: Lessons from MedSeq and BabySeq

Robert Green, MD, MPH, Professor of Medicine (Genetics) and Director, G2P Research Program/Preventive Genomics Clinic, Brigham & Women’s Hospital, Broad Institute, and Harvard Medical School

12:40 AI in Pharma: Where We Are Today and How We Will Succeed in the Future

Natalija Jovanovic, PhD, Chief Digital Officer, Sanofi Pasteur

1:05 LIVE Q&A: Session Wrap-Up Panel Discussion


Juergen A. Klenk, PhD, Principal, Deloitte Consulting LLP

Vivien R. Bonazzi, PhD, Managing Director & Chief Biomedical Data Scientist, Deloitte Consulting LLP

Below are sessions that are NOT included above. I covered ONLY the sessions above.

Session Availability


10:15 am ET – NIH’s Strategic Vision for Data Science

Susan K. Gregurick, PhD, Associate Director, Data Science (ADDS) and Director, Office of Data Science Strategy (ODSS), National Institutes of Health

Rebecca Baker, PhD, Director, HEAL (Helping to End Addiction Long-term) Initiative, Office of the Director, National Institutes of Health


11:55 am ET – W1: Data Management for Biologics: Registration and Beyond

Monica Wang, PhD, Principal Technology Lead, Scientific Informatics, Takeda

Sebastian Schlicker, Head, Biologics Business Operations, Genedata AG

11:55 am ET – W2: A Crash Course in AI: 0-60 in Three

Peter V. Henstock, PhD, Machine Learning & AI Lead, Software Engineering & Statistics & Visualization, Pfizer Inc.

11:55 am ET – W3: Data Science Driving Better Informed Decisions

Meghan Raman, Director, R&D Data Lake & Analytics, Bristol Myers Squibb Co.

Nigel Greene, PhD, Director & Head Data Science & Artificial Intelligence, Drug Safety & Metabolism, AstraZeneca Pharmaceuticals

2:15 pm ET – W4: Digital Biomarkers and Wearables in Pharma R&D and Clinical Trials

Danielle Bradnan, MS, Research Associate, Digital Health and Wellness, Lux Research

Graham Jones, PhD, Director, Innovation, Technical Research and Development, Novartis

Ariel Dowling, PhD, Director of Digital Strategy, Data Sciences Institute, Research and Development, Takeda Pharmaceuticals

2:15 pm ET – W5: AI-Celerating R&D: Foundational Approaches to How Emerging Technologies Can Create Value

Brian Martin, Head of AI, R&D Information Research, Senior Principal Data Scientist, AbbVie

2:15 pm ET – W6: Dealing with Instrument Data at Scale: Challenges and Solutions

Rachana Ananthakrishnan, Executive Director, Globus, University of Chicago

Michael A. Cianfrocco, PhD, Assistant Professor, Department of Biological Chemistry and Research Assistant Professor, Life Sciences Institute, University of Michigan

Brigitte E. Raumann, Product Manager, Globus, University of Chicago

3. Connect with peers from across the industry during these dedicated networking times.

9:25 am ET – Virtual Exhibit Hall Open

1:00 pm ET – Speed Networking

Looking to meet fellow attendees and have meaningful conversations, just as you would at an in-person event? This is the perfect way to achieve just that. Get to know your fellow attendees by joining this interactive speed networking event. To participate, each attendee will be paired at random with another attendee and given 7 minutes to interact in a private Zoom room. Once the 7 minutes are up, you will move on to meet another selected attendee. Maximize your networking at the meeting and join in.

2:00 pm ET – Stretch Break

Take a minute to revitalize and join our friends from VOS Fitness for a stretch break. The professional trainer from VOS will guide you through some easy moves that will help with screen fatigue and ease your muscles after a long day of sitting at the computer. All moves can be done right at your desk and are appropriate for all fitness levels.

4. Game On!

Earn points by completing the activities listed on our Game tab. Some activities will only award points once, but others will award you every time you do it – so the more involved you are in the virtual event, the more points you will earn! You can start earning points one week before the event – so get ready to start sending meeting invitations, exploring our virtual expo and planning your schedule.

Attendees in the top 5% of points earned when the game closes at the end of the conference will be eligible to win a gift card worth $200 USD!

5. Take part in 1-on-1 networking with an easy-to-navigate profile search and scheduling platform.

  • Check out your recommended connections flagged as “Want to Meet” in the People Tab. These connections were chosen based on your similar roles, companies and conference program interests.
  • Take a moment to add relevant interest tags to your profile. Then search and connect with participants who have the same interests.
  • Engage with technology leaders in their booths and view relevant videos and demos.
  • Take part in live Q&A with speakers and participants following each educational session.
  • Create and join in ad hoc group discussions throughout the event.
  • Watch our quick tutorial on how to maximize networking opportunities: CHI’s Virtual Event Platform – Networking

10:00 AM – 11:25 AM EDT on Tuesday, October 6


10:00 Welcome Remarks

Cindy Crowninshield, RDN, LDN, Executive Event Director, Cambridge Healthtech Institute

10:05 Keynote Introduction

Scott Parker, Director of Product Marketing, Marketing, Sinequa

10:15 PLENARY KEYNOTE PRESENTATION: NIH’s Strategic Vision for Data Science

Susan K. Gregurick, PhD, Associate Director, Data Science (ADDS) and Director, Office of Data Science Strategy (ODSS), National Institutes of Health

Rebecca Baker, PhD, Director, HEAL (Helping to End Addiction Long-term) Initiative, Office of the Director, National Institutes of Health

11:05 LIVE Q&A: Session Wrap-Up Panel Discussion


Ari E Berman, PhD, CEO, BioTeam Inc

Session Availability

Wednesday, October 7

9:00 AM EDT

    The Emergence of the AI-Augmented Drug Discoverer

    9:00 AM – 9:20 AM EDT

    Mark Davies


9:20 AM EDT

    Generative Chemistry and Generative Biology for AI-Powered Drug Discovery

    9:20 AM – 9:40 AM EDT

    Alex Zhavoronkov

    Insilico Medicine

9:40 AM EDT

    Talk Title to be Announced

    9:40 AM – 11:00 AM EDT

    Grace Wenjia You

    EMD Serono

11:00 AM EDT

    Coupling AI and Network Biology to Generate Insights for Disease Understanding and Target ID

    11:00 AM – 11:30 AM EDT
    Cortellis, A Clarivate Analytics Solution

    Alexander Ivliev


11:30 AM EDT

    Session Wrap-Up Panel Discussion

    11:30 AM – 11:50 AM EDT



OLD Material


Welcome to Bio-IT World 2020

In the spirit of open collaboration, the world’s premier bio-IT conference will bring together the community to focus on how we are using technologies and analytic approaches to solve problems, accelerate science, and drive the future of precision medicine. With a focus on AI, data science and other “data-driven” technologies that are advancing biomedical research, drug discovery and healthcare, the Bio-IT World Conference & Expo ’20 will bring together more than 3,000 participants to the Seaport World Trade Center in Boston from October 6-8, 2020.

The participants will have the chance to meet and share research/ideas with leading life sciences, pharmaceutical, clinical, healthcare, informatics and technology experts.




TRACK 1 Data Storage and Transport

TRACK 2 Data and Metadata Management

TRACK 3 Data Science and Analytics Technologies

TRACK 4 Software Applications and Services

TRACK 5 Data Security and Compliance

TRACK 6 Cloud Computing

TRACK 7 AI for Drug Discovery

TRACK 8 Emerging AI Technologies

TRACK 9 AI: Business Value Outcomes

TRACK 10 Data Visualization Tools

TRACK 11 Bioinformatics

TRACK 12 Pharmaceutical R&D Informatics

TRACK 13 Genome Informatics

TRACK 14 Clinical Research and Translational Informatics

TRACK 15 Cancer Informatics

TRACK 16 Open Access and Collaborations


2020 Plenary Keynote Speakers

Rebecca Baker, PhD

Director, HEAL (Helping to End Addiction Long-term) Initiative, Office of the Director, National Institutes of Health

Vivien Bonazzi, PhD

Chief Biomedical Data Scientist, Managing Director, Deloitte

Tim Cutts, PhD

Head, Scientific Computing, Wellcome Trust Sanger Institute

Chris Dagdigian

Co-Founder and Senior Director, Infrastructure, BioTeam, Inc

Kevin Davies, PhD

Executive Editor, The CRISPR Journal, Mary Ann Liebert, Inc.

Kjiersten Fagnan, PhD

Chief Informatics Officer, Data Science and Informatics Leader, DOE Joint Genome Institute, Lawrence Berkeley National Laboratory

Robert Green, MD, MPH

Professor of Medicine (Genetics) and Director, G2P Research Program/Preventive Genomics Clinic, Brigham & Women’s Hospital, Broad Institute, and Harvard Medical School

Susan K. Gregurick, PhD

Associate Director, Data Science (ADDS) and Director, Office of Data Science Strategy (ODSS), National Institutes of Health

Natalija Jovanovic, PhD

Chief Digital Officer, Sanofi Pasteur

Pietro Michelucci, PhD

Director, Human Computation Institute

Matthew Trunnell

Vice President and Chief Data Officer, Fred Hutchinson Cancer Research Center



Group of Researchers @ University of California, Riverside, the University of Chicago, the U.S. Department of Energy’s Argonne National Laboratory, and Northwestern University solve COVID-19 Structure and Map Potential Therapeutics

Reporters: Stephen J Williams, PhD and Aviva Lev-Ari, PhD, RN


This illustration, created at the Centers for Disease Control and Prevention (CDC), reveals ultrastructural morphology exhibited by coronaviruses. Note the spikes that adorn the outer surface of the virus, which impart the look of a corona surrounding the virion, when viewed electron microscopically. A novel coronavirus was identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China in 2019.

Image and Caption Credit: Alissa Eckert, MS; Dan Higgins, MAM available at https://phil.cdc.gov/Details.aspx?pid=23311


New coronavirus protein reveals drug target

Image of newly mapped coronavirus protein, called Nsp15, which helps the virus replicate.

Image Credit: Northwestern University



The 3-D structure of a potential drug target in a newly mapped protein of COVID-19, or coronavirus, has been solved by a team of researchers from the University of California, Riverside, the University of Chicago, the U.S. Department of Energy’s Argonne National Laboratory, and Northwestern University.

The scientists said their findings suggest drugs previously developed to treat the earlier SARS outbreak could now be developed as effective drugs against COVID-19.

The initial genome analysis and design of constructs for protein synthesis were performed by the bioinformatic group of Adam Godzik, a professor of biomedical sciences at the UC Riverside School of Medicine.

The protein Nsp15 from Severe Acute Respiratory Syndrome Coronavirus 2, or SARS-CoV-2, is 89% identical to the protein from the earlier outbreak of SARS-CoV. SARS-CoV-2 is responsible for the current outbreak of COVID-19. Studies published in 2010 on SARS-CoV revealed inhibition of Nsp15 can slow viral replication. This suggests drugs designed to target Nsp15 could be developed as effective drugs against COVID-19.
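The 89% identity figure for Nsp15 comes from comparing the SARS-CoV-2 and SARS-CoV proteins. As a minimal sketch of how such a figure is usually computed, assuming the two sequences are already aligned (equal length, "-" marking gaps), the function below counts identical residues over the columns where neither sequence has a gap. Tools differ on the denominator (ungapped columns, full alignment length, or the shorter sequence), so this is one convention and not necessarily the one behind the reported number; the peptides in the example are made up, not Nsp15.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Convention used here (one of several in common use): identical residues
    divided by the number of columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped column: excluded under this convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy peptide fragments, not the real Nsp15 sequences.
    print(round(percent_identity("MKV-LSTA", "MKVQLSSA"), 1))  # 6 of 7 ungapped columns match; prints 85.7
```

Producing the alignment itself is a separate step (for example, a global pairwise alignment); the identity calculation only summarizes it.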

Adam Godzik, UC Riverside professor of biomedical sciences
Credit: Sanford Burnham Prebys Medical Discovery Institute

“While the SARS-CoV-19 virus is very similar to the SARS virus that caused epidemics in 2003, new structures shed light on the small, but potentially important differences between the two viruses that contribute to the different patterns in the spread and severity of the diseases they cause,” Godzik said.

The structure of Nsp15, which will be released to the scientific community on March 4, was solved by the group of Andrzej Joachimiak, a distinguished fellow at the Argonne National Laboratory, University of Chicago Professor, and Director of the Structural Biology Center at Argonne’s Advanced Photon Source, a Department of Energy Office of Science user facility.

“Nsp15 is conserved among coronaviruses and is essential in their lifecycle and virulence,” Joachimiak said. “Initially, Nsp15 was thought to directly participate in viral replication, but more recently, it was proposed to help the virus replicate possibly by interfering with the host’s immune response.”

Mapping a 3D protein structure of the virus, also called solving the structure, allows scientists to figure out how to interfere in the pathogen’s replication in human cells.

“The Nsp15 protein has been investigated in SARS as a novel target for new drug development, but that never went very far because the SARS epidemic went away, and all new drug development ended,” said Karla Satchell, a professor of microbiology-immunology at Northwestern, who leads the international team of scientists investigating the structure of the SARS CoV-2 virus to understand how to stop it from replicating. “Some inhibitors were identified but never developed into drugs. The inhibitors that were developed for SARS now could be tested against this protein.”

Rapid upsurge and proliferation of SARS-CoV-2 raised questions about how this virus could become so much more transmissible as compared to the SARS and MERS coronaviruses. The scientists are mapping the proteins to address this issue.

Over the past two months, COVID-19 infected more than 80,000 people and caused at least 2,700 deaths. Although currently mainly concentrated in China, the virus is spreading worldwide and has been found in 46 countries. Millions of people are being quarantined, and the epidemic has impacted the world economy. There is no existing drug for this disease, but various treatment options, such as utilizing medicines effective in other viral ailments, are being attempted.

Godzik, Satchell, and Joachimiak, along with the entire center team, will map the structure of some of the 28 proteins in the virus in order to see where drugs can throw a chemical monkey wrench into its machinery. The proteins are folded globular structures with precisely defined functions, and their “active sites” can be targeted with chemical compounds.

The first step is to clone and express the genes of the virus proteins and grow the proteins as crystals in miniature ice cube-like trays. The consortium taking part in this effort includes nine labs across eight institutions.

Above is a modified version of the Northwestern University news release written by Marla Paul.


Medicine in 2045 – Perspectives by World Thought Leaders in the Life Sciences & Medicine

Reporter: Aviva Lev-Ari, PhD, RN


This report is based on an article in Nature Medicine, Vol. 25, December 2019, pp. 1800–1809 (http://www.nature.com/naturemedicine).

Looking forward 25 years: the future of medicine.

Nat Med 25, 1804–1807 (2019) doi:10.1038/s41591-019-0693-y


Aviv Regev, PhD

Core member and chair of the faculty, Broad Institute of MIT and Harvard; director, Klarman Cell Observatory, Broad Institute of MIT and Harvard; professor of biology, MIT; investigator, Howard Hughes Medical Institute; founding co-chair, Human Cell Atlas.

  • With millions of genome variants, tens of thousands of disease-associated genes, thousands of cell types and an almost unimaginable number of ways they can combine, we had to approximate a best starting point: choose one target, guess the cell, simplify the experiment.
  • In 2020, we see advances in polygenic risk scores, in understanding the cell and modules of action of genes through genome-wide association studies (GWAS), and in predicting the impact of combinations of interventions.
  • We need algorithms to make better computational predictions of experiments we have never performed in the lab or in clinical trials.
  • Efforts such as the Human Cell Atlas and the International Common Disease Alliance, new experimental platforms, and data platforms and algorithms are needed. But we also need a broader ecosystem of partnerships in medicine that brings clinical experts together with mathematicians, computer scientists and engineers.

Feng Zhang, PhD

investigator, Howard Hughes Medical Institute; core member, Broad Institute of MIT and Harvard; James and Patricia Poitras Professor of Neuroscience, McGovern Institute for Brain Research, MIT.

  • A fundamental shift in medicine away from treating symptoms of disease and toward treating disease at its genetic roots.
  • Gene therapy is becoming clinically feasible through improved delivery methods and robust molecular technologies for gene editing in human cells, and affordable genome sequencing has accelerated our ability to identify the genetic causes of disease.
  • 1,000 clinical trials testing gene therapies are ongoing, and the pace of clinical development is likely to accelerate.
  • We need to refine molecular technologies for gene editing, push our understanding of gene function in health and disease forward, and engage with all members of society.

Elizabeth Jaffee, PhD

Dana and Albert “Cubby” Broccoli Professor of Oncology, Johns Hopkins School of Medicine; deputy director, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins.

  • A single blood test could inform individuals of the diseases they are at risk of (diabetes, cancer, heart disease, etc.), with safe interventions available.
  • Developing cancer vaccines. Vaccines targeting the causative agents of cervical and hepatocellular cancers have already proven to be effective. With these technologies and the wealth of data that will become available as precision medicine becomes more routine, new discoveries identifying the earliest genetic and inflammatory changes occurring within a cell as it transitions into a pre-cancer can be expected. With these discoveries, the opportunities to develop vaccine approaches that prevent cancer development will grow.

Jeremy Farrar, OBE FRCP FRS FMedSci

Director, Wellcome Trust.

  • shape how the culture of research will develop over the next 25 years, a culture that cares more about what is achieved than how it is achieved.
  • building a creative, inclusive and open research culture will unleash greater discoveries with greater impact.

John Nkengasong, PhD

Director, Africa Centres for Disease Control and Prevention.

  • To meet its health challenges by 2050, the continent will have to be innovative in order to leapfrog toward solutions in public health.
  • Precision medicine will need to take center stage in a new public health order, whereby a more precise and targeted approach to screening, diagnosis, treatment and, potentially, cure is based on each patient’s unique genetic and biologic make-up.

Eric Topol, MD

Executive vice-president, Scripps Research Institute; founder and director, Scripps Research Translational Institute.

  • In 2045, a planetary health infrastructure based on deep, longitudinal, multimodal human data, ideally collected from and accessible to as many as possible of the 9+ billion people projected to then inhabit the Earth.
  • enhanced capabilities to perform functions that are not feasible now.
  • AI machines’ ability to ingest and process biomedical text at scale—such as the corpus of the up-to-date medical literature—will be used routinely by physicians and patients.
  • the concept of a learning health system will be redefined by AI.

Linda Partridge, PhD

Professor, Max Planck Institute for Biology of Ageing.

  • Geroprotective drugs, which target the underlying molecular mechanisms of ageing, are coming over the scientific and clinical horizons, and may help to prevent the most intractable age-related disease, dementia.

Trevor Mundel, MD

President of Global Health, Bill & Melinda Gates Foundation.

  • finding new ways to share clinical data that are as open as possible and as closed as necessary.
  • moving beyond drug donations toward a new era of corporate social responsibility that encourages biotechnology and pharmaceutical companies to offer their best minds and their most promising platforms.
  • working with governments and multilateral organizations much earlier in the product life cycle to finance the introduction of new interventions and to ensure the sustainable development of the health systems that will deliver them.
  • deliver on the promise of global health equity.

Josep Tabernero, MD, PhD

Vall d’Hebron Institute of Oncology (VHIO); president, European Society for Medical Oncology (2018–2019).

  • genomic-driven analysis will continue to broaden the impact of personalized medicine in healthcare globally.
  • Precision medicine will continue to deliver its new paradigm in cancer care and reach more patients.
  • Immunotherapy will deliver on its promise to dismantle cancer’s armory across tumor types.
  • AI will help guide the development of individually matched
  • genetic patient screenings
  • the promise of liquid biopsy policing of disease?

Pardis Sabeti, PhD

Professor, Harvard University & Harvard T.H. Chan School of Public Health and Broad Institute of MIT and Harvard; investigator, Howard Hughes Medical Institute.

  • the development and integration of tools into an early-warning system embedded into healthcare systems around the world could revolutionize infectious disease detection and response.
  • But this will only happen with a commitment from the global community.

Els Toreele, PhD

Executive director, Médecins Sans Frontières Access Campaign

  • we need a paradigm shift such that medicines are no longer lucrative market commodities but are global public health goods—available to all those who need them.
  • This will require members of the scientific community to go beyond their role as researchers and actively engage in R&D policy reform mandating health research in the public interest and ensuring that the results of their work benefit many more people.
  • The global research community can lead the way toward public-interest driven health innovation, by undertaking collaborative open science and piloting not-for-profit R&D strategies that positively impact people’s lives globally.


Evolution of the Human Cell Genome Biology Field of Gene Expression, Gene Regulation, Gene Regulatory Networks and Application of Machine Learning Algorithms in Large-Scale Biological Data Analysis

Curator & Reporter: Aviva Lev-Ari, PhD, RN



The scientific frontier is presented in “Deciphering eukaryotic gene-regulatory logic with 100 million random promoters”:

de Boer, C.G., Vaishnav, E.D., Sadeh, R. et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol (2019) doi:10.1038/s41587-019-0315-8


How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation.
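The abstract describes models that explain a random promoter's expression by the TF binding sites it happens to contain. The Python sketch below is a deliberately simplified stand-in, not the authors' method: it scores each sequence with one position weight matrix (PWM) per TF on both strands, keeps each TF's best match as a feature, fits expression by ordinary least squares, and reports held-out R² (fraction of variance explained), the kind of metric behind statements such as "~94% of the expression". The motifs, the ±1 PWM weights, the sequences and the simulated expression values are all hypothetical; it requires NumPy.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) one-hot matrix, columns in A, C, G, T order."""
    idx = np.array([BASES.index(b) for b in seq.upper()])
    mat = np.zeros((len(seq), 4))
    mat[np.arange(len(seq)), idx] = 1.0
    return mat


def best_pwm_score(seq: str, pwm: np.ndarray) -> float:
    """Best log-odds score of a (width, 4) PWM over every window on both strands."""
    width = pwm.shape[0]
    fwd = one_hot(seq)
    rev = fwd[::-1, ::-1]  # reverse complement: flip positions, swap A<->T and C<->G
    best = -np.inf
    for enc in (fwd, rev):
        for start in range(enc.shape[0] - width + 1):
            best = max(best, float(np.sum(enc[start:start + width] * pwm)))
    return best


def feature_matrix(seqs, pwms) -> np.ndarray:
    """One row per sequence, one column per TF: that TF's best site score."""
    return np.array([[best_pwm_score(s, p) for p in pwms] for s in seqs])


def _with_intercept(features: np.ndarray) -> np.ndarray:
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_linear(features: np.ndarray, expression: np.ndarray) -> np.ndarray:
    """Ordinary least squares: expression ~ sum_k w_k * score_k + b."""
    coef, *_ = np.linalg.lstsq(_with_intercept(features), expression, rcond=None)
    return coef


def predict(features: np.ndarray, coef: np.ndarray) -> np.ndarray:
    return _with_intercept(features) @ coef


def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of variance in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def toy_pwm(consensus: str) -> np.ndarray:
    """Hypothetical PWM: +1 for the consensus base at each position, -1 otherwise."""
    pwm = -np.ones((len(consensus), 4))
    for i, b in enumerate(consensus):
        pwm[i, BASES.index(b)] = 1.0
    return pwm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fully random 50-bp toy "promoters".
    seqs = ["".join(rng.choice(list(BASES), size=50)) for _ in range(200)]
    pwms = [toy_pwm("TATAAA"), toy_pwm("CACGTG")]  # hypothetical motifs
    feats = feature_matrix(seqs, pwms)
    # Simulated expression: made-up weights plus noise, for illustration only.
    y = 1.5 * feats[:, 0] - 0.5 * feats[:, 1] + rng.normal(0.0, 0.1, len(seqs))
    coef = fit_linear(feats[:150], y[:150])
    print("held-out R^2:", r_squared(y[150:], predict(feats[150:], coef)))
```

Because the simulated expression here is a noisy linear function of the features, held-out R² should come out close to 1; on real data, the strand, position, helical-face and chromatin effects named in the abstract would call for much richer features than a single best-site score per TF.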

The Evolution of the Human Cell Genome Biology Field of Gene Expression, Gene Regulation, Gene Regulatory Networks and Application of Machine Learning Algorithms in Large-Scale Biological Data Analysis is presented in the following Table


50 Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034 e1026 (2019).
5 Muerdter, F. et al. Resolving systematic errors in widely used enhancer activity assays in human cells. Nat. Methods 15, 141–149 (2018).
6 Wang, X. et al. High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human. Nat. Commun. 9, 5380 (2018).
15 Yona, A. H., Alm, E. J. & Gore, J. Random sequences rapidly evolve into de novo promoters. Nat. Commun. 9, 1530 (2018).
4 van Arensbergen, J. et al. Genome-wide mapping of autonomous promoter activity in human cells. Nat. Biotechnol. 35, 145–153 (2017).
14 Cuperus, J. T. et al. Deep learning of the regulatory grammar of yeast 5’ untranslated regions from 500,000 random sequences. Genome Res. 27, 2015–2024 (2017).
31 Levo, M. et al. Systematic investigation of transcription factor activity in the context of chromatin using massively parallel binding and expression assays. Mol. Cell 65, 604–617 e606 (2017).
49 Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
54 de Boer, C. High-efficiency S. cerevisiae lithium acetate transformation. protocols.io https://doi.org/10.17504/protocols.io.j4tcqwn (2017).
59 Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous systems. arXiv 1603.04467 (2016).
20 Shalem, O. et al. Systematic dissection of the sequence determinants of gene 3’ end mediated expression control. PLoS Genet. 11, e1005147 (2015).
55 Deng, C., Daley, T. & Smith, A. D. Applications of species accumulation curves in large-scale biological data analysis. Quant. Biol. 3, 135–144 (2015).
9 Hughes, T. R. & de Boer, C. G. Mapping yeast transcriptional networks. Genetics 195, 9–36 (2013).
10 Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).
19 Kosuri, S. et al. Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl Acad. Sci. USA 110, 14024–14029 (2013).
7 Sharon, E. et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat. Biotechnol. 30, 521–530 (2012).
18 de Boer, C. G. & Hughes, T. R. YeTFaSCo: a database of evaluated yeast transcription factor sequence specificities. Nucleic Acids Res. 40, D169–D179 (2012).
56 Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
61 Cherry, J. M. et al. Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res. 40, D700–D705 (2012).
11 Nutiu, R. et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol. 29, 659–664 (2011).
26 Zhang, Z. et al. A packing mechanism for nucleosome organization reconstituted across a eukaryotic genome. Science 332, 977–980 (2011).
30 Ganapathi, M. et al. Extensive role of the general regulatory factors, Abf1 and Rap1, in determining genome-wide chromatin structure in budding yeast. Nucleic Acids Res. 39, 2032–2044 (2011).
52 Erb, I. & van Nimwegen, E. Transcription factor binding site positioning in yeast: proximal promoter motifs characterize TATA-less promoters. PLoS ONE 6, e24279 (2011).
3 Kinney, J. B., Murugan, A., Callan, C. G. Jr. & Cox, E. C. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl Acad. Sci. USA 107, 9158–9163 (2010).
8 Gertz, J., Siggia, E. D. & Cohen, B. A. Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 457, 215–218 (2009).
16 Wunderlich, Z. & Mirny, L. A. Different gene regulation strategies revealed by analysis of binding motifs. Trends Genet. 25, 434–440 (2009).
27 Hesselberth, J. R. et al. Global mapping of protein–DNA interactions in vivo by digital genomic footprinting. Nat. Methods 6, 283–289 (2009).
29 Hartley, P. D. & Madhani, H. D. Mechanisms that specify promoter nucleosome location and identity. Cell 137, 445–458 (2009).
51 Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
58 Segal, E. & Widom, J. From DNA sequence to transcriptional behaviour: a quantitative approach. Nat. Rev. Genet. 10, 443–456 (2009).
2 Yuan, Y., Guo, L., Shen, L. & Liu, J. S. Predicting gene expression from sequence: a reexamination. PLoS Comput. Biol. 3, e243 (2007).
46 Hibbs, M. A. et al. Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics 23, 2692–2699 (2007).
25 Liu, X., Lee, C. K., Granek, J. A., Clarke, N. D. & Lieb, J. D. Whole-genome comparison of Leu3 binding in vitro and in vivo reveals the importance of nucleosome occupancy in target site selection. Genome Res. 16, 1517–1528 (2006).
34 Roberts, G. G. & Hudson, A. P. Transcriptome profiling of Saccharomyces cerevisiae during a transition from fermentative to glycerol-based respiratory growth reveals extensive metabolic and structural remodeling. Mol. Genet. Genomics 276, 170–186 (2006).
48 Tanay, A. Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res. 16, 962–972 (2006).
53 Tong, A. H. & Boone, C. Synthetic genetic array analysis in Saccharomyces cerevisiae. Methods Mol. Biol. 313, 171–192 (2006).
57 Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
62 Chua, G. et al. Identifying transcription factor functions and targets by phenotypic activation. Proc. Natl Acad. Sci. USA 103, 12045–12050 (2006).
17 Arnosti, D. N. & Kulkarni, M. M. Transcriptional enhancers: intelligent enhanceosomes or flexible billboards? J. Cell. Biochem. 94, 890–898 (2005).
21 Granek, J. A. & Clarke, N. D. Explicit equilibrium modeling of transcription-factor binding and gene regulation. Genome Biol. 6, R87 (2005).
1 Beer, M. A. & Tavazoie, S. Predicting gene expression from sequence. Cell 117, 185–198 (2004).
28 Bernstein, B. E., Liu, C. L., Humphrey, E. L., Perlstein, E. O. & Schreiber, S. L. Global nucleosome occupancy in yeast. Genome Biol. 5, R62 (2004).
44 Kim, T. S., Kim, H. Y., Yoon, J. H. & Kang, H. S. Recruitment of the Swi/Snf complex by Ste12-Tec1 promotes Flo8-Mss11-mediated activation of STA1 expression. Mol. Cell. Biol. 24, 9542–9556 (2004).
45 Harbison, C. T. et al. Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104 (2004).
60 Kent, N. A., Eibert, S. M. & Mellor, J. Cbf1p is required for chromatin remodeling at promoter-proximal CACGTG motifs in yeast. J. Biol. Chem. 279, 27116–27123 (2004).
22 Kulkarni, M. M. & Arnosti, D. N. Information display by transcriptional enhancers. Development 130, 6569–6575 (2003).
24 Conlon, E. M., Liu, X. S., Lieb, J. D. & Liu, J. S. Integrating regulatory motif discovery and genome-wide expression analysis. Proc. Natl Acad. Sci. USA 100, 3339–3344 (2003).
43 Neely, K. E., Hassan, A. H., Brown, C. E., Howe, L. & Workman, J. L. Transcription activator interactions with multiple SWI/SNF subunits. Mol. Cell. Biol. 22, 1615–1625 (2002).
23 Bussemaker, H. J., Li, H. & Siggia, E. D. Regulatory element detection using correlation with expression. Nat. Genet. 27, 167–171 (2001).
37 Haurie, V. et al. The transcriptional activator Cat8p provides a major contribution to the reprogramming of carbon metabolism during the diauxic shift in Saccharomyces cerevisiae. J. Biol. Chem. 276, 76–85 (2001).
39 Grauslund, M. & Ronnow, B. Carbon source-dependent transcriptional regulation of the mitochondrial glycerol-3-phosphate dehydrogenase gene, GUT2, from Saccharomyces cerevisiae. Can. J. Microbiol. 46, 1096–1100 (2000).
42 Cullen, P. J. & Sprague, G. F. Jr. Glucose depletion causes haploid invasive growth in yeast. Proc. Natl Acad. Sci. USA 97, 13619–13624 (2000).
38 Sato, T. et al. The E-box DNA binding protein Sgc1p suppresses the gcr2 mutation, which is involved in transcriptional activation of glycolytic genes in Saccharomyces cerevisiae. FEBS Lett. 463, 307–311 (1999).
40 Madhani, H. D. & Fink, G. R. Combinatorial control required for the specificity of yeast MAPK signaling. Science 275, 1314–1317 (1997).
41 Gavrias, V., Andrianopoulos, A., Gimeno, C. J. & Timberlake, W. E. Saccharomyces cerevisiae TEC1 is required for pseudohyphal growth. Mol. Microbiol. 19, 1255–1263 (1996).
36 Hedges, D., Proft, M. & Entian, K. D. CAT8, a new zinc cluster-encoding gene necessary for derepression of gluconeogenic enzymes in the yeast Saccharomyces cerevisiae. Mol. Cell. Biol. 15, 1915–1922 (1995).
47 Bednar, J. et al. Determination of DNA persistence length by cryo-electron microscopy. Separation of the static and dynamic contributions to the apparent persistence length of DNA. J. Mol. Biol. 254, 579–594 (1995).
32 Axelrod, J. D., Reagan, M. S. & Majors, J. GAL4 disrupts a repressing nucleosome during activation of GAL1 transcription in vivo. Genes Dev. 7, 857–869 (1993).
33 Morse, R. H. Nucleosome disruption by transcription factor binding in yeast. Science 262, 1563–1566 (1993).
12 Oliphant, A. R., Brandl, C. J. & Struhl, K. Defining the sequence specificity of DNA-binding proteins by selecting binding sites from random-sequence oligonucleotides: analysis of yeast GCN4 protein. Mol. Cell. Biol. 9, 2944–2949 (1989).
35 Forsburg, S. L. & Guarente, L. Identification and characterization of HAP4: a third component of the CCAAT-bound HAP2/HAP3 heteromer. Genes Dev. 3, 1166–1178 (1989).
13 Horwitz, M. S. & Loeb, L. A. Promoters selected from random DNA sequences. Proc. Natl Acad. Sci. USA 83, 7405–7409 (1986).


To access each reference as a live link, find its number in the first column of the table and look it up in the List of References at the link below.


Author information

C.G.D. and A.R. drafted the manuscript, with all authors contributing. C.G.D. analyzed the data. C.G.D., E.D.V., E.L.A. and R.S. performed the experiments. A.R. and N.F. supervised the research.

Correspondence to Carl G. de Boer or Aviv Regev.

Ethics declarations

Competing interests

A.R. is an SAB member of Thermo Fisher Scientific, Neogene Therapeutics, Asimov, and Syros Pharmaceuticals, an equity holder of Immunitas, and a founder of and equity holder in Celsius Therapeutics. All other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

de Boer, C.G., Vaishnav, E.D., Sadeh, R. et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol (2019) doi:10.1038/s41587-019-0315-8



Data Science & Analytics: What do Data Scientists Do in 2020 and a Pioneer Practitioner’s Portfolio of Algorithm-based Decision Support Systems for Operations Management in Several Industrial Verticals

Curator: Aviva Lev-Ari, PhD, RN

Drawing on Jesse Anderson’s work on data teams, Kathleen Walch, in “Why Data Scientists Aren’t Data Engineers,” makes several keen distinctions between the two skill sets.

I can attest that she is absolutely correct; see below for a Pioneer Practitioner’s Portfolio of Algorithm-based Decision Support Systems for Operations Management in Several Industrial Verticals.


These key distinctions are:

Data Scientists vs Data Engineers

In the mid-2000s, we saw the emergence of the Data Scientist position. As cited in the O’Reilly article: “This increase in the demand for data scientists has been driven by the success of the major Internet companies. Google, Facebook, LinkedIn, and Amazon have all made their marks by using data creatively: not just warehousing data, but turning it into something of value.” Not surprisingly, any organization that has data of value is looking to data science and data scientists to extract ever more value from that information.

Originating from roots in statistical modeling and data analysis, data scientists have backgrounds in advanced math and statistics, advanced analytics, and increasingly machine learning / AI. The focus of data scientists is, unsurprisingly, data science: how to extract useful information from a sea of data, and how to translate business and scientific informational needs into the language of information and math. Data scientists need to be masters of statistics, probability, mathematics, and the algorithms that help glean useful insights from huge piles of information. They have usually learned programming out of necessity more than anything else, in order to run programs and advanced analyses on data. As a result, the code data scientists are usually tasked to write is minimal, only what is necessary to accomplish a data science task (R is a common language for them to use), and they work best when they are given clean data on which to run advanced analytics. A data scientist is a scientist who creates hypotheses, runs tests and analyses of the data, and then translates the results so that others in the organization can easily view and understand them.

On the other hand, data scientists can’t perform their jobs without access to large volumes of clean data. Extracting, cleaning, and moving data is not really the role of a data scientist, but rather that of a data engineer. Data engineers have programming and technology expertise, and have previously been involved with data integration, middleware, analytics, business data portals, and extract-transform-load (ETL) operations. The data engineer’s center of gravity is big data and distributed systems, with skills in programming languages such as Java, Python, and Scala, as well as in scripting tools and techniques. Data engineers are tasked with taking data from a wide range of systems in structured and unstructured formats, data that is usually not “clean”, with missing fields, mismatched data types, and other data-related issues. They need to use their programming, integration, architecture, and systems skills to clean all the data and put it into a format and system that data scientists can then use to analyze, build their data models, and provide value to the organization. In this sense, a data engineer is an engineer who designs, builds, and arranges data.
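To make the cleaning step concrete, here is a minimal Python sketch, assuming raw records arrive as dictionaries with missing fields, stray whitespace, and mixed date formats. The field names (patient_id, age, visit_date), the two accepted date formats, and the rule of rejecting rows without an ID are hypothetical choices for illustration, not a prescribed ETL pipeline.

```python
from datetime import date, datetime
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical accepted date formats: ISO (2019-12-01) and US style (12/01/2019).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def to_int(value: Any) -> Optional[int]:
    """Coerce values such as ' 42 ' to int; return None for missing or unparseable input."""
    if isinstance(value, bool):
        return None  # bool is a subclass of int, but True/False is not a valid age
    if isinstance(value, int):
        return value
    try:
        return int(str(value).strip())  # str(None) == "None" also fails here -> None
    except ValueError:
        return None


def to_date(value: Any) -> Optional[date]:
    """Parse a date written in any of DATE_FORMATS; return None if no format matches."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if value is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(str(value).strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_records(raw: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Return (clean, rejected): typed records, plus raw rows lacking a usable patient_id."""
    clean, rejected = [], []
    for row in raw:
        raw_id = row.get("patient_id")
        pid = "" if raw_id is None else str(raw_id).strip()
        if not pid:
            rejected.append(row)  # keep bad rows for review instead of silently dropping them
            continue
        clean.append({
            "patient_id": pid,
            "age": to_int(row.get("age")),
            "visit_date": to_date(row.get("visit_date")),
        })
    return clean, rejected


# Invented example input with the kinds of problems described above.
raw_rows = [
    {"patient_id": "A1", "age": " 42 ", "visit_date": "2019-12-01"},
    {"patient_id": "A2", "age": "n/a", "visit_date": "12/01/2019"},
    {"age": 30},  # missing ID
]
clean, rejected = clean_records(raw_rows)
print(clean)
print(rejected)
```

Rejected rows are returned rather than discarded, so they can be inspected; a production pipeline would typically also record why each row failed.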

Can there be a combined Data Scientist-Engineer role?

While it might seem that the roles of a data scientist and data engineer are distinct, data scientists and data engineers share many traits and skill sets. These overlapping skills include the necessity to work with and manipulate big data sets, programming skills to apply operations to the data, data analytics skills, and general fluency with systems operations.

Rather than engineering- and programming-centric tools, data scientists need data science-centric tools. Right now there’s a growing collection of these tools, often emerging from data or predictive analytics environments that suit the needs of data scientists. However, it’s possible that even more business-centric tools might be appropriate, especially as data scientists become more embedded in the line of business. For example, decades ago, operating on large volumes of data in a spreadsheet-like format required programming, but tools like Excel introduced features such as pivot tables, and now business managers are able to perform all sorts of analyses. It’s only a matter of time before tools like Excel embed data science capabilities, or business-centric data mining and analysis tools, into their products.

As the talent gap for data scientists continues to widen, there is no doubt that we will see new tools created out of necessity to allow non-technical (read: business) people to run, test, and analyze data. Strategic business managers will begin to learn data science without needing or wanting programming or data integration experience. Traditional data scientists will still be needed to run very complex analyses of data. For the most part, however, basic analysis will move to the business unit thanks to increasingly easy-to-use tools. This means we have yet to see which tool or technology will become dominant for ML and data science in the enterprise.



My SOURCES for the evolution of the field of Data Science are the following:

 Jesse Anderson’s work on data teams

Learn How to Create and Manage Big Data Teams

This Free, 73 Page E-Book is the Complete Guide to Successful Big Data projects

I’m really tired of seeing Big Data projects fail. They fail for both technical and managerial reasons. They all fail for similar reasons and that’s just sad because we can fix or prevent them. Gartner’s research shows that 85% of Big Data projects don’t even make it into production.

“Only 15 percent of businesses reported deploying their big data project to production, effectively unchanged from last year (14 percent).”

October 4, 2016 Gartner Press Release



December 1, 2019, 9:48 am

Why Data Scientists Aren’t Data Engineers

Kathleen Walch

Managing Partner & Principal Analyst at AI Focused Research and Advisory firm Cognilytica



Translating Between Computer Science and Statistics

Posted on December 1, 2019

Gil Press



Jan 8, 2019, 06:18am

The AI Chronicles: Combining Statistical Analysis And Computing From Hollerith To Zuckerberg

Gil Press Contributor

Enterprise & Cloud



Jan 2, 2015, 10:48am

A Very Short History Of The Internet And The Web

Gil Press Contributor

Enterprise & Cloud



May 28, 2013, 09:09am

A Very Short History Of Data Science

Gil Press Contributor

Enterprise & Cloud



May 9, 2013, 09:45am

A Very Short History Of Big Data

Gil Press Contributor

Enterprise & Cloud



Apr 8, 2013, 09:16am

A Very Short History of Information Technology (IT)

Gil Press Contributor

Enterprise & Cloud



A Pioneer Practitioner’s Portfolio of Algorithm-based Decision Support Systems for Operations Management in Several Industrial Verticals: Analytics Designer, Aviva Lev-Ari, PhD, RN

Against this landscape of IT, the Internet, Analytics, Statistics, Big Data, Data Science and Artificial Intelligence, I tell the stories of my own pioneering work in data science, designing algorithm-based decision support systems for organizations in several sectors of the US economy:

  • Startups:
     1. TimeØ Group
     2. Concept Five Technologies, Inc.
     3. MDSS, Inc.
     4. LPBI Group
  • Top Tier Management Consulting: SRI International, Monitor Group;
  • OEM: Amdahl Corporation;
  • Top-tier System Integrator (ranked 6th): Perot Systems Corporation;
  • FFRDC: MITRE Corporation;
  • Publishing industry: Director of Research at McGraw-Hill/CTB;
  • Northeastern University: Researcher on Cardiovascular Pharmacotherapy at Bouvé College of Health Sciences (independent research guided by a Professor of Pharmacology).

Types of institutions:

  • For-Profit corporations: Amdahl Corp, PSC, McGraw-Hill
  • For-Profit Top Tier Consulting: Monitor Company, now part of Deloitte
  • Not-for-Profit Top Tier Consulting: SRI International
  • eScientific Publishing: LPBI Group, developers of a curation methodology for e-Articles [N = 3,700], electronic Tables of Contents for e-Books in Medicine [N = 16, https://lnkd.in/ekWGNqA] and e-Proceedings of Biotech Conferences [N = 70].


Autobiographical Annotations: Tribute to My Professors


Pioneering implementations of analytics to business decision making: contributions to domain knowledge conceptualization, research design, methodology development, data modeling and statistical data analysis: Aviva Lev-Ari, UCB, PhD’83; HUJI MA’76


Recollections of Years at UC, Berkeley, Part 1 and Part 2

  • Recollections: Part 1 – My days at Berkeley, 9/1978 – 12/1983 – About my doctoral advisor, Allan Pred, other professors and other peers


  • Recollections: Part 2 – “While Rolling” is preceded by “While Enrolling” Autobiographical Alumna Recollections of Berkeley – Aviva Lev-Ari, PhD’83



The Digital Age Gave Rise to New Definitions – New Benchmarks were born on the World Wide Web for the Intangible Asset of Firm’s Reputation: Pay a Premium for buying e-Reputation

For @AVIVA1950, Founder, LPBI Group @pharma_BI: Twitter Analytics [Engagement Rate, Link Clicks, Retweets, Likes, Replies] & Tweet Highlights [Tweets, Impressions, Profile Visits, Mentions, New Followers] https://analytics.twitter.com/user/AVIVA1950/tweets

Thriving at the Survival Calls during Careers in the Digital Age – An AGE like no Other, also known as, DIGITAL

Reflections on a Four-phase Career: Aviva Lev-Ari, PhD, RN, March 2018

Prepared for publication in the American Friends of the Hebrew University (AFHU) May 2018 Newsletter, in the Hebrew University (HUJI) Alumni Spotlight section.

Aviva Lev-Ari’s profile was posted on 5/3/2018 on the AFHU website under the Alumni Spotlight at https://www.afhu.org/

On 5/11/2018, excerpts were published in AFHU e-news.





scPopCorn: A New Computational Method for Subpopulation Detection and their Comparative Analysis Across Single-Cell Experiments

Reporter and Curator: Dr. Sudipta Saha, Ph.D.


Present-day technological advances have facilitated unprecedented opportunities for studying biological systems at single-cell resolution. For example, single-cell RNA sequencing (scRNA-seq) enables the measurement of transcriptomic information of thousands of individual cells in one experiment. Analyses of such data provide information that was not accessible using bulk sequencing, which can only assess average properties of cell populations. Single-cell measurements, in contrast, can capture the heterogeneity of a population of cells. In particular, single-cell studies allow for the identification of novel cell types, states, and dynamics.
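A toy calculation illustrates why a bulk average can hide heterogeneity. The expression values below are invented, assuming a sample made of two equally sized subpopulations: one expressing a gene highly and one barely expressing it. The threshold of 5.0 used to separate them is likewise an illustrative assumption.

```python
from statistics import mean

# Invented expression values (arbitrary units) for one gene in 10 cells:
# five cells from a high-expressing subpopulation, five from a near-silent one.
cells = [10.0, 9.5, 10.5, 10.0, 10.0, 0.0, 0.0, 0.5, 0.0, 0.0]

# A bulk measurement reports only the population average.
bulk_value = mean(cells)

# A single-cell measurement keeps every value, so the two groups stay visible.
high = [x for x in cells if x > 5.0]
low = [x for x in cells if x <= 5.0]

print(f"bulk average: {bulk_value:.2f}")
print(f"high group: n={len(high)}, mean={mean(high):.2f}")
print(f"low group:  n={len(low)}, mean={mean(low):.2f}")
```

The bulk average (5.05) describes none of the individual cells, while the per-cell values reveal two distinct subpopulations.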


One of the most prominent uses of the scRNA-seq technology is the identification of subpopulations of cells present in a sample and comparing such subpopulations across samples. Such information is crucial for understanding the heterogeneity of cells in a sample and for comparative analysis of samples from different conditions, tissues, and species. A frequently used approach is to cluster every dataset separately, inspect marker genes for each cluster, and compare these clusters in an attempt to determine which cell types were shared between samples. This approach, however, relies on the existence of predefined or clearly identifiable marker genes and their consistent measurement across subpopulations.
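The cluster-then-compare baseline described above can be sketched as follows, assuming each dataset has already been clustered and each cluster summarized by a set of marker genes. The cluster names and marker sets are invented, and matching clusters by Jaccard overlap of their markers is one simple choice made for illustration, not the procedure of any particular study.

```python
from typing import Dict, List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two marker sets: |A & B| / |A | B| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(
    markers_a: Dict[str, Set[str]],
    markers_b: Dict[str, Set[str]],
    min_score: float = 0.3,
) -> List[Tuple[str, Optional[str], float]]:
    """For each cluster in dataset A, pick the dataset-B cluster with the most similar markers.

    Returns (cluster_a, best_cluster_b or None, score). None means no B cluster
    reached min_score, so the A cluster appears to have no counterpart.
    """
    results = []
    for name_a, genes_a in markers_a.items():
        best_name, best_score = None, 0.0
        for name_b, genes_b in markers_b.items():
            score = jaccard(genes_a, genes_b)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_score < min_score:
            best_name = None
        results.append((name_a, best_name, best_score))
    return results


# Invented marker sets for clusters found separately in two hypothetical datasets.
dataset_a = {
    "h0": {"INS", "IAPP", "PCSK1"},
    "h1": {"GCG", "TTR", "IRX2"},
    "h2": {"PRSS1", "CPA1", "CELA3A"},
}
dataset_b = {
    "m0": {"INS", "IAPP", "MAFA"},
    "m1": {"GCG", "TTR", "ARX"},
}

matches = match_clusters(dataset_a, dataset_b)
for row in matches:
    print(row)
```

Cluster h2 receives no partner, and this sketch cannot tell whether its cell type is truly absent from the second dataset or merely lacks consistently measured markers there; that is the dependence on marker genes the text points out.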


Another strategy first aligns the datasets globally and then clusters the aligned data to reveal subpopulations and their correspondence. However, solving the subpopulation-mapping problem by performing global alignment first and clustering second overlooks the original information about the subpopulations existing in each experiment. An approach that addresses this problem directly may therefore be more suitable. With this in mind, the researchers developed a computational method, single-cell subpopulations comparison (scPopCorn), that allows for comparative analysis of two or more single-cell populations.


The performance of scPopCorn was tested in three distinct settings. First, it was used to identify and align subpopulations in human and mouse pancreatic single-cell data. Next, scPopCorn was applied to the task of aligning biological replicates of mouse kidney single-cell data, where it outperformed previously published tools. Finally, it was applied to compare populations of cells from cancer and healthy brain tissues, revealing the relation of neoplastic cells to neural cells and astrocytes. As a result of this integrative approach, scPopCorn provides a powerful tool for comparative analysis of single-cell populations.


scPopCorn is a computational method for identifying subpopulations of cells within individual single-cell experiments and mapping these subpopulations across experiments. Unlike other approaches, scPopCorn performs population identification and mapping simultaneously, by optimizing a function that combines both objectives. When applied to complex biological data, scPopCorn outperforms previous methods. However, scPopCorn assumes that the input single-cell data consist of separable subpopulations; it is not designed for comparative analysis of single-cell trajectory datasets, which do not fulfill this constraint.
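To illustrate what optimizing identification and mapping “simultaneously” means, here is a toy sketch. It is NOT scPopCorn’s actual objective or optimizer. It assumes a hypothetical cost that adds (i) the within-cluster sum of squares of a candidate clustering in each experiment and (ii) a weighted mapping term, the distance from each experiment-A cluster to its nearest experiment-B cluster. The best combination is then chosen by brute force over a few candidate clusterings. The 1-D “cells”, the candidates, and the weight lam are all invented.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Labels = Tuple[int, ...]


def centroids(points: Sequence[float], labels: Labels) -> Dict[int, float]:
    """Mean position of each cluster label."""
    groups: Dict[int, List[float]] = {}
    for x, lab in zip(points, labels):
        groups.setdefault(lab, []).append(x)
    return {lab: sum(xs) / len(xs) for lab, xs in groups.items()}


def within_sse(points: Sequence[float], labels: Labels) -> float:
    """Within-cluster sum of squared distances to the cluster mean (lower = tighter)."""
    cents = centroids(points, labels)
    return sum((x - cents[lab]) ** 2 for x, lab in zip(points, labels))


def cross_cost(points_a: Sequence[float], labels_a: Labels,
               points_b: Sequence[float], labels_b: Labels) -> float:
    """Hypothetical mapping term: each A cluster's distance to its nearest B cluster."""
    cents_b = list(centroids(points_b, labels_b).values())
    return sum(min(abs(ca - cb) for cb in cents_b)
               for ca in centroids(points_a, labels_a).values())


def joint_cost(points_a: Sequence[float], labels_a: Labels,
               points_b: Sequence[float], labels_b: Labels, lam: float = 1.0) -> float:
    """Toy combined objective: clustering quality in both experiments + lam * mapping cost."""
    return (within_sse(points_a, labels_a) + within_sse(points_b, labels_b)
            + lam * cross_cost(points_a, labels_a, points_b, labels_b))


# Invented 1-D "cells" from two experiments and candidate clusterings for each.
exp_a = [0.0, 2.0, 29.0, 31.0]
exp_b = [1.0, 26.0, 51.0]
cands_a = [(0, 0, 1, 1), (0, 1, 1, 1)]
cands_b = [(0, 0, 1), (0, 1, 1)]

# Judged alone, the two clusterings of experiment B are equally tight (a tie).
sse_b = [within_sse(exp_b, lab) for lab in cands_b]

# Judged jointly, the structure of experiment A breaks the tie.
best_a, best_b = min(product(cands_a, cands_b),
                     key=lambda pair: joint_cost(exp_a, pair[0], exp_b, pair[1]))
print("B alone:", sse_b)
print("joint choice:", best_a, best_b)
```

Here the cross-experiment term favors the clustering of experiment B whose clusters line up with those of experiment A, a small-scale version of the idea that integrating signal across experiments helps resolve ambiguity within each one. A real method would replace the brute-force search with a proper optimizer.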


Several innovations developed in this work contributed to the performance of scPopCorn. In particular, unifying the above-mentioned tasks into a single problem statement allowed the signal from different experiments to be integrated while subpopulations are identified within each experiment. Such integration helps reduce biological and experimental noise. The researchers believe that the ideas introduced in scPopCorn not only enabled a highly accurate approach to identifying and mapping subpopulations, but can also serve as a stepping stone for other tools that interrogate the relationships between single-cell experiments.
















Seven Alternative Designs for a Quantum Computing Platform – The Race by IBM, Google, Microsoft, and Others


Reporter: Aviva Lev-Ari, PhD, RN


Business Bets on a Quantum Leap

Quantum computing could help companies address problems as huge as supply chains and climate change. Here’s how IBM, Google, Microsoft, and others are racing to bring the tech from theory to practice.
May 21, 2019

quantum computer at IonQ, an Alphabet-backed startup

A version of this article appears in the June 2019 issue of Fortune with the headline “The Race for Quantum Domination.”


One day, your health may depend on a quantum leap.

  • Pharmaceutical giant Biogen teamed up with consultancy Accenture and startup 1QBit on a quantum computing experiment in 2017 aimed at molecular modeling, one of the more complex disciplines in medicine. The goal: finding candidate drugs to treat neurodegenerative diseases.
  • Microsoft is collaborating with Case Western Reserve University to improve the accuracy of MRI machines, which help detect cancer, using so-called quantum-inspired algorithms.


7 ways to win the quantum race

There are multiple ways that quantum computing could work.

Here’s a guide to which companies are backing which tech.

Superconducting uses an electrical current, flowing through special superconducting chips cooled to near absolute zero, to produce computational “qubits.” Google, IBM, and Intel are pursuing this approach, which has so far been the front-runner.

Ion trap relies on charged atoms that are manipulated by lasers in a vacuum, which helps to reduce noisy interference that can contribute to errors. Industrial giant Honeywell is betting on this technique. So is IonQ, a startup with backing from Alphabet.

Neutral atom is similar to the ion-trap method, except it uses, you guessed it, neutral atoms. Physicist Mikhail Lukin’s lab at Harvard is a pioneer.

Annealing is designed to find the lowest-energy (and therefore optimal) solutions to math problems. Canadian firm D-Wave has sold multimillion-dollar machines based on the idea to Google and NASA. They’re fast, but skeptics question whether they qualify as “quantum.”
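To show what a “lowest-energy solution” means, here is a minimal classical sketch: simulated annealing on a tiny Ising-style energy function. It is a classical analogue chosen for illustration, not how D-Wave’s quantum annealers work internally, and the couplings, fields, and cooling schedule are invented. Each spin configuration is a candidate answer, the energy function scores it, and the lowest-energy configuration is the optimal answer. An exhaustive search over all configurations checks the result.

```python
import math
import random
from itertools import product
from typing import Dict, List, Tuple

Couplings = Dict[Tuple[int, int], float]


def energy(spins: List[int], J: Couplings, h: List[float]) -> float:
    """Ising energy E(s) = -sum J_ij s_i s_j - sum h_i s_i, with s_i in {-1, +1}."""
    pair_term = sum(w * spins[i] * spins[j] for (i, j), w in J.items())
    field_term = sum(hi * si for hi, si in zip(h, spins))
    return -pair_term - field_term


def simulated_annealing(J: Couplings, h: List[float], steps: int = 5000,
                        t_start: float = 5.0, t_end: float = 0.01, seed: int = 0):
    """Classical simulated annealing with single-spin flips and geometric cooling.

    Returns the lowest-energy configuration seen during the run and its energy.
    """
    rng = random.Random(seed)
    n = len(h)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    e = energy(spins, J, h)
    best, best_e = spins[:], e
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)
        spins[i] = -spins[i]  # propose flipping one spin
        new_e = energy(spins, J, h)
        delta = new_e - e
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e = new_e  # accept the move (Metropolis rule)
            if e < best_e:
                best, best_e = spins[:], e
        else:
            spins[i] = -spins[i]  # reject: undo the flip
    return best, best_e


def brute_force(J: Couplings, h: List[float]) -> List[int]:
    """Exhaustive check over all 2**n configurations (feasible only for tiny n)."""
    return min((list(s) for s in product((-1, 1), repeat=len(h))),
               key=lambda s: energy(s, J, h))


# Invented couplings and fields for a 4-spin toy problem.
J = {(0, 1): 1.0, (1, 2): -1.0, (2, 3): 1.0, (0, 3): 0.5}
h = [0.2, 0.0, 0.0, -0.1]

sa_state, sa_energy = simulated_annealing(J, h)
exact_state = brute_force(J, h)
print("annealing:", sa_state, sa_energy)
print("exhaustive:", exact_state, energy(exact_state, J, h))
```

Exhaustive search doubles in cost with every added spin, which is why annealing-style heuristics, classical or quantum, are attractive for large optimization problems.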

Silicon spin uses single electrons trapped in transistors. Intel is hedging its bets between the more mature superconducting qubits and this younger, equally semiconductor-friendly method.

Topological uses exotic, highly stable quasi-particles called “anyons.” Microsoft deems this unproven moonshot the best candidate in the long run, though the company has yet to produce a single topological qubit.

Photonics uses light particles sent through special silicon chips. The particles interact with one another very little (good), but can scatter and disappear (bad). Three-year-old stealth startup PsiQuantum is tinkering away on this idea.




Other related articles published in this Open Access Online Scientific Journal include the following:


  • R&D for Artificial Intelligence Tools & Applications: Google’s Research Efforts in 2018

Reporter: Aviva Lev-Ari, PhD, RN



  • LIVE Day Two – World Medical Innovation Forum ARTIFICIAL INTELLIGENCE, Boston, MA USA, Monday, April 9, 2019




  • Research and Development (R&D) Expenditure by Country represent time, capital, and effort being put into researching and designing the products of the future – Data from the UNESCO Institute for Statistics adjusted for purchasing-power parity (PPP).

Reporter: Aviva Lev-Ari, PhD, RN



  • Resources on Artificial Intelligence in Health Care and in Medicine: Articles of Note at PharmaceuticalIntelligence.com @AVIVA1950 @pharma_BI



  • IBM’s Watson Health division – How will the Future look like?

Reporter: Aviva Lev-Ari, PhD, RN

