Healthcare analytics, AI solutions for biological big data, providing an AI platform for the biotech, life sciences, medical and pharmaceutical industries, as well as for related technological approaches, i.e., curation and text analysis with machine learning and other activities related to AI applications to these industries.
AI enabled Drug Discovery and Development: The Challenges and the Promise
Reporter: Aviva Lev-Ari, PhD, RN
Early Development
Caroline Kovac (the first IBM GM of Life Sciences) started in silico drug development in 2000, using a large database of substances and computing power. She transformed an idea into a $2B business. Most of the money came from big pharma. She asked companies which new drugs they were planning to develop and provided the four most probable combinations of substances, based on in silico work.
Carol Kovac
General Manager, Healthcare and Life Sciences, IBM
From a speaker profile at a conference in 2005
Carol Kovac is General Manager of IBM Healthcare and Life Sciences responsible for the strategic direction of IBM’s global healthcare and life sciences business. Kovac leads her team in developing the latest information technology solutions and services, establishing partnerships and overseeing IBM investment within the healthcare, pharmaceutical and life sciences markets. Starting with only two employees as an emerging business unit in the year 2000, Kovac has successfully grown the life sciences business unit into a multi-billion dollar business and one of IBM’s most successful ventures to date with more than 1500 employees worldwide. Kovac’s prior positions include general manager of IBM Life Sciences, vice president of Technical Strategy and Division Operations, and vice president of Services and Solutions. In the latter role, she was instrumental in launching the Computational Biology Center at IBM Research. Kovac sits on the Board of Directors of Research!America and Africa Harvest. She was inducted into the Women in Technology International Hall of Fame in 2002, and in 2004, Fortune magazine named her one of the 50 most powerful women in business. Kovac earned her Ph.D. in chemistry at the University of Southern California.
The use of artificial intelligence in drug discovery, when coupled with new genetic insights and the increase of patient medical data of the last decade, has the potential to bring novel medicines to patients more efficiently and more predictably.
Jack Fuchs, MBA ’91, an adjunct lecturer who teaches “Principled Entrepreneurial Decisions” at Stanford School of Engineering, moderated and explored how clearly articulated principles can guide the direction of technological advancements like AI-enabled drug discovery.
Kim Branson, Global head of AI and machine learning at GSK.
Russ Altman, the Kenneth Fong Professor of Bioengineering, of genetics, of medicine (general medical discipline), of biomedical data science and, by courtesy, of computer science.
Synthetic Biology Software applied to development of Galectins Inhibitors at LPBI Group
Using Structural Computation Models to Predict Productive PROTAC Ternary Complexes
Ternary complex formation is necessary but not sufficient for target protein degradation. In this research, Bai et al. have addressed questions to better understand the rate-limiting steps between ternary complex formation and target protein degradation. They have developed a structure-based computer model approach to predict the efficiency and sites of target protein ubiquitination by CRBN-binding PROTACs. Such models will allow a more complete understanding of PROTAC-directed degradation and allow crafting of increasingly effective and specific PROTACs for therapeutic applications.
Another major feature of this research is that it is a result of collaboration between research groups at Amgen, Inc. and Promega Corporation. In the past, commercial research laboratories have shied away from collaboration, but the last several years have found researchers more open to collaborative work. This increased collaboration allows scientists to bring their different expertise to a problem or question and speed up discovery. According to Dr. Kristin Riching, Senior Research Scientist at Promega Corporation, “Targeted protein degraders have broken many of the rules that have guided traditional drug development, but it is exciting to see how the collective learnings we gain from their study can aid the advancement of this new class of molecules to the clinic as effective therapeutics.”
Medical Startups – Artificial Intelligence (AI) Startups in Healthcare
Reporters: Stephen J. Williams, PhD and Aviva Lev-Ari, PhD, RN and Shraga Rottem, MD, DSc
The motivation for this post is threefold:
First, we are presenting an application of AI, NLP, DL to our own medical text in the Genomics space. Here we present the first section of Part 1 in the following book. Part 1 has six subsections that yielded 12 plots. The entire Book is represented by 38 x 2 = 76 plots.
Second, we bring to the attention of the e-Reader the list of 276 Medical Startups – Artificial Intelligence (AI) Startups in Healthcare as a hot universe of R&D activity in Human Health.
Third, to highlight one academic center with an AI focus
Dear friends of the ETH AI Center,
We would like to provide you with some exciting updates from the ETH AI Center and its growing community.
As the Covid-19 restrictions in Switzerland have recently been lifted, we would like to hear from you what kind of events you would like to see in 2022! Participate in the survey to suggest event formats and topics that you would enjoy being a part of. We are already excited to learn what we can achieve together this year.
We already have many interesting events coming up, we look forward to seeing you at our main and community events!
LPBI Group is applying AI for Medical Text Analysis with Machine Learning and Natural Language Processing: Statistical and Deep Learning
Our Book
Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS & BioInformatics, Simulations and the Genome Ontology
Medical Text Analysis of this Book shows the following results, obtained by Madison Davis by applying Wolfram NLP for Biological Languages to our own text. See below an example:
@MIT Artificial intelligence system rapidly predicts how two proteins will attach: The model, called EquiDock, focuses on rigid body docking, which occurs when two proteins attach by rotating or translating in 3D space, but their shapes don’t squeeze or bend
Reporter: Aviva Lev-Ari, PhD, RN
This paper introduces a novel SE(3) equivariant graph matching network, along with a keypoint discovery and alignment approach, for the problem of protein-protein docking, with a novel loss based on optimal transport. The overall consensus is that this is an impactful solution to an important problem, whereby competitive results are achieved without the need for templates, refinement, and are achieved with substantially faster run times.
Keywords: protein complexes, protein structure, rigid body docking, SE(3) equivariance, graph neural networks
Abstract: Protein complex formation is a central problem in biology, being involved in most of the cell’s processes, and essential for applications such as drug design or protein engineering. We tackle rigid body protein-protein docking, i.e., computationally predicting the 3D structure of a protein-protein complex from the individual unbound structures, assuming no three-dimensional flexibility during binding. We design a novel pairwise-independent SE(3)-equivariant graph matching network to predict the rotation and translation to place one of the proteins at the right location and the right orientation relative to the second protein. We mathematically guarantee that the predicted complex is always identical regardless of the initial placements of the two structures, avoiding expensive data augmentation. Our model approximates the binding pocket and predicts the docking pose using keypoint matching and alignment through optimal transport and a differentiable Kabsch algorithm. Empirically, we achieve significant running time improvements over existing protein docking software and predict qualitatively plausible protein complex structures despite not using heavy sampling, structure refinement, or templates.
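The abstract mentions alignment through a differentiable Kabsch algorithm. As a rough illustration of what the Kabsch step computes, here is a minimal NumPy sketch. It is not the EquiDock code and it is not differentiable; the function names `kabsch` and `random_rotation` are assumptions made for this example. Given two sets of matched 3D keypoints, it finds the rotation and translation that best superimpose one set on the other in the least-squares sense.

```python
import numpy as np


def kabsch(P, Q):
    """Return (R, t) minimizing sum ||R p_i + t - q_i||^2 over matched rows.

    P, Q: arrays of shape (n, 3); row i of P is matched to row i of Q.
    Hypothetical helper for illustration only.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def random_rotation(rng):
    """Draw a proper rotation matrix (det = +1) via QR decomposition."""
    M, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(M) < 0:
        M[:, 0] *= -1.0  # flip one axis to turn a reflection into a rotation
    return M
```

To use it, apply the result to the first point set with `P @ R.T + t`. In a learned model the same computation would be written with a framework that supports automatic differentiation through the SVD, which is what makes the alignment step trainable.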
One-sentence Summary: We perform rigid protein docking using a novel independent SE(3)-equivariant message passing mechanism that guarantees the same resulting protein complex independent of the initial placement of the two 3D structures.
MIT researchers created a machine-learning model that can directly predict the complex that will form when two proteins bind together. Their technique is between 80 and 500 times faster than state-of-the-art software methods, and often predicts protein structures that are closer to actual structures that have been observed experimentally.
This technique could help scientists better understand some biological processes that involve protein interactions, like DNA replication and repair; it could also speed up the process of developing new medicines.
“Deep learning is very good at capturing interactions between different proteins that are otherwise difficult for chemists or biologists to write experimentally. Some of these interactions are very complicated, and people haven’t found good ways to express them. This deep-learning model can learn these types of interactions from data,” says Octavian-Eugen Ganea, a postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and co-lead author of the paper.
Ganea’s co-lead author is Xinyuan Huang, a graduate student at ETH Zurich. MIT co-authors include Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health in CSAIL, and Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering in CSAIL and a member of the Institute for Data, Systems, and Society. The research will be presented at the International Conference on Learning Representations.
Significance of the Scientific Development by the @MIT Team
EquiDock wide applicability:
Our method can be integrated end-to-end to boost the quality of other models (see above discussion on runtime importance). Examples are predicting functions of protein complexes [3] or their binding affinity [5], de novo generation of proteins binding to specific targets (e.g., antibodies [6]), modeling back-bone and side-chain flexibility [4], or devising methods for non-binary multimers. See the updated discussion in the “Conclusion” section of our paper.
Advantages over previous methods:
Our method does not rely on templates or heavy candidate sampling [7], aiming at the ambitious goal of predicting the complex pose directly. This should be interpreted in terms of generalization (to unseen structures) and scalability capabilities of docking models, as well as their applicability to various other tasks (discussed above).
Our method obtains a competitive quality without explicitly using previous geometric (e.g., 3D Zernike descriptors [8]) or chemical (e.g., hydrophilic information) features [3]. Future EquiDock extensions would find creative ways to leverage these different signals and, thus, obtain more improvements.
Novelty of theory:
Our work is the first to formalize the notion of pairwise independent SE(3)-equivariance. Previous work (e.g., [9,10]) has incorporated only single object Euclidean-equivariances into deep learning models. For tasks such as docking and binding of biological objects, it is crucial that models understand the concept of multi-independent Euclidean equivariances.
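One way to write down the pairwise independent SE(3)-equivariance described here is sketched below. The notation is an assumption made for this illustration and may differ from the paper's own symbols: X_1 and X_2 are the 3 x n coordinate matrices of the two proteins, and F is the model that returns a rotation R and translation t to apply to the first protein.

```latex
% Hedged notation, not copied from the paper:
% X_1 (ligand), X_2 (receptor): 3 x n coordinate matrices; \mathbf{1}: all-ones vector.
\begin{align}
(R, t)   &= F(X_1,\; X_2), \\
(R', t') &= F(Q_1 X_1 + g_1 \mathbf{1}^\top,\; Q_2 X_2 + g_2 \mathbf{1}^\top), \\
R' (Q_1 X_1 + g_1 \mathbf{1}^\top) + t' \mathbf{1}^\top
         &= Q_2 (R X_1 + t \mathbf{1}^\top) + g_2 \mathbf{1}^\top
\end{align}
% for all rotations Q_1, Q_2 \in SO(3) and translations g_1, g_2 \in \mathbb{R}^3.
% Equivalently: R' = Q_2 R Q_1^\top and t' = Q_2 t + g_2 - Q_2 R Q_1^\top g_1.
```

In words: moving each input protein by its own, independent rigid motion changes the predicted pose only by the motion applied to the second protein, so the predicted complex is the same.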
All propositions in Section 3 are our novel theoretical contributions.
We have rewritten the Contribution and Related Work sections to clarify this aspect.
Footnote [a]: We have fixed an important bug in the cross-attention code. We have done a more extensive hyperparameter search and understood that layer normalization is crucial in layers used in Eqs. 5 and 9, but not on the h embeddings as it was originally shown in Eq. 10. We have seen benefits from training our models with a longer patience in the early stopping criteria (30 epochs for DIPS and 150 epochs for DB5). Increasing the learning rate to 2e-4 is important to speed-up training. Using an intersection loss weight of 10 leads to improved results compared to the default of 1.
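The training changes in this footnote (longer patience in early stopping, a higher learning rate, a larger intersection loss weight) can be pictured with a small sketch. The configuration keys and the `run_early_stopping` helper below are hypothetical names chosen for this example, not the authors' code; only the numeric values come from the footnote.

```python
# Values taken from the footnote; key names are assumptions for illustration.
TRAINING_CONFIG = {
    "learning_rate": 2e-4,
    "intersection_loss_weight": 10,
    "patience": {"DIPS": 30, "DB5": 150},
}


def run_early_stopping(val_losses, patience):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once `patience` consecutive epochs pass without the
    validation loss improving on the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    stop_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        stop_epoch = epoch
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # patience exhausted
    return best_epoch, stop_epoch
```

A longer patience lets training ride out a plateau: with a short patience the loop above stops before a late improvement, while a longer one keeps going and can find it.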
Bibliography:
[1] Protein-ligand blind docking using QuickVina-W with inter-process spatio-temporal integration, Hassan et al., 2017
[2] GNINA 1.0: molecular docking with deep learning, McNutt et al., 2021
[3] Protein-protein and domain-domain interactions, Kangueane and Nilofer, 2018
[4] Side-chain Packing Using SE(3)-Transformer, Jindal et al., 2022
[5] Contacts-based prediction of binding affinity in protein–protein complexes, Vangone et al., 2015
[6] Iterative refinement graph neural network for antibody sequence-structure co-design, Jin et al., 2021
[7] Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes, Eismann et al, 2020
[8] Protein-protein docking using region-based 3D Zernike descriptors, Venkatraman et al., 2009
[9] SE(3)-transformers: 3D roto-translation equivariant attention networks, Fuchs et al, 2020
[10] E(n) equivariant graph neural networks, Satorras et al., 2021
[11] Fast end-to-end learning on protein surfaces, Sverrisson et al., 2020
Big pharma companies are snapping up collaborations with firms using AI to speed up drug discovery, with one of the latest being Sanofi’s pact with Exscientia.
Tech giants are placing big bets on digital health analysis firms, such as Oracle’s €25.42B ($28.3B) takeover of Cerner in the US.
There’s also a steady flow of financing going to startups taking new directions with AI and bioinformatics, with the latest example being a €20M Series A round by SeqOne Genomics in France.
“IBM Watson uses a philosophy that is diametrically opposed to SeqOne’s,” said Jean-Marc Holder, CSO of SeqOne. “[IBM Watson seems] to rely on analysis of large amounts of relatively unstructured data and bet on the volume of data delivering the right result. By opposition, SeqOne strongly believes that data must be curated and structured in order to deliver good results in genomics.”
Francisco Partners is picking up a range of databases and analytics tools – including
Health Insights,
MarketScan,
Clinical Development,
Social Programme Management,
Micromedex and
other imaging and radiology tools, for an undisclosed sum estimated to be in the region of $1 billion.
IBM said the sell-off is tagged as “a clear next step” as it focuses on its platform-based hybrid cloud and artificial intelligence strategy, but it’s no secret that Watson Health has failed to live up to its early promise.
The sale also marks a retreat from healthcare for the tech giant, which is remarkable given that it once said it viewed health as second only to the financial services market as a market opportunity.
IBM said it “remains committed to Watson, our broader AI business, and to the clients and partners we support in healthcare IT.”
The company reportedly invested billions of dollars in Watson, but according to a Wall Street Journal report last year, the health business – which provided cloud-based access to the supercomputer and a range of analytics services – has struggled to build market share and reach profitability.
An investigation by Stat meanwhile suggested that Watson Health’s early push into cancer for example was affected by a premature launch, interoperability challenges and over-reliance on human input to generate results.
For its part, IBM has said that the Watson for Oncology product has been improving year-on-year as the AI crunches more and more data.
That is backed up by a meta-analysis of its performance, published last year in Nature, which found that the treatment recommendations delivered by the tool were largely in line with human doctors for several cancer types.
However, the study also found that there was less consistency in more advanced cancers, and the authors noted the system “still needs further improvement.”
Watson Health offers a range of other services of course, including
tools for genomic analysis and
running clinical trials that have found favour with a number of pharma companies.
Francisco said in a statement that it offers “a market leading team [that] provides its customers with mission critical products and outstanding service.”
The deal is expected to close in the second quarter, with the current management of Watson Health retaining “similar roles” in the new standalone company, according to the investment company.
IBM’s step back from health comes as tech rivals are still piling into the sector.
@pharma_BI is asking: What will be the future of WATSON Health?
@AVIVA1950 says on 1/26/2022:
Aviva believes plausible scenarios will be that Francisco Partners will:
A. Invest in Watson Health – Like New Mountain Capital (NMC) did with Cytel
B. Acquire several other complementary businesses – Like New Mountain Capital (NMC) did with Cytel
C. Hold and grow – Like New Mountain Capital (NMC) is doing with Cytel since 2018.
D. Sell it in 7 years to @Illumina or @Nvidia or Google’s Parent @AlphaBet
1/21/2022
IBM said Friday it will sell the core data assets of its Watson Health division to a San Francisco-based private equity firm, marking the staggering collapse of its ambitious artificial intelligence effort that failed to live up to its promises to transform everything from drug discovery to cancer care.
IBM has reached an agreement to sell its Watson Health data and analytics business to the private-equity firm Francisco Partners. … He said the deal will give Francisco Partners data and analytics assets that will benefit from “the enhanced investment and expertise of a healthcare industry focused portfolio.”
IBM has been trying to find buyers for the Watson Health business for more than a year. And it was seeking a sale price of about $1 billion, The …
IBM Watson Health – Certain Assets Sold: Executive Perspectives. In a prepared statement about the deal, Tom Rosamilia, senior VP, IBM Software, …
Feb 18, 2021: International Business Machines Corp. is exploring a potential sale of its IBM Watson Health business, according to people familiar with the …
Nuance played a part in building Watson in supplying the speech recognition component of Watson. Through the years, Nuance has done some serious …
From: Heidi Rehm et al. GA4GH: International policies and standards for data sharing across genomic research and healthcare. Cell Genomics (2021), Volume 1, Issue 2.
Siloing genomic data in institutions/jurisdictions limits learning and knowledge
GA4GH policy frameworks enable responsible genomic data sharing
GA4GH technical standards ensure interoperability, broad access, and global benefits
Data sharing across research and healthcare will extend the potential of genomics
Summary
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
For genomic and personalized medicine to come to fruition, it is imperative that data silos around the world are broken down, allowing international collaboration in collecting, storing, transferring, accessing and analyzing molecular and health-related data.
We have discussed the problems that data silos produce in numerous articles on this site. By data silos we mean that not only data but also intellectual thought are held behind physical, electronic, and intellectual walls, inaccessible to scientists who do not belong to a particular institution or collaborative network.
Standardization and harmonization of data are key to this effort to share electronic records. The EU has taken bold action in this matter. The following section is about the General Data Protection Regulation of the EU and can be found at the following link:
The data protection package adopted in May 2016 aims at making Europe fit for the digital age. More than 90% of Europeans say they want the same data protection rights across the EU and regardless of where their data is processed.
The General Data Protection Regulation (GDPR)
Regulation (EU) 2016/679 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data. This text includes the corrigendum published in the OJEU of 23 May 2018.
The regulation is an essential step to strengthen individuals’ fundamental rights in the digital age and facilitate business by clarifying rules for companies and public bodies in the digital single market. A single law will also do away with the current fragmentation in different national systems and unnecessary administrative burdens.
Directive (EU) 2016/680 on the protection of natural persons regarding processing of personal data connected with criminal offences or the execution of criminal penalties, and on the free movement of such data.
The directive protects citizens’ fundamental right to data protection whenever personal data is used by criminal law enforcement authorities for law enforcement purposes. It will in particular ensure that the personal data of victims, witnesses, and suspects of crime are duly protected and will facilitate cross-border cooperation in the fight against crime and terrorism.
The directive entered into force on 5 May 2016 and EU countries had to transpose it into their national law by 6 May 2018.
The following paper by the organization The Global Alliance for Genomics and Health discusses these types of collaborative efforts to break down data silos in personalized medicine. This organization has over 2000 subscribers in over 90 countries encompassing over 60 organizations.
Enabling responsible genomic data sharing for the benefit of human health
The Global Alliance for Genomics and Health (GA4GH) is a policy-framing and technical standards-setting organization, seeking to enable responsible genomic data sharing within a human rights framework.
The Global Alliance for Genomics and Health (GA4GH) is an international, nonprofit alliance formed in 2013 to accelerate the potential of research and medicine to advance human health. Bringing together 600+ leading organizations working in healthcare, research, patient advocacy, life science, and information technology, the GA4GH community is working together to create frameworks and standards to enable the responsible, voluntary, and secure sharing of genomic and health-related data. All of our work builds upon the Framework for Responsible Sharing of Genomic and Health-Related Data.
GA4GH Connect is a five-year strategic plan that aims to drive uptake of standards and frameworks for genomic data sharing within the research and healthcare communities in order to enable responsible sharing of clinical-grade genomic data by 2022. GA4GH Connect links our Work Streams with Driver Projects—real-world genomic data initiatives that help guide our development efforts and pilot our tools.
The Global Alliance for Genomics and Health (GA4GH) is a worldwide alliance of genomics researchers, data scientists, healthcare practitioners, and other stakeholders. We are collaborating to establish policy frameworks and technical standards for responsible, international sharing of genomic and other molecular data as well as related health data. Founded in 2013, the GA4GH community now consists of more than 1,000 individuals across more than 90 countries working together to enable broad sharing that transcends the boundaries of any single institution or country (see https://www.ga4gh.org). In this perspective, we present the strategic goals of GA4GH and detail current strategies and operational approaches to enable responsible sharing of clinical and genomic data, through both harmonized data aggregation and federated approaches, to advance genomic medicine and research. We describe technical and policy development activities of the eight GA4GH Work Streams and implementation activities across 24 real-world genomic data initiatives (“Driver Projects”). We review how GA4GH is addressing the major areas in which genomics is currently deployed including rare disease, common disease, cancer, and infectious disease. Finally, we describe differences between genomic sequence data that are generated for research versus healthcare purposes, and define strategies for meeting the unique challenges of responsibly enabling access to data acquired in the clinical setting.
GA4GH organization
GA4GH has partnered with 24 real-world genomic data initiatives (Driver Projects) to ensure its standards are fit for purpose and driven by real-world needs. Driver Projects make a commitment to help guide GA4GH development efforts and pilot GA4GH standards (see Table 2). Each Driver Project is expected to dedicate at least two full-time equivalents to GA4GH standards development, which takes place in the context of GA4GH Work Streams (see Figure 1). Work Streams are the key production teams of GA4GH, tackling challenges in eight distinct areas across the data life cycle (see Box 1). Work Streams consist of experts from their respective sub-disciplines and include membership from Driver Projects as well as hundreds of other organizations across the international genomics and health community.
Figure 1. Matrix structure of the Global Alliance for Genomics and Health
Box 1. GA4GH Work Stream focus areas
The GA4GH Work Streams are the key production teams of the organization. Each tackles a specific area in the data life cycle, as described below (URLs listed in the web resources).
(1) Data use & researcher identities: Develops ontologies and data models to streamline global access to datasets generated in any country
(2) Genomic knowledge standards: Develops specifications and data models for exchanging genomic variant observations and knowledge
(3) Cloud: Develops federated analysis approaches to support the statistical rigor needed to learn from large datasets
(4) Data privacy & security: Develops guidelines and recommendations to ensure identifiable genomic and phenotypic data remain appropriately secure without sacrificing their analytic potential
(5) Regulatory & ethics: Develops policies and recommendations for ensuring individual-level data are interoperable with existing norms and follow core ethical principles
(6) Discovery: Develops data models and APIs to make data findable, accessible, interoperable, and reusable (FAIR)
(7) Clinical & phenotypic data capture & exchange: Develops data models to ensure genomic data is most impactful through rich metadata collected in a standardized way
(8) Large-scale genomics: Develops APIs and file formats to ensure harmonized technological platforms can support large-scale computing
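The idea behind the federated analysis approaches listed under the Cloud Work Stream can be illustrated with a toy example. The sketch below is not a GA4GH API or standard; the `SiteSummary` class and both functions are hypothetical names invented for this illustration. Each site computes summary counts locally, and only those aggregates (never individual-level genotypes) are pooled to estimate an allele frequency across sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts that a site is willing to share (hypothetical)."""
    site: str
    alt_allele_count: int
    total_allele_count: int


def summarize_site(site, genotypes):
    """Run locally at each site.

    genotypes: per-individual alternate-allele counts (0, 1, or 2),
    assuming diploid samples. Only the summary leaves the site.
    """
    return SiteSummary(site, sum(genotypes), 2 * len(genotypes))


def pooled_allele_frequency(summaries):
    """Combine per-site summaries into one allele frequency estimate."""
    alt = sum(s.alt_allele_count for s in summaries)
    total = sum(s.total_allele_count for s in summaries)
    if total == 0:
        raise ValueError("no alleles observed across sites")
    return alt / total
```

For instance, pooling `summarize_site("A", [0, 1, 2, 0])` and `summarize_site("B", [1, 1])` gives the same estimate that a central analysis of all six individuals would give, without moving their records. Real federated systems add authentication, access control, and safeguards against re-identification from small counts, which this sketch omits.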
For more articles on Open Access, Science 2.0, and Data Networks for Genomics on this Open Access Scientific Journal see:
The Vibrant Philly Biotech Scene: Proteovant Therapeutics Using Artificial Intelligence and Machine Learning to Develop PROTACs
Reporter: Stephen J. Williams, Ph.D.
It has been a while since I have added to this series but there have been a plethora of exciting biotech startups in the Philadelphia area, and many new startups combining technology, biotech, and machine learning. One such exciting biotech is Proteovant Therapeutics, which is combining the new PROTAC (Proteolysis-Targeting Chimera) technology with their in house ability to utilize machine learning and artificial intelligence to design these types of compounds to multiple intracellular targets.
PROTAC is a trademarked name of Arvinas Operations; these compounds are also referred to as protein degraders. PROTACs take advantage of the cell's protein homeostatic mechanism of ubiquitin-mediated protein degradation, a very specific targeted process that regulates the protein levels of various transcription factors, proto-oncogenes, and receptors. In essence, this regulated proteolytic process is needed for normal cellular function, and alterations in this process may lead to oncogenesis, or to a proteotoxic crisis leading to mitophagy, autophagy and cellular death. The key to this technology is using chemical linkers to associate an E3 ligase with a protein target of interest. E3 ligases are the rate-limiting step in marking the proteins bound for degradation by the proteasome with ubiquitin chains.
A review of this process as well as PROTACs can be found elsewhere in articles (and future articles) on this Open Access Journal.
Proteovant has made two important collaborations:
Oncopia Therapeutics: came out of the University of Michigan Innovation Hub and the lab of Shaomeng Wang, who developed a library of BET- and MDM2-based protein degraders. It was acquired by Roivant Sciences in 2020.
Roivant Sciences: uses computer-aided design of protein degraders
Proteovant Company Description:
Proteovant is a newly launched development-stage biotech company focusing on discovery and development of disease-modifying therapies by harnessing natural protein homeostasis processes. We have recently acquired numerous assets at discovery and development stages from Oncopia, a protein degradation company. Our lead program is on track to enter IND in 2021. Proteovant is building a strong drug discovery engine by combining deep drugging expertise with innovative platforms including Roivant’s AI capabilities to accelerate discovery and development of protein degraders to address unmet needs across all therapeutic areas. The company has recently secured $200M funding from SK Holdings in addition to investment from Roivant Sciences. Our current therapeutic focus includes but is not limited to oncology, immunology and neurology. We remain agnostic to therapeutic area and will expand therapeutic focus based on opportunity. Proteovant is expanding its discovery and development teams and has multiple positions in biology, chemistry, biochemistry, DMPK, bioinformatics and CMC at many levels. Our R&D organization is located close to major pharmaceutical companies in Eastern Pennsylvania with a second site close to biotech companies in Boston area.
The ubiquitin proteasome system (UPS) is responsible for maintaining protein homeostasis. Targeted protein degradation by the UPS is a cellular process that involves marking proteins and guiding them to the proteasome for destruction. We leverage this physiological cellular machinery to target and destroy disease-causing proteins.
Unlike traditional small molecule inhibitors, our approach is not limited by the classic “active site” requirements. For example, we can target transcription factors and scaffold proteins that lack a catalytic pocket. These classes of proteins, historically, have been very difficult to drug. Further, we selectively degrade target proteins, rather than isozymes or paralogous proteins with high homology. Because of the catalytic nature of the interactions, it is possible to achieve efficacy at lower doses with prolonged duration while decreasing dose-limiting toxicities.
Biological targets once deemed “undruggable” are now within reach.
Roivant develops transformative medicines faster by building technologies and developing talent in creative ways, leveraging the Roivant platform to launch “Vants” – nimble and focused biopharmaceutical and health technology companies. These Vants include Proteovant as well as Dermavant, Immunovant, and others.
Roivant’s drug discovery capabilities include the leading computational physics-based platform for in silico drug design and optimization as well as machine learning-based models for protein degradation.
The integration of our computational and experimental engines enables the rapid design of molecules with high precision and fidelity to address challenging targets for diseases with high unmet need.
Our current modalities include small molecules, heterobifunctionals and molecular glues.
Roivant Unveils Targeted Protein Degradation Platform
– First therapeutic candidate on track to enter clinical studies in 2021
– Computationally-designed degraders for six targets currently in preclinical development
– Acquisition of Oncopia Therapeutics and research collaboration with lab of Dr. Shaomeng Wang at the University of Michigan to add diverse pipeline of current and future compounds
– Clinical-stage degraders will provide foundation for multiple new Vants in distinct disease areas
– Platform supported by $200 million strategic investment from SK Holdings
Other articles in this Vibrant Philly Biotech Scene on this Online Open Access Journal include:
The Map of human proteins drawn by artificial intelligence and PROTAC (proteolysis targeting chimeras) Technology for Drug Discovery
Curators: Dr. Stephen J. Williams and Aviva Lev-Ari, PhD, RN
UPDATED on 11/5/2021
Introducing Isomorphic Labs
I believe we are on the cusp of an incredible new era of biological and medical research. Last year DeepMind’s breakthrough AI system AlphaFold2 was recognised as a solution to the 50-year-old grand challenge of protein folding, capable of predicting the 3D structure of a protein directly from its amino acid sequence to atomic-level accuracy. This has been a watershed moment for computational and AI methods for biology. Building on this advance, today, I’m thrilled to announce the creation of a new Alphabet company – Isomorphic Labs – a commercial venture with the mission to reimagine the entire drug discovery process from the ground up with an AI-first approach and, ultimately, to model and understand some of the fundamental mechanisms of life.
For over a decade DeepMind has been in the vanguard of advancing the state-of-the-art in AI, often using games as a proving ground for developing general purpose learning systems, like AlphaGo, our program that beat the world champion at the complex game of Go. We are at an exciting moment in history now where these techniques and methods are becoming powerful and sophisticated enough to be applied to real-world problems including scientific discovery itself. One of the most important applications of AI that I can think of is in the field of biological and medical research, and it is an area I have been passionate about addressing for many years. Now the time is right to push this forward at pace, and with the dedicated focus and resources that Isomorphic Labs will bring.
An AI-first approach to drug discovery and biology
The pandemic has brought to the fore the vital work that brilliant scientists and clinicians do every day to understand and combat disease. We believe that the foundational use of cutting edge computational and AI methods can help scientists take their work to the next level, and massively accelerate the drug discovery process. AI methods will increasingly be used not just for analysing data, but to also build powerful predictive and generative models of complex biological phenomena. AlphaFold2 is an important first proof point of this, but there is so much more to come. At its most fundamental level, I think biology can be thought of as an information processing system, albeit an extraordinarily complex and dynamic one. Taking this perspective implies there may be a common underlying structure between biology and information science – an isomorphic mapping between the two – hence the name of the company. Biology is likely far too complex and messy to ever be encapsulated as a simple set of neat mathematical equations. But just as mathematics turned out to be the right description language for physics, biology may turn out to be the perfect type of regime for the application of AI.
What’s next for Isomorphic Labs
This is just the beginning of what we hope will become a radical new approach to drug discovery, and I’m incredibly excited to get this ambitious new commercial venture off the ground and to partner with pharmaceutical and biomedical companies. I will serve as CEO for Isomorphic’s initial phase, while remaining as DeepMind CEO, partially to help facilitate collaboration between the two companies where relevant, and to set out the strategy, vision and culture of the new company.
This will of course include the building of a world-class multidisciplinary team, with deep expertise in areas such as AI, biology, medicinal chemistry, biophysics, and engineering, brought together in a highly collaborative and innovative environment. (We are hiring!) As pioneers in the emerging field of ‘digital biology’, we look forward to helping usher in an amazingly productive new age of biomedical breakthroughs. Isomorphic’s mission could not be a more important one: to use AI to accelerate drug discovery, and ultimately, find cures for some of humanity’s most devastating diseases.
AI research lab DeepMind has created the most comprehensive map of human proteins to date using artificial intelligence. The company, a subsidiary of Google-parent Alphabet, is releasing the data for free, with some scientists comparing the potential impact of the work to that of the Human Genome Project, an international effort to map every human gene.
Proteins are long, complex molecules that perform numerous tasks in the body, from building tissue to fighting disease. Their purpose is dictated by their structure, which folds like origami into complex and irregular shapes. Understanding how a protein folds helps explain its function, which in turn helps scientists with a range of tasks, from pursuing fundamental research on how the body works to designing new medicines and treatments.
Previously, determining the structure of a protein relied on expensive and time-consuming experiments. But last year DeepMind showed it can produce accurate predictions of a protein’s structure using AI software called AlphaFold. Now, the company is releasing hundreds of thousands of predictions made by the program to the public. “I see this as the culmination of the entire 10-year-plus lifetime of DeepMind,” company CEO and co-founder Demis Hassabis told The Verge. “From the beginning, this is what we set out to do: to make breakthroughs in AI, test that on games like Go and Atari, [and] apply that to real-world problems, to see if we can accelerate scientific breakthroughs and use those to benefit humanity.”
Two examples of protein structures predicted by AlphaFold (in blue) compared with experimental results (in green). Image: DeepMind
There are currently around 180,000 protein structures available in the public domain, each produced by experimental methods and accessible through the Protein Data Bank. DeepMind is releasing predictions for the structure of some 350,000 proteins across 20 different organisms, including animals like mice and fruit flies, and bacteria like E. coli. (There is some overlap between DeepMind’s data and pre-existing protein structures, but exactly how much is difficult to quantify because of the nature of the models.) Most significantly, the release includes predictions for 98 percent of all human proteins, around 20,000 different structures, which are collectively known as the human proteome. It isn’t the first public dataset of human proteins, but it is the most comprehensive and accurate.
If they want, scientists can download the entire human proteome for themselves, says AlphaFold’s technical lead John Jumper. “There is a HumanProteome.zip effectively, I think it’s about 50 gigabytes in size,” Jumper tells The Verge. “You can put it on a flash drive if you want, though it wouldn’t do you much good without a computer for analysis!”
After launching this first tranche of data, DeepMind plans to keep adding to the store of proteins, which will be maintained by Europe’s flagship life sciences lab, the European Molecular Biology Laboratory (EMBL). By the end of the year, DeepMind hopes to release predictions for 100 million protein structures, a dataset that will be “transformative for our understanding of how life works,” according to Edith Heard, director general of the EMBL. The data will be free in perpetuity for both scientific and commercial researchers, says Hassabis. “Anyone can use it for anything,” the DeepMind CEO noted at a press briefing. “They just need to credit the people involved in the citation.”
The benefits of protein folding
Understanding a protein’s structure is useful for scientists across a range of fields. The information can help design new medicines, synthesize novel enzymes that break down waste materials, and create crops that are resistant to viruses or extreme weather. Already, DeepMind’s protein predictions are being used for medical research, including studying the workings of SARS-CoV-2, the virus that causes COVID-19.
New data will speed these efforts, but scientists note it will still take a lot of time to turn this information into real-world results. “I don’t think it’s going to be something that changes the way patients are treated within the year, but it will definitely have a huge impact for the scientific community,” Marcelo C. Sousa, a professor at the University of Colorado’s biochemistry department, told The Verge. Scientists will have to get used to having such information at their fingertips, says DeepMind senior research scientist Kathryn Tunyasuvunakool. “As a biologist, I can confirm we have no playbook for looking at even 20,000 structures, so this [amount of data] is hugely unexpected,” Tunyasuvunakool told The Verge. “To be analyzing hundreds of thousands of structures, it’s crazy.”
Notably, though, DeepMind’s software produces predictions of protein structures rather than experimentally determined models, which means that in some cases further work will be needed to verify the structure. DeepMind says it spent a lot of time building accuracy metrics into its AlphaFold software, which ranks how confident it is for each prediction.
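To make the idea of a per-prediction confidence ranking concrete, here is a minimal sketch in Python. It assumes the per-residue confidence score that AlphaFold publishes (known as pLDDT, on a 0 to 100 scale; the name does not appear in the article above) and the confidence bands shown in the AlphaFold database. The scores in the example are invented for illustration and do not come from any real protein.

```python
# Illustrative only: the scores are made up, and the band cut-offs are an
# assumption taken from the bands displayed in the AlphaFold database.

def confidence_band(score: float) -> str:
    """Map a per-residue confidence score (0-100) to a named band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must lie between 0 and 100")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def fraction_confident(scores: list[float], threshold: float = 70.0) -> float:
    """Fraction of residues whose score exceeds the threshold."""
    if not scores:
        raise ValueError("no scores supplied")
    return sum(s > threshold for s in scores) / len(scores)


# Hypothetical per-residue scores for a six-residue fragment.
hypothetical_scores = [95.2, 88.1, 72.4, 64.0, 41.7, 91.3]
bands = [confidence_band(s) for s in hypothetical_scores]
print(bands)
print(round(fraction_confident(hypothetical_scores), 2))
```

A researcher might use a summary like this to decide which regions of a predicted structure are worth trusting before committing to experimental follow-up.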
Example protein structures predicted by AlphaFold. Image: DeepMind
Predictions of protein structures are still hugely useful, though. Determining a protein’s structure through experimental methods is expensive, time-consuming, and relies on a lot of trial and error. That means even a low-confidence prediction can save scientists years of work by pointing them in the right direction for research. Helen Walden, a professor of structural biology at the University of Glasgow, tells The Verge that DeepMind’s data will “significantly ease” research bottlenecks, but that “the laborious, resource-draining work of doing the biochemistry and biological evaluation of, for example, drug functions” will remain. Sousa, who has previously used data from AlphaFold in his work, says for scientists the impact will be felt immediately. “In our collaboration we had with DeepMind, we had a dataset with a protein sample we’d had for 10 years, and we’d never got to the point of developing a model that fit,” he says. “DeepMind agreed to provide us with a structure, and they were able to solve the problem in 15 minutes after we’d been sitting on it for 10 years.”
Why protein folding is so difficult
Proteins are constructed from chains of amino acids, which come in 20 different varieties in the human body. As any individual protein can be comprised of hundreds of individual amino acids, each of which can fold and twist in different directions, it means a molecule’s final structure has an incredibly large number of possible configurations. One estimate is that the typical protein can be folded in 10^300 ways — that’s a 1 followed by 300 zeroes.
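The scale of that estimate is easier to appreciate with a small calculation. The sketch below assumes, purely for illustration, that each amino acid in a chain can independently adopt a fixed number of conformations; the figures of 300 residues and 10 conformations per residue are hypothetical choices that happen to reproduce the 10^300 order of magnitude, not values taken from the article.

```python
import math


def log10_configurations(n_residues: int, states_per_residue: int) -> float:
    """Return log10 of states_per_residue ** n_residues.

    Working in logarithms keeps the result readable even when the
    count itself has hundreds of digits.
    """
    if n_residues < 1 or states_per_residue < 1:
        raise ValueError("both arguments must be positive integers")
    return n_residues * math.log10(states_per_residue)


# Hypothetical chain: 300 residues, 10 possible conformations each.
exponent = log10_configurations(300, 10)
print(exponent)

# Python integers are arbitrary precision, so the count can be written out.
exact = 10 ** 300
print(len(str(exact)))  # number of digits: a 1 followed by 300 zeroes
```

Enumerating that many configurations one by one is out of reach for any computer, which is why prediction methods have to learn patterns from known structures instead of searching exhaustively.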
Protein folding has been a “grand challenge” of biology for decades
Because proteins are too small to examine with microscopes, scientists have had to indirectly determine their structure using expensive and complicated methods like nuclear magnetic resonance and X-ray crystallography. The idea of determining the structure of a protein simply by reading a list of its constituent amino acids has been long theorized but difficult to achieve, leading many to describe it as a “grand challenge” of biology. In recent years, though, computational methods — particularly those using artificial intelligence — have suggested such analysis is possible. With these techniques, AI systems are trained on datasets of known protein structures and use this information to create their own predictions.
DeepMind’s AlphaFold software has significantly increased the accuracy of computational protein-folding, as shown by its performance in the CASP competition. Image: DeepMind
Many groups have been working on this problem for years, but DeepMind’s deep bench of AI talent and access to computing resources allowed it to accelerate progress dramatically. Last year, the company competed in an international protein-folding competition known as CASP and blew away the competition. Its results were so accurate that computational biologist John Moult, one of CASP’s co-founders, said that “in some sense the problem [of protein folding] is solved.”
DeepMind’s AlphaFold program has been upgraded since last year’s CASP competition and is now 16 times faster. “We can fold an average protein in a matter of minutes, most cases seconds,” says Hassabis.
The company also released the underlying code for AlphaFold last week as open-source, allowing others to build on its work in the future.
Liam McGuffin, a professor at Reading University who developed some of the UK’s leading protein-folding software, praised the technical brilliance of AlphaFold, but also noted that the program’s success relied on decades of prior research and public data. “DeepMind has vast resources to keep this database up to date and they are better placed to do this than any single academic group,” McGuffin told The Verge. “I think academics would have got there in the end, but it would have been slower because we’re not as well resourced.”
Why does DeepMind care?
Many scientists The Verge spoke to noted the generosity of DeepMind in releasing this data for free. After all, the lab is owned by Google-parent Alphabet, which has been pouring huge amounts of resources into commercial healthcare projects. DeepMind itself loses a lot of money each year, and there have been numerous reports of tensions between the company and its parent firm over issues like research autonomy and commercial viability.
Hassabis, though, tells The Verge that the company always planned to make this information freely available, and that doing so is a fulfillment of DeepMind’s founding ethos. He stresses that DeepMind’s work is used in lots of places at Google (“almost anything you use, there’s some of our technology that’s part of that under the hood”) but that the company’s primary goal has always been fundamental research.
“The agreement when we got acquired is that we are here primarily to advance the state of AGI and AI technologies and then use that to accelerate scientific breakthroughs,” says Hassabis. “[Alphabet] has plenty of divisions focused on making money,” he adds, noting that DeepMind’s focus on research “brings all sorts of benefits, in terms of prestige and goodwill for the scientific community. There’s many ways value can be attained.” Hassabis predicts that AlphaFold is a sign of things to come — a project that shows the huge potential of artificial intelligence to handle messy problems like human biology.
“I think we’re at a really exciting moment,” he says. “In the next decade, we, and others in the AI field, are hoping to produce amazing breakthroughs that will genuinely accelerate solutions to the really big problems we have here on Earth.”
Abstract PROTACs-induced targeted protein degradation has emerged as a novel therapeutic strategy in drug development and attracted the favor of academic institutions, large pharmaceutical enterprises (e.g., AstraZeneca, Bayer, Novartis, Amgen, Pfizer, GlaxoSmithKline, Merck, and Boehringer Ingelheim, etc.), and biotechnology companies. PROTACs opened a new chapter for novel drug development. However, any new technology will face many new problems and challenges. Perspectives on the potential opportunities and challenges of PROTACs will contribute to the research and development of new protein degradation drugs and degrader tools.
Although PROTAC technology has a bright future in drug development, it also has many challenges as follows: (1) Until now, there is only one example of PROTAC reported for an “undruggable” target; (18) more cases are needed to prove the advantages of PROTAC in “undruggable” targets in the future. (2) “Molecular glue”, existing in nature, represents the mechanism of stabilized protein–protein interactions through small molecule modulators of E3 ligases. For instance, auxin, the plant hormone, binds to the ligase SCF-TIR1 to drive recruitment of Aux/IAA proteins and subsequently triggers its degradation. In addition, some small molecules that induce targeted protein degradation through “molecular glue” mode of action have been reported. (21,22) Furthermore, it has been recently reported that some PROTACs may actually achieve target protein degradation via a mechanism that includes “molecular glue” or via “molecular glue” alone. (23) How to distinguish between these two mechanisms and how to combine them to work together is one of the challenges for future research. (3) Since PROTAC acts in a catalytic mode, traditional methods cannot accurately evaluate the pharmacokinetics (PK) and pharmacodynamics (PD) properties of PROTACs. Thus, more studies are urgently needed to establish PK and PD evaluation systems for PROTACs. (4) How to quickly and effectively screen for target protein ligands that can be used in PROTACs, especially those targeting protein–protein interactions, is another challenge. (5) How to understand the degradation activity, selectivity, and possible off-target effects (based on different targets, different cell lines, and different animal models) and how to rationally design PROTACs etc. are still unclear. (6) The human genome encodes more than 600 E3 ubiquitin ligases. However, there are only very few E3 ligases (VHL, CRBN, cIAPs, and MDM2) used in the design of PROTACs. How to expand E3 ubiquitin ligase scope is another challenge faced in this area.
PROTAC technology is rapidly developing, and with the joint efforts of the vast number of scientists in both academia and industry, these problems shall be solved in the near future.
PROTACs have opened a new chapter for the development of new drugs and novel chemical knockdown tools and brought unprecedented opportunities to the industry and academia, which are mainly reflected in the following aspects:
(1) Overcoming drug resistance of cancer. In addition to traditional chemotherapy, kinase inhibitors have been developing rapidly in the past 20 years. (12) Although kinase inhibitors are very effective in cancer therapy, patients often develop drug resistance and, consequently, disease recurrence. PROTACs showed greater advantages in drug-resistant cancers through degrading the whole target protein. For example, ARCC-4 targeting androgen receptor could overcome enzalutamide-resistant prostate cancer (13) and L18I targeting BTK could overcome C481S mutation. (14)
(2) Eliminating both the enzymatic and nonenzymatic functions of kinase. Traditional small molecule inhibitors usually inhibit the enzymatic activity of the target, while PROTACs affect not only the enzymatic activity of the protein but also nonenzymatic activity by degrading the entire protein. For example, FAK possesses kinase-dependent enzymatic functions and kinase-independent scaffold functions, but regulating the kinase activity does not successfully inhibit all FAK function. In 2018, a highly effective and selective FAK PROTAC reported by Craig M. Crews’ group showed far superior activity to a clinical candidate drug in cell migration and invasion. (15) Therefore, PROTAC can expand the druggable space of the existing targets and regulate proteins that are difficult to control by traditional small molecule inhibitors.
(3) Degrading the “undruggable” protein target. At present, only 20–25% of the known protein targets (including kinases, G protein-coupled receptors (GPCRs), nuclear hormone receptors, and ion channels) can be targeted by using conventional drug discovery technologies. (16,17) The proteins that lack catalytic activity and/or have catalytic-independent functions are still regarded as “undruggable” targets. The involvement of Signal Transducer and Activator of Transcription 3 (STAT3) in multiple signaling pathways makes it an attractive therapeutic target; however, the lack of an obviously druggable site on the surface of STAT3 has limited the development of STAT3 inhibitors. Thus, there are still no effective drugs directly targeting STAT3 approved by the Food and Drug Administration (FDA). In November 2019, Shaomeng Wang’s group first reported a potent PROTAC targeting STAT3 with potent biological activities in vitro and in vivo. (18) This successful case confirms the key potential of PROTAC technology, especially in the field of “undruggable” targets, such as K-Ras, a tricky tumor target activated by multiple mutations such as G12A, G12C, G12D, G12S, G12V, G13C, and G13D in the clinic. (19)
(4) Fast and reversible chemical knockdown strategy in vivo. Traditional genetic protein knockout technologies, such as zinc-finger nuclease (ZFN), transcription activator-like effector nuclease (TALEN), or CRISPR-Cas9, usually have a long cycle, an irreversible mode of action, and high cost, which brings a lot of inconvenience for research, especially in nonhuman primates. In addition, these genetic animal models sometimes produce misleading phenotypes due to potential gene compensation or gene mutation. More importantly, the traditional genetic method cannot be used to study the function of embryonic-lethal genes in vivo. Unlike DNA-based protein knockout technology, PROTACs knock down target proteins directly, rather than acting at the genome level, and are suitable for the functional study of embryonic-lethal proteins in adult organisms. In addition, PROTACs provide exquisite temporal control, allowing the knockdown of a target protein at specific time points and enabling the recovery of the target protein after withdrawal of drug treatment. As a new, rapid and reversible chemical knockdown method, PROTAC can be used as an effective supplement to the existing genetic tools. (20)
Peroxisome proliferator-activated receptor (PPAR-gamma) Receptors Activation: PPARγ transrepression for Angiogenesis in Cardiovascular Disease and PPARγ transactivation for Treatment of Diabetes
Patients with type 2 diabetes may soon receive artificial pancreas and a smartphone app assistance
Curator and Reporter: Dr. Premalata Pati, Ph.D., Postdoc
In a brief, randomized crossover investigation, adults with type 2 diabetes and end-stage renal disease who needed dialysis benefited from an artificial pancreas. Tests conducted by the University of Cambridge and Inselspital, University Hospital of Bern, Switzerland, show that the device can help patients safely and effectively manage their blood sugar levels and reduce the risk of low blood sugar levels.
Diabetes is the most prevalent cause of kidney failure, accounting for just under one-third (30%) of all cases. As the number of people living with type 2 diabetes rises, so does the number of people who require dialysis or a kidney transplant. Kidney failure raises the risk of hypoglycemia and hyperglycemia, or unusually low or high blood sugar levels, which can lead to problems ranging from dizziness to falls and even coma.
Diabetes management in adults with renal failure is difficult for both the patients and the healthcare practitioners. Many components of their therapy, including blood sugar level targets and medications, are poorly understood. Because most oral diabetes drugs are not indicated for these patients, insulin injections are the most often utilized diabetic therapy, yet establishing optimum insulin dosing regimens is difficult.
Patients living with type 2 diabetes and kidney failure are a particularly vulnerable group, and managing their condition, trying to prevent potentially dangerous highs or lows of blood sugar levels, can be a challenge. There’s a real unmet need for new approaches to help them manage their condition safely and effectively.
The artificial pancreas is a compact, portable medical device that uses digital technology to automate insulin delivery to perform the role of a healthy pancreas in managing blood glucose levels. The system is worn on the outside of the body and consists of three functional components:
a glucose sensor
a computer algorithm for calculating the insulin dose
an insulin pump
The artificial pancreas directed insulin delivery on a Dana Diabecare RS pump using a Dexcom G6 transmitter linked to the Cambridge adaptive model predictive control algorithm, automatically administering faster-acting insulin aspart (Fiasp). The CamDiab CamAPS HX closed-loop app on an unlocked Android phone was used to manage the closed loop system, with a goal glucose of 126 mg/dL. The program calculated an insulin infusion rate based on the data from the G6 sensor every 8 to 12 minutes, which was then wirelessly routed to the insulin pump, with data automatically uploaded to the Diasend/Glooko data management platform.
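The glucose target above is quoted in mg/dL, while the study results that follow are reported in mmol/L. A short conversion sketch in Python helps relate the two. It assumes only the molar mass of glucose (about 180.156 g/mol); the function names are illustrative and are not part of the CamAPS HX app or any device software.

```python
# Molar mass of glucose in g/mol, which is numerically equal to mg per mmol.
GLUCOSE_MG_PER_MMOL = 180.156


def mgdl_to_mmoll(mg_per_dl: float) -> float:
    """Convert a glucose concentration from mg/dL to mmol/L."""
    mg_per_litre = mg_per_dl * 10  # 1 L = 10 dL
    return mg_per_litre / GLUCOSE_MG_PER_MMOL


def mmoll_to_mgdl(mmol_per_l: float) -> float:
    """Convert a glucose concentration from mmol/L to mg/dL."""
    return mmol_per_l * GLUCOSE_MG_PER_MMOL / 10


# The 126 mg/dL target corresponds to roughly 7.0 mmol/L.
print(round(mgdl_to_mmoll(126), 1))
```

The common rule of thumb of dividing mg/dL by 18 gives the same answer to one decimal place.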
The Case Study
Between October 2019 and November 2020, the team recruited 26 dialysis patients. Thirteen were randomly assigned to use the artificial pancreas first, and thirteen to receive standard insulin therapy first. The researchers compared how long patients spent, as outpatients, in the target blood sugar range (5.6 to 10.0 mmol/L) over a 20-day period.
On average, patients spent 53% of their time in the target range while using the artificial pancreas, compared with 38% during the control treatment. This translated to approximately 3.5 more hours per day spent in the target range.
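The "time in range" measure and the hours-per-day figure can be reproduced with a few lines of Python. This is a minimal sketch: the glucose readings are hypothetical, equally spaced values invented for illustration, and only the 5.6 to 10.0 mmol/L range and the 53% and 38% figures come from the study summary above.

```python
def time_in_range(readings_mmol_l: list[float],
                  low: float = 5.6, high: float = 10.0) -> float:
    """Fraction of equally spaced sensor readings inside [low, high]."""
    if not readings_mmol_l:
        raise ValueError("no readings supplied")
    in_range = sum(low <= r <= high for r in readings_mmol_l)
    return in_range / len(readings_mmol_l)


def extra_hours_per_day(fraction_a: float, fraction_b: float) -> float:
    """Convert a difference in time-in-range fractions to hours per day."""
    return (fraction_a - fraction_b) * 24


# Hypothetical sensor trace (mmol/L), not study data.
hypothetical_readings = [4.9, 6.2, 7.8, 9.9, 10.4, 12.1, 8.3, 5.6]
print(time_in_range(hypothetical_readings))

# Study averages: 53% with the artificial pancreas versus 38% with control.
print(round(extra_hours_per_day(0.53, 0.38), 1))
```

With the rounded percentages the difference works out to about 3.6 hours, in line with the roughly 3.5 hours reported; the small gap is what one would expect from rounding in the published averages.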
The artificial pancreas also reduced mean blood sugar levels (10.1 vs. 11.6 mmol/L) and cut the amount of time patients spent with potentially dangerously low blood sugar levels, known as ‘hypos.’
The artificial pancreas’ efficacy improved significantly over the research period as the algorithm adapted, and the time spent in the target blood sugar range climbed from 36% on day one to over 60% by the twentieth day. This finding emphasizes the need for an adaptive algorithm that can adjust to an individual’s fluctuating insulin requirements over time.
When asked if they would recommend the artificial pancreas to others, everyone who responded indicated they would. Nine out of ten (92%) said they spent less time controlling their diabetes with the artificial pancreas than they did during the control period, and a comparable proportion (87%) said they were less concerned about their blood sugar levels when using it.
Other advantages of the artificial pancreas mentioned by study participants included fewer finger-prick blood sugar tests, less time spent managing their diabetes, resulting in more personal time and independence, and increased peace of mind and reassurance. One disadvantage was the discomfort of wearing the insulin pump and carrying the smartphone.
Not only did the artificial pancreas increase the amount of time patients spent within the target range for the blood sugar levels, but it also gave the users peace of mind. They were able to spend less time having to focus on managing their condition and worrying about the blood sugar levels, and more time getting on with their lives.
The team is currently testing the artificial pancreas in outpatient settings in persons with type 2 diabetes who do not require dialysis, as well as in difficult medical scenarios such as perioperative care.
“The artificial pancreas has the potential to become a fundamental part of integrated personalized care for people with complicated medical needs,” said Dr Lia Bally, who co-led the study in Bern.
The authors stated that the study’s shortcomings included a small sample size due to “Brexit-related study funding concerns and the COVID-19 epidemic.”
Boughton concluded:
We would like other clinicians to be aware that automated insulin delivery systems may be a safe and effective treatment option for people with type 2 diabetes and kidney failure in the future.
Science Policy Forum: Should we trust healthcare explanations from AI predictive systems?
Some in industry voice their concerns
Curator: Stephen J. Williams, PhD
Post on AI healthcare and explainable AI
In a Policy Forum article in Science, “Beware explanations from AI in health care”, Boris Babic, Sara Gerke, Theodoros Evgeniou, and Glenn Cohen discuss the caveats of relying on explainable versus interpretable artificial intelligence (AI) and machine learning (ML) algorithms to make complex health decisions. The FDA has already approved some AI/ML algorithms for analysis of medical images for diagnostic purposes. These have been discussed in prior posts on this site, as well as issues arising from multi-center trials. The authors of this perspective article argue that the choice of algorithm type (explainable versus interpretable) may have far-reaching consequences in health care.
Summary
Artificial intelligence and machine learning (AI/ML) algorithms are increasingly developed in health care for diagnosis and treatment of a variety of medical conditions (1). However, despite the technical prowess of such systems, their adoption has been challenging, and whether and how much they will actually improve health care remains to be seen. A central reason for this is that the effectiveness of AI/ML-based medical devices depends largely on the behavioral characteristics of its users, who, for example, are often vulnerable to well-documented biases or algorithmic aversion (2). Many stakeholders increasingly identify the so-called black-box nature of predictive algorithms as the core source of users’ skepticism, lack of trust, and slow uptake (3, 4). As a result, lawmakers have been moving in the direction of requiring the availability of explanations for black-box algorithmic decisions (5). Indeed, a near-consensus is emerging in favor of explainable AI/ML among academics, governments, and civil society groups. Many are drawn to this approach to harness the accuracy benefits of noninterpretable AI/ML such as deep learning or neural nets while also supporting transparency, trust, and adoption. We argue that this consensus, at least as applied to health care, both overstates the benefits and undercounts the drawbacks of requiring black-box algorithms to be explainable.
Types of AI/ML Algorithms: Explainable and Interpretable algorithms
Interpretable AI: A typical AI/ML task requires constructing algorithms from vector inputs and generating an output related to an outcome (like diagnosing a cardiac event from an image). Generally the algorithm has to be trained on past data with known parameters. When an algorithm is called interpretable, this means that the algorithm uses a transparent or “white box” function which is easily understandable. An example is a linear function with a small number of simple parameters. Although such algorithms may not be as accurate as the more complex explainable AI/ML algorithms, they are open, transparent, and easily understood by the operators.
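As an illustration of what a “white box” function looks like in practice, the following Python sketch fits a straight line by ordinary least squares. The data points are hypothetical and chosen so the answer is easy to check by hand; the function name is illustrative and is not taken from the Science article.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: one input feature and one outcome score.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
outcome = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(feature, outcome)
print(f"outcome = {slope:.1f} * feature + {intercept:.1f}")
```

The whole model is two numbers that an operator can read directly: each unit increase in the feature changes the predicted outcome by the slope. That readability is what the term "interpretable" refers to here.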
Explainable AI/ML: This approach starts from a “black box” model that depends on many complex parameters and takes its predictions as given. A second algorithm, built from an interpretable function, is then fit to approximate the outputs of the first model. This second algorithm is trained not on the original data but on the predictions of the black box. The black-box model is often more accurate, or deemed more reliable in prediction, but it is very complex and not easily understood, and the explanation is only an approximation of it. Many medical devices that use an AI/ML algorithm are of this type. Deep learning and neural networks are examples of such black-box models.
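The two-step idea can be sketched in a few lines. Here `black_box` is a hypothetical stand-in for an opaque model (a fixed sigmoid chosen only for illustration), and the surrogate is a straight line fit to the black box's predictions rather than to any original labels. This is a minimal sketch of the general approach, not the method used by any particular medical device.

```python
import math


def black_box(x):
    """Hypothetical opaque model; pretend its internals cannot be inspected."""
    return 1.0 / (1.0 + math.exp(-(0.8 * x - 4.0)))


def fit_line(xs, ys):
    """Ordinary least squares for a single input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


# Step 1: collect the black box's predictions on a set of inputs.
inputs = [float(i) for i in range(11)]
black_box_preds = [black_box(x) for x in inputs]

# Step 2: train the interpretable surrogate on those predictions.
slope, intercept = fit_line(inputs, black_box_preds)


def surrogate(x):
    return slope * x + intercept


# Fidelity check: how far the explanation strays from the model it explains.
max_gap = max(abs(surrogate(x) - p) for x, p in zip(inputs, black_box_preds))
print(f"surrogate slope={slope:.3f}, worst-case gap={max_gap:.3f}")
```

The gap is never zero here, which illustrates the limitation of this approach: the explanation describes the surrogate, and only approximately the black box.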
The purpose of both methodologies is to address the problem of opacity: AI predictions that come from a black box undermine trust in the AI.
For a deeper understanding of these two types of algorithms see here:
How interpretability is different from explainability
Why a model might need to be interpretable and/or explainable
Who is working to solve the black box problem—and how
What is interpretability?
Does Chipotle make your stomach hurt? Does loud noise accelerate hearing loss? Are women less aggressive than men? If a machine learning model can create a definition around these relationships, it is interpretable.
All models must start with a hypothesis. Human curiosity propels a being to intuit that one thing relates to another. “Hmm…multiple black people shot by policemen…seemingly out of proportion to other races…something might be systemic?” Explore.
People create internal models to interpret their surroundings. In the field of machine learning, these models can be tested and verified as either accurate or inaccurate representations of the world.
Interpretability means that the cause and effect can be determined.
What is explainability?
ML models are often called black-box models because they allow a pre-set number of empty parameters, or nodes, to be assigned values by the machine learning algorithm. Specifically, the back-propagation step is responsible for updating the weights based on its error function.
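To make the weight-update idea concrete, here is a minimal sketch with a single weight and a squared-error function. It is the simplest possible case, assumed for illustration: in a multi-layer network, back-propagation applies the chain rule to obtain the same kind of gradient for every weight. The learning rate, epoch count, and data are arbitrary choices.

```python
def train_single_weight(xs, ys, learning_rate=0.01, epochs=200):
    """Fit pred = w * x by gradient descent on the squared error."""
    w = 0.0  # the "empty" parameter the algorithm must fill in
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w * x
            error = pred - y
            grad = 2 * error * x       # derivative of (pred - y)**2 w.r.t. w
            w -= learning_rate * grad  # step against the gradient
    return w


# Toy data generated by y = 2 * x, so the weight should approach 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = train_single_weight(xs, ys)
print(f"learned weight: {w:.4f}")
```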
To predict when a person might die—the fun gamble one might play when calculating a life insurance premium, and the strange bet a person makes against their own life when purchasing a life insurance package—a model will take in its inputs, and output a percent chance the given person has at living to age 80.
Below is an image of a neural network. The inputs are the yellow; the outputs are the orange. Like a rubric to an overall grade, explainability shows how significantly each of the parameters (all the blue nodes) contributes to the final decision.
In this neural network, the hidden layers (the two columns of blue dots) would be the black box.
For example, we have these data inputs:
Age
BMI score
Number of years spent smoking
Career category
If this model had high explainability, we’d be able to say, for instance:
The career category is about 40% important
The number of years spent smoking weighs in at 35% important
The age is 15% important
The BMI score is 10% important
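One simple way to arrive at percentages like these is to normalize raw importance scores so they sum to 100. The raw scores below are hypothetical placeholders chosen to reproduce the figures above; in practice they would come from an attribution method applied to the trained model.

```python
def to_percentages(raw_scores):
    """Convert non-negative raw importance scores into shares of 100."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("raw scores must sum to a positive number")
    return {name: 100.0 * score / total for name, score in raw_scores.items()}


# Hypothetical raw scores, not measured from any real model.
raw_scores = {
    "career_category": 8.0,
    "years_smoking": 7.0,
    "age": 3.0,
    "bmi": 2.0,
}

percentages = to_percentages(raw_scores)
for name, share in sorted(percentages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.0f}% important")
```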
Explainability: important, not always necessary
Explainability becomes significant in the field of machine learning because, often, it is not apparent. Explainability is often unnecessary. A machine learning engineer can build a model without ever having considered the model’s explainability. It is an extra step in the building process—like wearing a seat belt while driving a car. It is unnecessary for the car to perform, but offers insurance when things crash.
The benefit a deep neural net offers to engineers is that it creates a black box of parameters, like fake additional data points, on which the model can base its decisions. These fake data points are unknown to the engineer. The black box, or hidden layers, allows a model to make associations among the given data points to predict better results. For example, if we are deciding how long someone might have to live, and we use career data as an input, it is possible the model sorts the careers into high- and low-risk career options all on its own.
Perhaps we inspect a node and see it relates oil rig workers, underwater welders, and boat cooks to each other. It is possible the neural net makes connections between the lifespans of these individuals and puts a placeholder in the deep net to associate them. If we were to examine the individual nodes in the black box, we could note that this clustering interprets water careers as high-risk jobs.
In the previous chart, each one of the lines connecting from the yellow dot to the blue dot can represent a signal, weighing the importance of that node in determining the overall score of the output.
If that signal is high, that node is significant to the model’s overall performance.
If that signal is low, the node is insignificant.
With this understanding, we can define explainability as:
Knowledge of what one node represents and how important it is to the model’s performance.
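As a crude sketch of reading connections as signals, the snippet below ranks the connections into one hidden node by the absolute size of their weights. The node name `h1` and all weight values are hypothetical. Weight magnitude is only a rough proxy for importance, since it depends on how the inputs are scaled and on what later layers do with the node.

```python
# Hypothetical weights on the connections from each input to hidden node "h1".
connection_weights = {
    ("career_category", "h1"): 0.40,
    ("years_smoking", "h1"): 0.35,
    ("age", "h1"): 0.15,
    ("bmi", "h1"): -0.10,
}

# A large absolute weight means a strong signal, whatever its sign.
ranked = sorted(connection_weights.items(), key=lambda kv: abs(kv[1]), reverse=True)

for (source, target), weight in ranked:
    print(f"{source} -> {target}: weight {weight:+.2f}")
```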
So how does the choice between these two types of algorithms make a difference with respect to health care and medical decision making?
The authors argue:
“Regulators like the FDA should focus on those aspects of the AI/ML system that directly bear on its safety and effectiveness – in particular, how does it perform in the hands of its intended users?”
They suggest:
Enhanced, more involved clinical trials
Giving individuals added flexibility when interacting with a model, for example by inputting their own test data
More interaction between users and model developers
Determining which situations call for interpretable AI rather than explainable AI (for instance, predicting which patients will require dialysis after kidney damage)
Other articles on AI/ML in medicine and healthcare on this Open Access Journal include
AI is on the way to lead critical ED decisions on CT
Curator and Reporter: Dr. Premalata Pati, Ph.D., Postdoc
Artificial intelligence (AI) has infiltrated many organizational processes, raising concerns that robotic systems will eventually replace many humans in decision-making. The advent of AI as a tool for improving health care provides new prospects to improve patient and clinical team’s performance, reduce costs, and impact public health. Examples include, but are not limited to, automation; information synthesis for patients, “fRamily” (friends and family unpaid caregivers), and health care professionals; and suggestions and visualization of information for collaborative decision making.
In the emergency department (ED), patients with Crohn’s disease (CD) are routinely subjected to Abdomino-Pelvic Computed Tomography (APCT). It is necessary to diagnose clinically actionable findings (CAF) since they may require immediate intervention, which is typically surgical. Repeated APCTs, on the other hand, result in higher ionizing radiation exposure. The majority of APCT performance guidance is clinical and empiric. Emergency surgeons struggle to identify Crohn’s disease patients who actually require a CT scan to determine the source of acute abdominal distress.
Aid seems to be on the way. Researchers employed machine learning to accurately distinguish these patients from Crohn’s patients who present with the same complaint but may safely avoid the recurrent exposure to contrast materials and ionizing radiation that CT would otherwise impose on them.
Jacob Ollech and his fellow researchers retrospectively analyzed 101 emergency treatments of patients with Crohn’s who underwent abdominopelvic CT.
They were looking for cases where a scan revealed clinically actionable findings, which the researchers classified as intestinal blockage, perforation, intra-abdominal abscess, or complex fistula.
On CT, 44 (43.5%) of the 101 cases reviewed had such findings.
To test an AI approach for making the call, Ollech and colleagues used a machine-learning technique to design a decision-support tool that requires only four basic clinical factors.
The approach succeeded in categorizing patients into low- and high-risk groups, allowing the researchers to risk-stratify patients by the likelihood of clinically actionable findings on abdominopelvic CT.
Ollech and co-authors admit that their limited sample size, retrospective strategy, and lack of external validation are shortcomings.
Moreover, several patients fell into an intermediate risk category, implying that a standard workup would have been required to guide CT decision-making in a real-world situation anyhow.
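The stratification step described above can be pictured as mapping a model's predicted probability to a risk group. The sketch below is purely illustrative: the cut-off values and the function name `stratify` are assumptions, not the thresholds or code used by Ollech and colleagues, and it is not a clinical tool.

```python
def stratify(probability, low_cut=0.2, high_cut=0.6):
    """Map a predicted probability of actionable findings to a risk group.

    The default cut-offs are hypothetical placeholders.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low_cut:
        return "low"
    if probability >= high_cut:
        return "high"
    # Anything in between still needs the standard clinical workup.
    return "intermediate"


for p in (0.05, 0.35, 0.80):
    print(f"predicted probability {p:.2f} -> {stratify(p)} risk")
```

The intermediate band is where such a tool offers the least help, which matches the limitation the authors acknowledge.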
The authors conclude:
We believe this study shows that a machine learning-based tool is a sound approach for better selecting patients with Crohn’s disease admitted to the ED with acute gastrointestinal complaints for abdominopelvic CT: reducing the number of CTs performed while ensuring that patients with high risk for clinically actionable findings undergo abdominopelvic CT appropriately.
Main Source:
Konikoff, Tom, Idan Goren, Marianna Yalon, Shlomit Tamir, Irit Avni-Biron, Henit Yanai, Iris Dotan, and Jacob E. Ollech. “Machine learning for selecting patients with Crohn’s disease for abdominopelvic computed tomography in the emergency department.” Digestive and Liver Disease (2021). https://www.sciencedirect.com/science/article/abs/pii/S1590865821003340
Other Related Articles published in this Open Access Online Scientific Journal include the following: