Funding, Deals & Partnerships: BIOLOGICS & MEDICAL DEVICES; BioMed e-Series; Medicine and Life Sciences Scientific Journal – http://PharmaceuticalIntelligence.com
Use of Systems Biology for Design of Inhibitors of Galectins as Cancer Therapeutics – Strategy and Software
Curator: Stephen J. Williams, Ph.D.
Below is a slide representation of the overall mission to produce a PROTAC to inhibit Galectins 1, 3, and 9.
Using A Priori Knowledge of Galectin Receptor Interaction to Create a BioModel of Galectin 3 Binding
After collecting literature from PubMed with the search “galectin-3” AND “binding”, to identify articles containing kinetic data, we generated a WordCloud from the articles.
The following file contains the articles needed for BioModels generation.
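A word cloud is simply a rendering of term frequencies. The sketch below, using only the Python standard library and hypothetical abstract snippets standing in for the PubMed corpus, computes the frequencies a word-cloud renderer would scale font sizes by (a library such as `wordcloud` would handle the drawing itself):

```python
import re
from collections import Counter

# Minimal stopword list for illustration; a real pipeline would use a fuller one
STOPWORDS = {"the", "of", "and", "to", "a", "in", "with", "is", "for", "we"}

def term_frequencies(abstracts, top_n=5):
    """Tokenize abstracts, drop stopwords and short tokens, and return
    the top-N terms. These counts are what a word cloud visualizes."""
    words = re.findall(r"[a-z]+", " ".join(abstracts).lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)

# Hypothetical abstract snippets standing in for the downloaded corpus
corpus = [
    "Galectin-3 binding to the carbohydrate recognition domain",
    "Kinetics of galectin-3 binding measured by titration",
]
print(term_frequencies(corpus, top_n=3))
```

Running this on the real corpus would surface the same dominant terms (CRD, binding, interactions) that the WordCloud below highlights.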
From the WordCloud we can see that this corpus of articles describes galectin binding to the CRD (carbohydrate recognition domain). Interestingly, many articles describe van der Waals interactions as well as electrostatic interactions. Certain carbohydrate modifications, such as LacNAc and Gal 1,4, may be important. Many articles describe the bonding as well as surface interactions, and many studies have been performed with galectin inhibitors such as thio-digalactosides (TDGs), for example TAZTDG (3-deoxy-3-(4-[m-fluorophenyl]-1H-1,2,3-triazol-1-yl)-thio-digalactoside). This led to an interesting article:
Dual thio-digalactoside-binding modes of human galectins as the structural basis for the design of potent and selective inhibitors
Human galectins are promising targets for cancer immunotherapeutic and fibrotic disease-related drugs. We report herein the binding interactions of three thio-digalactosides (TDGs) including TDG itself, TD139 (3,3′-deoxy-3,3′-bis-(4-[m-fluorophenyl]-1H-1,2,3-triazol-1-yl)-thio-digalactoside, recently approved for the treatment of idiopathic pulmonary fibrosis), and TAZTDG (3-deoxy-3-(4-[m-fluorophenyl]-1H-1,2,3-triazol-1-yl)-thio-digalactoside) with human galectins-1, -3 and -7 as assessed by X-ray crystallography, isothermal titration calorimetry and NMR spectroscopy. Five binding subsites (A-E) make up the carbohydrate-recognition domains of these galectins. We identified novel interactions between an arginine within subsite E of the galectins and an arene group in the ligands. In addition to the interactions contributed by the galactosyl sugar residues bound at subsites C and D, the fluorophenyl group of TAZTDG preferentially bound to subsite B in galectin-3, whereas the same group favored binding at subsite E in galectins-1 and -7. The characterised dual binding modes demonstrate how binding potency is improved, as reported by the decrease in Kd values of the TDG inhibitors from μM to nM, and also offer insights into the development of selective inhibitors for individual galectins.
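The reported improvement in Kd from μM to nM translates directly into binding free energy via ΔG = RT·ln(Kd). A minimal sketch of that conversion (the temperature and the gas constant in kcal units are standard choices, not values taken from the paper):

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)

def delta_g(kd_molar, temp_k=298.15):
    """Binding free energy from a dissociation constant:
    dG = RT * ln(Kd). More negative means tighter binding."""
    return R * temp_k * math.log(kd_molar)

# A uM-to-nM potency improvement, as reported for the TDG inhibitors
dg_uM = delta_g(1e-6)   # roughly -8.2 kcal/mol
dg_nM = delta_g(1e-9)   # roughly -12.3 kcal/mol
print(round(dg_uM, 1), round(dg_nM, 1))
```

A 1000-fold drop in Kd thus corresponds to roughly 4 kcal/mol of additional binding energy, on the order of a few of the subsite interactions described above.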
Figures
Figure 1. Chemical structures of L3, TDG…
Figure 2. Structural comparison of the carbohydrate…
Explanation on “Results of Medical Text Analysis with Natural Language Processing (NLP) presented in LPBI Group’s NEW GENRE Edition: NLP” on Genomics content, standalone volume in Series B and NLP on Cancer content as Part B New Genre Volume 1 in Series C
NEW GENRE Edition, Editor-in-Chief: Aviva Lev-Ari, PhD, RN
Series B: Frontiers in Genomics Research NEW GENRE Audio English-Spanish
PART A.1: The eTOCs in Spanish in Audio format
PART A.2: The eTOCs in Bi-lingual format: Spanish and English in Text format
PART B: The graphical results of Medical Text Analysis with Machine Learning (ML), Deep Learning (DL) and Natural Language Processing (NLP) algorithms AND the Domain Knowledge Expert (DKE) interpretation of the results in Text format. PART B is issued as a standalone volume.
PART C: The Editorials of the original e-Books in English in Audio format
Infertility is a major reproductive health issue, affecting about 12% of women of reproductive age in the United States. Aneuploidy in eggs accounts for a significant proportion of early miscarriages and in vitro fertilization failures. Recent studies have shown that genetic variants in several genes affect chromosome segregation fidelity and predispose women to a higher incidence of egg aneuploidy. However, the exact genetic causes of aneuploid egg production remain unclear, making it difficult to diagnose infertility from individual variants in the mother’s genome. Although age is a predictive factor for aneuploidy, it is not a highly accurate gauge, because aneuploidy rates can vary dramatically among individuals of the same age.
Researchers described a technique combining genomic sequencing with machine-learning methods to predict the likelihood that a woman will suffer a miscarriage because of egg aneuploidy, a term describing a human egg with an abnormal number of chromosomes. The scientists examined genetic samples from patients using a technique called whole-exome sequencing, which allowed them to home in on the protein-coding sections of the vast human genome. They then created software using machine learning, an aspect of artificial intelligence in which programs can learn and make predictions without following specific instructions. To do so, the researchers developed algorithms and statistical models that analyzed and drew inferences from patterns in the genetic data.
As a result, the scientists were able to create a specific risk score based on a woman’s genome. They also identified three genes (MCM5, FGGY and DDX60L) that, when mutated, are highly associated with the risk of producing aneuploid eggs. The report thus demonstrated that sequencing data can be mined to predict a patient’s aneuploidy risk, improving clinical diagnosis. The candidate genes and pathways identified in the study are promising targets for future aneuploidy research, and identifying genetic variants with more predictive power will give women and their treating clinicians better information.
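As an illustration only: the study’s model was learned from whole-exome data, but the general shape of a genome-based risk score can be sketched as a logistic function over per-gene weights. The weights and baseline below are hypothetical, not values from the study:

```python
import math

# Hypothetical per-gene log-odds weights; the actual model's weights
# were learned from exome sequencing data, not hand-set like these.
WEIGHTS = {"MCM5": 1.2, "FGGY": 0.9, "DDX60L": 1.5}
BIAS = -2.0  # hypothetical baseline log-odds of producing an aneuploid egg

def risk_score(mutated_genes):
    """Map a patient's set of mutated risk genes to a 0-1 risk score
    via a logistic (sigmoid) transform of the summed log-odds."""
    logit = BIAS + sum(WEIGHTS.get(g, 0.0) for g in mutated_genes)
    return 1.0 / (1.0 + math.exp(-logit))

print(round(risk_score(set()), 3))               # baseline risk, no variants
print(round(risk_score({"MCM5", "DDX60L"}), 3))  # carrier of two risk variants
```

Each additional risk variant shifts the log-odds upward, so carriers of several variants receive a markedly higher score than the baseline.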
In this article, I will list 9 free Harvard courses that you can take to learn data science from scratch. Feel free to skip any of these courses if you already possess knowledge of that subject.
Step 1: Programming
The first step you should take when learning data science is to learn to code. You can do this in the programming language of your choice, ideally Python or R.
If you’d like to learn R, Harvard offers an introductory R course created specifically for data science learners, called Data Science: R Basics.
This program will take you through R concepts like variables, data types, vector arithmetic, and indexing. You will also learn to wrangle data with libraries like dplyr and create plots to visualize data.
If you prefer Python, you can choose to take CS50’s Introduction to Programming with Python offered for free by Harvard. In this course, you will learn concepts like functions, arguments, variables, data types, conditional statements, loops, objects, methods, and more.
Both programs above are self-paced. However, the Python course is more detailed than the R program, and requires a longer time commitment to complete. Also, the rest of the courses in this roadmap are taught in R, so it might be worth learning R to be able to follow along easily.
Step 2: Data Visualization
Visualization is one of the most powerful techniques for communicating your findings from data to another person.
With Harvard’s Data Visualization program, you will learn to build visualizations using the ggplot2 library in R, along with the principles of communicating data-driven insights.
Step 3: Probability
In this course, you will learn essential probability concepts that are fundamental to conducting statistical tests on data. The topics taught include random variables, independence, Monte Carlo simulations, expected values, standard errors, and the Central Limit Theorem.
The concepts above will be introduced with the help of a case study, which means that you will be able to apply everything you learned to an actual real-world dataset.
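The Monte Carlo approach the course teaches can be sketched in a few lines: simulate many sample means and watch the standard error shrink like 1/√n, as the Central Limit Theorem predicts. (The course works in R; this is an equivalent sketch using Python’s standard library.)

```python
import random
import statistics

random.seed(42)

def sample_mean(n):
    """Mean of n draws from a uniform(0, 1) population (true mean 0.5)."""
    return statistics.fmean(random.random() for _ in range(n))

# Monte Carlo: simulate the sampling distribution of the mean for two
# sample sizes. Quadrupling n four times over (10 -> 160) should shrink
# the standard error by a factor of about sqrt(16) = 4.
means_small = [sample_mean(10) for _ in range(2000)]
means_large = [sample_mean(160) for _ in range(2000)]
print(round(statistics.stdev(means_small), 3))
print(round(statistics.stdev(means_large), 3))
```

The two printed values differ by roughly a factor of four, which is exactly the 1/√n behavior the Central Limit Theorem describes.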
Step 4: Statistics
After learning probability, you can take this course to learn the fundamentals of statistical inference and modelling.
This program will teach you to define population estimates and margins of error, introduce you to Bayesian statistics, and provide you with the fundamentals of predictive modeling.
Step 5: Productivity Tools (Optional)
I’ve included this project management course as optional since it isn’t directly related to learning data science. Rather, it teaches you to use Unix/Linux for file management, Git and GitHub for version control, and R for creating reports.
The ability to do the above will save you a lot of time and help you better manage end-to-end data science projects.
Step 6: Data Pre-Processing
The next course in this list is called Data Wrangling, and will teach you to prepare data and convert it into a format that is easily digestible by machine learning models.
You will learn to import data into R, tidy data, process string data, parse HTML, work with date-time objects, and mine text.
As a data scientist, you often need to extract data that is publicly available on the Internet in the form of a PDF document, HTML webpage, or a Tweet. You will not always be presented with clean, formatted data in a CSV file or Excel sheet.
By the end of this course, you will learn to wrangle and clean data to come up with critical insights from it.
Step 7: Linear Regression
Linear regression is a machine learning technique that is used to model a linear relationship between two or more variables. It can also be used to identify and adjust the effect of confounding variables.
This course will teach you the theory behind linear regression models, how to examine the relationship between two variables, and how confounding variables can be detected and removed before building a machine learning algorithm.
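The effect of a confounder, and its removal, can be sketched with simulated data: a naive regression of the outcome on the exposure absorbs the confounder’s effect, while including the confounder in the design matrix recovers the true coefficient. (A Python sketch with made-up coefficients; the course works in R.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                       # confounder (e.g., age)
x = z + rng.normal(size=n)                   # exposure correlated with z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)   # true effect of x on y is 2.0

# Naive simple regression of y on x absorbs z's effect: biased upward
naive_slope = np.polyfit(x, y, 1)[0]

# Multiple regression with z in the design matrix adjusts for confounding
X = np.column_stack([np.ones(n), x, z])
adjusted = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(round(naive_slope, 2), round(adjusted, 2))
```

The naive slope comes out near 3.5 rather than 2.0; adding the confounder as a covariate restores the true effect, which is the adjustment the course builds up to.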
Step 8: Machine Learning
Finally, the course you’ve probably been waiting for! Harvard’s machine learning program will teach you the basics of machine learning, techniques to mitigate overfitting, supervised and unsupervised modelling approaches, and recommendation systems.
Step 9: Capstone Project
After completing all the above courses, you can take Harvard’s data science capstone project, where your skills in data visualization, probability, statistics, data wrangling, data organization, regression, and machine learning will be assessed.
With this final project, you will get the opportunity to put together all the knowledge learnt from the above courses and gain the ability to complete a hands-on data science project from scratch.
Note: All the courses above are available on the online learning platform edX and can be audited for free. If you want a course certificate, however, you will have to pay for it.
Data powers AI. Good data can mean the difference between an impactful solution or one that never gets off the ground. Re-assess the foundational AI questions to ensure your data is working for, not against, you.
Innovation to Reality
The challenges of implementing AI are many. Avoid the common pitfalls with real-world case studies from leaders who have successfully turned their AI solutions into reality.
Harness What’s Possible at the Edge
With the potential for near-instantaneous decision making, pioneers are moving AI to the edge. We examine the pros and cons of moving AI decisions to the edge with the experts who are getting it right.
Generative AI Solutions
The use of generative AI to boost human creativity is breaking boundaries in creative areas previously untouched by AI. We explore the intersection of data and algorithms enabling collaborative AI processes to design and create.
Data powers AI. Good data can mean the difference between an impactful solution or one that never gets off the ground. Re-assess the foundational AI questions to ensure your data is working for, not against, you.
Data is the most under-valued and de-glamorized aspect of AI. Learn why shifting the focus from model and algorithm development to data quality is the next, and most efficient, way to improve the decision-making abilities of AI.
Data labeling is key to determining the success or failure of AI applications. Learn how to implement a data-first approach that can transform AI inference, resulting in better models that make better decisions.
Question the status quo. Build stakeholder trust. These are foundational elements of thought leadership in AI. Explore how organizations can use their data and algorithms in ethical and responsible ways while building bigger and more effective systems.
Haniyeh Mahmoudian
Global AI Ethicist, DataRobot
Mainstage Break (10:35 a.m. – 11:05 a.m.)
Networking and refreshments for our live audience and a selection of curated content for those tuning in virtually.
With its next-generation machine learning models fueling precision medicine, the French biotech company Owkin captured the attention of the pharma industry. Learn how they did it and get tips for navigating the complex task of scaling your innovation.
Networking and refreshments for our live audience.
Innovation to Reality (11:05 a.m. – 12:30 p.m.)
The challenges of implementing AI are many. Avoid the common pitfalls with real-world case studies from leaders who have successfully turned their AI solutions into reality.
Deploying AI in real-world environments benefits from human input before and during implementation. Get an inside look at how organizations can ensure reliable results with the key questions and competing needs that should be considered when implementing AI solutions.
AI is evolving from the research lab into practical real world applications. Learn what issues should be top of mind for businesses, consumers, and researchers as we take a deep dive into AI solutions that increase modern productivity and accelerate intelligence transformation.
Getting AI to work 80% of the time is relatively straightforward, but trustworthy AI requires deployments that work 100% of the time. Unpack some of the biggest challenges that come up when eliminating the 20% gap.
Bali Raghavan
Head of Engineering, Forward
Lunch and Networking Break (12:30 p.m. – 1:30 p.m.)
Lunch served at the MIT Media Lab and a selection of curated content for those tuning in virtually.
Harness What’s Possible at the Edge (1:30 p.m. – 3:15 p.m.)
With the potential for near-instantaneous decision making, pioneers are moving AI to the edge. We examine the pros and cons of moving AI decisions to the edge with the experts who are getting it right.
To create sustainable business impact, AI capabilities need to be tailored and optimized to an industry or organization’s specific requirements and infrastructure model. Hear how customers’ challenges across industries can be addressed in any compute environment from the cloud to the edge with end-to-end hardware and software optimization.
Kavitha Prasad
VP & GM, Datacenter, AI and Cloud Execution and Strategy, Intel Corporation
Decision making has moved from the edge to the cloud before settling into a hybrid setup for many AI systems. Through the examination of key use cases, take a deep dive into the benefits and drawbacks of operating a machine-learning system at the point of inference.
Enable your organization to transform customer experiences through AI at the edge. Learn about the required technologies, including teachable and self-learning AI, that are needed for a successful shift to the edge, and hear how deploying these technologies at scale can unlock richer, more responsive experiences.
Reimagine AI solutions as a unified system, instead of individual components. Through the lens of autonomous vehicles, discover the pros and cons of using an all-inclusive AI-first approach that includes AI decision-making at the edge and see how this thinking can be applied across industry.
Raquel Urtasun
Founder & CEO, Waabi
Mainstage Break (3:15 p.m. – 3:45 p.m.)
Networking and refreshments for our live audience and a selection of curated content for those tuning in virtually.
Advances in machine learning are enabling artists and creative technologists to think about and use AI in new ways. Discuss the concept of creative AI and look at project examples from London’s art scene that illustrate the various ways creative AI is bridging the gap between the traditional art world and the latest technological innovations.
Luba Elliott
Curator, Producer, and Researcher, Creative AI
Generative AI Solutions (3:45 p.m. – 5:10 p.m.)
The use of generative AI to boost human creativity is breaking boundaries in creative areas previously untouched by AI. We explore the intersection of data and algorithms enabling collaborative AI processes to design and create.
Change the design problem with AI. The creative nature of generative AI enhances design capabilities, finding efficiencies and opportunities that humans alone might not conceive. Explore business applications including project planning, construction, and physical design.
Deep learning is a data-hungry technology. Manually labelled training data has become cost-prohibitive and time-consuming to produce. Get a glimpse of how interactive large-scale synthetic data generation can accelerate the AI revolution, unlocking the potential of data-driven artificial intelligence.
Danny Lange
SVP of Artificial Intelligence, Unity Technologies
Push beyond the typical uses of AI. Explore the nexus of art, technology, and human creativity through the unique innovation of kinetic data sculptures that use machines to give physical context and shape to data to rethink how we engage with the physical world.
Refik Anadol
CEO, RAS Lab; Lecturer, UCLA
Last Call with the Editors (5:10 p.m. – 5:20 p.m.)
Before we wrap day 1, join our last call with all of our editors to get their analysis on the day’s topics, themes, and guests.
Networking Reception (5:20 p.m. – 6:20 p.m.)
WEDNESDAY, MARCH 30
Evolving the Algorithms
What’s Next for Deep Learning
Deep learning algorithms have powered most major AI advances of the last decade. We bring you into the top innovation labs to see how they are advancing their deep learning models to find out just how much more we can get out of these algorithms.
AI in Day-To-Day Business
Many organizations are already using AI internally in their day-to-day operations, in areas like cybersecurity, customer service, finance, and manufacturing. We examine the tools that organizations are using when putting AI to work.
Making AI Work for All
As AI increasingly underpins our lives, businesses, and society, we must ensure that AI works for everyone, not just those represented in datasets, and not just 80% of the time. Examine the challenges and solutions needed to ensure AI works fairly, for all.
Envisioning the Next AI
Some business problems can’t be solved with current deep learning methods. We look around the corner at the new approaches and most revolutionary ideas propelling us toward the next stage in AI evolution.
Day 2: Evolving the Algorithms (9:00 a.m. – 5:25 p.m.)
What’s Next for Deep Learning (9:10 a.m. – 10:25 a.m.)
Deep learning algorithms have powered most major AI advances of the last decade. We bring you into the top innovation labs to see how they are advancing their deep learning models to find out just how much more we can get out of these algorithms.
Transformer-based language models are revolutionizing the way neural networks process natural language. This deep dive looks at how organizations can put their data to work using transformer models. We consider the problems that business may face as these massive models mature, including training needs, managing parallel processing at scale, and countering offensive data.
Critical thinking may be one step closer for AI by combining large-scale transformers with smart sampling and filtering. Get an early look at how AlphaCode’s entry into competitive programming may lead to a human-like capacity for AI to write original code that solves unforeseen problems.
As advanced AI systems gain greater capabilities in our search for artificial general intelligence, it’s critical to teach them how to understand human intentions. Look at the latest advancements in AI systems and how to ensure they can be truthful, helpful, and safe.
Mira Murati
SVP, Research, Product, & Partnerships, OpenAI
Mainstage Break (10:25 a.m. – 10:55 a.m.)
Networking and refreshments for our live audience and a selection of curated content for those tuning in virtually.
Good data is the bedrock of a self-service data consumption model, which in turn unlocks insights, analytics, personalization at scale through AI. Yet many organizations face immense challenges setting up a robust data foundation. Dive into a pragmatic perspective on abstracting the complexity and untangling the conflicts in data management for better AI.
Naveen Kamat
Executive Director, Data and AI Services, Kyndryl
AI in Day-To-Day Business (10:55 a.m. – 12:20 p.m.)
Many organizations are already using AI internally in their day-to-day operations, in areas like cybersecurity, customer service, finance, and manufacturing. We examine the tools that organizations are using when putting AI to work.
Effectively operationalized AI/ML can unlock untapped potential in your organization. From enhancing internal processes to managing the customer experience, get the pragmatic advice and takeaways leaders need to better understand their internal data to achieve impactful results.
Use AI to maximize reliability of supply chains. Learn the dos and don’ts to managing key processes within your supply chain, including workforce management, streamlining and simplification, and reaping the full value of your supply chain solutions.
Darcy MacClaren
Senior Vice President, Digital Supply Chain, SAP North America
Machine and reinforcement learning enable Spotify to deliver the right content to the right listener at the right time, allowing for personalized listening experiences that facilitate discovery at a global scale. Through user interactions, algorithms suggest new content and creators that keep customers both happy and engaged with the platform. Dive into the details of making better user recommendations.
Tony Jebara
VP of Engineering and Head of Machine Learning, Spotify
Lunch and Networking Break (12:20 p.m. – 1:15 p.m.)
Lunch served at the MIT Media Lab and a selection of curated content for those tuning in virtually.
Making AI Work for All (1:15 p.m. – 2:35 p.m.)
As AI increasingly underpins our lives, businesses, and society, we must ensure that AI works for everyone, not just those represented in datasets, and not just 80% of the time. Examine the challenges and solutions needed to ensure AI works fairly, for all.
Walk through the practical steps to map and understand the nuances, outliers, and special cases in datasets. Get tips to ensure ethical and trustworthy approaches to training AI systems that grow in scope and scale within a business.
Lauren Bennett
Group Software Engineering Lead, Spatial Analysis and Data Science, Esri
Get an inside look at the long- and short-term benefits of addressing inequities in AI opportunities, ranging from educating the tech youth of the future to a 10,000-foot view of what it will take to ensure that equity is top of mind within society and business alike.
Public policies can help to make AI more equitable and ethical for all. Examine how policies could impact corporations and what it means for building internal policies, regardless of what government adopts. Identify actionable ideas to best move policies forward for the widest benefit to all.
Nicol Turner Lee
Director, Center for Technology Innovation, Brookings Institution
Mainstage Break (2:35 p.m. – 3:05 p.m.)
Networking and refreshments for our live audience and a selection of curated content for those tuning in virtually.
From the U.S. to China, the global robo-taxi race is gaining traction with consumers and regulators alike. Go behind the scenes with AutoX – a Level 4 driving technology company – and hear how it overcame obstacles while launching the world’s second and China’s first public, fully driverless robo-taxi service.
Jianxiong Xiao
Founder and CEO, AutoX
Envisioning the Next AI (3:05 p.m. – 4:50 p.m.)
Some business problems can’t be solved with current deep learning methods. We look around the corner at the new approaches and most revolutionary ideas propelling us toward the next stage in AI evolution.
The use of AI in finance is gaining traction as organizations realize the advantages of using algorithms to streamline and improve the accuracy of financial tasks. Step through use cases that examine how AI can be used to minimize financial risk, maximize financial returns, optimize venture capital funding by connecting entrepreneurs to the right investors, and more.
Sameena Shah
Managing Director, J.P. Morgan AI Research, JP Morgan Chase
In a study of simulated robotic evolution, it was observed that more complex environments and evolutionary changes to the robot’s physical form accelerated the growth of robot intelligence. Examine this cutting-edge research and decipher what this early discovery means for the next generation of AI and robotics.
Agrim Gupta
PhD Student, Stanford Vision and Learning Lab, Stanford University
Understanding human thinking and reasoning processes could lead to more general, flexible and human-like artificial intelligence. Take a close look at the research building AI inspired by human common-sense that could create a new generation of tools for complex decision-making.
Zenna Tavares
Research Scientist, Columbia University; Co-Founder, Basis
Look under the hood at this innovative approach to AI learning with multi-agent and human-AI interactions. Discover how bots work together and learn together through personal interactions. Recognize the future implications for AI, plus the benefits and obstacles that may come from this new process.
David Ferrucci was the principal investigator for the team that led IBM Watson to its landmark Jeopardy success, awakening the world to the possibilities of AI. We pull back the curtain on AI for a wide-ranging discussion on explainable models and the next generation of human-machine collaboration, creating AI thought partners with limitless applications.
AI enabled Drug Discovery and Development: The Challenges and the Promise
Reporter: Aviva Lev-Ari, PhD, RN
Early Development
Caroline Kovac (the first IBM GM of Life Sciences) is the one who started in silico development of drugs in 2000, using a big database of substances and computer power. She transformed an idea into a $2B business, with most of the money coming from big pharma: she would ask what new drugs they were planning to develop and provide the four most probable combinations of substances, based on the in silico work.
Carol Kovac
General Manager, Healthcare and Life Sciences, IBM
From a speaker at a conference in 2005:
Carol Kovac is General Manager of IBM Healthcare and Life Sciences responsible for the strategic direction of IBM's global healthcare and life sciences business. Kovac leads her team in developing the latest information technology solutions and services, establishing partnerships and overseeing IBM investment within the healthcare, pharmaceutical and life sciences markets. Starting with only two employees as an emerging business unit in the year 2000, Kovac has successfully grown the life sciences business unit into a multi-billion dollar business and one of IBM's most successful ventures to date with more than 1500 employees worldwide. Kovac's prior positions include general manager of IBM Life Sciences, vice president of Technical Strategy and Division Operations, and vice president of Services and Solutions. In the latter role, she was instrumental in launching the Computational Biology Center at IBM Research. Kovac sits on the Board of Directors of Research!America and Africa Harvest. She was inducted into the Women in Technology International Hall of Fame in 2002, and in 2004, Fortune magazine named her one of the 50 most powerful women in business. Kovac earned her Ph.D. in chemistry at the University of Southern California.
The use of artificial intelligence in drug discovery, when coupled with new genetic insights and the increase of patient medical data of the last decade, has the potential to bring novel medicines to patients more efficiently and more predictably.
Jack Fuchs, MBA ’91, an adjunct lecturer who teaches “Principled Entrepreneurial Decisions” at Stanford School of Engineering, moderated and explored how clearly articulated principles can guide the direction of technological advancements like AI-enabled drug discovery.
Kim Branson, Global Head of AI and Machine Learning at GSK.
Russ Altman, the Kenneth Fong Professor of Bioengineering, of genetics, of medicine (general medical discipline), of biomedical data science and, by courtesy, of computer science.
Synthetic Biology Software applied to development of Galectins Inhibitors at LPBI Group
Using Structural Computation Models to Predict Productive PROTAC Ternary Complexes
Ternary complex formation is necessary but not sufficient for target protein degradation. In this research, Bai et al. have addressed questions to better understand the rate-limiting steps between ternary complex formation and target protein degradation. They developed a structure-based computational modeling approach to predict the efficiency and sites of target protein ubiquitination by CRBN-binding PROTACs. Such models will allow a more complete understanding of PROTAC-directed degradation and the crafting of increasingly effective and specific PROTACs for therapeutic applications.
Another major feature of this research is that it is a result of collaboration between research groups at Amgen, Inc. and Promega Corporation. In the past, commercial research laboratories have shied away from collaboration, but in the last several years researchers have become more open to collaborative work. This increased collaboration allows scientists to bring their different expertise to a problem or question and speed up discovery. According to Dr. Kristin Riching, Senior Research Scientist at Promega Corporation, “Targeted protein degraders have broken many of the rules that have guided traditional drug development, but it is exciting to see how the collective learnings we gain from their study can aid the advancement of this new class of molecules to the clinic as effective therapeutics.”
Medical Startups – Artificial Intelligence (AI) Startups in Healthcare
Reporters: Stephen J. Williams, PhD; Aviva Lev-Ari, PhD, RN; and Shraga Rottem, MD, DSc
The motivation for this post is threefold:
First, we are presenting an application of AI, NLP, DL to our own medical text in the Genomics space. Here we present the first section of Part 1 in the following book. Part 1 has six subsections that yielded 12 plots. The entire Book is represented by 38 x 2 = 76 plots.
Second, we bring to the attention of the e-Reader the list of 276 Medical Startups – Artificial Intelligence (AI) Startups in Healthcare as a hot universe of R&D activity in Human Health.
Third, we highlight one academic center with an AI focus.
Dear friends of the ETH AI Center,
We would like to provide you with some exciting updates from the ETH AI Center and its growing community.
As the Covid-19 restrictions in Switzerland have recently been lifted, we would like to hear what kinds of events you would like to see in 2022! Participate in the survey to suggest event formats and topics you would enjoy. We are excited to learn what we can achieve together this year.
We already have many interesting events coming up, and we look forward to seeing you at our main and community events!
LPBI Group is applying AI for Medical Text Analysis with Machine Learning and Natural Language Processing: Statistical and Deep Learning
Our Book
Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS & BioInformatics, Simulations and the Genome Ontology
Medical text analysis of this book shows the following results, obtained by Madison Davis by applying Wolfram NLP for Biological Languages to our own text. See an example below:
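The analysis above was performed with Wolfram NLP. As a rough analogue of its first step, a term-frequency pass over a corpus of abstracts (the usual precursor to a word cloud) can be sketched in plain Python; the sample abstracts below are placeholders, not the actual curated corpus.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real analyses use larger ones
STOPWORDS = {"the", "and", "of", "a", "in", "to", "is", "for", "with", "by"}

def term_frequencies(texts):
    """Lowercase each text, tokenize words (keeping hyphenated terms like
    'galectin-3' intact), drop stopwords, and count occurrences."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+(?:-[a-z0-9]+)*", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

# Placeholder abstracts standing in for the curated PubMed corpus
abstracts = [
    "Galectin-3 binding to the carbohydrate recognition domain",
    "Thio-digalactoside inhibitors of galectin-3 binding",
]
print(term_frequencies(abstracts).most_common(3))
```

The resulting counts can be fed directly to any word-cloud or plotting library; the heavy lifting in the actual analysis (entity recognition, biological language models) is of course beyond this sketch.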
@MIT Artificial intelligence system rapidly predicts how two proteins will attach: The model, called EquiDock, focuses on rigid body docking — which occurs when two proteins attach by rotating or translating in 3D space, but their shapes don’t squeeze or bend.
Reporter: Aviva Lev-Ari, PhD, RN
This paper introduces a novel SE(3) equivariant graph matching network, along with a keypoint discovery and alignment approach, for the problem of protein-protein docking, with a novel loss based on optimal transport. The overall consensus is that this is an impactful solution to an important problem, whereby competitive results are achieved without the need for templates, refinement, and are achieved with substantially faster run times.
Keywords: protein complexes, protein structure, rigid body docking, SE(3) equivariance, graph neural networks
Abstract: Protein complex formation is a central problem in biology, being involved in most of the cell’s processes, and essential for applications such as drug design or protein engineering. We tackle rigid body protein-protein docking, i.e., computationally predicting the 3D structure of a protein-protein complex from the individual unbound structures, assuming no three-dimensional flexibility during binding. We design a novel pairwise-independent SE(3)-equivariant graph matching network to predict the rotation and translation to place one of the proteins at the right location and the right orientation relative to the second protein. We mathematically guarantee that the predicted complex is always identical regardless of the initial placements of the two structures, avoiding expensive data augmentation. Our model approximates the binding pocket and predicts the docking pose using keypoint matching and alignment through optimal transport and a differentiable Kabsch algorithm. Empirically, we achieve significant running time improvements over existing protein docking software and predict qualitatively plausible protein complex structures despite not using heavy sampling, structure refinement, or templates.
One-sentence Summary: We perform rigid protein docking using a novel independent SE(3)-equivariant message passing mechanism that guarantees the same resulting protein complex independent of the initial placement of the two 3D structures.
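The abstract mentions that EquiDock's final superposition uses a differentiable Kabsch algorithm. The classic (non-differentiable) Kabsch procedure, which finds the optimal rotation and translation between two matched point sets, can be sketched in a few lines of NumPy; this is a generic illustration of the algorithm, not the EquiDock implementation.

```python
import numpy as np

def kabsch(P, Q):
    """Optimal rotation R and translation t minimizing the RMSD between
    the mobile (N, 3) point set P and the reference (N, 3) point set Q."""
    p_cent, q_cent = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_cent).T @ (Q - q_cent)        # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # correct for improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_cent - R @ p_cent
    return R, t

# Sanity check: recover a known rigid transform applied to random coordinates
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
Q = P @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = kabsch(P, Q)
print(np.allclose(P @ R.T + t, Q))  # True: transform recovered exactly
```

Because the rotation comes from an SVD, a differentiable version (as used in EquiDock) can backpropagate through this alignment, letting the network learn keypoints whose optimal superposition produces the docking pose.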
MIT researchers created a machine-learning model that can directly predict the complex that will form when two proteins bind together. Their technique is between 80 and 500 times faster than state-of-the-art software methods, and often predicts protein structures that are closer to actual structures that have been observed experimentally.
This technique could help scientists better understand some biological processes that involve protein interactions, like DNA replication and repair; it could also speed up the process of developing new medicines.
“Deep learning is very good at capturing interactions between different proteins that are otherwise difficult for chemists or biologists to write experimentally. Some of these interactions are very complicated, and people haven’t found good ways to express them. This deep-learning model can learn these types of interactions from data,” says Octavian-Eugen Ganea, a postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and co-lead author of the paper.
Ganea’s co-lead author is Xinyuan Huang, a graduate student at ETH Zurich. MIT co-authors include Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health in CSAIL, and Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering in CSAIL and a member of the Institute for Data, Systems, and Society. The research will be presented at the International Conference on Learning Representations.
Significance of the Scientific Development by the @MIT Team
EquiDock’s wide applicability:
Our method can be integrated end-to-end to boost the quality of other models (see above discussion on runtime importance). Examples are predicting functions of protein complexes [3] or their binding affinity [5], de novo generation of proteins binding to specific targets (e.g., antibodies [6]), modeling backbone and side-chain flexibility [4], or devising methods for non-binary multimers. See the updated discussion in the “Conclusion” section of our paper.
Advantages over previous methods:
Our method does not rely on templates or heavy candidate sampling [7], aiming at the ambitious goal of predicting the complex pose directly. This should be interpreted in terms of generalization (to unseen structures) and scalability capabilities of docking models, as well as their applicability to various other tasks (discussed above).
Our method obtains a competitive quality without explicitly using previous geometric (e.g., 3D Zernike descriptors [8]) or chemical (e.g., hydrophilic information) features [3]. Future EquiDock extensions would find creative ways to leverage these different signals and, thus, obtain more improvements.
Novelty of theory:
Our work is the first to formalize the notion of pairwise independent SE(3)-equivariance. Previous work (e.g., [9,10]) has incorporated only single object Euclidean-equivariances into deep learning models. For tasks such as docking and binding of biological objects, it is crucial that models understand the concept of multi-independent Euclidean equivariances.
All propositions in Section 3 are our novel theoretical contributions.
We have rewritten the Contribution and Related Work sections to clarify this aspect.
Footnote [a]: We have fixed an important bug in the cross-attention code. We have done a more extensive hyperparameter search and understood that layer normalization is crucial in the layers used in Eqs. 5 and 9, but not on the h embeddings as originally shown in Eq. 10. We have seen benefits from training our models with a longer patience in the early stopping criteria (30 epochs for DIPS and 150 epochs for DB5). Increasing the learning rate to 2e-4 is important to speed up training. Using an intersection loss weight of 10 leads to improved results compared to the default of 1.
Bibliography:
[1] Protein-ligand blind docking using QuickVina-W with inter-process spatio-temporal integration, Hassan et al., 2017
[2] GNINA 1.0: molecular docking with deep learning, McNutt et al., 2021
[3] Protein-protein and domain-domain interactions, Kangueane and Nilofer, 2018
[4] Side-chain Packing Using SE(3)-Transformer, Jindal et al., 2022
[5] Contacts-based prediction of binding affinity in protein–protein complexes, Vangone et al., 2015
[6] Iterative refinement graph neural network for antibody sequence-structure co-design, Jin et al., 2021
[7] Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes, Eismann et al, 2020
[8] Protein-protein docking using region-based 3D Zernike descriptors, Venkatraman et al., 2009
[9] SE(3)-transformers: 3D roto-translation equivariant attention networks, Fuchs et al, 2020
[10] E(n) equivariant graph neural networks, Satorras et al., 2021
[11] Fast end-to-end learning on protein surfaces, Sverrisson et al., 2020
From: Heidi Rheim et al. GA4GH: International policies and standards for data sharing across genomic research and healthcare. (2021): Cell Genomics, Volume 1 Issue 2.
Siloing genomic data in institutions/jurisdictions limits learning and knowledge
GA4GH policy frameworks enable responsible genomic data sharing
GA4GH technical standards ensure interoperability, broad access, and global benefits
Data sharing across research and healthcare will extend the potential of genomics
Summary
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
In order for genomic and personalized medicine to come to fruition, it is imperative that data silos around the world be broken down, allowing international collaboration in the collection, storage, transfer, accessing, and analysis of molecular and health-related data.
We have talked on this site in numerous articles about the problems data silos produce. By data silos we mean that not only data but also intellectual output are held behind physical, electronic, and intellectual walls, inaccessible to scientists not belonging to a particular institution or collaborative network.
Standardization and harmonization of data is key to this effort to sharing electronic records. The EU has taken bold action in this matter. The following section is about the General Data Protection Regulation of the EU and can be found at the following link:
The data protection package adopted in May 2016 aims at making Europe fit for the digital age. More than 90% of Europeans say they want the same data protection rights across the EU and regardless of where their data is processed.
The General Data Protection Regulation (GDPR)
Regulation (EU) 2016/679 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data. This text includes the corrigendum published in the OJEU of 23 May 2018.
The regulation is an essential step to strengthen individuals’ fundamental rights in the digital age and facilitate business by clarifying rules for companies and public bodies in the digital single market. A single law will also do away with the current fragmentation in different national systems and unnecessary administrative burdens.
Directive (EU) 2016/680 on the protection of natural persons regarding processing of personal data connected with criminal offences or the execution of criminal penalties, and on the free movement of such data.
The directive protects citizens’ fundamental right to data protection whenever personal data is used by criminal law enforcement authorities for law enforcement purposes. It will in particular ensure that the personal data of victims, witnesses, and suspects of crime are duly protected and will facilitate cross-border cooperation in the fight against crime and terrorism.
The directive entered into force on 5 May 2016 and EU countries had to transpose it into their national law by 6 May 2018.
The following paper by the organization The Global Alliance for Genomics and Health discusses these types of collaborative efforts to break down data silos in personalized medicine. This organization has over 2,000 subscribers in over 90 countries, encompassing over 60 organizations.
Enabling responsible genomic data sharing for the benefit of human health
The Global Alliance for Genomics and Health (GA4GH) is a policy-framing and technical standards-setting organization, seeking to enable responsible genomic data sharing within a human rights framework.
The Global Alliance for Genomics and Health (GA4GH) is an international, nonprofit alliance formed in 2013 to accelerate the potential of research and medicine to advance human health. Bringing together 600+ leading organizations working in healthcare, research, patient advocacy, life science, and information technology, the GA4GH community is working together to create frameworks and standards to enable the responsible, voluntary, and secure sharing of genomic and health-related data. All of our work builds upon the Framework for Responsible Sharing of Genomic and Health-Related Data.
GA4GH Connect is a five-year strategic plan that aims to drive uptake of standards and frameworks for genomic data sharing within the research and healthcare communities in order to enable responsible sharing of clinical-grade genomic data by 2022. GA4GH Connect links our Work Streams with Driver Projects—real-world genomic data initiatives that help guide our development efforts and pilot our tools.
The Global Alliance for Genomics and Health (GA4GH) is a worldwide alliance of genomics researchers, data scientists, healthcare practitioners, and other stakeholders. We are collaborating to establish policy frameworks and technical standards for responsible, international sharing of genomic and other molecular data as well as related health data. Founded in 2013, the GA4GH community now consists of more than 1,000 individuals across more than 90 countries working together to enable broad sharing that transcends the boundaries of any single institution or country (see https://www.ga4gh.org). In this perspective, we present the strategic goals of GA4GH and detail current strategies and operational approaches to enable responsible sharing of clinical and genomic data, through both harmonized data aggregation and federated approaches, to advance genomic medicine and research. We describe technical and policy development activities of the eight GA4GH Work Streams and implementation activities across 24 real-world genomic data initiatives (“Driver Projects”). We review how GA4GH is addressing the major areas in which genomics is currently deployed including rare disease, common disease, cancer, and infectious disease. Finally, we describe differences between genomic sequence data that are generated for research versus healthcare purposes, and define strategies for meeting the unique challenges of responsibly enabling access to data acquired in the clinical setting.
GA4GH organization
GA4GH has partnered with 24 real-world genomic data initiatives (Driver Projects) to ensure its standards are fit for purpose and driven by real-world needs. Driver Projects make a commitment to help guide GA4GH development efforts and pilot GA4GH standards (see Table 2). Each Driver Project is expected to dedicate at least two full-time equivalents to GA4GH standards development, which takes place in the context of GA4GH Work Streams (see Figure 1). Work Streams are the key production teams of GA4GH, tackling challenges in eight distinct areas across the data life cycle (see Box 1). Work Streams consist of experts from their respective sub-disciplines and include membership from Driver Projects as well as hundreds of other organizations across the international genomics and health community.
Figure 1: Matrix structure of the Global Alliance for Genomics and Health
Box 1: GA4GH Work Stream focus areas. The GA4GH Work Streams are the key production teams of the organization. Each tackles a specific area in the data life cycle, as described below (URLs listed in the web resources).
(1) Data use & researcher identities: Develops ontologies and data models to streamline global access to datasets generated in any country
(2) Genomic knowledge standards: Develops specifications and data models for exchanging genomic variant observations and knowledge
(3) Cloud: Develops federated analysis approaches to support the statistical rigor needed to learn from large datasets
(4) Data privacy & security: Develops guidelines and recommendations to ensure identifiable genomic and phenotypic data remain appropriately secure without sacrificing their analytic potential
(5) Regulatory & ethics: Develops policies and recommendations for ensuring individual-level data are interoperable with existing norms and follow core ethical principles
(6) Discovery: Develops data models and APIs to make data findable, accessible, interoperable, and reusable (FAIR)
(7) Clinical & phenotypic data capture & exchange: Develops data models to ensure genomic data is most impactful through rich metadata collected in a standardized way
(8) Large-scale genomics: Develops APIs and file formats to ensure harmonized technological platforms can support large-scale computing
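As a concrete taste of a GA4GH technical standard, the htsget protocol (from the Large-scale genomics Work Stream) lets clients retrieve genomic slices of sequencing read sets over plain HTTPS. A minimal request URL can be assembled as below; the server host and read-set identifier are hypothetical, and only the query-parameter names follow the published htsget specification.

```python
from urllib.parse import urlencode

def htsget_reads_url(base, read_id, reference_name, start, end, fmt="BAM"):
    """Build a GA4GH htsget request URL for a genomic slice of a read set."""
    query = urlencode({
        "format": fmt,                    # e.g., BAM or CRAM
        "referenceName": reference_name,  # chromosome / contig name
        "start": start,                   # 0-based, inclusive
        "end": end,                       # 0-based, exclusive
    })
    return f"{base}/reads/{read_id}?{query}"

# Hypothetical server and dataset identifier
url = htsget_reads_url("https://htsget.example.org", "NA12878",
                       reference_name="chr1", start=100000, end=200000)
print(url)
```

The server answers such a request with a JSON "ticket" listing byte-range URLs the client then fetches and concatenates, which is what makes the standard workable across heterogeneous storage back ends.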
For more articles on Open Access, Science 2.0, and Data Networks for Genomics on this Open Access Scientific Journal see:
The Vibrant Philly Biotech Scene: Proteovant Therapeutics Using Artificial Intelligence and Machine Learning to Develop PROTACs
Reporter:Stephen J. Williams, Ph.D.
It has been a while since I have added to this series, but there have been a plethora of exciting biotech startups in the Philadelphia area, and many new startups combining technology, biotech, and machine learning. One such exciting biotech is Proteovant Therapeutics, which is combining the new PROTAC (Proteolysis-Targeting Chimera) technology with its in-house ability to utilize machine learning and artificial intelligence to design these types of compounds against multiple intracellular targets.
PROTAC is actually a trademark of Arvinas Operations; these molecules are also referred to as protein degraders. PROTACs take advantage of the cell’s protein homeostatic mechanism of ubiquitin-mediated protein degradation, a very specific, targeted process that regulates the protein levels of various transcription factors, proto-oncogenes, and receptors. In essence, this regulated proteolytic process is needed for normal cellular function, and alterations in this process may lead to oncogenesis, or to a proteotoxic crisis leading to mitophagy, autophagy, and cellular death. The key to this technology is using chemical linkers to associate an E3 ligase with a protein target of interest. E3 ligases are the rate-limiting step in marking proteins bound for degradation by the proteasome with ubiquitin chains.
A review of this process as well as PROTACs can be found elsewhere in articles (and future articles) on this Open Access Journal.
Proteovant has made two important collaborations:
Oncopia Therapeutics: came out of the University of Michigan Innovation Hub and the lab of Shaomeng Wang, who developed a library of BET- and MDM2-based protein degraders. In 2020 it was acquired by Roivant Sciences.
Roivant Sciences: uses computer-aided design of protein degraders
Proteovant Company Description:
Proteovant is a newly launched development-stage biotech company focusing on discovery and development of disease-modifying therapies by harnessing natural protein homeostasis processes. We have recently acquired numerous assets at discovery and development stages from Oncopia, a protein degradation company. Our lead program is on track to enter IND in 2021. Proteovant is building a strong drug discovery engine by combining deep drugging expertise with innovative platforms including Roivant’s AI capabilities to accelerate discovery and development of protein degraders to address unmet needs across all therapeutic areas. The company has recently secured $200M funding from SK Holdings in addition to investment from Roivant Sciences. Our current therapeutic focus includes but is not limited to oncology, immunology and neurology. We remain agnostic to therapeutic area and will expand therapeutic focus based on opportunity. Proteovant is expanding its discovery and development teams and has multiple positions in biology, chemistry, biochemistry, DMPK, bioinformatics and CMC at many levels. Our R&D organization is located close to major pharmaceutical companies in Eastern Pennsylvania with a second site close to biotech companies in the Boston area.
The ubiquitin proteasome system (UPS) is responsible for maintaining protein homeostasis. Targeted protein degradation by the UPS is a cellular process that involves marking proteins and guiding them to the proteasome for destruction. We leverage this physiological cellular machinery to target and destroy disease-causing proteins.
Unlike traditional small molecule inhibitors, our approach is not limited by the classic “active site” requirements. For example, we can target transcription factors and scaffold proteins that lack a catalytic pocket. These classes of proteins, historically, have been very difficult to drug. Further, we selectively degrade target proteins, rather than isozymes or paralogous proteins with high homology. Because of the catalytic nature of the interactions, it is possible to achieve efficacy at lower doses with prolonged duration while decreasing dose-limiting toxicities.
Biological targets once deemed “undruggable” are now within reach.
Roivant develops transformative medicines faster by building technologies and developing talent in creative ways, leveraging the Roivant platform to launch “Vants” – nimble and focused biopharmaceutical and health technology companies. These Vants include Proteovant but also Dermavant and Immunovant, as well as others.
Roivant’s drug discovery capabilities include the leading computational physics-based platform for in silico drug design and optimization as well as machine learning-based models for protein degradation.
The integration of our computational and experimental engines enables the rapid design of molecules with high precision and fidelity to address challenging targets for diseases with high unmet need.
Our current modalities include small molecules, heterobifunctionals and molecular glues.
Roivant Unveils Targeted Protein Degradation Platform
– First therapeutic candidate on track to enter clinical studies in 2021
– Computationally-designed degraders for six targets currently in preclinical development
– Acquisition of Oncopia Therapeutics and research collaboration with lab of Dr. Shaomeng Wang at the University of Michigan to add diverse pipeline of current and future compounds
– Clinical-stage degraders will provide foundation for multiple new Vants in distinct disease areas
– Platform supported by $200 million strategic investment from SK Holdings
Other articles in this Vibrant Philly Biotech Scene on this Online Open Access Journal include: