Funding, Deals & Partnerships: BIOLOGICS & MEDICAL DEVICES; BioMed e-Series; Medicine and Life Sciences Scientific Journal – http://PharmaceuticalIntelligence.com
In this article, I will list 9 free Harvard courses that you can take to learn data science from scratch. Feel free to skip any of these courses if you already possess knowledge of that subject.
Step 1: Programming
The first step you should take when learning data science is to learn to code. You can do this in the programming language of your choice, ideally Python or R.
If you’d like to learn R, Harvard offers an introductory R course created specifically for data science learners, called Data Science: R Basics.
This program will take you through R concepts like variables, data types, vector arithmetic, and indexing. You will also learn to wrangle data with libraries like dplyr and create plots to visualize data.
If you prefer Python, you can choose to take CS50’s Introduction to Programming with Python offered for free by Harvard. In this course, you will learn concepts like functions, arguments, variables, data types, conditional statements, loops, objects, methods, and more.
Both programs above are self-paced. However, the Python course is more detailed than the R program, and requires a longer time commitment to complete. Also, the rest of the courses in this roadmap are taught in R, so it might be worth learning R to be able to follow along easily.
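To give a flavor of what these introductory courses start with, here is a short Python sketch using functions, conditionals, and loops (the scores are invented for illustration):

```python
# Basics covered early in an intro programming course: defining functions,
# looping over a list, and branching with a conditional.
def mean(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def classify(score, threshold=70):
    """Label a score as 'pass' or 'fail' relative to a threshold."""
    return "pass" if score >= threshold else "fail"

scores = [55, 82, 91, 68, 74]
print(mean(scores))  # 74.0
for s in scores:
    print(s, classify(s))
```

The same ideas (variables, data types, functions) appear in the R course as well, just with R syntax.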
Step 2: Data Visualization
Visualization is one of the most powerful techniques for communicating your findings from data to another person.
With Harvard’s Data Visualization program, you will learn to build visualizations using the ggplot2 library in R, along with the principles of communicating data-driven insights.
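ggplot2 is built on the grammar of graphics: data values are mapped to visual marks such as position, length, and color. That mapping idea is independent of any library; as a toy illustration in plain Python (not ggplot2), category counts can be encoded as bar lengths:

```python
from collections import Counter

# Map data to visual marks: each category's count becomes a bar of '#'s.
observations = ["setosa", "virginica", "setosa", "versicolor", "setosa"]
counts = Counter(observations)

def text_bar_chart(counts):
    """Render counts as lines of '<label> ####', sorted by label."""
    return "\n".join(f"{label:<12}{'#' * n}" for label, n in sorted(counts.items()))

print(text_bar_chart(counts))
```

In ggplot2 the same mapping would be declared with an aesthetic (`aes`) and a geometry (`geom_bar`), with the library handling the rendering.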
Step 3: Probability
In this course, you will learn essential probability concepts that are fundamental to conducting statistical tests on data. The topics taught include random variables, independence, Monte Carlo simulations, expected values, standard errors, and the Central Limit Theorem.
The concepts above will be introduced with the help of a case study, which means that you will be able to apply everything you learn to a real-world dataset.
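For instance, the Central Limit Theorem can be seen directly in a Monte Carlo simulation. The course works in R; this is an equivalent sketch in Python with invented numbers:

```python
import random

# Monte Carlo illustration of the Central Limit Theorem: the average of
# 100 die rolls is itself a random variable, and across many repetitions
# those averages cluster tightly around the expected value of 3.5.
random.seed(42)  # fixed seed so the simulation is reproducible

def sample_mean(n_rolls):
    return sum(random.randint(1, 6) for _ in range(n_rolls)) / n_rolls

means = [sample_mean(100) for _ in range(1000)]
grand_mean = sum(means) / len(means)
print(round(grand_mean, 2))  # very close to 3.5
```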
Step 4: Statistics
After learning probability, you can take this course to learn the fundamentals of statistical inference and modeling.
This program will teach you to define population estimates and margins of error, introduce you to Bayesian statistics, and provide you with the fundamentals of predictive modeling.
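To make "population estimates and margins of error" concrete, here is a small Python sketch of a 95% confidence interval for a proportion (the poll numbers are made up, and the course itself teaches this in R):

```python
import math

# Estimate a population proportion with a 95% margin of error, e.g. a
# poll in which 540 of 1000 respondents favor a candidate.
successes, n = 540, 1000
p_hat = successes / n                      # point estimate of the proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of the estimate
margin = 1.96 * se                         # 95% level, normal approximation
print(f"{p_hat:.2f} ± {margin:.3f}")       # 0.54 ± 0.031
```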
Step 5: Productivity Tools (Optional)
I’ve included this project management course as optional since it isn’t directly related to learning data science. Rather, it teaches you to use Unix/Linux for file management, Git and GitHub for version control, and R for creating reports.
The ability to do the above will save you a lot of time and help you better manage end-to-end data science projects.
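A minimal sketch of the kind of command-line file management the course covers (the project name and file names here are just examples; the Git and GitHub workflow the course also teaches is not shown):

```shell
# Create a predictable directory layout for a new analysis project.
mkdir -p my-analysis/data my-analysis/scripts my-analysis/reports
# Add placeholder files for a cleaning script and an R Markdown report.
touch my-analysis/scripts/clean.R my-analysis/reports/summary.Rmd
ls my-analysis    # data  reports  scripts
```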
Step 6: Data Pre-Processing
The next course in this list is called Data Wrangling, and will teach you to prepare data and convert it into a format that is easily digestible by machine learning models.
You will learn to import data into R, tidy data, process string data, parse HTML, work with date-time objects, and mine text.
As a data scientist, you often need to extract data that is publicly available on the Internet in the form of a PDF document, HTML webpage, or a Tweet. You will not always be presented with clean, formatted data in a CSV file or Excel sheet.
By the end of this course, you will learn to wrangle and clean data to come up with critical insights from it.
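In miniature, the wrangling steps look like this (the course teaches them in R with dplyr and friends; the records below are invented): split messy delimited text, strip whitespace, and parse dates and numbers into proper types.

```python
from datetime import datetime

# Raw records as they might arrive: inconsistent spacing, everything a string.
raw_records = [
    "  Alice ,2021-03-15, 120 ",
    "Bob,2021-04-02,95",
]

cleaned = []
for line in raw_records:
    # Split on commas, strip stray whitespace from each field.
    name, date_str, amount = (field.strip() for field in line.split(","))
    cleaned.append({
        "name": name,
        "date": datetime.strptime(date_str, "%Y-%m-%d").date(),  # real date object
        "amount": int(amount),                                   # real number
    })

print(cleaned[0])
```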
Step 7: Linear Regression
Linear regression is a machine learning technique that is used to model a linear relationship between two or more variables. It can also be used to identify and adjust the effect of confounding variables.
This course will teach you the theory behind linear regression models, how to examine the relationship between two variables, and how confounding variables can be detected and removed before building a machine learning algorithm.
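The mechanics of simple (one-variable) linear regression fit in a few lines. Here is a from-scratch Python sketch (the course works in R) using the closed-form least-squares solution:

```python
# Least-squares fit of y = slope * x + intercept:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx

# Data generated exactly from y = 2x + 1, so the fit recovers those numbers.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
print(fit_line(xs, ys))  # (2.0, 1.0)
```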
Step 8: Machine Learning
Finally, the course you’ve probably been waiting for! Harvard’s machine learning program will teach you the basics of machine learning, techniques to mitigate overfitting, supervised and unsupervised modeling approaches, and recommendation systems.
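To see why mitigating overfitting matters, consider this toy Python sketch (invented data, not from the course): a 1-nearest-neighbor classifier memorizes its training set, including two deliberately mislabeled points, so it scores perfectly on the data it has seen and fails on held-out data.

```python
# 1-nearest-neighbor on 1D points: predict the label of the closest
# training example.
def one_nn_predict(train, x):
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# True rule: small x -> 0, large x -> 1, but the training labels for
# x=3 and x=12 are deliberately flipped (label noise).
train = [(1, 0), (2, 0), (3, 1), (10, 1), (11, 1), (12, 0)]
test = [(4, 0), (13, 1)]

train_acc = sum(one_nn_predict(train, x) == y for x, y in train) / len(train)
test_acc = sum(one_nn_predict(train, x) == y for x, y in test) / len(test)
print(train_acc, test_acc)  # 1.0 0.0: perfect on train, wrong on held-out data
```

Holding out a test set is the simplest way to detect this gap; the course covers it alongside other mitigation techniques.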
Step 9: Capstone Project
After completing all the above courses, you can take Harvard’s data science capstone project, where your skills in data visualization, probability, statistics, data wrangling, data organization, regression, and machine learning will be assessed.
With this final project, you will get the opportunity to put together all the knowledge learned from the above courses and gain the ability to complete a hands-on data science project from scratch.
Note: All the courses above are available on edX, an online learning platform, and can be audited for free. If you want a course certificate, however, you will have to pay for one.
Data powers AI. Good data can mean the difference between an impactful solution or one that never gets off the ground. Re-assess the foundational AI questions to ensure your data is working for, not against, you.
Innovation to Reality
The challenges of implementing AI are many. Avoid the common pitfalls with real-world case studies from leaders who have successfully turned their AI solutions into reality.
Harness What’s Possible at the Edge
Drawn by the potential for near-instantaneous decision making, pioneers are moving AI to the edge. We examine the pros and cons of moving AI decisions to the edge with the experts who are getting it right.
Generative AI Solutions
The use of generative AI to boost human creativity is breaking boundaries in creative areas previously untouched by AI. We explore the intersection of data and algorithms enabling collaborative AI processes to design and create.
Data is the most under-valued and de-glamorized aspect of AI. Learn why shifting the focus from model/algorithm development to the quality of the data is the next, and most efficient, way to improve the decision-making abilities of AI.
Data labeling is key to determining the success or failure of AI applications. Learn how to implement a data-first approach that can transform AI inference, resulting in better models that make better decisions.
Question the status quo. Build stakeholder trust. These are foundational elements of thought leadership in AI. Explore how organizations can use their data and algorithms in ethical and responsible ways while building bigger and more effective systems.
Haniyeh Mahmoudian
Global AI Ethicist, DataRobot
Mainstage Break (10:35 a.m. – 11:05 a.m.)
Networking and refreshments for our live audience and a selection of curated content for those tuning in virtually.
With its next-generation machine learning models fueling precision medicine, the French biotech company Owkin captured the attention of the pharma industry. Learn how they did it and get tips to navigate the complex task of scaling your innovation.
Innovation to Reality (11:05 a.m. – 12:30 p.m.)
Deploying AI in real-world environments benefits from human input before and during implementation. Get an inside look at how organizations can ensure reliable results with the key questions and competing needs that should be considered when implementing AI solutions.
AI is evolving from the research lab into practical real world applications. Learn what issues should be top of mind for businesses, consumers, and researchers as we take a deep dive into AI solutions that increase modern productivity and accelerate intelligence transformation.
Getting AI to work 80% of the time is relatively straightforward, but trustworthy AI requires deployments that work 100% of the time. Unpack some of the biggest challenges that come up when eliminating the 20% gap.
Bali Raghavan
Head of Engineering, Forward
Lunch and Networking Break (12:30 p.m. – 1:30 p.m.)
Lunch served at the MIT Media Lab and a selection of curated content for those tuning in virtually.
Harness What’s Possible at the Edge (1:30 p.m. – 3:15 p.m.)
To create sustainable business impact, AI capabilities need to be tailored and optimized to an industry or organization’s specific requirements and infrastructure model. Hear how customers’ challenges across industries can be addressed in any compute environment from the cloud to the edge with end-to-end hardware and software optimization.
Kavitha Prasad
VP & GM, Datacenter, AI and Cloud Execution and Strategy, Intel Corporation
Decision making has moved from the edge to the cloud before settling into a hybrid setup for many AI systems. Through the examination of key use-cases, take a deep dive into understanding the benefits and drawbacks of operating a machine-learning system at the point of inference.
Enable your organization to transform customer experiences through AI at the edge. Learn about the required technologies, including teachable and self-learning AI, that are needed for a successful shift to the edge, and hear how deploying these technologies at scale can unlock richer, more responsive experiences.
Reimagine AI solutions as a unified system, instead of individual components. Through the lens of autonomous vehicles, discover the pros and cons of using an all-inclusive AI-first approach that includes AI decision-making at the edge and see how this thinking can be applied across industry.
Raquel Urtasun
Founder & CEO, Waabi
Mainstage Break (3:15 p.m. – 3:45 p.m.)
Networking and refreshments for our live audience and a selection of curated content for those tuning in virtually.
Advances in machine learning are enabling artists and creative technologists to think about and use AI in new ways. Discuss the concept of creative AI and look at project examples from London’s art scene that illustrate the various ways creative AI is bridging the gap between the traditional art world and the latest technological innovations.
Luba Elliott
Curator, Producer, and Researcher, Creative AI
Generative AI Solutions (3:45 p.m. – 5:10 p.m.)
Change the design problem with AI. The creative nature of generative AI enhances design capabilities, finding efficiencies and opportunities that humans alone might not conceive. Explore business applications including project planning, construction, and physical design.
Deep learning is a data-hungry technology, and manually labeled training data has become cost-prohibitive and time-consuming. Get a glimpse of how interactive large-scale synthetic data generation can accelerate the AI revolution, unlocking the potential of data-driven artificial intelligence.
Danny Lange
SVP of Artificial Intelligence, Unity Technologies
Push beyond the typical uses of AI. Explore the nexus of art, technology, and human creativity through the unique innovation of kinetic data sculptures that use machines to give physical context and shape to data to rethink how we engage with the physical world.
Refik Anadol
CEO, RAS Lab; Lecturer, UCLA
Last Call with the Editors (5:10 p.m. – 5:20 p.m.)
Before we wrap day 1, join our last call with all of our editors to get their analysis on the day’s topics, themes, and guests.
Networking Reception (5:20 p.m. – 6:20 p.m.)
WEDNESDAY, MARCH 30
Evolving the Algorithms
What’s Next for Deep Learning
Deep learning algorithms have powered most major AI advances of the last decade. We bring you into the top innovation labs to see how they are advancing their deep learning models to find out just how much more we can get out of these algorithms.
AI in Day-To-Day Business
Many organizations are already using AI internally in their day-to-day operations, in areas like cybersecurity, customer service, finance, and manufacturing. We examine the tools that organizations are using when putting AI to work.
Making AI Work for All
As AI increasingly underpins our lives, businesses, and society, we must ensure that AI works for everyone – not just those represented in datasets, and not just 80% of the time. Examine the challenges and solutions needed to ensure AI works fairly, for all.
Envisioning the Next AI
Some business problems can’t be solved with current deep learning methods. We look around the corner at the new approaches and most revolutionary ideas propelling us toward the next stage in AI evolution.
Day 2: Evolving the Algorithms (9:00 a.m. – 5:25 p.m.)
What’s Next for Deep Learning (9:10 a.m. – 10:25 a.m.)
Transformer-based language models are revolutionizing the way neural networks process natural language. This deep dive looks at how organizations can put their data to work using transformer models. We consider the problems that businesses may face as these massive models mature, including training needs, managing parallel processing at scale, and countering offensive data.
Critical thinking may be one step closer for AI by combining large-scale transformers with smart sampling and filtering. Get an early look at how AlphaCode’s entry into competitive programming may lead to a human-like capacity for AI to write original code that solves unforeseen problems.
As advanced AI systems gain greater capabilities in our search for artificial general intelligence, it’s critical to teach them how to understand human intentions. Look at the latest advancements in AI systems and how to ensure they can be truthful, helpful, and safe.
Mira Murati
SVP, Research, Product, & Partnerships, OpenAI
Mainstage Break (10:25 a.m. – 10:55 a.m.)
Networking and refreshments for our live audience and a selection of curated content for those tuning in virtually.
Good data is the bedrock of a self-service data consumption model, which in turn unlocks insights, analytics, and personalization at scale through AI. Yet many organizations face immense challenges setting up a robust data foundation. Dive into a pragmatic perspective on abstracting the complexity and untangling the conflicts in data management for better AI.
Naveen Kamat
Executive Director, Data and AI Services, Kyndryl
AI in Day-To-Day Business (10:55 a.m. – 12:20 p.m.)
Effectively operationalized AI/ML can unlock untapped potential in your organization. From enhancing internal processes to managing the customer experience, get the pragmatic advice and takeaways leaders need to better understand their internal data to achieve impactful results.
Use AI to maximize reliability of supply chains. Learn the dos and don’ts to managing key processes within your supply chain, including workforce management, streamlining and simplification, and reaping the full value of your supply chain solutions.
Darcy MacClaren
Senior Vice President, Digital Supply Chain, SAP North America
Machine and reinforcement learning enable Spotify to deliver the right content to the right listener at the right time, allowing for personalized listening experiences that facilitate discovery at a global scale. Through user interactions, algorithms suggest new content and creators that keep customers both happy and engaged with the platform. Dive into the details of making better user recommendations.
Tony Jebara
VP of Engineering and Head of Machine Learning, Spotify
Lunch and Networking Break (12:20 p.m. – 1:15 p.m.)
Lunch served at the MIT Media Lab and a selection of curated content for those tuning in virtually.
Making AI Work for All (1:15 p.m. – 2:35 p.m.)
Walk through the practical steps to map and understand the nuances, outliers, and special cases in datasets. Get tips to ensure ethical and trustworthy approaches to training AI systems that grow in scope and scale within a business.
Lauren Bennett
Group Software Engineering Lead, Spatial Analysis and Data Science, Esri
Get an inside look at the long- and short-term benefits of addressing inequities in AI opportunities, ranging from educating the tech youth of the future to a 10,000-foot view on what it will take to ensure that equity is top of mind within society and business alike.
Public policies can help to make AI more equitable and ethical for all. Examine how policies could impact corporations and what it means for building internal policies, regardless of what government adopts. Identify actionable ideas to best move policies forward for the widest benefit to all.
Nicol Turner Lee
Director, Center for Technology Innovation, Brookings Institution
Mainstage Break (2:35 p.m. – 3:05 p.m.)
Networking and refreshments for our live audience and a selection of curated content for those tuning in virtually.
From the U.S. to China, the global robo-taxi race is gaining traction with consumers and regulators alike. Go behind the scenes with AutoX – a Level 4 driving technology company – and hear how it overcame obstacles while launching the world’s second and China’s first public, fully driverless robo-taxi service.
Jianxiong Xiao
Founder and CEO, AutoX
Envisioning the Next AI (3:05 p.m. – 4:50 p.m.)
The use of AI in finance is gaining traction as organizations realize the advantages of using algorithms to streamline and improve the accuracy of financial tasks. Step through use cases that examine how AI can be used to minimize financial risk, maximize financial returns, optimize venture capital funding by connecting entrepreneurs to the right investors, and more.
Sameena Shah
Managing Director, J.P. Morgan AI Research, JP Morgan Chase
In a study of simulated robotic evolution, it was observed that more complex environments and evolutionary changes to the robot’s physical form accelerated the growth of robot intelligence. Examine this cutting-edge research and decipher what this early discovery means for the next generation of AI and robotics.
Agrim Gupta
PhD Student, Stanford Vision and Learning Lab, Stanford University
Understanding human thinking and reasoning processes could lead to more general, flexible and human-like artificial intelligence. Take a close look at the research building AI inspired by human common-sense that could create a new generation of tools for complex decision-making.
Zenna Tavares
Research Scientist, Columbia University; Co-Founder, Basis
Look under the hood at this innovative approach to AI learning with multi-agent and human-AI interactions. Discover how bots work together and learn together through personal interactions. Recognize the future implications for AI, plus the benefits and obstacles that may come from this new process.
David Ferrucci was the principal investigator for the team that led IBM Watson to its landmark Jeopardy success, awakening the world to the possibilities of AI. We pull back the curtain on AI for a wide-ranging discussion on explicable models, and the next generation of human and machine collaboration creating AI thought partners with limitless applications.
AI-enabled Drug Discovery and Development: The Challenges and the Promise
Reporter: Aviva Lev-Ari, PhD, RN
Early Development
Carol Kovac (the first IBM GM of Life Sciences) started in silico drug development in 2000, using a large database of substances and computing power. She transformed an idea into a $2B business, with most of the money coming from big pharma. She would ask pharmaceutical companies which new drugs they were planning to develop and, based on the in silico work, provided the four most probable combinations of substances.
Carol Kovac
General Manager, Healthcare and Life Sciences, IBM
From a speaker biography at a 2005 conference:
Carol Kovac is General Manager of IBM Healthcare and Life Sciences, responsible for the strategic direction of IBM's global healthcare and life sciences business. Kovac leads her team in developing the latest information technology solutions and services, establishing partnerships and overseeing IBM investment within the healthcare, pharmaceutical and life sciences markets. Starting with only two employees as an emerging business unit in the year 2000, Kovac has successfully grown the life sciences business unit into a multi-billion dollar business and one of IBM's most successful ventures to date, with more than 1500 employees worldwide. Kovac's prior positions include general manager of IBM Life Sciences, vice president of Technical Strategy and Division Operations, and vice president of Services and Solutions. In the latter role, she was instrumental in launching the Computational Biology Center at IBM Research. Kovac sits on the Board of Directors of Research!America and Africa Harvest. She was inducted into the Women in Technology International Hall of Fame in 2002, and in 2004, Fortune magazine named her one of the 50 most powerful women in business. Kovac earned her Ph.D. in chemistry at the University of Southern California.
The use of artificial intelligence in drug discovery, when coupled with new genetic insights and the increase of patient medical data of the last decade, has the potential to bring novel medicines to patients more efficiently and more predictably.
Jack Fuchs, MBA ’91, an adjunct lecturer who teaches “Principled Entrepreneurial Decisions” at Stanford School of Engineering, moderated and explored how clearly articulated principles can guide the direction of technological advancements like AI-enabled drug discovery.
Kim Branson, Global head of AI and machine learning at GSK.
Russ Altman, the Kenneth Fong Professor of Bioengineering, of genetics, of medicine (general medical discipline), of biomedical data science and, by courtesy, of computer science.
Synthetic Biology Software applied to development of Galectins Inhibitors at LPBI Group
Using Structural Computation Models to Predict Productive PROTAC Ternary Complexes
Ternary complex formation is necessary but not sufficient for target protein degradation. In this research, Bai et al. have addressed questions to better understand the rate-limiting steps between ternary complex formation and target protein degradation. They have developed a structure-based computational modeling approach to predict the efficiency and sites of target protein ubiquitination by CRBN-binding PROTACs. Such models will allow a more complete understanding of PROTAC-directed degradation and allow crafting of increasingly effective and specific PROTACs for therapeutic applications.
Another major feature of this research is that it is the result of collaboration between research groups at Amgen, Inc. and Promega Corporation. In the past, commercial research laboratories shied away from collaboration, but in the last several years researchers have become more open to collaborative work. This increased collaboration allows scientists to bring their different expertise to a problem or question and speed up discovery. According to Dr. Kristin Riching, Senior Research Scientist at Promega Corporation, “Targeted protein degraders have broken many of the rules that have guided traditional drug development, but it is exciting to see how the collective learnings we gain from their study can aid the advancement of this new class of molecules to the clinic as effective therapeutics.”
Medical Startups – Artificial Intelligence (AI) Startups in Healthcare
Reporters: Stephen J. Williams, PhD; Aviva Lev-Ari, PhD, RN; and Shraga Rottem, MD, DSc
The motivation for this post is threefold:
First, we are presenting an application of AI, NLP, DL to our own medical text in the Genomics space. Here we present the first section of Part 1 in the following book. Part 1 has six subsections that yielded 12 plots. The entire Book is represented by 38 x 2 = 76 plots.
Second, we bring to the attention of the e-Reader the list of 276 Medical Startups – Artificial Intelligence (AI) Startups in Healthcare as a hot universe of R&D activity in Human Health.
Third, we highlight one academic center with an AI focus.
Dear friends of the ETH AI Center,
We would like to provide you with some exciting updates from the ETH AI Center and its growing community.
As the Covid-19 restrictions in Switzerland have recently been lifted, we would like to hear from you about the kinds of events you would like to see in 2022! Participate in the survey to suggest event formats and topics that you would enjoy being a part of. We are already excited to learn what we can achieve together this year.
We already have many interesting events coming up, and we look forward to seeing you at our main and community events!
LPBI Group is applying AI for Medical Text Analysis with Machine Learning and Natural Language Processing: Statistical and Deep Learning
Our Book
Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS & BioInformatics, Simulations and the Genome Ontology
Medical Text Analysis of this Book shows the following results, obtained by Madison Davis by applying Wolfram NLP for Biological Languages to our own Text. See an example below:
@MIT Artificial intelligence system rapidly predicts how two proteins will attach: The model, called EquiDock, focuses on rigid body docking, which occurs when two proteins attach by rotating or translating in 3D space without their shapes squeezing or bending.
Reporter: Aviva Lev-Ari, PhD, RN
This paper introduces a novel SE(3) equivariant graph matching network, along with a keypoint discovery and alignment approach, for the problem of protein-protein docking, with a novel loss based on optimal transport. The overall consensus is that this is an impactful solution to an important problem, whereby competitive results are achieved without the need for templates, refinement, and are achieved with substantially faster run times.
Keywords: protein complexes, protein structure, rigid body docking, SE(3) equivariance, graph neural networks
Abstract: Protein complex formation is a central problem in biology, being involved in most of the cell’s processes, and essential for applications such as drug design or protein engineering. We tackle rigid body protein-protein docking, i.e., computationally predicting the 3D structure of a protein-protein complex from the individual unbound structures, assuming no three-dimensional flexibility during binding. We design a novel pairwise-independent SE(3)-equivariant graph matching network to predict the rotation and translation to place one of the proteins at the right location and the right orientation relative to the second protein. We mathematically guarantee that the predicted complex is always identical regardless of the initial placements of the two structures, avoiding expensive data augmentation. Our model approximates the binding pocket and predicts the docking pose using keypoint matching and alignment through optimal transport and a differentiable Kabsch algorithm. Empirically, we achieve significant running time improvements over existing protein docking software and predict qualitatively plausible protein complex structures despite not using heavy sampling, structure refinement, or templates.
One-sentence Summary: We perform rigid protein docking using a novel independent SE(3)-equivariant message passing mechanism that guarantees the same resulting protein complex independent of the initial placement of the two 3D structures.
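The Kabsch step mentioned in the abstract finds the rotation that best superimposes two centered point sets; in 3D it is computed via an SVD of the cross-covariance matrix. As a self-contained illustration (this is not EquiDock's code), the 2D special case has a closed form for the optimal angle:

```python
import math

# 2D Kabsch: the rotation angle that best aligns centered point set P
# onto Q in the least-squares sense. (EquiDock uses the general 3D,
# SVD-based, differentiable version; this 2D closed form is only a sketch.)
def kabsch_2d_angle(P, Q):
    num = sum(qy * px - qx * py for (px, py), (qx, qy) in zip(P, Q))
    den = sum(qx * px + qy * py for (px, py), (qx, qy) in zip(P, Q))
    return math.atan2(num, den)

# Q is P rotated by 90 degrees, so the recovered angle is pi/2.
P = [(1, 0), (0, 1), (-1, 0), (0, -1)]
Q = [(0, 1), (-1, 0), (0, -1), (1, 0)]
print(kabsch_2d_angle(P, Q))  # 1.5707963267948966
```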
MIT researchers created a machine-learning model that can directly predict the complex that will form when two proteins bind together. Their technique is between 80 and 500 times faster than state-of-the-art software methods, and often predicts protein structures that are closer to actual structures that have been observed experimentally.
This technique could help scientists better understand some biological processes that involve protein interactions, like DNA replication and repair; it could also speed up the process of developing new medicines.
“Deep learning is very good at capturing interactions between different proteins that are otherwise difficult for chemists or biologists to write experimentally. Some of these interactions are very complicated, and people haven’t found good ways to express them. This deep-learning model can learn these types of interactions from data,” says Octavian-Eugen Ganea, a postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and co-lead author of the paper.
Ganea’s co-lead author is Xinyuan Huang, a graduate student at ETH Zurich. MIT co-authors include Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health in CSAIL, and Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering in CSAIL and a member of the Institute for Data, Systems, and Society. The research will be presented at the International Conference on Learning Representations.
Significance of the Scientific Development by the @MIT Team
EquiDock wide applicability:
Our method can be integrated end-to-end to boost the quality of other models (see above discussion on runtime importance). Examples are predicting functions of protein complexes [3] or their binding affinity [5], de novo generation of proteins binding to specific targets (e.g., antibodies [6]), modeling backbone and side-chain flexibility [4], or devising methods for non-binary multimers. See the updated discussion in the “Conclusion” section of our paper.
Advantages over previous methods:
Our method does not rely on templates or heavy candidate sampling [7], aiming at the ambitious goal of predicting the complex pose directly. This should be interpreted in terms of generalization (to unseen structures) and scalability capabilities of docking models, as well as their applicability to various other tasks (discussed above).
Our method obtains a competitive quality without explicitly using previous geometric (e.g., 3D Zernike descriptors [8]) or chemical (e.g., hydrophilic information) features [3]. Future EquiDock extensions would find creative ways to leverage these different signals and, thus, obtain more improvements.
Novelty of theory:
Our work is the first to formalize the notion of pairwise independent SE(3)-equivariance. Previous work (e.g., [9,10]) has incorporated only single object Euclidean-equivariances into deep learning models. For tasks such as docking and binding of biological objects, it is crucial that models understand the concept of multi-independent Euclidean equivariances.
All propositions in Section 3 are our novel theoretical contributions.
We have rewritten the Contribution and Related Work sections to clarify this aspect.
Footnote [a]: We have fixed an important bug in the cross-attention code. We have done a more extensive hyperparameter search and found that layer normalization is crucial in the layers used in Eqs. 5 and 9, but not on the h embeddings as originally shown in Eq. 10. We have seen benefits from training our models with a longer patience in the early-stopping criterion (30 epochs for DIPS and 150 epochs for DB5). Increasing the learning rate to 2e-4 is important to speed up training. Using an intersection loss weight of 10 leads to improved results compared to the default of 1.
Bibliography:
[1] Protein-ligand blind docking using QuickVina-W with inter-process spatio-temporal integration, Hassan et al., 2017
[2] GNINA 1.0: molecular docking with deep learning, McNutt et al., 2021
[3] Protein-protein and domain-domain interactions, Kangueane and Nilofer, 2018
[4] Side-chain Packing Using SE(3)-Transformer, Jindal et al., 2022
[5] Contacts-based prediction of binding affinity in protein–protein complexes, Vangone et al., 2015
[6] Iterative refinement graph neural network for antibody sequence-structure co-design, Jin et al., 2021
[7] Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes, Eismann et al., 2020
[8] Protein-protein docking using region-based 3D Zernike descriptors, Venkatraman et al., 2009
[9] SE(3)-transformers: 3D roto-translation equivariant attention networks, Fuchs et al., 2020
[10] E(n) equivariant graph neural networks, Satorras et al., 2021
[11] Fast end-to-end learning on protein surfaces, Sverrisson et al., 2020
From: Heidi Rehm et al. GA4GH: International policies and standards for data sharing across genomic research and healthcare. (2021): Cell Genomics, Volume 1, Issue 2.
Siloing genomic data in institutions/jurisdictions limits learning and knowledge
GA4GH policy frameworks enable responsible genomic data sharing
GA4GH technical standards ensure interoperability, broad access, and global benefits
Data sharing across research and healthcare will extend the potential of genomics
Summary
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
For genomic and personalized medicine to come to fruition, it is imperative that data siloes around the world be broken down, enabling international collaboration in the collection, storage, transfer, access, and analysis of molecular and health-related data.
We have discussed in numerous articles on this site the problems data siloes produce. By data siloes we mean that not only data but also intellectual thought are collected and stored behind physical, electronic, and intellectual walls, inaccessible to scientists who do not belong to a particular institution or collaborative network.
Standardization and harmonization of data are key to this effort to share electronic records. The EU has taken bold action in this matter. The following section is about the General Data Protection Regulation of the EU and can be found at the following link:
The data protection package adopted in May 2016 aims at making Europe fit for the digital age. More than 90% of Europeans say they want the same data protection rights across the EU and regardless of where their data is processed.
The General Data Protection Regulation (GDPR)
Regulation (EU) 2016/679 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data. This text includes the corrigendum published in the OJEU of 23 May 2018.
The regulation is an essential step to strengthen individuals’ fundamental rights in the digital age and facilitate business by clarifying rules for companies and public bodies in the digital single market. A single law will also do away with the current fragmentation in different national systems and unnecessary administrative burdens.
Directive (EU) 2016/680 on the protection of natural persons regarding processing of personal data connected with criminal offences or the execution of criminal penalties, and on the free movement of such data.
The directive protects citizens’ fundamental right to data protection whenever personal data is used by criminal law enforcement authorities for law enforcement purposes. It will in particular ensure that the personal data of victims, witnesses, and suspects of crime are duly protected and will facilitate cross-border cooperation in the fight against crime and terrorism.
The directive entered into force on 5 May 2016 and EU countries had to transpose it into their national law by 6 May 2018.
The following paper by the organization The Global Alliance for Genomics and Health discusses these types of collaborative efforts to break down data silos in personalized medicine. This organization has over 2,000 subscribers in over 90 countries, encompassing over 60 organizations.
Enabling responsible genomic data sharing for the benefit of human health
The Global Alliance for Genomics and Health (GA4GH) is a policy-framing and technical standards-setting organization, seeking to enable responsible genomic data sharing within a human rights framework.
The Global Alliance for Genomics and Health (GA4GH) is an international, nonprofit alliance formed in 2013 to accelerate the potential of research and medicine to advance human health. Bringing together 600+ leading organizations working in healthcare, research, patient advocacy, life science, and information technology, the GA4GH community is working together to create frameworks and standards to enable the responsible, voluntary, and secure sharing of genomic and health-related data. All of our work builds upon the Framework for Responsible Sharing of Genomic and Health-Related Data.
GA4GH Connect is a five-year strategic plan that aims to drive uptake of standards and frameworks for genomic data sharing within the research and healthcare communities in order to enable responsible sharing of clinical-grade genomic data by 2022. GA4GH Connect links our Work Streams with Driver Projects—real-world genomic data initiatives that help guide our development efforts and pilot our tools.
The Global Alliance for Genomics and Health (GA4GH) is a worldwide alliance of genomics researchers, data scientists, healthcare practitioners, and other stakeholders. We are collaborating to establish policy frameworks and technical standards for responsible, international sharing of genomic and other molecular data as well as related health data. Founded in 2013, the GA4GH community now consists of more than 1,000 individuals across more than 90 countries working together to enable broad sharing that transcends the boundaries of any single institution or country (see https://www.ga4gh.org). In this perspective, we present the strategic goals of GA4GH and detail current strategies and operational approaches to enable responsible sharing of clinical and genomic data, through both harmonized data aggregation and federated approaches, to advance genomic medicine and research. We describe technical and policy development activities of the eight GA4GH Work Streams and implementation activities across 24 real-world genomic data initiatives (“Driver Projects”). We review how GA4GH is addressing the major areas in which genomics is currently deployed, including rare disease, common disease, cancer, and infectious disease. Finally, we describe differences between genomic sequence data that are generated for research versus healthcare purposes, and define strategies for meeting the unique challenges of responsibly enabling access to data acquired in the clinical setting.
GA4GH organization
GA4GH has partnered with 24 real-world genomic data initiatives (Driver Projects) to ensure its standards are fit for purpose and driven by real-world needs. Driver Projects make a commitment to help guide GA4GH development efforts and pilot GA4GH standards (see Table 2). Each Driver Project is expected to dedicate at least two full-time equivalents to GA4GH standards development, which takes place in the context of GA4GH Work Streams (see Figure 1). Work Streams are the key production teams of GA4GH, tackling challenges in eight distinct areas across the data life cycle (see Box 1). Work Streams consist of experts from their respective sub-disciplines and include membership from Driver Projects as well as hundreds of other organizations across the international genomics and health community.
Figure 1: Matrix structure of the Global Alliance for Genomics and Health
Box 1: GA4GH Work Stream focus areas. The GA4GH Work Streams are the key production teams of the organization. Each tackles a specific area in the data life cycle, as described below (URLs listed in the web resources).
(1) Data use & researcher identities: Develops ontologies and data models to streamline global access to datasets generated in any country
(2) Genomic knowledge standards: Develops specifications and data models for exchanging genomic variant observations and knowledge
(3) Cloud: Develops federated analysis approaches to support the statistical rigor needed to learn from large datasets
(4) Data privacy & security: Develops guidelines and recommendations to ensure identifiable genomic and phenotypic data remain appropriately secure without sacrificing their analytic potential
(5) Regulatory & ethics: Develops policies and recommendations for ensuring individual-level data are interoperable with existing norms and follow core ethical principles
(6) Discovery: Develops data models and APIs to make data findable, accessible, interoperable, and reusable (FAIR)
(7) Clinical & phenotypic data capture & exchange: Develops data models to ensure genomic data are most impactful through rich metadata collected in a standardized way
(8) Large-scale genomics: Develops APIs and file formats to ensure harmonized technological platforms can support large-scale computing
For more articles on Open Access, Science 2.0, and Data Networks for Genomics on this Open Access Scientific Journal see:
The Vibrant Philly Biotech Scene: Proteovant Therapeutics Using Artificial Intelligence and Machine Learning to Develop PROTACs
Reporter: Stephen J. Williams, Ph.D.
It has been a while since I have added to this series, but there has been a plethora of exciting biotech startups in the Philadelphia area, many of them combining technology, biotech, and machine learning. One such exciting biotech is Proteovant Therapeutics, which is combining the new PROTAC (Proteolysis-Targeting Chimera) technology with its in-house ability to use machine learning and artificial intelligence to design these types of compounds against multiple intracellular targets.
PROTAC is actually a trademark of Arvinas Operations; the class is also referred to as protein degraders. PROTACs take advantage of the cell's protein homeostatic mechanism of ubiquitin-mediated protein degradation, a highly specific, targeted process that regulates the levels of various transcription factors, proto-oncogenes, and receptors. In essence, this regulated proteolytic process is needed for normal cellular function, and alterations in it may lead to oncogenesis, or to a proteotoxic crisis leading to mitophagy, autophagy, and cell death. The key to this technology is using chemical linkers to associate an E3 ligase with a protein target of interest. E3 ligases are the rate-limiting step in marking proteins bound for degradation by the proteasome with ubiquitin chains.
A review of this process as well as PROTACs can be found elsewhere in articles (and future articles) on this Open Access Journal.
Proteovant has formed two important collaborations:
Oncopia Therapeutics: came out of the University of Michigan Innovation Hub and the lab of Shaomeng Wang, who developed a library of BET- and MDM2-based protein degraders. In 2020 it was acquired by Roivant Sciences.
Roivant Sciences: uses computer-aided design of protein degraders
Proteovant Company Description:
Proteovant is a newly launched development-stage biotech company focusing on discovery and development of disease-modifying therapies by harnessing natural protein homeostasis processes. We have recently acquired numerous assets at discovery and development stages from Oncopia, a protein degradation company. Our lead program is on track to enter IND in 2021. Proteovant is building a strong drug discovery engine by combining deep drugging expertise with innovative platforms including Roivant’s AI capabilities to accelerate discovery and development of protein degraders to address unmet needs across all therapeutic areas. The company has recently secured $200M funding from SK Holdings in addition to investment from Roivant Sciences. Our current therapeutic focus includes but is not limited to oncology, immunology and neurology. We remain agnostic to therapeutic area and will expand therapeutic focus based on opportunity. Proteovant is expanding its discovery and development teams and has multiple positions in biology, chemistry, biochemistry, DMPK, bioinformatics and CMC at many levels. Our R&D organization is located close to major pharmaceutical companies in Eastern Pennsylvania with a second site close to biotech companies in Boston area.
The ubiquitin proteasome system (UPS) is responsible for maintaining protein homeostasis. Targeted protein degradation by the UPS is a cellular process that involves marking proteins and guiding them to the proteasome for destruction. We leverage this physiological cellular machinery to target and destroy disease-causing proteins.
Unlike traditional small molecule inhibitors, our approach is not limited by the classic “active site” requirements. For example, we can target transcription factors and scaffold proteins that lack a catalytic pocket. These classes of proteins, historically, have been very difficult to drug. Further, we selectively degrade target proteins, rather than isozymes or paralogous proteins with high homology. Because of the catalytic nature of the interactions, it is possible to achieve efficacy at lower doses with prolonged duration while decreasing dose-limiting toxicities.
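The catalytic, event-driven advantage described above can be made concrete with a minimal protein-turnover model. All rate constants below are hypothetical and chosen only for illustration.

```python
# Toy pharmacology sketch (hypothetical rate constants, illustration only)
# contrasting basal turnover with catalytic, PROTAC-driven degradation.
k_syn = 1.0   # synthesis rate of the target protein (a.u. per hour)
k_deg = 0.1   # basal ubiquitin-proteasome degradation rate (per hour)
k_prot = 0.4  # extra degradation rate driven by the PROTAC ternary complex (per hour)

def steady_state(extra_deg=0.0):
    # dP/dt = k_syn - (k_deg + extra_deg) * P  ->  P_ss = k_syn / (k_deg + extra_deg)
    return k_syn / (k_deg + extra_deg)

baseline = steady_state()           # 10.0 a.u.
with_protac = steady_state(k_prot)  # 2.0 a.u., i.e. 80% target knockdown
print(baseline, with_protac)
```

Because one degrader molecule is recycled after each degradation event, even a modest catalytic rate shifts the steady state far below baseline without requiring stoichiometric occupancy, which is the intuition behind efficacy at lower doses.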
Biological targets once deemed “undruggable” are now within reach.
Roivant develops transformative medicines faster by building technologies and developing talent in creative ways, leveraging the Roivant platform to launch “Vants” – nimble and focused biopharmaceutical and health technology companies. These Vants include Proteovant but also Dermavant and Immunovant, as well as others.
Roivant’s drug discovery capabilities include the leading computational physics-based platform for in silico drug design and optimization as well as machine learning-based models for protein degradation.
The integration of our computational and experimental engines enables the rapid design of molecules with high precision and fidelity to address challenging targets for diseases with high unmet need.
Our current modalities include small molecules, heterobifunctionals and molecular glues.
Roivant Unveils Targeted Protein Degradation Platform
– First therapeutic candidate on track to enter clinical studies in 2021
– Computationally-designed degraders for six targets currently in preclinical development
– Acquisition of Oncopia Therapeutics and research collaboration with lab of Dr. Shaomeng Wang at the University of Michigan to add diverse pipeline of current and future compounds
– Clinical-stage degraders will provide foundation for multiple new Vants in distinct disease areas
– Platform supported by $200 million strategic investment from SK Holdings
Other articles in this Vibrant Philly Biotech Scene on this Online Open Access Journal include:
A laboratory for the use of AI for drug development has been launched in collaboration with Pfizer, Teva, AstraZeneca, Merck, and Amazon
Reporter: Aviva Lev-Ari, PhD, RN
AION Labs unites pharma and technology companies and funds, including IBF, to invest in startups that integrate developments in cloud computing and artificial intelligence to improve drug-development capabilities. AION Labs, the first innovation lab of its kind in the world and a pioneer in adopting cloud technologies, artificial intelligence, and computer science to solve the R&D challenges of the pharma industry, today announces its launch. An alliance of four leading pharmaceutical companies – AstraZeneca, Merck, Pfizer, and Teva – and two leading companies in high-tech and biotech investment, respectively – AWS (Amazon Web Services Inc.) and the Israeli investment fund IBF (Israel Biotech Fund) – have joined together to establish groundbreaking ventures that use artificial intelligence and computer science to change the way new therapies are discovered and developed. “We are excited to launch the new innovation lab in favor of discoveries of drugs and medical devices using groundbreaking computational tools,” said Mati Gill, CEO of AION Labs. “We are prepared and ready to make a difference in the process of therapeutic discoveries and their development. With a strong pool of talent from Israel and the world, cloud technology and artificial intelligence at the heart of our activities, and a significant commitment by the State of Israel, we are ready to contribute to the health and well-being of the human race and promote industry in Israel. I thank the partners for their trust, and it is an honor for me to lead such a significant initiative.” In addition, AION Labs has announced a strategic partnership with BioMed X, an independent biomedical research institute operating in Heidelberg, Germany. BioMed X has a proven track record of advancing research innovations in biomedicine at the interface between academic research and the pharmaceutical industry.
BioMed X’s innovation model, based on global crowdsourcing and incubators to cultivate the most brilliant talent and ideas, will serve as the R&D engine driving AION Labs’ enterprise model.
Science Policy Forum: Should we trust healthcare explanations from AI predictive systems?
Some in industry voice their concerns
Curator: Stephen J. Williams, PhD
Post on AI healthcare and explainable AI
In a Policy Forum article in Science, “Beware explanations from AI in health care”, Boris Babic, Sara Gerke, Theodoros Evgeniou, and Glenn Cohen discuss the caveats of relying on explainable versus interpretable artificial intelligence (AI) and machine learning (ML) algorithms to make complex health decisions. The FDA has already approved some AI/ML algorithms for analysis of medical images for diagnostic purposes; these have been discussed in prior posts on this site, as have issues arising from multi-center trials. The authors of this perspective argue that the choice of algorithm type (explainable versus interpretable) may have far-reaching consequences in health care.
Summary
Artificial intelligence and machine learning (AI/ML) algorithms are increasingly developed in health care for diagnosis and treatment of a variety of medical conditions (1). However, despite the technical prowess of such systems, their adoption has been challenging, and whether and how much they will actually improve health care remains to be seen. A central reason for this is that the effectiveness of AI/ML-based medical devices depends largely on the behavioral characteristics of its users, who, for example, are often vulnerable to well-documented biases or algorithmic aversion (2). Many stakeholders increasingly identify the so-called black-box nature of predictive algorithms as the core source of users’ skepticism, lack of trust, and slow uptake (3, 4). As a result, lawmakers have been moving in the direction of requiring the availability of explanations for black-box algorithmic decisions (5). Indeed, a near-consensus is emerging in favor of explainable AI/ML among academics, governments, and civil society groups. Many are drawn to this approach to harness the accuracy benefits of noninterpretable AI/ML such as deep learning or neural nets while also supporting transparency, trust, and adoption. We argue that this consensus, at least as applied to health care, both overstates the benefits and undercounts the drawbacks of requiring black-box algorithms to be explainable.
Types of AI/ML Algorithms: Explainable and Interpretable algorithms
Interpretable AI: A typical AI/ML task requires constructing an algorithm from vector inputs that generates an output related to an outcome (such as diagnosing a cardiac event from an image). Generally the algorithm has to be trained on past data with known parameters. An algorithm is called interpretable when it uses a transparent, “white box” function that is easily understandable. An example might be a linear function, where the parameters and their relationships are simple rather than complex. Although they may not be as accurate as the more complex explainable AI/ML algorithms, interpretable models are open, transparent, and easily understood by their operators.
Explainable AI/ML: This type of algorithm depends on multiple complex parameters: it takes a first round of predictions from a “black box” model, then uses a second, interpretable function to approximate the outputs of the first model. The second algorithm is trained not on the original data but on the first model's predictions, resembling multiple iterations of computing. This method is therefore more accurate, or deemed more reliable, in prediction; however, it is very complex and not easily understood. Many medical devices that use an AI/ML algorithm are of this type. Examples are deep learning and neural networks.
The purpose of both these methodologies is to deal with the problem of opacity: predictions from a black box undermine trust in the AI.
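This two-stage surrogate idea can be sketched in a few lines of numpy, assuming nothing about any particular medical device: an opaque nonlinear model makes predictions, and an interpretable linear model is then fitted to those predictions rather than to the raw data.

```python
import numpy as np

rng = np.random.default_rng(1)

# A stand-in "black box": a fixed random two-layer network. The weights are
# arbitrary, chosen only to make an opaque, nonlinear predictor.
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=8)
def black_box(X):
    return np.tanh(X @ W1) @ W2

# Post-hoc explanation: fit an interpretable (linear) surrogate NOT to the raw
# labels but to the black box's own predictions, as the article describes
X = rng.normal(size=(500, 4))
y_hat = black_box(X)
A = np.hstack([X, np.ones((500, 1))])      # add an intercept column
coef, *_ = np.linalg.lstsq(A, y_hat, rcond=None)

# The surrogate's coefficients are the "explanation": one weight per feature.
# R^2 of the surrogate against the black box measures explanation fidelity.
approx = A @ coef
r2 = 1 - np.sum((y_hat - approx) ** 2) / np.sum((y_hat - y_hat.mean()) ** 2)
```

The key caveat the authors raise is visible here: the linear coefficients explain the surrogate, and only approximate the black box to the extent that the fidelity score allows.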
For a deeper understanding of these two types of algorithms see here:
How interpretability is different from explainability
Why a model might need to be interpretable and/or explainable
Who is working to solve the black box problem—and how
What is interpretability?
Does Chipotle make your stomach hurt? Does loud noise accelerate hearing loss? Are women less aggressive than men? If a machine learning model can create a definition around these relationships, it is interpretable.
All models must start with a hypothesis. Human curiosity propels a being to intuit that one thing relates to another. “Hmm…multiple black people shot by policemen…seemingly out of proportion to other races…something might be systemic?” Explore.
People create internal models to interpret their surroundings. In the field of machine learning, these models can be tested and verified as either accurate or inaccurate representations of the world.
Interpretability means that the cause and effect can be determined.
What is explainability?
ML models are often called black-box models because they allow a pre-set number of empty parameters, or nodes, to be assigned values by the machine learning algorithm. Specifically, the back-propagation step is responsible for updating the weights based on its error function.
To predict when a person might die—the fun gamble one might play when calculating a life insurance premium, and the strange bet a person makes against their own life when purchasing a life insurance package—a model will take in its inputs, and output a percent chance the given person has at living to age 80.
Below is an image of a neural network. The inputs are the yellow; the outputs are the orange. Like a rubric to an overall grade, explainability shows how significant each of the parameters, all the blue nodes, contribute to the final decision.
In this neural network, the hidden layers (the two columns of blue dots) would be the black box.
For example, we have these data inputs:
Age
BMI score
Number of years spent smoking
Career category
If this model had high explainability, we’d be able to say, for instance:
The career category is about 40% important
The number of years spent smoking weighs in at 35% important
The age is 15% important
The BMI score is 10% important
Explainability: important, not always necessary
Explainability becomes significant in the field of machine learning because, often, it is not apparent. Explainability is often unnecessary. A machine learning engineer can build a model without ever having considered the model’s explainability. It is an extra step in the building process—like wearing a seat belt while driving a car. It is unnecessary for the car to perform, but offers insurance when things crash.
The benefit a deep neural net offers engineers is that it creates a black box of parameters, like fake additional data points, on which a model can base its decisions. These fake data points go unknown to the engineer. The black box, or hidden layers, allows a model to make associations among the given data points to predict better results. For example, if we are deciding how long someone might have to live and we use career data as an input, it is possible the model sorts the careers into high- and low-risk options all on its own.
Perhaps we inspect a node and see it relates oil rig workers, underwater welders, and boat cooks to each other. It is possible the neural net makes connections between the lifespan of these individuals and puts a placeholder in the deep net to associate these. If we were to examine the individual nodes in the black box, we could note this clustering interprets water careers to be a high-risk job.
In the previous chart, each one of the lines connecting from the yellow dot to the blue dot can represent a signal, weighing the importance of that node in determining the overall score of the output.
If that signal is high, that node is significant to the model’s overall performance.
If that signal is low, the node is insignificant.
With this understanding, we can define explainability as:
Knowledge of what one node represents and how important it is to the model’s performance.
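Under that definition, a crude per-feature importance for a toy network might be computed as below. The feature names and weights are invented for illustration, and this absolute-weight heuristic is far less principled than real attribution methods (gradient-based saliency, SHAP-style values); it only mirrors the "signal strength" picture above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy network with 4 inputs (matching the life-insurance example in the text)
# and one hidden layer. All weights here are random, purely for illustration.
features = ["age", "bmi_score", "years_smoking", "career_category"]
W_in = rng.normal(size=(4, 6))   # input -> hidden connections (the "signals")
w_out = rng.normal(size=6)       # hidden -> output connections

# Crude first-order importance: total absolute signal each input feeds forward,
# weighted by how strongly each hidden node drives the output
signal = np.abs(W_in) @ np.abs(w_out)
importance = 100 * signal / signal.sum()   # express as percentages, as in the text

for name, pct in zip(features, importance):
    print(f"{name}: {pct:.1f}%")
```

The percentages sum to 100% by construction, giving the "rubric to an overall grade" reading of explainability described earlier.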
So how does choice of these two different algorithms make a difference with respect to health care and medical decision making?
The authors argue:
“Regulators like the FDA should focus on those aspects of the AI/ML system that directly bear on its safety and effectiveness – in particular, how does it perform in the hands of its intended users?”
The authors suggest:
Enhanced, more involved clinical trials
Providing individuals added flexibility when interacting with a model, for example inputting their own test data
More interaction between users and model generators
Determining which situations call for interpretable versus explainable AI (for instance, predicting which patients will require dialysis after kidney damage)
Other articles on AI/ML in medicine and healthcare on this Open Access Journal include
AI is on the way to lead critical ED decisions on CT
Curator and Reporter: Dr. Premalata Pati, Ph.D., Postdoc
Artificial intelligence (AI) has infiltrated many organizational processes, raising concerns that robotic systems will eventually replace many humans in decision-making. The advent of AI as a tool for improving health care provides new prospects to improve patient and clinical team’s performance, reduce costs, and impact public health. Examples include, but are not limited to, automation; information synthesis for patients, “fRamily” (friends and family unpaid caregivers), and health care professionals; and suggestions and visualization of information for collaborative decision making.
In the emergency department (ED), patients with Crohn’s disease (CD) are routinely subjected to abdominopelvic computed tomography (APCT). It is necessary to diagnose clinically actionable findings (CAF), since they may require immediate intervention, which is typically surgical. Repeated APCTs, on the other hand, result in higher ionizing-radiation exposure. The majority of APCT performance guidance is clinical and empiric. Emergency surgeons struggle to identify which Crohn’s disease patients actually require a CT scan to determine the source of acute abdominal distress.
Aid seems to be on the way. Researchers employed machine learning to accurately distinguish these sufferers from Crohn’s patients who appear with the same complaint but may safely avoid the recurrent exposure to contrast materials and ionizing radiation that CT would otherwise wreak on them.
Retrospectively, Jacob Ollech and his fellow researchers analyzed 101 emergency presentations of patients with Crohn’s disease who underwent abdominopelvic CT.
They were looking for examples where a scan revealed clinically actionable results. These were classified as intestinal blockage, perforation, intra-abdominal abscess, or complex fistula by the researchers.
On CT, 44 (43.5%) of the 101 cases reviewed had such findings.
Ollech and colleagues utilized a machine-learning technique to design a decision-support tool that required only four basic clinical factors to test an AI approach for making the call.
The approach was successful in categorizing patients into low- and high-risk groupings. The researchers were able to risk-stratify patients based on the likelihood of clinically actionable findings on abdominopelvic CT as a result of their success.
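The paper does not reproduce its exact model here, so the following is only a hedged sketch of how a four-factor risk-stratification tool of this general kind could be built; the cohort is synthetic and the coefficients are invented, not the clinical factors from the cited study.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Synthetic cohort of 101 patients (matching the study's sample size) with
# four standardized clinical factors each; the "true" weights are made up.
X = rng.normal(size=(101, 4))
true_w = np.array([1.2, -0.8, 0.5, 0.9])
y = (sigmoid(X @ true_w) > rng.uniform(size=101)).astype(float)

# Fit a logistic-regression risk model by plain gradient descent
w = np.zeros(4)
for _ in range(2000):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - y) / len(y)

# Stratify: predicted probability of actionable CT findings -> low/high risk
risk = sigmoid(X @ w)
high_risk = risk >= 0.5
```

A tool like this supports the workflow described above: patients above the threshold are sent to CT, while low-risk patients can safely avoid repeated contrast and radiation exposure.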
Ollech and co-authors admit that their limited sample size, retrospective strategy, and lack of external validation are shortcomings.
Moreover, several patients fell into an intermediate risk category, implying that a standard workup would have been required to guide CT decision-making in a real-world situation anyhow.
Consequently, they generate the following conclusion:
We believe this study shows that a machine learning-based tool is a sound approach for better selecting patients with Crohn’s disease admitted to the ED with acute gastrointestinal complaints for abdominopelvic CT: reducing the number of CTs performed while ensuring that patients at high risk for clinically actionable findings undergo abdominopelvic CT appropriately.
Main Source:
Konikoff, Tom, Idan Goren, Marianna Yalon, Shlomit Tamir, Irit Avni-Biron, Henit Yanai, Iris Dotan, and Jacob E. Ollech. “Machine learning for selecting patients with Crohn’s disease for abdominopelvic computed tomography in the emergency department.” Digestive and Liver Disease (2021). https://www.sciencedirect.com/science/article/abs/pii/S1590865821003340
Other Related Articles published in this Open Access Online Scientific Journal include the following: