@MIT Artificial intelligence system rapidly predicts how two proteins will attach: The model, called EquiDock, focuses on rigid body docking, which occurs when two proteins attach by rotating or translating in 3D space, but their shapes don’t squeeze or bend

Reporter: Aviva Lev-Ari, PhD, RN

This paper introduces a novel SE(3)-equivariant graph matching network, along with a keypoint discovery and alignment approach, for the problem of protein-protein docking, with a novel loss based on optimal transport. The overall consensus is that this is an impactful solution to an important problem: competitive results are achieved without the need for templates or refinement, and with substantially faster run times.
28 Sept 2021 (modified: 18 Nov 2021), ICLR 2022 Spotlight
Keywords: protein complexes, protein structure, rigid body docking, SE(3) equivariance, graph neural networks
Abstract: Protein complex formation is a central problem in biology, being involved in most of the cell’s processes, and essential for applications such as drug design or protein engineering. We tackle rigid body protein-protein docking, i.e., computationally predicting the 3D structure of a protein-protein complex from the individual unbound structures, assuming no three-dimensional flexibility during binding. We design a novel pairwise-independent SE(3)-equivariant graph matching network to predict the rotation and translation to place one of the proteins at the right location and the right orientation relative to the second protein. We mathematically guarantee that the predicted complex is always identical regardless of the initial placements of the two structures, avoiding expensive data augmentation. Our model approximates the binding pocket and predicts the docking pose using keypoint matching and alignment through optimal transport and a differentiable Kabsch algorithm. Empirically, we achieve significant running time improvements over existing protein docking software and predict qualitatively plausible protein complex structures despite not using heavy sampling, structure refinement, or templates.
One-sentence Summary: We perform rigid protein docking using a novel independent SE(3)-equivariant message passing mechanism that guarantees the same resulting protein complex independent of the initial placement of the two 3D structures.
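
The abstract mentions aligning matched keypoints with a Kabsch algorithm. As a rough illustration of that alignment step only, here is a minimal NumPy sketch of the classical (non-differentiable) Kabsch procedure. It is not the authors' implementation; the function names `kabsch` and `apply_rigid` are hypothetical, and the sketch assumes the two point sets are already matched one-to-one.

```python
import numpy as np


def kabsch(P, Q):
    """Find rotation R (3x3) and translation t (3,) minimizing
    sum_i || R @ P[i] + t - Q[i] ||^2 for matched point sets P, Q of shape (n, 3)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)
    # Cross-covariance matrix of the centered point sets.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def apply_rigid(P, R, t):
    """Apply the rigid motion x -> R @ x + t to every row of P."""
    return np.asarray(P, dtype=float) @ R.T + t
```

In a docking setting of this kind, `P` would play the role of keypoints on the protein being moved and `Q` the matching keypoints on the fixed protein; the recovered `R` and `t` would then be applied to all of the moving protein's coordinates with `apply_rigid`.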

MIT researchers created a machine-learning model that can directly predict the complex that will form when two proteins bind together. Their technique is between 80 and 500 times faster than state-of-the-art software methods, and often predicts protein structures that are closer to actual structures that have been observed experimentally.

This technique could help scientists better understand some biological processes that involve protein interactions, like DNA replication and repair; it could also speed up the process of developing new medicines.

“Deep learning is very good at capturing interactions between different proteins that are otherwise difficult for chemists or biologists to write experimentally. Some of these interactions are very complicated, and people haven’t found good ways to express them. This deep-learning model can learn these types of interactions from data,” says Octavian-Eugen Ganea, a postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and co-lead author of the paper.

Ganea’s co-lead author is Xinyuan Huang, a graduate student at ETH Zurich. MIT co-authors include Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health in CSAIL, and Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering in CSAIL and a member of the Institute for Data, Systems, and Society. The research will be presented at the International Conference on Learning Representations.

Significance of the Scientific Development by the @MIT Team

EquiDock’s wide applicability:

  • Our method can be integrated end-to-end to boost the quality of other models (see above discussion on runtime importance). Examples are predicting functions of protein complexes [3] or their binding affinity [5], de novo generation of proteins binding to specific targets (e.g., antibodies [6]), modeling backbone and side-chain flexibility [4], or devising methods for non-binary multimers. See the updated discussion in the “Conclusion” section of our paper.


Advantages over previous methods:

  • Our method does not rely on templates or heavy candidate sampling [7], aiming at the ambitious goal of predicting the complex pose directly. This should be interpreted in terms of generalization (to unseen structures) and scalability capabilities of docking models, as well as their applicability to various other tasks (discussed above).


  • Our method obtains a competitive quality without explicitly using previous geometric (e.g., 3D Zernike descriptors [8]) or chemical (e.g., hydrophilic information) features [3]. Future EquiDock extensions would find creative ways to leverage these different signals and, thus, obtain more improvements.


Novelty of theory:

  • Our work is the first to formalize the notion of pairwise independent SE(3)-equivariance. Previous work (e.g., [9,10]) has incorporated only single object Euclidean-equivariances into deep learning models. For tasks such as docking and binding of biological objects, it is crucial that models understand the concept of multi-independent Euclidean equivariances.

  • All propositions in Section 3 are our novel theoretical contributions.

  • We have rewritten the Contribution and Related Work sections to clarify this aspect.
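
One way to write down the pairwise-independent requirement is sketched here; it is derived from the stated goal and is not quoted from the paper. Assume, as notation introduced only for this illustration, that the model outputs a rotation R and a translation t that are applied to the coordinates X_1 of the protein being moved, while the second protein X_2 stays fixed. If the two inputs are moved by independent rigid motions (Q_1, g_1) and (Q_2, g_2), the docked complex should change only by the motion applied to the fixed protein, which gives:

```latex
\begin{aligned}
X_1' &= Q_1 X_1 + g_1, \qquad X_2' = Q_2 X_2 + g_2, \\
R(X_1' \mid X_2') &= Q_2 \, R(X_1 \mid X_2) \, Q_1^{\top}, \\
t(X_1' \mid X_2') &= Q_2 \, t(X_1 \mid X_2) + g_2 - R(X_1' \mid X_2') \, g_1 , \\
\text{so that}\quad R(X_1' \mid X_2') \, X_1' + t(X_1' \mid X_2')
  &= Q_2 \bigl( R(X_1 \mid X_2) \, X_1 + t(X_1 \mid X_2) \bigr) + g_2 .
\end{aligned}
```

The last line says that the docked position of the moved protein follows the fixed protein's motion (Q_2, g_2) exactly, so the relative pose of the two proteins does not depend on where either one started.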


Footnote [a]: We have fixed an important bug in the cross-attention code. We have done a more extensive hyperparameter search and understood that layer normalization is crucial in layers used in Eqs. 5 and 9, but not on the h embeddings as it was originally shown in Eq. 10. We have seen benefits from training our models with a longer patience in the early stopping criterion (30 epochs for DIPS and 150 epochs for DB5). Increasing the learning rate to 2e-4 is important to speed up training. Using an intersection loss weight of 10 leads to improved results compared to the default of 1.
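
The hyperparameter changes listed in the footnote can be collected in one place, as in the sketch below. Only the numeric values come from the footnote; the dictionary name, the key names, and the `patience_for` helper are hypothetical and do not correspond to the actual EquiDock configuration or command-line options.

```python
# Hypothetical key names; only the values are taken from the footnote.
TRAINING_OVERRIDES = {
    "learning_rate": 2e-4,             # raised to speed up training
    "intersection_loss_weight": 10.0,  # the footnote gives the default as 1
    "early_stopping_patience": {       # in epochs, per dataset
        "DIPS": 30,
        "DB5": 150,
    },
}


def patience_for(dataset: str) -> int:
    """Return the early-stopping patience (in epochs) for a dataset name."""
    return TRAINING_OVERRIDES["early_stopping_patience"][dataset]
```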



Periodic table of protein complexes, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 1: Next Generation Sequencing (NGS)

Periodic table of protein complexes

Larry H. Bernstein, MD, FCAP, Curator



New tool helps to visualise, understand and predict how proteins combine to drive biological processes

A new ‘periodic table’ of protein complexes has been developed that provides a unified way to classify and visualise protein complexes, providing a valuable tool for biotechnology and the engineering of novel complexes.

This study also provides insights into evolutionary distribution of different types of existing protein complexes.

The Periodic Table of Protein Complexes offers a new way of looking at the enormous variety of structures that proteins can build in nature, which ones might be discovered next, and predicting how entirely novel structures could be engineered. Created by an interdisciplinary team led by researchers at the Wellcome Genome Campus and the University of Cambridge, the Table provides a valuable tool for research into evolution and protein engineering.

Almost every biological process depends on proteins interacting and assembling into complexes in a specific way, and many diseases are associated with problems in complex assembly. The principles underpinning this organisation are not yet fully understood, but by defining the fundamental steps in the evolution of protein complexes, the new ‘periodic table’ presents a systematic, ordered view on protein assembly, providing a visual tool for understanding biological function.

“Evolution has given rise to a huge variety of protein complexes, and it can seem a bit chaotic. But if you break down the steps proteins take to become complexes, there are some basic rules that can explain almost all of the assemblies people have observed so far.”


Dr Joe Marsh, formerly of the Wellcome Genome Campus and now of the MRC Human Genetics Unit at the University of Edinburgh.

Different ballroom dances can be seen as an endless combination of a small number of basic steps. Similarly, the ‘dance’ of protein complex assembly can be seen as endless variations on dimerization (one doubles, and becomes two), cyclisation (one forms a ring of three or more) and subunit addition (two different proteins bind to each other). Because these happen in a fairly predictable way, it’s not as hard as you might think to predict how a novel protein would form.

“We’re bringing a lot of order into the messy world of protein complexes. Proteins can keep going through several iterations of these simple steps, adding more and more levels of complexity and resulting in a huge variety of structures. What we’ve made is a classification based on these underlying principles that helps people get a handle on the complexity.”

Dr Sebastian Ahnert of the Cavendish Laboratory at the University of Cambridge

The exceptions to the rule are interesting in their own right, and are the subject of ongoing studies.

“By analysing the tens of thousands of protein complexes for which three-dimensional structures have already been experimentally determined, we could see repeating patterns in the assembly transitions that occur – and with new data from mass spectrometry we could start to see the bigger picture.”

Dr Joe Marsh

“The core work for this study is in theoretical physics and computational biology, but it couldn’t have been done without the mass spectrometry work by our colleagues at Oxford University. This is yet another excellent example of how extremely valuable interdisciplinary research can be.”

Dr Sarah Teichmann, Research Group Leader at the Wellcome Trust Sanger Institute and the European Bioinformatics Institute (EMBL-EBI)

