For generations, biologists have aimed to understand every single process of life. The key to moving closer to this goal is a deeper understanding of the compounds that govern almost all life processes: proteins. AlphaFold, a revolutionary new AI tool, holds the potential to let us evaluate protein structures like never before, breaking barriers for advancements in research as well as medicine.
Proteins are best known as nutrients essential for building muscle, but these complex, intricate macromolecules deserve far more credit: they are the core machinery behind almost all biological processes. They are extremely versatile in function, and this versatility comes from the diverse range of structures they can form. The way a protein folds, starting from nothing more than a sequence of amino acids, is the key factor behind this structural diversity.
Overview of the biogenesis of a protein. Alpha helices and pleated sheets are so-called secondary structures that are frequently found in proteins.
Why predict protein structures?
There exists a wide variety of proteins in our world, such as transporter proteins (for transportation of molecules in the body), antibodies (for immunity), enzymes (for catalysis), etc. All such protein functions are largely determined by their structure, which is why there is a need to understand protein structures better.
To put this problem further into perspective, consider the variants of SARS-CoV-2, the virus that sparked the COVID-19 pandemic. Variants found in the UK and Brazil were deemed to have higher transmissibility than the ‘original’ virus, and the critical difference between these variants was understood to be the structure of the spike proteins found on the virus. Discovery of these spike protein structures was the key factor in vaccine design to combat this virus. This clearly illustrates how efficient decoding of protein structures can be a driving force for significant advancements in medicine.
Spike proteins of viral variants shown alongside spike protein found in the ‘original’ SARS-CoV-2 virus (left).
These ideas have led biologists to ask a crucial question over decades of work: “How can we accurately and efficiently predict the 3D structure of a protein?”
Enter AlphaFold, the ground-breaking AI system that has broken barriers for advancements in protein structure prediction.
DeepMind, a general AI company, introduced AlphaFold in their paper "Improved protein structure prediction using potentials from deep learning". AlphaFold represents a significant advance in the field of protein structure prediction, but what makes it so special?
AlphaFold allows for prediction of the 3D structure of a protein solely from its amino acid sequence.
Until now, several experimental techniques (such as X-ray crystallography, cryo-EM and NMR) and non-experimental techniques (heavily based on template-based modeling, i.e., prediction of protein structures by alignment to known/solved protein structures) have been used to successfully deduce the structure of many proteins. However, all of these approaches come with caveats: they are either inefficient, expensive, or limited in their ability to decode structures of novel proteins.
AlphaFold has opened the door to efficiently modeling 3D structures of novel proteins in a non-experimental setting, an idea deemed a “fairy tale” before its advent. This is because there exists an astoundingly large number of possible structural configurations for a given protein (about 10^300 for a typical protein!), which made the problem too challenging to solve without AI. AlphaFold promises to grant scientists the long-sought power to model novel protein structures with great accuracy. The ability to predict the structures of these complex macromolecules simply from their amino acid sequence, with increased speed and efficiency, is the dream of many biologists.
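To get a feel for the size of the search space mentioned above, here is a rough back-of-envelope sketch in Python. The inputs are illustrative assumptions rather than measured values: a chain of 300 residues, about 10 possible conformations per residue, and a hypothetical sampler that checks 10^13 conformations per second. Everything is done in log10 so the numbers stay manageable.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_conformations(n_residues: int, states_per_residue: int) -> float:
    """Return log10 of states_per_residue ** n_residues."""
    return n_residues * math.log10(states_per_residue)


def log10_years_to_enumerate(log10_count: float, samples_per_second: float) -> float:
    """Return log10 of the years needed to visit every conformation once."""
    return (
        log10_count
        - math.log10(samples_per_second)
        - math.log10(SECONDS_PER_YEAR)
    )


if __name__ == "__main__":
    # Assumed values, chosen only to illustrate the scale of the problem.
    exponent = log10_conformations(n_residues=300, states_per_residue=10)
    years = log10_years_to_enumerate(exponent, samples_per_second=1e13)
    print(f"about 10^{exponent:.0f} conformations")
    print(f"about 10^{years:.1f} years to enumerate them")
```

Under these assumptions, brute-force enumeration is hopeless by hundreds of orders of magnitude, which is why a method that learns patterns from known structures is so attractive.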
Furthermore, AlphaFold demonstrated its effectiveness by its commendable achievements in the CASP competition. The CASP competition is a biennial assessment where participating teams are asked to predict structures of previously unseen proteins. In the CASP13 competition (2018), AlphaFold emerged as the best performer!
Performance of AlphaFold (Purple) and AlphaFold 2 (Blue) in the CASP competition. GDT (global distance test, ranging from 0 to 100) is the primary metric employed by CASP to quantify accuracy of protein structures.
AlphaFold’s achievements were further surpassed by its successor, AlphaFold II (further discussed below), in the CASP14 competition (2020).
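The GDT metric from the figure caption above can be illustrated with a simplified sketch. It assumes that the predicted and experimental structures are already superimposed and that each residue is represented by a single 3D point; the real CASP evaluation also searches over many superpositions, which is omitted here. The function name `gdt_ts` and the toy coordinates are hypothetical and chosen only for illustration.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on pre-superimposed structures.

    For each distance cutoff (in angstroms), compute the fraction of
    residues whose predicted position lies within that cutoff of the
    reference position, then average the fractions and scale to 0-100.
    """
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("structures must be non-empty and of equal length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)


if __name__ == "__main__":
    # Toy example: four residues, displaced by 0, 1.5, 3 and 10 angstroms.
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    predicted = [(0.0, 0.0, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), (3.0, 10.0, 0.0)]
    print(gdt_ts(predicted, reference))
```

A perfect prediction scores 100, and the score drops as more residues drift away from their experimentally determined positions.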
Artificial Intelligence is growing at a tremendous rate and AlphaFold has come forth as one of the most impactful applications of AI. AI systems typically employ large computational architectures known as neural networks. As the name suggests, these systems exhibit “intelligence” and learn patterns and dynamics of a problem simply by being exposed to a vast amount of data relating to the problem.
This core idea of learning is precisely why AlphaFold is able to accurately predict complex structures of proteins despite the vast number of structural possibilities for any given sequence.
At this point, it’s only natural to ask how AlphaFold achieves this.
Like most AI systems, AlphaFold learns patterns and relationships between protein structures and sequences by being exposed to a large number of known protein structures obtained from the Protein Data Bank (PDB). Once training is complete, AlphaFold can be used to predict structures of novel proteins. Specifically, AlphaFold predicts the torsion angles and the distances between amino acid residues in a protein. Together, the torsion angles and inter-residue distances of a protein characterize its 3D structure, so accurate prediction of these angles and distances allows for accurate prediction of protein structure.
Since the release of the AlphaFold I paper in early 2020, DeepMind has unveiled its successor, AlphaFold II. The improvements in AlphaFold II include computational and architectural modifications to the original system’s neural network, which in turn lead to more accurate and efficient protein structure prediction. This newer, faster and better version of AlphaFold performed significantly better than its predecessor in the CASP14 competition (a GDT score of roughly 90 versus roughly 60, see figure above).
However, the problem is not 100% solved. As with any other model, there is always room for improvement, both in learning even more intricate features and in addressing open questions such as how proteins actually fold.
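To make the two quantities above concrete, the following sketch computes a torsion (dihedral) angle from four 3D points and a matrix of pairwise distances from a list of points. This is not AlphaFold's code; it only shows what these geometric quantities are. The function names are hypothetical, and using one representative point per residue is an assumption made for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle in degrees around the p1-p2 bond.

    Assumes p1 and p2 are distinct points.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalize the central bond vector.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    # The angle between the two projections is the torsion angle.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def distance_matrix(points):
    """Symmetric matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(a, b) for b in points] for a in points]


if __name__ == "__main__":
    # Four points with the outer two on the same side of the central bond.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
    print(distance_matrix([(0, 0, 0), (3, 4, 0)]))
```

A model that outputs good estimates of these angles and distances for every residue has, in effect, described the fold of the whole chain.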
That being said, AlphaFold truly promises to be one of the most significant scientific breakthroughs of our generation. The power to deduce structures of novel proteins with the speed and efficiency that AlphaFold brings to the table clears the pathway for revolutionary breakthroughs in science and medicine.
Learn more about AlphaFold:
- Original article describing AlphaFold I in Nature (2020)
- Original article describing AlphaFold II in Nature (2021)
- Blog post by DeepMind about AlphaFold I
- Blog post by DeepMind about AlphaFold II
- AlphaFold: The making of a scientific breakthrough (YouTube)
About the Author
This post was written by Sarvagya Agrawal. He is an undergraduate student at the University of Toronto, looking forward to graduating with a degree in Data Science/Machine Learning and Molecular Biology. He is heavily interested in Deep Learning and aims to create efficient models to solve real-world problems in both industry and research. Long fascinated by the fields of AI and genomics, he plans to bring some sci-fi theories to life!