December 15, 2020
Mizzou Engineering students took on tech giants at a worldwide competition last month and came home in the top 10 for devising a way to accurately predict protein structures. And in subcategories, Mizzou teams ranked in the top 3.
The result is especially notable because this year's competition, the 14th Critical Assessment of Structure Prediction (CASP14), was a historic one: Google-owned DeepMind took first place and made headlines for unprecedented accuracy.
In short, the company demonstrated a way to predict how proteins will fold using artificial intelligence (AI).
“It’s one of the major scientific breakthroughs this year,” said Jianlin “Jack” Cheng, Thompson Professor of Electrical Engineering and Computer Science, who has been working on protein prediction for nearly 20 years. “It’s a great thing that happened in this field, and we have been part of this process to develop knowledge that will drastically revolutionize life sciences: agriculture, medicine, health care and the economy.”
Why It Matters
Proteins are the building blocks of life. They start out as strings of amino acids that fold into three-dimensional shapes, and those shapes determine how a protein will function. For instance, the virus that causes COVID-19 uses a spike-shaped protein to enter human cells. Knowing how proteins fold and function can, in turn, lead to the design of new medicines and vaccines that can prepare us for future pandemics.
Predicting what structure a protein will fold into has vexed researchers for decades.
Cheng was among the first to use deep learning (currently the most powerful AI technology) in a protein-predicting algorithm, demonstrating its superiority in the 2012 CASP10 competition.
His approach has since evolved. Earlier this year, he was awarded a $1.37 million grant from the National Institutes of Health to further develop his method using a so-called attention mechanism with deep learning.
Google’s DeepMind appears to have also used the attention mechanism in its successful methodology, AlphaFold2.
“I was very happy to see Google’s AlphaFold2 has demonstrated that attention is one of the most important things to get this method to work,” Cheng said.
History of Success
Mizzou Engineering has a long history of success at CASP, a global competition held every other year since 1994. The goal of the event is to evaluate and demonstrate the progress of protein prediction models.
Although this year was more competitive than previous years, Mizzou Engineers still ranked well across the board.
Cheng’s team, MULTICOM, including PhD students Jian Liu, Tianqi Wu, Xiao Chen and Zhiye Guo, placed 7th in the main competition.
The MUFold team, consisting of PhD students Wenbo Wang and Junlin Wang, Shumaker Professor Dong Xu, and EECS Professor Yi Shang, landed at 14th out of 146 submissions in the main category.
In one subcategory, MULTICOM earned top honors for selecting the most accurate protein structural models. In another category, the team ranked 3rd.
MUFold also ranked No. 2 out of 72 submissions in one quality assessment subcategory and No. 5 in another. In three consecutive CASPs, the MUFold team has finished in the top two places in some quality assessment subcategories.
In the past, the CASP competition consisted of academic groups of scientists. Then Google entered two years ago, and other industry giants followed suit. This year’s event included submissions from Microsoft and Facebook, too.
“I’m glad Google participated,” said Wenbo Wang. “It brings a lot of attention to what we are doing. After Google’s participation in the last CASP, we saw a huge increase in terms of participation in this CASP. The competition was very fierce compared to years before. We are glad our method still prevailed and feel like our hard work paid off.”
Both he and teammate Junlin Wang agreed the company’s participation has been validating.
“In this competition, the biggest contribution Google made was they gave us confidence,” Junlin Wang said. “Google showed us we can solve this problem, and that tells all researchers we can continue to work on this, and it will be successful.”
Work to Be Done
Although the problem of predicting 3D structures appears to have been solved, there’s still plenty of work to be done.
Unlike Google, a private company, academic researchers release their findings on open-source platforms. That means anyone can access those discoveries to either advance the research or develop commercial applications.
Wenbo Wang, for instance, has developed a website where anyone can see the accuracy of their own protein prediction models. The site, Protein Structural Information Conformity Analysis, is free and open to the public.
“We still need an open source version that others can access to do research and technology development,” Cheng said.
Plus, protein structure prediction is just one problem within a larger category of protein modeling, Xu said.
“There are many others. For instance, once you know the structure, you need to know how it moves over time,” he said. “And how two proteins dock and interact together. So in terms of related research, there’s still lots to do.”
“There will be even more to do in terms of potential applications and innovations, and that will mean more jobs for students who have applied AI and machine learning techniques to protein structure prediction,” Shang said. “The research field is becoming more competitive, but the job market will be better.”
And the discovery will likely open up an entirely new industry, Cheng said.
“It’s a very exciting time to study AI and its applications in the biomedical domain,” he said. “Students are going to see huge amounts of potential opportunities in the bioinformatics industry and the AI industry. This is the biggest technology revolution since the IT revolution in the 1980s.”