MULTICOM, MUFOLD rank high in protein modeling competition
Almost every protein in nature is made up of just 20 standard amino acids, sequenced and shaped to fit whatever biological lock needs turning. But given a protein sequence of just 200 of these amino acids, the number of possible configurations is gargantuan: a figure 261 digits long.
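The 261-digit figure can be checked with a short Python sketch. It assumes the figure counts every way to fill each of 200 positions with one of 20 amino acids, that is, 20 to the power of 200; the variable names are illustrative only.

```python
# Assumption: the figure counts 20 independent choices at each of 200 positions.
AMINO_ACIDS = 20  # standard amino acids
LENGTH = 200      # positions in the example protein chain

count = AMINO_ACIDS ** LENGTH  # exact integer; Python ints have arbitrary precision
digits = len(str(count))       # number of decimal digits in the count
print(digits)  # 261
```

Because Python integers are exact, the full value is computed and its digits counted directly, with no rounding.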
Those are the astronomical odds facing MULTICOM and MUFOLD, two MU College of Engineering protein structure prediction teams led by Jianlin Cheng and Dong Xu, respectively, as they model proteins. Proteins are among the most fundamental units in biology, and predicting their structure is one of the most challenging problems in bioinformatics.
“That’s one of the major outstanding problems in computational biology, is to construct a 3D structure from a protein sequence,” said Cheng. “The CASP competition is sort of a community-wide benchmark for protein structure predicting techniques.”
Cheng has led MULTICOM through three consecutive CASP competitions since 2008.
Both teams have received official word of their top rankings at the Tenth Critical Assessment of protein Structure Prediction (CASP10), a biennial worldwide experiment in protein structure prediction.
Cheng, associate professor of computer science, heads MULTICOM, which ranked first in residue contact prediction and automated protein refinement, second in single-model quality assessment, second overall in template-based modeling and fourth overall in tertiary structure prediction.
MUFOLD finished at the top of its group in the hard template-based human modeling category and third in refinement prediction, and placed in the top 15 overall at CASP10.
Every two years, international teams employ ever-improving methods to analyze, predict and model proteins based on given sequences of amino acids. For three solid months, CASP distributes two sequences per day to research groups around the world. Research groups are required to submit their predictions to the CASP center within three days.
In total, more than 200 predictors from around the world generated models for about 120 proteins during the CASP10 experiment.
“We’ve developed some novel methods in this process,” said computer science Professor Dong Xu, who heads the MUFOLD team. “We have a graph-based modeling system called ‘multidimensional scaling,’ and then use different assessments to rank models.”
MUFOLD also received an overall top-five ranking in template-based modeling.
Official results were released at the CASP10 conference, held in Gaeta, Italy, in December.
“For some of us, this was the first time we had participated in the CASP competition,” said Badri Adhikari of MULTICOM. “We didn’t know much about the competition before, but now we see that the competition is actually a very good platform to improve our research.”
MULTICOM has developed machine-learning techniques that allow a server to process protein data and generate structure predictions. The team divides its work into tasks ranging from structure prediction to quality assessment and model refinement.
“A number of researchers are doing this competition, so we can learn from other groups also, see how good our method is and our own improvement,” said Renzhi Cao, model quality assessment specialist for MULTICOM.
Cao said this year was particularly challenging because teams were given only 20 models to evaluate, versus about 300 in previous competitions.
“The most key thing is novelty,” said Zheng Wang, a postdoctoral researcher who has participated in three consecutive CASP competitions. “I think this lab can generate novel ideas every time, can be crazy, can keep adding new stuff, keep making improvements on our previous experiments.”
Both teams are already preparing for the next CASP competition in spring 2014, as well as considering other areas where they can apply their structure prediction techniques. MULTICOM is currently exploring the possibility of predicting entire genome structures.
“That’s another very, very challenging problem in bioinformatics,” said CS graduate Debswapna Bhattacharya, a MULTICOM developer. “Some on our team are already exposed to the new exploration.”