MU protein structure prediction teams shine at international CASP9 Conference
Every protein’s unique three-dimensional structure dictates its function, yet only a fraction of the hundreds of millions of protein structures have been identified. It is one of the great unsolved mysteries of life science, and researchers around the world are working diligently, and competing, to predict protein structure. Recent results from the National Institutes of Health (NIH)-sponsored Ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) rank two University of Missouri teams, MUFOLD and MULTICOM, among the very best in the world at this computationally intensive bioinformatics endeavor.
“We are well-represented in this field,” said Dong Xu, a James C. Dowell Professor and chair of computer science. “Everyone knows we are a major player.”
Xu, who heads one of the three labs that constitute the MUFOLD team, received a five-year, $1 million grant in 2005 from NIH to develop protein structure prediction programs. The labs of computer science colleague Professor Yi Shang and of Ioan Kosztin, an MU physics assistant professor, round out the MUFOLD team. Team members working in the labs include research scientist Jingfen Zhang and graduate students Qingguo Wang, Kittinun Vantasin, Zhiquan He and Jiong Zhang.
The results of CASP9 show MUFOLD ranked number one in template-based human prediction, took the top three spots in model quality assessment with two server predictions and one human prediction, and also was highly ranked in a number of other categories.
“Protein structure prediction is a basic research tool, useful in a number of research areas,” said Zhang. “It is the first step. I want to do it very well and then will have many different research options.”
Jianlin “Jack” Cheng, assistant professor in computer science and MU’s Informatics Institute (MUII), headed the MULTICOM team, which included graduate students Zheng Wang and Jesse Eickholt. The team, also funded by the NIH grant, ranked number two in automated template-free server prediction and in the top five in automated template-based server prediction. It came in first in protein contact map prediction and second in disordered protein prediction, and was also among the top performers in global and local protein model quality assessment.
“This event is the most important event in the field. The standard is very high,” Cheng said. “It is the best way to demonstrate your capability of predictions.”
From May through August 2010, CASP organizers sent 128 protein sequences to the servers of research groups around the world, with a three-day deadline to return predictions. Additional targets were sent for human prediction, and teams had up to three weeks to submit those predictions.
Shang said students on the MUFOLD team proposed the three successful quality assessment methods. “Prediction involves the development of many different algorithms. The successful ones are combined. There are thousands of lines of code in an algorithm,” he said. “We learn from what we did wrong and what others did right.”
“Proteins carry out all of the biological functions. They are responsible for things like immune systems, muscle movement and the growth of cells,” said Cheng, explaining the importance of the work. Both of his MULTICOM lab assistants said the research’s applications in the medical field are part of what drew them to it.
“Informatics is critical in collaborating fields, especially with doctors on, for example, cancer,” said Zheng Wang. “That is my motivation.”
Eickholt said that he didn’t know what informatics was until Cheng asked him if he would be interested in working on protein structure prediction in his lab. “Its use in medical research and drug design was appealing to me,” he said.
Cheng believes the informatics-life science partnership has the potential to create an entirely new biotech industry. “Computing is penetrating all sorts of disciplines. In the future, everybody will have a personal genome, which may lead to the personalized medicine and healthcare,” Cheng said. “Opportunities for computational modeling of proteins and genes in a genome will be infinite. Bioinformatics is a computational tool for our lives.”
Computational research is playing a growing role in various branches of life science research and in many other disciplines, but professionals fear that its importance is not being stressed enough in this country’s middle and high school curriculums.
“Computer science is everywhere. Algorithms analyze political candidates,” said Shang. “But it has been marginalized. People think it is the same thing as IT. A recent report from the Association for Computing Machinery (ACM) shows that few high schools are emphasizing it.”
That report, based on an in-depth study by ACM and the Computer Science Teachers Association, stated, “Approximately two-thirds of U.S. states have very few CS education standards for secondary school education, and most states treat high school CS courses as simply an elective and not part of a student’s core education.”
Statistics show that, compared with a baseline assessment in 2005, 17 percent fewer secondary schools were offering introductory CS courses in 2009, and 35 percent fewer offered Advanced Placement CS courses.
Xu said that in addition to informatics, the expanding role of CS research in cyber security and image processing has boosted his department’s faculty research funding by 50 percent. It has also increased the number of high-quality graduate students who are attracted to and supported by the program, like those who helped make MUFOLD and MULTICOM so successful.
“I love solving hard scientific problems,” said doctoral student Qingguo Wang when asked about his plans for the future. “I will work in computational biology. There are many complex problems in this field. It’s very demanding to complete research. You work collaboratively with people from diverse backgrounds and use all of your skills and everything you know to solve the world’s problems. I like that. It’s good to be challenged.”