February 26, 2026
Tech company NVIDIA powered the development of an AI tool that could help scientists efficiently design bespoke proteins for pharmaceuticals and materials.

Mizzou is a leader in pioneering new ways to leverage artificial intelligence (AI) to enable advances in chemistry, energy, health care and materials.
Now an interdisciplinary team of researchers has developed technology that helps AI systems learn from and work with protein structures and could ultimately help reduce the time and cost required to develop new drugs.
The team’s method, called the Geometry‑Complete Vector‑Quantized Variational Autoencoder (GCP‑VQVAE), creates a language for protein geometry that preserves true shape and orientation and can decode that language back into detailed 3D structures with near-atomic accuracy.
Scientists have long dreamed of harnessing and directing the creative power of proteins, the molecular machines that build cells, carry signals and power chemical reactions. But working directly with 3D structure is difficult for many AI systems.
GCP‑VQVAE translates complex protein 3D structure into a form that generative models can learn from and use in protein engineering and drug discovery.
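The “VQ” in the name stands for vector quantization, the step that turns continuous numbers into a discrete vocabulary. Below is a generic toy version of that step, not the authors’ code. An encoder (not shown) is assumed to produce one feature vector per residue, and each vector is replaced by the index of its nearest entry in a “codebook.” The codebook and feature values are hypothetical 2-D numbers chosen for readability. Real models learn larger, higher-dimensional codebooks and use a neural decoder to turn tokens back into 3D coordinates.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(features, codebook):
    """Replace each continuous feature vector with the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]

def dequantize(tokens, codebook):
    """Look up the codebook vector for each token (a real decoder would then rebuild 3D coordinates)."""
    return [codebook[t] for t in tokens]

# Hypothetical toy codebook: 4 "words" in a 2-D feature space.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical per-residue features, standing in for an encoder's output.
features = [[0.1, -0.05], [0.9, 0.95], [0.2, 0.8]]

tokens = quantize(features, codebook)
print(tokens)  # [0, 3, 2]
print(dequantize(tokens, codebook))  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

Each integer in the output acts as one “word” in the protein-geometry language, and that discrete form is what lets sequence-style generative models work with structure.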
“Purpose-built proteins could create new medicines, clean up pollution or build sustainable materials,” said Mahdi Pourmirzaei, a computer science PhD student and research assistant at NextGen Precision Health, where he applies computer science to bioinformatics and protein design.
Overcoming challenges
What a protein does is determined by the shape, or fold, of its long strings of amino acids. To engineer proteins with specific functions, scientists first need to understand how different folds relate to different activities.
“In theory, you could start building proteins blindly, but the design space is astronomically large,” said Alex Morehead (PhD 2025), now a Hopper Postdoctoral Fellow at the Lawrence Berkeley National Laboratory. “Of the billions or trillions of possible amino acid sequences, only a tiny fraction will fold into a useful shape.”
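The arithmetic behind “astronomically large” is simple. Built from the 20 standard amino acids, a chain of length L can be written in 20^L different ways. The sketch below counts them, ignoring modified or nonstandard residues.

```python
NUM_AMINO_ACIDS = 20  # the 20 standard amino acids; modified residues are ignored

def sequence_space(length):
    """Number of distinct amino acid sequences of the given length."""
    return NUM_AMINO_ACIDS ** length

for length in (3, 10, 100):
    count = sequence_space(length)
    # Python integers are exact, so the digit count gives the order of magnitude.
    print(f"length {length}: about 10^{len(str(count)) - 1} sequences")
# length 3: about 10^3 sequences
# length 10: about 10^13 sequences
# length 100: about 10^130 sequences
```

Even a modest 100-residue protein has roughly 10^130 possible sequences, far more than any lab could ever test one by one.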
To avoid countless iterations and the enormous time and cost they bring, scientists in recent years developed a series of computational tools, including AlphaFold-class methods that predict a protein’s 3D structure from its amino acid sequence, and structure tokenizers that turn that 3D structure into a string of discrete symbols, or tokens, that AI models can process much like words.
But raw 3D coordinates are not a natural fit for many AI systems: the numbers that describe a protein’s atomic positions change whenever the molecule is moved or rotated, even though the protein itself has not changed.
“Any system that tries to read a protein must recognize the molecule when it rotates or moves across the room,” Morehead said. “Even a shift of a few angstroms or a flipped chirality can mean the difference between a life-saving drug and a useless molecule.”
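Morehead’s two examples pull in different directions, and a small geometry check makes this concrete. In the hypothetical four-atom fragment below, moving and rotating the atoms changes every coordinate but leaves interatomic distances untouched. A mirror reflection also leaves distances untouched, yet it flips the sign of the signed volume (a triple product), one simple way to detect handedness. A representation built only from distances would therefore miss a flipped chirality. This is a generic illustration of the geometric problem, not GCP‑VQVAE’s internal representation.

```python
import math

def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def distance(a, b):
    d = sub(a, b)
    return math.sqrt(dot(d, d))

def signed_volume(p0, p1, p2, p3):
    """Triple product of three edge vectors from p0; its sign encodes handedness."""
    return dot(cross(sub(p1, p0), sub(p2, p0)), sub(p3, p0))

def rigid_move(points, angle, shift):
    """Rotate points about the z-axis, then translate them (a proper rigid motion)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2]] for x, y, z in points]

def mirror(points):
    """Reflect points through the y-z plane (an improper transformation)."""
    return [[-x, y, z] for x, y, z in points]

# Hypothetical four-atom fragment, coordinates in angstroms.
atoms = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 1.5]]
moved = rigid_move(atoms, math.pi / 3, [10.0, -4.0, 2.0])
mirrored = mirror(atoms)

print(atoms[1] == moved[1])  # False: the raw coordinates changed
print(math.isclose(distance(atoms[0], atoms[1]), distance(moved[0], moved[1])))  # True
print(math.isclose(distance(atoms[0], atoms[1]), distance(mirrored[0], mirrored[1])))  # True
print([round(signed_volume(*p), 6) for p in (atoms, moved, mirrored)])  # [3.375, 3.375, -3.375]
```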
Fast and accurate
Now the team can translate protein structure into a form AI can learn from without losing critical 3D detail.
By translating complex protein 3D structure into a compact, physically faithful representation that can be decoded back into detailed structures, GCP‑VQVAE opens the door to multiple downstream applications across protein science—from accelerating early-stage discovery to making structure easier for AI systems to learn from and work with.
“GCP‑VQVAE respects the physics of 3D space,” Pourmirzaei said, emphasizing that the representation stays meaningful in real 3D settings rather than getting distorted by how a structure is positioned.
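One common way to make features independent of how a structure is positioned, while still keeping orientation, is to describe each residue’s surroundings in a local coordinate frame built from its own backbone atoms (N, CA and C). The sketch below assumes that approach purely for illustration. It is a widely used idea in structure-aware models, not necessarily GCP‑VQVAE’s exact mechanism, and the atom coordinates are hypothetical.

```python
import math

def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]
def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def local_frame(n, ca, c):
    """Right-handed orthonormal frame centered on CA, built by Gram-Schmidt from N, CA, C."""
    e1 = normalize(sub(c, ca))
    u = sub(n, ca)
    e2 = normalize(sub(u, [dot(u, e1) * x for x in e1]))  # remove the component along e1
    e3 = cross(e1, e2)  # completes a right-handed frame
    return ca, (e1, e2, e3)

def to_local(point, frame):
    """Coordinates of a point relative to the frame's origin, along the frame's axes."""
    origin, axes = frame
    v = sub(point, origin)
    return [dot(v, e) for e in axes]

def rigid_move(p, angle, shift):
    """Rotate a point about the z-axis, then translate it."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * p[0] - s * p[1] + shift[0], s * p[0] + c * p[1] + shift[1], p[2] + shift[2]]

# Hypothetical backbone atoms of one residue and one neighboring atom (angstroms).
n, ca, c = [-0.5, 2.0, 0.0], [0.0, 0.0, 0.0], [1.5, 0.0, 0.0]
neighbor = [2.0, 1.0, 1.0]

before = to_local(neighbor, local_frame(n, ca, c))

# Move the whole structure: every global coordinate changes.
angle, shift = 1.0, [5.0, -3.0, 8.0]
n2, ca2, c2, neighbor2 = (rigid_move(p, angle, shift) for p in (n, ca, c, neighbor))
after = to_local(neighbor2, local_frame(n2, ca2, c2))

print([round(x, 6) for x in before])  # [2.0, 1.0, 1.0]
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after)))  # True
```

Because the frame moves with the residue, the neighbor’s local coordinates stay the same wherever the protein sits, while still recording the direction in which the neighbor lies.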
That matters in practical pipelines such as high‑throughput drug discovery, where thousands of candidate designs need to be generated, reconstructed, and screened quickly — and where GCP‑VQVAE’s fast reconstruction makes that scale more realistic.
Just as importantly, it helps make structural information “native” to the new generation of generative AI models, so they can learn from protein structure and propose new designs that can be decoded back into usable 3D candidates.
But protein design is still a numbers game.
“Researchers currently fail in about 90% of their attempts,” Pourmirzaei said. “GCP‑VQVAE, combined with the new wave of generative AI, helps us generate, reconstruct and triage many more candidate designs computationally, so we can dramatically reduce the time and cost compared to traditional lab‑only approaches.”
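Taking the roughly 90% failure rate at face value, a back-of-the-envelope calculation shows why generating and triaging more candidates helps. With a success probability p per attempt, n attempts yield about n × p hits on average, and the chance of at least one hit is 1 − (1 − p)^n. The sketch assumes, purely for illustration, that attempts succeed independently with p = 0.1. Real design attempts are correlated, so treat the numbers as rough intuition rather than a prediction.

```python
SUCCESS_RATE = 0.1  # from the quoted ~90% failure rate; independence is an illustrative assumption

def expected_hits(num_designs, success_rate=SUCCESS_RATE):
    """Average number of successful designs out of num_designs attempts."""
    return num_designs * success_rate

def prob_at_least_one(num_designs, success_rate=SUCCESS_RATE):
    """Chance that at least one of num_designs independent attempts succeeds."""
    return 1 - (1 - success_rate) ** num_designs

for n in (1, 10, 50):
    print(f"{n:>2} designs: expected hits {expected_hits(n):.1f}, "
          f"P(at least one) = {prob_at_least_one(n):.3f}")
```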
Modern AI projects require a combination of cloud computing expertise, distributed training, strong software engineering skills and domain knowledge. To translate ideas into working systems, universities like Mizzou will have to build these capabilities and foster collaboration.
Pourmirzaei said the project’s success relied on computing resources from the National AI Research Resource (NAIRR) Pilot with support from NVIDIA DGX Cloud and the NVIDIA AI Enterprise Software Platform.
“Without NVIDIA’s continuous support, mentorship and engineering guidance, it would have been impossible to bring a project like this to completion and reach such an important research goal,” he said.
The research team plans to iterate on the model and eventually enable chatbots capable of reading, understanding and analyzing protein structures, or even designing new proteins. They have posted their research paper on the bioRxiv preprint server and released open-source code and models on GitHub.
The principal investigator on the study was Pourmirzaei’s advisor, Dong Xu, former Curators’ Distinguished Professor at Mizzou Engineering. Other researchers on the project were Farzaneh Esmaili (University of Missouri), Jarett Ren (Carnegie Mellon University) and Mohammadreza Pourmirzaei (Politecnico di Milano).
“Making modern AI understand and design proteins opens the door to new molecules, new therapies and entirely new possibilities,” Pourmirzaei said. “Precisely the kind of practical, impactful innovation Mizzou is known for.”