May 13, 2026
Mizzou Engineering students presented their studies at a recent symposium on AI and reproducibility in research led by faculty member Tanu Malik.

At Mizzou Engineering, we develop technology and establish practices to reinforce the bedrock of science and learning. One researcher advancing that work today is Tanu Malik, an associate professor in the Department of Electrical Engineering and Computer Science.
Malik builds tools to enable the efficient and reliable reproduction of studies, a cornerstone of academic integrity. If one scientist can replicate the findings of another using the same methods, it helps ensure that the original study was accurate.
Malik and two students recently returned from Boulder, Colorado, where they attended the Sixth Chameleon User Meeting, a two-day event centered on artificial intelligence (AI) and how it can support reproducible research practices.
Malik organized and led the symposium, which aligned with and expanded on her prior work.
Malik said she was motivated to explore how infrastructures like the Chameleon research cloud and testbed, which currently stores research artifacts, could benefit from AI-driven methods for curation, organization and usability.
“AI is not universally applicable to all reproducibility challenges,” Malik said. “But there is an opportunity to identify low-hanging fruit where AI could deliver the greatest benefit.”
Emerging work from Mizzou students
The event also highlighted student success. Mizzou PhD students Deshan Wattegama and Bhanu Prakash Vangala each received a $1,500 travel scholarship to attend the event, where they presented their work.
Wattegama presented the study Lessons Learned from a Multi-Model Question Answering Pipeline on the Chameleon Testbed, which he co-authored with Nagarjuna Kandimalla, Katelyn Van Dyke and Malik. For this project, the students built a question‑answering system using multiple small AI models. The team showed that different models work best for different types of questions, and that smart routing can improve answers. Chameleon’s fast GPUs, full control and flexible storage made the experiments possible.
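The routing idea behind the pipeline can be illustrated with a minimal sketch. The details below are hypothetical, not taken from the paper: the question categories, model names and keyword rules are stand-ins for whatever classifier and models the team actually used. The point is simply that a cheap routing step can send each question to the small model best suited to answer it.

```python
# Hypothetical sketch of "smart routing" among multiple small models.
# The categories, keywords and model names are illustrative only.

def classify_question(question: str) -> str:
    """Assign a question to a coarse category using simple heuristics."""
    q = question.lower()
    if any(phrase in q for phrase in ("how many", "count", "number of")):
        return "numeric"
    if q.startswith(("why", "how")):
        return "explanatory"
    return "factual"

# Each category maps to the small model that handles it best.
ROUTES = {
    "numeric": "small-math-model",
    "explanatory": "small-reasoning-model",
    "factual": "small-retrieval-model",
}

def route(question: str) -> str:
    """Return the name of the model a question should be sent to."""
    return ROUTES[classify_question(question)]
```

In a real deployment the classifier would itself be learned, but even this toy version shows why different question types can benefit from different models.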
Vangala presented the study Evaluating Reproducibility and Dependency Gaps in LLM-Generated Code, which he co-authored with Ashish Gehani and Malik. Vangala explained that while AI tools can generate impressive, seemingly complete software projects, many fail to run on a clean machine. By testing 300 AI‑generated projects, the researchers showed that reproducibility issues are common and are often caused by code bugs and hidden dependency problems, not just missing libraries.
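The core test in that study, running a generated project on a clean machine, can be approximated with a short sketch. This is an assumption-laden illustration, not the authors' harness: it assumes a Python project with an optional `requirements.txt` and a `main.py` entry point, and uses a fresh virtual environment as a stand-in for a clean machine.

```python
# Hypothetical sketch: does a project install and run in a fresh environment?
# Project layout (requirements.txt, main.py) is an assumption for illustration.
import pathlib
import subprocess
import sys
import tempfile

def runs_cleanly(project_dir: str) -> bool:
    """Create a fresh virtual environment, install declared dependencies,
    and run the project's entry point. Failure at the install step suggests
    a missing or broken dependency; failure at run time despite a clean
    install suggests a code bug or a hidden, undeclared dependency."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = pathlib.Path(tmp) / "venv"
        subprocess.run([sys.executable, "-m", "venv", str(env_dir)], check=True)
        bin_dir = "Scripts" if sys.platform == "win32" else "bin"
        py = env_dir / bin_dir / "python"

        req = pathlib.Path(project_dir) / "requirements.txt"
        if req.exists():
            install = subprocess.run([str(py), "-m", "pip", "install", "-r", str(req)])
            if install.returncode != 0:
                return False  # declared dependency failed to install

        run = subprocess.run([str(py), str(pathlib.Path(project_dir) / "main.py")])
        return run.returncode == 0
```

Scaling a check like this across 300 projects is what lets the study separate missing-library failures from deeper code and dependency problems.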
Thought leadership from the experts
Norwegian computer scientist and AI researcher Odd Erik Gundersen delivered the keynote, in which he identified three main barriers to reproducibility in AI research: rapid proliferation of models and experiments, lack of time for careful documentation, and incomplete reporting of code, data and dependencies.
Gundersen showed that sharing both code and data significantly improves reproducibility, though gaps remain. He proposed using AI itself to improve documentation, help researchers reproduce prior work and automatically quality-assure papers before publication, reframing reproducibility as essential research infrastructure rather than an optional add-on.
Bogdan Stoica, a postdoctoral research associate in the SysNet group at the University of Illinois Urbana-Champaign, presented on the complexities of artifact evaluation — checking whether research code and data actually run.
Most effort in artifact evaluation goes into setup, fixing errors and resolving configuration problems, not evaluating results. Stoica proposed using AI agents to automate the repetitive, mechanical parts of artifact evaluation, such as environment setup and running experiments.
Jay Lofstead, principal member of technical staff in the scalable system software group at Sandia National Laboratories, spoke on reproducible machine learning (ML). Reproducible ML depends greatly on the scale of the system, he said.
For smaller models, reproducibility is achievable if researchers carefully control data splits, training order and randomness. Large‑scale machine learning is far harder, however. Multiple training paths, huge datasets, pruning decisions, checkpoints and distributed computing make it difficult to trace how a final model was produced.
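For the small-scale case, "carefully controlling data splits, training order and randomness" mostly comes down to fixing every random seed. A minimal sketch, using only the standard library and illustrative names of my own choosing, shows why two runs then produce identical splits:

```python
# Minimal sketch: pinning randomness so a small experiment re-runs identically.
# Function and parameter names are illustrative, not from the talk.
import random

def split_and_shuffle(data, seed=42, test_frac=0.2):
    """Deterministically shuffle and split a dataset.

    A dedicated Random instance with a fixed seed controls the shuffle,
    and sorting first gives a canonical order regardless of how the
    input was produced - two of the factors Lofstead cited."""
    rng = random.Random(seed)      # isolated, seeded source of randomness
    items = sorted(data)           # canonical order before shuffling
    rng.shuffle(items)             # deterministic given the seed
    n_test = int(len(items) * test_frac)
    return items[n_test:], items[:n_test]   # (train, test)

# Same seed, same split - the experiment is repeatable:
run_a = split_and_shuffle(range(100))
run_b = split_and_shuffle(range(100))
```

At large scale this discipline breaks down, as the talk notes: distributed training, pruning and checkpointing introduce sources of nondeterminism that a single seed cannot pin down.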
Malik said the symposium raised important questions about reproducibility, infrastructure needs, and the growing role AI can play in addressing persistent research challenges. Invited speakers are currently working to synthesize their insights into a vision paper, which she expects to be available this summer.
“AI opens the door to a more dependable scientific future, where we can share, test and build upon research with confidence,” Malik said. “With the right infrastructure and community effort, reproducibility can become a natural part of how we do science.”
Discover more areas where Mizzou Engineering is pushing the envelope.