July 13, 2022
A Mizzou Engineer has developed a new software structure that allows users to more efficiently mine big data. Chi-Ren Shyu, Shumaker Professor of electrical engineering and computer science and director of the MU Institute for Data Science and Informatics, was recently awarded a U.S. patent for the system, which has potential applications for commerce, healthcare and any organization that leverages large databases for decision making.
The work builds on what’s known as association rule mining, a procedure used to find patterns or connections in various types of databases. These rules can be fairly simple. For example, if a customer buys bread at the store, they are more likely to purchase milk as well. However, some connections are not as obvious. One of the most famous examples of this type of association stems from a 30-year-old observation that men purchasing diapers on a Friday evening are significantly more likely to buy beer, too.
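To make the bread-and-milk rule concrete, here is a minimal sketch of the standard textbook measures used in association rule mining: support, confidence and lift. It is not the patented system. The transactions and the function names are invented for illustration.

```python
# Hypothetical toy shopping baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"eggs"},
    {"bread", "milk", "butter"},
    {"eggs", "butter"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support.

    A value above 1 means the antecedent makes the consequent more likely.
    """
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )

print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
print(lift({"bread"}, {"milk"}, transactions))        # 1.5
```

In this toy data, three of the four baskets containing bread also contain milk, while milk appears in only half of all baskets, so the rule "bread, then milk" has a lift of 1.5.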
Some companies, including Amazon, already have proprietary systems that make these connections. For instance, if you purchase a book from the online retailer, it will recommend similar books purchased by others who bought the same title.
Shyu’s software structure is unique in that it can predict how many potential combinations can be formed within the data. If a store has 150,000 items in stock, for example, the system will tell you that there are roughly 22.5 billion potential item pairs (150,000 × 149,999, counting "if A, then B" and "if B, then A" separately) that could end up in a shopper’s cart. The system also predicts how much computing power you will need to search for the associations you want to make, which is important because mining big data requires computational resources beyond what most companies’ cyberinfrastructures can provide.
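The pair count in the store example can be checked with basic combinatorics. The sketch below is plain arithmetic, not the patented estimation method, and its variable names are made up for illustration. It shows that the 22.5 billion figure matches the number of ordered pairs of distinct items, while the number of unordered pairs is half that.

```python
import math

n_items = 150_000  # stock size from the example above

# Ordered pairs of distinct items: "A then B" and "B then A" count separately.
ordered_pairs = math.perm(n_items, 2)    # 150,000 * 149,999

# Unordered pairs: {A, B} is counted once.
unordered_pairs = math.comb(n_items, 2)  # 150,000 * 149,999 / 2

print(f"{ordered_pairs:,}")    # 22,499,850,000
print(f"{unordered_pairs:,}")  # 11,249,925,000

# The search space grows quickly with itemset size, which is why
# estimating it before mining is useful.
triples = math.comb(n_items, 3)
print(f"{triples:.3e}")
```

`math.perm` and `math.comb` require Python 3.8 or later.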
Finally, the model dynamically updates when additional data is received. So, information coming from a store’s register would automatically be included in future searches.
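One simple way to picture this kind of dynamic updating is a running tally that absorbs each new basket as it arrives, so later queries reflect the latest data without rescanning old records. The sketch below is an assumption about how such an update could look in general. The `PairCounter` class and its methods are hypothetical and are not taken from the patent.

```python
from collections import Counter
from itertools import combinations

class PairCounter:
    """Hypothetical running tally of item and item-pair counts."""

    def __init__(self):
        self.n_transactions = 0
        self.item_counts = Counter()
        self.pair_counts = Counter()

    def add(self, basket):
        """Fold one new basket into the tallies."""
        items = sorted(set(basket))  # de-duplicate and fix the pair order
        self.n_transactions += 1
        self.item_counts.update(items)
        self.pair_counts.update(combinations(items, 2))

    def confidence(self, antecedent, consequent):
        """Share of baskets with antecedent that also contain consequent."""
        seen = self.item_counts[antecedent]
        if seen == 0:
            return 0.0
        pair = tuple(sorted((antecedent, consequent)))
        return self.pair_counts[pair] / seen

counter = PairCounter()
for basket in (["bread", "milk"], ["bread", "butter"]):
    counter.add(basket)
before = counter.confidence("bread", "milk")  # 1 of 2 bread baskets

# A new sale arrives from the register and is included in later queries.
counter.add(["bread", "milk", "eggs"])
after = counter.confidence("bread", "milk")   # 2 of 3 bread baskets
print(before, after)
```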
Shyu believes the system could significantly impact healthcare and drug development.
“There could be trillions of combinations within large genetic networks, and that would take forever to run,” he said. “We need computer algorithms to help us deal with the number of items. This structure will help index patterns so we can find information from large data sets and quickly find those connections, then we can develop targeted treatments for those potential combinations.”
An extension of the patented work has been developed to mine risk patterns for precision health applications and was recently published in an ACM Transactions paper.