AI-Powered Codon Optimization Boosts Protein Production in Yeast for Drug Manufacturing

by Dr. Jennifer Chen

The development of new protein-based drugs, including vaccines and biopharmaceuticals, is often a costly and time-consuming process. Now, a new approach leveraging artificial intelligence is showing promise in streamlining this development, potentially reducing both the time and expense involved. Researchers at MIT have developed a large language model (LLM) capable of optimizing the genetic code used to manufacture proteins in yeast, a common workhorse of the biopharmaceutical industry.

Yeasts such as Komagataella phaffii and Saccharomyces cerevisiae (baker’s yeast) are widely used to produce complex protein drugs. The process involves introducing a gene from another organism into the yeast and modifying it to maximize production. A key step in this process is codon optimization – selecting the most efficient DNA sequence for the yeast to read and translate into protein.

“Today, those steps are all done by very laborious experimental tasks,” explains J. Christopher Love, the Raymond A. and Helen E. St. Laurent Professor of Chemical Engineering at MIT and faculty co-director of the MIT Initiative for New Manufacturing (MIT INM). “We have been looking at the question of where could we take some of the concepts that are emerging in machine learning and apply them to make different aspects of the process more reliable and simpler to predict.”

Understanding the Genetic Code and Codon Optimization

The genetic code is based on codons – three-letter DNA sequences that specify which amino acid should be added to a growing protein chain. Importantly, most amino acids can be encoded by multiple different codons. Different organisms exhibit preferences for certain codons over others, a phenomenon known as codon bias. Traditionally, researchers have optimized genes for yeast production by selecting codons that are frequently used by the host organism.
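The traditional strategy described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by any particular tool: the function name `most_frequent_codon_sequence` is hypothetical, and the usage fractions in `CODON_USAGE` are invented for the example rather than measured K. phaffii frequencies (only the codon-to-amino-acid assignments are real).

```python
# Illustrative codon usage table: amino acid -> {codon: fraction of use}.
# The fractions are made up for this sketch and are NOT measured
# K. phaffii values; only the codon assignments are real.
CODON_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAG": 0.60, "AAA": 0.40},
    "E": {"GAA": 0.55, "GAG": 0.45},
    "L": {"TTG": 0.35, "CTG": 0.20, "TTA": 0.15,
          "CTT": 0.15, "CTA": 0.10, "CTC": 0.05},
}


def most_frequent_codon_sequence(protein, usage=CODON_USAGE):
    """Back-translate a protein by always picking the host's most-used codon."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)


dna = most_frequent_codon_sequence("MKLLE")
print(dna)  # both leucines receive the same codon
```

Because every occurrence of an amino acid gets the same codon, a protein rich in one amino acid ends up leaning heavily on a single codon.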

However, simply choosing the most common codons isn’t always the most effective strategy. Over-reliance on a single codon can deplete the cell’s supply of the corresponding transfer RNA (tRNA) molecules, which are essential for translating the genetic code into protein. This can ultimately limit protein production.
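One simple way to avoid leaning on a single codon is to sample each codon in proportion to how often the host uses it, so that demand is spread across several tRNAs. The sketch below illustrates that general idea only; it is not the MIT model. The function name `sample_codons`, the fixed seed, and the usage fractions are all assumptions made for the example.

```python
import random

# Illustrative usage fractions, invented for this sketch (not measured data).
LEUCINE_USAGE = {"TTG": 0.35, "CTG": 0.20, "TTA": 0.15,
                 "CTT": 0.15, "CTA": 0.10, "CTC": 0.05}
CODON_USAGE = {"M": {"ATG": 1.0}, "L": LEUCINE_USAGE}


def sample_codons(protein, usage=CODON_USAGE, seed=0):
    """Pick each codon at random, weighted by its usage fraction."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    codons = []
    for aa in protein:
        options = list(usage[aa])
        weights = [usage[aa][codon] for codon in options]
        codons.append(rng.choices(options, weights=weights, k=1)[0])
    return codons


# A leucine-rich toy protein: methionine followed by 50 leucines.
codons = sample_codons("M" + "L" * 50)
print(len(set(codons[1:])), "distinct leucine codons used")
```

Weighted sampling still favors common codons but no longer uses one of them exclusively.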

The MIT team’s new model takes a more sophisticated approach. Instead of simply counting codon frequencies, it analyzes the patterns of codon usage within genes, learning the “syntax” or “language” of the yeast’s genetic code. The model was trained on a publicly available dataset containing the amino acid and DNA sequences for approximately 5,000 proteins naturally produced by K. phaffii.

“The model learns the syntax or the language of how these codons are used,” Love says. “It takes into account how codons are placed next to each other, and also the long-distance relationships between them.”
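To give a feel for what “how codons are placed next to each other” means, the toy sketch below counts adjacent codon pairs in a set of coding sequences. It is far simpler than a language model and is not the researchers’ method: it captures only neighboring codons, not the long-distance relationships Love describes. The function name `codon_pair_counts` and the two toy sequences are hypothetical.

```python
from collections import Counter


def codon_pair_counts(coding_sequences):
    """Count how often each ordered pair of adjacent codons occurs."""
    pairs = Counter()
    for seq in coding_sequences:
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        codons = [seq[i:i + 3] for i in range(0, usable, 3)]
        pairs.update(zip(codons, codons[1:]))
    return pairs


# Two made-up coding sequences, used only to exercise the function.
toy_genes = ["ATGAAGTTGGAA", "ATGAAGTTGTTG"]
counts = codon_pair_counts(toy_genes)
print(counts[("ATG", "AAG")], counts[("TTG", "TTG")])
```

A table like this records local context that a per-codon frequency count discards.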

AI Outperforms Existing Methods

To test the model’s effectiveness, the researchers used it to optimize the codon sequences of six different proteins, including human growth hormone, human serum albumin, and trastuzumab, a monoclonal antibody used in cancer treatment. They then compared the performance of the AI-optimized sequences to those generated by four commercially available codon optimization tools.

For five of the six proteins, the sequences generated by the MIT model yielded the highest levels of protein production in K. phaffii cells. For the remaining protein, the model’s sequence was the second-best performer.

“We made sure to cover a variety of different philosophies of doing codon optimization and benchmarked them against our approach,” says Harini Narayanan, the lead author of the study. “We’ve experimentally compared these approaches and showed that our approach outperforms the others.”

Implications for Biopharmaceutical Production

K. phaffii is a crucial organism in the biopharmaceutical industry, used to produce a wide range of products, including insulin, hepatitis B vaccines, and treatments for chronic migraines. Optimizing protein production in this yeast can have a significant impact on the cost and availability of these essential medicines.

The researchers have made the code for their model available to other researchers, allowing them to apply it to K. phaffii or other organisms. They also tested the model on datasets from humans and cows, finding that species-specific models are necessary for effective codon optimization.

Uncovering Biological Principles

Interestingly, the researchers discovered that the AI model appeared to learn some underlying biological principles without being explicitly taught them. For example, it avoided incorporating negative repeat elements – DNA sequences that can suppress gene expression – and it categorized amino acids based on their physical properties, such as hydrophobicity and hydrophilicity.

“Not only was it learning this language, but it was also contextualizing it through aspects of biophysical and biochemical features, which gives us additional confidence that it is learning something that’s actually meaningful and not simply an optimization of the task that we gave it,” Love explains.

This research, published in the Proceedings of the National Academy of Sciences, represents a significant step forward in applying artificial intelligence to the complex challenges of biopharmaceutical manufacturing. By streamlining the codon optimization process, this new technology has the potential to reduce the cost of developing and producing life-saving protein drugs.
