Scientists at Eli Lilly have been surprised by the novel molecular designs that AI has produced as part of hypothetical drug discovery research.
A major precedent for AI-generated breakthroughs in biology was set in 2021 when Google’s DeepMind, known for its creative thinking in realms ranging from the strategy game Go to music, video, and cloud computing, released AlphaFold, an AI system that predicts the three-dimensional structures of proteins.
Experts at Lilly and Nvidia say that within a few years AI will not only think up new drugs, but also ones that humans could not create.
Many of today’s most powerful generative AI systems employ the established technique called Reinforcement Learning from Human Feedback (RLHF), in which the model learns from human preferences. Unlike text, pictures, videos, voice, smell, and other data types that most non-expert humans can easily interpret, AI-generated biological, chemical, and clinical data take years of training and talent to interpret. For example, medicinal chemists who can accurately predict multiple properties of small molecules just by looking at them are rare. In addition, even expert humans are often wrong and subject to human biases. And unlike in other fields, where feedback bias can be reduced by diversifying and balancing the user base, experts in drug discovery cannot be hired cheaply. There are only a few experienced drug hunters who can traverse biology, chemistry, and medicine, and even fewer who can cover multiple therapeutic areas. These experts are expensive, busy, and rarely work for companies developing solutions for AI-driven drug discovery. This poses a significant challenge for the training and validation of generative AI systems and is one of the key reasons for the slow adoption and limited impact of these systems in the pharmaceutical industry. Finally, many properties of the generated biological, chemical, or clinical data cannot be predicted even by expert humans with decades of experience and require experimental validation in biological systems.
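To make the dependence on human preferences concrete, the reward-modelling step of RLHF can be sketched as a pairwise preference model. This is a minimal illustration rather than any company's pipeline: it assumes a linear reward over hypothetical two-number feature vectors and a Bradley-Terry style loss, and the toy preference pairs are invented for the example. Every training pair here stands for one expert judgment, which is exactly the resource the paragraph above describes as scarce.

```python
import math


def reward(weights, features):
    """Linear reward: a weighted sum of candidate features (an assumed model)."""
    return sum(w * f for w, f in zip(weights, features))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_loss(weights, preferred, rejected):
    """Bradley-Terry loss: -log sigmoid(r(preferred) - r(rejected))."""
    margin = reward(weights, preferred) - reward(weights, rejected)
    # Stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Fit the reward weights by stochastic gradient descent on the pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Derivative of the loss with respect to the margin.
            grad_scale = -(1.0 - sigmoid(margin))
            for i in range(n_features):
                weights[i] -= lr * grad_scale * (preferred[i] - rejected[i])
    return weights


# Hypothetical expert judgments: (preferred candidate, rejected candidate).
pairs = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.3], [0.1, 0.5]),
    ([0.8, 0.6], [0.3, 0.6]),
]
weights = train_reward_model(pairs, n_features=2)
```

After training, `reward(weights, candidate)` ranks each preferred candidate above its rejected counterpart. The learned reward can only be as reliable as the labeled pairs, so a small or biased pool of experts limits the model directly.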
In this study, we used cryo-electron microscopy to determine the structures of the Flotillin protein complex, part of the Stomatin, Prohibitin, Flotillin, and HflK/C (SPFH) superfamily, from cell-derived vesicles without detergents. It forms a right-handed helical barrel consisting of 22 pairs of Flotillin1 and Flotillin2 subunits, with a diameter of 32 nm at its wider end and 19 nm at its narrower end. Oligomerization is stabilized by the C-terminus, which forms two helical layers linked by a β-strand, and by coiled-coil domains that enable strong charge-charge inter-subunit interactions. Flotillin interacts with membranes at both ends: through its SPFH1 domains at the wide end and through the C-terminus at the narrow end, facilitated by hydrophobic interactions and lipidation. The inward tilting of the SPFH domain, likely triggered by phosphorylation, suggests a role in inducing membrane curvature, which could be connected to its proposed role in clathrin-independent endocytosis. The structure suggests a shared architecture across the family of SPFH proteins and will promote further research into Flotillin’s roles in cell biology.
Gene editing has the potential to solve fundamental challenges in agriculture, biotechnology, and human health. CRISPR-based gene editors derived from microbes, while powerful, often show significant functional tradeoffs when ported into non-native environments, such as human cells. Artificial intelligence (AI) enabled design provides a powerful alternative with the potential to bypass evolutionary constraints and generate editors with optimal properties. Here, using large language models (LLMs) trained on biological diversity at scale, we demonstrate the first successful precision editing of the human genome with a programmable gene editor designed with AI. To achieve this goal, we curated a dataset of over one million CRISPR operons through systematic mining of 26 terabases of assembled genomes and metagenomes. We demonstrate the capacity of our models by generating 4.8x the number of protein clusters across CRISPR-Cas families found in nature and by tailoring single-guide RNA sequences for Cas9-like effector proteins. Several of the generated gene editors show comparable or improved activity and specificity relative to SpCas9, the prototypical gene editing effector, while being 400 mutations away in sequence. Finally, we demonstrate that an AI-generated gene editor, denoted OpenCRISPR-1, is compatible with base editing. We release OpenCRISPR-1 publicly to facilitate broad, ethical usage across research and commercial applications.
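A figure such as "400 mutations away" is a count of differing positions between two aligned protein sequences. The abstract does not say how the distance was computed, so the sketch below is only an assumed, simplified reading: it takes two sequences that are already aligned to equal length and counts mismatched positions. The function names and the short toy sequences are hypothetical and are not real Cas9 fragments.

```python
def count_differences(aligned_a: str, aligned_b: str) -> int:
    """Count positions that differ between two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(aligned_a, aligned_b) if a != b)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Share of aligned positions that match, as a percentage."""
    differences = count_differences(aligned_a, aligned_b)
    return 100.0 * (len(aligned_a) - differences) / len(aligned_a)


# Toy ten-residue sequences, invented for illustration only.
natural = "MKRNYILGLD"
designed = "MKKNYVLGLE"

n_differences = count_differences(natural, designed)
identity = percent_identity(natural, designed)
```

For unaligned sequences of different lengths, a proper alignment step would be needed before counting; that step is omitted here.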
The speaker discusses the potential of generative AI to revolutionize the life sciences, particularly the design and engineering of proteins. The key points are:
1. Traditional approaches have been limited in their ability to read and understand the code of life (DNA sequences) and engineer proteins with desired functions.
2. Generative AI models, trained on large datasets of DNA sequences and protein structures, can learn the patterns and generate new DNA sequences encoding proteins with specific functions and properties.
3. The company has successfully used these models to generate antibodies that bind to targets with high affinity and desired properties, outperforming conventional methods.
4. The models have also demonstrated the ability to design enzymes with multiple mutations, peptides, and multi-protein scaffolds, expanding the scope of protein engineering.
5. The speaker suggests that generative AI could unlock vast unexplored spaces of potential protein sequences, likening it to the age of exploration, where only a tiny fraction of the possibilities have been explored by nature.
6. This approach holds the promise of developing novel and important medicines and biotechnologies by accessing regions of the “protein universe” that were previously inaccessible.
In summary, the speaker highlights the transformative potential of generative AI in protein design and engineering, enabling the exploration and creation of new proteins with desired properties, which could lead to significant advancements in medicine and biotechnology.