Pathogens, Poetry, and Predicting Evolution
What if we were fluent in the language of biology? Brian Hie is on a mission to translate life’s code, predict viral evolution, and revolutionize medicine.
Imagine a genome as a vast, ancient text: each gene a sentence, each protein a carefully crafted paragraph. Evolution has revised this manuscript countless times—adding footnotes, deleting words, and creating entirely new chapters. Just when you think you’ve grasped a sentence, a few of the letters change.
This ever-changing puzzle is what researchers face as they race to develop vaccines and treatments for rapidly evolving viruses like SARS-CoV-2 (the cause of COVID-19), Ebola virus, and HIV. It takes a scientist who can tackle problems from multiple angles to stay one step ahead of evolution. This is where Brian Hie steps in.
As an assistant professor of chemical engineering and the Dieter Schwarz Foundation Stanford Data Science Faculty Fellow, Hie is building tools to outsmart deadly pathogens before they strike. With a background that spans both data science and the humanities, he approaches his work with an eye for patterns and connections across disciplines, aiming to predict viral mutations and accelerate the creation of life-saving medicines.
To succeed in this kind of work, a researcher must master not only algorithms but also biology and chemistry to apply the right tools to the right problems. This ability to bridge disciplines defines Hie’s approach—and is central to Stanford Data Science’s mission.
Stanford Data Science (SDS) brings together researchers from across the university to tackle critical challenges, from public health to sustainability, by developing and applying advanced data science methods. In a time when data science expertise is crucial across so many fields, SDS serves as a unique hub, where experts in areas like engineering, medicine, and the humanities collaborate to advance both fundamental data science research and its applications. By recruiting and supporting researchers like Hie who work at the intersection of fields, SDS empowers Stanford to push the boundaries of what’s possible and remain at the forefront of interdisciplinary innovation.
Chris Mentzel, executive director of Stanford Data Science, believes Hie’s work will reshape medicine and engineering for decades to come. “The lines between fields are blurring, and when you bring in people who are comfortable working across those boundaries, that’s where a lot of new science originates,” Mentzel says. “In 20 years, we might all be using drugs created from the discoveries Brian Hie is making today.”
Combining mathematics and metaphors
Well before joining the Stanford faculty, the Farm shaped Hie’s intellectual trajectory—first as an undergraduate (class of 2016) and later as a Stanford Science Fellow. Both experiences encouraged expansive thinking, cultivating the intellectual flexibility that now characterizes his work.
Hie recalls an undergraduate term spent navigating back-to-back courses on convex optimization and Baroque literature. One moment, he was lost in matrix multiplication; the next, immersed in Don Quixote’s winding adventures. For most, the leap between mathematical precision and Baroque complexity might seem jarring. “That context switch was what I enjoyed most,” he remembers.
Even then, Hie appreciated how Stanford encouraged him to explore the university’s broad intellectual landscape. Yet, that same freedom left his next step after graduation unclear. Majoring in computer science and minoring in English literature, he applied to doctoral programs in both fields, hoping the admissions committees would decide for him. No such luck—he was accepted to MIT for computer science and Harvard for English literature.
“In an alternate reality, I’m a Renaissance scholar,” Hie laughs.
Though he chose to pursue computer science at MIT, he continued to nurture his passion for literature, taking courses at Harvard on English Renaissance poetry.
Engineering proteins to outsmart pandemics
Hie wanted to use his skills to develop better treatments for disease, but he faced one significant challenge: To have the greatest impact, he would need to pair his data science expertise with a knowledge of biochemistry that he did not yet possess. So, after completing his PhD, he returned to Stanford as a Stanford Science Fellow—a three-year award for early-career scientists pursuing innovative research. Thanks to the fellowship’s flexible funding, he was able to address his knowledge gap by becoming a practicing biochemist in the lab of Peter Kim, the Virginia and D.K. Ludwig Professor of Biochemistry at the Stanford School of Medicine.
It’s a rare scientist who can start from scratch in another field immediately after completing a PhD. “I was learning from undergraduates at first,” Hie recalls. “But I didn’t mind that; it was a different kind of training, and I think it’s the kind of opportunity that’s unique to the Stanford ecosystem.”
In an experimental science like biochemistry, researchers must conduct hands-on lab work, design and run experiments, and collect data firsthand—a skill set far removed from computer modeling and data analysis. “Brian came to my research group with tremendous expertise in computer science but also determined to become an experimentalist,” says Kim. “After a couple of years he became an outstanding, bona fide experimentalist.”
Xiaojing Gao, assistant professor of chemical engineering at Stanford, highlights how uncommon it is for a data scientist to conduct biochemistry experiments and gather data. Hie, Gao says, learned to do this because he wasn’t content with being “the software engineer that swoops into an experimental field just to analyze data.” Instead, he learned how to read molecular data through the eyes of a biochemist.
Working with Kim, Hie set out to improve how antibodies bind to viruses. Using machine learning algorithms originally developed to predict human language, Hie trained models on large datasets of viral sequences. Like human languages, viral sequences encode complex meanings and follow intricate rules. When a virus mutates to dodge the immune system, Hie describes it as changing its meaning—so the immune system no longer recognizes it—while still obeying the grammar of biological rules that enable it to function as a virus.
It may seem surprising that the same kind of model helping you write emails can predict pathogen evolution, but Hie explains that genetic code is no more difficult for a model to learn than human language.
However, while we are fluent in human languages, we are still in the early stages of learning the genetic code. “Right now, our tools are limited, and we’re trying to ‘write’ biology with only a few letters of the alphabet,” says Hie. “We’re like toddlers learning to speak, but we want to gain full fluency in this language.”
Hie and Kim faced the common challenge of having limited data, which often hampers machine learning in molecular biology. To overcome this, they used a technique called “transfer learning.” First, they trained their model on a task with abundant data—predicting protein sequences that would bind to a known structure—then transferred that knowledge to the data-poor task of predicting mutations to improve antibody binding.
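The idea behind transfer learning can be illustrated with a deliberately tiny example. The sketch below is not Hie and Kim's model: it swaps in a plain linear regression trained by gradient descent, and every name, dataset size, and learning rate in it is an assumption chosen for illustration. It pretrains on a data-rich "source" task, then fine-tunes on just five examples from a closely related "target" task, and compares that against training on the five examples from scratch.

```python
import random

def predict(w, x):
    """Linear model: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, X, y):
    """Mean squared error of weights w on dataset (X, y)."""
    return sum((predict(w, x) - t) ** 2 for x, t in zip(X, y)) / len(y)

def gradient_descent(w_init, X, y, lr, steps):
    """Full-batch gradient descent on squared error, starting from w_init."""
    w = list(w_init)
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def make_data(rng, w_true, n, noise):
    """Synthetic regression data generated from known weights (toy stand-in)."""
    X = [[rng.gauss(0, 1) for _ in w_true] for _ in range(n)]
    y = [predict(w_true, x) + rng.gauss(0, noise) for x in X]
    return X, y

rng = random.Random(0)
dim = 10
# Hypothetical setup: the target task's true weights are a small
# perturbation of the source task's, i.e. the two tasks are related.
w_source = [rng.gauss(0, 1) for _ in range(dim)]
w_target = [w + rng.gauss(0, 0.1) for w in w_source]

X_big, y_big = make_data(rng, w_source, n=500, noise=0.1)    # data-rich task
X_small, y_small = make_data(rng, w_target, n=5, noise=0.1)  # data-poor task
X_test, y_test = make_data(rng, w_target, n=200, noise=0.0)  # held-out target data

zeros = [0.0] * dim
w_pretrained = gradient_descent(zeros, X_big, y_big, lr=0.1, steps=100)
w_scratch = gradient_descent(zeros, X_small, y_small, lr=0.02, steps=50)
w_transfer = gradient_descent(w_pretrained, X_small, y_small, lr=0.02, steps=50)

print("from scratch, test MSE:", mse(w_scratch, X_test, y_test))
print("with transfer, test MSE:", mse(w_transfer, X_test, y_test))
```

Because five examples cannot pin down ten weights, the from-scratch model leaves much of the target task unlearned, while the fine-tuned model starts from weights that are already close. The benefit depends on the two tasks being related, which is an assumption built into this toy data.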
How does this approach differ from conventional methods of predicting viral evolution? Traditional methods rely on a slow, guess-and-check process of mutating specific amino acids (the building blocks of proteins) until a sequence improves protein function. But the number of possible amino acid sequences is astronomically large, so this is like searching for a needle in a haystack the size of the universe: expensive and often impossible.
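To put the haystack in numbers, here is a short back-of-the-envelope calculation. It assumes the 20 standard amino acids and a hypothetical protein 100 residues long; the length and the function names are illustrative choices, not figures from Hie's work.

```python
from math import comb

AMINO_ACIDS = 20  # the standard amino acid alphabet

def sequence_space_size(length: int) -> int:
    """Number of distinct sequences of the given length."""
    return AMINO_ACIDS ** length

def variants_with_k_mutations(length: int, k: int) -> int:
    """Number of variants differing from one starting sequence at exactly k positions."""
    return comb(length, k) * (AMINO_ACIDS - 1) ** k

length = 100  # hypothetical protein length
print("digits in 20^100:", len(str(sequence_space_size(length))))  # 131
print("single mutants:", variants_with_k_mutations(length, 1))     # 1900
print("double mutants:", variants_with_k_mutations(length, 2))     # 1786950
```

Even restricting the search to variants one or two mutations away from a starting sequence quickly outgrows what a lab can test by hand, which is why a model that can rank candidate mutations before any experiment is run is so valuable.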
Hie and Kim tested their new model in the lab by attempting to improve a COVID-19 antibody that had been discontinued due to poor performance against the Omicron variant. Previous attempts to optimize this antibody with machine learning had only led to a modest two-fold improvement, convincing many that it wasn’t worth further effort. Hie’s model, however, didn’t just succeed—it led to a 25-fold improvement against Omicron.
Since then, other scientists have been eager to apply Hie’s model to their own research. Gao has used it to improve a protein with potential therapeutic applications for genetic disorders. While many computational protein models have high technical barriers that limit their accessibility, Hie’s is notably user-friendly. “All it takes is to run a few lines of code,” Gao says.
Modeling evolution on an epic scale
For Hie, who had formative experiences at Stanford as both an undergraduate and a Stanford Science Fellow, the decision to join the faculty was an easy one.
“Stanford has experts in everything from computer science to biology to Renaissance literature. And Stanford encourages collaboration—it’s part of the culture,” says Hie. “It’s intellectually rich and diverse but also deeply specialized.”
As one of SDS’s first joint hires with the School of Engineering, Hie now leads the Evolutionary Design Lab, where he continues developing tools to predict and respond to pathogen evolution. This joint appointment allows him to pursue interdisciplinary research without being confined to the traditional silos of data scientist or chemical engineer, enabling him to blend approaches from both fields to address complex biological challenges.
Most breakthroughs at the intersection of machine learning and biology have focused on individual molecules, such as proteins. But as Hie points out, the complexity of a single cell is orders of magnitude greater than that of any one molecule. With this in mind, he developed Evo, a genomic-scale foundation model that goes beyond proteins to include DNA, RNA, and systems where all three interact.
“Most biological functions involve multiple molecules working together, so we’re trying to model that complexity,” explains Hie.
While there is more work ahead, Evo holds the potential to revolutionize our understanding of diseases, enable personalized medicine tailored to individual genomes, directly target cancer cells, and accelerate the discovery of new drugs.
“If we could design complex molecular systems, we could reprogram cells to restore them to a healthy state rather than treating single proteins in isolation, which is a bit like playing Whac-A-Mole,” Hie says. “We could engineer biological systems to capture carbon more efficiently. There are also applications in biomanufacturing, like producing drugs, artificial blood, or breast milk.”
Hie, for his part, is characteristically droll about the breadth of his work. “A lot of things are made of molecules,” he notes.
To help scientists create stronger, more effective medicines in response to evolving pathogens, Hie has made his models and their code freely available—and he’s already begun to see the reproducible benefits of his approach.
“I’ve given talks at pharmaceutical companies, and sometimes a researcher will come up to me and say, ‘We used your approach to improve our antibody, and it worked,’ which is really gratifying,” he says.
In the long term, Hie’s goal is simple: “The potential to help people is what drives me.” But on a day-to-day basis, he’s fueled by something else: “It’s really fun,” he says, explaining that his ideal day involves switching between machine learning challenges, engineering problems, wet lab experiments, and statistical analyses. And if there’s time for one more challenge, he’ll read a bit of Renaissance poetry.