Put on your AI Reading Genes

Can AI predict viral mutations before they occur?

February 7, 2021
2 minute read

Viruses mutate all the time, and SARS-CoV-2 is no exception. MIT computational biologist Bonnie Berger has developed an NLP model that can help identify potentially dangerous viral mutations by effectively reading the genetic code of a virus.

What is NLP?

Natural language processing, or NLP, is a subset of artificial intelligence that allows computers to interpret human language. One way these models work is by relating words according to how similar their meanings are. For example, the words “happy” and “cheerful” would be more closely related than “happy” and “angry.”
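To make "closely related" a little more concrete, here is a minimal sketch of one common way to compare word meanings: represent each word as a list of numbers and measure the angle between them (cosine similarity). The three-number vectors in TOY_VECTORS are made up by hand for illustration; real models learn much longer vectors from large amounts of text.

```python
import math

# Hypothetical, hand-made "meaning" vectors, for illustration only.
# Real NLP models learn vectors with hundreds of dimensions from text.
TOY_VECTORS = {
    "happy": [0.9, 0.8, 0.1],
    "cheerful": [0.85, 0.9, 0.15],
    "angry": [-0.8, 0.7, 0.2],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity(word_a, word_b):
    """Look up two words in the toy table and compare their vectors."""
    return cosine_similarity(TOY_VECTORS[word_a], TOY_VECTORS[word_b])


if __name__ == "__main__":
    print("happy vs cheerful:", similarity("happy", "cheerful"))
    print("happy vs angry:   ", similarity("happy", "angry"))
```

With these toy numbers, the score for happy and cheerful comes out higher than the score for happy and angry, which is the kind of relationship a trained model is meant to capture.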

From Sentences to Sequences: How does NLP work on genetic code?

Short answer: it’s all based on analogy.

If you think of the entire genetic sequence of a virus as a really long sentence, the individual components of that sequence can be thought of as words. When a virus mutates, the “words” of its genetic sequence change. Whether these mutations affect the overall behavior of the virus in a profound way (e.g., make the virus more infectious or resistant to treatment) depends on which changes occurred.

Going back to our previous example, replacing the word “happy” with “cheerful” in the sentence “I am happy” does not change the meaning too drastically, but replacing “happy” with “angry” gives the sentence an entirely different meaning.

The same idea holds true for genetic mutations in viruses: the behavior of the virus will only change if its genetic code mutates in a consequential way. When Berger’s NLP model is given the genetic sequence of a new strain, it is able to read this sequence and find the mutations within it that may affect viral behavior.
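As a rough illustration of the idea (and not Berger's actual model), the sketch below compares a reference sequence with a variant, lists the positions where they differ, and ranks those changes by how far apart the old and new "words" are in a made-up vector table. The names TOY_EMBEDDING, find_substitutions, change_score, and rank_mutations are hypothetical, as are all the numbers. A real language model would also judge each change in the context of the surrounding sequence, which this toy version ignores.

```python
import math

# Hypothetical 2-number vectors for a few sequence "words" (single letters here).
# The values are invented for illustration; a real model would learn them.
TOY_EMBEDDING = {
    "A": (0.10, 0.20),
    "G": (0.15, 0.25),  # deliberately close to "A"
    "K": (0.90, -0.60),  # deliberately far from "A"
    "L": (0.30, 0.10),
}


def find_substitutions(reference, variant):
    """Return (position, old, new) for each place two sequences differ.

    Positions are 1-based. This sketch only handles equal-length sequences.
    """
    if len(reference) != len(variant):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        (i + 1, old, new)
        for i, (old, new) in enumerate(zip(reference, variant))
        if old != new
    ]


def change_score(old, new):
    """Toy 'change in meaning': distance between the two letters' vectors."""
    return math.dist(TOY_EMBEDDING[old], TOY_EMBEDDING[new])


def rank_mutations(reference, variant):
    """List substitutions, largest toy change first."""
    substitutions = find_substitutions(reference, variant)
    return sorted(
        substitutions,
        key=lambda sub: change_score(sub[1], sub[2]),
        reverse=True,
    )


if __name__ == "__main__":
    for position, old, new in rank_mutations("ALAGA", "GLAGK"):
        print(f"position {position}: {old} -> {new}")
```

In this toy setup, swapping A for G plays the role of happy versus cheerful (a small change), while swapping A for K plays the role of happy versus angry (a large one), so the A to K change is listed first.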

Looking Ahead

Although the model is not perfect (it missed a potential vaccine-resistant mutation in a South African COVID variant), the technology is promising and could be applied beyond the COVID pandemic, for example to find drug-resistant cancer mutations.