Catching Immune-Evading Viruses with a Language Model - Sciencetimes

One reason it is difficult to develop an effective vaccine against some viruses, such as influenza and the human immunodeficiency virus (HIV), is that these viruses mutate very quickly.

Viruses that have mutated can avoid the antibodies produced by a given vaccine through a process called ‘viral escape’.

Even for the COVID-19 vaccines, which began to be administered only a short while ago, suspected variants of the virus have appeared in various places around the world. This has led some to question the effectiveness of the vaccines. Indeed, there are many strains of influenza virus, and new vaccines are produced and administered each year in anticipation of new strains.

Recently, a team at the Massachusetts Institute of Technology (MIT) in the United States devised a method of computationally modeling ‘viral escape’ based on a language analysis model, drawing attention as to whether it can mark a turning point in the production of vaccines against viruses and cancer.

This model can predict which parts of the viral surface protein have a higher probability of mutation that allows viral escape, so in theory, effective vaccines can be developed targeting regions with a lower probability of mutation.

The method for computationally modeling ‘viral escape’ devised by the MIT team is expected to help develop vaccines effective against flu, HIV, and COVID-19. © MIT News

“Viral escape is a big problem”

“Viral escape is a big problem,” said Bonnie Berger, a chaired professor of mathematics and a member of the Computer Science and Artificial Intelligence Laboratory at MIT. “Because of viral escape, there is no universal flu vaccine that works well against the various flu viruses, and because of viral escape through mutation of the HIV envelope surface protein, there is no HIV vaccine.”

Against this background, Professor Berger’s team identified possible vaccine targets against influenza, HIV, and SARS-CoV-2, the virus that causes COVID-19, and published the results in the scientific journal Science on the 15th.

After the paper was accepted for publication, the research team applied the model to the new SARS-CoV-2 variants that recently appeared in the UK and South Africa. This analysis has not yet been peer reviewed, but it flagged viral genetic sequences that need further investigation as to whether they could allow SARS-CoV-2 to evade the vaccines now being administered.

Professor Berger and Bryan Bryson, an assistant professor in the Department of Biological Engineering, are the senior authors of the paper, and Brian Hie, a doctoral student in computer science, is the first author.

Influenza, HIV, and the COVID-19 virus, the subjects of this study, cause many deaths every year. Image captured from video. © AI & Health at MIT

A language pattern prediction model used as the analysis model

Different types of viruses acquire genetic mutations at different rates, and HIV and flu viruses are known to be among the fastest mutating classes.

To analyze these different types of viral escape, the research team chose one kind of computational model from among the language models used in natural language processing (NLP) and built its modeling around two criteria. The first is that, for a mutation to promote viral escape, it must change the shape of the virus surface protein so that antibodies can no longer bind to it. The second is that the protein must still function properly even with its shape changed.

The model was originally designed to analyze patterns in language, especially the frequency with which certain words appear together.

For example, such a model can predict words that could complete a sentence such as “Sally ate eggs for …”. The chosen word must be grammatically correct and have the right meaning; in this example, the NLP model might predict ‘breakfast’ or ‘lunch’ as the meal at which the eggs were eaten.
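The idea of predicting a word from how often words appear together can be illustrated with a very small sketch. The corpus, the function names, and the two-word context below are all invented for illustration; the model used in the study is a far more capable neural language model, not a frequency counter.

```python
from collections import Counter, defaultdict

# Toy corpus invented for illustration; a real model trains on far more text.
CORPUS = [
    "sally ate eggs for breakfast",
    "tom ate eggs for breakfast",
    "sally ate eggs for lunch",
    "sally ate soup for dinner",
    "tom ate eggs for breakfast today",
]


def train_cooccurrence(sentences):
    """Count which word follows each two-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - 2):
            context = (words[i], words[i + 1])
            counts[context][words[i + 2]] += 1
    return counts


def predict_next(counts, context, top_k=2):
    """Return the top_k most frequent continuations with their relative frequency."""
    followers = counts.get(tuple(context), Counter())
    total = sum(followers.values())
    return [(word, n / total) for word, n in followers.most_common(top_k)]


counts = train_cooccurrence(CORPUS)
predictions = predict_next(counts, ("eggs", "for"))
print(predictions)
```

On this toy corpus, ‘breakfast’ comes out ahead of ‘lunch’ simply because it follows “eggs for” more often.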

In this analogy, ‘grammar’ corresponds to the rules that determine whether the protein encoded by a specific sequence is functional, and ‘semantics’, or meaning, corresponds to whether the protein can take on a new shape that lets it evade antibodies.

The team’s insight was that this kind of model could also be applied to biological information such as genetic sequences.
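The two criteria can be combined into a single priority list, as in the rough sketch below. It assumes the model has already produced two numbers for each candidate mutation: a ‘grammaticality’ value (how plausible a functional protein is) and a ‘semantic change’ value (how much the protein’s learned representation shifts). The mutation labels and scores are made up, and the rank-sum rule with a weight `beta` is an assumption chosen for illustration, not necessarily the exact scoring used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Mutation:
    name: str               # hypothetical label, not a real variant
    grammaticality: float   # assumed model output: higher = more likely functional
    semantic_change: float  # assumed model output: higher = larger shift in "meaning"


def rank_ascending(values):
    """Give the smallest value rank 1 and the largest rank len(values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def prioritize(mutations, beta=1.0):
    """Order mutations so that those high on BOTH criteria come first."""
    g_ranks = rank_ascending([m.grammaticality for m in mutations])
    s_ranks = rank_ascending([m.semantic_change for m in mutations])
    scored = [(s + beta * g, m.name) for m, g, s in zip(mutations, g_ranks, s_ranks)]
    return [name for _, name in sorted(scored, key=lambda t: -t[0])]


# Made-up candidates: only A10T is both plausible and strongly changed.
candidates = [
    Mutation("A10T", grammaticality=0.90, semantic_change=0.90),
    Mutation("G22D", grammaticality=0.05, semantic_change=0.70),
    Mutation("K35R", grammaticality=0.85, semantic_change=0.30),
    Mutation("S47N", grammaticality=0.40, semantic_change=0.10),
]
ranked = prioritize(candidates)
print(ranked)
```

A mutation that changes the protein greatly but breaks its function (G22D here) does not reach the top of the list, which mirrors the requirement that an escape mutation satisfy both criteria at once.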

The research team explained that the core of this study is ‘constrained semantic change search’, which looks for grammatical mutations whose meaning changes significantly. Image captured from video. © AI & Health at MIT

Only easily obtained genetic sequence information is needed

“If a virus wants to escape the human immune system, it doesn’t want to mutate in a way that kills it or leaves it unable to replicate,” Hie said. “It wants to preserve its fitness but disguise itself enough that the human immune system cannot detect it.”

To model this process, the team trained an NLP model to analyze patterns found in genetic sequences. The trained model could then predict new genetic sequences that have new functions but still follow the biological rules of protein structure.

One important advantage of this modeling is that it only requires genetic sequence information, which is much easier to obtain than protein structure. It also has the advantage of being able to train with a relatively small amount of information. The research team used 60,000 HIV sequences, 45,000 flu sequences, and 4,000 coronavirus sequences in this study.

“Language models are very powerful because they can learn this complex distributional structure and gain insight into function just from sequence variation,” Hie said. “We have a large amount of viral sequence data for each amino acid position, and the model learns the properties of amino acid co-occurrence and co-variation across the training data.”

The research team used the grammar and the semantics of the language analysis model together in developing its modeling to block ‘viral escape’. Image captured from video. © AI & Health at MIT

Blocking viral escape

Once the model was trained, the researchers used it to predict which sequences of the coronavirus spike protein, the HIV envelope protein, and the influenza hemagglutinin (HA) protein are more or less likely to produce escape mutations.

In the case of flu, the model found that the sequences least likely to mutate and produce viral escape were in the stem of the HA protein. This is consistent with recent research showing that antibodies targeting the HA stem can provide nearly universal protection against all flu strains, the researchers said.

For the coronavirus, the model’s analysis showed that a part of the spike protein called the S2 subunit was the least likely to produce escape mutations.
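How per-position predictions might be rolled up into a statement about a whole region, such as a stem or a subunit, can be sketched as follows. The scores, the region names, and the simple averaging are all hypothetical; they do not reproduce the study’s data or its actual analysis.

```python
from statistics import mean

# Made-up escape scores for each position of a hypothetical 12-residue protein.
escape_scores = [0.9, 0.8, 0.7, 0.85, 0.3, 0.2, 0.25, 0.35, 0.6, 0.5, 0.55, 0.65]

# Hypothetical region boundaries as (start, end), with end exclusive.
regions = {"region_A": (0, 4), "region_B": (4, 8), "region_C": (8, 12)}


def region_escape_potential(scores, regions):
    """Average the per-position escape scores inside each region."""
    return {name: mean(scores[start:end]) for name, (start, end) in regions.items()}


def least_escape_prone(scores, regions):
    """Name the region with the lowest average escape score."""
    potentials = region_escape_potential(scores, regions)
    return min(potentials, key=potentials.get)


potentials = region_escape_potential(escape_scores, regions)
target = least_escape_prone(escape_scores, regions)
print(potentials, target)
```

In this toy example the region with the lowest average score would be the natural candidate for a vaccine target, in the same spirit as the HA stem and the S2 subunit in the study.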

The paper, published in the January 15 issue of the scientific journal Science. © AAAS / Science

Research under way to identify cancer vaccine targets

Currently, the question remains as to how quickly the SARS-CoV-2 virus will mutate. Therefore, it is unclear how long the effectiveness of the currently distributed COVID-19 vaccine will last. Early evidence suggests that the SARS-CoV-2 virus is unlikely to mutate as quickly as the flu virus or HIV.

Nevertheless, the researchers believe that the new mutations that have recently appeared in Singapore, South Africa, and Malaysia should be identified and investigated for their potential to enable immune escape. In their study of HIV, the researchers found that the V1-V2 hypervariable region of the envelope protein has many possible escape mutations, consistent with previous findings, and they also found sequences with a low probability of escape.

They are currently working with other researchers to use their models to identify possible targets in cancer vaccines that destroy tumors by stimulating the body’s immune system.

The team also said that their model could be used to design small molecule drugs that are less likely to develop resistance in diseases like tuberculosis.

Professor Bryson said, “There are many opportunities to use models, and the future is bright as we can easily generate the necessary sequence data.”
