Behind the AlphaFold2 explosion: why humans keep doggedly pursuing protein science

by admin

Recently, explosive news hit the life sciences. DeepMind's AlphaFold2 model predicted structures for 98.5% of human proteins, and the resulting data set was released free and open for the research community. The open data set covers not only the human proteome but also the proteomes of 20 organisms commonly used in research, such as Escherichia coli, Drosophila, and mouse, for a total of more than 350,000 protein structures.

The goal of the AlphaFold2 effort is to provide predicted structures for all proteins with known sequences. DeepMind plans to increase the number of predictions to 130 million by the end of the year, which is about half of all known proteins.

The news set the research community abuzz: work that usually takes months or years can now be done in a few days. Everyone is admiring this epoch-making moment. DeepMind co-founder and CEO Demis Hassabis said: “I think this is the culmination of DeepMind’s entire 10-year life cycle.” For researchers, abundant protein structure data should make the next steps of research much smoother.

As the saying goes, experts see the substance while laypeople see the spectacle. Faced with the collective euphoria in academic circles, laypeople are left with a big question: what is the use of predicting so many protein structures? Why do humans keep struggling with proteins?

The significance of studying protein

Before answering this question, I have to mention the most important principle in the life sciences, the central dogma: within the cell, genetic information flows between biological macromolecules from DNA to RNA to protein (DNA→RNA→protein).

How should we understand it? As organisms reproduce from generation to generation, they copy the DNA molecules that carry their genetic material and pass them on to their offspring. Over the lifetime of each organism, this set of DNA molecules serves as the blueprint that guides the production of a large number of protein molecules, which perform all the functions that support the organism's survival and activity.

At one end of the central dogma is DNA, and at the other end is protein. DNA can be thought of as a factory's design drawings, and proteins as the parts built from them, each with its own function. Some proteins take part in the body's biochemical reactions (for example, the enzymes involved in digesting food), some transport oxygen in the blood (hemoglobin), some act as messengers that carry signals between cells, and some act as guards that take part in the organism's immune defenses. Together, the proteins that are designed, assembled, and delivered keep life processes such as heredity, development, reproduction, and metabolism running normally.
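The DNA→RNA→protein flow described above can be sketched in a few lines of Python. This is only a toy illustration: the codon table is a small hand-picked subset of the standard genetic code, the function names `transcribe` and `translate` are made up for this example, and the input is assumed to be a coding-strand DNA sequence that is read from its first base.

```python
# Toy codon table: a small subset of the standard genetic code,
# just enough for the example sequence below.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GUG": "V",  # valine
    "CAC": "H",  # histidine
    "CUG": "L",  # leucine
    "ACU": "T",  # threonine
    "CCU": "P",  # proline
    "GAG": "E",  # glutamate
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # "X" marks a codon that is missing from the toy table.
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGTGCACCTGACTCCTGAGTAA"
mrna = transcribe(dna)
print(mrna)             # AUGGUGCACCUGACUCCUGAGUAA
print(translate(mrna))  # MVHLTPE
```

The output is only the linear amino acid sequence. How that chain folds into a three-dimensional shape is the much harder question the rest of this article is about.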

Systematic, in-depth study of proteins lets us interpret the composition and operation of living things at a deeper level, reveal more fully how life works and develops, and drive progress in the biological sciences, drug discovery, and synthetic biology.

We learned a little about proteins in middle school. Proteins are a basic building material of living things and are made up of amino acids. Differences in the order and position of those amino acids make proteins extremely diverse and complex in structure, and the spatial structure and function of each protein can be quite different. Because the same chain can in principle fold in different ways, a protein's activity and biological properties are hard to predict, and this complexity has made protein research difficult from the start.

The twists and turns of protein research

Proteins were identified by French chemists as early as the 18th century, but because of technical limitations it was not until the early 20th century that scientists had techniques for studying them in depth. Because protein structures are so complex and varied, research and understanding have been extremely time-consuming and laborious.

For the early biochemists, the main difficulty was obtaining large amounts of purified protein to study, so much of the early work went into purifying various proteins. Later, in the 1950s, a company purified ribonuclease A from bovine pancreas and provided it to scientists free of charge, which opened the door to a large number of experiments.

In 1949, after eight years of work, the British biochemist Frederick Sanger determined the sequence of the 51 amino acids of insulin (a protein) and confirmed that proteins are linear polymers of amino acids. Sanger was awarded the 1958 Nobel Prize in Chemistry for this research. Other researchers used Sanger's method to sequence many other proteins quickly, and his work paved the way for the first artificial synthesis of insulin in 1965.

Humans got their first look at the molecular structure of proteins in 1959. The British scientist Max Perutz used X-ray diffraction, inferring the distribution of electrons from the angles at which the rays were scattered, to work out the three-dimensional structure of the hemoglobin molecule. Since then, X-ray diffraction has been the most powerful tool for resolving protein structures at high resolution. The other tools scientists later came to rely on are nuclear magnetic resonance and cryo-electron microscopy.

Even with these instruments, experimental structure determination remains limited and very costly. In the traditional workflow, going from a gene sequence to the corresponding protein structure requires gene expression, protein extraction and purification, crystallization, X-ray diffraction analysis, and other steps. Because protein structures and properties are so diverse, most of these steps have no fixed rules to follow.

Historically, it could take scientists years or even decades to obtain a clear three-dimensional structure of a single protein, and determining protein structures became one of the hardest problems in biology. So far, without the help of AI, only about 170,000 three-dimensional structures have been determined, a drop in the ocean compared with the total number of known proteins.

Even when we can see and measure the shape of a protein, there are on the order of 10^300 possible ways for the chain to fold in three-dimensional space. Why does it fold into the state we observe? The folding process and the path it takes still cannot be fully explained. Because both the methods and the subject matter are so difficult, studying and characterizing protein structure has been a hard road. For more than half a century, new discoveries in protein structure research have repeatedly been honored with Nobel Prizes. So far, the protein field alone has won more than 20 of them.
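To get a feel for the size of the 10^300 figure, here is a back-of-the-envelope calculation in Python. The sampling rate of 10^13 conformations per second is purely an assumption chosen for illustration, not a measured value, and the age of the universe is rounded to 13.8 billion years.

```python
import math

LOG10_CONFORMATIONS = 300        # the 10^300 figure quoted above
LOG10_SAMPLES_PER_SECOND = 13    # assumption: 10^13 conformations tried per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years


def log10_years_to_enumerate(log10_conformations, log10_rate):
    """Return log10 of the years needed to try every conformation once.

    Working in log10 space avoids handling astronomically large numbers.
    """
    log10_seconds = log10_conformations - log10_rate
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


log10_years = log10_years_to_enumerate(LOG10_CONFORMATIONS, LOG10_SAMPLES_PER_SECOND)
log10_universe_ages = log10_years - math.log10(AGE_OF_UNIVERSE_YEARS)

print(f"about 10^{log10_years:.1f} years")
print(f"about 10^{log10_universe_ages:.1f} times the age of the universe")
```

Under these assumptions, an exhaustive search would take on the order of 10^279 years. Real proteins typically fold in far less than a second, so they cannot be trying every conformation at random, and a computer cannot solve the problem by brute-force enumeration either.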

Another group of scientists stepped outside the mindset of direct observation altogether. They bypassed the laborious and costly experimental steps of traditional techniques and tried to compute and predict a protein's three-dimensional structure directly from its amino acid sequence.

R&D on the shoulders of AI giants

Predicting protein structure from amino acid sequence depends on advances in computing. In 1998, Professor David Baker of the University of Washington developed a computer program called “Rosetta” (after the Rosetta Stone) to predict protein structure. Computing power was limited, however, so a brute-force exhaustive search was out of the question. Early prediction therefore dealt mainly with proteins that have few amino acids and a relatively regular arrangement, and complex proteins remained out of reach.

To assess protein structure prediction methods objectively, a group of scientists led by John Moult of the University of Maryland founded CASP (Critical Assessment of Structure Prediction) in 1994. In CASP, predictors evaluate their methods in a double-blind framework, which promotes research, tracks progress, and establishes the state of the art in protein structure prediction.

Thanks to advances in deep neural networks, DeepMind's work shone in the 14th CASP competition. The team used an attention-based neural network, optimized end to end to build the overall structure, and fed it multiple sequence alignment information drawn from large sequence, structure, and metagenomic databases. Its median GDT-TS reached 92.4 points, far ahead of second place. What does that mean? Reportedly, a GDT-TS score of around 70 indicates that the model has the correct global and local topology. Above 80, the modeling of structural details becomes increasingly accurate, and above 95 the model is as accurate as one based on experimental data.
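For readers wondering how a GDT-TS number comes about, the commonly cited definition averages the percentage of residues whose predicted position lies within 1, 2, 4, and 8 angstroms of the experimental position. The sketch below is a simplified illustration, not the CASP evaluation code: it assumes the two structures have already been superimposed once, whereas the real procedure searches over many superpositions for each cutoff. The function name `gdt_ts` and the toy distances are invented for this example.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average, over the cutoffs, of the percentage of residues within each cutoff.

    `distances` holds one C-alpha deviation in angstroms per residue, measured
    after the predicted and experimental structures have been superimposed.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    percentages = [
        100.0 * sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return sum(percentages) / len(percentages)


# Hypothetical per-residue deviations for a 10-residue toy protein.
toy_distances = [0.4, 0.6, 0.8, 0.9, 1.5, 1.8, 2.5, 3.0, 5.0, 9.0]
print(gdt_ts(toy_distances))  # 67.5
```

In this toy case 40%, 60%, 80%, and 90% of residues fall within the four cutoffs, giving 67.5. A median of 92.4 across targets therefore means that on a typical target nearly every residue sat within a couple of angstroms of its experimentally determined position.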

As an auxiliary method for predicting protein structure, artificial intelligence uses brute-force learning to cut the time needed to arrive at a structure from what could be decades to a few days, and for simple protein molecules the predictions are already very accurate. This frees scientists to turn their attention to understanding in depth how the proteins themselves work.

Throughout the history of science, every major advance in a field has depended on the technology of its time. Whether in the difficult era of protein purification or in the era of cryo-electron microscopy for observing proteins, the tools scientists used reflected the most advanced technology then available. In the AI era, thanks to tremendous improvements in computing power and algorithms, we are witnessing a historic moment in protein structure prediction.

The AlphaFold2 database is now open and is continually adding new protein structure predictions, making it a treasure trove for scientists doing protein research. But computing a structure is only a first step in the biological sciences. It points out a direction, and what follows will still require experiments and hard thinking. Proteins unlike those in the existing structural data sets the model was trained on remain a mystery, which leaves scientists plenty of room for research.

Overall, however, high-accuracy models such as AlphaFold2 have greatly advanced scientists' research and broadened the scope of protein function analysis and downstream applications. Scientists can now pursue pioneering work in many fields, such as research on cancer and viral infections, the development of antibiotics and targeted drugs, and the design of new, efficient enzymes that contribute to health and environmental protection.

Standing on the shoulders of neural networks and deep learning, the life sciences have made a qualitative leap, and AI protein structure prediction no longer depends on human prior knowledge. Like AlphaGo, which caused a sensation a few years ago, AlphaFold has let deep learning and neural networks show their strength. Scientific innovation is inseparable from powerful technical tools, and the protein, a molecule that shapes the processes of life, has now been opened up for study. Technology has released this mass of protein structure information, and its interpretation and analysis may hold the key to vital information. The next revolutionary results in the life sciences may go beyond anything we can imagine. Life science research now offers something like the thrill of opening a blind box, a surprise I never expected, and I look forward to what comes next.
