Structural biology will continue to use artificial intelligence

  • Nature published an editorial on July 27, 2021, reflecting on recent news coverage of AlphaFold.

     

    Machine learning will transform our understanding of protein folding. It is essential that all data be made openly available.

     

    "I never imagined we'd make it this far in my lifetime." So said a structural biology expert in response to AlphaFold's results.

     

    In this work, artificial intelligence (AI) was used to predict the structures of more than 20,000 human proteins, as well as those of almost all known proteins produced by 20 model organisms, including E. coli, Drosophila, yeast, soybean, and Asian rice. That is roughly 365,000 predicted structures in total.

     

    Researchers from DeepMind, a London-based artificial intelligence company owned by Google parent company Alphabet, and the European Bioinformatics Institute at the European Molecular Biology Laboratory (EBI-EMBL) near Cambridge, UK, released the data online on July 22 (https://alphafold.ebi.ac.uk).

     

    AlphaFold is a machine learning tool developed by the DeepMind team. It was trained on DNA sequences (including their evolutionary history) and the known structures of thousands of proteins held in the publicly accessible EBI-EMBL protein database. A week earlier, DeepMind had published the source code for AlphaFold along with detailed instructions on how it was developed, while academics at the University of Washington in Seattle released another protein structure prediction program, RoseTTAFold, which was inspired by AlphaFold.

     

    The publication of this catalog of predicted structures would be far less exciting if the data and methods were not openly and freely available. Structural biologists and other researchers have already been using AlphaFold to build more accurate models of proteins that are difficult or impossible to study with existing experimental methods.

     

    Speeding up structure prediction

     

    Since the discovery of the structure of DNA in 1953, predicting the 3D structure into which proteins fold has been one of the great unsolved problems in biology. Before artificial intelligence, predicting structure from sequence was a time-consuming, labor-intensive procedure with little guarantee of accuracy. AI techniques can now predict protein structures with high accuracy in minutes to hours. By comparison, determining the structure of one or two proteins used to take months, if not years. This opens up new possibilities for applications, such as designing enzymes that break down environmental contaminants like microplastics.

     

    This achievement was made possible by advances in basic research and technology, as well as by the sharing of open data. Structural biologists have pursued parallel approaches to understanding protein folding since the 1960s. One approach has been to understand the underlying physical forces in order to piece together the structure of proteins. Another takes advantage of a protein's evolutionary history to predict its shape by comparing it with closely related proteins. Imaging methods, such as X-ray crystallography and cryo-electron microscopy, have also become increasingly important.

     

    Some have likened the current breakthroughs, in terms of importance, to the first draft of the human genome sequence 20 years ago. There are indeed parallels. Both the Human Genome Project and DeepMind's library of predicted human protein structures offer a tool that will certainly speed up discoveries in their respective fields.

     

    The draft human genome was the result of a competition. Solving the protein folding problem has also benefited from a competition: a biennial event called the Critical Assessment of Structure Prediction, or CASP, which has been crucial to achieving these results.

     

    Nature interviewed nearly a dozen researchers in the field. The consensus is that it is too early to predict exactly what impact the use of AI in the life sciences will have, except that any impact will be transformative.

     

    To accurately predict how AI will affect biology, we would need adequate training data, which we do not yet have. Beyond research and data, AI also offers a window into models of research organization and management, which universities should be studying.