December 1, 2020

I am obsessed with DeepMind. I have been following their work for years now. It is DeepMind that makes me think the more grandiose claims about AI – it will reach human intelligence in my lifetime; it will transform life in ever more dramatic ways – could be true.

DeepMind, for those of you who don’t know, is a British AI company. It was co-founded by the endearingly nerdy Demis Hassabis, child chess prodigy and co-creator, as a teenager, of Theme Park, the classic, genre-defining, millions-selling Bullfrog game. In 2014, DeepMind was bought by Google for half a billion dollars, and became world-famous two years later, when its game-playing AI AlphaGo defeated the world Go champion, Lee Sedol, four games to one.

Now its latest program, AlphaFold, has made a huge breakthrough in one of the great outstanding challenges of biology, the protein-folding problem. It is a huge deal from a biological point of view; but it is, perhaps, an even bigger deal from the point of view of how the science of the future gets done. And it is also another reminder that although DeepMind’s professed goals are ambitious to the point of being fantastical, it would be a brave punter who bet against them.

First, “protein folding”. Proteins are long molecules built as chains of simpler molecules called amino acids; there are 20 different amino acids, but they can be strung together in an effectively limitless number of different sequences.
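
To get a feel for how fast the number of possible sequences grows, here is a rough back-of-the-envelope sketch in Python. It assumes only what is stated above (20 amino acids, any of which can sit at any position); the function name `possible_sequences` is made up for illustration.

```python
AMINO_ACIDS = 20  # number of different amino acids, as stated in the text


def possible_sequences(length):
    """Count the distinct chains of a given length, assuming any of the
    20 amino acids can appear at every position."""
    return AMINO_ACIDS ** length


pairs = possible_sequences(2)         # chains of just two amino acids
short_chain = possible_sequences(10)  # chains of ten amino acids
digits_for_100 = len(str(possible_sequences(100)))  # size of the count for a 100-long chain

print(pairs, short_chain, digits_for_100)
```

Even a modest chain of 100 amino acids gives a count that runs to 131 digits, which is why "effectively limitless" is a fair description.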

The human body essentially runs on proteins. To a first approximation, what your DNA does is tell your cells which proteins to make. DNA has four “letters” (adenine, cytosine, guanine, thymine: A, C, G, T). A group of three of those letters is called a “codon”, and tells your cells to make a single amino acid. So, for instance, the codon CTT codes for the amino acid leucine. If you string a lot of them together, you get a protein. The length of DNA that codes for a protein is called a “gene”. It’s only relatively recently that scientists have discovered that your DNA does things other than tell your body what proteins to make.
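
The codon-to-amino-acid step can be sketched in a few lines of Python. This is only an illustration of the idea described above: the table is a tiny hand-picked subset of the full 64-codon genetic code, and the function name `translate` and the example DNA string are invented for the purpose.

```python
# A partial codon table (DNA letters). The real genetic code has 64 entries;
# only a handful are included here for illustration.
CODON_TABLE = {
    "ATG": "Met",   # methionine, also the usual "start" signal
    "CTT": "Leu",   # leucine, the example used in the text
    "GGA": "Gly",   # glycine
    "AAA": "Lys",   # lysine
    "TAA": "STOP",  # a stop codon: the chain ends here
}


def translate(dna):
    """Read DNA three letters at a time and return the amino acids in order.

    Raises KeyError if a codon is missing from this partial table.
    """
    protein = []
    usable_length = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    for i in range(0, usable_length, 3):
        codon = dna[i:i + 3].upper()
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


peptide = translate("ATGCTTGGAAAATAA")
print(peptide)
```

The output is just a list of amino acid names in order, which is exactly the "sequence" that the folding problem starts from.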

And proteins do everything. They build your body: large parts of your cells are made out of them. They communicate news around your body: many of your hormones are proteins. Enzymes, proteins again, control and accelerate the chemical reactions in your body. If you took proteins away, about a fifth of your body by mass would disappear, and the rest would immediately stop working.

So understanding proteins is important. But there’s a complicating factor: what a protein does is determined, not by which amino acids are where in its sequence, but by its 3D shape. Imagine a protein as being a bit like a piece of elastic with lots of magnets tied to it. While you hold it tight, the elastic stays straight: but when you let go of it, it coils up, into a complicated ball determined by which magnets attract which. Proteins do the same thing.

And the shape of that complicated ball is vital. For instance, enzymes catalyse chemical reactions because they have a kind of pit on their surface which fits the two molecules in the reaction. If the pit were a different shape, it wouldn’t work.

Working out a protein’s 3D structure, therefore, is important. If you want to make some new drug, there’s a good chance that it will involve a small molecule that fits into a pit on the surface of some protein; if a mutation causes some disease, there’s a good chance that it does so by making a protein form the wrong shape. (Sickle-cell anaemia and cystic fibrosis are both caused by protein misfolding, driven by a single mutation.)

Unfortunately, working out the structure of proteins is really hard. At the moment, scientists take a protein, dissolve it in water, and coax that solution into forming crystals. Then they diffract X-rays through the crystal, to work out the shape. (You may have heard of Rosalind Franklin’s work on X-ray crystallography helping to determine the shape of the DNA molecule; Dorothy Hodgkin, another British scientist, won the 1964 chemistry Nobel for her work on protein crystallography.)

But crystallography is a slow and complicated process. “It could take months or years of a PhD student’s career,” says Rahul Samant, a research group leader at the Babraham life sciences institute in Cambridge. “It’s a bit quicker these days, but we’re still talking about weeks or months to do a single protein, and then months more to analyse it.” There are hundreds of millions of proteins in the world, but the shape is known for only a few hundred thousand.

In 1972, the chemist Christian Anfinsen, in his acceptance speech for that year’s chemistry Nobel prize, suggested an alternative. It should be possible, he said, to predict the 3D shape of a protein just from the sequence of its amino acids – and, therefore, from its DNA code.

But that’s not easy. Imagine that stretchy length of elastic covered in magnets again. And now imagine, looking at the sequence of magnets, trying to work out in advance what shape the elastic would form when it bunched up. Working out that shape from the sequence is known as the “protein folding” problem, and it has proved to be extremely difficult.

So in 1994, the CASP (Critical Assessment of protein Structure Prediction) challenge was set up. Every two years, teams would compete to make the best prediction of various proteins’ shapes from their sequences alone. The shape of the proteins being assessed would be currently unknown but in the process of being researched – which meant that the teams couldn’t cheat, but that their work could be assessed against experimental results.

The CASP programs were assessed on a scale of 0 to 100. If a program scored 100, it would mean that it predicted where every single atom was, correct to within one angstrom – that is, about one atom’s width. If it scored 0, it meant that it was completely wrong. The CASP assessors said that the target was scoring 90 or above, on average, across all the proteins being assessed; 90 is arbitrary, but crystallography experiments can’t do that much better.
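
As a rough illustration of that 0-to-100 scale, here is a deliberately simplified Python sketch. It is not the real CASP measure (the Global Distance Test, which is more involved); it simply follows the description above and reports the percentage of atoms predicted to within one angstrom. It assumes the two structures are already lined up in the same coordinate frame, and the function name and the toy coordinates are invented.

```python
import math


def within_cutoff_score(predicted, experimental, cutoff=1.0):
    """Return the percentage (0-100) of atoms whose predicted position is
    within `cutoff` angstroms of the experimentally measured position.

    Simplified stand-in for the CASP score: assumes both structures are
    already superimposed and list their atoms in the same order.
    """
    if len(predicted) != len(experimental):
        raise ValueError("structures must have the same number of atoms")
    if not predicted:
        raise ValueError("structures must not be empty")
    hits = sum(
        1
        for p, e in zip(predicted, experimental)
        if math.dist(p, e) <= cutoff
    )
    return 100.0 * hits / len(predicted)


# Toy data: four atoms, coordinates in angstroms (made up for illustration).
experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
predicted = [(0.1, 0.0, 0.0), (1.5, 0.5, 0.0), (3.0, 0.0, 2.0), (4.5, 0.0, 0.0)]

score = within_cutoff_score(predicted, experimental)
print(score)
```

In the toy data, three of the four atoms land within an angstrom of where they should be, so the prediction scores 75.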

From 2006 to 2016, the best program at each competition managed somewhere around 30 or 40 on that scale. Then in 2018, AlphaFold scored almost 60. And, now, it has achieved 92, including over 87 on the very hardest proteins.

This might sound a bit blah. They scored X, now they scored a bit more than X. But it is a huge advance. “There are still a lot of questions to ask,” says Ewan Birney, deputy director-general of the European Molecular Biology Laboratory (EMBL). The scientific community will want to “kick it around a bit” – although, he says, the CASP assessment is incredibly rigorous, so it’s almost certainly accurate. And there are further layers to the problem – proteins that don’t form globular shapes; proteins that change shape. “But that shouldn’t diminish what they’ve done. This is a 50-year-old problem, and the AlphaFold team has made a real massive change, a phase change.”

The implications for biological science are obvious. If you can predict the shape of a protein in a few days rather than a few months – and the AlphaFold program runs on relatively modest resources, by supercomputer standards, in “a couple of days” – you can uncover potential targets for drug discovery much more quickly. Samant points out that you would still need to check whether your predictions are accurate, but it’s much, much easier to use crystallography to see whether a protein is the shape you think it is than to work out the shape from scratch.

That’s the near term. In the longer term, you can understand how the body works in far greater detail. “It’s not that AlphaFold suddenly understands how a human works,” says Birney, “but the tide has gone up by a massive level.” Professor Dame Janet Thornton, a pioneer of protein research who has been working on the folding problem for nearly 50 years, told a briefing held by the Science Media Centre ahead of the announcements that possible future applications could be designing enzymes that consume plastic waste, or that suck carbon out of the atmosphere, or improve crop yields.

Similarly, it could help us understand diseases like Alzheimer’s, which seems to have something to do with protein misfolding, as does Parkinson’s. Protein misfolding also appears to play a role in the development of some cancers. And DeepMind hopes it will have a role in future pandemic responses: AlphaFold was able to predict the shape of a protein, ORF3a, in SARS-CoV-2, as well as other coronavirus proteins. Understanding the shape should help make the discovery of future drugs and vaccines quicker. It is a sudden window into areas of basic science which were simply not visible before.

What fascinates me, though, is the AI angle. Birney made an interesting point when we spoke: that when DeepMind started out, they set out to make a single program that could play lots of Atari games. “People said, ‘you’re just playing silly games.’” Then they made a program that could become superhuman at Go, a fantastically deep and complex game, but nonetheless a game. Then they used essentially the same program to become enormously superhuman at chess. Now, a similar architecture – as far as I can tell, at least – can solve real scientific questions.

Hassabis, in the SMC briefing, compared the AlphaGo and AlphaFold breakthroughs by saying that both relied on something like human insight. With chess, there are something like 35 possible moves per turn, so to look ahead two moves you need to examine 35 × 35 moves (1,225); to look ahead five moves, it’s more than 52 million. With a powerful computer, you can do this kind of “brute-force” computing for a few moves, although chess programs still need to be intelligent as well as powerful.

But with Go, there are something like 200 possible moves per turn, and brute force is much less useful. Human Go players rely much more on intuition than human chess players do – this board position feels strong, in some wordless and ill-defined way; this board position feels weak. AlphaGo developed some sort of equivalent to this insight; it worked out what board positions and moves felt strong, with some kind of high-level pattern recognition, from playing hundreds of millions of games against itself.
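
The arithmetic behind that comparison is easy to check. The sketch below uses only the rough figures quoted in the text (about 35 moves per turn for chess, about 200 for Go) and assumes, as a simplification, that the number of options is the same at every turn; the function name is invented for illustration.

```python
def positions_to_examine(moves_per_turn, moves_ahead):
    """Number of move sequences a brute-force search must consider,
    assuming the same number of options at every turn."""
    return moves_per_turn ** moves_ahead


chess_two = positions_to_examine(35, 2)   # look ahead two moves in chess
chess_five = positions_to_examine(35, 5)  # look ahead five moves in chess
go_five = positions_to_examine(200, 5)    # look ahead five moves in Go

print(chess_two, chess_five, go_five)
print(go_five // chess_five)  # roughly how many times bigger the Go search is
```

Looking five moves ahead in Go means weighing up 320 billion possibilities, several thousand times more than in chess, and the gap widens with every extra move.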

It seems to have done something similar with protein folding. Again, computationally, it’s impossible to calculate every possibility; it’s too complex. But humans have turned out to be quite good at using their intuition to determine how proteins fold: some people became extremely good at the online computer game Foldit, in which players tried to work out the shape of a protein from its sequence. There seem to be deep patterns that humans can pick up on, and AlphaFold can pick up on rather better. It learnt this intuition by training on 170,000 known protein structures and their sequences, in the same way that AlphaGo learnt from millions of games.

Birney points out that AlphaGo started to play in ways that no human would play, but in which Go champions could then see the logic and beauty, so they could learn from it. Similarly, he says, the AlphaFold deep learning system “has come up with insights that were not obvious to humans”. And unlike Go, which is a closed, human-designed system, protein folding “is a game where the universe sets the rules”.

This is scientific creativity: it is spotting patterns in the universe, working out how things are connected. It is, I think it is fair to say, a computer that is doing science. Whether it’s the first time that’s happened is obviously a question of definitions, and I’d probably say that it isn’t: big data and AI have been used to come up with hypotheses for some years now. But it’s another strike against anyone who says that computers can’t have “true intelligence” or “creativity” or whatever.

And this is why I’m obsessed with DeepMind. In the briefing, Hassabis casually dropped in that DeepMind’s “ultimate vision has been to build general AI, and to use it to help us better understand the world around us by accelerating the pace of scientific discovery”. General AI, for those who don’t know the terminology, is an AI that can do any intellectual task that a human can do, as opposed to “narrow” AIs which can, for instance, play chess, but couldn’t then do your taxes. It’s the AI of sci-fi: AI that can hold a conversation and sort your calendar and plan a satellite launch and balance the defence budget.

Most AI researchers instinctively shy away from big talk like that; Hassabis and DeepMind explicitly went for it, from Day 1. Lots of people worry about general AI destroying the world; most AI researchers sort of pretend that that couldn’t happen. But I once spoke to someone at DeepMind who simply said “Sure, that could happen. But we’ll make sure it doesn’t. We’ll make general AI, and it will be awesome.”

They haven’t achieved general AI with AlphaFold. But given that their system, with what seem (to my inexpert understanding) to be relatively minor changes, can become massively superhuman at chess, Go, StarCraft II, Atari games and now at the protein-folding problem, it is becoming increasingly inaccurate to refer to it as “narrow”, as well.

It’ll be interesting, now, to see how DeepMind use this technology – presumably they’ll want to monetise it, and drug discoveries are (as we’ve seen recently) lucrative things. But it’ll be even more interesting to see whether this is just the start of an era of computers doing science. I’m obsessed with DeepMind because they might actually bring about the AI future they promise. The technological singularity and Theme Park: it’s quite the legacy to leave the world.