You could be forgiven for thinking that the turn of the millennium was a golden age for the life sciences. After the halcyon days of the 1950s and ’60s when the structure of DNA, the true nature of genes and the genetic code itself were discovered, the Human Genome Project, launched in 1990 and culminating with a preliminary announcement of the entire genome sequence in 2000, looked like – and was presented as – a comparably dramatic leap forward in our understanding of the basis of life itself. As Bill Clinton put it when the draft sequence was unveiled: ‘Today we are learning the language in which God created life.’ Portentous stuff.
The genome sequence reveals the order in which the chemical building blocks (of which there are four distinct types) that make up our DNA are arranged along the molecule’s double-helical strands. Our genomes each have around 3 billion of these ‘letters’; reading them all is a tremendous challenge, but the Human Genome Project (HGP) transformed genome sequencing within the space of a couple of decades from a very slow and expensive procedure into something you can get done by mail order for the price of a meal for two. Since that first sequence was unveiled in 2000, hundreds of thousands of human genomes have now been decoded, giving an indication of the person-to-person variation in sequence. This information has provided a vital resource for biomedicine, enabling us, for example, to identify which parts of the genome correlate with which diseases and traits. And all that investment in gene-sequencing technology was more than justified merely by its use for studying and tracking the SARS-CoV-2 virus during the COVID-19 pandemic.
Nonetheless, as with the Apollo Moon landings – with which the HGP has been routinely compared – the decades that followed the initial triumph have seemed something of an anticlimax. For all its practical value, sequencing in itself offers little advance in understanding how the genome – or life itself – works. As the veteran molecular biologist Sydney Brenner wrote in 2010, the comparison with the Apollo programme turns out to be ‘literally correct’:
because sending a man to the moon is easy; it’s getting him back that is difficult and expensive. Today the human genome sequence is, so to speak, stranded on a metaphorical moon and it is our task to bring it back to Earth and give it the life it deserves.
That task hasn’t turned out as expected. The copious genome databases haven’t yet produced the flood of new treatments and drugs that some had predicted from gene-based medicine, nor delivered on the promise of therapies tuned to our own individual genomes. Despite the COVID-19 vaccines, drug development as a whole has stagnated or even slowed over recent decades, becoming ever more costly. And most drugs are still found by old-fashioned trial and error, not by leveraging genetic data. The outcomes have been particularly disappointing for understanding and treating cancer, long thought to arise from changes (mutations) in the sequences in our DNA that are either inherited or accumulated through age and environmental wear and tear. Despite the genetic data glut, biology seems to have settled back into a long, slow slog.
But I think this story is wrong. Fixing life remains difficult – but, in terms of understanding it, the course of cell and molecular biology over the past several decades isn’t a tale of unfulfilled promise. On the contrary, we’re in one of the most exciting periods since James Watson and Francis Crick discovered DNA’s double helix in 1953. The transformative advances of the post-genomic decades are revealing nothing less than a new biology: an extraordinary and fresh picture of how life works. And ironically, those advances turn out to undermine the skewed view of life on which the HGP itself was predicated, in which the genome sequence of DNA was (in the words Watson put into Crick’s mouth) the ‘secret of life’.
If that’s so, why haven’t we heard more about it? Why hasn’t it been trumpeted and celebrated as loudly as the HGP was? Part of the reason is that science is inherently and necessarily conservative: slow and reluctant to change its narratives and metaphors, not least because we have all (scientists and public alike) got accustomed to the old ones. And we have yet to find compelling new stories to replace them. Talk of a genetic blueprint, of selfish genes, of instruction books and digital codes gave us a narrative we could grasp. Even though we now know this to be at best a partial and at worst a misleading picture, it’s likely to remain in place until there is something better on offer.
The need for a new narrative isn’t just about communicating science; it also impacts how science is done. In 2013, the cancer biologist Michael Yaffe bemoaned the paucity of clinical advances that have come from a search for cancer-linked genes. We sought those genes, he suggested, not because we knew they were the key to developing new treatments so much as because we had the techniques for looking: ‘Like data junkies, we continue to look to genome sequencing when the really clinically useful information may lie someplace else.’ But then, where? What do we now know about how life works that might lead us to a more fruitful destination?
The conventional narrative in biology – the one that gets taught at school – goes like this. Our DNA contains lots of genes, which are segments of that molecule for which the sequence encodes a corresponding building-block sequence of proteins, which are chains of amino acids. (The genetic code specifies the translation between DNA sequence and protein sequence.) The genes are read out by being first transcribed into molecules called RNA, with a very similar chemical makeup to DNA, and then these RNA molecules are translated into proteins. Most of those proteins are enzymes, which facilitate biochemical reactions. In this way, the proteins are the molecular workhorses that – in a complicated process still not fully understood – put together new cells and allow embryos to grow and develop into babies. Thus, the genome contains the information needed to make a human.
If a gene acquires a mutation in its sequence – a change to one or more of its chemical letters – it encodes a slightly altered protein. We all have such variations in our genome, and most don’t significantly alter the protein’s ability to do its job. But sometimes a mutation will result in a malfunctioning protein – and that can cause real problems, as it does, for example, with certain mutations to the gene called CFTR that is associated with cystic fibrosis. To understand health conditions with an inherited aspect due to genetic mutations, therefore, we should start by identifying the relevant gene(s).
This story is (for the most part) not wrong. It’s plenty good enough to give students a rough notion of how biology works. But its elisions, omissions and simplifications can create serious misconceptions about what genes are and do. Consider this, for instance: most of the regions of the human genome that have been linked to diseases aren’t parts of genes at all. They feature in so-called non-coding sequences.
Only around 1-2 per cent of the entire human genome actually consists of protein-coding genes. The remainder was long thought to be mostly junk: meaningless sequences accumulated over the course of evolution. But at least some of that non-coding genome is now known to be involved in regulating genes: altering, activating or suppressing their transcription into RNA and translation into proteins. Many disease-linked regions are in these regulatory sequences, where mutations don’t change the proteins themselves but, rather, the rate or chance of them being made at all. So, to understand how life really works at the genomic level, we need to understand gene regulation. And that, as we’ll see, is not just eye-wateringly complicated but not at all what we have learnt to expect from the conventional molecular biology of the past 50 years.
What’s more, it turns out that not all genes encode proteins. In fact – and this may be one of genetics’ best-kept secrets, having been discovered only during the past decade – most of them do not. When the HGP began, many experts estimated the number of human genes to be around 100,000. It was soon found that, in fact, we have just 20,000 or so (some estimates put the figure even lower), which is little more than half as many as the banana. Meanwhile, researchers began to find genes that never get translated into protein at all. They are only transcribed into RNA, which seemed to have some intrinsic function rather than merely acting as a messenger for making proteins.
At first these non-coding (nc) RNA genes (they are not literally non-coding, but simply not protein-coding – biology’s language often reveals its flawed preconceptions) seemed a mere curiosity. But their numbers have been growing sharply, and now slightly exceed the number of coding genes. Some predict that eventually ncRNA genes will turn out to far outnumber protein-coding genes. The ncRNAs themselves may vary hugely in length, from many hundreds of ‘letters’ to a mere 20 or so. It is not yet known what many of them do, but in general they are thought to play important roles in gene regulation. As the molecular biologists Kevin Morris and John Mattick wrote:
It appears that we may have fundamentally misunderstood the nature of the genetic programming in complex organisms because of the assumption that most genetic information is transacted by proteins. This … is turning out not to be the case in more complex organisms, whose genomes appear to be progressively dominated by regulatory RNAs.
As Mattick pithily puts it, it is RNA and not DNA that is ‘the computational engine of the cell’.
Given these discoveries, it seems astonishing that, at least in biology’s public-facing image, it can seem as though nothing much has changed in the narrative of genetics since the 1960s. It is rather as if cosmologists, having discovered that all known matter makes up just 5 per cent of the Universe, being outweighed by a factor of five or so by the mysterious stuff dubbed dark matter while the remainder is the even more mysterious dark energy, were to say: ‘Nothing to see here! It’s still the same story!’
Then there’s gene regulation itself. We have known since the Nobel Prize-winning work of the biologists François Jacob and Jacques Monod in the 1960s that genes are regulated. It was once thought that each gene has a switch that can be turned on or off by some other molecule, such as the proteins called transcription factors. That seems typically to be the case for single-celled organisms such as bacteria, in which a regulatory protein might recognise and stick to the DNA sequence just next to a gene, in its so-called regulatory regions. In this way, transcription is controlled with a neat, digital logic whereby one gene can (via its protein product) switch another.
But that’s not the norm for human gene regulation. For us, there is layer after layer of regulatory processes, and we have little notion yet of how it all adds up. The same transcription factor can act on several different genes and can have different effects on the same gene in different types of cell, so that the result depends on some higher-level contextual information. Genes are also regulated by how the physical material of the chromosomes called chromatin – a composite of DNA with attached proteins called histones – is packaged up, which is a poorly understood matter. It’s as though some parts of the genome get filed away where they can’t be read. The packaging of chromatin is influenced by chemical groups that get stuck onto the histone proteins, perhaps in response to chemical signals such as hormones. We don’t understand the language of these histone modifications – why they sometimes suppress genes and sometimes activate them, say. But we do know that they matter: mutations of genes that make histone-modifying enzymes, for example, have been implicated in some diseases.
What’s more, our genes tend to be regulated not by individual molecules but by whole gangs of them. Transcription factors act together with other molecules (especially that regulatory ncRNA) and with regulatory segments of DNA called enhancers, insulators and so on, in vast teams that gather into loose collectives that some call condensates, which emerge like blobs of vinegar in the oil of salad dressing. No one knows how all this works, but it looks weirdly messy and analogue – think not of the digital computer but of knobs and dials for controlling old electrical circuits – given that our health and perhaps our life depends on it working reliably and accurately.
The temptation is to throw up one’s hands and conclude that, for humans at least, how life works surpasses all understanding. Some biologists have implied as much, suggesting that we might never truly understand life mechanistically, but will just have to rely on data mining with black-box AI to make predictions about what will lead to what.
But I don’t think that is so. On the contrary, it’s not hard to see why, the more complex the organism, the fuzzier its molecular mechanisms have to be. A huge machine that works only if all its countless components interlock in precisely coordinated ways is far too fragile – especially if those parts are, like molecules, constantly moving about randomly in a warm, wet environment. By the same token, if life relied on the accurate readout of innumerable genomic instructions in exactly the right order, it would be far too vulnerable to errors. It’s for these reasons that we are not machines – not, that is, like any machine humans have ever built. It’s a far better and more robust solution to find principles that work over many hierarchical levels, with the operation at one level being not too sensitive to the fine details of the levels below. Gene regulation by rather loosely defined condensates rather than by specific molecular switches, say, means that it can still work without every molecule having to be present and correct.
Evolution has, to speak anthropomorphically, evidently ‘designed’ our molecules to work in this fuzzy way. In contrast to the lock-and-key principle by which protein enzymes were long thought to recognise and transform their target molecules, some of the most important proteins in our cells, including many transcription factors, have shapes that are only loosely defined, enabling them to stick to others without being too choosy about it. And those little regulatory RNAs are generally too small to carry enough information for their unions to be very selective; they too work collectively, arriving at a decision, as it were, by committee.
As a result, cells can behave identically in building tissues and organs even while differing substantially in the precise mix of molecules they contain. The looseness, the permissiveness, pertains all the way up the scale – from molecules to networks to cells to tissues and bodies – in a manner that I call causal spreading. That’s to say, the true causes of outcomes at the level of traits and of health don’t all come from the bottom up, from the genes, but emerge at all levels in the hierarchy of scales. That’s how life works. If we can identify the key locus of causation for a given trait, we have a better chance of making interventions that make a difference.
Why have these dramatic developments within molecular biology been so little discussed beyond a small circle of specialists? That might have something to do with the habits of the field. Having long interacted with scientists of all persuasions, I’ve noticed a contrast between how physicists and biologists receive and communicate new ideas. Physicists are often keen to proclaim, at the drop of a hat, that ‘This changes everything!’ Biologists, on the other hand, while no slouches at drumming up media coverage for their own work, seem rather averse to big shifts in the discourse. ‘Well, we sort of knew that years ago,’ they will mutter – or alternatively: ‘That’s probably just a rare exception.’
I encountered this tendency a decade ago when it first became evident, thanks to an international project called ENCODE, that much of the non-coding portion of the human genome – up to 80 per cent of it in some cells at some time or another – is transcribed into RNA despite having no known function. Why would a cell bother to make that effort, at some cost in energy and resources, if these DNA sequences were all just junk? The answer turns out to be complicated. Some of that DNA might indeed be just meaningless stuff that is transcribed merely because it’s easier for the cell to go on making RNA than to have lots of precise controls for where to stop. But a fair proportion of non-coding RNA evidently does have a biochemical function. It seemed to me, back then, that the message of the ENCODE work represented quite a change in the prevailing narrative around DNA, and in 2013 I wrote an article for Nature saying as much, citing it as an example of how much we still don’t understand about genomics.
Some biologists responded by saying, in effect: ‘No no no, nothing to see here – our existing understanding is just fine.’ (This was mild stuff compared with the furious reaction the ENCODE paper itself elicited from some biologists, who accused the team of evolutionary heresy on a par with intelligent design.) Others said that, even if biology was indeed more complicated than we’d thought, what was to be gained by telling the public that? In other words: don’t upset the status quo.
Imbued with such persistent but vague misgivings about the stories we were telling of how biology works, in 2019 I spent the summer as a visitor in the Department of Systems Biology at Harvard Medical School. It seemed to me that everyone to whom I expressed those concerns in that unusually progressive and wide-ranging department replied: ‘Oh no – it’s much worse than that!’ They opened my eyes to ever more flaws in the conventional narrative. It was there that I discovered to what a considerable extent some important biological molecules don’t necessarily choose their binding partners with exquisite and tight selectivity, but on the contrary are highly promiscuous and form only very transient and weak partnerships. There I learnt how cells of a given type don’t all make identical suites of biomolecules, and how we can quantify their variety. And on it went.
I departed from Harvard convinced that it’s time to seek new narratives in biology, and that is why I wrote my book How Life Works (2023): an attempt not so much to tell these new stories as to discover for myself what those might be.
There’s more than disciplinary habit to the peculiar mutedness from biology about the conceptual advances of the past decades. For one thing, there is a lot now invested – intellectually, reputationally and financially – in the outdated narrative of the Human Genome Project, with its insistence on the genome as the instruction book for making (and, indirectly, for assembling) our molecular parts. To explain why it hasn’t yet delivered the promised cures, it is perhaps less of a climbdown to say that it’s turned out to be rather more complicated, than that we were working with the wrong picture in the first place. So, the hype around genes and the HGP won’t be dissipated overnight.
Any embarrassment about that was avoidable, though. To the extent that the new biology entails a demotion in the significance of genes, which now seem more like heritable resources that cells use than they are Watson’s ‘secret of life’, this relegation has been necessary only because of the rather absurd burden of responsibility placed on genes in the first place. It should always have been clear that genes do not somehow put cells and organisms together, but that, rather, differences between gene variants account for some of the variability in the way organisms end up.
It has also become much harder in recent years for scientists to admit to gaps in knowledge and understanding, which will be exploited by everyone ranging from creationists to climate-change deniers to anti-vaxxers as evidence that we shouldn’t believe a word they say. This is particularly hard in the life sciences: there are higher stakes (from our solipsistic perspective) attached to a medicine injected into your body than to a revision of cosmological theory. Of course, total understanding of how a drug works is not necessary anyway, so long as it has been thoroughly tested for efficacy and toxicity; we still have rather little idea, for example, how general anaesthetics work, but that didn’t bother me in the slightest when I had one last year. Yet it has become all too easy to fill such knowledge gaps with scare stories.
But surely another reason for the near invisibility in the science media of the transformation in biology is that we now have a much harder story to tell. The idea that ‘genes make proteins, and proteins make us’ is easy to grasp. The real picture is far harder to capture in a sound bite. I suspect we hear so little about this new biology in part because many journalists (or their editors) take a look at the latest research on, say, gene regulation of chromatin remodelling or cell signalling and think: ‘I’m not going anywhere near that!’
Finally, I suspect the narrative inertia reflects a general tendency in science whereby scientists get even more wedded to their metaphors than to their theories. Many biologists seem to have forgotten where the old metaphor of the genetic blueprint came from in the first place. The Harvard historian and philosopher of science Evelyn Fox Keller pointed out that it was never a notion compelled by the experimental evidence, but was merely a stopgap solution for our lack of knowledge about how the information in the genome (the genotype) was related to the visible traits of the organism (the phenotype).
The role of metaphor and narrative, as opposed to new theories or experiments, is too little recognised in discussions of the historian of science Thomas Kuhn’s paradigm shifts, supposed (and contested) moments of dramatic change in science. All scientists know how to go about scrutinising a theory: you use it to formulate some testable hypothesis, and then do the experiment. If the theory fails the test, that’s just the scientific method at work. But metaphors aren’t the kind of thing you test at all: there are no critical tools designed to challenge them. They become regarded merely as expressions of how things are: an invisible component of the prevailing paradigm.
As such, they are hard to dislodge when their utility has passed – scientists will instead find ingenious ways to hold on to them. Thus, genes may still be ‘selfish’, and organisms may still be ‘machines’, brains ‘computers’, genomes ‘blueprints’, so long as we give those metaphorical words different interpretations to the everyday ones – thereby, of course, negating their value as metaphors. Keller wrote eloquently on this issue:
[T]his style or habit of chronic slippage from one set of meanings to the other has prevailed for over 50 years; it has become so deeply ensconced as to have been effectively invisible to most readers of the biological literature. This feature I suggest qualifies it as a Foucauldian discourse – by which I mean a discourse that operates by historically specific rules of exclusion, a discourse that is constituted by what can be said and thought, by what remains unsaid and unthought, and by who can speak, when, and with what authority.
And there you have it: under cover of being neutral tools for communication, metaphors smuggle in ideological freight. If a metaphor is a kind of mental map, the sociologists Dorothy Nelkin and M Susan Lindee point out in their book The DNA Mystique (1995), quoting the curator Lucy Fellowes, that ‘every map is someone’s way of getting you to look at the world his or her way.’ I don’t suppose anyone who either supports or rejects the idea of ‘selfish genes’ would be so disingenuous as to deny that the arguments are not just about evolutionary biology but also about the broader connotations of the metaphor. I have heard it said that biologists who cleave to the claim that organisms are ‘machines’ do so not so much because of the aptness of the analogy but because it signifies allegiance to a materialist view of matter – as though one could not reject the idea that we are ‘machines made by genes’ without capitulating to a non-physical, mystical view of life.
Yet one can’t reasonably expect researchers to give up their metaphors unless they have others to replace them. In a 2020 commentary on my Nature article ‘Celebrate the Unknowns’ (2013), Keller (who saw the piece as a sign that even stodgy old Nature was waking up to something afoot) wrote that:
[I]f, as I claim, recent work in genomics has finally disrupted the narratives of developmental genetics that have prevailed for over a century, geneticists will now need a new narrative to help guide them through the thickets that lie before them.
So how now should we be speaking about biology? Keller herself tentatively suggested that we might adopt the prescient suggestion of the Nobel laureate biologist Barbara McClintock in recognising that the genome is a responsive, reactive system, not some passive data bank: as McClintock called it, a ‘highly sensitive organ of the cell’.
There’s virtue in that picture, but I think it points to a wider consideration: that the best narratives and metaphors for thinking about how life works come not from our technologies (machines, computers) but from life itself. Some biologists now argue that we should think of all living systems, from single cells upwards, not as mechanical contraptions but as cognitive agents, capable of sifting and integrating information against the backdrop of their own internal states in order to achieve some self-determined goal. Our biomolecules appear to make decisions not in the manner of on/off switches but in loosely defined committees that obey a combinatorial logic, comparable to the way different combinations of just a few light-sensitive cells or olfactory receptor molecules can generate countless sensations of colour or smell. The ‘organic technology’ of language, where meaning arises through context and cannot be atomised into component parts, is a constantly useful analogy. Life must be its own metaphor.
And shouldn’t we have seen that all along? For what, after all, is extraordinary – and challenging to scientific description – about living matter is not its molecules but its aliveness, its agency. It seems odd to have to say this, but it’s time for a biology that is life-centric.