Let’s imagine planet Earth without viruses.
We wave a wand, and they all disappear. The rabies virus is suddenly gone. The polio virus is gone. The gruesomely lethal Ebola virus is gone. The measles virus, the mumps virus, and the various influenzas are gone. Vast reductions of human misery and death. HIV is gone, and so the AIDS catastrophe never happened. Nipah and Hendra and Machupo and Sin Nombre are gone—never mind their records of ugly mayhem. Dengue, gone. All the rotaviruses, gone, a great mercy to children in developing countries who die by the hundreds of thousands each year. Zika virus, gone. Yellow fever virus, gone. Herpes B, carried by some monkeys, often fatal when passed to humans, gone. Nobody suffers anymore from chicken pox, hepatitis, shingles, or even the common cold. Variola, the agent of smallpox? That virus was eradicated in the wild by 1977, but now it vanishes from the high-security freezers where the last spooky samples are stored. The SARS virus of 2003, the alarm that we now know signaled the modern pandemic era, gone. And of course the nefarious SARS-CoV-2 virus, cause of COVID-19 and so bewilderingly variable in its effects, so tricky, so dangerous, so very transmissible, is gone. Do you feel better?
Don’t.
This scenario is more equivocal than you think. The fact is, we live in a world of viruses—viruses that are unfathomably diverse, immeasurably abundant. The oceans alone may contain more viral particles than stars in the observable universe. Mammals may carry at least 320,000 different species of viruses. When you add the viruses infecting nonmammalian animals, plants, terrestrial bacteria, and every other possible host, the total comes to … lots. And beyond the big numbers are big consequences: Many of those viruses bring adaptive benefits, not harms, to life on Earth, including human life.
We couldn’t continue without them. We wouldn’t have arisen from the primordial muck without them. There are two lengths of DNA that originated from viruses and now reside in the genomes of humans and other primates, for instance, without which—an astonishing fact—pregnancy would be impossible. There’s viral DNA, nestled among the genes of terrestrial animals, that helps package and store memories—more astonishment—in tiny protein bubbles. Still other genes co-opted from viruses contribute to the growth of embryos, regulate immune systems, resist cancer—important effects only now beginning to be understood. Viruses, it turns out, have played crucial roles in triggering major evolutionary transitions. Eliminate all viruses, as in our thought experiment, and the immense biological diversity gracing our planet would collapse like a beautiful wooden house with every nail abruptly removed.
A virus is a parasite, yes, but sometimes that parasitism is more like symbiosis, mutual dependence that profits both visitor and host. Like fire, viruses are a phenomenon that’s neither in all cases good nor in all cases bad; they can deliver advantage or destruction. Everything depends: depends on the virus, on the situation, on your point of reference. They are the dark angels of evolution, terrific and terrible. That’s what makes them so interesting.
To appreciate the multifariousness of viruses, you need to start with the basics of what they are and what they are not. It’s easier to say what they are not. They are not living cells. A cell, of the sort assembled in great number to make up your body or mine or the body of an octopus or a primrose, contains elaborate machinery for building proteins, packaging energy, and performing other specialized functions—depending on whether that cell happens to be a muscle cell or a xylem cell or a neuron. A bacterium is also a cell, with similar attributes, though much simpler. A virus is none of this.
Saying just what a virus is has been complicated enough that definitions have changed over the past 120-some years. Martinus Beijerinck, a Dutch botanist who studied tobacco mosaic virus, speculated in 1898 that it was an infectious liquid. For a time a virus was defined mainly by its size: a thing much smaller than a bacterium but that, like bacteria, could cause disease. Still later, a virus was thought to be a submicroscopic agent, bearing only a very small genome, that replicated inside living cells, but that was just a first step toward a better understanding.
“I shall defend a paradoxical viewpoint,” wrote the French microbiologist André Lwoff in “The Concept of Virus,” an influential essay published in 1957, “namely that viruses are viruses.” Not a very helpful definition but fair warning—another way of saying “unique unto themselves.” He was just clearing his throat before beginning a complex disquisition.
Lwoff knew that viruses are easier to describe than to define. Each viral particle consists of a stretch of genetic instructions (written either in DNA or that other information-bearing molecule, RNA) packaged inside a protein capsule (known as a capsid). The capsid, in some cases, is surrounded by a membranous envelope (like the caramel on a caramel apple), which protects it and helps it catch hold of a cell. A virus can copy itself only by entering a cell and commandeering the 3D-printing machinery that turns genetic information into proteins.
If the host cell is unlucky, many new viral particles are manufactured, they come busting out, and the cell is left as wreckage. That sort of damage—such as what SARS-CoV-2 causes in the epithelial cells of the human airway—is partly how a virus becomes a pathogen.
But if the host cell is lucky, maybe the virus simply settles into this cozy outpost, either going dormant or back-engineering its little genome into the host’s genome, and bides its time. This second possibility carries many implications for the mixing of genomes, for evolution, even for our sense of identity as humans, a topic to which I’ll return. One hint, for now: In a popular 1983 book the British biologist Peter Medawar and his wife, Jean, an editor, asserted, “No virus is known to do good: It has been well said that a virus is ‘a piece of bad news wrapped up in protein.’ ” They had it wrong. So did a lot of scientists at the time, and the view is still embraced, understandably, by anyone whose knowledge of viruses is limited to such bad news as the flu and COVID-19. But today some viruses are known to do good. What’s wrapped up in the protein is a genetic dispatch, and that might turn out to be good news or bad, depending.
Where did the first viruses come from? This requires us to squint back almost four billion years, to the time when life on Earth was just emerging from an inchoate cookery of long molecules, simpler organic compounds, and energy.
Let’s say some of the long molecules (probably RNA) started to replicate. Darwinian natural selection would have begun there, as those molecules—the first genomes—reproduced, mutated, and evolved. Groping for competitive edge, some may have found or created protection within membranes and walls, leading to the first cells. These cells gave rise to offspring by fission, splitting in two. They split in a broader sense too, diverging to become Bacteria and Archaea, two of the three domains of cellular life. The third, Eukarya, arose sometime later. It includes us and all other creatures (animals, plants, fungi, certain microbes) composed of cells with complex internal anatomy. Those are the three great limbs on the tree of life, as presently drawn.
But where do viruses fit? Are they a fourth major limb? Or are they a sort of mistletoe, a parasite wafted in from elsewhere? Most versions of the tree omit viruses entirely.
One school of thought asserts that viruses shouldn’t be included on the tree of life because they aren’t alive. That’s a lingering argument, hinging on how you define “alive.” More intriguing is to grant viruses inclusion within the big tent called Life, and then wonder about how they got in.
There are three leading hypotheses to explain the evolutionary origins of viruses, known to scientists as viruses-first, escape, and reduction. Viruses-first is the notion that viruses came into existence before cells, somehow assembling themselves directly from that primeval cookery. The escape hypothesis posits that genes or stretches of genomes leaked out of cells, became encased within protein capsids, and went rogue, finding a new niche as parasites. The reduction hypothesis suggests that viruses originated when some cells downsized under competitive pressure (it being easier to replicate if you’re small and simple), shedding genes until they were reduced to such minimalism that only by parasitizing cells could they survive.
There is also a fourth variant, known as the chimeric hypothesis, which takes inspiration from another category of genetic elements: transposons (sometimes called jumping genes). The geneticist Barbara McClintock deduced their existence in 1948, a discovery that earned her a Nobel Prize. These opportunistic elements achieve their Darwinian success simply by bouncing from one part of a genome to another, in rare cases from one cell to another, even one species to another, using cellular resources to get themselves copied, over and over. Self-copying protects them from accidental extinction. They accumulate outlandishly. They constitute, for instance, roughly half of the human genome. The earliest viruses, according to this idea, may have arisen from such elements by borrowing proteins from cells to wrap their nakedness inside protective capsids, a more complex strategy.
Each of these hypotheses has merits. But in 2003 new evidence tipped expert opinion toward reduction: the giant virus.
It was found within amoebas, which are single-celled eukaryotes. These amoebas had been collected in water taken from a cooling tower in Bradford, England. Inside some of them was this mysterious blob. It was big enough to be seen through a light microscope (viruses supposedly were too small for that, visible only by electron microscope), and it looked like a bacterium. Scientists tried to detect bacterial genes within it but found none.
Finally a team of researchers in Marseille, France, invited the thing to infect other amoebas, sequenced its genome, recognized what it was, and named it Mimivirus, because it mimicked bacteria, at least with regard to size. In diameter it was huge, bigger than the smallest bacteria. Its genome was also huge for a virus, almost 1.2 million letters long, compared to, say, 13,000 for an influenza virus, or even 194,000 for smallpox. (DNA, like RNA, is a long molecule built with four different molecular bases, which scientists abbreviate by their first letters.) It was an “impossible” virus: viral in nature but too big in scale, like a newly discovered Amazon butterfly with a four-foot wingspan.
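To put those genome sizes side by side, here is a minimal sketch in Python. It uses only the rounded figures quoted above (Mimivirus is "almost" 1.2 million letters, so the ratios are approximate), and the variable names are illustrative.

```python
# Approximate genome lengths in letters (bases), as quoted in the text.
# 1_200_000 is a rounded figure: the text says "almost 1.2 million".
genome_sizes = {
    "influenza": 13_000,
    "smallpox": 194_000,
    "Mimivirus": 1_200_000,
}

# How many times longer is the Mimivirus genome than each of the others?
ratios = {
    name: genome_sizes["Mimivirus"] / size
    for name, size in genome_sizes.items()
    if name != "Mimivirus"
}

for name, ratio in ratios.items():
    print(f"Mimivirus genome is roughly {ratio:.0f}x the {name} genome")
```

On these figures the Mimivirus genome is on the order of 90 times the length of an influenza genome and about six times that of smallpox.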
Jean-Michel Claverie was a senior member of that Marseille team. The discovery of Mimivirus, Claverie told me, “caused a lot of trouble.” Why? Because sequencing the genome revealed four very unexpected genes—genes for coding enzymes presumed to be uniquely cellular and never before seen in a virus. Those enzymes, Claverie explained, are among the components that translate the genetic code to assemble amino acids into proteins.
“So the question was,” Claverie said, “what the hell has a virus the need” for those fancy enzymes, normally active in cells, “when he has the cell at his disposal, OK?”
What need indeed? The logical inference is that Mimivirus has them as holdovers because its lineage originated by genomic reduction from a cell.
Mimivirus was no fluke. Similar giant viruses were soon detected in the Sargasso Sea, and the early name became a genus, Mimivirus, containing several giants. Then the Marseille team discovered two more behemoths—again, both parasites of amoebas—one taken from shallow marine sediments off the coast of Chile, the other from a pond in Australia. Up to twice as big as a Mimivirus, even more anomalous, these were assigned to a separate genus, which Claverie and his colleagues named Pandoravirus, evoking Pandora’s box, as they explained in 2013, because of “the surprises expected from their further study.”
Claverie’s senior co-author on that paper was Chantal Abergel, a virologist and structural biologist (and also his wife). Of the Pandoraviruses, Abergel told me, with a weary laugh: “They were highly challenging. They are my babies.” She explained how difficult it had been to tell what they were, these creatures—so different from cells, so different from classical viruses, carrying many genes that resembled nothing ever before seen. “All of that makes them fascinating but also mysterious.” For a while she called them NLF: new life-form. But from observing that they didn’t replicate by fission, she and her colleagues realized they were viruses—the largest and most perplexing ones found so far.
These discoveries suggested to the Marseille group a bold variant of the reduction hypothesis. Maybe viruses did originate by reducing from ancient cells, but cells of a sort no longer present on Earth. This kind of “ancestral protocell” might have been different from—and in competition with—the universal common ancestor of all cells known today. Maybe these protocells lost that competition and were excluded from all the niches available for free-living things. They may have survived as parasites on other cells, downsized their genomes, and become what we call viruses. From that vanished cellular realm, maybe only viruses remain, like the giant stone heads on Easter Island.
Discovery of the giant viruses inspired other scientists, notably Patrick Forterre at the Pasteur Institute in Paris, to formulate novel ideas about what viruses are and what constructive roles they have played, and continue to play, in the evolution and functions of cellular life.
Previous definitions of “virus” were inadequate, Forterre proposed, because scientists were confusing viral particles—the capsid-enclosed bits of genome, properly known as virions—with the totality of a virus. That, he argued, was as wrong as confusing a seed with a plant, or a spore with a mushroom. The virion is just the dispersal mechanism, he argued. The real wholeness of the virus also includes its presence within a cell, once it has seized the cell’s machinery to replicate more virions, more seeds of itself. To see the two phases together is to see that the cell has effectively become part of the virus’s life history.
Forterre bolstered that notion by inventing a new name for the combined entity: the virocell. This idea also cut through the alive-or-not-alive conundrum. A virus is alive when it’s a virocell, according to Forterre, never mind that its virions are inanimate.
“The idea behind the virocell concept,” he told me by Skype from Paris, “was mainly to focus on this intracellular stage.” That’s the delicate stage when the infected cell, like a zombie, is obeying the viral mandate, reading the viral genome and replicating it, but not always without skips, staggers, and mistakes. During that process, Forterre said, “new genes can originate in a viral genome. And this is a major point for me.” Viruses bring innovation, but cells respond with their own defensive innovations, such as the cell wall or the nucleus, and so it’s an arms race toward greater complexity. Many scientists have assumed that viruses achieve their major evolutionary changes by the “virus pickpocket” paradigm, snatching DNA from this infected organism and that one, and then putting the stolen pieces to use within the viral genome. Forterre argues that the pilfering might more often go the other way, cells taking genes from viruses.
An even more sweeping view, held by Forterre and Claverie and some other scientists in the field, including Gustavo Caetano-Anollés at the University of Illinois at Urbana-Champaign, is that viruses are the preeminent font of genetic diversity. According to this thinking, viruses have enriched the evolutionary options of cellular creatures over the past several billion years by depositing new genetic material in their genomes. This bizarre process is one version of a phenomenon known as horizontal gene transfer—genes flowing sideways, across boundaries between different lineages. (Vertical gene transfer is the more familiar form of inheritance: from parents to offspring.) The flow of viral genes into cellular genomes has been “overwhelming,” Forterre and a co-author have argued, and may help explain some great evolutionary transitions, such as the origin of DNA, the origin of the cell nucleus in complex creatures, the origin of cell walls, and maybe even the divergence of those three great limbs on the tree of life.
In the olden days, the days before COVID-19, engrossing discussions with scientists sometimes happened in person, not by Skype. Three years ago, I flew from Montana to Paris because I wanted to talk with a man about a virus and a gene. The man was Thierry Heidmann, and the gene was syncytin-2. He and his group had discovered it by screening the human genome—all 3.1 billion letters of code—to find stretches of DNA that looked like the kind of gene a virus would use to produce its envelope. They found about 20.
“At least two proved to be very important,” Heidmann told me. They were important because they had the capacity to perform functions essential to human pregnancy. Those two were syncytin-1, which was first discovered by other scientists, and syncytin-2, which he and his group found. How these viral genes became part of the human genome, and to what purposes they have become adapted, are aspects of a remarkable story that begins with the concept of human endogenous retroviruses.
A retrovirus is a virus with an RNA genome that operates backward from the usual direction (hence retro). Instead of using DNA to make RNA, which then serves as a messenger sent to the 3D printer to make proteins, these viruses use their RNA to make DNA and then integrate it into the genome of the infected cell. HIV, for instance, is a retrovirus that infects human immune cells, inserting its genome into the cell genome, where it may lie dormant. At some point, the viral DNA gets activated, becoming a template for production of many more HIV virions, which kill the cell as they come exploding out.
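The backward flow of information can be sketched in a few lines of Python. This is a toy model under loud assumptions: genomes are plain strings, the sequences and the insertion position are invented for illustration, and the real enzymes (reverse transcriptase, integrase) do far more than string manipulation.

```python
# Base-pairing rules. RNA uses U where DNA uses T.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Build the complementary DNA strand from an RNA template.

    The new strand runs antiparallel to the template, so the
    template is read in reverse.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def second_strand(dna: str) -> str:
    """Build the complement of a DNA strand (also antiparallel)."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna))


def integrate(host_dna: str, viral_dna: str, position: int) -> str:
    """Splice viral DNA into the host genome at a given position."""
    return host_dna[:position] + viral_dna + host_dna[position:]


# Hypothetical sequences, chosen only to keep the example short.
viral_rna = "AUGGCUUAA"
cdna = reverse_transcribe(viral_rna)   # RNA -> complementary DNA
provirus = second_strand(cdna)         # DNA copy of the viral genome
host_genome = integrate("GGGGCCCC", provirus, 4)

print(cdna)
print(provirus)
print(host_genome)
```

The second DNA strand spells the same message as the original RNA, with T in place of U, and once spliced in it is copied along with the rest of the host genome.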
Here’s the big twist: Some retroviruses infect reproductive cells—the cells that produce eggs or sperm—and in doing that, they insert their DNA into the heritable genome of the host. Those inserted stretches are “endogenous” (internalized) retroviruses, and when incorporated into human genomes, they are known as human endogenous retroviruses (HERVs). If you remember nothing else from this article, you might want to remember that 8 percent of the human genome consists of such viral DNA, patched into our lineage by retroviruses over the course of evolution. We are each one-twelfth HERV. The gene syncytin-2 is among the more consequential of those patches.
For four hours I sat in Heidmann’s office while he explained to me, with a laptop at his elbow for bringing up graphs and charts, the origin and the functions of this particular gene. The essence is almost simple. A gene that originally helped a virus fuse with host cells found its way into ancient animal genomes. It was then repurposed to generate a similar protein that helps fuse cells to create a special structure around what became the placenta, opening a new possibility in some animals: internal pregnancy. That innovation was vastly consequential in evolutionary history, making it possible for a female to carry her developing offspring from place to place, inside her body, rather than leaving them vulnerable in one place, as eggs in a nest.
The first gene of this sort from an endogenous retrovirus eventually was replaced by others that were similar but better suited for the role. Over time, the design of this new mode of reproduction improved and the placenta evolved. Among these acquired viral genes is syncytin-2, one of two syncytins in humans helping to fuse cells to form a placental layer next to the uterus. That unique structure, mediating between mother and fetus, allows nutrients and oxygen in, carries waste products and carbon dioxide out, and probably protects the fetus from being attacked by the mother’s immune system. It’s a near miracle of efficient design, in which evolution shaped a viral component into a human component.
Heidmann and I broke for lunch and then resumed for another two hours. Finally, my brain buzzing, my notebook full, I asked him: What does it all say about how evolution works? He laughed with delight, and I laughed too, from amazement and fatigue.
“Our genes are not only our genes,” he said. “Our genes are also retroviral genes.”
The contribution of that retrovirus, giving us syncytin-2, is only one instance of a grand pattern. Another is the gene ARC, expressed in response to neuronal activity in mammals and flies. It closely resembles a retroviral gene that codes for a protein capsid. Recent research by several teams, including one led by Jason Shepherd at the University of Utah, suggests that ARC plays a key role in storing information within neural networks. Another word for that: memory. ARC seems to do it by packaging information derived from experience (embodied as RNA) into little protein sacs that carry it from one neuron to another.
And at the Stanford University School of Medicine, Joanna Wysocka, along with a group of colleagues, has found evidence that viral fragments produced by another human endogenous retrovirus, known as HERV-K, are present within human embryos at the earliest stage and might play some positive role in protecting the embryo from viral infection, or in helping control fetal development, or both. Further, her group has focused on a particular transposon that seems to have entered the human genome as a sort of prologue section of HERV-K, then found ways of copying itself and bouncing to other parts of the genome, so that it’s now present in 697 scattered copies. Those copies seem to help turn on almost 300 human genes.
“To me what is really mind-boggling,” Wysocka said, “is that HERVs are about 8 percent of the human genome,” a portion of our being that is essentially “the graveyard of previous retroviral infections.” It’s even more boggling to contemplate how, as Wysocka put it, “our history of past retroviral infections is continuing to shape our evolution as a species.”
If 8 percent of your genome and mine is retroviral DNA, and half is transposons, then maybe the very notion of human individuality (let alone human supremacy) is not as solid as we like to believe.
The downside of such evolutionary agility, of course, is that viruses can sometimes switch hosts, tumbling from one kind of creature into another and succeeding as pathogens in the unfamiliar new host. That’s called spillover, and it’s how most new human infectious diseases arise—with viruses acquired from a nonhuman animal host.
In the original host—known to science as a reservoir host—a virus may have abided quietly, at low abundance and low impact, for thousands of years. It may have reached an evolutionary accommodation with the reservoir host, accepting security in exchange for causing no trouble. But in a new host, such as a human, the old deal doesn’t necessarily hold. The virus may explode in abundance, causing discomfort or misery in that first victim. If the virus not only replicates but also manages to spread, human to human, among a few dozen other individuals, that’s an outbreak. If it sweeps through a community or a country, that’s an epidemic. If it encircles the world, it’s a pandemic. So now we’re back to SARS-CoV-2.
Some types of viruses are more likely to cause pandemics than others. Near the top of the list of the most worrisome candidates are coronaviruses, because of the nature of their genomes, their capacity to change and evolve, and their history of causing serious human disease. That history includes SARS (severe acute respiratory syndrome) in 2002-03 and MERS (Middle East respiratory syndrome) in 2012-15. So when the phrase “novel coronavirus” began to be used to describe the new thing causing clusters of illness in Wuhan, China, those two words were enough to make disease scientists around the world shudder.
Coronaviruses belong to an infamous category of viruses, the single-stranded RNA viruses, that includes influenzas, Ebola viruses, rabies, measles, Nipah, hantaviruses, and retroviruses. They are infamous partly because a single-stranded RNA genome is subject to frequent mutation as the virus replicates, and such mutation supplies a richness of random genetic variation upon which natural selection can work.
Coronaviruses, though, evolve relatively slowly for RNA viruses. They carry fairly long genomes—the SARS-CoV-2 genome runs to about 30,000 letters—but their genomes change less quickly than some others because they have a proofreading enzyme to correct mutations. Yet they are also capable of a trick called recombination, in which two strains of coronavirus, infecting the same cell, swap sections of their genomes and give rise to a third, hybrid strain of coronavirus. That may be what happened to create the novel coronavirus, SARS-CoV-2.
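Recombination as described here can be pictured as a single crossover between two parent genomes. The sketch below is a deliberate simplification in Python: the function name, the toy sequences, and the single breakpoint are all assumptions made for illustration, whereas real recombination can involve several breakpoints and genomes of about 30,000 letters.

```python
def recombine(strain_a: str, strain_b: str, crossover: int) -> str:
    """Return a hybrid genome: strain_a up to the crossover point,
    strain_b from that point on.

    Assumes both genomes are aligned and of equal length.
    """
    if len(strain_a) != len(strain_b):
        raise ValueError("genomes must be the same length in this toy model")
    if not 0 <= crossover <= len(strain_a):
        raise ValueError("crossover point lies outside the genome")
    return strain_a[:crossover] + strain_b[crossover:]


# Two hypothetical strains infecting the same cell. Each uses a single
# repeated letter so the origin of every position in the hybrid is visible.
strain_a = "AAAAAAAAAA"
strain_b = "CCCCCCCCCC"

hybrid = recombine(strain_a, strain_b, crossover=6)
print(hybrid)
```

The hybrid carries the first six letters of one parent and the last four of the other, a third strain that neither parent could have reached by point mutation in one step.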
The ancestral virus probably resided in a bat, possibly a horseshoe bat, belonging to a genus of small, insectivorous creatures with horseshoe-shaped noses, which commonly carry coronaviruses. If recombination did occur, adding some crucial new elements from a different coronavirus, this could have happened in a bat or possibly in another animal. (Pangolins have been suggested; other species could also be candidates.) Scientists are exploring these possibilities and others by sequencing and comparing genomes of the viruses found in various potential hosts. All we know for now is that SARS-CoV-2 as it exists today in humans is a subtle virus capable of further evolution.
So viruses give and viruses take away. Maybe the reason they are difficult to place on the tree of life is that life’s history, after all, isn’t quite shaped like a tree. The arboreal analogy is just our traditional way of illustrating evolution, made canonical by Charles Darwin. But Darwin, great as he was, knew nothing about horizontal gene transfer. In fact, he knew nothing about genes. He knew nothing about viruses. Everything is very complicated, we realize now. Even viruses, which seem so simple at first glance, are very complicated. And if seeing them in all their complexity gives us humans a clearer vision of the tangled connectedness of the natural world, if reflecting upon our own viral contents takes away some of our sublime detachment, then I leave it to you to say whether those are benefits or harms.
Editor's note: The opening photo of a fetus, the fetus projected behind model Melody Carballo, and the embryo projected behind Joanna Wysocka were taken by Swedish photographer Lennart Nilsson (1922-2017). His groundbreaking work documenting life before birth was first showcased in Life magazine in 1965 and remains unsurpassed.