The first free-living organisms to have their genomes sequenced were bacteria. These models of compact efficiency typically stuffed about 3,000 genes into a genome only a few megabases (millions of DNA base pairs) long. The expectation was that, when we moved on to vertebrates, whose genomes measure in the gigabases (billions of base pairs), we'd find a corresponding increase in genes. It hasn't happened that way. Most vertebrates seem to have gene counts under 30,000. Even a humble single-celled ciliate has about the same number of protein-coding genes as a human. Why do humans, who are just a wee bit more complex than bacteria and paramecia, get by with so few proteins?
One possible answer came out of a computer study of protein-protein interactions published this week in PNAS. Most proteins don't operate in a vacuum; instead, they exist as parts of large molecular complexes (like the ribosome) or influence the activities of other proteins through transient interactions. In short, protein-protein interactions are essential to the function of the cell. The new paper looks at the energetics of these interactions and concludes that they might limit how many different proteins a cell can support.
The authors started with a simplified model. In a population of 200 proteins, they designated four specific interactions. They used an evolutionary algorithm to optimize the energy of those four interactions, then compared it to the energy of random interactions with any of the remaining 196 proteins. The energy difference between the specific and nonspecific interactions provided a measure of how efficiently the intended partners would interact in this hypothetical cell.
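To make this setup concrete, here is a toy Python sketch of that kind of optimization. It is not the authors' code, and nearly every detail is an assumption: each protein is a hypothetical 32-bit "interface", the binding energy is simply minus the number of complementary bits, the four specific interactions are modeled as four disjoint pairs (a layout the article doesn't specify), and the evolutionary algorithm is a basic (1+1) mutate-and-select loop. A point mutation is kept only if the worst energy gap among the paired proteins does not shrink, where the gap is the strongest nonspecific energy minus the specific one. A positive gap means every paired protein binds its intended partner more tightly than any bystander.

```python
import random

# Toy model, NOT the paper's energy function: every parameter here is hypothetical.
L = 32          # bits per hypothetical binding interface
N = 200         # population size, as in the article
PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]  # four specific interactions (assumed disjoint pairs)
PARTNER = {x: y for a, b in PAIRS for x, y in ((a, b), (b, a))}


def energy(a: int, b: int) -> int:
    """Toy contact energy: -1 per complementary (differing) bit; lower means tighter binding."""
    return -bin(a ^ b).count("1")


def gap(proteins: list[int], i: int) -> int:
    """Strongest nonspecific energy of protein i minus its energy with its intended partner.

    A positive gap means the partner binds more tightly than any bystander.
    """
    specific = energy(proteins[i], proteins[PARTNER[i]])
    nonspecific = min(
        energy(proteins[i], proteins[k])
        for k in range(len(proteins))
        if k != i and k != PARTNER[i]
    )
    return nonspecific - specific


def fitness(proteins: list[int]) -> int:
    """Worst-case gap over the eight proteins that take part in specific pairs."""
    return min(gap(proteins, i) for i in PARTNER)


def evolve(generations: int = 1000, seed: int = 0) -> tuple[int, int]:
    """(1+1) evolutionary loop: flip one bit of a paired protein, keep it unless fitness drops.

    Only the paired proteins mutate; the other 192 stay as fixed random background.
    Returns (initial fitness, final fitness).
    """
    rng = random.Random(seed)
    proteins = [rng.getrandbits(L) for _ in range(N)]
    paired = sorted(PARTNER)
    start = best = fitness(proteins)
    for _ in range(generations):
        i = rng.choice(paired)
        old = proteins[i]
        proteins[i] ^= 1 << rng.randrange(L)  # point mutation
        new = fitness(proteins)
        if new >= best:
            best = new  # accept neutral or beneficial mutations
        else:
            proteins[i] = old  # reject mutations that shrink the worst-case gap
    return start, best
```

Calling `before, after = evolve()` returns the worst-case gap for the random starting population and after selection. Because harmful mutations are rejected, the gap can only hold steady or rise; in this toy it rises as partners become more complementary, which is a rough analogue of the specificity the study optimizes.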
With a working model in hand, they scaled the system up and applied it to an actual protein interaction network identified in yeast. They found that once the number of proteins rises above a certain level, random interactions begin to dominate: the specific interactions needed to run a cell simply don't occur often enough. In addition, the pressure of random interactions favors what's called a scale-free interaction network, in which most proteins have only a single binding partner while a small number of hubs engage in many specific interactions.
Can this help explain why we haven't spotted organisms with many more than 30,000 genes? The authors sure think so, writing, “We provide a physical explanation for the absence of an increase in protein diversity from simple multicellular organisms to humans.” Maybe so, but the authors themselves note at least one effective way around this limit: compartmentalization. Some proteins get shipped to specific areas of a cell, where they only encounter a subset of the cell's proteins (some also get shipped outside the cell). The more compartments, the more proteins you can get away with.
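A back-of-the-envelope count suggests why compartments help. This is not from the paper; it assumes N proteins split evenly across C compartments, with nonspecific encounters possible only between proteins that share a compartment. Let P_1 be the number of possible pairings in a single shared pool and P_C the number when the proteins are divided among C compartments:

```latex
\begin{aligned}
P_{1} &= \binom{N}{2} = \frac{N(N-1)}{2} \\
P_{C} &= C\binom{N/C}{2} = \frac{N}{2}\left(\frac{N}{C}-1\right) \approx \frac{N^{2}}{2C} \approx \frac{P_{1}}{C}
\end{aligned}
```

For example, 20,000 proteins in one pool give 199,990,000 possible pairings; split evenly into 10 compartments, that drops to 19,990,000. Holding the number of possible pairings fixed, the proteome could grow by roughly a factor of the square root of C. The even split is a simplification; proteins that move between compartments would shrink the benefit.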
Cell specialization can accomplish the same thing. Liver and nerve cells each require many protein interactions to function, but most of these are specific to one cell type or the other. So the protein complexes in nerve cells don't have to withstand random interactions with the full complement of proteins found in the liver. The more cell types you have, the more genes you should be able to tolerate. The authors don't seem to consider this in detail.
Another complication that doesn't appear in the paper is alternative splicing, a process that can generate many variants of a single protein sequence, potentially changing both its specific and nonspecific interactions. Many mammalian genes undergo alternative splicing, which should, in theory, make matters worse without changing the gene number.
Overall, it's an intriguing idea and a nicely developed model. What's really needed now is a version of the model that more accurately reflects what goes on in a eukaryotic cell, with its multitude of compartments and its many related protein variants produced by alternative splicing. After all, it's difficult to accept that a model provides “a physical explanation” when it doesn't really reflect physical reality. (Ars Technica, by John Timmer)