A database dubbed the “unknome”, which ranks proteins according to how much we have learned about them, has revealed that we still know next to nothing about thousands of human proteins. The team behind the database has also shown that at least some of these proteins are essential for survival.
To create the unknome, Sean Munro at the MRC Laboratory of Molecular Biology in Cambridge, UK, and his colleagues started with the 20,000 or so genes for proteins that have been identified in humans. They grouped together closely related human genes or proteins on the basis that they probably have similar functions, resulting in around 7500 protein clusters.
Next, they added closely related proteins found in commonly studied animals, such as mice or fruit flies, to these clusters, as these probably also have the same function. They then gave each protein cluster a score based on how many entries there were about its members in the main repository of information on the functions of genes, known as the Gene Ontology Resource.
A human protein that hasn’t been directly studied still scores highly if an equivalent protein has been well studied in another animal. Proteins also get higher scores for entries that are regarded as more reliable, such as those published in a journal. The scoring is slightly arbitrary, says Munro, but this is inevitable when trying to work out what we don’t know.
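The scoring idea described above can be illustrated with a minimal sketch. It is not the team's actual method: the weights, the evidence labels and the names `Annotation`, `EVIDENCE_WEIGHTS` and `cluster_score` are all hypothetical, chosen only to show how a cluster's score could add up annotations across its human and animal members, with more reliable entries counting for more.

```python
from dataclasses import dataclass

# Hypothetical reliability weights: the article does not give the real values.
EVIDENCE_WEIGHTS = {
    "published_experiment": 1.0,      # assumed: journal-backed entries count fully
    "computational_prediction": 0.5,  # assumed: less reliable entries count less
}


@dataclass
class Annotation:
    protein: str   # identifier of the annotated protein (made-up IDs below)
    species: str   # e.g. "human", "mouse", "fly"
    evidence: str  # key into EVIDENCE_WEIGHTS


def cluster_score(members, annotations, weights=EVIDENCE_WEIGHTS):
    """Sum the weighted annotations of every protein in one cluster."""
    member_set = set(members)
    return sum(
        weights.get(a.evidence, 0.0)  # unknown evidence types add nothing
        for a in annotations
        if a.protein in member_set
    )


# A human protein with no entries of its own, clustered with a studied fly relative.
annotations = [
    Annotation("FLY_X", "fly", "published_experiment"),
    Annotation("FLY_X", "fly", "published_experiment"),
    Annotation("FLY_X", "fly", "computational_prediction"),
]
studied_cluster = ["HUMAN_X", "FLY_X"]
unknown_cluster = ["HUMAN_Y", "MOUSE_Y"]

print(cluster_score(studied_cluster, annotations))
print(cluster_score(unknown_cluster, annotations))
```

In this toy example, the unstudied human protein inherits a non-zero score from its fly relative, while a cluster with no entries at all scores 0, which is the kind of cluster the unknome is designed to flag.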
The best-studied proteins have scores of well over 100. For instance, a protein called sonic hedgehog, which is involved in embryonic development, scores 168, while p53, which helps stop cells turning cancerous, scores 126. However, more than 2200 proteins have scores below 2, 1100 score below 1 and more than 800 score 0.
In theory, these low-scoring proteins might not have been studied because they don’t do anything important. To get an idea of whether the proteins matter, the team used a technique called RNA interference (RNAi) to reduce the levels in fruit flies of 260 proteins with scores below 1. In 60 cases, the flies died, showing that these particular proteins have an essential function.
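As a quick check on what those figures imply, the share of tested proteins that proved essential can be worked out directly. The variable names are illustrative only; the two counts come from the experiment described above.

```python
tested = 260  # low-scoring proteins knocked down with RNAi in fruit flies
lethal = 60   # cases in which the flies died

share = lethal / tested
print(f"{share:.1%} of the tested proteins were essential")
```

That is a little under a quarter of the proteins tested.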
That was a big surprise to the team members, who study fruit flies, says Munro. “They just assume that every possible important gene has been found, which turns out, of course, not to be true.”
The number of unknown proteins is slowly going down, he says, but he hopes the findings will accelerate the pace of discovery. The problem at the moment is that both funding bodies and individual researchers are reluctant to risk studying unknown proteins in case they turn out not to do anything important.
“There may even be biological processes that we don’t know about,” says Munro. “No one is looking for the proteins involved in them because no one knows about them.” That may sound surprising, he says, but the gene-editing technique known as CRISPR is based on bacterial proteins whose function was uncovered only in 2012.