Thursday, 6 October 2011

More Than Just Junk (post 1 of 2)

Back in 2003, after 13 long years, the Human Genome Project was completed and, for the first time, we were able to read the full genetic code of a human being. This publication was a landmark and resulted in a rapid increase in our understanding of many aspects of human biology. However, when first released, the genome also raised questions, the biggest of which was simply: where are the genes? It had been estimated that there would be around 100,000 genes in the genome to account for our complexity and genome size; however, the actual number falls well short of this. It is estimated that we have between 20,000 and 25,000 genes, close to the number of genes a mouse has. Now consider this: our genome is approximately 3.2 × 10^9 base pairs in length (base pairs are the chemical units which make up DNA) and the longest of all the 20,000-25,000 genes is about 2.4 × 10^6 base pairs, meaning the biggest gene in our genome takes up a mere 0.075% of it. So what is all the rest?
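For anyone who wants to check that 0.075% figure, here is a minimal Python sketch of the sum. The two lengths are the approximate values quoted above; the names (GENOME_BP, LONGEST_GENE_BP, percent_of_genome) are just labels I have made up for this illustration.

```python
# Approximate figures quoted in the post (not exact measurements).
GENOME_BP = 3.2e9        # length of the human genome in base pairs
LONGEST_GENE_BP = 2.4e6  # length of the longest gene in base pairs


def percent_of_genome(length_bp, genome_bp=GENOME_BP):
    """Return the share of the genome taken up by a stretch of DNA, as a percentage."""
    return 100.0 * length_bp / genome_bp


print(f"{percent_of_genome(LONGEST_GENE_BP):.3f}%")  # prints 0.075%
```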

I’ve already alluded to the fact that only a small amount of the genome is actually the genes that make us what and who we are. Of the whole of our genome, a mere 2.5% is genes that are expressed somewhere in our bodies at some time in our life. In order for DNA to be expressed it must first be converted to a molecule called mRNA, which is in turn used to make protein; the protein then functions in our bodies to make us what and who we are. The remaining 97.5% was once considered to be just ‘junk’ DNA that had no use. We now know there is a lot more to this remaining DNA than just ‘junk.’

Let us first consider the small part of our genome that is actually human genes or in some way related to human genes, which makes up a mere 25% of our genome. Of this 25%, only 10% is actually ‘coding’. This means (as I stated above) that only 2.5% of our entire genome actually gives molecules that have a function in our bodies. The remaining 90% (of the 25%) is composed of ‘non-coding’ DNA.
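The percentages of percentages can get confusing, so here is a short Python sketch of how they fit together. It uses only the rounded figures given in this post (25% gene-related, of which 10% is coding); the function and key names are my own invention for the illustration.

```python
# Rounded figures from the post, written as fractions of the whole genome.
GENE_RELATED = 0.25      # genes and gene-related DNA
CODING_WITHIN = 0.10     # share of the gene-related DNA that is 'coding'


def genome_breakdown(gene_related=GENE_RELATED, coding_within=CODING_WITHIN):
    """Split the genome into coding, gene-related non-coding, and everything else."""
    return {
        "coding": gene_related * coding_within,
        "gene_related_non_coding": gene_related * (1 - coding_within),
        "everything_else": 1 - gene_related,
    }


breakdown = genome_breakdown()
for name, fraction in breakdown.items():
    print(f"{name}: {fraction * 100:.1f}% of the genome")
```

The three parts come to 2.5%, 22.5% and 75% of the genome, which together add up to the full 100%.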

There are two main types of non-coding DNA in this gene-related portion, known as pseudogenes and gene fragments.
As the prefix pseudo- implies, pseudogenes are essentially false genes because they cannot be made into proteins. There are four different types of pseudogene:

Processed pseudogenes
For DNA to have any effect it must first be converted to mRNA prior to being made into a protein. When DNA is made into mRNA, certain changes are made. One of the biggest changes is known as splicing. Consider buying a watch with a metal link strap and finding it is too big. You would have links taken out of the strap and then put the whole strap back together at a shorter length so that it fits. This is roughly what happens when a gene is read. When the DNA is made into mRNA, certain parts known as introns are removed and the remaining pieces are joined back together to give a functional mRNA. A processed pseudogene is created when this functional mRNA (so no introns) is converted back into DNA, instead of to protein as it should be. This new bit of DNA is then re-inserted into the genome. Because the copy usually arrives without the control sequences (such as a promoter) that switch a gene on, it generally cannot be made into new mRNA, so it is no longer functional - a false gene. Its lack of introns is the tell-tale sign that it was copied from mRNA.
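To make the watch strap analogy a little more concrete, here is a toy Python sketch of splicing. It is only an illustration: the gene is a made-up list of labelled pieces, the sequences are invented, and I have kept DNA letters throughout for simplicity (real mRNA uses U in place of T). The names splice and toy_gene are hypothetical.

```python
def splice(segments):
    """Join the exons in order and drop the introns, like removing links from a strap.

    segments is a list of (kind, sequence) pairs, where kind is 'exon' or 'intron'.
    """
    return "".join(sequence for kind, sequence in segments if kind == "exon")


# A made-up gene: three exons separated by two introns (sequences are invented).
toy_gene = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTTTTCAG"),
    ("exon", "GATTAC"),
    ("intron", "GTGAGTCCCAG"),
    ("exon", "TGA"),
]

spliced = splice(toy_gene)
print(spliced)  # the shortened 'strap': exons only, in their original order
```

A processed pseudogene is what you would get if that shortened, intron-free sequence were copied back into DNA and pasted somewhere else in the genome.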

Non-processed pseudogenes
As the name implies, this is DNA that is non-functional but not as a result of the process I outlined above. Non-processed pseudogenes are simply genes that were once functional but are no longer expressed. These can also be known as fossilised genes as they were once necessary, but due to lack of use they no longer function, a case of ‘use it or lose it’. An example of this is a gene with the catchy name of OR7D4. This gene (like many non-processed pseudogenes) is part of our sense of smell and in approximately 30% of people it is no longer functional, meaning these individuals are unable to smell a chemical known as androstadienone.

Disabled pseudogenes
These are genes which have been disabled by some external factor such as a protein which may have mutated and become irreversibly bound to the DNA, impeding its expression.

Expressed pseudogenes
These are genes which are expressed but have no known function. It is thought that these may be early proteins which will evolve over time to have a function in future generations.

Alongside pseudogenes, the other main type of non-coding DNA is gene fragments. These fragments are small parts of a gene that are not expressed and simply sit there as extra bits of DNA. For instance, the HLA gene in the genome has what is known as a ‘leader’ followed by α1, α2 and α3 exons. Exons are in essence the opposite of introns in that they are the bits of DNA left in the mRNA when introns are removed (they can be thought of as the parts of the watch strap left in, to use that analogy again). Sitting next to the HLA gene in our genome there are gene fragments; one is made of the leader, α1 and α2, and the other is made of just α2. These are not expressed and simply take up space in the genome.

So that covers the 25% of the genome that is either genes or in some way related to genes. In the next post I will consider the remaining 75% and the fact that over 50% of our genome isn’t even human…
