Wednesday, 26 October 2011

Have a read of this

For people who read my most recent post, please have a read of this: http://www.sciencedaily.com/releases/2011/10/111025122615.htm

The viral particles I spoke about in my last post would appear to make us human!

(as a side note - I recently broke my laptop so won't be posting for a couple of weeks until it's fixed)

Tuesday, 18 October 2011

More Than Just Junk (post 2 of 2)


In the first half of this blog I explained that a mere 2.5% of our genome actually codes for the genes that make us who and what we are. A further 22.5% of our genome is in some way related to these genes, either as pseudogenes or as small fragments of larger, functional genes. Here we are going to look at the remaining 75% of our genome, the so-called ‘extragenic DNA’, and discuss the fact that about 50% of our genome isn’t even human.

The vast majority of the extragenic DNA in our genome can be classed as human transposable elements (HTEs) or ‘jumping genes’. These are relatively small portions of DNA which sit in our genome and then, seemingly on a whim, jump out and move to another place within it, like riding on a bus and moving seats halfway through. So the first question to tackle is: how are these DNA elements able to jump around our genome? In order to understand this we must revisit the process of making protein. From the DNA in our cells we make RNA. RNA is not connected to our genome and, in order to serve its purpose of producing protein, it leaves the nucleus (where the genome is found) and moves out into the rest of the cell. RNA is therefore free to move around in our cells. Jumping genes take advantage of this by copying themselves into RNA and then reversing the process to make new DNA that is inserted elsewhere. So, to go back to the bus analogy, it is in actual fact more like sitting in one seat, cloning yourself and sending the clone to another seat: the initial DNA remains in the genome while new DNA is made and inserted elsewhere, via the RNA intermediate.
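
For anyone who prefers code to buses, here is a toy sketch of that copy-and-paste idea in Python. Everything in it is made up for illustration: the function name, the tiny 'genome' and the element sequence are my own assumptions, and real jumping genes rely on enzymes and insertion sites that this completely ignores.

```python
def copy_and_paste(genome, element, insert_at):
    """Return a new genome with an extra copy of `element` inserted at `insert_at`.

    Toy model only: the original copy stays where it is, and the new copy
    is made via an RNA intermediate.
    """
    if element not in genome:
        raise ValueError("element is not present in the genome")
    rna_copy = element.replace("T", "U")   # DNA -> RNA (toy transcription)
    new_dna = rna_copy.replace("U", "T")   # RNA -> DNA (toy reverse transcription)
    return genome[:insert_at] + new_dna + genome[insert_at:]


# Hypothetical sequences, chosen only to keep the example readable.
ELEMENT = "GATTACA"
toy_genome = "AAAA" + ELEMENT + "CCCC"

new_genome = copy_and_paste(toy_genome, ELEMENT, insert_at=2)
print(toy_genome.count(ELEMENT), "copy before,", new_genome.count(ELEMENT), "copies after")
```

The point to notice is that the genome gets longer: nothing is removed from the original position, which is why these elements can pile up over time.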

These jumping genes can be sub-divided into two main classes, depending on their size. These classes are known as long or short interspersed nuclear elements (LINEs or SINEs). The most abundant of the SINE family are known as Alu repeat elements while the most abundant of the LINE family is known as L1.
The Alu repeat family has over 1.1 million copies in our genome, accounting for about 10% of the mass of the genome (that’s four times the amount taken up by our genes!). The repeats are short, only about 300 bases (bases being the units that make up DNA) in length and are seen to repeat approximately every 4000 bases of our overall genome. This is the current estimate but as I mentioned, these genes can freely jump around our genome and add new copies and it is estimated that a new Alu element is added about every 200 new births. An interesting fact about these Alu repeats is that they are only found in primates, meaning that they must have first been formed at the point when primates split from other mammals in our evolutionary origins. This knowledge can then be used to infer when primates split from other animals on the evolutionary tree. While Alu repeats are classified as SINEs due to their short length, they can also be classified as non-viral HTEs because they are unable to produce the protein that is necessary to convert their RNA into DNA. In order to do this, they must borrow the protein from the viral HTEs, which takes us nicely onto L1.
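
As a rough check on the 10% figure, the sums can be done in a few lines of Python. This is only a back-of-the-envelope sketch: it uses the rounded numbers quoted above, assumes a genome of roughly 3.2 × 10⁹ bases (the figure from the previous post), and the variable names are my own.

```python
# Rounded figures from the post; all are approximations.
GENOME_BP = 3.2e9      # assumed genome length in bases (from the previous post)
ALU_COPIES = 1.1e6     # number of Alu copies
ALU_LENGTH_BP = 300    # approximate length of one Alu repeat in bases
CODING_PERCENT = 2.5   # share of the genome that is coding genes

alu_total_bp = ALU_COPIES * ALU_LENGTH_BP        # total bases taken up by Alu repeats
alu_percent = 100 * alu_total_bp / GENOME_BP     # as a percentage of the genome
ratio_to_genes = alu_percent / CODING_PERCENT    # how many times the coding share

print(f"Alu repeats: about {alu_percent:.1f}% of the genome")
print(f"That is about {ratio_to_genes:.1f} times the share taken up by genes")
```

With these rounded inputs the answer comes out a little over 10%, and a little over four times the coding share, which is in line with the figures above.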

L1 is the most abundant LINE in our genome and, as I stated above, it can be classed as a viral HTE due to its ability to code for the enzyme which converts RNA back into DNA, allowing the jumping genes to jump. This enzyme is known as reverse transcriptase (RT): transcription is the conversion of DNA to RNA, here it is being reversed, and most enzyme names end in –ase. RT is not human, or even eukaryotic (those who follow my blog will know about eukaryotes, but for those who don’t, the previous blog explains it); it is a protein coded for predominantly by viruses, in particular retroviruses. Retroviruses code for this protein because their genome is made of RNA, and they must convert that RNA to DNA and insert it into their host’s genome in order to replicate. The best known of these retroviruses is undoubtedly HIV. Since L1 can produce RT, it must in some way have a viral origin, which, again, leads us nicely into our next discussion piece.

Retroviruses are a viral family which can convert their RNA genome into DNA and then insert that DNA into our genome. This happens to anyone infected with HIV or any other retrovirus. Usually the DNA will be inserted into cells known as ‘somatic cells,’ which are the cells that make up you, from your skin, to your hair, to your intestines, to your little toe. However these are not the only cells in the body: we also have ‘germline cells’, which make up the next generation of you, namely sperm and egg cells. Consider this: if a retrovirus inserts its genome into a germline cell, which is then used to make a child, that retrovirus will be part of the child’s genome. This sounds strange and highly unlikely; it is, but that hasn’t stopped it from happening on numerous occasions. The process forms what are known as endogenous retroviruses (ERVs or HERVs, with the H being for human). Approximately 8% of our genome is made up of HERVs and their fragments, providing a signature of our long co-existence with viruses. It is important to point out that these viruses would once have caused disease by infecting our ancestors, but the ones that kept a place in our genome lost their disease-causing potential, otherwise they would not still be here today. The vast majority of the HERVs in our genome are ‘silent’, meaning they have no function, as they do not produce protein. However some are functional and, more amazingly, some are now vital to us!

Take for instance the placenta, an organ without which a fetus would be unable to survive, as it allows nutrients and oxygen to get from the mother’s blood to her unborn baby. The placenta is made of cells which are fused together into a structure known as a ‘syncytium’. In the year 2000 it was found that a HERV known as HERV-W was almost exclusively expressed in human placenta. Further study revealed that HERV-W produced a protein whose function was to fuse cells together in order to make the placenta; this protein is known as ‘syncytin.’ Later work found a second protein capable of this function (‘syncytin-2’) and this was also found to be produced by a HERV, this time by HERV-FRD.

Let’s just take a step back and think about that. What were once two viruses managed to infect the cells of an ancient animal (or possibly just a simple cell) and become part of the genome. From there they were passed from generation to generation and eventually found a role in producing a protein that could fuse cells in order to produce a placenta, one of the key features for the development of a human child. Boggles the mind a little bit!

So there you have it, the mysteries of our genome: a so-called ‘human’ genome in which only a tiny proportion of the DNA can actually be called human genes. Alongside the mere 2.5% of the genome which gives us genes, we have gene fragments and pseudogenes derived from functional genes, together making up just 25% of the genome. Approximately 50% of the remainder is accounted for by viral elements such as fully functional HERVs and fragments of these, along with the viral jumping genes such as L1. The rest is non-viral jumping genes such as the Alu family, which accounts for four times as much DNA as the human genes. Makes you think, doesn’t it: can we really call ourselves human..?

Thursday, 6 October 2011

More Than Just Junk (post 1 of 2)


Back in 2003, after 13 long years, the Human Genome Project was completed and, for the first time, we were able to read the full genetic code of a human being. This publication was a landmark and resulted in a rapid increase in our understanding of many aspects of human biology. However, when first released, the genome also raised questions, the biggest of which was simply: where are the genes? It had been estimated that there would be around 100,000 genes in the genome to account for our complexity and genome size; however, the actual number falls well short of this. It is estimated that we have between 20,000 and 25,000 genes, close to the number of genes a mouse has. Now consider this: our genome is approximately 3.2 × 10⁹ base pairs in length (these are the chemical units which make up DNA) and the longest of all the 20,000-25,000 genes is about 2.4 × 10⁶ base pairs, meaning the biggest gene in our genome takes up a mere 0.075%. So what is all the rest?
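
The 0.075% figure is easy to check for yourself. Here is a minimal sketch in Python using the rounded numbers above; the variable and function names are just my own labels.

```python
# Rounded figures quoted in the post (approximations, not exact measurements).
GENOME_BP = 3.2e9        # approximate length of the human genome in base pairs
LONGEST_GENE_BP = 2.4e6  # approximate length of the longest gene in base pairs


def percent_of_genome(length_bp, genome_bp=GENOME_BP):
    """Return the share of the genome taken up by a stretch of DNA, as a percentage."""
    return 100 * length_bp / genome_bp


longest_gene_percent = percent_of_genome(LONGEST_GENE_BP)
print(f"Longest gene: {longest_gene_percent:.3f}% of the genome")
```

Dividing 2.4 million by 3.2 billion and multiplying by 100 gives the 0.075% quoted above.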


I’ve already alluded to the fact that only a small amount of the genome is actually the genes that make us what and who we are. Of the whole of our genome, a mere 2.5% is genes that are expressed somewhere in our bodies at some time in our life. In order for DNA to be expressed it must firstly be converted to a molecule called mRNA, which is in turn used to make protein; the protein then functions in our bodies to make us what and who we are. The remaining 97.5% was once just considered to be ‘junk’ DNA that had no use. We now know there is a lot more to this remaining DNA than just ‘junk.’

Let us firstly consider the small part of our genome that is actually human genes or in some way related to human genes, which make up a mere 25% of our genome. Of this 25% only 10% is actually ‘coding’. This means (as I stated above) that only 2.5% of our entire genome actually gives molecules that have a function in our bodies. The remaining 90% (of the 25%) is composed of ‘non-coding’ DNA.
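
If the percentages-of-percentages are hard to keep straight, the breakdown can be written out as a small Python sketch. The two input figures are the ones given above; the names are my own and the split is only as accurate as those rounded estimates.

```python
# Shares of the whole genome, using the rounded estimates from the post.
GENE_RELATED_SHARE = 0.25     # genes plus gene-related DNA
CODING_SHARE_OF_THAT = 0.10   # the coding part of that 25%

coding_share = GENE_RELATED_SHARE * CODING_SHARE_OF_THAT        # actual coding genes
noncoding_gene_related_share = GENE_RELATED_SHARE - coding_share  # pseudogenes, fragments
extragenic_share = 1 - GENE_RELATED_SHARE                        # everything else

breakdown = {
    "coding": coding_share,
    "gene-related but non-coding": noncoding_gene_related_share,
    "extragenic": extragenic_share,
}
for name, share in breakdown.items():
    print(f"{name}: {share:.1%} of the genome")
```

This gives 2.5% coding, 22.5% gene-related but non-coding, and 75% extragenic, which together account for the whole genome.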

There are two main types of non-coding DNA, known as pseudogenes and gene fragments.
As the prefix pseudo- implies, the pseudogenes are essentially false genes because they cannot be made into proteins. There are 4 different types of pseudogene:

Processed pseudogenes
For DNA to have any effect it must first be converted to mRNA prior to being made into a protein. When DNA is made into mRNA, certain changes are made. One of the biggest changes is known as splicing. Consider buying a watch with a metal link strap and finding it is too big. You would have links taken out of the strap and then put the whole strap back together at a shorter length so that it fits. This is sort of what happens to the DNA. When the DNA is made into mRNA certain parts known as introns are removed and the molecule is then put back together to be functional as mRNA. A processed pseudogene is created when this functional mRNA (so no introns) is converted back into DNA, instead of to protein as it should be. This new bit of DNA is then re-inserted to the genome. Due to the lack of introns, this DNA cannot be made into new mRNA, so is no longer functional - a false gene.
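
The watch-strap idea can also be sketched in a few lines of Python. The sequences below are made up purely for illustration, the function name is my own, and the final step (swapping U for T) is a deliberate simplification of how RNA is really copied back into DNA.

```python
def splice(segments):
    """Join the exons and drop the introns, like removing links from a watch strap."""
    return "".join(seq for kind, seq in segments if kind == "exon")


# A hypothetical unspliced RNA, written as labelled segments.
toy_transcript = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGUCCAG"),
    ("exon", "GAUUAC"),
    ("intron", "GUGAGUUUAG"),
    ("exon", "UGA"),
]

mature_mrna = splice(toy_transcript)

# Toy version of a processed pseudogene: the intron-free mRNA copied back into DNA.
processed_pseudogene = mature_mrna.replace("U", "T")

print("mRNA after splicing: ", mature_mrna)
print("processed pseudogene:", processed_pseudogene)
```

Notice that the DNA copy at the end contains only the exon sequences: the introns that were in the original gene are gone, which is the hallmark of a processed pseudogene.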

Non-processed pseudogenes
As the name implies, this is DNA that is non-functional but not as a result of the process I outlined above. Non-processed pseudogenes are simply genes that were once functional but are no longer expressed. These can also be known as fossilised genes as they were once necessary, but due to lack of use they no longer function, a case of ‘use it or lose it’. An example of this is a gene with the catchy name of OR7D4. This gene (like many non-processed pseudogenes) is part of our sense of smell and in approximately 30% of people it is no longer functional, meaning these individuals are unable to smell a chemical known as androstadienone.

Disabled pseudogenes
These are genes which have been disabled by some external factor such as a protein which may have mutated and become irreversibly bound to the DNA, impeding its expression.

Expressed pseudogenes
These are genes which are expressed but have no known function. It is thought that these may be early proteins which will evolve over time to have a function in future generations.

Alongside pseudogenes, the other main type of non-coding DNA is gene fragments. These fragments are small parts of a gene that are not expressed and simply sit there as extra bits of DNA. For instance the HLA gene in the genome has what is known as a ‘leader’ followed by α1, α2 and α3 exons. Exons are in essence the opposite of introns in that they are the bits of DNA left in the mRNA when introns are removed (they can be thought of as the parts of the watch strap left in, to use that analogy again). Sitting next to the HLA gene in our genome there are gene fragments; one is made of the leader, α1 and α2, and the other is made of just α2. These are not expressed and simply take up space in the genome.

So that covers the 25% of the genome that is either genes or in some way related to genes. In the next post I will consider the remaining 75% and the fact that over 50% of our genome isn’t even human…