Tuesday, 18 October 2011
More Than Just Junk (post 2 of 2)
In the first half of this blog I explained that a mere 2.5% of our genome actually codes for the genes that make us who and what we are. A further 22.5% of our genome is in some way related to these genes, either as pseudogenes or as small fragments of larger, functional genes. Here we are going to look at the remaining 75% of our genome, the so-called ‘extragenic DNA’, and discuss the fact that about 50% of our genome isn’t even human.
The vast majority of the extragenic DNA in our genome can be classed as human transposable elements (HTEs) or ‘jumping genes’. These are relatively small portions of DNA which sit in our genome and then, on a whim, jump out and move to another place within it, like riding on a bus and moving seats halfway through. So the first question to tackle is: how are these DNA elements able to jump around our genome? To understand this we must revisit the process of making protein. From the DNA in our cells we make RNA. RNA is not physically attached to our genome, and in order to serve its purpose of producing protein it leaves the nucleus (where the genome is found) and moves out into the rest of the cell. RNA is therefore free to move around in our cells. Jumping genes take advantage of this by being copied into RNA and then reversing the process to make new DNA that is inserted elsewhere. So, to go back to the bus analogy, it is in actual fact more like sitting in one seat, cloning yourself and sending the clone to another seat: the initial DNA remains where it was in the genome while new DNA is made and inserted elsewhere, via the RNA intermediate.
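For readers who like to see an idea in code, here is a toy sketch of that ‘clone yourself into another seat’ behaviour. It is only an illustration: the genome is a short made-up string, the function name `copy_and_paste` and the element `GATTACA` are invented for the example, and ‘transcription’ is reduced to swapping T for U (real transcription works from the complementary strand, which is ignored here).

```python
import random

def copy_and_paste(genome: str, element: str, rng: random.Random) -> str:
    """Toy model: return a new genome with one extra copy of `element`."""
    start = genome.find(element)
    if start == -1:
        raise ValueError("element not found in genome")
    end = start + len(element)

    # 'Transcribe' the element into an RNA copy (simplified: T becomes U).
    rna_copy = element.replace("T", "U")
    # 'Reverse transcribe' the RNA back into DNA (U becomes T again).
    new_dna = rna_copy.replace("U", "T")

    # Pick a random landing site outside the original copy, so the
    # original element stays intact (an assumption made for simplicity).
    sites = [i for i in range(len(genome) + 1) if i <= start or i >= end]
    site = rng.choice(sites)
    return genome[:site] + new_dna + genome[site:]

# Hypothetical example: lower case is background DNA, upper case is the element.
rng = random.Random(0)
genome = "ccgga" + "GATTACA" + "ttgcc"
after = copy_and_paste(genome, "GATTACA", rng)
print(genome)
print(after)
```

The original copy never leaves its seat; the genome simply ends up one element longer, which is why these elements accumulate over time.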
These jumping genes can be sub-divided into two main classes, depending on their size. These classes are known as long or short interspersed nuclear elements (LINEs or SINEs). The most abundant of the SINE family are known as Alu repeat elements, while the most abundant of the LINE family is known as L1.
The Alu repeat family has over 1.1 million copies in our genome, accounting for about 10% of the mass of the genome (that’s four times the amount taken up by our genes!). The repeats are short, only about 300 bases in length (bases being the units that make up DNA), and are seen to occur approximately once every 4000 bases across our genome. This is the current estimate, but as I mentioned, these elements can jump around our genome and add new copies; it is estimated that a new Alu element is added about once in every 200 births. An interesting fact about these Alu repeats is that they are only found in primates, meaning that they must have first been formed around the point when primates split from other mammals in our evolutionary origins. This knowledge can then be used to infer when primates split from other animals on the evolutionary tree. While Alu repeats are classified as SINEs due to their short length, they can also be classified as non-viral HTEs because they are unable to produce the protein that is necessary to convert their RNA into DNA. In order to do this, they must borrow the protein from the viral HTEs, which takes us nicely onto L1.
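The Alu figures above can be sanity-checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the copy number, repeat length and 2.5% gene fraction come from the text, while the genome size of roughly 3.2 billion bases is my own assumption and is not stated in this post.

```python
# Back-of-the-envelope check of the Alu figures quoted in the text.
GENOME_SIZE_BP = 3_200_000_000  # assumption: approximate size of the human genome
ALU_COPIES = 1_100_000          # from the text: over 1.1 million copies
ALU_LENGTH_BP = 300             # from the text: about 300 bases each
GENE_FRACTION = 0.025           # from the text: 2.5% of the genome codes for genes

alu_total_bp = ALU_COPIES * ALU_LENGTH_BP
alu_fraction = alu_total_bp / GENOME_SIZE_BP
average_spacing_bp = GENOME_SIZE_BP / ALU_COPIES
ratio_to_genes = alu_fraction / GENE_FRACTION

print(f"Total Alu DNA: {alu_total_bp:,} bases")
print(f"Share of genome: {alu_fraction:.1%}")
print(f"One Alu roughly every {average_spacing_bp:,.0f} bases")
print(f"Alu DNA is about {ratio_to_genes:.1f} times the coding DNA")
```

Under these assumptions the share comes out at roughly 10% and about four times the coding DNA, matching the figures above. The average spacing works out at a little under 3000 bases, which is in the same ballpark as the ‘every 4000 bases’ estimate; the exact number depends on the genome size and copy count you plug in.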
L1 is the most abundant LINE in our genome and, as I stated above, it can be classed as a viral HTE due to its ability to code for the enzyme which converts RNA back into DNA, allowing the jumping genes to jump. This enzyme is known as reverse transcriptase (RT): transcription is the conversion of DNA to RNA, this enzyme reverses it, and most enzyme names end in ‘-ase’. RT is not human, or even eukaryotic (those who follow my blog will know about eukaryotes, but for those who don’t, the previous blog explains it); it is a protein coded for predominantly by viruses, in particular retroviruses. Retroviruses code for this protein because their genome is made of RNA, and they must convert that RNA to DNA and insert it into their host’s genome in order to replicate. The best known of these retroviruses is undoubtedly HIV. Since L1 can produce RT, it must in some way have a viral origin, which, again, leads us nicely into our next discussion piece.
Retroviruses are a viral family which can convert their RNA genome into DNA and then insert that DNA into our genome. This happens to anyone infected with HIV or any other retrovirus. Usually the DNA will be inserted into cells known as ‘somatic cells’, which are the cells that make up you, from your skin, to your hair, to your intestines, to your little toe. However, these are not the only cells in the body: we also have ‘germline cells’, the sperm and egg cells which make up the next generation of you. Consider this: if a retrovirus inserts its genome into a germline cell, which is then used to make a child, that retrovirus will be part of the child’s genome. This sounds strange and highly unlikely; it is, but that hasn’t stopped it from happening on numerous occasions. The process forms what are known as endogenous retroviruses (ERVs, or HERVs, with the H being for human). Approximately 8% of our genome is made up of HERVs and their remnants, providing a signature of our long co-existence with viruses. It is important to point out that these viruses would once have caused disease by infecting our ancestors, but the ones that kept a place in our genome lost their disease-causing potential, otherwise they would not still be here today. The vast majority of the HERVs in our genome are ‘silent’, meaning they have no function, as they do not produce protein. However, some are functional and, more amazingly, some are now vital to us!
Take for instance the placenta, an organ without which a fetus would be unable to survive, as it allows nutrients and oxygen to get from the mother’s blood to her unborn baby. Part of the placenta is made of cells that are fused together into a structure known as a ‘syncytium’. In the year 2000 it was found that a HERV known as HERV-W was almost exclusively expressed in human placenta. Further study revealed that HERV-W produced a protein whose function was to fuse cells together in order to make the placenta; this protein is known as ‘syncytin’. Later work found a second protein capable of this function (‘syncytin-2’), and this was also found to be produced by a HERV, this time by HERV-FRD.
Let’s just take a step back and think about that. What were once two viruses managed to infect the cells of an ancient animal (or possibly just a simple cell) and become part of its genome. From there they were passed from generation to generation and eventually found a role in producing proteins that could fuse cells in order to produce a placenta, one of the key features for the development of a human child. Boggles the mind a little bit!
So there you have it, the mysteries of our genome: a so-called ‘human’ genome in which only a tiny proportion can actually be called human. Along with the mere 2.5% of the genome which gives us genes, we have gene fragments and pseudogenes derived from functional genes, together making up just 25% of the genome. Approximately 50% of the remainder is accounted for by viral elements such as fully functional HERVs and fragments of these, along with the viral jumping genes such as L1. The rest is non-viral jumping genes such as the Alu family, which accounts for four times as much DNA as the human genes. Makes you think, doesn’t it: can we really call ourselves human?