In the first half of this blog I
explained that a mere 2.5% of our genome actually codes for the genes that
make us who and what we are. A further 22.5% of our genome is in some way
related to these genes, either as pseudogenes or as small fragments of
larger, functional genes. Here we are going to look at the remaining 75% of
our genome, the so-called ‘extragenic DNA’, and discuss the fact that about 50%
of our genome isn’t even human.
The vast majority of the
extragenic DNA in our genome can be classed as human transposable elements
(HTE) or ‘jumping genes’. These are relatively small portions of DNA which sit
in our genome and then, seemingly on a whim, jump to another place
within it, like riding on a bus and moving seats halfway through. So the first
question to tackle is: how are these DNA elements able to jump around our
genome? In order to understand this we must revisit the process of making
protein. From the DNA in our cells we make RNA. RNA is not attached to our
genome, and in order to serve its purpose of producing protein it leaves the
nucleus (where the genome is kept) and moves into the rest of the cell.
RNA is therefore free to move around in our cells. Jumping genes take advantage
of this by copying themselves into RNA and then reversing this process to
make new DNA that is inserted elsewhere. So, to go back to the bus analogy
used before, it is in fact more like sitting in one seat, cloning
yourself, and sending the clone to another seat: the initial DNA remains in place
while new DNA is made and inserted elsewhere, via the RNA intermediate.
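For those who like to see an idea written out, the ‘clone yourself and send the clone to another seat’ process described above can be sketched as a toy Python example. Everything in it is an assumption made for illustration: the sequences, the positions and the function names are made up, and real transcription is far more involved than swapping one letter for another.

```python
# Toy model of "copy-and-paste" jumping: DNA -> RNA -> DNA, inserted elsewhere.
# The sequences, positions and function names are hypothetical, for illustration only.

def transcribe(dna: str) -> str:
    """Copy a stretch of DNA into RNA (simplified: T is written as U)."""
    return dna.replace("T", "U")

def reverse_transcribe(rna: str) -> str:
    """Turn the RNA copy back into DNA (simplified: U goes back to T)."""
    return rna.replace("U", "T")

def jump(genome: str, start: int, end: int, target: int) -> str:
    """Copy the element at genome[start:end] and paste the copy in at target.

    The original element is left where it was, so the genome grows.
    """
    element = genome[start:end]
    rna = transcribe(element)            # the element is copied into RNA
    new_dna = reverse_transcribe(rna)    # the RNA is converted back into DNA
    return genome[:target] + new_dna + genome[target:]

genome = "AAAA" + "GATTACA" + "CCCCCCCC"   # "GATTACA" plays the jumping gene
after = jump(genome, start=4, end=11, target=15)

print(genome)
print(after)
print(after.count("GATTACA"))   # how many copies of the element there are now
```

The point to notice is that the original copy never leaves its seat: the genome simply ends up one element longer.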
These jumping genes can be
sub-divided into two main classes, depending on their size. These classes are
known as long or short interspersed nuclear elements (LINEs or SINEs). The most
abundant of the SINE family are known as Alu repeat elements, while the most
abundant of the LINE family is known as L1.
The Alu repeat family has over
1.1 million copies in our genome, accounting for about
10% of the mass of the genome (that’s four times the amount taken up by our genes!). The
repeats are short, only about 300 bases (bases being the units that make up
DNA) in length and are seen to repeat approximately every 4000 bases of our
overall genome. This is the current estimate, but as I mentioned, these elements
can jump around our genome and add new copies, and it is estimated that a
new Alu insertion arises about once in every 200 births. An interesting
fact about these Alu repeats is that they are only found in primates, meaning
that they must have first arisen around the time when primates split from
other mammals in our evolutionary history. This knowledge can then be used to
infer when primates split from other animals on the evolutionary tree. While Alu
repeats are classified as SINEs due to their short length, they can also be
classified as non-viral HTEs because they are unable to produce the protein that
is necessary to convert their RNA into DNA. In order to do this, they must borrow
the protein from the viral HTEs, which takes us nicely onto L1.
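Before we get to L1, the Alu figures quoted above can be sanity-checked with a few lines of Python. This is only a back-of-envelope sketch: the copy number, repeat length and 2.5% coding figure come from this post, but the total genome size of roughly 3.2 billion bases is my own assumption and is not stated anywhere above.

```python
# Back-of-envelope check of the Alu figures quoted in this post.
GENOME_BASES = 3_200_000_000   # assumption: approximate size of the human genome
ALU_COPIES = 1_100_000         # from the post: over 1.1 million copies
ALU_LENGTH = 300               # from the post: about 300 bases per repeat
CODING_PERCENT = 2.5           # from the post: share of the genome coding for genes

alu_bases = ALU_COPIES * ALU_LENGTH
alu_percent = 100 * alu_bases / GENOME_BASES
ratio_to_genes = alu_percent / CODING_PERCENT

print(f"Alu DNA in total: {alu_bases:,} bases")
print(f"Share of the genome: {alu_percent:.1f}%")
print(f"Compared with coding DNA: {ratio_to_genes:.1f} times as much")
```

Under that assumed genome size, the numbers come out at roughly 10% of the genome and about four times the coding DNA, in line with the figures given above.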
L1 is the most abundant LINE in
our genome and, as I stated above, it can be classed as a viral HTE due to its
ability to code for the enzyme which converts RNA back into DNA, allowing the
jumping genes to jump. This enzyme is known as reverse transcriptase (RT)
(transcription is the conversion of DNA to RNA, here it is being reversed, and most
enzyme names end in –ase). RT is not human, or even eukaryotic (those who
follow my blog will know about eukaryotes, but for those who don’t, the
previous blog explains it); it is a protein coded for predominantly by
viruses, in particular retroviruses. Retroviruses
code for this protein because their genome is made of RNA and they must convert that RNA
to DNA and insert it into their host’s genome in order to replicate. The best
known of these retroviruses is undoubtedly HIV. Since L1 can produce RT, it
must in some way have a viral origin, which, again, leads us nicely into our
next discussion piece.
Retroviruses are a viral family
which can convert their RNA genome into DNA and then insert that DNA into our
genome. This happens to anyone infected with HIV or any other retrovirus.
Usually the DNA will be inserted into cells known as ‘somatic cells’, which are
the cells that make up you, from your skin, to your hair, to your intestines,
to your little toe. However, these are not the only cells in the body - we also
have ‘germline cells’, which make up the next generation of you - sperm and egg
cells. Consider this: if a retrovirus inserts its genome into a germline cell,
which is then used to make a child, that retrovirus will be part of the child’s
genome. This sounds strange and highly unlikely; it is, but that hasn’t stopped
it from happening on numerous occasions. The process forms what are known as
endogenous retroviruses (ERVs or HERVs – with the H being for human).
Approximately 8% of our genome is made up of HERVs and their remnants, providing a
signature of our long co-existence with viruses. It is important to point out
that these viruses would once have caused disease by infecting our ancestors,
but the ones that kept a place in our genome lost their disease-causing potential,
otherwise they would not still be here today. The vast majority of the HERVs in
our genome are ‘silent’, meaning they have no function, as they do not produce
protein. However, some are functional and, more amazingly, some are now vital to
us!
Take for instance the placenta,
an organ without which a fetus would be unable to survive as it allows
nutrients and oxygen to get from the mother’s blood to her unborn baby. The
placenta is made of cells, which are fused together into a structure known as a
‘syncytium’. In the year 2000 it was found that a HERV known as
HERV-W was almost exclusively expressed in human placenta. Further study revealed
that HERV-W produced a protein whose function was to fuse cells together in
order to make the placenta; this protein is known as ‘syncytin’. Later work
found a second protein capable of this function (‘syncytin-2’) and this was
also found to be produced by a HERV, this time by
HERV-FRD.
Let’s just take a step back and
think about that: what were once two viruses managed to infect the cells of an
ancient animal (or possibly just a simple cell) and become part of the genome. From
there they were passed from generation to generation and eventually
found a role in producing a protein that could fuse cells in order to produce a
placenta, one of the key features for the development of a human child. Boggles
the mind a little bit!
So there you have it, the
mysteries of our genome: a so-called ‘human’ genome with only a tiny proportion
of genes that can actually be called human. Along with the mere 2.5% of the
genome which gives us genes, we have gene fragments and pseudogenes derived from
functional genes, together making up just 25% of the genome. Approximately 50% of the
remainder is accounted for by viral elements such as HERVs and
fragments of these, along with the viral jumping genes such as L1. The rest is
non-viral jumping genes such as the Alu family, which accounts for four times
as much DNA as the human genes. Makes you think, doesn’t it: can we really call
ourselves human?