I said the other day that there's always something new to learn, & I love that my job gives me lots of opportunities to do this. Here's a case in point.
In my second-year paper on evolution, I talk a little bit about pseudogenes. I'm not actually a geneticist & so for this part of the paper I've tended to take what I'll admit is the easy way out & be guided by the textbook for the content. (This was always going to come back & bite me one day as textbooks are always behind the primary scientific literature in terms of the details they include.) Anyway, the other day I made a comment on pseudogenes on another blog & one of the other commenters recommended a new paper to me. So I read it, & now I'm sharing it - thanks, Heraclides :-)
I talk with my students about pseudogenes as examples of vestigial structures at the molecular level. Pseudogenes are common in the human genome, but according to our textbook - & also the list of definitions that I've just this minute looked for on-line - they don't code for functional protein or RNA. For example, there are multiple gene loci coding for haemoglobin, but three are viewed as non-functional pseudogenes, disabled by a 'stop' codon partway through the sequence that prevents complete translation. And there are 'processed' pseudogenes - these are formed when an mRNA molecule is reverse transcribed to DNA and then reinserted into the genome at a new, different location. Because they are reverse transcribed from mature mRNA, processed pseudogenes don't have their own promoters or introns, and so we can distinguish them from their parent genes. And because they lack promoters, they can't be expressed, right?
Well, according to the data reviewed and discussed by Zheng & Gerstein (2007), the answer's actually 'no, not always'.
They begin by giving two characteristics that are commonly associated with pseudogenes: sequence similarity to a functional gene, and... genetic defects that preclude the generation of functional products (that is, proteins or rRNA/tRNA). But then they go on to list reports of functional pseudogenes and pseudogenes (in mammals, plants, & yeast) that are transcribed - and this challenges that conventional definition. So Zheng & Gerstein suggest that the definition should be modified to pay more attention to the pseudogenes' sequences rather than whether or not they are functional, & they also explain how that seemingly paradoxical functioning can arise.
'Traditional' pseudogenes can have one or more things wrong with them: they may not have promoters, they may contain disabling mutations, or they may have lost the splice sites that allow introns to be removed from the transcript. But it turns out that the 'traditional' variety probably makes up only a small proportion of the roughly 20,000 pseudogenes found in the human genome (Zheng & Gerstein, 2007). Not to mention that it's apparently rather hard to determine non-functionality.
This was first recognised in 1999, with the discovery of a pseudogene that was transcribed in some nervous system cells in the snail Lymnaea. The transcript turned out to function as an anti-sense RNA, blocking translation (& hence expression) of a particular protein & thus acting in memory formation. Such pseudogene transcription has since been found in human and mouse studies, and it's been estimated that 5-20% of human pseudogenes may actually be transcribed. But is this important? Zheng & Gerstein (2007) comment that "Studies of transcribed regions in [a number of genomes] have revealed that the transcriptome is more complex than was expected. Not only is most of the mammalian genome transcribed but also more than half of the transcribed regions are mapped outside known genes.... perhaps we should not be entirely surprised that pseudogenes, which seem to be as abundant as protein-coding genes in the human genome, contribute to the complex pool of the human transcriptome."
So it turns out that there's considerable variation in pseudogenes, and Zheng & Gerstein suggest classifying them into 4 groups: exapted pseudogenes, which have gained a new biological function; 'piggy-back pseudogenes', which contain new functional sequences; dying pseudogenes, with much-reduced transcription; and dead pseudogenes (the 'traditional' type).
And the functionality? Some pseudogenes are very 'young' - if they are very similar to their 'parent' genes, in that they haven't gained many mutations, then they might still have some residual function when expressed. Processed pseudogenes may be expressed if they have been reinserted into the genome close to the promoter of some other, functional gene. Apparently it's possible that a fairly large number (in the hundreds) of new protein-coding primate genes might have evolved as a result of this process: pseudogenes can provide a source of evolutionary novelty. What a fascinating and complex tale this is!
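For anyone who likes to see an idea in concrete form: the 'stop codon partway through the sequence' that disables a traditional pseudogene can be sketched in a few lines of Python. This is only a toy illustration. The two sequences are invented for the example (they are not real haemoglobin loci), the function names are my own, & it assumes the sequence is already in the correct reading frame, which real pseudogene annotation certainly can't take for granted.

```python
from typing import Optional

# The three standard stop codons, written as DNA (coding strand).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(coding_seq: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    seq = coding_seq.upper()
    # Step through the sequence one codon (three bases) at a time.
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(coding_seq: str) -> bool:
    """True if an in-frame stop codon appears before the final codon."""
    n_codons = len(coding_seq) // 3
    stop = first_stop_codon(coding_seq)
    return stop is not None and stop < n_codons - 1


# Hypothetical toy sequences: the 'pseudogene' differs from the 'parent'
# by a single base change (AAG -> TAG) in the third codon.
parent = "ATGGCCAAGGTTCTGGACTAA"
pseudogene = "ATGGCCTAGGTTCTGGACTAA"

print(has_premature_stop(parent))      # False: the only stop is the final codon
print(has_premature_stop(pseudogene))  # True: translation would halt at codon 3
```

Of course, as the rest of this post makes clear, a sequence-level defect like this tells you the full-length protein won't be made; it doesn't by itself tell you that the locus has no function at all.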
All this re-emphasises something that I've said before - that science doesn't have all the answers; there's always something new to find out. Which is really exciting! But it also means that these days, a true science generalist is very rare. There's so much information in so many different areas that all the sciences (not just biology) tend to split & split again into various specialities. And people are experts in their speciality, but it's just really difficult to keep on top of (or even remotely up-to-date with) what's going on in all those other fields. A never-ending challenge, in fact :-)
D. Zheng & M.B. Gerstein (2007) The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they? Trends in Genetics 23(5): 219-224