Transcription is the mechanism by which a template strand of DNA is utilized by specific RNA polymerases to generate one of the four distinct classifications of RNA. These four RNA classes are:
1. Messenger RNAs (mRNAs): This class of RNA comprises the genetic coding templates used by the translational machinery to determine the order of amino acids incorporated into an elongating polypeptide in the process of translation.
2. Transfer RNAs (tRNAs): This class of small RNA forms covalent attachments to individual amino acids and recognizes the encoded sequences of the mRNAs to allow correct insertion of amino acids into the elongating polypeptide chain.
3. Ribosomal RNAs (rRNAs): This class of RNA is assembled, together with numerous ribosomal proteins, to form the ribosomes. Ribosomes engage the mRNAs and form a catalytic domain into which the tRNAs enter with their attached amino acids. The 28S rRNA of the large ribosomal subunit has a unique catalytic function: it catalyzes the formation of the peptide bond via its ribozyme (RNA-directed catalysis) activity.
4. Small RNAs: This class of RNA includes the small nuclear RNAs (snRNAs) involved in RNA splicing and the microRNAs (miRNAs) involved in the modulation of gene expression through the alteration of target mRNA activity.
All RNA polymerases are dependent upon a DNA template in order to synthesize RNA. The resultant RNA is, therefore, complementary to the template strand of the DNA duplex and identical to the non-template strand. The non-template strand is called the coding strand because its sequences are identical to those of the mRNA. However, in RNA, U is substituted for T and the intronic DNA sequences are removed from the RNAs through the process of splicing.
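The relationship between the template strand, the coding strand, and the RNA can be illustrated with a minimal Python sketch. The function names and the nine-base template are hypothetical and chosen only for illustration; intron removal is not modeled here.

```python
# Base directed into the RNA by each base of the DNA template strand.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Ordinary DNA base pairing, used to derive the coding (non-template) strand.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') copied from a template written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5.upper())


def coding_strand(template_3_to_5: str) -> str:
    """Return the coding strand (written 5'->3') paired with the template."""
    return "".join(DNA_PAIR[base] for base in template_3_to_5.upper())


template = "TACGGCATT"            # hypothetical template, written 3'->5'
rna = transcribe(template)        # AUGCCGUAA
coding = coding_strand(template)  # ATGCCGTAA
```

Replacing every T in `coding` with U reproduces `rna`, which is the sense in which the coding strand is "identical" to the transcript.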
In prokaryotic cells, all three RNA classes are synthesized by a single polymerase. In eukaryotic cells there are three distinct classes of RNA polymerase: RNA polymerase (pol) I, II, and III. Each polymerase is responsible for the synthesis of a different class of RNA. The capacity of the various polymerases to synthesize different RNAs was shown with the toxin α-amanitin. At low concentrations of α-amanitin, synthesis of mRNAs is affected but not that of rRNAs or tRNAs. At high concentrations, synthesis of both mRNAs and tRNAs is affected. These observations have allowed the identification of which polymerase synthesizes which class of RNAs.
RNA pol I (RNAP I; also identified as RNA polymerase 7) is responsible for rRNA synthesis (excluding the 5S rRNA). The functional enzyme is a large (590 kDa) multi-subunit complex composed of 14 subunits. Twelve of the RNAP I subunits are identical to or related to subunits of the RNAP II complex. The genes that encode the subunits of the RNAP I complex are identified as POLR1 genes, with five distinct genes (POLR1A-POLR1E) expressed in humans. There are four major rRNAs in eukaryotic cells, designated by their sedimentation size. The 28S, 5S, and 5.8S rRNAs are associated with the large ribosomal subunit and the 18S rRNA is associated with the small ribosomal subunit.
RNA pol II (RNAP II) in humans is a large 550 kDa complex composed of 12 distinct subunits. The 12 subunits of the RNAP II complex are identified as RPB1-RPB12 and the genes that encode these subunits are POLR2A-POLR2L. The RPB1 subunit carries the actual RNA polymerizing activity of the complex. This subunit is encoded by the POLR2A gene. The function of RNAP II is to synthesize all of the mRNAs, some of the small nuclear RNAs (snRNAs) involved in RNA splicing, and several microRNAs.
RNA pol III (RNAP III) is also a multisubunit complex and is composed of at least 17 proteins. Ten of the RNAP III subunits are unique to this complex, two are common with subunits of RNAP I, and five are common to all three RNAP complexes. The genes encoding the RNAP III-specific proteins are identified as POLR3A-POLR3H. All of the RNAs transcribed by RNAP III are small stable untranslated RNAs. The products of RNAP III include all of the tRNAs, the 5S rRNA, several microRNAs, and the U6 small nuclear RNA (snRNA) of the splicing machinery.
Synthesis of RNA exhibits several features that are shared with DNA replication. RNA synthesis requires accurate and efficient initiation, elongation proceeds in the 5' → 3' direction (i.e. the polymerase moves along the template strand of DNA in the 3' → 5' direction), and RNA synthesis requires distinct and accurate termination. Transcription also exhibits several features that are distinct from replication:
1. Transcription initiates, both in prokaryotes and eukaryotes, from many more sites than replication.
2. There are many more molecules of RNA polymerase per cell than DNA polymerase.
3. RNA polymerase proceeds at a rate much slower than DNA polymerase (approximately 50–100 bases/sec for RNA versus near 1000 bases/sec for DNA).
4. Finally, the fidelity of RNA polymerization is much lower than that of DNA polymerization. This is tolerable since aberrant RNA molecules can simply be turned over and new correct molecules made.
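To give a feel for the rate difference quoted in point 3, the following Python sketch computes how long each enzyme would need to copy the same stretch of DNA. The 10,000-base length is a hypothetical value picked for easy arithmetic; the rates are the approximate figures given above.

```python
def seconds_to_copy(length_nt: int, rate_nt_per_s: float) -> float:
    """Time needed to polymerize `length_nt` nucleotides at a constant rate."""
    return length_nt / rate_nt_per_s


gene_length = 10_000  # hypothetical template length in bases

rna_slow = seconds_to_copy(gene_length, 50)    # RNA polymerase, lower bound: 200.0 s
rna_fast = seconds_to_copy(gene_length, 100)   # RNA polymerase, upper bound: 100.0 s
dna_time = seconds_to_copy(gene_length, 1000)  # DNA polymerase: 10.0 s
```

Under these assumptions transcription of the stretch takes 10 to 20 times longer than its replication.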
Signals are present within the DNA template that act in cis to stimulate the initiation of transcription. These sequence elements are termed promoters. Promoter sequences promote the ability of RNA polymerases to recognize the nucleotide at which initiation begins. Additional sequence elements are present within genes that act in cis to enhance polymerase activity even further. These sequence elements are termed enhancers. Transcriptional promoter and enhancer elements are important sequences used in the control of gene expression. The major defining difference between promoters and enhancers is that cis-acting promoter elements must be in a specific orientation and a relatively fixed position in order to function properly, whereas enhancer elements can function in either orientation relative to the transcriptional start site and can be displaced large distances from their naturally occurring locations and still function as cis-acting enhancer elements.
E. coli RNA polymerase is composed of five distinct polypeptide chains. Association of several of these generates the RNA polymerase holoenzyme. The sigma (σ) subunit is only transiently associated with the holoenzyme. This subunit is required for accurate initiation of transcription by providing polymerase with the proper cues that a start site has been encountered. In both prokaryotic and eukaryotic transcription the first incorporated ribonucleotide is a purine and it is incorporated as a triphosphate. In E. coli several additional nucleotides are added before the sigma subunit dissociates.
The process of eukaryotic mRNA transcriptional initiation is an extremely complex event. There are numerous protein factors controlling initiation, some of which are basal factors present in all cells and others of which are specific to cell type and/or the differentiation state of the cell. Two basal promoter elements that are found in essentially all eukaryotic mRNA genes are the TATA-box and the CAAT-box. Many constitutively expressed mRNA genes (house-keeping genes) also contain a GC-box promoter element (generally GGGCGG). These elements are so called because of the DNA sequences that constitute the promoter element. The TATA-box can be found approximately 25–100 bases upstream (written -25 to -100) of the start site for transcription and the CAAT-box is generally in the -70 to -150 position. The TATA-box sequences are found only in the coding strand of the gene (i.e. the strand that has the sequences identical to the resulting mRNA) while the CAAT-box and GC-box sequences are most often found in the template strand but can also reside in the coding strand. Many of the basal transcription factors are identified by the fact that they control the activity of RNA pol II. Thus, the nomenclature of these proteins is TFII, for transcription factor of RNA pol II. TFIID is the factor that binds to the TATA-box and its binding is facilitated by TFIIA. Once TFIID and TFIIA are bound, TFIIB binds and this recruits RNA pol II to the promoter. Next TFIIE and TFIIH bind.
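The positional convention for promoter elements (for example, -25 to -100 for the TATA-box) can be made concrete with a short Python sketch that scans the coding strand upstream of a known start site. The function name, the use of the bare motif "TATA", and the example sequence are assumptions made for illustration; this is not a real promoter-prediction method.

```python
def find_tata(coding: str, tss_index: int, motif: str = "TATA",
              window: tuple[int, int] = (-100, -25)) -> list[int]:
    """Return positions (relative to the start site) where `motif` begins.

    `tss_index` is the 0-based index of the first transcribed base, so the
    base immediately upstream of it is position -1.
    """
    start = max(0, tss_index + window[0])  # furthest upstream index searched
    end = max(0, tss_index + window[1])    # closest index to the start site
    hits = []
    pos = coding.find(motif, start)
    while pos != -1 and pos <= end:
        hits.append(pos - tss_index)
        pos = coding.find(motif, pos + 1)
    return hits


# Hypothetical coding strand: filler, a TATA element, filler, then the gene.
coding = "G" * 70 + "TATAAA" + "C" * 24 + "ATGGCC"
tss = 100  # index of the first transcribed base in this toy sequence

tata_positions = find_tata(coding, tss)  # [-30]
```

Only the coding strand is scanned, consistent with the statement above that TATA-box sequences are found in that strand.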
TFIIH is in fact a complex of ten proteins and this complex is not only involved in transcription but also in certain steps of DNA damage repair. The role of TFIIH in DNA repair can be seen as critical since defects in its function are responsible for certain forms of xeroderma pigmentosum. The critical role of TFIIH in transcription initiation is due to the fact that two of the proteins of the complex function to phosphorylate serine residues in the C-terminal domain (CTD) of the large subunit of RNA pol II. These two proteins are cyclin-dependent kinase 7 (encoded by the CDK7 gene) and cyclin H (encoded by the CCNH gene). The overall activity of CDK7 is regulated by interaction with cyclin H. The CTD of the large subunit of RNA pol II contains a tandem repeat sequence that is composed of the consensus heptad of amino acids Y1-S2-P3-T4-S5-P6-S7, which can be repeated from 25 to 52 times. It is Ser5 and Ser7 that become phosphorylated during transcriptional initiation. These serines are different from the serine (Ser2) phosphorylated in the CTD by P-TEFb involved in the capping process as discussed below. After transcriptional initiation has commenced and RNA pol II moves down the DNA template, factors TFIIA and TFIID remain on the promoter to allow for additional rounds of initiation to take place.
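The numbering of the CTD heptad can be sketched in Python by listing which residues of an idealized repeat array fall at the serine positions described above. The function name is hypothetical, and the model assumes every repeat matches the consensus exactly, which is a simplification.

```python
HEPTAD = "YSPTSPS"  # consensus repeat: Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7


def serine_sites(n_repeats: int, heptad_positions: tuple[int, ...]) -> list[int]:
    """Return 1-based residue numbers, within an idealized CTD repeat array,
    of the requested serine positions in each heptad."""
    sites = []
    for repeat in range(n_repeats):
        for pos in heptad_positions:
            if HEPTAD[pos - 1] != "S":
                raise ValueError(f"heptad position {pos} is not a serine")
            sites.append(repeat * len(HEPTAD) + pos)
    return sites


# Two repeats only, to keep the output short (the text gives 25 to 52).
initiation_sites = serine_sites(2, (5, 7))  # Ser5/Ser7 targets: [5, 7, 12, 14]
pausing_sites = serine_sites(2, (2,))       # Ser2 targets: [2, 9]
```

The two lists do not overlap, which reflects the point that the serines phosphorylated at initiation differ from the Ser2 residue targeted by P-TEFb.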
Elongation involves the addition of the 5'–phosphate of ribonucleotides to the 3'–OH of the elongating RNA with the concomitant release of pyrophosphate. Nucleotide addition continues until specific termination signals are encountered. Following termination the core polymerase dissociates from the template. In prokaryotic transcription, the core and sigma subunit can then reassociate forming the holoenzyme again ready to initiate another round of transcription.
In E. coli transcriptional termination occurs by both factor-dependent and factor-independent means. Two structural features of all E. coli genes that terminate in a factor-independent manner have been identified. One feature is the presence of two symmetrical GC-rich segments that are capable of forming a stem-loop structure in the RNA and the second is a downstream A-rich sequence in the template. The formation of the stem-loop in the RNA destabilizes the association between polymerase and the DNA template. This is further destabilized by the weaker nature of the AU base pairs that are formed, between the template and the RNA, following the stem-loop. This leads to dissociation of polymerase and termination of transcription. Most genes in E. coli terminate by this method. Factor-dependent termination requires the recognition of termination sequences by the termination protein, rho (ρ). The rho factor recognizes and binds to sequences in the 3' portion of the RNA. This binding destabilizes the polymerase-template interaction leading to dissociation of the polymerase and termination of transcription.
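The two features of a factor-independent terminator, a GC-rich stem-loop followed by a run of U residues in the RNA, can be searched for with a simple Python sketch. All thresholds (stem length, loop size, GC fraction, length of the U run), the function names, and the example RNA are assumptions chosen for illustration; the sketch requires perfect stem pairing and is far cruder than real terminator prediction.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def find_intrinsic_terminators(rna: str, stem: int = 6,
                               loop_range: tuple[int, int] = (3, 8),
                               min_gc: float = 0.6,
                               u_run: int = 4) -> list[tuple[int, int]]:
    """Return (stem_start, stem_end) index pairs for perfect GC-rich
    hairpins that are immediately followed by `u_run` U residues."""
    hits = []
    for i in range(len(rna) - 2 * stem):
        left = rna[i:i + stem]
        gc_fraction = sum(base in "GC" for base in left) / stem
        if gc_fraction < min_gc:
            continue
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop                  # start of the right arm
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + u_run]
            if right == reverse_complement(left) and tail == "U" * u_run:
                hits.append((i, j + stem))
    return hits


# Hypothetical transcript: stem GCCGCC, loop UUCG, stem GGCGGC, then six U's.
example_rna = "CCGCCGCCUUCGGGCGGCUUUUUUGA"
terminators = find_intrinsic_terminators(example_rna)  # [(2, 18)]
```

The U run in the RNA corresponds to the A-rich stretch of the template, so the hairpin is followed by the weak AU pairs described above.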
Transcriptional termination of eukaryotic mRNA genes occurs when RNA pol II encounters the sequence, 3'-TTATTT-5', in the template DNA which directs the incorporation of the termination and polyadenylation [poly(A)] signal, 5'-AAUAAA-3' in the mRNA. The process of mRNA 3'-end polyadenylation is described in detail below. Following incorporation of the AAUAAA element into the mRNA, the cleavage and polyadenylation specificity complex, which is associated with the RNA pol II complex, recruits other proteins to the site. The proteins that are recruited then cleave the mRNA, freeing it from the transcription complex, and transcription terminates. RNA pol II activity can be terminated by this process within 500–2,000 nucleotides of the AAUAAA element. Termination of RNA pol I transcription requires an RNA pol I specific termination factor that is a DNA-binding protein. Termination of RNA pol III transcription occurs following the incorporation of a series of U residues in the transcript.
When transcription of bacterial rRNAs and tRNAs is completed they are immediately ready for use in translation. No additional processing takes place. Translation of bacterial mRNAs can begin even before transcription is completed due to the lack of the nuclear-cytoplasmic separation that exists in eukaryotes. The ability to initiate translation of prokaryotic RNAs while transcription is still in progress affords a unique opportunity for regulating the transcription of certain genes. An additional feature of bacterial mRNAs is that most are polycistronic. This means that multiple polypeptides can be synthesized from a single primary transcript. Polycistronic mRNAs are very rare in eukaryotic cells but have been identified. The mitochondrial genomes in mammals and the slime mold, Dictyostelium discoideum, encode polycistronic mRNAs that are processed into primarily mono-, di-, and tricistronic transcripts. In addition, several viruses encode polycistronic RNAs.
In contrast to bacterial transcripts, eukaryotic RNAs (all 3 classes) undergo significant processing, some of which occurs co-transcriptionally and some post-transcriptionally. All three classes of RNA are transcribed from genes that contain introns. The RNA sequences encoded by the intronic DNA must be removed from the primary transcript prior to the RNA being biologically active. The process of intron removal is called RNA splicing. Additional processing occurs to mRNAs that can alter the 5'- and 3'-ends of the transcripts.
The 5' end of nearly all eukaryotic mRNAs is capped with a unique 5' → 5' linkage to a 7-methylguanosine residue. Synthesis of the mRNA cap structure is catalyzed by the bifunctional enzyme encoded by the RNGTT gene (RNA guanylyltransferase and 5'-phosphatase). The RNGTT encoded enzyme possesses mRNA 5'-triphosphatase activity in the N-terminal portion of the enzyme and mRNA guanylyltransferase activity in the C-terminal part. The mRNA 5'-triphosphatase activity of the enzyme hydrolyzes the 5'-triphosphate group of the 5'-nucleotide of the mRNA to generate a diphosphate-mRNA. The guanylyltransferase activity then adds GMP to the diphosphate-mRNA, generating the 5' → 5' triphosphate linkage. The guanine residue of the cap is then methylated by a second enzyme encoded by the RNMT gene (RNA guanine-7 methyltransferase). The capped end of the mRNA is thus protected from exonucleases and, more importantly, is recognized by specific proteins of the translational machinery.
The capping process occurs after the newly synthesizing mRNA is around 20–30 bases long, at which point RNA pol II pauses. While RNA pol II is paused on the template, the kinase complex, known as positive transcription elongation factor b (P-TEFb), phosphorylates RNA pol II on the serine-2 residue (Ser2) in the repeat unit of the C-terminal domain (CTD) of the large subunit of the enzyme. The P-TEFb complex is composed of cyclin-dependent kinase 9 (CDK9) and either cyclin T1, T2, or K. The complex is also called C-terminal domain kinase 1 (CTDK1). This pausing and regulatory phosphorylation event allows for the potential of attenuation in the rate of transcription.
Structure of the 5'-cap of eukaryotic mRNAs. The cap structure present on most eukaryotic mRNAs consists of a 7-methylguanosine (m7G) coupled to the 5'-terminal nucleotide of the mRNA in a unique 5' → 5' triphosphate linkage.
Almost all mammalian mRNAs are polyadenylated at the 3'-end. A specific sequence, AAUAAA, is the primary sequence recognized by one of several proteins and multiprotein complexes. In addition to the AAUAAA sequence element in the mRNA, an upstream UGUA sequence and a downstream GU-rich element act in cis to promote the recognition of the 3'-end of an mRNA by the cleavage and polyadenylation complexes. These protein complexes are responsible for recognizing the cis-acting signals in the mRNA and then catalyzing the mRNA cleavage and subsequent polyadenylation reactions. In mammals the 3'-end cleavage and polyadenylation reactions are regulated by the interactions of four multiprotein complexes identified as the cleavage and polyadenylation specificity factor (CPSF), cleavage stimulatory factor (CSTF), cleavage factor I (CFI; also identified as CFIm, where the "m" refers to mammalian), and cleavage factor II (CFIIm). In addition to these four complexes the actual polyadenylation reactions are catalyzed by poly(A) polymerases (PAP). Additional proteins required for mRNA polyadenylation are nuclear poly(A)-binding protein (encoded by the PABPN1 gene), symplekin, and the C-terminal domain (CTD) of the large subunit of RNA pol II.
The CPSF is composed of at least four distinct proteins that were originally identified and named based upon their molecular weights. These four proteins are called CPSF-30, CPSF-73, CPSF-100, and CPSF-160 where the number represents the protein size in kDa. The CPSF-160 protein is encoded by the CPSF1 gene. The CPSF-160 protein physically binds to the AAUAAA sequence in the mRNA. The CPSF-100 protein is encoded by the CPSF2 gene. The CPSF-73 protein is encoded by the CPSF3 gene. The CPSF-73 protein is a hydrolase that cleaves the mRNA downstream of the AAUAAA sequence element. The CPSF-30 protein is encoded by the CPSF4 gene. An additional protein that is found associated with the CPSF, and that links the CPSF with poly(A) polymerases [specifically poly(A) polymerase alpha], is encoded by the FIP1L1 gene (factor interacting with PAPOLA and CPSF1). The FIP1L1 protein binds to U-rich sequences that reside upstream (5') of the AAUAAA element and stimulates poly(A) polymerase activity.
The cleavage stimulatory factor (CSTF) is a complex composed of three distinct proteins. These proteins are identified as CSTF1 (50 kDa protein), CSTF2 (64 kDa protein), and CSTF3 (77 kDa protein) and each is encoded by a gene of the same name. The recruitment of the CSTF complex to the 3'-end of an mRNA is stimulated by the CPSF complex.
Cleavage factor I (CFIm) contains a 68 kDa protein encoded by the CPSF6 gene (cleavage and polyadenylation specific factor) and a smaller 25 kDa subunit. The binding of CFIm to the mRNA is facilitated by the RNA recognition motif in the N-terminus of the 68 kDa CPSF6 encoded protein. The primary function of CFIm is to recognize and bind the UGUA element in the mRNA. In addition to binding the UGUA element, CFIm has been shown to be involved in the regulation of alternative splicing. Functional CFIIm is a complex consisting of an essential component (identified as CFIIAm) and a stimulatory component (identified as CFIIBm). The CFIIAm component of the complex is composed of two proteins. These two proteins are encoded by the CLP1 gene (cleavage and polyadenylation factor I subunit 1) and the PCF11 gene (protein 1 of cleavage factor I). The CLP1 encoded protein of the CFIIm complex interacts with the CFIm complex and also with the CPSF complex.
Humans express a family of three polyadenylate polymerases (PAP), identified as poly(A) polymerase alpha (PAPOLA gene), poly(A) polymerase beta (PAPOLB gene), and poly(A) polymerase gamma (PAPOLG gene). These poly(A) polymerases possess both mRNA endonuclease activity and polyadenylate polymerase activity. The endonuclease activity cleaves the primary mRNA approximately 11–30 bases 3' of the AAUAAA sequence element. A stretch of 20–250 adenosine residues is then added to the 3'-end by the non-template requiring polyadenylate polymerase activity of the enzymes.
Processes of mRNA polyadenylation. RNA polymerase terminates mRNA transcription up to 500 nucleotides after incorporation of the AAUAAA element. The combined activities of CPSF, CSTF, CFIm, CFIIm, symplekin, PABPN1, poly(A) polymerase, and the CTD of RNA pol II result in accurate and efficient transcriptional termination, cleavage of the pre-mRNA 10–30 bases 3' of the AAUAAA element, and addition of the poly(A) tail to the mRNA.
In addition to intron removal in tRNAs, extra nucleotides at both the 5' and 3' ends are cleaved, the sequence 5'–CCA–3' is added to the 3' end of all tRNAs and several nucleotides undergo modification. There have been more than 60 different modified bases identified in tRNAs.
Both prokaryotic and eukaryotic rRNAs are synthesized as long precursors termed pre-ribosomal RNAs. In eukaryotes a 45S pre-ribosomal RNA serves as the precursor for the 18S, 28S and 5.8S rRNAs.
The removal of intronic RNA from precursor mRNA, tRNA, and rRNA molecules, in humans and other higher eukaryotes, requires a complex machinery termed the spliceosome, which is composed of numerous small nuclear RNAs (snRNAs) and numerous proteins. The spliceosome catalyzes the reactions that result in intron removal and the joining together of the protein-coding exons. The spliceosome has been shown to be composed of as many as 300 distinct proteins and five RNAs. The five snRNAs that constitute the spliceosome RNAs are identified as U1, U2, U4, U5, and U6. Each of these snRNAs is around 100–300 nucleotides in length and each is associated with several proteins, forming individual small nuclear ribonucleoprotein (snRNP: pronounced "snurp") complexes. The U1 snRNP consists of the U1 snRNA and at least 10 proteins. The U2 snRNP consists of the U2 snRNA and at least 19 proteins. The U4/U6 snRNP consists of the U4 and U6 snRNAs and at least 12 proteins. The U5 snRNP consists of the U5 snRNA and at least 15 proteins. Several of the proteins present in the snRNP complexes are members of the DEAD-box helicase family of enzymes that are involved in numerous aspects of RNA metabolism. The original members of the DEAD-box helicase family were so called because they all contained the four amino acid sequence D-E-A-D (Asp-Glu-Ala-Asp). As a result of the isolation of variant family members, the family is more commonly referred to as the DExD/H-box protein family. Additional important protein components of the overall spliceosome are members of the SR protein family. These proteins get their name from the fact that they are enriched in Ser/Arg residues. At least 18 different SR protein encoding genes have been identified in the human genome. The activity of the SR proteins in the splicing process is controlled by their state of phosphorylation.
Introns in higher eukaryotic mRNAs can be of considerable length, in many cases spanning several thousands of bases and sometimes comprising up to 90% of the precursor mRNA. In addition, numerous precursor mRNAs undergo alternative exon splicing, a process controlled by many factors such as the cell type in which the mRNA gene is expressed. The size and the number of introns in many mRNAs, in addition to the potential for alternative splicing, present an array of complexities that govern the control of, and catalytic processes of intron removal and exon joining.
The vast majority of eukaryotic mRNAs contain a highly conserved set of dinucleotides at the boundaries of every intron. These highly conserved sequences are GU at the 5'-end of the intron and AG at the 3'-end (shown in Figure below). In addition to these highly conserved cis-acting sequence elements there are several other important sequence elements in most introns that are necessary to control efficient and accurate splicing. Introns that contain the GU-AG consensus are spliced by the major U1, U2, U5, and U4/U6 snRNP containing spliceosomes. These introns are spliced by what is called the U2-type spliceosome. However, numerous introns have been characterized whose 5'-end and 3'-end consensus sequences are AT-AC instead of the more typical GU-AG. This second type of intron has been shown to be spliced by a spliceosome composed of a different set of snRNPs, specifically the U4/U6atac, U5, U11, and U12 snRNPs. The AT-AC introns are spliced by what is called the U12-type spliceosome. To date no precursor RNA has been identified that contains intronic RNA sequences that are spliced by both types of spliceosome. All spliced RNAs contain exclusively U2-type introns (the majority) or U12-type introns.
Consensus elements of U2-type introns. Introns that are spliced by the U2-type spliceosome contain the consensus sequences GU and AG at the 5'-end and 3'-end, respectively. These consensus sequences are found in 100% of U2-type introns. Additional cis-acting sequences in the intron include the branch point and poly(Y) tract. The designations for the nucleotides in the consensus elements are: N: any nucleotide; R: purine; Y: pyrimidine.
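The boundary dinucleotides can be used for a first-pass sorting of introns, as in the Python sketch below. The function name and example sequences are hypothetical, and classifying on the terminal dinucleotides alone is an assumption made for brevity: the branch point and poly(Y) tract are ignored.

```python
def classify_intron(intron: str) -> str:
    """Assign an intron to a spliceosome type from its terminal dinucleotides.

    Accepts DNA or RNA letters; T is read as U so that the "AT-AC"
    designation used for the DNA sequence maps onto AU-AC in the RNA.
    """
    rna = intron.upper().replace("T", "U")
    ends = (rna[:2], rna[-2:])
    if ends == ("GU", "AG"):
        return "U2-type"
    if ends == ("AU", "AC"):
        return "U12-type"
    return "unclassified"


major = classify_intron("GUAAGUCCUAACUUUUCAG")  # "U2-type"
minor = classify_intron("ATATCCTTTTCCTTAAC")    # "U12-type"
other = classify_intron("CCAAGUCCUAACUUUUCGG")  # "unclassified"
```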
The first stage in U2-type intron splicing in mRNAs is recognition of the GU consensus element at the 5'-end of the intron by the U1 snRNP. The branch point sequence element is recognized by an additional factor called splicing factor 1, SF1 (also called the branch point binding protein, BBP). This is followed by recognition of the AG consensus element at the 3'-end of the intron and the poly(Y) tract by the U2 snRNP. Binding of the U2-snRNP results in displacement of the SF1. Once the U1 and U2 snRNP complexes are bound to the mRNA, the complex consisting of the U4/U6, and U5 snRNPs (called the tri-snRNP complex) binds to the mRNA. At this point the splicing complex is referred to as the pre-catalytic spliceosome complex. The next step involves release of the U1 and U4 snRNPs. The complex of mRNA, U2, U5, and U6 snRNP is now catalytically active and the intron is removed and the upstream and downstream exons are joined together.
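The end result of the splicing reaction, removal of the intron and joining of the flanking exons, can be expressed as a small Python sketch. The function name, the half-open coordinate convention, and the toy pre-mRNA are assumptions for illustration; the snRNP assembly steps described above are not modeled.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove the given introns and join the remaining exons.

    Each intron is a (start, end) pair of 0-based, half-open indices and
    must begin with GU and end with AG, as expected for U2-type introns.
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[cursor:start])  # exon upstream of this intron
        cursor = end
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)


# Hypothetical pre-mRNA: exon 1, one 19-base intron, exon 2.
pre_mrna = "AUGGCC" + "GUAAGUCCUAACUUUUCAG" + "GAAUAA"
mrna = splice(pre_mrna, [(6, 25)])  # "AUGGCCGAAUAA"
```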
The process of alternative splicing involves multiple interactions between splicing proteins and snRNPs that result in different patterns of exon joining from the same pre-mRNA in different cell types or at different stages of development and differentiation. Alternative splicing allows for the generation of protein isoforms that exhibit different biological properties, that differ in protein-protein interaction, that are localized to different subcellular locations, or that exhibit different catalytic activities and/or abilities. The process of alternative splicing has been identified to occur in the primary transcripts from at least 80% of all human protein coding genes.
The molecular decisions that control which exon(s) is removed and which exon(s) is included in a resultant mRNA involve both cis-acting RNA sequence elements and various protein regulators. The various cis-acting regulatory elements of an mRNA have been divided into four categories: exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs) and intronic splicing silencers (ISSs). The ESEs are usually bound by members of the SR protein family which were described above. Proteins that are known to interact with the ISS and ESS sequences of the mRNA are members of the heterogeneous nuclear RNP (hnRNP) family. There are 14 known hnRNP encoding genes in the human genome. Several additional proteins are necessary for alternative splicing and these proteins (at least 18 characterized members) are expressed in tissue-specific patterns. In addition to cis-acting sequence elements in the control of alternative splicing, secondary structure in the mRNA itself is known to regulate the alternative splicing process.
The overall process of alternative splicing requires that certain proteins are expressed that allow for splice site recognition and selection as well as expression of proteins that inhibit splice site recognition. In most cases of alternative splicing the regulation and specificity of which introns are removed and which exons are joined together is the result of a combinatorial interaction between both cis- and trans-acting activators and inhibitors.
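How a handful of optional exons multiplies the number of possible mRNAs can be shown with a brief Python sketch. The function name and the exon labels are hypothetical, and the model assumes each optional exon is included or skipped independently, which ignores the regulatory interactions described above.

```python
from itertools import product


def isoforms(exons: list[str], optional: set[str]) -> list[tuple[str, ...]]:
    """Enumerate exon combinations when each optional exon may be skipped.

    Exon order is preserved; exons not listed in `optional` are always kept.
    """
    choices = [(True, False) if name in optional else (True,) for name in exons]
    return [
        tuple(name for name, keep in zip(exons, combo) if keep)
        for combo in product(*choices)
    ]


# Hypothetical four-exon gene in which E2 and E3 can each be skipped.
variants = isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"})
# [('E1','E2','E3','E4'), ('E1','E2','E4'), ('E1','E3','E4'), ('E1','E4')]
```

With n independently skippable exons the count is 2 to the power n, so two optional exons already give four distinct mRNAs from one primary transcript.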
There are several different classes of reactions involved in intron removal. The two most common are the group 1 and group 2 introns. Group 1 introns are found in mRNA, tRNA, and rRNA molecules found in the chloroplasts and mitochondria of lower eukaryotic organisms as well as being found in bacterial RNA molecules. Group 2 introns are found in mRNA, tRNA, and rRNA molecules found in the chloroplasts and mitochondria of fungi, plants, and protists. The characteristic feature of both group 1 and group 2 introns is that they are self-splicing. The removal of these types of introns is catalyzed by the RNA itself via the ribozyme activity inherent in the RNA.
Group 1 introns require an external guanosine nucleotide as a cofactor. The 3'–OH of the guanosine nucleotide acts as a nucleophile to attack the 5'–phosphate of the 5' nucleotide of the intron. The resultant 3'–OH at the 3' end of the 5' exon then attacks the 5' nucleotide of the 3' exon releasing the intron and covalently attaching the two exons together. The 3' end of the 5' exon is termed the splice donor site and the 5' end of the 3' exon is termed the splice acceptor site.
Self splicing intron mechanisms. RNA-mediated (ribozyme) self splicing comprises two main categories. Group 1 self splicing utilizes a free GTP residue to initiate the catalysis reactions. Group 2 self splicing introns utilize an adenine residue within the intron sequence itself to initiate the catalysis reactions. During group 2 splicing reactions a lariat structure is formed within the intronic RNA.
Group 2 introns are spliced similarly except that instead of an external nucleophile the 2'–OH of an adenine residue within the intron is the nucleophile. This residue attacks the 3' nucleotide of the 5' exon forming an internal loop called a lariat structure. The 3' end of the 5' exon then attacks the 5' end of the 3' exon as in group 1 splicing, releasing the intron and covalently attaching the two exons together.
The presence of introns in eukaryotic genes would appear to be an extreme waste of cellular energy when considering the number of nucleotides incorporated into the primary transcript only to be removed later, as well as the energy utilized in the synthesis of the splicing machinery. However, the presence of introns can protect the genetic makeup of an organism from genetic damage by outside influences such as chemical or radiation. An additionally important function of introns is to allow alternative splicing to occur, thereby, increasing the genetic diversity of the genome without increasing the overall number of genes. By altering the pattern of exons that are spliced together, from a single primary transcript, different proteins can arise from the processed mRNA from a single gene. Alternative splicing can occur either at specific developmental stages or in different cell types. As indicated earlier, the process of alternative splicing has been identified to occur in the primary transcripts from at least 80% of all human protein coding genes. One of the first clinically relevant examples of alternative splicing in humans involved the calcitonin gene (CALCA). Depending upon the site of transcription, the calcitonin gene yields an RNA that synthesizes calcitonin (thyroid) or calcitonin-gene related peptide (CGRP, brain). Even more complex is the alternative splicing that occurs in the α-tropomyosin transcript. At least eight different alternatively spliced α-tropomyosin mRNAs have been identified.
Abnormalities in the splicing process can lead to various disease states. Diseases that have been identified as being due to alteration in, or the result of, alternative splicing are numerous. The causes of the alterations in the alternative splicing process are also numerous. There are diseases that are the result of mutations in splicing regulatory sequences in exons (e.g. the spinal muscular atrophies, SMA) resulting in inappropriate exon skipping. Alterations in alternative splicing can also lead to changes in protein isoform ratios that ultimately results in manifestation of disease (e.g. the diseases of the brain that result from abnormal accumulation of the tau protein). Mutations in sequences within introns can lead to the activation of cryptic splice sites resulting in abnormally spliced exons. Numerous diseases are the result of mutations in either the 5'- or the 3'-splice sites such as various β-thalassemias. Diseases are also caused by mutations in genes the encode proteins of the spliceosomal machinery. Numerous human cancers are caused by mutations that alter splice site selection, particularly in tumor suppressor genes, or by mutations in genes encoding protein factors of the splicing machinery. Patients suffering from a number of different connective tissue diseases exhibit humoral auto-antibodies that recognize small nuclear RNA-protein complexes (snRNPs). Patients suffering from systemic lupus erythematosis (SLE) have auto-antibodies (anti-nuclear antibodies) that recognize the U1 RNA of the spliceosome.back to the top
As recently as 15 years ago it was believed that the only non-coding RNAs were the tRNAs and the rRNAs of the translational machinery. However, in a landmark study published in 1993 on the control of developmental timing in the roundworm Caenorhabditis elegans it was shown that the control of one gene was exerted by the small non-coding RNA product of another gene. This regulatory gene is identified as lin-4 (lin-4 controls the activity of the lin-14 gene product) and it codes for two RNAs: one of approximately 22 nucleotides (nt) and the other of approximately 61 nt. Examination of the sequence of the larger RNA revealed that it could form a stem-loop structure which then serves as the precursor for the shorter RNA. The shorter lin-4 RNA is considered the founding member of a class of small non-coding regulatory RNAs called microRNAs or miRNAs that consist, in their functional state, of approximately 22 nt. It is predicted that at least 250 miRNA genes are present in the human genome. The vast majority of the miRNA genes in the human genome are transcribed by RNA polymerase II.
The processing and functioning of miRNAs is similar to that of the RNA silencing pathway identified in plants, known as the post-transcriptional gene silencing (PTGS) pathway, and the RNA interference (RNAi) pathway in mammals. For more details go to the Control of Gene Expression page. The RNAi pathway involves the enzymatic processing of double-stranded RNA into small interfering RNAs (siRNAs) of approximately 22–25 nt and may have evolved as a means to degrade the RNA genomes of RNA viruses such as retroviruses. The pathway of processing both miRNAs and siRNAs is diagrammed in the Figure below. The stem-loop of the primary miRNA gene transcript (pri-miRNA) is first cleaved through the action of the RNase III-related activity called Drosha, which takes place in the nucleus and generates the precursor miRNA (pre-miRNA). In the siRNA pathway the duplex RNAs are cleaved into 22–25 nt pieces through the action of the enzyme Dicer in the cytosol. Processed miRNA stem-loop structures are transported from the nucleus to the cytosol via the activity of exportin5. In the cytosol the processed miRNA stem-loop is targeted by Dicer, which removes the loop portion. The nomenclature of the mature miRNA duplex is miRNA:miRNA*, where the miRNA* strand is the non-functional half of the duplex. Ultimately, fully processed miRNAs and siRNAs are engaged by the RNA-induced silencing complex (RISC) which separates the two RNA strands. The active strand of RNA derived from either the miRNA or the siRNA pathway is anti-sense to a region of the target mRNA.
Model for processing miRNAs and siRNAs. miRNA genes are transcribed as larger precursor RNAs that are then processed via the action of the Drosha enzyme, within the nucleus, to a pre-miRNA. The pre-miRNA is then transported to the cytosol. Within the cytosol the pre-miRNA is further processed via the actions of the Dicer complex and an RNA helicase to the functional single-stranded miRNA. The miRNA is engaged by the RISC complex and associates with the appropriate target mRNA. Following mRNA-miRNA interaction the mRNA is degraded as well as being translationally inhibited. The net result is a reduction (knock down) in gene expression at the level of a given mRNA and protein.
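The statement that the active strand is anti-sense to a region of the target mRNA can be illustrated with a short sketch. It assumes perfect Watson-Crick pairing over the full length of a 22 nt guide strand, which is a simplification; the guide and the target sequences are invented, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the anti-sense (reverse-complement) sequence of an RNA."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_antisense_sites(guide, mrna):
    """Return 0-based start positions in mrna where the guide strand
    can pair perfectly (assumption: no mismatches or wobble pairs)."""
    site = reverse_complement(guide)
    width = len(site)
    return [i for i in range(len(mrna) - width + 1) if mrna[i:i + width] == site]


# Invented 22 nt guide strand and a toy mRNA that carries its target site.
guide = "ACGUUGCAAUGGCCUAGCUUAG"
mrna = "AUGGCC" + reverse_complement(guide) + "UAAGC"

print(find_antisense_sites(guide, mrna))
```

A search for the guide sequence itself would find nothing in this mRNA; only its reverse complement is present, which is what anti-sense means here.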
Another biologically significant class of non-coding RNAs is termed the long non-coding RNAs, designated lncRNA. Like the majority of the small non-coding RNAs of the miRNA family, the vast majority of lncRNAs are transcribed by RNA polymerase II. The distinction for the term long non-coding RNA is that these RNA molecules are greater than 200 nucleotides. Most of the lncRNAs are post-transcriptionally processed like mRNAs, the other major RNA polymerase II derived transcripts. Most of the lncRNAs are capped, polyadenylated, and spliced. However, unlike mRNAs which are only functional in the cytoplasm, the subcellular localization of lncRNAs is diverse, including nuclear, cytoplasmic, and extracellular. In addition to diverse localization, the functions of this class of RNA are also highly diverse.
Most of the lncRNAs (representing the two major classes of lncRNA) are transcribed from either intergenic regions (i.e. between mRNA genes) or from the opposite strand of protein coding mRNA genes. The intergenic lncRNAs are referred to as large intergenic non-coding RNAs and given the designation lincRNA. The lncRNAs that are transcribed across protein coding mRNA genes but in the opposite direction, utilizing the opposite strand of DNA, are referred to as natural antisense transcripts and given the designation NAT. Of the two major lncRNA classes the lincRNA class is by far the largest with over 10,000 identified genes. Although there are two major classifications for lncRNAs there are a number of other types of functional lncRNAs, and the processes by which a fully functional lncRNA is derived are also quite diverse. The next most abundant classes of lncRNAs are those that originate within enhancer elements (called eRNAs) and those that originate from promoter elements (called PROMPTs). Dependent on the mechanism for processing the 3'-end of certain lncRNAs, another class, derived from intergenic regions as for the lincRNAs, contains a 3' triple helical domain. Another class of lncRNA molecules contains small nucleolar RNA (snoRNA) structures at the 5'- and 3'-ends (called sno-lncRNAs). Another class of lncRNA is derived from intronic sequences and when fully processed are circular RNAs whose ends are connected via a 2',5'-phosphodiester linkage (called ciRNAs) or via a 3',5'-phosphodiester linkage (called circRNAs).
Accumulating evidence has demonstrated that lncRNAs, like the miRNAs, have important roles in the regulation of gene expression at both the transcriptional and post-transcriptional levels in diverse cellular contexts and a variety of biological processes. The lncRNAs that remain in the nucleus have been shown to play roles in the integrity of the structure of the nucleus and in the regulation of expression of nearby genes. These lncRNA effects are referred to as cis-acting effects. Nuclear lncRNAs can also exert transcriptional effects via trans-acting effects through interactions with other proteins (e.g. transcription factors or RNA-binding proteins), RNAs (e.g. miRNAs), or DNA. The lncRNAs localized to the cytosol can also exert trans-acting effects on gene expression by interacting with proteins and RNAs. Based upon observation of lncRNA localization and function three primary classifications of these RNAs have been designated. One class comprises those lncRNAs that are absolutely nuclear and exert their effects in cis, another comprises those lncRNAs that are mainly nuclear localized and exert their effects in trans, and the last comprises those lncRNAs that are primarily localized to the cytoplasm.
Nuclear lncRNAs that exert their effects in cis can carry out these effects in numerous ways. The lncRNAs can form DNA-RNA triple helical structures that anchor the lncRNA to the promoter regions of targeted genes. The nuclear lncRNAs can also recruit transcription factors or chromatin modifiers to the local regions of the chromosome where the lncRNA gene resides and, thereby, affect local transcriptional events. An important example of this cis-acting effect of lncRNA is the regulated expression of the gene encoding the transcription factor MYC. MYC is a critical transcription factor regulating the expression of hundreds of genes whose encoded proteins control cell growth and differentiation events. The lncRNA identified as CCAT1-L (colon cancer associated transcript 1) is transcribed from the upstream super enhancer region of the MYC gene. The accumulation of the CCAT1-L RNA with this enhancer results in the recruitment of the insulator protein CTCF (a chromatin organizer) resulting in enhanced transcription of the MYC gene. The CTCF protein is also involved in the pattern of imprinting at the IGF-2 locus. Imprinting effects are also exerted by lncRNAs as evidenced by the effects of the NAT encoded from the antisense strand of the AIR (acute insulin response) gene, referred to as the Airn lncRNA. The Airn RNA recruits the histone methyltransferase encoded by the EHMT2 gene (also known as KMT1C) to the locus of the IGF-2 receptor gene (IGF2R) to maintain the imprinted status of that locus which contains several other imprinted genes. Cytoplasmic lncRNAs exert their effects on gene expression as well. The mechanisms include interference with post-translational protein modifications, directly interfering with mRNA translation, activating mRNA decay processes, and acting as decoy targets for miRNAs.
Given that the evidence is clear that lncRNAs exert numerous important effects on the regulation of expression of numerous genes, it is not surprising that mutations in lncRNA genes, as well as dysregulation in lncRNA functions, have been correlated to numerous disease states in humans. Indeed, the progression of diabetes, breast cancer, ovarian cancer, prostate cancer, hepatocellular cancer, colon cancer, lung cancer, and bladder cancer has been associated with abnormal lncRNA activity. Overall, more than 200 human diseases have been shown to be associated with lncRNA activity. The H19 gene encodes a lncRNA whose expression is regulated by imprinting. The H19 gene is only expressed from the maternal allele. Overexpression of H19 is associated with the development of breast cancers. Several lncRNAs function as tumor-suppressor non-coding RNAs while other lncRNAs function as oncogenic non-coding RNAs. The MALAT1 (metastasis associated lung adenocarcinoma transcript 1) lncRNA is overexpressed in a number of different types of lung, cervical, hepatocellular, and colorectal cancers. The normal function of the MALAT1 lncRNA is the regulation of alternative splicing and, therefore, it is suspected that overexpression leads to aberrant splicing events resulting in loss of synthesis of important regulatory proteins.
RNA editing was a term first used to describe an unusual form of post-transcriptional processing involving the insertion of uridine (U) residues into a mitochondrial mRNA found in Trypanosoma brucei. This particular form of editing was then found to occur in many eukaryotic mRNAs. The process of RNA editing is now known to encompass a wide variety of mechanistically unrelated processes that change the nucleotide sequence of an RNA species relative to that directed by the encoding DNA. Currently RNA editing systems are divided into two general classes: substitution and insertion/deletion. In the first class, the coding sequences of a mature RNA and its gene are co-linear as they contain the same number of nucleotides but differ in nucleotide sequence where editing has occurred. In the second class, the nucleotide sequence of the mature RNA product is not co-linear with that of its DNA coding sequence since the final RNA product contains extra nucleotides relative to the encoding gene. All of the major types of cellular RNA (mRNA, rRNA, and tRNA) have been shown to be subject to editing in different organisms.
The term "RNA editing" is not used to refer to RNA modifications such as 5'-capping, splicing, and 3'-polyadenylation, nor to the formation of modified nucleosides in RNA (as is typical in tRNAs). However, it is important to keep sight of the fact that the distinctions between “RNA editing” and “RNA modification” can be less than obvious. To illustrate this fact, consider that there are instances of RNA editing involving deamination of A residues forming I (inosine) residues (see next section). If this editing occurs in the coding region of an mRNA, the edited site (I) is recognized as G during translation. However, it is also known that A residues in the wobble position of tRNA anticodons (the 5'-nucleotide) undergo deamination (by an evolutionarily related enzyme) to I, which similarly results in a change in the anticodon pairing properties. Thus, under these circumstances editing and modification can result in the same effects at the level of the resultant protein.
RNA editing systems have been identified that result in changes in A residues to I residues, referred to as A-to-I editing systems, or changes in C residues to U residues, referred to as C-to-U editing systems. The enzymes that catalyze the A-to-I edits are members of a family of adenosine deaminases that act on RNA (ADARs). This distinguishes these enzymes from the adenosine deaminase involved in the catabolism and salvage of purine nucleotides. The enzymes that catalyze C-to-U edits are called cytidine deaminases that act on RNA (CDARs). A comparative analysis of ADAR and CDAR sequences demonstrated that they all belong to a superfamily of RNA-dependent deaminases that also includes tRNA-specific deaminases (ADATs). A common feature of ADARs, CDARs, and ADATs is the presence in the deaminase domain of conserved residues that are essential for catalysis. All three types of deaminases likely arose from an ancestral cytidine deaminase via the acquisition of RNA-binding domains.
The clinical significance of the editing of human RNAs is demonstrated by the observations that mutations in the ADAR1 gene are associated with a rare autosomal skin pigmentation disorder (dyschromatosis symmetrica hereditaria, DSH) and with Aicardi-Goutières syndrome (AGS), an early-onset encephalopathy that often results in severe and permanent neurological damage. Defective RNA editing is also associated with a number of neurological diseases including suicidal depression, epilepsy, schizophrenia, and amyotrophic lateral sclerosis (Lou Gehrig disease).
The process of A-to-I editing occurs on nuclear transcripts and is catalyzed by a family of enzymes referred to as ADARs. ADAR activity was initially characterized as a double-stranded RNA (dsRNA) unwinding activity and, as such, these observations emphasize that ADARs are dsRNA-binding proteins and that their catalytic activity is directed toward duplex regions in RNA. Although the most biologically significant function of ADARs is site-specific deamination in mRNA, it is known that RNA duplex regions in several types of non-coding RNAs, including microRNAs (miRNAs) and small interfering RNAs (siRNAs), as well as some viral RNAs, are also substrates for ADARs.
Three mammalian ADAR genes give rise to four known isoforms: ADAR1p150, ADAR1p110, ADAR2 and ADAR3. Alternative promoter usage within the ADAR1 transcript generates the full length (ADAR1p150) isoform and an N-terminally truncated (ADAR1p110) isoform. Both ADAR1 isoforms contain three dsRNA-binding domains and the deaminase domain. The ADAR1 variants and ADAR2 are expressed in many tissues, whereas the ADAR3 protein is only expressed in the brain. Although ADAR3 is presumed to be catalytically inactive, it may compete with ADAR1 and -2 for RNA binding substrates, thereby, altering the overall profile of edited RNAs via that mechanism.
The vast majority of A residues that are targets for editing are localized near splice junctions in the pre-mRNA. The formation of dsRNA ADAR substrates in intronic sequences could, therefore, obscure splice sites from the splicing machinery resulting in alternative splicing events. In addition, the editing of select A residues could lead to the creation or elimination of splice sites, which also could result in alternative splicing events.
A-to-I editing occurs in more RNAs than does C-to-U editing. By far, most of the mammalian mRNAs found to undergo A-to-I editing are expressed in the nervous system. Physiologically significant examples are transcripts of the ionotropic glutamate receptor (GluR) family and the serotonin receptor family. In both cases the deamination of exonic A residues leads to single amino acid changes in the resulting proteins.
Editing of glutamate receptor mRNA occurs specifically in the mRNA encoding the GluA2 (GluR2) subunit of the 2-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptors. Editing of the GluA2 mRNA occurs at two non-synonymous sites termed the Q/R and R/G sites. These sites are so-called because the editing results in the change of a glutamine residue for an arginine residue in the first site and a change of arginine for glycine in the second. The Q/R site is encoded by exon 11 and resides within the second transmembrane domain (TMII) of the protein. The R/G site is located just one nucleotide from the boundary between exon 13 and the downstream intron. When this site is edited, splicing favors inclusion of exon 15 over that of exon 14. With respect to the Q/R site, editing has a profound effect on the calcium permeability of the resulting AMPA receptor. Calcium permeability of all AMPA receptor isoforms is controlled by the GluA2 subunit. In unedited GluA2 proteins the presence of the Q residue allows Ca2+ permeability whereas the edited amino acid (R) does not. Almost all of the GluA2 present in the human brain is edited. The importance of GluA2 mRNA editing can be demonstrated by the phenotype of ADAR2 knockout mice. These mice have significantly reduced editing of the Q/R site which causes them to be highly seizure-prone, and they die within 3 weeks of birth.
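The effect of an A-to-I edit on the encoded amino acid follows directly from inosine being read as guanosine during translation. The sketch below makes that explicit for a glutamine-to-arginine and an arginine-to-glycine change. The specific codons (CAG and AGA) are assumptions chosen because they are consistent with the Q/R and R/G changes described above; the codon table is deliberately partial and the function names are hypothetical.

```python
# Partial codon table: only the codons needed for this illustration.
CODON_TABLE = {"CAG": "Gln", "CGG": "Arg", "AGA": "Arg", "GGA": "Gly"}


def edit_a_to_i(codon, position):
    """Deaminate the adenosine at a 0-based codon position to inosine (I)."""
    if codon[position] != "A":
        raise ValueError("only adenosine can be deaminated to inosine")
    return codon[:position] + "I" + codon[position + 1:]


def decode(codon):
    """Translate a codon, reading inosine as guanosine."""
    return CODON_TABLE[codon.replace("I", "G")]


q_r_edited = edit_a_to_i("CAG", 1)  # assumed Q/R-type codon, middle A edited
r_g_edited = edit_a_to_i("AGA", 0)  # assumed R/G-type codon, first A edited

print(decode("CAG"), "->", decode(q_r_edited))
print(decode("AGA"), "->", decode(r_g_edited))
```

In both cases a single deamination changes one amino acid while leaving the length of the mRNA unchanged, which is why A-to-I editing belongs to the substitution class of editing.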
Editing of the serotonin receptor mRNA occurs specifically in the 5-HT2C subtype within the cells of the prefrontal cortex. This mRNA, encoded by the HTR2C gene, contains five sites that are A-to-I edited. These sites are referred to as A, B, C' (E), C, and D. The most commonly detected edited 5-HT2C mRNAs are edited at the AC'C, ABD, and ABCD combination sites. There is a strong correlation between severe psychiatric behaviors and 5-HT2C mRNA editing combinations. In victims of suicide who had been diagnosed with a history of major depression, the level of C' editing is much higher and the level of D editing is significantly decreased when compared with unaffected individuals. Interestingly, when mice are treated with the antidepressant fluoxetine, the pattern of C, C', and D editing in the 5-HT2C mRNA is the exact opposite to that observed in victims of suicide.
A-to-I editing also occurs in the non-coding region of the ADAR2 pre-mRNAs. The consequence of ADAR2 editing its own mRNA is the generation of an alternative splice acceptor site in intron 1, resulting in an alternative splicing event that creates a nonfunctional ADAR2 protein.
The A-to-I editing process also influences the biogenesis and target recognition of siRNAs involved in the RNAi pathway (see above). siRNA biogenesis requires processing of long dsRNA precursors into 21- to 23-nucleotide RNA duplexes which ultimately initiate transcriptional and post-transcriptional sequence-specific silencing. For details on the processing of siRNAs (and miRNAs) go to the Control of Gene Expression page. The RNA editing and RNAi pathways both involve dsRNAs and, therefore, editing could potentially antagonize the RNAi pathway. A-to-I edits could alter the required dsRNA structures of siRNAs (and miRNAs), leading to reduced processing and thus fewer functional siRNAs. In addition, editing of siRNAs and miRNAs could change their proper targeting to sequence-specific silencing sites in target mRNAs.
The first reported instance of C-to-U editing was within the mRNA encoding apolipoprotein B (apoB). Editing of the apoB mRNA changes a CAA codon to a UAA translational stop codon leading to premature termination of protein synthesis. When the apoB gene is transcribed within hepatocytes the mRNA is not edited and a full-length apoB protein is generated called apoB-100. This apolipoprotein (apoB-100) is found exclusively with the VLDL particles produced and secreted by the liver. Within intestinal enterocytes, the apoB mRNA is edited resulting in the generation of a smaller protein called apoB-48. This apolipoprotein (apoB-48) is found exclusively associated with chylomicrons, the lipoprotein particles produced by the intestines and released to the lymphatic system. C-to-U editing of the apoB mRNA requires a single-stranded RNA template with well defined characteristics in the immediate vicinity of the edited base, as well as protein cofactors that assemble into a functional complex referred to as a holoenzyme or editosome. This functional complex includes a minimal core composed of apolipoprotein B mRNA editing enzyme, catalytic polypeptide 1 (APOBEC-1; the catalytic deaminase) and a competence factor, APOBEC-1 complementation factor (A1CF). The function of A1CF is to act as an adaptor protein by binding both the APOBEC-1 enzyme and the mRNA substrate.
Another example of C-to-U mRNA editing involves site-specific deamination of a CGA to UGA codon in the neurofibromatosis type 1 (NF1) mRNA. The NF1 mRNA encodes a protein identified as neurofibromin 1. The editing of the NF1 mRNA introduces a translational stop codon at position 3916 that results in a truncation of the neurofibromin 1 protein in a critical domain involved in GTPase activation. Although no demonstration of a truncated NF1 protein has been shown, the editing of the NF1 mRNA has been demonstrated in peripheral nerve sheath tumors from patients with type 1 neurofibromatosis.
A third C-to-U edited mRNA encodes eukaryotic initiation factor 4, gamma 2, eIF-4G2 (also identified as p97, DAP5, and NAT1), which is a translational repressor that may be involved in repression of global translation. The editing of the eIF-4G2 mRNA was identified in studies that demonstrated the oncogenic potential of APOBEC-1 when it was overexpressed in experimental animals. In these studies it was found that the eIF-4G2 mRNA underwent C-to-U editing at multiple sites, creating stop codons that in turn reduced the abundance of the eIF-4G2 protein. The eIF-4G2 protein has a crucial role in early embryogenesis since eIF-4G2-negative embryos die during gastrulation. Although the precise mechanism through which elevated APOBEC-1 activity leads to dysplasia and cancer is not yet defined, host adaptations have been shown to modulate the expression of APOBEC-1 in sporadic human colorectal cancers.
Editing of the apoB mRNA: When the apoB gene is expressed in the liver the resulting mRNA is not edited and is translated into the full-length apoB-100 protein present in VLDL. When the gene is transcribed in the intestines, editing of the mRNA converts a CAA codon to a translational stop codon (UAA) resulting in the translation of a truncated apoB-48 protein that is present in chylomicrons.
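The consequence of the CAA-to-UAA edit can be sketched with a toy transcript. The sequence below is invented and far shorter than the real apoB mRNA; it only assumes an in-frame CAA codon upstream of the normal stop codon. The function names are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def edit_c_to_u(mrna, position):
    """Deaminate the cytidine at a 0-based position to uridine."""
    if mrna[position] != "C":
        raise ValueError("only cytidine can be deaminated to uridine")
    return mrna[:position] + "U" + mrna[position + 1:]


def translated_codons(mrna):
    """Count the sense codons read from position 0 up to the first
    in-frame stop codon."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


# Invented toy transcript: AUG GCU GGU CAA GAU CUG AAA UAA
unedited = "AUGGCUGGUCAAGAUCUGAAAUAA"
edited = edit_c_to_u(unedited, 9)  # the C of the in-frame CAA codon

print(translated_codons(unedited), translated_codons(edited))
```

The unedited transcript is read through to the normal stop codon (the liver, apoB-100 situation), whereas the edited transcript terminates at the newly created UAA (the intestinal, apoB-48 situation).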
The APOBEC-1 deaminase is encoded by the APOBEC1 gene, which is located on chromosome 12p13.1 and is composed of 6 exons that generate three alternatively spliced mRNAs that encode two distinct protein isoforms. The APOBEC1 gene is a member of a large cytidine deaminase gene family but is the only member of the family that encodes an mRNA-specific editing enzyme. All the other members of the family function primarily to edit cytidine residues in different types of DNA molecules. The other members of the family include APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and activation-induced cytidine deaminase (AICDA). Although the APOBEC3A encoded protein functions principally to deaminate cytidines of single-stranded DNA and to inhibit viruses and retrotransposons, it is also known to deaminate cytidines in mRNAs in monocytes and macrophages in response to hypoxia. The enzymes encoded by the APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H genes function as anti-retroviral enzymes and have been shown to restrict HIV infection. Each of these four enzymes gets assembled into infectious virion particles where they deaminate cytidine residues in the viral cDNA resulting in reduced progression of reverse transcription. The resulting uracil residues induce G-to-A hypermutations in the HIV-1 genome since A base pairs with U during DNA replication.
Modified nucleotides that serve distinctive functional purposes have been known for many years to exist in tRNA molecules from invertebrates and vertebrates, where up to 25% of the nucleotides have been identified as being modified. Indeed, more than 100 distinct modifications have been characterized in tRNAs, as well as other non-coding RNAs, and shown to promote the functions of these RNAs in the processes of translation and splicing. More recent data have demonstrated the presence of modified nucleotides in mRNAs. The modifications of mRNA nucleotides are widespread yet sparse in comparison to their density in tRNAs. The modifications found in mRNA molecules include N6-methyladenosine (m6A), 5-methylcytosine (m5C), and pseudouridine (Ψ; an isomer of uridine). Most recently N1-methyladenosine (m1A) has been identified as a modified nucleotide in mRNAs but little is currently understood regarding the incorporation and removal of this particular modification. More data have been gathered on the mechanisms of incorporation and removal of the m6A modification and the consequences of these changes than those related to the m5C and Ψ modifications. Given that there is clear evidence demonstrating that these mRNA modifications exert functional consequences on the mRNA, these dynamic mRNA marks have been collectively termed the epitranscriptome, similar to DNA methylation and histone modifications being defined as the epigenome.
Modified nucleotides found in mRNAs: Three modified nucleotides have been found to exist, either transiently or stably, in numerous mammalian mRNAs. The transient modification is N6-methyladenosine (m6A). Although less is known about the methods for incorporation and the stable versus transient incorporation, mammalian mRNAs have been shown to also contain 5-methylcytosine (m5C) and pseudouridine (Ψ).
With respect to mRNA m6A modification there are enzymes that incorporate the methyl group (referred to as writers), enzymes that remove the methyl group (referred to as erasers), and proteins that recognize the m6A structure to effect functional consequences of the modification (referred to as readers). mRNA methylation readers have been found in both the nucleus and the cytoplasm. In mammalian mRNA, the m6A modification is primarily produced by the methyltransferases encoded by the METTL3 and METTL14 (METTL: methyltransferase like) genes. Addition of the methyl group, catalyzed by METTL3 and METTL14, also requires the regulatory protein encoded by the WTAP (Wilms tumor 1-associating protein) gene. An additional methyltransferase, encoded by the METTL4 gene, may also be involved in the generation of m6A residues in mRNAs. The significance of the METTL3 and METTL14 enzymes was demonstrated by knocking the genes out in mice and showing that up to 99% of the m6A sites were lost. Incorporation of methyl groups into A residues in mRNAs has been shown to occur on A residues found within the consensus sequence RACH where R can be either G or A and where H can be A, C, or U.
As indicated, removal of the methylation in m6A residues is catalyzed by specific demethylases referred to as erasers. The fat mass and obesity-associated protein (encoded by the FTO gene, also known as the ALKBH9 gene) was the first mammalian RNA demethylase shown to catalyze m6A demethylation. Another demethylase, ALKBH5 (alkB homolog 5), is a conserved eraser of m6A methylation. The ALKBH5 gene is highly expressed in the testes and has been shown to be required for spermatogenesis and fertility in mice. The alkB gene is a bacterial gene responsible for DNA damage repair in response to alkylation damage. Mammalian homologs of the gene are, therefore, identified as ALKBH genes with nine identified members. The ALKBH genes encode enzymes that are members of the large family of Fe2+ and 2-oxoglutarate-dependent dioxygenases. Whereas the FTO and ALKBH5 enzymes have specificity for m6A residues in RNA, several of the ALKBH enzymes function in the demethylation of DNA. For instance ALKBH2 and ALKBH3 have been shown to demethylate N1-methyladenosine (m1A) and N3-methylcytosine (m3C) residues in DNA. Indeed, the ALKBH2 protein is the primary enzyme responsible for demethylation repair of alkylated DNA. The ALKBH4 gene encodes a lysine demethylase. The ALKBH8 gene encodes a tRNA methyltransferase. During the process of nucleotide demethylation, of both DNA and RNA, functional intermediates have been shown to exist. The intermediates generated by the ten eleven translocation (TET) family of DNA demethylases are discussed in the DNA Metabolism page. During FTO-mediated m6A demethylation the intermediates 6-hydroxymethyladenosine (6hmA) and 6-formyladenosine (6fA) are generated and these have both been shown to exist stably in mRNA molecules.
The proteins that recognize the various mRNA methylation marks are responsible for the actual decoding of this information. These methylated mRNA recognition proteins are referred to as the readers. Humans express at least five reader genes, all of which are members of the YTH domain family of m6A-binding proteins. The YTH domain refers to the fact that this domain was found in proteins shown to have homology to the Drosophila melanogaster RNA splicing factor protein identified as YT521-B. The five human YTH domain genes are identified as YTHDC1, YTHDC2, YTHDF1, YTHDF2, and YTHDF3. The YTHDC1 and YTHDC2 proteins are localized to the nucleus, whereas the YTHDF1, YTHDF2, and YTHDF3 proteins are localized to the cytoplasm. Studies on the activity of these proteins show that the cytoplasmic readers appear to be highly specific for binding to m6A residues in mRNA. The YTHDF1 protein promotes mRNA translation while the YTHDF2 protein promotes mRNA decay. RNA reader proteins dedicated to the recognition of the m5C or Ψ modified nucleotides have yet to be characterized.
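Why cytidine deamination in the viral cDNA shows up as G-to-A changes in the viral genome can be traced in a few lines. The sketch assumes that the deamination occurs on the first (minus) cDNA strand and, as a deliberate exaggeration, that every C on that strand is converted to U. The sequence is invented and the function names are hypothetical.

```python
# U is included because deaminated cytidine pairs with A, like T does.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def reverse_complement(strand):
    """Copy a strand into its complementary strand (read 5' to 3')."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand))


def deaminate_all_c(strand):
    """Convert every C to U (exaggerated: real editing is partial)."""
    return strand.replace("C", "U")


plus_strand = "ATGGGCTGGA"                       # invented genomic sequence
minus_cdna = reverse_complement(plus_strand)     # first-strand cDNA
edited_minus = deaminate_all_c(minus_cdna)       # C-to-U on the cDNA
mutant_plus = reverse_complement(edited_minus)   # second strand pairs A with U

print(plus_strand)
print(mutant_plus)
```

Each C on the minus strand sat opposite a G on the plus strand, so once the polymerase pairs A with the resulting U, every affected position reads A where the original genome had G.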
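The RACH consensus can be written as a simple pattern for scanning a sequence for candidate methylation sites. The sketch below uses Python's standard `re` module; the function name and the test sequence are invented. A match only indicates that the sequence context fits the consensus, not that the site is actually methylated.

```python
import re

# R = G or A, then the target A, then C, then H = A, C, or U.
# The lookahead lets motifs that share a base both be reported.
RACH = re.compile(r"(?=([GA]AC[ACU]))")


def find_rach_sites(mrna):
    """Return 0-based positions of the A that would carry the methyl
    group (the second base of each RACH match)."""
    return [match.start() + 1 for match in RACH.finditer(mrna)]


mrna = "AUGGACUAAACAGGGACG"  # invented sequence
print(find_rach_sites(mrna))
```

In the invented sequence, GACU and AACA fit the consensus, whereas the final GACG does not because G is not permitted at the H position.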
The methylation of mRNA has the potential to affect most of the posttranscriptional steps in the processes of regulated gene expression. Indeed, evidence has demonstrated that nuclear mRNA methylation readers are involved in the control of mRNA stability and splicing, and micro-RNA (miRNA) processing, whereas cytoplasmic readers are known to be involved in the regulation of mRNA translation. The mechanisms of mRNA methylation-mediated effects are clearly the result of interactions with the m6A reader proteins, and may very well involve additional RNA binding proteins since, as indicated, readers for m5C and Ψ are yet to be characterized. Clearly defined functions for mRNA methylation in the control of splicing, stability, and translation have been identified. However, additional important consequences of these modifications are speculated. Synthesis of truncated proteins could result as a consequence of site-specific ribosome stalling at modified codons. One of the most significant consequences of mRNA methylation may be the possibility for altered gene function as a result of regulated rewiring of the genetic code. Although the data for this latter possibility are limited in humans, there are examples in bacteria showing that the insertion of m5C leads to recoding of proline as leucine. Given that both m5C and Ψ have been shown to affect tRNA and rRNA tertiary structure, these same modifications in mRNA may also affect mRNA structures resulting in altered accessibility of binding sites for regulatory factors.
Comparatively little is known about the mechanisms of m5C production in mRNA even though m5C is common in noncoding RNAs from all domains of life. Nonetheless, the significance of m5C in human cells is evidenced from the fact that over 8,000 such sites have been identified in human mRNAs. In human cells, the methyltransferases DNMT2 (a known DNA CpG dinucleotide methyltransferase) and NSUN2 have been shown to modify certain mRNAs. There are seven humans genes encoding proteins of the NSUN methyltransferase family. Like the potential for further chemical modification of Ψ sites in mRNAs, the m5C modification can also undergo further chemical modification. The enzymes known to carry out these m5C modifications are the same as those responsible for the step-wise removal of m5C in DNA. These enzymes are members of the ten eleven translocation (TET) gene family of demethylases. As for m5C modification bby TET enzymes in DNA, the m5C in mRNAs can be modified to 5-hydroxymethylcytidine (5hmC), 5-formylcytidine (5fC), and 5-carboxylcytidine (5caC).
The FTO gene (also identified as the ALKBH9 gene) was the first gene shown to play a role in common obesity in humans. In this original association, a single nucleotide polymorphism (SNP) in the first intron of the FTO gene was found to correlate with increased fat mass. In animal studies, overexpression of the FTO gene resulted in increased fat mass and body weight, whereas knocking the gene out in mice resulted in reduced fat mass and body weight. The results of numerous studies on the FTO gene have demonstrated that its role in tuning the status of m6A methylation plays a direct role in the regulation of fat mass and obesity. The FTO demethylase has also been shown to play a role in the regulation of dopamine signaling in the brain. Maintenance of an appropriate m6A status in a specific subset of mRNAs encoding components of the neuronal dopamine signaling pathway has been shown to be important in the overall regulation of dopamine signaling. These observations have led to the suggestion that the function of FTO may be important in the onset and progression of Parkinson disease. Indeed, malfunction in FTO has been associated with reduced brain volume in healthy elderly individuals, in individuals with attention deficit disorders, and in addiction. Alterations in both the methylation writers (METTL3, METTL14, and WTAP) and the FTO demethylase eraser have been associated with numerous forms of cancer. These results clearly indicate that maintenance of the methylation status of the mRNA pool of numerous cell types is critical for normal cellular function.
Within the context of non-coding RNAs, such as tRNAs and rRNAs, the nucleotide pseudouridine (Ψ) is the most abundant non-standard nucleotide. The estimated level of Ψ in non-coding RNAs is on the order of 7%–10% of all uridine. Within mRNA the level of Ψ is significantly lower, and Ψ is also much less abundant than m6A. The primary human enzyme responsible for the conversion of uridine to pseudouridine (Ψ) is encoded by the PUS1 (pseudouridylate synthase 1) gene. At least 13 PUS genes have been identified in the human genome, and four of them (PUS1, PUS7, TRUB1, and DKC1) have been identified as capable of synthesizing pseudouridine from uridine in mRNAs. Unlike the m6A modification, pseudouridylation of mRNA is believed to be irreversible. Additionally, the presence of the Ψ residue may mediate additional chemical modifications at those sites. For example, Ψ can be further modified by N1 methylation. In addition to the normal patterns of mRNA pseudouridylation that have been detected and identified, numerous additional sites of pseudouridylation are found in cells in response to stress-related stimuli such as heat shock and increased production of reactive oxygen species (ROS).
The significance of pseudouridylation, whether in non-coding RNAs or in mRNAs, is evident from the fact that mutations in several of the pseudouridine synthase genes have been identified in various disorders. Mutations in the PUS1 gene are associated with a form of mitochondrial myopathy identified as MLASA (myopathy, lactic acidosis and sideroblastic anemia). Mutations in the dyskerin pseudouridine synthase 1 (DKC1) gene result in the multisystem disorder known as X-linked dyskeratosis congenita (X-DC). The DKC1 encoded enzyme is responsible for pseudouridylation of rRNA (as well as being a component of the telomerase complex), yet the loss of this modification leads to impaired translation of mRNAs that contain IRES elements. Internal ribosome entry sites (IRES) are used for the translation of mRNAs lacking a 5'-cap structure as well as for the translation of mRNAs under conditions where cap-dependent translation is impaired, such as in nutrient deprivation. Numerous mRNAs encoding anti-apoptotic proteins and tumor suppressors contain IRES elements; thus a loss of their translation, due to mutations in the DKC1 gene, can lead to the development of cancers.
Ribozymes represent a special class of RNA molecules that possess catalytic activity. Ribozymes are composed of well-defined tertiary structures that impart the RNAs with their unique biological activity as nucleic acid enzymes. Ribozymes have been identified in a wide range of genomes from viruses to mammals. To date, eight naturally occurring classes of ribozyme have been defined, all of which catalyze cleavage or ligation of the RNA backbone by trans-esterification or hydrolysis of phosphate groups. The catalytic properties of ribozymes are exclusively due to the capacity of these RNA molecules to assume particular structures. RNA molecules have the capacity to fold into several distinct structures, which can enable a single RNA to perform more than one function. RNA-mediated catalysis was first demonstrated in the process of intron splicing (group I and II introns). Subsequently, numerous RNAs harboring catalytic activity have been described. Ribozymes have been shown to be involved in tRNA processing (RNaseP), in phosphoryl transfer reactions catalyzing the cleavage or ligation of the RNA phosphodiester backbone, in protein synthesis (peptidyltransferase), and in the regulation of gene expression. Despite the similarity of the chemistry of the reactions catalyzed by ribozymes, each molecule possesses a completely unique sequence, tertiary structure, and specific catalytic mechanism, which reflects the diversity of catalytic strategies of ribozymes. The peptidyltransferase activity of the ribosome represents a distinct ribozyme structure and activity.
The enzymatic activity of ribozymes depends on the capacity of the RNA to fold into specific structures that impart catalytic specificity. The possibility that a single RNA molecule can fold into more than one structure implies that a single RNA polymer could have more than one function. This means that an RNA molecule could perform more than one task, so that a single sequence (the genotype) manifests multiple phenotypes. That this is indeed the case has been demonstrated for short (25-34 nucleotides) RNA sequences which exhibit the ability to bind two different ligands, such as GMP and L-arginine. In addition, another experiment, designed to select for a ribozyme that catalyzed the ligation of two RNA substrates, discovered that the RNA molecule could also undergo a separate self-cleavage reaction. These two distinct enzymatic reactions, ligation and cleavage, were imparted by two distinct sites of the RNA molecule. Multiple bifunctional ribozymes have been identified.
Group I introns are considerably larger and more structurally complex than any of the self-cleaving RNAs. This class of ribozyme is found in precursor mRNA, tRNA, and rRNA transcripts from a variety of organisms. The catalytic reaction carried out by group I intron ribozymes occurs in two steps. The reactions result in the ligation of the flanking 5' and 3' exons to yield the mature RNA. Several hundred examples of this class of ribozyme have been identified. All of them share a common secondary structure and most likely a similar reaction mechanism. The Tetrahymena thermophila rRNA intron was the first group I self-splicing intron discovered (see section above). The ribozyme derived from this intron is 421 nucleotides long and is composed of a conserved catalytic core of roughly 200 nucleotides. This ribozyme catalyzes the first step of intron self-splicing using an oligonucleotide to mimic the 5'-exon. The 3' oxygen of an exogenous guanosine serves as the nucleophile for this reaction (see Figure above).
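As a rough illustration of the two-step reaction described above, the following Python sketch treats the RNA as a plain string. The function name, the toy sequences, and the convention of writing exons in uppercase and the intron in lowercase are assumptions made for this example only. The second step shown here (attack of the released 5' exon on the 3' splice site) follows the standard textbook description of group I self-splicing and is not spelled out in the text above. Real splice-site selection depends on the folded structure of the intron, which this sketch ignores.

```python
def group_i_self_splice(exon5, intron, exon3, cofactor="G"):
    """Toy string model of group I intron self-splicing (hypothetical helper).

    Returns a tuple (mature_rna, excised_intron).
    """
    # Step 1: the 3' oxygen of the exogenous guanosine attacks the 5' splice
    # site. The guanosine ends up attached to the 5' end of the intron, and
    # the 5' exon is released with a free 3' end.
    released_exon5 = exon5
    g_intron_exon3 = cofactor + intron + exon3

    # Step 2: the free 3' end of the released 5' exon attacks the 3' splice
    # site, ligating the two exons and releasing the intron, which still
    # carries the added guanosine at its 5' end.
    boundary = len(cofactor) + len(intron)
    excised_intron = g_intron_exon3[:boundary]
    mature_rna = released_exon5 + g_intron_exon3[boundary:]
    return mature_rna, excised_intron


if __name__ == "__main__":
    # Arbitrary toy sequences, not taken from any real transcript.
    mature, excised = group_i_self_splice("AUGGCU", "cuaaug", "GGUACC")
    print("mature RNA:    ", mature)
    print("excised intron:", excised)
```

The excised intron is one nucleotide longer than the original intron sequence because the guanosine cofactor is incorporated in the first step.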
The most recently discovered functional class of ribozymes includes those that are involved in the regulation of protein synthesis. Two of these newly identified ribozymes are the mammalian cytoplasmic polyadenylation element-binding protein 3 (CPEB3) ribozyme and a variant hammerhead ribozyme embedded in mammalian mRNAs. Hammerhead ribozymes are so called because of the secondary structure evident in the active ribozyme. The hammerhead, hepatitis delta virus (HDV), hairpin, Neurospora Varkud satellite (VS), and glmS ribozymes are a class of small RNAs (50–150 nucleotides) that catalyze site-specific self-cleavage and were originally characterized in viral, virusoid, bacterial, or satellite RNA genomes.
The glmS ribozyme is a ribozyme found in Gram-positive bacteria. It is considered a metabolite-responsive ribozyme since it was originally discovered by its ability to catalyze site-specific RNA cleavage in the presence of glucosamine-6-phosphate (GlcN6P). The glmS ribozyme was originally identified in the 5'-untranslated region of the GLMS gene which is involved in the synthesis of GlcN6P. The glmS ribozyme is also considered a riboswitch since it is involved in the regulation of gene expression in response to changing concentrations of a metabolite.
The CPEB3 ribozyme is a self-cleaving non-coding RNA located in the second intron of the CPEB3 gene, which belongs to a family of genes regulating the reactions of mRNA polyadenylation. A 72 nucleotide core of the CPEB3 ribozyme sequence is sufficient to carry out self-cleavage. The cleavage activity of the CPEB3 ribozyme is slow, which under normal conditions allows normal splicing of the CPEB3 pre-mRNA to occur. A trans-acting factor is known to interact with the ribozyme cleavage site, thereby regulating the rate of ribozyme self-cleavage. When self-cleavage is increased, the level of truncated CPEB3 pre-mRNAs increases, resulting in degradation of the cleaved RNA fragments. This process may serve as a switch to turn off the synthesis of the CPEB3 protein.
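One way to picture why a slow ribozyme leaves most transcripts intact is to treat self-cleavage and splicing as two competing first-order processes acting on the same pre-mRNA. This kinetic-competition model is an assumption introduced here for illustration, not a mechanism stated in the text, and the rate constants below are arbitrary numbers rather than measured values. Under that assumption, the fraction of transcripts cleaved before they are spliced is k_cleave / (k_cleave + k_splice).

```python
def fraction_cleaved(k_cleave, k_splice):
    """Fraction of pre-mRNAs self-cleaved before splicing completes.

    Assumes two competing first-order processes (an illustrative model).
    Both rate constants must use the same units.
    """
    if k_cleave < 0 or k_splice <= 0:
        raise ValueError("k_cleave must be >= 0 and k_splice must be > 0")
    return k_cleave / (k_cleave + k_splice)


if __name__ == "__main__":
    k_splice = 1.0  # arbitrary reference rate, not a measured value
    # Hypothetical cleavage rates, from much slower to faster than splicing.
    for k_cleave in (0.01, 0.1, 1.0, 10.0):
        share = fraction_cleaved(k_cleave, k_splice)
        print(f"k_cleave={k_cleave:>5}: {share:.1%} of transcripts cleaved")
```

In this sketch, raising the cleavage rate relative to the splicing rate shifts transcripts from the spliced pool to the truncated pool, which is the switch-like behavior proposed above.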