Transcription is the mechanism by which a template strand of DNA is utilized by specific RNA polymerases to generate one of the four distinct classifications of RNA. These four RNA classes are:
1. Messenger RNAs (mRNAs): This class of RNA provides the genetic coding templates used by the translational machinery to determine the order of amino acids incorporated into an elongating polypeptide in the process of translation.
2. Transfer RNAs (tRNAs): This class of small RNAs forms covalent attachments to individual amino acids and recognizes the encoded sequences of the mRNAs to allow correct insertion of amino acids into the elongating polypeptide chain.
3. Ribosomal RNAs (rRNAs): This class of RNA is assembled, together with numerous ribosomal proteins, to form the ribosomes. Ribosomes engage the mRNAs and form a catalytic domain into which the tRNAs enter with their attached amino acids. A unique function of the 28S rRNA of the large ribosomal subunit is catalytic. This rRNA catalyzes the formation of the peptide bond via its ribozyme (RNA-directed catalysis) activity.
4. Small RNAs: This class of RNA includes the small nuclear RNAs (snRNAs) involved in RNA splicing and the microRNAs (miRNAs) involved in the modulation of gene expression through the alteration of target mRNA activity.
All RNA polymerases are dependent upon a DNA template in order to synthesize RNA. The resultant RNA is, therefore, complementary to the template strand of the DNA duplex and identical to the non-template strand. The non-template strand is called the coding strand because its sequences are identical to those of the mRNA. However, in RNA, U is substituted for T and the intronic DNA sequences are removed from the RNAs through the process of splicing.
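The relationship between the template strand, the coding strand, and the RNA product can be illustrated with a minimal Python sketch. The sequence below is hypothetical, the function names are illustrative only, and splicing is ignored, so the example applies to an intron-free stretch of DNA.

```python
# Base pairing applied when RNA is copied from a DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base pairing between the two DNA strands.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') copied from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def coding_strand(template_3_to_5: str) -> str:
    """Return the coding (non-template) strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in template_3_to_5.upper())


template = "TACGGATTC"  # hypothetical template strand, written 3'->5'
rna = transcribe(template)
coding = coding_strand(template)

print(rna)     # AUGCCUAAG
print(coding)  # ATGCCTAAG
# The RNA matches the coding strand once T is replaced by U.
print(rna == coding.replace("T", "U"))  # True
```

Writing the template in the 3' → 5' direction lets the RNA be read off left to right in the 5' → 3' direction without an explicit reversal step.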
In prokaryotic cells, all three major RNA classes (mRNA, tRNA, and rRNA) are synthesized by a single polymerase. In eukaryotic cells there are three distinct classes of RNA polymerase: RNA polymerase (pol) I, II, and III. Each polymerase is responsible for the synthesis of a different class of RNA. The capacity of the various polymerases to synthesize different RNAs was shown with the toxin α-amanitin. At low concentrations of α-amanitin, the synthesis of mRNAs is affected but not that of rRNAs or tRNAs. At high concentrations, the synthesis of both mRNAs and tRNAs is affected. These observations have allowed the identification of which polymerase synthesizes which class of RNAs.
RNA pol I (RNAP I; also identified as RNA polymerase 7) is responsible for rRNA synthesis (excluding the 5S rRNA). The functional enzyme is a large (590 kDa) multi-subunit complex composed of 14 subunits. Twelve of the RNAP I subunits are identical to or related to subunits of the RNAP II complex. The genes that encode the subunits of the RNAP I complex are identified as POLR1 genes, with five distinct genes (POLR1A-POLR1E) expressed in humans. There are four major rRNAs in eukaryotic cells, designated by their sedimentation size. The 28S, 5.8S, and 5S rRNAs are associated with the large ribosomal subunit and the 18S rRNA is associated with the small ribosomal subunit.
RNA pol II (RNAP II) in humans is a large 550 kDa complex composed of 12 distinct subunits. The 12 subunits of the RNAP II complex are identified as RPB1-RPB12 and the genes that encode these subunits are POLR2A-POLR2L. The RPB1 subunit carries the actual RNA polymerizing activity of the complex. This subunit is encoded by the POLR2A gene. The function of RNAP II is to synthesize all of the mRNAs, some of the small nuclear RNAs (snRNAs) involved in RNA splicing, and several microRNAs.
RNA pol III (RNAP III) is also a multisubunit complex and is composed of at least 17 proteins. Ten of the RNAP III subunits are unique to this complex, two are common with subunits of RNAP I, and five are common to all three RNAP complexes. The genes encoding the RNAP III-specific proteins are identified as POLR3A-POLR3H. All of the RNAs transcribed by RNAP III are small stable untranslated RNAs. The products of RNAP III include all of the tRNAs, the 5S rRNA, several microRNAs, and the U6 small nuclear RNA (snRNA) of the splicing machinery.
Synthesis of RNA shares several features with DNA replication. RNA synthesis requires accurate and efficient initiation, elongation proceeds in the 5' → 3' direction (i.e. the polymerase moves along the template strand of DNA in the 3' → 5' direction), and RNA synthesis requires distinct and accurate termination. However, transcription also exhibits several features that are distinct from replication:
1. Transcription initiates, both in prokaryotes and eukaryotes, from many more sites than replication.
2. There are many more molecules of RNA polymerase per cell than DNA polymerase.
3. RNA polymerase proceeds at a rate much slower than DNA polymerase (approximately 50–100 bases/sec for RNA versus near 1000 bases/sec for DNA).
4. Finally, the fidelity of RNA polymerization is much lower than that of DNA polymerization. This is tolerable since aberrant RNA molecules can simply be turned over and new correct molecules made.
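To put the polymerization rates quoted in the list above into perspective, the following short Python calculation estimates how long each polymerase would need to copy the same stretch of DNA. The 10,000-base length is a hypothetical value chosen only for the arithmetic, and the calculation assumes a constant rate with no pausing.

```python
def seconds_to_copy(length_nt: int, rate_nt_per_sec: float) -> float:
    """Time needed to copy length_nt bases at a constant polymerization rate."""
    return length_nt / rate_nt_per_sec


gene_length = 10_000  # hypothetical length in bases

rna_slow = seconds_to_copy(gene_length, 50)    # RNA polymerase, lower bound
rna_fast = seconds_to_copy(gene_length, 100)   # RNA polymerase, upper bound
dna_time = seconds_to_copy(gene_length, 1000)  # DNA polymerase

print(rna_slow, rna_fast, dna_time)  # 200.0 100.0 10.0
```

Under these assumptions RNA polymerase takes 10 to 20 times longer than DNA polymerase to traverse the same template.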
Signals are present within the DNA template that act in cis to stimulate the initiation of transcription. These sequence elements are termed promoters. Promoter sequences promote the ability of RNA polymerases to recognize the nucleotide at which initiation begins. Additional sequence elements are present within genes that act in cis to enhance polymerase activity even further. These sequence elements are termed enhancers. Transcriptional promoter and enhancer elements are important sequences used in the control of gene expression. The major defining difference between promoters and enhancers is that cis-acting promoter elements must be in a specific orientation and a relatively fixed position in order to function properly, whereas enhancer elements can function in either orientation relative to the transcriptional start site and can be displaced large distances relative to their naturally occurring locations and still function as cis-acting enhancer elements.
E. coli RNA polymerase is composed of five distinct polypeptide chains. Association of several of these generates the RNA polymerase holoenzyme. The sigma (σ) subunit is only transiently associated with the holoenzyme. This subunit is required for accurate initiation of transcription by providing polymerase with the proper cues that a start site has been encountered. In both prokaryotic and eukaryotic transcription the first incorporated ribonucleotide is a purine and it is incorporated as a triphosphate. In E. coli several additional nucleotides are added before the sigma subunit dissociates.
The process of eukaryotic mRNA transcriptional initiation is an extremely complex event. There are numerous protein factors controlling initiation, some of which are basal factors present in all cells and others are specific to cell type and/or the differentiation state of the cell. Two basal promoter elements that are found in essentially all eukaryotic mRNA genes are the TATA-box and the CAAT-box. Many constitutively expressed mRNA genes (house-keeping genes) also contain a GC-box promoter element (generally GGGCGG). These elements are so called because of the DNA sequences that constitute the promoter element. The TATA-box can be found approximately 25–100 bases upstream (written -25 to -100) of the start site for transcription and the CAAT-box is generally in the -70 to -150 position. The TATA-box sequences are found ONLY in the coding strand of the gene (i.e. the strand that has the sequences identical to the resulting mRNA) while the CAAT-box and GC-box sequences are most often found in the template strand but can also reside in the coding strand. Many of the basal transcription factors are identified by the fact that they control the activity of RNA pol II. Thus, the nomenclature of these proteins is TFII, for transcription factor of RNA pol II. TFIID is the factor that binds to the TATA-box and its binding is facilitated by TFIIA. Once TFIID and TFIIA are bound TFIIB binds and this recruits RNA pol II to the promoter. Next TFIIE and TFIIH bind.
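The positional conventions used for the basal promoter elements (negative coordinates counted upstream of the transcriptional start site) can be sketched in Python. The upstream sequence below is entirely hypothetical, and the motif strings TATAAA and CCAAT are assumed, commonly cited consensus forms rather than sequences given in the text; only GGGCGG is taken from the description above. Real promoter elements are degenerate, so exact string matching is a simplification.

```python
import re


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_element(upstream: str, motif: str, both_strands: bool = False) -> list[int]:
    """Return the positions of the first base of each motif match.

    `upstream` is the coding strand written 5'->3'; its last base is -1,
    the base immediately before the transcriptional start site (+1).
    With both_strands=True the motif is also sought on the template strand.
    """
    patterns = {motif}
    if both_strands:
        patterns.add(reverse_complement(motif))
    hits = set()
    for pattern in patterns:
        # A lookahead is used so that overlapping matches are also reported.
        for match in re.finditer(f"(?={re.escape(pattern)})", upstream):
            hits.add(match.start() - len(upstream))
    return sorted(hits)


filler = "ACGT" * 5  # hypothetical spacer sequence
upstream = filler + "CCAAT" + filler + "GGGCGG" + filler + "TATAAA" + "ACGT" * 6 + "A"

print(find_element(upstream, "TATAAA"))                     # [-31]
print(find_element(upstream, "CCAAT", both_strands=True))   # [-82]
print(find_element(upstream, "GGGCGG", both_strands=True))  # [-57]
```

The TATA-box is searched on the coding strand only, whereas the CAAT-box and GC-box are searched on both strands, mirroring the strand preferences described above.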
TFIIH is in fact a complex of proteins and this complex is not only involved in transcription but also in certain steps of DNA repair. The role of TFIIH in DNA repair can be seen as critical since defects in its function are responsible for certain forms of xeroderma pigmentosum. The critical role of TFIIH in transcription initiation is due to the fact that one of the proteins of the complex is a kinase that phosphorylates serine residues in the C-terminal domain (CTD) of the large subunit of RNA pol II. This kinase subunit of TFIIH is called Kin28. The CTD contains a tandem repeat sequence that is composed of the consensus heptad of amino acids Y1-S2-P3-T4-S5-P6-S7, which can be repeated from 25 to 52 times. It is Ser5 and Ser7 that become phosphorylated during transcriptional initiation. These serines are different from the serine (Ser2) phosphorylated in the CTD by P-TEFb involved in the capping process as discussed below. After transcriptional initiation has commenced and RNA pol II moves down the DNA template, factors TFIIA and TFIID remain on the promoter to allow for additional rounds of initiation to take place.
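The numbering of the heptad residues and the division of labor between the two CTD kinases can be made concrete with a small Python sketch. It models an idealized CTD built from perfect copies of the consensus heptad (real CTDs contain non-consensus repeats), and the function and dictionary names are illustrative only.

```python
HEPTAD = "YSPTSPS"  # consensus repeat: Y1 S2 P3 T4 S5 P6 S7

# Heptad positions targeted by each kinase, as described in the text.
KINASE_TARGETS = {
    "TFIIH": (5, 7),   # phosphorylated during initiation
    "P-TEFb": (2,),    # phosphorylated during the capping-associated pause
}


def phospho_sites(n_repeats: int, kinase: str) -> list[int]:
    """1-based residue positions targeted by a kinase in an idealized CTD."""
    return [
        repeat * len(HEPTAD) + position
        for repeat in range(n_repeats)
        for position in KINASE_TARGETS[kinase]
    ]


ctd = HEPTAD * 2  # two repeats, kept short for display
print(phospho_sites(2, "TFIIH"))   # [5, 7, 12, 14]
print(phospho_sites(2, "P-TEFb"))  # [2, 9]
# Every reported position is a serine in the idealized sequence.
print(all(ctd[p - 1] == "S" for p in phospho_sites(2, "TFIIH")))  # True
print(len(phospho_sites(52, "TFIIH")))  # 104
```

With the maximum of 52 repeats, the idealized CTD offers 104 Ser5/Ser7 sites for TFIIH and 52 Ser2 sites for P-TEFb.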
Elongation involves the addition of the 5'–phosphate of ribonucleotides to the 3'–OH of the elongating RNA with the concomitant release of pyrophosphate. Nucleotide addition continues until specific termination signals are encountered. Following termination the core polymerase dissociates from the template. In prokaryotic transcription, the core and sigma subunit can then reassociate forming the holoenzyme again ready to initiate another round of transcription.
In E. coli, transcriptional termination occurs by both factor-dependent and factor-independent means. Two structural features of all E. coli factor-independently terminating genes have been identified. One feature is the presence of two symmetrical GC-rich segments that are capable of forming a stem-loop structure in the RNA and the second is a downstream A-rich sequence in the template. The formation of the stem-loop in the RNA destabilizes the association between polymerase and the DNA template. This association is further destabilized by the weaker nature of the AU base pairs that are formed, between the template and the RNA, following the stem-loop. This leads to dissociation of polymerase and termination of transcription. Most genes in E. coli terminate by this method. Factor-dependent termination requires the recognition of termination sequences by the termination protein, rho (ρ). The rho factor recognizes and binds to sequences in the 3' portion of the RNA. This binding destabilizes the polymerase-template interaction leading to dissociation of the polymerase and termination of transcription.
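The two structural features of factor-independent terminators (a GC-rich inverted repeat able to form a stem-loop, followed by a run of U residues in the RNA) can be searched for with a naive Python scan. This is only a sketch: the stem length, loop range, GC threshold, and U-run length are hypothetical parameters, perfect base pairing is required, and the RNA sequence is invented for illustration. It is not a validated terminator prediction method.

```python
def rna_reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def find_hairpin_terminators(rna, stem=6, loop_range=(3, 8), min_gc=0.7, u_run=4):
    """Return (stem_start, loop_length) for each GC-rich perfect hairpin
    that is immediately followed by a run of U residues."""
    hits = []
    last_start = len(rna) - 2 * stem - loop_range[0] - u_run
    for i in range(last_start + 1):
        left = rna[i:i + stem]
        gc_fraction = (left.count("G") + left.count("C")) / stem
        if gc_fraction < min_gc:
            continue
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop            # start of the 3' arm of the stem
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + u_run]
            if right == rna_reverse_complement(left) and tail == "U" * u_run:
                hits.append((i, loop))
    return hits


# Hypothetical transcript 3' end: stem GCCGCC, loop UUAA, stem GGCGGC, U-run.
rna = "AAGC" + "GCCGCC" + "UUAA" + "GGCGGC" + "UUUUUU" + "GA"
print(find_hairpin_terminators(rna))  # [(4, 4)]
```

The single hit reports a stem beginning at index 4 (0-based) with a 4-nucleotide loop, which corresponds to the hairpin built into the example sequence.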
When transcription of bacterial rRNAs and tRNAs is completed they are immediately ready for use in translation. No additional processing takes place. Translation of bacterial mRNAs can begin even before transcription is completed due to the lack of the nuclear-cytoplasmic separation that exists in eukaryotes. The ability to initiate translation of prokaryotic RNAs while transcription is still in progress affords a unique opportunity for regulating the transcription of certain genes. An additional feature of bacterial mRNAs is that most are polycistronic. This means that multiple polypeptides can be synthesized from a single primary transcript. Polycistronic mRNAs are very rare in eukaryotic cells but have been identified. The mitochondrial genomes in mammals and the slime mold, Dictyostelium discoideum, encode polycistronic mRNAs that are processed into primarily mono-, di-, and tricistronic transcripts. In addition, several viruses encode polycistronic RNAs.
In contrast to bacterial transcripts, eukaryotic RNAs (all 3 classes) undergo significant processing, some of which occurs co-transcriptionally and some post-transcriptionally. All three classes of RNA are transcribed from genes that contain introns. The RNA sequences encoded by the intronic DNA must be removed from the primary transcript prior to the RNA being biologically active. The process of intron removal is called RNA splicing. Additional processing occurs to mRNAs that can alter the 5'- and 3'-ends of the transcripts.
The 5' end of nearly all eukaryotic mRNAs is capped with a unique 5' → 5' linkage to a 7-methylguanosine residue. Synthesis of the mRNA cap structure is catalyzed by the bifunctional enzyme encoded by the RNGTT gene (RNA guanylyltransferase and 5'-phosphatase). The RNGTT encoded enzyme possesses mRNA 5'-triphosphatase activity in the N-terminal portion of the enzyme and mRNA guanylyltransferase activity in the C-terminal part. The mRNA 5'-triphosphatase activity of the enzyme hydrolyzes the 5'-triphosphate group of the 5'-nucleotide of the mRNA to generate a diphosphate-mRNA. The guanylyltransferase activity then adds GMP to the diphosphate-mRNA, generating the 5' → 5' triphosphate linkage. The guanine residue of the cap is then methylated by a second enzyme encoded by the RNMT gene (RNA guanine-7 methyltransferase). The capped end of the mRNA is thus protected from exonucleases and, more importantly, is recognized by specific proteins of the translational machinery.
The capping process occurs after the newly synthesizing mRNA is around 20–30 bases long, at which point RNA pol II pauses. While RNA pol II is paused on the template, the kinase complex, known as positive transcription elongation factor b (P-TEFb), phosphorylates RNA pol II on the serine-2 residue (Ser2) in the repeat unit of the C-terminal domain (CTD) of the large subunit of the enzyme. The P-TEFb complex is composed of cyclin-dependent kinase 9 (CDK9) and either cyclin T1, T2, or K. The complex is also called C-terminal domain kinase 1 (CTDK1). This pausing and regulatory phosphorylation event allows for the potential of attenuation in the rate of transcription.
Structure of the 5'-cap of eukaryotic mRNAs. The cap structure present on most eukaryotic mRNAs consists of a 7-methylguanosine (m7G) coupled to the 5'-terminal nucleotide of the mRNA in a unique 5' → 5' triphosphate linkage.
Almost all mammalian mRNAs are polyadenylated at the 3'-end. A specific sequence, AAUAAA, is the primary sequence recognized by one of several proteins and multiprotein complexes. In addition to the AAUAAA sequence element in the mRNA, an upstream UGUA sequence and a downstream GU-rich element act in cis to promote the recognition of the 3'-end of an mRNA by the cleavage and polyadenylation complexes. These protein complexes are responsible for recognizing the cis-acting signals in the mRNA and then catalyzing the mRNA cleavage and subsequent polyadenylation reactions. In mammals the 3'-end cleavage and polyadenylation reactions are regulated by the interactions of four multiprotein complexes identified as the cleavage and polyadenylation specificity factor (CPSF), cleavage stimulatory factor (CSTF), cleavage factor I (CFI; also identified as CFIm where the "m" refers to mammalian), and cleavage factor II (CFIIm). In addition to these four complexes, the actual polyadenylation reactions are catalyzed by poly(A) polymerases (PAP). Additional proteins required for mRNA polyadenylation are nuclear poly(A)-binding protein (encoded by the PABPN1 gene), symplekin, and the C-terminal domain (CTD) of the large subunit of RNA pol II.
The CPSF is composed of at least four distinct proteins that were originally identified and named based upon their molecular weights. These four proteins are called CPSF-30, CPSF-73, CPSF-100, and CPSF-160 where the number represents the protein size in kDa. The CPSF-160 protein is encoded by the CPSF1 gene. The CPSF-160 protein physically binds to the AAUAAA sequence in the mRNA. The CPSF-100 protein is encoded by the CPSF2 gene. The CPSF-73 protein is encoded by the CPSF3 gene. The CPSF-73 protein is a hydrolase that cleaves the mRNA downstream of the AAUAAA sequence element. The CPSF-30 protein is encoded by the CPSF4 gene. An additional protein that is found associated with the CPSF, and that links the CPSF with poly(A) polymerases [specifically poly(A) polymerase alpha], is encoded by the FIP1L1 gene (factor interacting with PAPOLA and CPSF1). The FIP1L1 protein binds to U-rich sequences that reside upstream (5') of the AAUAAA element and stimulates poly(A) polymerase activity.
The cleavage stimulatory factor (CSTF) is a complex composed of three distinct proteins. These proteins are identified as CSTF1 (50 kDa protein), CSTF2 (64 kDa protein), and CSTF3 (77 kDa protein) and each is encoded by a gene of the same name. The recruitment of the CSTF complex to the 3'-end of an mRNA is stimulated by the CPSF complex.
Cleavage factor I (CFIm) contains a 68 kDa protein encoded by the CPSF6 gene (cleavage and polyadenylation specific factor) and a smaller 25 kDa subunit. The binding of CFIm to the mRNA is facilitated by the RNA recognition motif in the N-terminus of the 68 kDa CPSF6 encoded protein. The primary function of CFIm is to recognize and bind the UGUA element in the mRNA. In addition to binding the UGUA element, CFIm has been shown to be involved in the regulation of alternative splicing. Functional CFIIm is a complex consisting of an essential component (identified as CFIIAm) and a stimulatory component (identified as CFIIBm). The CFIIAm component of the complex is composed of two proteins. These two proteins are encoded by the CLP1 gene (cleavage and polyadenylation factor I subunit 1) and the PCF11 gene (protein 1 of cleavage factor I). The CLP1 encoded protein of the CFIIm complex interacts with the CFIm complex and also with the CPSF complex.
Humans express a family of three polyadenylate polymerases (PAP), identified as poly(A) polymerase alpha (PAPOLA gene), poly(A) polymerase beta (PAPOLB gene), and poly(A) polymerase gamma (PAPOLG gene). These poly(A) polymerases possess both mRNA endonuclease activity and polyadenylate polymerase activity. The endonuclease activity cleaves the primary mRNA approximately 11–30 bases 3' of the AAUAAA sequence element. A stretch of 20–250 adenosine residues is then added to the 3'-end by the non-template requiring polyadenylate polymerase activity of the enzymes.
Processes of mRNA polyadenylation. RNA polymerase terminates mRNA transcription up to 500 nucleotides after incorporation of the AAUAAA element. The combined activities of CPSF, CSTF, CFIm, CFIIm, symplekin, PABPN1, poly(A) polymerase, and the CTD of RNA pol II result in accurate and efficient transcriptional termination, cleavage of the pre-mRNA 10–30 bases 3' of the AAUAAA element, and addition of the poly(A) tail to the mRNA.
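The spatial relationship between the AAUAAA element and the cleavage site can be sketched in Python. The function below locates each AAUAAA in a pre-mRNA and reports the window 10–30 bases 3' of the element in which cleavage would be expected. The pre-mRNA sequence is hypothetical, the distance is assumed here to be measured from the 3' end of the element, and the auxiliary UGUA and GU-rich elements are ignored.

```python
def predicted_cleavage_windows(pre_mrna, signal="AAUAAA", window=(10, 30)):
    """Return (signal_start, window_start, window_end) for each signal.

    Coordinates are 0-based; the window is half-open and is clipped to
    the end of the transcript.
    """
    results = []
    start = pre_mrna.find(signal)
    while start != -1:
        signal_end = start + len(signal)
        window_start = min(signal_end + window[0], len(pre_mrna))
        window_end = min(signal_end + window[1], len(pre_mrna))
        results.append((start, window_start, window_end))
        # Resume one base later so that overlapping signals are not missed.
        start = pre_mrna.find(signal, start + 1)
    return results


pre_mrna = "G" * 20 + "AAUAAA" + "C" * 40  # hypothetical 3' region
print(predicted_cleavage_windows(pre_mrna))  # [(20, 36, 56)]
```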
In addition to intron removal in tRNAs, extra nucleotides at both the 5' and 3' ends are cleaved, the sequence 5'–CCA–3' is added to the 3' end of all tRNAs and several nucleotides undergo modification. There have been more than 60 different modified bases identified in tRNAs.
Both prokaryotic and eukaryotic rRNAs are synthesized as long precursors termed pre-ribosomal RNAs. In eukaryotes a 45S pre-ribosomal RNA serves as the precursor for the 18S, 28S and 5.8S rRNAs.
The removal of intronic RNA from precursor mRNA, tRNA, and rRNA molecules, in humans and other higher eukaryotes, requires a complex machinery termed the spliceosome, which is composed of numerous small nuclear RNAs (snRNAs) and numerous proteins. The spliceosome catalyzes the reactions that result in intron removal and the joining together of the protein-coding exons. The spliceosome has been shown to be composed of as many as 300 distinct proteins and five RNAs. The five snRNAs that constitute the spliceosome RNAs are identified as U1, U2, U4, U5, and U6. Each of these snRNAs is around 100–300 nucleotides in length and each is associated with several proteins, forming individual small nuclear ribonucleoprotein (snRNP: pronounced "snurp") complexes. The U1 snRNP consists of the U1 snRNA and at least 10 proteins. The U2 snRNP consists of the U2 snRNA and at least 19 proteins. The U4/U6 snRNP consists of the U4 and U6 snRNAs and at least 12 proteins. The U5 snRNP consists of the U5 snRNA and at least 15 proteins. Several of the proteins present in the snRNP complexes are members of the DEAD-box helicase family of enzymes that are involved in numerous aspects of RNA metabolism. The original members of the DEAD-box helicase family were so called because they all contained the four amino acid sequence D-E-A-D (Asp-Glu-Ala-Asp). As a result of the isolation of variant family members, the family is more commonly referred to as the DExD/H-box protein family. Additional important protein components of the overall spliceosome are members of the SR protein family. These proteins get their name from the fact that they are enriched in Ser/Arg residues. At least 18 different SR protein encoding genes have been identified in the human genome. The activity of the SR proteins in the splicing process is controlled by their state of phosphorylation.
Introns in higher eukaryotic mRNAs can be of considerable length, in many cases spanning several thousands of bases and sometimes comprising up to 90% of the precursor mRNA. In addition, numerous precursor mRNAs undergo alternative exon splicing, a process controlled by many factors such as the cell type in which the mRNA gene is expressed. The size and the number of introns in many mRNAs, in addition to the potential for alternative splicing, present an array of complexities that govern the control of, and catalytic processes of intron removal and exon joining.
The vast majority of eukaryotic mRNAs contain a highly conserved set of dinucleotides at the boundaries of every intron. These highly conserved sequences are GU at the 5'-end of the intron and AG at the 3'-end (shown in Figure below). In addition to these highly conserved cis-acting sequence elements there are several other important sequence elements in most introns that are necessary to control efficient and accurate splicing. Introns that contain the GU-AG consensus are spliced by the major U1, U2, U5, and U4/U6 snRNP containing spliceosomes. These introns are spliced by what is called the U2-type spliceosome. However, numerous introns have been characterized whose 5'-end and 3'-end consensus sequences are AT-AC instead of the more typical GU-AG. This second type of intron has been shown to be spliced by a spliceosome composed of a different set of snRNPs, specifically the U4/U6atac, U5, U11, and U12 snRNPs. The AT-AC introns are spliced by what is called the U12-type spliceosome. To date no precursor RNA has been identified that contains intronic RNA sequences that are spliced by both types of spliceosome. All spliced RNAs contain exclusively U2-type introns (the majority) or U12-type introns.
Consensus elements of U2-type introns. Introns that are spliced by the U2-type spliceosome contain the consensus sequences GU and AG at the 5'-end and 3'-end, respectively. These consensus sequences are found in 100% of U2-type introns. Additional cis-acting sequences in the intron include the branch point and poly(Y) tract. The designations for the nucleotides in the consensus elements are: N: any nucleotide; R: purine; Y: pyrimidine.
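The boundary rule described above (GU-AG introns are handled by the U2-type spliceosome and AT-AC introns by the U12-type spliceosome) can be expressed as a simple Python classifier. This is a sketch that looks only at the terminal dinucleotides; it ignores the branch point and poly(Y) tract, and the intron sequences are hypothetical. Because the AT-AC name refers to the DNA sequence, the corresponding RNA intron begins with AU.

```python
def classify_intron(intron: str) -> str:
    """Classify an intron by its terminal dinucleotides only.

    Accepts DNA or RNA letters; T is converted to U before comparison.
    """
    rna = intron.upper().replace("T", "U")
    boundaries = (rna[:2], rna[-2:])
    if boundaries == ("GU", "AG"):
        return "U2-type"
    if boundaries == ("AU", "AC"):
        return "U12-type"
    return "unclassified"


print(classify_intron("GUAAGUCCUUUCAG"))   # U2-type
print(classify_intron("ATATCCTTTTCCAC"))   # U12-type
print(classify_intron("CCAAGUCCUUUCAA"))   # unclassified
```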
The first stage in U2-type intron splicing in mRNAs is recognition of the GU consensus element at the 5'-end of the intron by the U1 snRNP. The branch point sequence element is recognized by an additional factor called splicing factor 1, SF1 (also called the branch point binding protein, BBP). This is followed by recognition of the AG consensus element at the 3'-end of the intron and the poly(Y) tract by the U2 snRNP. Binding of the U2-snRNP results in displacement of the SF1. Once the U1 and U2 snRNP complexes are bound to the mRNA, the complex consisting of the U4/U6, and U5 snRNPs (called the tri-snRNP complex) binds to the mRNA. At this point the splicing complex is referred to as the pre-catalytic spliceosome complex. The next step involves release of the U1 and U4 snRNPs. The complex of mRNA, U2, U5, and U6 snRNP is now catalytically active and the intron is removed and the upstream and downstream exons are joined together.
The process of alternative splicing involves multiple interactions between splicing proteins and snRNPs that results in different patterns of exon joining from the same pre-mRNA in different cell types or under different stages of development and differentiation. Alternative splicing allows for the generation of protein isoforms that exhibit different biological properties, that differ in protein-protein interaction, that are localized to different subcellular locations, or that exhibit different catalytic activities and/or abilities. The process of alternative splicing has been identified to occur in the primary transcripts from at least 80% of all human protein coding genes.
The molecular decisions that control which exon(s) is removed and which exon(s) is included in a resultant mRNA involve both cis-acting RNA sequence elements and various protein regulators. The various cis-acting regulatory elements of an mRNA have been divided into four categories: exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs) and intronic splicing silencers (ISSs). The ESEs are usually bound by members of the SR protein family which were described above. Proteins that are known to interact with the ISS and ESS sequences of the mRNA are members of the heterogeneous nuclear RNP (hnRNP) family. There are 14 known hnRNP encoding genes in the human genome. Several additional proteins are necessary for alternative splicing and these proteins (at least 18 characterized members) are expressed in tissue-specific patterns. In addition to cis-acting sequence elements in the control of alternative splicing, secondary structure in the mRNA itself is known to regulate the alternative splicing process.
The overall process of alternative splicing requires that certain proteins are expressed that allow for splice site recognition and selection as well as expression of proteins that inhibit splice site recognition. In most cases of alternative splicing the regulation and specificity of which introns are removed and which exons are joined together is the result of a combinatorial interaction between both cis- and trans-acting activators and inhibitors.
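The outcome of alternative splicing, different patterns of exon joining from the same pre-mRNA, can be illustrated with a minimal Python sketch. The exon sequences are hypothetical and the regulatory logic (enhancers, silencers, SR and hnRNP proteins) is reduced to a plain set of skipped exon numbers, so the code models only the result of the splicing decision, not how the decision is made.

```python
def mature_mrna(exons, skipped=frozenset()):
    """Join exons in order, omitting those whose 1-based number is skipped."""
    return "".join(
        sequence
        for number, sequence in enumerate(exons, start=1)
        if number not in skipped
    )


exons = ["AUGGCC", "GAGUUU", "CCAUAA"]  # hypothetical exons 1-3

print(mature_mrna(exons))               # AUGGCCGAGUUUCCAUAA
print(mature_mrna(exons, skipped={2}))  # AUGGCCCCAUAA
```

Both products derive from the same primary transcript; skipping exon 2 yields a shorter mRNA that would encode a different protein isoform.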
There are several different classes of reactions involved in intron removal. The two most common are the group 1 and group 2 introns. Group 1 introns are found in mRNA, tRNA, and rRNA molecules found in the chloroplasts and mitochondria of lower eukaryotic organisms as well as being found in bacterial RNA molecules. Group 2 introns are found in mRNA, tRNA, and rRNA molecules found in the chloroplasts and mitochondria of fungi, plants, and protists. The characteristic feature of both group 1 and group 2 introns is that they are self-splicing. The removal of these types of introns is catalyzed by the RNA itself via the ribozyme activity inherent in the RNA.
Group 1 introns require an external guanosine nucleotide as a cofactor. The 3'–OH of the guanosine nucleotide acts as a nucleophile to attack the 5'–phosphate of the 5' nucleotide of the intron. The resultant 3'–OH at the 3' end of the 5' exon then attacks the 5' nucleotide of the 3' exon releasing the intron and covalently attaching the two exons together. The 3' end of the 5' exon is termed the splice donor site and the 5' end of the 3' exon is termed the splice acceptor site.
Self splicing intron mechanisms. RNA-mediated (ribozyme) self splicing comprises two main categories. Group 1 self splicing utilizes a free GTP residue to initiate the catalysis reactions. Group 2 self splicing introns utilize an adenine residue within the intron sequence itself to initiate the catalysis reactions. During group 2 splicing reactions a lariat structure is formed within the intronic RNA.
Group 2 introns are spliced similarly except that instead of an external nucleophile the 2'–OH of an adenine residue within the intron is the nucleophile. This residue attacks the 3' nucleotide of the 5' exon forming an internal loop called a lariat structure. The 3' end of the 5' exon then attacks the 5' end of the 3' exon as in group 1 splicing, releasing the intron and covalently attaching the two exons together.
RNA editing was a term first used to describe an unusual form of post-transcriptional processing involving the insertion of uridine (U) residues into a mitochondrial mRNA found in Trypanosoma brucei. This particular form of editing was then found to occur in many eukaryotic mRNAs. The process of RNA editing is now known to encompass a wide variety of mechanistically unrelated processes that change the nucleotide sequence of an RNA species relative to that directed by the encoding DNA. Currently RNA editing systems are divided into two general classes: substitution and insertion/deletion. In the first class, the coding sequences of a mature RNA and its gene are co-linear as they contain the same number of nucleotides but differ in nucleotide sequence where editing has occurred. In the second class, the nucleotide sequence of the mature RNA product is not co-linear with that of its DNA coding sequence since the final RNA product contains extra nucleotides relative to the encoding gene. All of the major types of cellular RNA (mRNA, rRNA, and tRNA) have been shown to be subject to editing in different organisms.
The term "RNA editing" is not used to refer to RNA modifications such as 5'-capping, splicing, and 3'-polyadenylation, nor to the formation of modified nucleosides in RNA (as is typical in tRNAs). However, it is important to keep sight of the fact that the distinctions between “RNA editing” and “RNA modification” can be less than obvious. To illustrate this fact, consider that there are instances of RNA editing involving deamination of A residues forming I (inosine) residues (see next section). If this editing occurs in the coding region of an mRNA, the edited site (I) is recognized as G during translation. However, it is also known that A residues in the wobble position of tRNA anticodons (the 5'-nucleotide) undergo deamination (by an evolutionarily related enzyme) to I, which similarly results in a change in the anticodon pairing properties. Thus, under these circumstances editing and modification can result in the same effects at the level of the resultant protein.
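The consequence of an edited inosine being read as G during translation can be shown with a short Python sketch. The codon table is the standard genetic code built in the conventional U, C, A, G order; the function name and the example codons are illustrative only and do not refer to a specific transcript.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon: str) -> str:
    """Translate one codon, reading inosine (I) as guanosine (G)."""
    return CODON_TABLE[codon.upper().replace("I", "G")]


print(translate_codon("CAG"))  # Q (glutamine), unedited codon
print(translate_codon("CIG"))  # R (arginine), after A-to-I editing
print(translate_codon("AAA"))  # K (lysine), unedited codon
print(translate_codon("IAA"))  # E (glutamate), after A-to-I editing
```

A single deamination within a codon is therefore sufficient to change the encoded amino acid.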
RNA editing systems have been identified that result in changes in A residues to I residues, referred to as A-to-I editing systems, or changes in C residues to U residues, referred to as C-to-U editing systems. The enzymes that catalyze the A-to-I edits are members of a family of adenosine deaminases that act on RNA (ADARs). This distinguishes these enzymes from the adenosine deaminase involved in the catabolism and salvage of purine nucleotides. The enzymes that catalyze C-to-U edits are called cytosine deaminases that act on RNA (CDARs). A sequence comparative analysis of ADAR and CDAR sequences demonstrated that they all belong to a superfamily of RNA-dependent deaminases that also includes tRNA-speciﬁc deaminases (ADATs). A common feature of ADARs, CDARs, and ADATs is the presence in the deaminase domain of conserved residues that are essential for catalysis. All three types of deaminases likely arose from an ancestral cytidine deaminase via the acquisition of RNA-binding domains.
The clinical significances of the editing of human RNAs is demonstrated by the observations that mutations in the ADAR1 gene are associated with rare autosomal skin pigmentation disorder (dyschromatosis symmetrica hereditaria, DSH) and with Aicardi-Goutières syndrome (AGS), an early-onset encephalopathy that often results in severe and permanent neurological damage. Defective RNA editing is also associated with a number of neurological diseases including suicidal depression, epilepsy, schizophrenia, and amyotrophic lateral sclerosis (Lou Gherig disease).
The process of A-to-I editing occurs on nuclear transcripts and is catalyzed by a family of enzymes referred to as ADARs. ADAR activity was initially characterized as a double-stranded RNA (dsRNA) unwinding activity. These observations emphasize that ADARs are dsRNA-binding proteins and that their catalytic activity is directed toward duplex regions in RNA. Although the most biologically significant function of ADARs is site-specific deamination in mRNA, RNA duplex regions in several types of non-coding RNAs, including microRNAs (miRNAs) and small interfering RNAs (siRNAs), as well as some viral RNAs, are also substrates for ADARs.
Three mammalian ADAR genes give rise to four known isoforms: ADAR1p150, ADAR1p110, ADAR2 and ADAR3. Alternative promoter usage within the ADAR1 gene generates the full-length (ADAR1p150) isoform and an N-terminally truncated (ADAR1p110) isoform. Both ADAR1 isoforms contain three dsRNA-binding domains and the deaminase domain. The ADAR1 variants and ADAR2 are expressed in many tissues, whereas the ADAR3 protein is only expressed in the brain. Although ADAR3 is presumed to be catalytically inactive, it may compete with ADAR1 and ADAR2 for RNA substrates, thereby altering the overall profile of edited RNAs.
The vast majority of A residues that are targets for editing are localized near splice junctions in the pre-mRNA. The formation of dsRNA substrates for ADARs in intronic sequences could, therefore, obscure splice sites from the splicing machinery, resulting in alternative splicing events. In addition, the editing of select A residues could create or eliminate splice sites, which could also result in alternative splicing events.
A-to-I editing occurs in more RNAs than does C-to-U editing. By far, most of the mammalian mRNAs found to undergo A-to-I editing are expressed in the nervous system. Physiologically significant examples are transcripts of the ionotropic glutamate receptor (GluR) family and the serotonin receptor family. In both cases the deamination of exonic A residues leads to single amino acid changes in the resulting proteins.
Editing of glutamate receptor mRNA occurs specifically in the mRNA encoding the GluA2 (GluR2) subunit of the 2-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptors. Editing of the GluA2 mRNA occurs at two non-synonymous sites termed the Q/R and R/G sites. These sites are so-called because the editing results in the change of a glutamine residue for an arginine residue in the first site and a change of arginine for glycine in the second. The Q/R site is encoded by exon 11 and resides within the second transmembrane domain (TMII) of the protein. The R/G site is located just one nucleotide from the boundary between exon 13 and the downstream intron. When this site is edited, splicing favors inclusion of exon 15 over that of exon 14. With respect to the Q/R site, editing has a profound effect on the calcium permeability of the resulting AMPA receptor. Calcium permeability of all AMPA receptor isoforms is controlled by the GluA2 subunit. In unedited GluA2 proteins the presence of the Q residue allows Ca2+ permeability whereas the edited amino acid (R) does not. Almost all of the GluA2 present in the human brain is edited. The importance of GluA2 mRNA editing can be demonstrated by the phenotype of ADAR2 knockout mice. These mice have significantly reduced editing of the Q/R site which causes them to be highly seizure-prone, and they die within 3 weeks of birth.
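The recoding logic at the Q/R and R/G sites can be sketched in a few lines of Python. This is a minimal illustration, not a model of the enzyme: it assumes the commonly reported codons at these sites (CAG for glutamine at the Q/R site and AGA for arginine at the R/G site), which are not stated in the text above, and the function names are hypothetical. The key step is that inosine is read as guanosine during translation.

```python
# Minimal codon table: only the codons needed for this illustration.
CODON_TABLE = {
    "CAG": "Gln (Q)",
    "CGG": "Arg (R)",
    "AGA": "Arg (R)",
    "GGA": "Gly (G)",
}


def edit_a_to_i(codon: str, position: int) -> str:
    """Return the codon with the A at `position` (0-based) deaminated to I."""
    if codon[position] != "A":
        raise ValueError("A-to-I editing requires an adenosine at the edited position")
    return codon[:position] + "I" + codon[position + 1:]


def read_by_ribosome(codon: str) -> str:
    """Inosine pairs like guanosine, so I is read as G during translation."""
    return codon.replace("I", "G")


def recode(codon: str, position: int) -> tuple:
    """Return (unedited amino acid, edited amino acid) for one editing event."""
    edited = edit_a_to_i(codon, position)
    return CODON_TABLE[codon], CODON_TABLE[read_by_ribosome(edited)]


if __name__ == "__main__":
    # Assumed codons: CAG at the Q/R site, AGA at the R/G site.
    print("Q/R site:", recode("CAG", 1))
    print("R/G site:", recode("AGA", 0))
```

In both cases a single deamination changes one base of the codon, which is enough to change the encoded amino acid and, at the Q/R site, the calcium permeability of the receptor.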
Editing of the serotonin receptor mRNA occurs specifically in the 5-HT2C subtype within the cells of the prefrontal cortex. This mRNA, encoded by the HTR2C gene, contains five sites that are A-to-I edited. These sites are referred to as A, B, C' (E), C, and D. The most commonly detected edited 5-HT2C mRNAs are edited at the AC'C, ABD, and ABCD combination sites. There is a strong correlation between severe psychiatric behaviors and 5-HT2C mRNA editing combinations. In victims of suicide who had been diagnosed with a history of major depression, the level of C' editing is much higher, and the level of D editing is significantly lower, than in unaffected individuals. Interestingly, when mice are treated with the antidepressant fluoxetine, the pattern of C, C', and D editing in the 5-HT2C mRNA is the exact opposite of that observed in victims of suicide.
A-to-I editing also occurs in the non-coding region of the ADAR2 pre-mRNAs. The consequence of ADAR2 editing its own mRNA is the generation of an alternative splice acceptor site in intron 1, resulting in an alternative splicing event that creates a nonfunctional ADAR2 protein.
The A-to-I editing process also influences the biogenesis and target recognition of siRNAs involved in the RNAi pathway. siRNA biogenesis requires processing of long dsRNA precursors into 21- to 23-nucleotide RNA duplexes which ultimately initiate transcriptional and post-transcriptional sequence-specific silencing. For details on the processing of siRNAs (and miRNAs) go to the Control of Gene Expression page. Because the RNA editing and RNAi pathways both involve dsRNAs, editing could potentially antagonize the RNAi pathway. A-to-I edits could alter the required dsRNA structures of siRNAs (and miRNAs), leading to reduced processing and thus fewer functional siRNAs. In addition, editing of siRNAs and miRNAs could change their targeting to sequence-specific silencing sites in target mRNAs.
The first reported instance of C-to-U editing was within the mRNA encoding apolipoprotein B (apoB). Editing of the apoB mRNA changes a CAA codon to a UAA translational stop codon leading to premature termination of protein synthesis. When the apoB gene is transcribed within hepatocytes the mRNA is not edited and a full-length apoB protein is generated called apoB-100. This apolipoprotein (apoB-100) is found exclusively with the VLDL particles produced and secreted by the liver. Within intestinal enterocytes, the apoB mRNA is edited resulting in the generation of a smaller protein called apoB-48. This apolipoprotein (apoB-48) is found exclusively associated with chylomicrons, the lipoprotein particles produced by the intestines and released to the lymphatic system. C-to-U editing of the apoB mRNA requires a single-stranded RNA template with well defined characteristics in the immediate vicinity of the edited base, as well as protein cofactors that assemble into a functional complex referred to as a holoenzyme or editosome. This functional complex includes a minimal core composed of apolipoprotein B mRNA editing enzyme, catalytic polypeptide 1 (APOBEC-1; the catalytic deaminase) and a competence factor, APOBEC-1 complementation factor (A1CF). The function of A1CF is to act as an adaptor protein by binding both the APOBEC-1 enzyme and the mRNA substrate.
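The effect of the CAA-to-UAA edit on translation can be sketched with a toy transcript. The sequence below is hypothetical and far shorter than the real apoB mRNA; it only shows how a single C-to-U change in a glutamine codon creates a premature stop codon and a truncated product. The function names are illustrative.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Minimal codon table covering only the codons in the toy transcript below.
CODON_TABLE = {"AUG": "M", "GCU": "A", "CAA": "Q", "GGA": "G", "UUU": "F"}

# Hypothetical reading frame: AUG GCU CAA GGA UUU UAA
TOY_MRNA = "AUGGCUCAAGGAUUUUAA"


def translate(mrna: str) -> str:
    """Translate from the first base until a stop codon or the end of the RNA."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        peptide.append(CODON_TABLE[codon])
    return "".join(peptide)


def edit_c_to_u(mrna: str, index: int) -> str:
    """Deaminate the cytidine at `index` (0-based) to uridine."""
    if mrna[index] != "C":
        raise ValueError("C-to-U editing requires a cytidine at the edited position")
    return mrna[:index] + "U" + mrna[index + 1:]


if __name__ == "__main__":
    edited = edit_c_to_u(TOY_MRNA, 6)  # first base of the CAA codon
    print("unedited (liver-like):   ", translate(TOY_MRNA))
    print("edited (intestine-like): ", translate(edited))
```

The unedited transcript is read through to its normal stop codon, analogous to apoB-100, whereas the edited transcript terminates at the new UAA, analogous to apoB-48.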
Another example of C-to-U mRNA editing involves site-specific deamination of a CGA codon to a UGA codon in the neurofibromatosis type 1 (NF1) mRNA. The NF1 mRNA encodes a protein identified as neurofibromin 1. The editing of the NF1 mRNA introduces a translational stop codon at position 3916 that results in a truncation of the neurofibromin 1 protein in a critical domain involved in GTPase activation. Although a truncated NF1 protein has not yet been demonstrated, editing of the NF1 mRNA has been demonstrated in peripheral nerve sheath tumors from patients with type 1 neurofibromatosis.
A third C-to-U edited mRNA encodes eukaryotic initiation factor 4, gamma 2, eIF-4G2 (also identified as p97, DAP5, and NAT1), which is a translational repressor that may be involved in repression of global translation. The editing of the eIF-4G2 mRNA was identified in studies that demonstrated the oncogenic potential of APOBEC-1 when it was overexpressed in experimental animals. In these studies it was found that the eIF-4G2 mRNA underwent C-to-U editing at multiple sites, creating stop codons that in turn reduced the abundance of the eIF-4G2 protein. The eIF-4G2 protein has a crucial role in early embryogenesis since eIF-4G2-negative embryos die during gastrulation. Although the precise mechanism through which elevated APOBEC-1 activity leads to dysplasia and cancer is not yet defined, host adaptations have been shown to modulate the expression of APOBEC-1 in sporadic human colorectal cancers.
Editing of the apoB mRNA: When the apoB gene is expressed in the liver the resulting mRNA is not edited and is translated into the full-length apoB-100 protein present in VLDL. When the gene is transcribed in the intestines, editing of the mRNA converts a CAA codon to a translational stop codon (UAA) resulting in the translation of a truncated apoB-48 protein that is present in chylomicrons.
The APOBEC-1 deaminase is encoded by the APOBEC1 gene located on chromosome 12p13.1 and is composed of 6 exons that generate three alternatively spliced mRNAs that encode two distinct protein isoforms. The APOBEC1 gene is a member of a large cytidine deaminase gene family but is the only member of the family that encodes an mRNA-specific editing enzyme. All the other members of the family function primarily to edit cytidine residues in different types of DNA molecules. The other members of the family include APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and activation-induced cytidine deaminase (AICDA). Although the APOBEC3A encoded protein functions principally to deaminate cytidines of single-stranded DNA and to inhibit viruses and retrotransposons, it is also known to deaminate cytidines in mRNAs in monocytes and macrophages in response to hypoxia. The enzymes encoded by the APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H genes function as anti-retroviral enzymes and have been shown to restrict HIV infection. Each of these four enzymes is assembled into infectious virion particles where it deaminates cytidine residues in the viral cDNA, resulting in reduced progression of reverse transcription. The resulting uracil residues induce G-to-A hypermutations in the HIV-1 genome since A base pairs with U during DNA replication.
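Why C-to-U deamination of the viral cDNA appears as G-to-A changes in the viral genome can be followed with a short sketch. It assumes a simplified scheme in which the deaminated strand is the minus-strand cDNA and the plus strand is then copied from it; the sequence and function names are hypothetical and serve only to illustrate the base-pairing logic.

```python
# Pairing rules used when copying a DNA template; U in the template
# directs incorporation of A, exactly as T would.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def reverse_complement(strand: str) -> str:
    """Return the strand synthesized from `strand` as template (5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def deaminate(minus_strand: str, positions: set) -> str:
    """Convert C to U at the given 0-based positions of the minus strand."""
    return "".join(
        "U" if (i in positions and base == "C") else base
        for i, base in enumerate(minus_strand)
    )


def hypermutate(plus_strand: str, positions: set) -> str:
    """Plus strand recovered after deamination of its minus-strand cDNA copy."""
    minus = reverse_complement(plus_strand)     # minus-strand cDNA
    minus_edited = deaminate(minus, positions)  # C-to-U deamination
    return reverse_complement(minus_edited)     # plus strand copied from edited cDNA


if __name__ == "__main__":
    original = "ATGGCA"  # hypothetical plus-strand sequence
    print(original, "->", hypermutate(original, {2, 3}))
```

Each C converted to U in the minus strand pairs with A instead of G when the plus strand is synthesized, so every G in the original plus strand that sat opposite a deaminated C is replaced by A.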
Ribozymes represent a special class of RNA molecules that possess catalytic activity. Ribozymes are composed of well-defined tertiary structures that impart the RNAs with their unique biological activity as nucleic acid enzymes. Ribozymes have been identified in a wide range of genomes from viruses to mammals. To date, eight naturally occurring classes of ribozyme have been defined, all of which catalyze cleavage or ligation of the RNA backbone by trans-esterification or hydrolysis of phosphate groups. The catalytic properties of ribozymes are exclusively due to the capacity of these RNA molecules to assume particular structures. RNA molecules have the capacity to fold into several distinct structures, which can enable a single RNA to perform more than one function. RNA-mediated catalysis was first demonstrated in the process of intron splicing (group I and II introns). Subsequently, numerous RNAs harboring catalytic activity have been described. Ribozymes have been shown to be involved in tRNA processing (RNase P), phosphoryl transfer reactions catalyzing the cleavage or ligation of the RNA phosphodiester backbone, in protein synthesis (peptidyltransferase) and in the regulation of gene expression. Despite the similarity of the chemistry of the reactions catalyzed by ribozymes, each molecule possesses a completely unique sequence, tertiary structure, and a specific catalytic mechanism, which reflects the diversity of catalytic strategies of ribozymes. Peptidyltransferase activity of the ribosome represents a distinct ribozyme structure and activity.
The enzymatic activity of ribozymes depends on the capacity of the RNA to fold into specific structures that impart catalytic specificity. Because a single RNA molecule can fold into more than one structure, a single RNA polymer could have more than one function. This means that a single sequence (the genotype) could manifest multiple phenotypes. That this is indeed the case has been demonstrated for short (25-34 nucleotides) RNA sequences which exhibit the ability to bind two different ligands such as GMP and L-arginine. In addition, another experiment, designed to select for a ribozyme that catalyzed the ligation of two RNA substrates, discovered that the RNA molecule could also undergo a separate self-cleavage reaction. These two distinct enzymatic reactions, ligation and cleavage, were imparted by two distinct sites of the RNA molecule. Multiple bifunctional ribozymes have been identified.
Group I introns are considerably larger and more structurally complex than any of the self-cleaving RNAs. This class of ribozyme is found in precursor mRNA, tRNA, and rRNA transcripts from a variety of organisms. The catalytic reaction carried out by group I intron ribozymes occurs in two steps. The reactions result in the ligation of flanking 5' and 3' exons to yield the mature RNA. Several hundred examples of this class of ribozyme have been identified. All of them share a common secondary structure and most likely a similar reaction mechanism. The Tetrahymena thermophila rRNA intron was the first group I self-splicing intron discovered (see section above). The ribozyme derived from this intron is 421 nucleotides long and is composed of a conserved catalytic core of roughly 200 nucleotides. This ribozyme catalyzes the first step of intron self-splicing using an oligonucleotide to mimic the 5'-exon. The 3' oxygen of an exogenous guanosine serves as the nucleophile for this reaction (see Figure above).
The most recently discovered functional class of ribozymes includes those that are involved in the regulation of protein synthesis. Two of these newly identified ribozymes are the mammalian cytoplasmic polyadenylation element-binding protein 3 (CPEB3) ribozyme and a variant hammerhead ribozyme embedded in mammalian mRNAs. Hammerhead ribozymes are so called because of the secondary structure evident in the active ribozyme. The hammerhead, hepatitis delta virus (HDV), hairpin, Neurospora Varkud satellite (VS), and glmS ribozymes are a class of small RNAs (50–150 nucleotides) that catalyze site-specific self-cleavage and were originally characterized in viral, virusoid, bacterial, or satellite RNA genomes.
The glmS ribozyme is found in Gram-positive bacteria. It is considered a metabolite-responsive ribozyme since it was originally discovered by its ability to catalyze site-specific RNA cleavage in the presence of glucosamine-6-phosphate (GlcN6P). The glmS ribozyme was originally identified in the 5'-untranslated region of the glmS gene, which is involved in the synthesis of GlcN6P. The glmS ribozyme is also considered a riboswitch since it is involved in the regulation of gene expression in response to changing concentrations of a metabolite.
The CPEB3 ribozyme is a self-cleaving non-coding RNA located in the second intron of the CPEB3 gene, which belongs to a family of genes regulating the reactions of mRNA polyadenylation. A 72-nucleotide core of the CPEB3 ribozyme sequence is sufficient to carry out self-cleavage. The cleavage activity of the CPEB3 ribozyme is slow which, under normal conditions, allows normal splicing of the CPEB3 pre-mRNA to occur. A trans-acting factor is known to interact with the ribozyme cleavage site, thereby regulating the rate of ribozyme self-cleavage. When self-cleavage is increased, the level of truncated CPEB3 pre-mRNAs increases, resulting in degradation of the cleaved RNA fragments. This process may serve as a switch to turn off the synthesis of the CPEB3 protein.
The presence of introns in eukaryotic genes would appear to be an extreme waste of cellular energy when considering the number of nucleotides incorporated into the primary transcript only to be removed later, as well as the energy utilized in the synthesis of the splicing machinery. However, the presence of introns can protect the genetic makeup of an organism from damage by outside influences such as chemicals or radiation. An additional important function of introns is to allow alternative splicing to occur, thereby increasing the genetic diversity of the genome without increasing the overall number of genes. By altering the pattern of exons that are spliced together from a single primary transcript, different proteins can arise from a single gene. Alternative splicing can occur either at specific developmental stages or in different cell types. As indicated earlier, the process of alternative splicing has been identified to occur in the primary transcripts from at least 80% of all human protein-coding genes. One of the first clinically relevant examples of alternative splicing in humans involved the calcitonin gene (CALCA). Depending upon the site of transcription, the calcitonin gene yields an mRNA that encodes calcitonin (thyroid) or calcitonin gene-related peptide (CGRP, brain). Even more complex is the alternative splicing that occurs in the α-tropomyosin transcript. At least eight different alternatively spliced α-tropomyosin mRNAs have been identified.
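How one primary transcript can yield two mRNAs can be sketched by treating splicing as the selection of an ordered subset of exons. The exon sequences below are hypothetical placeholders, and the inclusion patterns (exons 1 to 4 for calcitonin; exons 1, 2, 3, 5, and 6 for CGRP) are the commonly described layout of the calcitonin gene rather than something stated in the text above, so treat them as assumptions.

```python
# Hypothetical placeholder sequences for six exons of one primary transcript.
EXONS = {1: "GGA", 2: "AUG", 3: "GCU", 4: "UGC", 5: "GCC", 6: "UAA"}

# Assumed tissue-specific exon inclusion patterns.
SPLICE_PATTERNS = {
    "thyroid (calcitonin)": (1, 2, 3, 4),
    "brain (CGRP)": (1, 2, 3, 5, 6),
}


def splice(exons: dict, pattern: tuple) -> str:
    """Join the selected exons in order; everything else is removed."""
    if list(pattern) != sorted(pattern):
        raise ValueError("exons must be joined in their genomic order")
    return "".join(exons[number] for number in pattern)


if __name__ == "__main__":
    for tissue, pattern in SPLICE_PATTERNS.items():
        print(f"{tissue}: exons {pattern} -> {splice(EXONS, pattern)}")
```

The two mRNAs share their first three exons and differ only downstream, which is how a single gene can encode two distinct peptides.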
Abnormalities in the splicing process can lead to various disease states. Numerous diseases have been identified as resulting from alterations in alternative splicing, and the causes of these alterations are also numerous. There are diseases that are the result of mutations in splicing regulatory sequences in exons (e.g. the spinal muscular atrophies, SMA) resulting in inappropriate exon skipping. Alterations in alternative splicing can also lead to changes in protein isoform ratios that ultimately result in manifestation of disease (e.g. the diseases of the brain that result from abnormal accumulation of the tau protein). Mutations in sequences within introns can lead to the activation of cryptic splice sites resulting in abnormally spliced exons. Numerous diseases, such as various β-thalassemias, are the result of mutations in either the 5'- or the 3'-splice sites. Diseases are also caused by mutations in genes that encode proteins of the spliceosomal machinery. Numerous human cancers are caused by mutations that alter splice site selection, particularly in tumor suppressor genes, or by mutations in genes encoding protein factors of the splicing machinery. Patients suffering from a number of different connective tissue diseases exhibit humoral auto-antibodies that recognize small nuclear RNA-protein complexes (snRNPs). Patients suffering from systemic lupus erythematosus (SLE) have auto-antibodies (anti-nuclear antibodies) that recognize the U1 RNA of the spliceosome.