Regulation of Gene Expression


© 1996–2017, LLC | info @


The controls that act on gene expression (i.e., the ability of a gene to produce a biologically active protein) are much more complex in eukaryotes than in prokaryotes. A major difference is the presence in eukaryotes of a nuclear membrane, which prevents the simultaneous transcription and translation that occurs in prokaryotes. Whereas in prokaryotes control of transcriptional initiation is the major point of regulation, in eukaryotes gene expression is regulated at many different points.

Gene Control in Prokaryotes

In bacteria, genes are clustered into operons: gene clusters that encode the proteins necessary to perform a coordinated function, such as biosynthesis of a given amino acid. RNA transcribed from prokaryotic operons is polycistronic, a term meaning that multiple proteins are encoded in a single transcript.

In bacteria, the rate of transcriptional initiation is the predominant point of control of gene expression. For the majority of prokaryotic genes, initiation is controlled by two DNA sequence elements located approximately 35 and 10 bases upstream of the transcriptional start site; these elements are therefore identified as the -35 and -10 positions. These two sequence elements are termed promoter sequences, because they promote recognition of transcriptional start sites by RNA polymerase. The consensus sequence for the -35 position is TTGACA, and for the -10 position, TATAAT. (The -10 position is also known as the Pribnow box.) These promoter sequences are recognized and contacted by RNA polymerase.
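As an illustration of what "consensus sequence" means in practice, the short Python sketch below scans a DNA string for a -35 hexamer followed by a -10 hexamer and counts how many bases differ from TTGACA and TATAAT. The spacer range of 15 to 19 nucleotides between the two hexamers and the limit of two total mismatches are assumptions of this sketch, not values given in the text, and the function names are hypothetical. Real promoter prediction uses position-specific scoring rather than simple mismatch counting.

```python
# Minimal sketch: consensus-match scan for -35 / -10 promoter elements.
# Assumptions (not from the text): 15-19 nt spacer, <= 2 total mismatches.

MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq: str, spacers=range(15, 20), max_mm: int = 2):
    """Return (start of -35 hexamer, spacer length, total mismatches) per hit."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(MINUS35) + 1):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mm:
            continue  # -35 element alone already exceeds the mismatch budget
        for spacer in spacers:
            j = i + 6 + spacer  # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                break  # longer spacers would run off the end of the sequence
            total = mm35 + mismatches(seq[j:j + 6], MINUS10)
            if total <= max_mm:
                hits.append((i, spacer, total))
    return hits


if __name__ == "__main__":
    test_seq = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"
    print(find_promoters(test_seq))
```

For example, a synthetic sequence carrying both consensus hexamers separated by 17 nucleotides, `find_promoters("GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG")`, returns `[(2, 17, 0)]`.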

The activity of RNA polymerase at a given promoter is in turn regulated by interaction with accessory proteins, which affect its ability to recognize start sites. These regulatory proteins can act both positively (activators) and negatively (repressors). The accessibility of promoter regions of prokaryotic DNA is in many cases regulated by the interaction of proteins with sequences termed operators. The operator region is adjacent to the promoter elements in most operons and in most cases the sequences of the operator bind a repressor protein. However, there are several operons in E. coli that contain overlapping sequence elements, one that binds a repressor and one that binds an activator.

As indicated above, prokaryotic genes that encode the proteins necessary to perform a coordinated function are clustered into operons. Two major modes of transcriptional regulation function in bacteria (E. coli) to control the expression of operons. Both mechanisms involve repressor proteins. One mode of regulation acts on operons whose gene products are needed to obtain energy from particular carbon sources; these are catabolite-regulated operons. The other mode regulates operons whose gene products are needed for the synthesis of small biomolecules such as amino acids. Expression from the latter class of operons is attenuated by sequences within the transcribed RNA.

A classic example of a catabolite-regulated operon is the lac operon, responsible for obtaining energy from β-galactosides such as lactose. A classic example of an attenuated operon is the trp operon, responsible for the biosynthesis of tryptophan.


The lac Operon

The lac operon (see diagram below) consists of one regulatory gene (the i gene) and three structural genes (z, y, and a). The i gene codes for the repressor of the lac operon. The z gene codes for β-galactosidase (β-gal), which is primarily responsible for the hydrolysis of the disaccharide lactose into its monomeric units, galactose and glucose. The y gene codes for permease, which increases permeability of the cell to β-galactosides. The a gene encodes a transacetylase.

During normal growth on a glucose-based medium, the lac repressor is bound to the operator region of the lac operon, preventing transcription. However, in the presence of an inducer of the lac operon, the repressor protein binds the inducer and is rendered incapable of interacting with the operator region of the operon. RNA polymerase is thus able to bind at the promoter region, and transcription of the operon ensues.

The lac operon is repressed, even in the presence of lactose, if glucose is also present, and this repression is maintained until the glucose supply is exhausted. Repression of the lac operon under these conditions is termed catabolite repression and results from the low levels of cAMP that accompany an adequate glucose supply. Accordingly, the repression is relieved, even in the presence of glucose, if excess cAMP is added. As the level of glucose in the medium falls, the level of cAMP increases. Simultaneously, there is an increase in inducer binding to the lac repressor. The net result is an increase in transcription from the operon.

The ability of cAMP to activate expression from the lac operon results from an interaction of cAMP with a protein termed CRP (for cAMP receptor protein), also called CAP (for catabolite activator protein). The cAMP-CRP complex binds to a region of the lac operon just upstream of the region bound by RNA polymerase, a region that somewhat overlaps the repressor binding site of the operator region. Binding of the cAMP-CRP complex to the lac operon stimulates RNA polymerase activity 20- to 50-fold.
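The combined effects of the repressor and of cAMP-CRP can be summarized as a small truth table. The hypothetical function below is a qualitative sketch that treats lactose and glucose as simply present or absent and ignores basal (leaky) expression; "low" stands for the catabolite-repressed state and "high" for full cAMP-CRP-stimulated expression.

```python
# Hypothetical Boolean sketch of lac operon control; qualitative labels only.

def lac_expression(lactose_present: bool, glucose_present: bool,
                   excess_camp_added: bool = False) -> str:
    """Return "off", "low", or "high" for lac operon transcription."""
    # Without inducer (allolactose, formed from lactose) the repressor
    # stays bound to the operator and blocks transcription.
    repressor_on_operator = not lactose_present
    # cAMP is high when glucose is scarce, or when cAMP is supplied
    # experimentally; high cAMP lets cAMP-CRP bind and stimulate RNA polymerase.
    camp_crp_bound = (not glucose_present) or excess_camp_added
    if repressor_on_operator:
        return "off"
    return "high" if camp_crp_bound else "low"


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            print(f"lactose={lactose}, glucose={glucose}: "
                  f"{lac_expression(lactose, glucose)}")
```

The optional `excess_camp_added` flag reproduces the observation that adding cAMP relieves catabolite repression even when glucose is present.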

The lac operon of E. coli

Regulation of the lac operon in E. coli. The repressor of the operon is synthesized from the i gene. The repressor protein binds to the operator region of the operon and prevents RNA polymerase from transcribing the operon. In the presence of an inducer (such as the natural inducer, allolactose) the repressor is inactivated by interaction with the inducer. This allows RNA polymerase access to the operon and transcription proceeds. The resultant mRNA encodes the β-galactosidase, permease, and transacetylase activities necessary for utilization of β-galactosides (such as lactose) as an energy source. The lac operon is additionally regulated through binding of the cAMP receptor protein, CRP (also termed the catabolite activator protein, CAP), to sequences near the promoter domain of the operon. The result is an up to 50-fold enhancement of polymerase activity.


The trp Operon

The trp operon (see diagram below) encodes the genes for the synthesis of tryptophan. This cluster of genes, like the lac operon, is regulated by a repressor that binds to the operator sequences. The affinity of the trp repressor for the operator region is enhanced when it binds tryptophan; in this capacity, tryptophan is known as a corepressor. Because the activity of the trp repressor depends on tryptophan, the rate of expression of the trp operon is graded in response to the level of tryptophan in the cell.
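To make "graded" concrete, here is a minimal sketch under a deliberately simple one-site binding assumption that is not taken from this text: the fraction of repressor carrying tryptophan is [Trp]/(Kd + [Trp]), and relative expression is taken as one minus that fraction. The dissociation constant `kd` is a hypothetical parameter in arbitrary units, the function name is hypothetical, and attenuation is ignored.

```python
# Hypothetical one-site model of graded repression by the trp corepressor.

def relative_trp_expression(trp: float, kd: float = 1.0) -> float:
    """Relative trp operon expression (1.0 = fully derepressed).

    The fraction of trp repressor carrying tryptophan is trp / (kd + trp);
    that active fraction is assumed to silence the operon proportionally.
    """
    if trp < 0 or kd <= 0:
        raise ValueError("trp must be >= 0 and kd must be > 0")
    active_repressor_fraction = trp / (kd + trp)
    return 1.0 - active_repressor_fraction


if __name__ == "__main__":
    for trp in (0.0, 0.5, 1.0, 3.0, 10.0):
        print(f"[Trp] = {trp:5.1f} x Kd -> relative expression "
              f"{relative_trp_expression(trp):.2f}")
```

Under this assumption expression falls smoothly rather than switching off: it is halved when [Trp] equals Kd and drops to one quarter at three times Kd.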

Expression of the trp operon is also regulated by attenuation. The attenuator region, which is composed of sequences found within the transcribed RNA, controls transcription from the operon after RNA polymerase has initiated synthesis. The attenuator sequences are found near the 5' end of the RNA, in a region termed the leader region. The leader sequences are located upstream of the coding region for the first gene of the operon (the trpE gene). The leader region contains codons for a small leader polypeptide that contains tandem tryptophan codons. This region of the RNA is also capable of forming several different stable stem-loop structures.

Depending on the level of tryptophan in the cell, and hence the level of charged trp-tRNAs, the position of ribosomes translating the leader polypeptide and the rate at which they translate allow different stem-loops to form. If tryptophan is abundant, the ribosome covers regions 1 and 2 of the leader RNA, which prevents region 2 from pairing with region 3 and thereby favors formation of stem-loop 3-4. Stem-loop 3-4 lies just upstream of a uracil-rich region and acts as a transcriptional terminator, as described in the RNA Synthesis page. Consequently, RNA polymerase is dislodged from the template.

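The operons coding for genes necessary for the synthesis of a number of other amino acids are also regulated by this attenuation mechanism. This type of transcriptional regulation is not feasible in eukaryotic cells, however, because it depends on translation of the leader RNA while that RNA is still being transcribed, and in eukaryotes the nuclear membrane separates transcription from translation.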

The trp operon of E. coli

Regulation of the trp operon in E. coli. The trp operon is controlled both by a repressor protein binding to the operator region and by translation-induced transcriptional attenuation. The trp repressor binds the operator region of the trp operon only when bound to tryptophan, which makes tryptophan a co-repressor of the operon. The trpL gene encodes a non-functional leader peptide that contains several adjacent trp codons. The structural genes of the operon responsible for tryptophan biosynthesis are trpE, D, C, B, and A. When tryptophan levels are high, some tryptophan binds to the repressor, which then binds to the operator region and inhibits transcription. The mechanism of attenuation of the trp operon is diagrammed below.

Attenuation of the trp operon of E. coli

Attenuation of the trp operon. The attenuation region of the trp operon contains sequences that allow the resulting mRNA to form several different stem-loop structures. These regions are identified as 1 through 4. The stem-loops that determine whether transcription is attenuated are formed between regions 2 and 3 or between regions 3 and 4. When tryptophan levels are high, there are plenty of charged trp-tRNAs available, and ribosomes translating the leader peptide encoded by the trpL gene do not stall at the repeated trp codons. Under these conditions the ribosomes rapidly cover regions 1 and 2 of the mRNA, which allows the stem-loop composed of regions 3 and 4 to form. Stem-loop 3-4 is a transcriptional termination structure, so transcription of the trp operon ceases, i.e., is attenuated. Conversely, when tryptophan levels are low, the level of charged trp-tRNAs will also be low. This causes ribosomes to stall within the leader peptide when they encounter the trp codon repeats. The stalled ribosome sits over region 1 of the mRNA, which allows stem-loop 2-3 to form and prevents the transcriptional termination stem-loop 3-4 from forming. Because the terminator cannot form, the entire operon is transcribed and the tryptophan biosynthetic enzymes are produced.
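The decision described above can be expressed as a toy model in which the ribosome masks some leader regions and the remaining regions pair. The pairing rule (try 1-2, then 2-3, then 3-4, in transcription order, with each region used at most once) is a simplifying assumption of this sketch, and the function names are hypothetical; it reproduces the two cases in the caption but is not a kinetic model of co-transcriptional folding.

```python
from __future__ import annotations

# Toy sketch of trp attenuation; function names are hypothetical.
LEADER_PAIRINGS = [(1, 2), (2, 3), (3, 4)]  # tried in transcription order


def stem_loops(covered_by_ribosome: set[int]) -> list[str]:
    """Return the leader stem-loops that form, given regions masked by a ribosome.

    Simplifying assumption: a pairing forms whenever both of its regions are
    free, checked in transcription order, and each region pairs at most once.
    """
    free = {1, 2, 3, 4} - covered_by_ribosome
    formed = []
    for a, b in LEADER_PAIRINGS:
        if a in free and b in free:
            formed.append(f"{a}-{b}")
            free -= {a, b}
    return formed


def operon_transcribed(charged_trp_trna_plentiful: bool) -> bool:
    """True if RNA polymerase reads past the attenuator into trpE."""
    # Plentiful charged tRNA-Trp: the ribosome reads through and covers 1 and 2.
    # Scarce charged tRNA-Trp: the ribosome stalls at the Trp codons over region 1.
    covered = {1, 2} if charged_trp_trna_plentiful else {1}
    return "3-4" not in stem_loops(covered)  # 3-4 is the terminator


if __name__ == "__main__":
    for plentiful in (True, False):
        covered = {1, 2} if plentiful else {1}
        print(f"charged tRNA-Trp plentiful={plentiful}: "
              f"stem-loops {stem_loops(covered)}, "
              f"operon transcribed={operon_transcribed(plentiful)}")
```

With regions 1 and 2 masked only the 3-4 terminator can form, whereas masking region 1 alone lets 2-3 form first and leaves region 3 unavailable for the terminator.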


Gene Control in Eukaryotes

In eukaryotic cells, the ability to express biologically active proteins comes under regulation at several points:

1. Chromatin Structure: The physical structure of the DNA, as it exists compacted into chromatin, can affect the ability of transcriptional regulatory proteins (termed transcription factors) and RNA polymerases to gain access to specific genes and to activate transcription from them. Histone modifications and CpG methylation most strongly affect the accessibility of chromatin to RNA polymerases and transcription factors.

2. Epigenetic Control: Epigenesis refers to changes in the pattern of gene expression that are not due to changes in the nucleotide sequence of the genome. Literally, "epi" means "on"; thus, epigenetics means "on" the gene as opposed to "by" the gene.

3. Transcriptional Initiation: This is the most important mode for control of eukaryotic gene expression (see below for more details). Specific factors that exert control include the strength of promoter elements within the DNA sequences of a given gene, the presence or absence of enhancer sequences (which enhance the activity of RNA polymerase at a given promoter by binding specific transcription factors), and the interaction between multiple activator proteins and inhibitor proteins.

4. Transcript Processing and Modification: Eukaryotic mRNAs must be capped and polyadenylated, and the introns must be accurately removed (see RNA Synthesis Page). Several genes have been identified that undergo tissue-specific patterns of alternative splicing, which generate biologically different proteins from the same gene.

5. RNA Transport: A fully processed mRNA must leave the nucleus in order to be translated into protein.

6. Transcript Stability: Unlike prokaryotic mRNAs, whose half-lives are typically in the range of 1 to 5 minutes, eukaryotic mRNAs can vary greatly in their stability. Certain unstable transcripts have sequences (predominantly, but not exclusively, in the 3'-untranslated regions) that are signals for rapid degradation.

7. Translational Initiation: Since many mRNAs have multiple methionine codons, the ability of ribosomes to recognize and initiate synthesis from the correct AUG codon can affect the expression of a gene product. Several examples have emerged demonstrating that some eukaryotic proteins initiate at non-AUG codons. This phenomenon has been known to occur in E. coli for quite some time, but only recently has it been observed in eukaryotic mRNAs.

8. Small RNAs and Control of Transcript Levels: Within the past several years a new model of gene regulation has emerged that involves control exerted by small non-coding RNAs. This small RNA-mediated control can be exerted at the level of mRNA translatability, at the level of mRNA stability, or via changes in chromatin structure.

9. Post-Translational Modification: Common modifications include glycosylation, acetylation, fatty acylation, and disulfide bond formation.

10. Protein Transport: In order for proteins to be biologically active following translation and processing, they must be transported to their site of action.

11. Control of Protein Stability: Many proteins are rapidly degraded, whereas others are highly stable. Specific amino acid sequences in some proteins have been shown to bring about rapid degradation.


Chromatin Structure and Control of Gene Expression

DNA Methylation: Formation of m5C (5mC)

Because DNA methylation alters chromatin structure, and thereby transcriptional activity, without changing the nucleotide sequence, it is considered an epigenetic process. The events of DNA methylation and demethylation are covered in much greater detail in the DNA Metabolism page. Analysis of which C residues in DNA are targets for methylation revealed that more than 90% of methyl-C is found in the dinucleotide CpG. This is not to say that all CpG dinucleotides contain a methylated C residue. Examination of eukaryotic genes shows that their promoter regions contain 10-20 times as many CpGs as the rest of the genome. In general, when the methylatable regions of a gene are methylated, the associated gene(s) are transcriptionally silent, and when those regions are under-methylated, the gene(s) are transcriptionally active or can be activated. When cells undergo differentiation, genes that become transcriptionally activated exhibit reduced methylation relative to the level prior to activation, and this under-methylation persists even after transcription ceases.
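The CpG enrichment of promoter regions described above is commonly quantified with the observed/expected CpG ratio, (number of CpG × sequence length) / (number of C × number of G). The sketch below computes it. The thresholds used in `looks_like_cpg_island` (length of at least 200 nucleotides, GC fraction above 0.5, observed/expected ratio above 0.6) follow a widely used convention that is not part of this text, and the function names are hypothetical.

```python
from __future__ import annotations


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" occurrences cannot overlap, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply commonly used CpG-island thresholds (an assumption of this sketch)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    cpg_rich = "CG" * 100    # GC-rich and CpG-rich
    cpg_poor = "CCAGG" * 40  # GC-rich but contains no CpG at all
    for name, s in (("cpg_rich", cpg_rich), ("cpg_poor", cpg_poor)):
        print(name, cpg_stats(s), looks_like_cpg_island(s))
```

The second example shows why GC content alone is not enough: a sequence can be 80% G+C yet contain no CpG dinucleotides.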

When a C residue in a CpG dinucleotide is methylated, the methyl group is attached to the 5-position of the cytosine ring and the modified residue is designated m5C or 5mC. The methylation of DNA is catalyzed by several different DNA methyltransferases (abbreviated DNMT). Humans express three catalytically active DNMT genes, identified as DNMT1, DNMT3a, and DNMT3b. The DNMT1 gene is located on chromosome 19p13.2 and is composed of 41 exons that generate four alternatively spliced mRNAs that encode four distinct protein isoforms. DNMT1 isoform a is the largest isoform and is a 1632 amino acid protein. The DNMT3a gene is located on chromosome 2p23 and is composed of 34 exons that generate six alternatively spliced mRNAs encoding four distinct proteins. The DNMT3b gene is located on chromosome 20q11.2 and is composed of 24 exons that generate six alternatively spliced mRNAs encoding six distinct protein isoforms. Another gene, identified as DNMT3L (for DNMT3-like), has some similarities to the DNA methyltransferases but lacks the catalytic amino acids of the methyltransferases. The DNMT3L protein stimulates the DNA methyltransferase activity of DNMT3a. DNMT3L can also affect transcriptional activity through its association with histone deacetylase 1 (HDAC1). Another gene, originally designated DNMT2 and thought to be involved in DNA methylation, in fact encodes an enzyme that methylates a specific aspartic acid tRNA. The designation for this gene is now TRDMT1.

When cells divide, each daughter DNA duplex contains one strand of parental DNA and one newly replicated strand (the daughter strand). If the parental DNA contains methylated cytosines in CpG dinucleotides, the daughter strand must be methylated in order to maintain the parental pattern of methylation. This "maintenance" methylation is catalyzed by DNMT1, and thus this enzyme is called the maintenance methylase. Of the three DNA methyltransferases, DNMT1 is the most abundant in all cells. Consistent with this primary function, DNMT1 has up to 100-fold higher activity towards hemimethylated DNA than towards unmethylated DNA. The activities of DNMT3a and DNMT3b are roughly equivalent towards unmethylated and hemimethylated DNA. The critical role of DNA methylation in development was demonstrated in mice by inactivating either DNMT3a or DNMT3b: mice lacking DNMT3a die within a few weeks of birth, and mice lacking DNMT3b die during embryonic development.

Process of post-replicative DNA methylation

Process of DNA methylation following DNA replication. Sites of DNA methylation have two fates following the process of DNA replication: they can be maintained or they can be progressively removed. Following replication the parental (template) strands of DNA contain 5mCpG, whereas the reciprocal C residue in the daughter strand is not methylated. If the methylation state of the gene is to be maintained then the maintenance methylase, DNMT1, recognizes the hemimethylated site and incorporates a methyl group into the C residue of the daughter strand CpG dinucleotide.
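The two fates of a methylated site can be compared with a short calculation. The hypothetical function below tracks, at a single CpG site across a population of cells, the fractions of fully methylated, hemimethylated, and unmethylated duplexes. With complete maintenance by DNMT1 the pattern is preserved; with no maintenance, methylation is diluted by half at every division (passive loss). The `dnmt1_efficiency` parameter is an illustrative assumption, and de novo methylation and active demethylation are ignored.

```python
def methylated_fraction(divisions: int, dnmt1_efficiency: float = 1.0) -> float:
    """Fraction of CpG strands methylated at one site after n cell divisions.

    Hypothetical model: start fully methylated; replication adds unmethylated
    daughter strands; DNMT1 then converts a fraction `dnmt1_efficiency` of
    hemimethylated sites back to fully methylated. De novo methylation
    (DNMT3a/3b) and active demethylation are ignored.
    """
    if not 0.0 <= dnmt1_efficiency <= 1.0:
        raise ValueError("dnmt1_efficiency must be between 0 and 1")
    full, hemi, unmeth = 1.0, 0.0, 0.0  # fractions of duplexes at this CpG
    for _ in range(divisions):
        # Replication: full -> 2 hemi; hemi -> 1 hemi + 1 unmeth; unmeth -> 2 unmeth.
        full, hemi, unmeth = 0.0, full + hemi / 2, hemi / 2 + unmeth
        # Maintenance: DNMT1 methylates the daughter strand of some hemi sites.
        full, hemi = dnmt1_efficiency * hemi, (1.0 - dnmt1_efficiency) * hemi
    # Full duplexes carry 2 methylated strands, hemimethylated duplexes carry 1.
    return full + hemi / 2


if __name__ == "__main__":
    for n in range(5):
        print(n, methylated_fraction(n, 1.0), methylated_fraction(n, 0.0))
```

For example, `methylated_fraction(3, 0.0)` gives 0.125 (that is, 0.5³), whereas `methylated_fraction(3, 1.0)` stays at 1.0.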

The correlation between DNA methylation and chromatin structure, as it relates to transcriptional activity, is demonstrated by the existence of several proteins that bind to methylated CpGs, but not to unmethylated CpGs, and whose functions are integrated into transcriptional regulation. There are currently 15 genes in the human genome that encode proteins that bind to methyl-CpG in DNA. These 15 proteins are divided into three subfamilies based on structural similarities: the methyl-CpG binding domain (MBD) proteins, the methyl-CpG-binding zinc finger proteins (also called the Kaiso family), and the SRA domain (SET- and RING-finger-associated domain) containing proteins. The SET domain is so-called because it was first identified in three Drosophila proteins: Suppressor of variegation 3-9 [Su(var)3-9], Enhancer of zeste, and Trithorax. The RING domain is a zinc-finger-like domain whose name derives from the term Really Interesting New Gene.

The first methyl-CpG binding protein to be identified was called methyl-CpG binding protein 1 (MeCP1). The second such protein, and the one most heavily studied, is MeCP2 (methyl-CpG binding protein 2). When MeCP2 binds to methylated CpG dinucleotides, the DNA takes on a closed chromatin structure, leading to transcriptional repression. The ability of MeCP2 to bind methylated CpGs is in turn controlled by its state of phosphorylation: phosphorylated MeCP2 binds with lower affinity, and the DNA acquires a more open chromatin state. The importance of MeCP2 in regulating chromatin structure, and consequently transcription, is demonstrated by the fact that deficiencies in this protein result in Rett syndrome. Rett syndrome is a neurodevelopmental disorder that occurs almost exclusively in females and manifests as intellectual disability, seizures, microcephaly, arrested development, and loss of speech.



Histone Modifications, Chromatin Structure, Transcriptional Regulation

Histone Acetylation - Deacetylation

As described in the DNA Metabolism page, histone proteins are subject to a number of modifications, and these modifications are known to affect the structure of chromatin. Histone acetylation is known to result in a more open chromatin structure, and these modified histones are found in regions of the chromatin that are transcriptionally active. Conversely, underacetylation (deacetylation) of histones is associated with closed chromatin and transcriptional inactivity. A direct correlation between histone acetylation and transcriptional activity was demonstrated when protein complexes previously known to be transcriptional activators were found to have histone acetyltransferase (HAT) activity. As expected, transcriptional repressor complexes were found to contain histone deacetylase (HDAC) activity.

Enzymes that acetylate the ε-amino group of lysine residues in proteins generally, and in histones in particular, are members of the large lysine acetyltransferase (KAT) family, which is composed of 17 genes in humans. Acetylated non-histone proteins include proteins involved in DNA replication, recombination, and repair, as well as transcription factors and many other protein types. Global protein analysis has identified over 1,700 human proteins that are modified by acetylation of lysine residues. The 17 human KAT genes have been classified into five subfamilies based on sequence homology, shared structural features, and substrate acetylation properties. Mammalian histone acetyltransferases (HATs) are either nuclear (often referred to as type A HATs) or cytoplasmic (often referred to as type B HATs). Many of the nuclear HATs, including GCN5, PCAF, p300, and CBP, contain a bromodomain, allowing them to recognize and interact with acetylated lysines in histone substrates. The cytoplasmic HATs are responsible for acetylating newly synthesized histone proteins prior to their transport into the nucleus.

The first histone acetyltransferase to be isolated and characterized was identified as HAT1. Within the KAT nomenclature, HAT1 is also known as KAT1. The five human KAT/HAT subfamilies are identified as HAT1, GCN5/PCAF (GNAT), MYST, p300/CBP, and SRC. In addition to these five subfamilies, several other KAT enzymes are grouped under the subclassification "other".

The HAT1 subfamily is composed of two members, HAT1 and HAT4 (official gene designation NAA60, for N(alpha)-acetyltransferase 60, NatF catalytic subunit). Both HAT1 subfamily proteins are cytoplasmic enzymes that acetylate newly synthesized histone proteins. The HAT1 protein acetylates lysines K5 and K12 of histone H4. Note that some classifications place the HAT1 gene in the GCN5/PCAF (GNAT) subfamily.

The GCN5/PCAF subfamily (also known as the GCN5-related N-acetyltransferase, GNAT, subfamily) is so-called because of the initial characterization of the histone acetyltransferase activity of the GCN5-encoded protein in the protozoan Tetrahymena. The PCAF gene name is derived from p300/CBP-associated factor. The GCN5/PCAF subfamily consists of the two genes from which the group name is derived, GCN5 (KAT2A) and PCAF (KAT2B). The KAT2A- and KAT2B-encoded proteins acetylate histones H3 and H4.

The MYST subfamily is named for four initial members of the group: MOZ, YBF2/SAS3, SAS2, and TIP60. The human MYST subfamily is composed of five proteins: KAT5 (TIP60), KAT6A (MOZ), KAT6B, KAT7, and KAT8. The KAT5-encoded protein acetylates histones H2A and H4. The KAT6A-, KAT6B-, KAT7-, and KAT8-encoded proteins acetylate histones H3 and H4.

The p300/CBP subfamily consists of the two proteins from which the subfamily name is derived, p300 and CBP. The p300 protein name is derived from its molecular mass, and the protein is encoded by the EP300 (adenovirus E1A binding protein p300) gene. In the standard KAT nomenclature, p300 is KAT3B. The CBP protein name is derived from CREB (cAMP-response element binding protein)-binding protein. The CBP protein is encoded by the CREBBP gene, which is identified in the standard KAT nomenclature as KAT3A. The CREBBP/KAT3A- and EP300/KAT3B-encoded proteins acetylate all four core histones of the nucleosome: H2A, H2B, H3, and H4.

The SRC subfamily constitutes the nuclear receptor coregulators that have histone acetyltransferase activity. The SRC name is derived from the original identification of steroid receptor coactivator 1 (SRC-1), which is encoded by the NCOA1 gene. The SRC subfamily is composed of three members encoded by the NCOA1, NCOA2 (originally GRIP1, for glucocorticoid receptor-interacting protein 1, and also TIF2, for transcriptional intermediary factor 2), and NCOA3 (originally identified as SRC-3) genes. Each of the SRC subfamily HATs acetylates histones H3 and H4. The NCOA1-encoded protein interacts with other known HATs, including CREBBP/KAT3A (CBP), EP300/KAT3B (p300), and KAT2B (PCAF).
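The histone substrates listed for each subfamily can be collected into a small lookup table, which makes it easy to see, for example, that only p300 and CBP are described here as acetylating H2B. The table is limited to the substrates named in this text (HAT4/NAA60 is omitted because no histone target is given) and is not an exhaustive catalog; the helper function is hypothetical.

```python
from __future__ import annotations

# Histone substrates as named in the text above; not an exhaustive catalog.
KAT_HISTONE_TARGETS = {
    # HAT1 subfamily
    "HAT1 (KAT1)": {"H4"},
    # GCN5/PCAF (GNAT) subfamily
    "KAT2A (GCN5)": {"H3", "H4"},
    "KAT2B (PCAF)": {"H3", "H4"},
    # MYST subfamily
    "KAT5 (TIP60)": {"H2A", "H4"},
    "KAT6A (MOZ)": {"H3", "H4"},
    "KAT6B": {"H3", "H4"},
    "KAT7": {"H3", "H4"},
    "KAT8": {"H3", "H4"},
    # p300/CBP subfamily
    "KAT3A (CBP)": {"H2A", "H2B", "H3", "H4"},
    "KAT3B (p300)": {"H2A", "H2B", "H3", "H4"},
    # SRC subfamily
    "NCOA1 (SRC-1)": {"H3", "H4"},
    "NCOA2 (TIF2)": {"H3", "H4"},
    "NCOA3 (SRC-3)": {"H3", "H4"},
}


def acetylators_of(histone: str) -> list[str]:
    """Return the enzymes in the table that acetylate the given core histone."""
    return sorted(name for name, targets in KAT_HISTONE_TARGETS.items()
                  if histone in targets)


if __name__ == "__main__":
    for histone in ("H2A", "H2B", "H3", "H4"):
        print(histone, acetylators_of(histone))
```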

Each of the KAT enzymes transfers the acetyl group from acetyl-CoA to the appropriate lysine residue in the target protein. The fact that these enzymes utilize acetyl-CoA as a substrate, and that their catalytic activities result in altered gene expression, provides a direct link between metabolic processes (those that generate acetyl-CoA) and the regulated transcription of genes.
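Written as a single schematic equation, with the lysine ε-amino group shown in its neutral form for simplicity, the acetyl transfer described above is:

```latex
\text{protein-Lys-NH}_2 + \text{acetyl-S-CoA}
  \xrightarrow{\ \text{KAT}\ }
  \text{protein-Lys-NH-C(O)CH}_3 + \text{HS-CoA}
```

One acetyl-CoA is consumed per acetylated lysine, which is the basis of the metabolic link noted above.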

Linkage between DNA methylation and transcriptional regulation via histone acetylation was demonstrated by the observation that proteins that bind to methyl-CpG dinucleotides can recruit HDAC complexes to the DNA. In addition, several proteins that bind acetylated lysines in histones act to promote a more open chromatin structure. Proteins that bind to acetylated histones contain a domain called a bromodomain. The bromodomain is composed of a bundle of four α-helices and mediates protein-protein interactions in a number of cellular systems, in addition to acetylated-histone binding and chromatin structure modification.

Histone deacetylation is necessary to regulate the positive or negative effects on gene expression exerted by histone acetylation. The deacetylation of histones is catalyzed by a large superfamily of enzymes composed of the sirtuin (SIRT) genes and the histone deacetylase (HDAC) genes. The HDAC genes are further divided into three classes, identified as class I, class II, and class IV, and class II is further divided into the class IIA and class IIB subfamilies. The sirtuin genes are often referred to as the class III HDAC subfamily. The human sirtuin subfamily is composed of seven genes, the class I HDAC subfamily of four genes, the class IIA subfamily of four genes, the class IIB subfamily of two genes, and the class IV subfamily of one gene, HDAC11. Little is known about the overall functions of the HDAC11 protein. All of the HDAC enzymes (classes I, II, and IV) are Zn2+-dependent deacetylases, whereas the sirtuins are NAD+-dependent enzymes.
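The classification above can be checked against a small table. Class I members (HDAC1, HDAC2, HDAC3, HDAC8) and HDAC6 are named in this text; the remaining class IIA (HDAC4, HDAC5, HDAC7, HDAC9) and class IIB (HDAC10) members are taken from standard human gene nomenclature, and the lookup function is hypothetical.

```python
from __future__ import annotations

# Class I and HDAC6 are named in the text; other class IIA/IIB members
# follow standard human gene nomenclature (an assumption of this sketch).
HDAC_CLASSES = {
    "I": (["HDAC1", "HDAC2", "HDAC3", "HDAC8"], "Zn2+"),
    "IIA": (["HDAC4", "HDAC5", "HDAC7", "HDAC9"], "Zn2+"),
    "IIB": (["HDAC6", "HDAC10"], "Zn2+"),
    "III": ([f"SIRT{i}" for i in range(1, 8)], "NAD+"),  # the sirtuins
    "IV": (["HDAC11"], "Zn2+"),
}


def classify_deacetylase(gene: str) -> tuple[str, str]:
    """Return (class, cofactor) for a human deacetylase gene symbol."""
    for hdac_class, (members, cofactor) in HDAC_CLASSES.items():
        if gene.upper() in members:
            return hdac_class, cofactor
    raise KeyError(f"{gene} is not in this table")


if __name__ == "__main__":
    for hdac_class, (members, cofactor) in HDAC_CLASSES.items():
        print(f"class {hdac_class}: {len(members)} genes, {cofactor}-dependent")
```

Running the block prints gene counts of 4, 4, 2, 7, and 1, matching the subfamily sizes given in the text.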

All of the class I HDAC enzymes are ubiquitously expressed, nuclear-localized enzymes. HDAC1, HDAC2, and HDAC3 are components of multiprotein complexes, whereas HDAC8 is not. The HDAC1 and HDAC2 proteins form both homodimers and heterodimers with each other. Both HDAC1 and HDAC2 are found in at least four distinct multiprotein corepressor complexes: nucleosome remodeling and deacetylating (NRD; also called NuRD), CoREST [corepressor of REST (RE1 silencing transcription factor)], mSin3, and Nanog- and Oct4-associated deacetylase (NODE). In addition to the deacetylase activity supplied by HDAC1 and HDAC2, the CoREST complex recruits the histone demethylase (see next section) KDM1, which demethylates dimethylated K4 in histone H3. HDAC3 is a component of the nuclear receptor corepressor (NCoR or NCOR1) and silencing mediator of retinoic acid and thyroid hormone receptor (SMRT or NCOR2) transcriptional corepressor complexes.

The class IIA HDAC proteins all have tissue-specific patterns of expression and exhibit distinct functions. All four proteins in the class IIA subfamily shuttle between the cytoplasm and the nucleus, and this shuttling is regulated by their state of phosphorylation. Because the class IIA HDAC proteins all carry an amino acid substitution in their catalytic domains (a histidine in place of a catalytic tyrosine), they have little intrinsic deacetylase activity. The principal function of the class IIA HDACs is binding of acetylated lysine residues in other proteins, thereby recruiting chromatin-modifying complexes to specific target genes. The class IIA HDACs function as deacetylases through their ability to recruit HDAC3-containing corepressor complexes to distinct promoters.

The class IIB HDAC proteins also shuttle between the cytoplasm and the nucleus, although they are found primarily in the cytoplasm. One characteristic feature of this class of enzymes is a duplicated catalytic domain. A major function of cytoplasmic HDAC6 is the clearance of misfolded proteins through the autophagy pathway or through the formation of aggresomes.

As indicated, the sirtuins are NAD+-dependent protein deacetylases. These enzymes deacetylate not only histones but many other acetylated proteins. Human SIRT1 and SIRT2 proteins are localized to both the nucleus and the cytoplasm. SIRT3, SIRT4, and SIRT5 are localized to the mitochondria, although SIRT3 has been shown to be in the nucleus and the cytoplasm as well. SIRT6 and SIRT7 are only found in the nucleus. A major function of the sirtuins is in the cell survival pathway. Indeed, in studies on the longevity effects of calorie restriction it was found that a major contributor to the positive effects was the activation of the SIRT1 gene. The sirtuins, specifically SIRT1 and SIRT7, inhibit apoptosis via their ability to deacetylate the tumor suppressor protein, p53. Deacetylation of p53 represses its transcriptional activity which decreases its ability to activate apoptotic gene expression pathways. Sirtuins are also involved in pathways that inhibit inflammation and regulate overall cellular metabolic rates. SIRT1 and SIRT3 activation leads to deacetylation of the kinase identified as LKB1 (also called STK11 and PJS kinase). Deacetylation of LKB1 results in its activation leading to phosphorylation and activation of the master metabolic regulatory kinase, AMPK. Another major target of sirtuins that results in metabolic regulation is PGC-1α. Activation of PGC-1α by deacetylation results in the activation of gluconeogenic genes and inhibition of glycolytic genes. PGC-1α also activates mitochondrial oxidative phosphorylation in skeletal muscle. Adipose tissue metabolic processes are also regulated by sirtuin function. SIRT1 in conjunction with the transcriptional corepressor complex NCOR1 represses the transcriptional activation of PPARγ resulting in reduced adipogenesis.
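The contrast between Zn2+-dependent HDACs and NAD+-dependent sirtuins can be summarized by the two overall reactions below (schematic, with the lysine amine shown in neutral form). Because each sirtuin reaction consumes one NAD+, sirtuin activity is tied to the cell's NAD+ supply, paralleling the acetyl-CoA link noted for the acetyltransferases.

```latex
\begin{aligned}
\text{Classical HDACs (Zn}^{2+}\text{):}\quad
  & \text{protein-Lys-NH-C(O)CH}_3 + \text{H}_2\text{O}
    \longrightarrow \text{protein-Lys-NH}_2 + \text{acetate} + \text{H}^{+} \\
\text{Sirtuins (NAD}^{+}\text{):}\quad
  & \text{protein-Lys-NH-C(O)CH}_3 + \text{NAD}^{+}
    \longrightarrow \text{protein-Lys-NH}_2 + \text{nicotinamide}
    + 2'\text{-}O\text{-acetyl-ADP-ribose}
\end{aligned}
```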



Histone Methylation - Demethylation

Another histone modification known to affect chromatin structure is methylation. Methylation of histones can result in three distinct states: monomethylation, dimethylation, or trimethylation. However, unlike acetylation, histone methylation does not correlate directly with a specific effect on transcription. Methylation of histones has been shown to occur on both lysine and arginine residues. Histone lysine (K) methylation at certain positions is associated with regions of transcriptionally silenced chromatin, whereas methylation at other positions is associated with transcriptionally active regions of DNA. Histone arginine (R) methylation has been shown to be associated with the promotion of an open chromatin structure, thereby resulting in transcriptional activation. Methylation of lysine residues in histone H3 (specifically K9 and K27) and histone H4 (K20) is associated with regions of transcriptionally silenced chromatin; these specific methylation sites are identified as H3K9, H3K27, and H4K20. Conversely, methylation at H3K4, H3K36, and H3K79 is associated with transcriptionally active domains in chromatin. However, these associations are not absolute: H3K9 methylation has been found in transcriptionally active genes, and H3K36 methylation has been shown to be associated with repression of intragenic transcription initiation.
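The site-specific associations listed above can be written as a lookup keyed by the usual mark notation (histone, residue, and an optional me1/me2/me3 suffix). This is a sketch only: the table encodes tendencies rather than rules, as the final sentence above emphasizes, the parser accepts only the simple notation shown (for example, H3K27me3), and the function names are hypothetical.

```python
from __future__ import annotations

import re

# Tendencies as stated in the text; the text notes exceptions (e.g., H3K9
# methylation in active genes, H3K36 methylation repressing intragenic initiation).
MARK_ASSOCIATION = {
    "H3K9": "silenced", "H3K27": "silenced", "H4K20": "silenced",
    "H3K4": "active", "H3K36": "active", "H3K79": "active",
}

# Hypothetical notation parser: histone, K + residue number, optional me1/me2/me3.
_MARK = re.compile(r"(H\d[AB]?K\d+)(?:me([123]))?")


def parse_mark(mark: str) -> tuple[str, int]:
    """Split e.g. 'H3K27me3' into ('H3K27', 3); no suffix means degree 0."""
    match = _MARK.fullmatch(mark)
    if match is None:
        raise ValueError(f"unrecognized mark: {mark!r}")
    site, degree = match.group(1), match.group(2)
    return site, (int(degree) if degree else 0)


def association(mark: str) -> str:
    """Return 'silenced', 'active', or 'not listed' for a methylation mark."""
    site, _degree = parse_mark(mark)
    return MARK_ASSOCIATION.get(site, "not listed")


if __name__ == "__main__":
    for mark in ("H3K4me3", "H3K27me3", "H4K20me1", "H2BK120me1"):
        print(mark, parse_mark(mark), association(mark))
```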

All lysine methyltransferase enzymes belong to the large family identified as the lysine (K) methyltransferase (KMT) family. The histone lysine methyltransferases are also identified as HMTases (for histone methyltransferases). Humans express a family of 27 protein lysine methyltransferase-encoding genes, not all of which methylate histones. The enzymes that carry out histone lysine methylation are all members of the SET-domain-containing family of methyltransferases, with the single exception of DOT1L. The SET domain is so called because it was originally identified in three Drosophila proteins: Suppressor of variegation 3-9 [Su(var)3-9], Enhancer of zeste, and Trithorax. The SET domain is composed of approximately 130 amino acids. Four additional histone methyltransferases belong to a different protein family, the PR and SET domain-containing transcription factor family, identified as the PRDM family. In addition to the PR domain, PRDM family members typically contain zinc finger domains. The PR/SET domain family contains 17 members, of which PRDM2 (also identified as KMT8), PRDM8, PRDM9 (see Figure below), and PRDM14 possess histone methyltransferase activity. As indicated in the preceding paragraph, several different lysine residues in histones are targets for methylation. Within histone H1, lysine 26 (K26) has been shown to be methylated. Within histone H3, the lysines K4, K9, K27, K36, and K79 are all known to be methylated. Within histone H4, lysines K20 and K59 have been shown to be methylated. Methylation of lysine residues in histones, and in other target proteins, by the KMT family enzymes uses S-adenosylmethionine (AdoMet or SAM) as the methyl donor. The products of the reaction are a methylated lysine and S-adenosylhomocysteine (AdoHcy). The different histone lysine methyltransferases incorporate one (monomethyl), two (dimethyl), or three (trimethyl) methyl groups onto their target lysine.
The single non-SET-domain-containing histone lysine methyltransferase is encoded by the DOT1L (disruptor of telomeric silencing 1-like) gene; in the KMT nomenclature, DOT1L is designated KMT4. The DOT1L-encoded methyltransferase catalyzes the methylation of K79 in histone H3 (H3K79).

Histone arginine methylation is catalyzed by a family of enzymes designated the protein arginine methyltransferase (PRMT) family. There are nine genes in the human genome that encode PRMT enzymes. Arginine residues in histones H2A, H3, and H4 are known to be methylated. Arginine methylation in histones can be of three distinct types: monomethyl, symmetric dimethyl, and asymmetric dimethyl. The PRMT1-encoded enzyme was the first shown to methylate arginine residues in histone proteins. PRMT1 incorporates an asymmetric dimethylation on R3 of histone H4, and the consequence of this H4R3 methylation is enhanced transcriptional activity. Indeed, PRMT1 is considered a transcriptional coactivator and is recruited to promoters by a number of different transcription factors. The PRMT4 enzyme (also known as coactivator-associated arginine methyltransferase 1, CARM1) incorporates asymmetric dimethylation on R17 and R26 of histone H3. Like PRMT1, PRMT4 is considered a transcriptional coactivator. Conversely, the PRMT5-encoded enzyme is a potent transcriptional repressor. PRMT5 symmetrically dimethylates R3 of histone H4, the same residue that PRMT1 asymmetrically dimethylates, and this modification imparts a strong transcriptional repressive action. As with the histone lysine methyltransferases, methylation of arginine residues in histones and other target proteins uses AdoMet as the methyl donor. The products of the PRMT-catalyzed reactions are a methylated arginine and S-adenosylhomocysteine.

The methylation of histones provides a site for the binding of other proteins, which then alter chromatin structure. Proteins that bind to methylated lysines present in histones (as well as in other proteins) contain a domain called the chromodomain. The chromodomain consists of a conserved stretch of 40–50 amino acids and is found in many proteins of chromatin remodeling complexes. In addition, chromodomain proteins are found in the RNA-induced transcriptional silencing (RITS) complex, which is involved in small interfering RNA (siRNA)- and microRNA (miRNA)-mediated downregulation of transcription (see below). Another important chromodomain-containing protein is heterochromatin protein 1 (HP1). Methylated H3K9 provides a binding site for HP1, which leads to transcriptional repression through the formation of heterochromatin (highly compact, densely staining chromatin).


Processes of protein lysine methylation and demethylation. Histone protein (as well as other protein) lysine methylation and demethylation are catalyzed by families of lysine methyltransferases (KMT) and lysine demethylases (KDM). Depicted are the enzymatic steps for the generation of a trimethylated lysine residue within a protein such as histone H3. Various members of the KMT family can monomethylate, dimethylate, or trimethylate their appropriate substrate lysine residue. This Figure shows the trimethylation catalyzed by the PRDM9 enzyme of the KMT family, which trimethylates lysine 4 (K4) of histone H3. The demethylation of lysine residues is catalyzed by members of the Jumonji C (JmjC) domain-containing family or the lysine-specific demethylase (LSD) family of proteins. The JmjC-domain demethylases can demethylate all states of lysine methylation; the family member KDM5A (formerly called JARID1A) is shown. All JmjC-domain demethylases require 2-oxoglutarate (and Fe2+) as cofactors. The LSD family of lysine demethylases only demethylates dimethyl- and monomethyllysine residues, not trimethyllysine. The LSD proteins use FAD as their cofactor. The reaction catalyzed by the KDM1A (formerly called LSD1) protein is depicted.

Histone demethylation is carried out by distinct families of enzymes. The largest family (with numerous subfamilies) of histone demethylases directly reverses histone methylation. An additional family of enzymes indirectly reverses the histone methylation state. All of the histone demethylase enzymes are composed of multiple functional domains. These domains are required for recognition of the correct methylated amino acid in the target histone protein, binding of required cofactors, and carrying out the catalytic reaction. Members of the largest subfamily of histone demethylase enzymes all contain a domain called the Jumonji C (JmjC) domain, which is responsible for cofactor binding. There are at least 30 human genes that encode JmjC-domain-containing proteins, and these 30 proteins can be subdivided into 8 subfamilies. The JmjC domain-containing histone demethylases are identified as the JmjC-domain-containing histone demethylase (JHDM) family, also known as the JMJD family. All of the JHDM/JMJD enzymes that demethylate lysine residues in histones belong to a larger family of enzymes (at least 80 human members) that are 2-oxoglutarate (α-ketoglutarate)- and Fe2+-dependent dioxygenases. As a family, the JHDM enzymes can reverse all three known states of histone lysine methylation. For example, JHDM1A reverses H3K36 mono- and dimethylation and H3K4 trimethylation, whereas JHDM2A reverses H3K9 mono- and dimethylation. Another subfamily of histone demethylases was originally called the lysine-specific demethylase (LSD) family, since the founding member, a nuclear amine oxidase homolog, was called lysine-specific demethylase 1 (LSD1). This subfamily directly reverses histone H3K4 or H3K9 methylation by an oxidative reaction that requires the riboflavin-derived cofactor FAD. The LSD family enzymes have been shown to demethylate only mono- and dimethylated histones, not the trimethylated forms.
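The difference in substrate range between the two directly acting demethylase families can be expressed as a simple rule. The sketch below is an illustrative model only (the names `Family`, `can_remove_methyl`, and `demethylation_path` are hypothetical); it encodes the family-level statements above, that JmjC enzymes can act on all three methylation states while LSD enzymes act only on mono- and dimethyllysine, and assumes, as a simplification, that methyl groups are removed one at a time. Individual enzymes are more selective than their family.

```python
from enum import Enum
from typing import List


class Family(Enum):
    """Directly acting histone lysine demethylase families named in the text."""
    JMJC = "JmjC (JHDM/JMJD)"
    LSD = "LSD (KDM1)"


# Cofactor requirements as described in the text.
COFACTORS = {
    Family.JMJC: ("2-oxoglutarate", "Fe2+"),
    Family.LSD: ("FAD",),
}


def can_remove_methyl(family: Family, methyl_groups: int) -> bool:
    """Family-level rule: can this family act on a lysine carrying
    `methyl_groups` methyl groups (1 = mono, 2 = di, 3 = tri)?"""
    if methyl_groups not in (1, 2, 3):
        raise ValueError("methyl_groups must be 1, 2, or 3")
    if family is Family.LSD:
        # LSD enzymes act on mono- and dimethyllysine, not trimethyllysine.
        return methyl_groups <= 2
    # JmjC enzymes, as a family, act on all three states.
    return True


def demethylation_path(family: Family, start: int) -> List[int]:
    """States visited when one methyl group is removed per step (a
    simplifying assumption), stopping when the family can no longer act."""
    path = [start]
    state = start
    while state > 0 and can_remove_methyl(family, state):
        state -= 1
        path.append(state)
    return path
```

For instance, `demethylation_path(Family.LSD, 3)` returns `[3]`, reflecting that an LSD enzyme cannot act on trimethyllysine, whereas `demethylation_path(Family.JMJC, 3)` returns `[3, 2, 1, 0]`.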

An additional family of enzymes, which is not strictly a histone demethylase family, converts methyl-arginine residues to citrulline rather than directly reversing the methylation reaction. This family was originally referred to as the peptidylarginine deiminase (PADI) family. PADI4 was the first enzyme in the family identified to antagonize arginine methylation in histones; it acts as a histone deiminase that converts methyl-arginine to citrulline. Although PADI4 has a clear role in antagonizing methylarginine modifications, it cannot strictly be considered a histone demethylase because it produces citrulline instead of an unmodified arginine. Another enzyme shown to demethylate arginine residues in histones is the JmjC domain-containing enzyme JMJD6. The primary function of the JMJD6-encoded enzyme is to hydroxylate lysine residues in target proteins; however, the enzyme has also been shown to demethylate the H3R2 and H4R3 residues.

As a result of the large number of histone lysine demethylase enzymes and the different subfamily designations, a more refined nomenclature system was adopted. All enzymes that demethylate methylated lysines in histone proteins are now identified as KDM family enzymes, where KDM stands for lysine (K) demethylase. There are currently eight KDM subfamilies, divided based upon factors such as substrate preference, presence of certain domains, and cofactor requirements. Within this nomenclature, human JHDM1A is more correctly identified as KDM2A (KDM2 subfamily) and human JHDM2A as KDM3A (KDM3 subfamily). The human LSD homologs, LSD1 and LSD2, are encoded by the KDM1A and KDM1B genes, respectively.



Histone Ubiquitination

Histone proteins can also be modified by addition of the small protein ubiquitin. Ubiquitin is found on all of the nucleosomal histones (H2A, H2B, H3, and H4) as well as on the linker histone, H1. However, the vast majority of ubiquitylated histones are H2A and H2B, and both carry the monoubiquitin form. Monoubiquitylation of H2A occurs at Lys 119 (K119) and that of H2B at K120. Although monoubiquitylation of H2A and H2B predominates, polyubiquitylation is also observed. Polyubiquitylation of K36 in histone H2A and the variant H2AX is associated with responses to DNA damage, and this modification is required for the repair processes to be initiated. Histones H3 and H4 are also known to be polyubiquitylated, but the precise biological functions of these modified histones are not fully elucidated. Ubiquitylation of H2A is associated with repression of transcription. The opposite effect is observed when histone H2B is ubiquitylated, leading to stimulation of gene activity. One reason that monoubiquitylated histone H2B is associated with transcriptional activity is that this modification promotes the methylation of histone H3 at K4 and K79, which, as indicated above, is associated with transcriptionally active chromatin. Given that ubiquitylation of H2A is primarily associated with gene silencing, it is not surprising that the H2A ubiquitin ligases are found associated with transcriptional corepressor complexes.

At least seven different ubiquitin ligases have been shown to ubiquitylate the histones. The vast majority of these characterizations were carried out with studies on the monoubiquitylation of H2A and H2B. The monoubiquitylation of H2A and H2B is known to be reversible and the enzymes that catalyze the removal are peptidases identified as deubiquitylating enzymes (DUB). At least six different DUB enzymes have been identified to be involved in the removal of monoubiquitin from H2A and H2B.



Histone Phosphorylation

Histone phosphorylation is known to occur on all four of the nucleosomal histones, H2A, H2B, H3, and H4. Phosphorylation of histones occurs primarily on Ser, Thr, and Tyr residues (and, in histone H4, on His residues) through the action of several kinases. The removal of the phosphate is catalyzed by phosphatases. Phosphorylation of histones occurs primarily, although not exclusively, in response to outside signals such as growth factor stimulation or stress inducers such as heat shock. Phosphorylated histones are localized to genes that become transcriptionally active as a consequence of these outside signals. Phosphorylation of histone proteins also regulates other forms of histone modification. For example, phosphorylation of Ser 1 (S1) in histone H4 prevents the acetylation of this histone.

Numerous residues in the four nucleosomal histones have been shown to be phosphorylated, leading to alteration of transcriptional activity. Phosphorylation sites in histone H2A include Ser 1 (S1), S16, and Thr 119 (T119). The H2AS1 modification leads to transcriptional inhibition, whereas H2AT119 is associated with the regulation of chromatin structure during mitosis. Histone H2B is phosphorylated on S14, S32, S36, and Tyr 37 (Y37). The H2BS14 modification is involved in the induction of apoptosis. Phosphorylation of H2B S32 is catalyzed by PKC in response to DNA damage. Phosphorylation of H2B S36 is catalyzed by AMPK as part of cellular stress response pathways. Histone H3 is phosphorylated on numerous residues, including T3, T6, S10, T11, S28, Y41, and T45. Histone H4 is phosphorylated on S1, S47, His 18 (H18), and H75. The phosphorylation of histidine residues in histone H4 is associated with the facilitation of DNA replication.

In addition to regulating transcription, histone phosphorylation is associated with the processes of chromatin remodeling and DNA damage repair. A particular H2A gene, identified as H2AFX, encodes a replication-independent histone (the protein identified as H2AX or H2A.X) that is critically involved in the response of cells to DNA double-strand breaks (DSB). Phosphorylation of Ser 139 (S139) in H2AX generates the modified histone identified as γH2AX. Phosphorylation of H2AX occurs throughout the cell cycle in association with diverse DNA damage response (DDR) processes such as non-homologous end joining (NHEJ), homologous recombination, and replication-coupled DNA repair. Following repair of the damaged DNA, γH2AX is removed from the DNA in order to terminate the retention of DNA damage repair enzymes. In addition to being removed from chromatin, γH2AX is dephosphorylated by a number of phosphatases, including PP2A. The H2AX protein has also been shown to be phosphorylated on Tyr 142 (Y142), yielding the modified form identified as H2AXY142. Phosphorylation of histone H2B on Ser 14 (S14) is also associated with responses to DNA damage and the induction of apoptosis. The importance of histone phosphorylation in response to DNA damage can be demonstrated in patients with Coffin-Lowry syndrome, which results from defects in the RPS6KA3 (ribosomal protein S6 kinase A3; also known as ribosomal S6 kinase 2: RSK2) gene. Coffin-Lowry syndrome is a rare form of X-linked intellectual disability characterized by skeletal malformations, growth retardation, hearing deficit, paroxysmal movement disorders, and cognitive impairment in affected males.


Histone O-GlcNAcylation

The hexosamine biosynthesis pathway (HBP) is a major nutrient-responsive metabolic pathway whose product, UDP-GlcNAc, is involved in the regulation of a wide variety of cellular processes, from metabolism to epigenetic control of gene expression. Evidence has demonstrated that the synthesis of UDP-GlcNAc, together with the activities of the two enzymes that add GlcNAc to (O-GlcNAc transferase: OGT) and remove it from (O-GlcNAcase: OGA) nuclear and cytoplasmic proteins, contributes to the maintenance of epigenetic states within chromatin and to the etiology of epigenetics-related disease states. With respect to histone modification as an epigenetic event, all four histones present in the nucleosome have been shown to be O-GlcNAcylated, with histone H2B being the most highly modified. Histone H2A is known to be O-GlcNAcylated on T101, histone H3 on S11 and T33, and histone H4 on S47. At least six Ser residues and one Thr residue in H2B have been shown to be O-GlcNAcylated under various conditions. The pattern of histone O-GlcNAcylation is dynamic and changes throughout the cell cycle: the level of histone O-GlcNAcylation increases during G1, decreases during S phase, increases again during the G2 and M phases, and then declines as the cells undergo cytokinesis.

The O-GlcNAc residue on S112 in H2B serves as a docking site for the ubiquitin ligase that modifies the K120 residue of H2B, and ubiquitination of H2B K120 is associated with transcriptional activation. The O-GlcNAcylation of S112 in H2B is also increased in response to DNA double-strand breaks. The significance of this modification to the normal cellular response to DNA damage has been demonstrated using either H2B mutants in which S112 is replaced by Ala (S112A) or cells in which OGT expression has been downregulated. In both instances, non-homologous end joining (NHEJ) and homologous recombination repair are impaired. When H2A is O-GlcNAcylated on Thr 101 (T101), its dimerization with H2B is reduced, which promotes an open chromatin structure and leads to increased transcriptional activity.

A link between the energy/nutritional state and the regulation of epigenetic modifications of histone proteins is defined by the observation that the master metabolic regulatory kinase, AMPK, phosphorylates OGT on Thr 444 (T444), which alters the ability of OGT to O-GlcNAcylate histone H2B. When AMPK phosphorylates OGT, O-GlcNAcylation of S112 in H2B is reduced, leading to reduced expression of genes that are normally activated by H2B S112 O-GlcNAcylation. Concomitant with the AMPK-mediated phosphorylation of OGT is an increase in the level of histone H3 acetylation on Lys 9 (K9). During nutrient deprivation or energy limitation, AMPK phosphorylates histone H2B on S36, which is also one of the sites O-GlcNAcylated by OGT. The phosphorylation of H2B on S36 is essential for the transcriptional response to changes in energy and nutrient content. The interplay between OGT activity in nutrient excess and AMPK activity during nutrient deprivation is further shown by the fact that OGT O-GlcNAcylates AMPK on the α-subunit and all three γ-subunits. The consequence of O-GlcNAcylation of AMPK is an increase in its activity, indicating that a regulatory feedback loop exists between these two important metabolic regulators.


Histone β-Hydroxybutyrylation

Colonic bacteria generate short-chain fatty acids (SCFA) through fermentation of soluble fiber. These SCFA include acetate, propionate, and butyrate, which are absorbed by colonocytes. Metabolically, the gut bacteria-derived SCFA can be used for oxidation or diverted into the ketogenesis pathway. Besides hepatocytes, gut epithelial cells are the only other cell type to express the HMGCS2 gene, allowing them to contribute to ketone synthesis. However, gut-derived SCFA also exert other important cell signaling effects. Although beneficial effects can be attributed to all three SCFA, the most extensively studied are those exerted by butyrate. Butyrate promotes colonocyte differentiation, suppresses colonic inflammation, and, of clinical significance, induces cell cycle arrest and apoptosis in colon cancer cells. These beneficial effects of butyrate within the colon (also shown for propionate) are mediated, in part, by its ability to inhibit the activity of histone deacetylases (HDACs). Like butyrate, the ketone β-hydroxybutyrate has also been shown to inhibit HDAC activity. The effects of β-hydroxybutyrate-mediated HDAC inhibition include enhanced expression of genes that reduce the level of oxidative stress.

In addition to altering patterns of gene expression through modulation of HDAC activity, β-hydroxybutyrate can alter gene expression by serving as a direct modifier of lysine residues in histones, resulting in lysine β-hydroxybutyrylation. The level of histone β-hydroxybutyrylation increases in hepatocytes in response to prolonged fasting. The effect of histone β-hydroxybutyrylation on gene expression represents a novel form of epigenetic control. The level of histone β-hydroxybutyrylation is comparable to that of the better-studied epigenetic histone modification, acetylation. The consequences of histone lysine β-hydroxybutyrylation are changes in the expression of numerous genes, such as the gene for the transcriptional coactivator PGC-1β (gene symbol: PPARGC1B), which is itself involved in regulating the expression of numerous genes involved in energy homeostasis; the insulin receptor substrate 2 (IRS2) gene, whose encoded protein is involved in insulin signaling; and the carnitine palmitoyltransferase 1A (CPT1A) gene, whose encoded protein regulates the ability of the mitochondria to oxidize fatty acids.


Epigenetic Control of Gene Expression

The term epigenetics was coined by Conrad Waddington in 1939 to define the unfolding of the genetic program during development. In addition, he coined the term epigenotype to define "the total developmental system consisting of interrelated developmental pathways through which the adult form of an organism is realized". Clearly this definition encompasses a broad range of concepts dealing with genetics, inheritance, and development. Today the term epigenetics is used to define the mechanisms by which heritable changes in the pattern of gene expression occur without alterations in the nucleotide sequence of a given gene. A literal interpretation is that epigenetics means "in addition to changes in genome sequence." The easiest way to understand this concept is to think about the fertilized egg: at the moment of fertilization that single cell is totipotent, i.e. as it divides, its daughter cells ultimately differentiate into all the different cells of the organism. The differences between the various cells of the resultant organism are the consequences of differential gene expression, not of differences in the sequences of the genes themselves. Evidence indicates that most epigenetic modifications are erased during gametogenesis and/or following fertilization.

Several different types of epigenetic events have been identified. As described in the sections above, chromatin structure, as a means to control gene expression, can be altered by both DNA modification and histone protein modifications. DNA methylation is likely to be the most important of these epigenetic events for controlling and, importantly, maintaining the pattern of gene expression during development. However, the importance of other epigenetic phenomena, including histone acetylation, methylation, phosphorylation, and ubiquitination, cannot be ignored in the overall context of gene regulation via chromatin remodeling. It should be clear that the same events that affect chromatin structure can be defined as epigenetic events. An additional process that affects chromatin structure, and therefore gene expression, is also considered an epigenetic event; this process involves the small interfering RNAs (siRNAs) described below.

Genomic imprints, which involve CpG methylation, undergo a cycle of establishment, maintenance, and erasure. CpG methylation status is established during spermatogenesis and oogenesis. In males the CpG methylation imprints are established in prospermatogonia, while in females the imprints are established only by the fully grown oocyte stage. The patterns of CpG methylation that arise in the germ cells are maintained following fertilization, throughout early development, and in the adult. During development of the primordial germ cells (PGC), from which sperm and egg will arise, the pattern of CpG methylation is erased. This erasure ensures that the sex-dependent imprint pattern can be established in later stages of spermatogenesis and oogenesis. The DNA methyltransferases responsible for establishing the germline differential methylation patterns are encoded by the DNMT3A and DNMT3B genes. As pointed out above, the protein encoded by the DNMT3L gene (which is highly expressed in germ cells) functions to enhance the activity of the DNMT3A enzyme. Once established, the state of germ cell CpG methylation is maintained by the DNMT1 methyltransferase. The erasure of the CpG methylation imprints that occurs in primordial germ cells is carried out by the TET methylcytosine dioxygenases (TET1, TET2, and TET3) as well as by activation-induced cytidine deaminase (AID), as described in the DNA Metabolism page for the general removal of 5mC residues in non-imprinted regions of the DNA.

Whereas epigenetic processes play a vital role in the regulation, control, and maintenance of gene expression leading to the many differentiation states of cells in an organism, recent evidence has identified links between epigenetic processes and disease. Most significant is the link between epigenetics and cancer: epigenetic changes have been suggested to be a contributing factor in nearly half of all cancers. A clear link has been demonstrated between changes in the methylation status of tumor suppressor genes and the development of many types of cancers. Epigenetic effects on immune system function have also been identified. In addition, there is evidence suggesting a link between epigenetic processes and mental health.


Control of Eukaryotic Transcription Initiation

Transcription of the different classes of RNAs in eukaryotes is carried out by three different polymerases (see RNA Synthesis Page). RNA pol I synthesizes the rRNAs, except for the 5S species. RNA pol II synthesizes the mRNAs and some small nuclear RNAs (snRNAs) involved in RNA splicing. RNA pol III synthesizes the 5S rRNA and the tRNAs. The vast majority of eukaryotic RNAs are subjected to post-transcriptional processing.

The most complex controls observed in eukaryotic genes are those that regulate the expression of RNA pol II-transcribed genes, the mRNA genes. Eukaryotic mRNA genes typically share a basic structure consisting of coding exons and non-coding introns, basal promoters of two types, and any number of different transcriptional regulatory domains (see diagrams below). The basal promoter elements are termed CCAAT-boxes and TATA-boxes because of their sequence motifs. The TATA-box resides 20 to 30 bases upstream of the transcriptional start site and is similar in sequence to the prokaryotic Pribnow-box (consensus TATA(T/A)A(T/A), where T/A indicates that either base may be found at that position).


Typical structure of a eukaryotic mRNA gene. Eukaryotic mRNA genes have a general regulatory structure composed of the two basal promoter elements, the TATA-box and the CCAAT-box. In addition, there may be one or more enhancer elements associated with the regulatory region of the gene.

Numerous proteins, identified as TFIIA, B, C, etc. (for transcription factors regulating RNA pol II), have been observed to interact with the TATA-box. The CCAAT-box (consensus GG(T/C)CAATCT) resides 50 to 130 bases upstream of the transcriptional start site. The protein identified as C/EBP (for CCAAT-box/enhancer binding protein) binds to the CCAAT-box element.
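The consensus notation used above maps directly onto a regular-expression character class, with T/A written as [AT]. The sketch below, a hypothetical helper rather than any published tool, scans the top strand of an upstream sequence for exact matches to the TATA-box and CCAAT-box consensus sequences given in the text and reports their positions relative to the transcriptional start site. It assumes the input ends at position -1 (the base just before +1); real promoter elements often deviate from the consensus, so exact matching will miss many genuine sites.

```python
import re

# Consensus sequences from the text; [AT] / [CT] mean either base may occur.
# The lookahead (?=...) lets overlapping matches be reported.
MOTIFS = {
    "TATA-box": re.compile(r"(?=(TATA[AT]A[AT]))"),
    "CCAAT-box": re.compile(r"(?=(GG[CT]CAATCT))"),
}


def find_basal_elements(upstream: str):
    """Return (motif, position, matched_sequence) tuples sorted by position.

    Assumes `upstream` is the top-strand sequence ending at the base just
    before the transcriptional start site, so its last base is -1.
    Only exact consensus matches on this one strand are reported.
    """
    seq = upstream.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(seq):
            position = m.start() - len(seq)  # negative: bases upstream of +1
            hits.append((name, position, m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Synthetic example: a CCAAT-box and a TATA-box embedded in filler bases.
    example = "A" * 10 + "GGCCAATCT" + "A" * 60 + "TATAAAT" + "G" * 18
    for hit in find_basal_elements(example):
        print(hit)
    # ('CCAAT-box', -94, 'GGCCAATCT')
    # ('TATA-box', -25, 'TATAAAT')
```

Using a lookahead rather than a plain pattern is a design choice for scanning: it reports every start position, including overlapping candidates, which a plain `finditer` would skip.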

There are many other regulatory sequences in mRNA genes as well that bind various transcription factors (see diagram below). These regulatory sequences are predominantly located upstream (5') of the transcription initiation site, although some elements occur downstream (3') or even within the genes themselves. The number and type of regulatory elements varies with each mRNA gene. Different combinations of transcription factors can also exert differential regulatory effects upon transcriptional initiation. Each cell type expresses a characteristic combination of transcription factors; this is the major mechanism for cell-type specificity in the regulation of mRNA gene expression.


Structure of the upstream region of a typical eukaryotic mRNA gene. The diagram indicates that the TATA-box and CCAAT-box basal elements reside near nucleotide positions –25 and –100, respectively. The TATA-box binding protein (TBP) is a component of the transcription factor TFIID. Several additional transcription factor binding sites are shown residing upstream of the 2 basal elements and of the transcriptional start site. The location and order of the indicated transcription factor-binding sites are only diagrammatic and are not meant to be typical of all eukaryotic mRNA genes. There exists a vast array of different transcription factors that regulate the transcription of all 3 classes of eukaryotic genes, encoding the mRNAs, tRNAs, and rRNAs. CREB: cAMP response element binding protein. C/EBP: CCAAT-box/enhancer binding protein.


Nuclear Receptors and Control of Transcriptional Initiation

Nuclear Receptor Coactivators

The first nuclear receptor coactivator to be identified was steroid receptor coactivator-1 (SRC-1). To date, more than 400 coregulators (both coactivators and corepressors) have been identified. The SRC family is now known to comprise three members: SRC-1 (encoded by the NCOA1 gene); SRC-2 (also known as GRIP1 for glucocorticoid receptor-interacting protein 1 and TIF2 for transcriptional intermediary factor 2), encoded by the NCOA2 gene; and SRC-3 (also known as AIB1 for amplified in breast cancer 1 and TRAM-1 for thyroid hormone receptor activator molecule 1), encoded by the NCOA3 gene. The three members of the SRC family contain homologous domains and share between 50% and 54% amino acid sequence similarity. A diverse set of enzymes also interacts with and modifies the SRCs, including histone acetyltransferases (HATs), histone methyltransferases (HMTs), kinases, phosphatases, ubiquitin ligases, and small ubiquitin-related modifier (SUMO) ligases.

Peroxisome proliferator-activated receptor gamma, coactivator 1 alpha (PGC-1α) is another critical NR coregulator. PGC-1α has been shown to be involved in the regulation of metabolism and energy homeostasis. Indeed, expression levels of PGC-1α have been associated with disorders involving impaired mitochondrial function, including type 2 diabetes and obesity. Another important coactivator is CBP [CREB (cAMP response element-binding protein)-binding protein]. CBP is closely related to another coactivator called p300. The designation p300 refers to the molecular size of the originally characterized protein, which is encoded by the EP300 gene. Both CBP and p300 possess intrinsic histone acetyltransferase (HAT) activity that leads to relaxation of the chromatin structure near an NR target gene. Other chromatin-modifying coactivators, such as coactivator-associated arginine methyltransferase 1 (CARM1), can also stimulate gene transcription by NRs, as well as by other transcription factors, in combination with the SRC family of coactivators.

In addition to acting as coactivators for NRs, the SRC family proteins also interact with many different types of transcription factors and potentiate their transcriptional activity. These include p53, signal transducers and activators of transcription (STATs), nuclear factor-κB (NF-κB), hypoxia-inducible factor 1 (HIF1), and hepatocyte nuclear factor-4 (HNF4), to name just a few. Several extracellular stimuli, such as growth factors and cytokines, activate membrane-spanning signal-transducing receptors, generating phosphorylation codes on SRCs that increase coactivator affinity for the androgen receptor (AR), estrogen receptor-alpha (ERα), and progesterone receptor (PR).

Model for NR interactions with coactivators: An example of the coactivator complexes associated with an RXR/PPAR heterodimeric transcription factor complex bound at an HRE, together with several basal transcription factors associated with RNA pol II at the transcriptional start site of a target gene. Binding of ligand to the PPAR partner results in assembly of the complete coregulatory (in this case coactivator) complex. Formation of the complex induces histone modifications (such as acetylation, Ac, and methylation, Me) that in turn alter chromatin structure, allowing entry of the basal transcriptional machinery, including RNA pol II. The complete assembly then activates target gene transcription.

Nuclear Receptor Corepressors

As a general rule, when nuclear receptors are free of activating ligand they preferentially interact with corepressor complexes, which mediate transcriptional repression. Nuclear receptor corepressor 1 (NCoR1) and silencing mediator of retinoic acid and thyroid hormone receptors (SMRT) are the best-characterized NR corepressors. The NCoR1 protein is encoded by the NCOR1 gene and the SMRT protein by the NCOR2 gene. The core NCoR/SMRT complex consists of NCoR or SMRT, transducin β-like 1 and transducin β-like related 1 (TBL1 and TBLR1, encoded by the TBL1X and TBL1XR1 genes, respectively), histone deacetylase 3 (encoded by the HDAC3 gene), and G-protein pathway suppressor 2 (encoded by the GPS2 gene). NCoR and SMRT serve as the docking sites for corepressor complex assembly: they bind various nuclear receptors and associate with each of the other complex subunits.

As discussed above, when an NR binds ligand, transcriptional activation results because the NR-ligand complex recruits coactivator proteins and displaces corepressor proteins. Nuclear receptor corepressors can inhibit the transcriptional activity of most members of the NR superfamily. As always in biology, there are a few exceptions to the general rule that unliganded NRs bind corepressors. These exceptions include LCoR (ligand-dependent nuclear receptor corepressor; encoded by the LCOR gene), RIP140 (receptor-interacting protein 140; encoded by the NRIP1 gene), and repressor of estrogen receptor activity (REA; encoded by the prohibitin 2 gene, PHB2). These repressors bind to nuclear receptors in a ligand-dependent manner and compete with coactivators by displacing them. In addition, several coregulatory factors, such as the ATP-dependent SWI/SNF (switching of mating type/sucrose non-fermenting) chromatin remodeling complexes, have been shown to be involved in both transcriptional activation and repression.
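The ligand-dependent cofactor switch described above can be summarized as a small decision rule. The Python below is a toy model only, assuming a binary view (ligand bound or not; a ligand-dependent repressor such as LCoR, RIP140 or REA present or not). The function name `nr_cofactor_switch`, the `TargetGeneState` type and the state labels are hypothetical illustrations, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetGeneState:
    cofactor: str        # class of coregulator recruited (illustrative label)
    histone_effect: str  # net effect on histone acetylation, where the text states one
    transcription: str   # "repressed" or "active"


def nr_cofactor_switch(ligand_bound: bool,
                       ligand_dependent_repressor: bool = False) -> TargetGeneState:
    """Toy rule set for an NR target gene, following the text (an assumption, not a model fit).

    `ligand_dependent_repressor` stands in for factors such as LCoR, RIP140
    or REA, which bind the liganded receptor and displace coactivators.
    """
    if not ligand_bound:
        # General rule: unliganded NRs recruit NCoR/SMRT complexes containing HDAC3.
        return TargetGeneState("NCoR/SMRT corepressor complex", "deacetylation", "repressed")
    if ligand_dependent_repressor:
        # Exception: ligand-dependent repressors compete with and displace coactivators.
        return TargetGeneState("ligand-dependent corepressor", "not modeled", "repressed")
    # Liganded NR recruits coactivators such as SRCs and CBP/p300 (the latter with HAT activity).
    return TargetGeneState("coactivator complex (SRC, CBP/p300)", "acetylation", "active")


if __name__ == "__main__":
    print(nr_cofactor_switch(False).transcription)                                  # repressed
    print(nr_cofactor_switch(True).transcription)                                   # active
    print(nr_cofactor_switch(True, ligand_dependent_repressor=True).transcription)  # repressed
```

Keeping the exception as a separate flag mirrors the text: the default behavior follows the general rule, and the LCoR/RIP140/REA case overrides it only when the receptor is liganded.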

Model for NR interactions with corepressors: An example of the corepressor complexes associated with an RXR/RAR heterodimeric transcription factor complex bound at an HRE, together with several basal transcription factors associated with RNA pol II at the transcriptional start site of a target gene. Histone deacetylases in the complex (e.g. HDAC3) remove activating acetyl groups from histones, producing a transcriptionally repressed chromatin structure.

Structural Motifs in Eukaryotic Transcription Factors

Homeodomain: The homeodomain is a highly conserved domain of 60 amino acids found in a large family of transcription factors. This family was first identified in Drosophila as a group of genes that, when mutated, cause the transformation of one body part into another (e.g. legs in place of antennae), so-called homeotic transformations. Genes of this class have been identified in both invertebrate and vertebrate organisms. The homeodomain itself forms a structure highly similar to that of the bacterial helix-turn-helix proteins. A principal function of many homeodomain-containing proteins is the establishment of pattern in an organism, such as the patterning of the spinal column in vertebrates.

POU Domain: The POU domain is a bipartite domain composed of a POU-specific domain and a region related to the homeodomain. The term POU was derived from the names of the first three factors shown to share this region of similarity: Pit-1 (a pituitary-specific transcription factor), Oct-1 (an octamer-binding protein first shown to regulate immunoglobulin gene transcription), and unc-86 (a nematode gene).

Helix-Loop-Helix (HLH): The HLH domain is involved in protein dimerization. The HLH motif is composed of two α-helical regions separated by a region of variable length that forms a loop between the two helices. This motif is superficially similar to the helix-turn-helix motif found in several prokaryotic transcription factors, such as the CRP (cAMP receptor protein) involved in the regulation of the lac operon. The α-helical domains are structurally similar and are necessary for protein interaction with sequence elements that exhibit a twofold axis of symmetry. Transcription factors of this class most often contain a region of basic amino acids on the N-terminal side of the HLH domain (hence the term bHLH proteins) that is required for binding to specific DNA sequences, while the HLH domain itself mediates homo- and heterodimerization. Examples of bHLH proteins include MyoD (a myogenesis-inducing transcription factor) and MYC (originally identified as a retroviral oncogene). Several HLH proteins that lack the basic region, such as the Id proteins, act as repressors for exactly this reason: they form heterodimers with bHLH proteins, and the resulting dimers cannot bind DNA.
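Two points in the HLH description lend themselves to a short sketch: what a site with a twofold axis of symmetry looks like (its sequence is identical to its own reverse complement, as for the CACGTG E-box listed for MYC and MAX in the table of representative factors), and why an HLH partner lacking the basic region blocks DNA binding. The Python below assumes a simplified rule in which a dimer binds DNA only when both partners carry a basic region; the helper names are hypothetical, and E47 is used as an example bHLH partner of MyoD.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site: str) -> str:
    """Reverse complement of a DNA site written with A, C, G, T."""
    return site.upper().translate(COMPLEMENT)[::-1]


def has_twofold_symmetry(site: str) -> bool:
    """True if the site reads identically on both strands (a DNA palindrome)."""
    return site.upper() == reverse_complement(site)


@dataclass(frozen=True)
class HLHProtein:
    name: str
    has_basic_region: bool  # bHLH proteins have it; Id-type HLH proteins lack it


def dimer_binds_dna(a: HLHProtein, b: HLHProtein) -> bool:
    """Simplified rule (assumption): both partners must contribute a basic region."""
    return a.has_basic_region and b.has_basic_region


if __name__ == "__main__":
    print(has_twofold_symmetry("CACGTG"))  # True: E-box site for MYC/MAX
    print(has_twofold_symmetry("TATAAT"))  # False
    myod, e47, id_protein = HLHProtein("MyoD", True), HLHProtein("E47", True), HLHProtein("Id", False)
    print(dimer_binds_dna(myod, e47), dimer_binds_dna(myod, id_protein))  # True False
```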

Zinc Fingers: The zinc finger domain is a DNA-binding motif consisting of specifically spaced cysteine and histidine residues that allow the protein to bind zinc ions. The metal ion coordinates the sequences around the cysteine and histidine residues into a finger-like domain. The finger domains can insert into the major groove of the DNA helix. The spacing of the zinc finger domains in this class of transcription factor coincides with a half-turn of the double helix. The classic example is the RNA pol III transcription factor TFIIIA. Proteins of the steroid/thyroid hormone receptor family of transcription factors also contain zinc fingers.
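The "specific spacings" of cysteine and histidine residues can be written as a sequence pattern. This Python sketch assumes the commonly cited consensus for the classic C2H2 (TFIIIA-type) finger, C-x(2-4)-C-x(12)-H-x(3-5)-H, where x is any amino acid. It will not detect the Cys4-type zinc fingers of the steroid/thyroid hormone receptors, the function name is hypothetical, and the test sequence is synthetic rather than a real protein.

```python
import re

# Commonly cited C2H2 consensus: C-x(2-4)-C-x(12)-H-x(3-5)-H (x = any residue).
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (start index, matched segment) for each non-overlapping C2H2-like match."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    # Synthetic sequence built to fit the consensus; not a real protein.
    seq = "MKP" + "CAKC" + "GKSFSQSSNLQK" + "HQRTH" + "TG"
    print(find_c2h2_fingers(seq))  # [(3, 'CAKCGKSFSQSSNLQKHQRTH')]
```

A consensus pattern like this only flags candidate fingers; whether a segment actually folds around zinc depends on more than residue spacing.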

Leucine Zipper: The leucine zipper domain is necessary for protein dimerization. It is a motif generated by a repeating distribution of leucine residues spaced 7 amino acids apart within α-helical regions of the protein. Because an α-helix contains about 3.6 residues per turn, a spacing of 7 residues corresponds to roughly two helical turns, so these leucine residues end up with their R-groups protruding from the same face of the α-helix. The protruding R-groups are thought to interdigitate with the leucine R-groups of another leucine zipper domain, thus stabilizing homo- or heterodimerization. The leucine zipper domain is present in many DNA-binding proteins, such as MYC and C/EBP.
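The geometric argument for why heptad-spaced leucines line up can be checked numerically. Assuming an ideal α-helix with 3.6 residues per turn (100° per residue), a leucine 7 residues further along sits at 700°, which is 20° short of two full turns, so four heptads keep the leucines within a 60° arc on one face of the helix. The sketch below computes those angles and scans a sequence for leucines spaced exactly 7 apart; the function names, the `min_repeats` threshold and the synthetic test sequence are assumptions for illustration.

```python
# Ideal alpha-helix geometry (assumption): about 3.6 residues per turn, i.e. 100 degrees per residue.
DEGREES_PER_RESIDUE = 100.0
HEPTAD = 7  # leucine spacing in a leucine zipper


def helical_angle(offset: int) -> float:
    """Angle (degrees, 0-360) of a residue `offset` positions along an ideal helix."""
    return (offset * DEGREES_PER_RESIDUE) % 360


def heptad_leucine_runs(seq: str, min_repeats: int = 4) -> list[int]:
    """Start indices of runs of at least `min_repeats` leucines spaced exactly 7 apart."""
    seq = seq.upper()
    starts = []
    for i, residue in enumerate(seq):
        if residue != "L":
            continue
        if i >= HEPTAD and seq[i - HEPTAD] == "L":
            continue  # this leucine continues a run that began earlier
        count, j = 0, i
        while j < len(seq) and seq[j] == "L":
            count += 1
            j += HEPTAD
        if count >= min_repeats:
            starts.append(i)
    return starts


if __name__ == "__main__":
    print([helical_angle(HEPTAD * k) for k in range(4)])  # [0.0, 340.0, 320.0, 300.0]
    # Synthetic sequence with a leucine every 7 residues (not a real protein).
    print(heptad_leucine_runs("MS" + "LAAKAEA" * 4))  # [2]
```

The angles drift by 20° per heptad, which is why real leucine zippers wind gently around each other as a coiled coil rather than lying perfectly straight.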

Winged Helix: The winged helix is a DNA-binding motif with an α/β structure. This structure contains three N-terminal α-helices and a three-stranded antiparallel β-sheet. The folding of the β-sheet region around the α-helices gives the appearance of wings on the helices, hence the term winged helix. This motif was first identified in the transcription factor HNF-3γ. HNF-3γ is a member of a large family of transcription factors related to the Drosophila gene forkhead, hence the name fork head (FKH) family. The nomenclature of this family has since been revised so that all members have names beginning with Fox (for example, HNF-3γ is now designated FOXA3).

Table of Representative Transcription Factors

| Factor | Sequence motif | Comments |
|---|---|---|
| MYC and MAX | CACGTG | MYC first identified as a retroviral oncogene; MAX specifically associates with MYC in cells |
| FOS and JUN | TGAC/GTC/AA | both first identified as retroviral oncogenes; associate in cells, also known as the factor AP-1 |
| CREB | TGACGC/TC/AG/A | binds to the cAMP response element (CRE); family of at least 10 factors resulting from different genes or alternative splicing; can form dimers with JUN |
| ERBA; also TR (thyroid hormone receptor) | GTGTCAAAGGTCA | first identified as a retroviral oncogene; member of the steroid/thyroid hormone receptor superfamily; binds thyroid hormone |
| ETS | G/CA/CGGAA/TGT/C | first identified as a retroviral oncogene; predominates in B- and T-cells |
| GATA | T/AGATA | family of erythroid cell-specific factors, GATA-1 to -6 |
| MYB | T/CAACG/TG | first identified as a retroviral oncogene; hematopoietic cell-specific factor |
| MYOD | CAACTGAC | master control of muscle cell differentiation |
| NFκB and REL | GGGAA/CTNT/CCC(1) | both factors identified independently; REL first identified as a retroviral oncogene; predominate in B- and T-cells |
| RAR (retinoic acid receptor) | ACGTCATGACCT | binds to elements termed RAREs (retinoic acid response elements); also binds to the JUN/FOS site |
| SRF (serum response factor) | GGATGTCCATATTAGGACATCT | binding site present in many genes that are inducible by the growth factors present in serum |

The list is only representative of the hundreds of identified factors; some emphasis is placed on factors that exhibit oncogenic potential.

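(1) N signifies that any base can occupy that position. In all motifs, X/Y indicates that either base X or base Y can occupy a single position; for example, TGAC/GTC/AA stands for TGA(C/G)T(C/A)A.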
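The degenerate notation in the table can be turned directly into a search pattern. The Python sketch below assumes that X/Y denotes alternative bases at a single position and N any base, converts a motif to a regular expression, and scans both strands of a DNA sequence. The function names are hypothetical, and practical binding-site prediction typically uses position weight matrices rather than exact consensus matching, so treat this as an illustration of the notation only.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def parse_motif(motif: str) -> list[str]:
    """Return the allowed bases at each position of a table motif.

    'X/Y' means X or Y at one position; 'N' means any base. Parentheses,
    as in 'TGA(C/G)T(C/A)A', are ignored, so both spellings are accepted.
    """
    s = motif.upper().replace("(", "").replace(")", "")
    positions = []
    i = 0
    while i < len(s):
        options = s[i]
        i += 1
        while i < len(s) and s[i] == "/":  # each '/X' adds an alternative base
            if i + 1 >= len(s):
                raise ValueError(f"dangling '/' in {motif!r}")
            options += s[i + 1]
            i += 2
        if "N" in options:
            options = "ACGT"
        if not set(options) <= set("ACGT"):
            raise ValueError(f"unexpected symbol in {motif!r}")
        positions.append(options)
    return positions


def motif_to_regex(motif: str) -> str:
    return "".join(p if len(p) == 1 else f"[{p}]" for p in parse_motif(motif))


def find_sites(motif: str, dna: str) -> list[tuple[int, str]]:
    """(start, strand) for every match; minus-strand hits use plus-strand coordinates."""
    k = len(parse_motif(motif))
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")  # lookahead allows overlapping hits
    dna = dna.upper()
    hits = [(m.start(), "+") for m in pattern.finditer(dna)]
    rc = reverse_complement(dna)
    hits += [(len(dna) - m.start() - k, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


if __name__ == "__main__":
    print(motif_to_regex("TGAC/GTC/AA"))         # TGA[CG]T[CA]A
    print(find_sites("T/AGATA", "CCAGATAGG"))    # [(2, '+')]
    print(find_sites("CACGTG", "AACACGTGAA"))    # [(2, '+'), (2, '-')]
```

A palindromic site such as the MYC/MAX motif is reported on both strands at the same position, which is the sequence-level signature of the twofold symmetry recognized by dimeric factors.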

Small Non-Coding RNAs (sncRNAs) and Post-Transcriptional Regulation

Until the early 1990s it was widely believed that the only non-coding RNAs were the tRNAs and the rRNAs of the translational machinery. However, a landmark study on the control of developmental timing in the roundworm Caenorhabditis elegans, published in 1993, showed that the control of one gene was exerted by the small non-coding RNA (sncRNA) product of another gene. This regulatory gene is lin-4 (lin-4 controls the activity of the lin-14 gene product), and it encodes two RNAs, one of approximately 22 nucleotides (nt) and the other of approximately 61 nt. Examination of the sequence of the larger RNA revealed that it could form a stem-loop structure that serves as the precursor of the shorter RNA. The shorter lin-4 RNA is considered the founding member of a class of small non-coding regulatory RNAs, approximately 22 nt long, called microRNAs (miRNAs). It is predicted that at least 250 miRNA genes are present in the human genome.

The processing and function of miRNAs are similar to those of the RNA silencing pathway identified in plants, known as the post-transcriptional gene silencing (PTGS) pathway, and of the RNA interference (RNAi) pathway in mammals. The RNAi pathway involves the enzymatic processing of double-stranded RNA into small interfering RNAs (siRNAs) of approximately 22–25 nt, and it may have evolved as a means to degrade the RNA genomes of RNA viruses such as retroviruses. The pathways for processing both miRNAs and siRNAs are diagrammed in the Figure below. The stem-loop of the primary miRNA gene transcript (pri-miRNA) is first cleaved in the nucleus by the RNase III-related enzyme Drosha, generating the precursor miRNA (pre-miRNA). The pre-miRNA stem-loop structures are then transported from the nucleus to the cytosol by exportin-5. In the cytosol, the enzyme Dicer removes the loop portion of the pre-miRNA stem-loop. In the siRNA pathway, the duplex RNAs are likewise cleaved into 22–25 nt pieces by Dicer in the cytosol. The mature miRNA duplex is designated miRNA:miRNA*, where the miRNA* strand is the non-functional half of the duplex. Ultimately, fully processed miRNAs and siRNAs interact with proteins of the Argonaute family (e.g. AGO2) and are then engaged by the RNA-induced silencing complex (RISC), which separates the two RNA strands. The active strand of RNA, whether derived from the miRNA or the siRNA pathway, is antisense to a region of the target mRNA. The RISC, in complex with Argonaute proteins, brings the miRNA (or siRNA) to the target mRNA and then initiates degradation of the target mRNA while also interfering with its translation. The end result is reduced protein synthesis from the targeted mRNA.
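The ordered steps of the two processing pathways, and the point where they converge, can be captured in a small data structure. This Python sketch is a simplified encoding of the steps as described in this section; the step labels are illustrative strings (not identifiers from any database), and the helper names are hypothetical. It makes explicit that the siRNA pathway, starting from double-stranded RNA, bypasses the nuclear Drosha and exportin-5 steps and joins the miRNA pathway at Dicer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    product: str      # RNA species present after this step
    actor: str        # enzyme or factor responsible for the step
    compartment: str  # where the step occurs


# Simplified encoding of the miRNA pathway as described in the text.
MIRNA_PATHWAY = [
    Step("pri-miRNA with stem-loop", "transcription", "nucleus"),
    Step("pre-miRNA", "Drosha", "nucleus"),
    Step("pre-miRNA in cytosol", "exportin-5", "nucleus -> cytosol"),
    Step("miRNA:miRNA* duplex", "Dicer", "cytosol"),
    Step("single-stranded miRNA guide", "Argonaute (AGO2) / RISC", "cytosol"),
    Step("target mRNA degraded or translationally inhibited", "RISC", "cytosol"),
]

# Simplified encoding of the siRNA pathway, starting from double-stranded RNA.
SIRNA_PATHWAY = [
    Step("siRNA duplex (22-25 nt)", "Dicer", "cytosol"),
    Step("single-stranded siRNA guide", "Argonaute (AGO2) / RISC", "cytosol"),
    Step("target mRNA degraded or translationally inhibited", "RISC", "cytosol"),
]


def actors(pathway: list[Step]) -> list[str]:
    return [step.actor for step in pathway]


def convergence_point(a: list[Step], b: list[Step]) -> Optional[str]:
    """First actor in pathway `a` that also appears in pathway `b`."""
    shared = set(actors(b))
    return next((actor for actor in actors(a) if actor in shared), None)


if __name__ == "__main__":
    print(convergence_point(MIRNA_PATHWAY, SIRNA_PATHWAY))                # Dicer
    print([s.actor for s in MIRNA_PATHWAY if s.compartment != "cytosol"])  # ['transcription', 'Drosha', 'exportin-5']
```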

Model for processing miRNAs and siRNAs. miRNA genes are transcribed as larger precursor RNAs that are then processed within the nucleus by the Drosha enzyme to a pre-miRNA. The pre-miRNA is then transported to the cytosol, where it is further processed by the Dicer complex and an RNA helicase to the functional single-stranded miRNA. The miRNA is recognized by proteins of the Argonaute family, is engaged by the RISC, and associates with the appropriate target mRNA. Following mRNA-miRNA interaction, the mRNA is degraded and its translation is inhibited. The net result is a reduction (knockdown) in gene expression at the level of a given mRNA and protein. AGO2 is the primary human Argonaute family protein involved in the miRNA and siRNA processes.

Two models exist for how siRNAs and miRNAs interfere with the expression of target genes: directed degradation of the target mRNA, or interference with translation of the target mRNA. In the model of miRNA-directed mRNA degradation, the miRNA base-pairs with a complementary region of the mRNA and the RISC is then recruited, which ultimately leads to degradation of the target mRNA. In the translational repression model, the interaction of the miRNA and the RISC with the mRNA is thought to inhibit progression of the ribosomal machinery along the mRNA without degrading it. This latter model was proposed because, in the case of lin-4, the amount of lin-14 mRNA does not decrease, but the amount of lin-14 protein is reduced.
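The idea that the active strand is antisense to a region of the target mRNA can be illustrated by scanning an mRNA for windows complementary to a miRNA. The Python sketch below assumes only Watson-Crick pairs count (G:U wobble pairs and bulges are ignored), uses synthetic sequences, and treats `max_mismatches` as a hypothetical tuning parameter; actual miRNA target recognition involves more complex partial pairing than a simple mismatch count captures.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (any T is read as U)."""
    return seq.upper().replace("T", "U").translate(RNA_COMPLEMENT)[::-1]


def antisense_sites(mirna: str, mrna: str, max_mismatches: int = 0) -> list[tuple[int, int]]:
    """(start, mismatches) for each mRNA window the miRNA could pair with.

    Only Watson-Crick pairs count; G:U wobble pairs and bulges are ignored.
    """
    site = reverse_complement_rna(mirna)  # mRNA sequence a fully antisense miRNA would pair with
    mrna = mrna.upper().replace("T", "U")
    k = len(site)
    hits = []
    for start in range(len(mrna) - k + 1):
        mismatches = sum(a != b for a, b in zip(mrna[start:start + k], site))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    mirna = "ACGUUAGCCU"  # synthetic 10-nt sequence, not a real miRNA
    print(reverse_complement_rna(mirna))                                          # AGGCUAACGU
    print(antisense_sites(mirna, "GGGAGGCUAACGUCCC"))                             # [(3, 0)]
    print((3, 1) in antisense_sites(mirna, "GGGAGGCUAACGACCC", max_mismatches=1))  # True
```

With `max_mismatches=0` only perfectly complementary windows are reported; raising it admits partially paired windows.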

Regardless of the mechanism of action, the effect is post-transcriptional regulation of gene expression. To date, numerous examples of miRNA-mediated gene regulation have been identified in development, cell survival, and metabolic pathways. In addition, the involvement of miRNA processes in human disease has been elucidated or inferred. In the case of cancer, it is speculated that some miRNAs can be classified as tumor suppressors, since the loss of their activity is associated with cancer progression. A role for miRNAs in neurological disorders is also suggested by the example of fragile X syndrome. Fragile X syndrome is caused by expansion of a trinucleotide (CGG) repeat in the FMR1 gene, and the product of the FMR1 gene, FMRP, is an RNA-binding protein that associates with miRNAs.

A comprehensive database of miRNA genes and miRNA targets can be found at the miRBase site.

Michael W King, PhD | © 1996–2017, LLC | info @

Last modified: September 28, 2017