Regulation of Gene Expression


© 1996–2016, LLC | info @


The controls that act on gene expression (i.e., the ability of a gene to produce a biologically active protein) are much more complex in eukaryotes than in prokaryotes. A major difference is the presence in eukaryotes of a nuclear membrane, which prevents the simultaneous transcription and translation that occurs in prokaryotes. Whereas, in prokaryotes, control of transcriptional initiation is the major point of regulation, in eukaryotes the regulation of gene expression is controlled nearly equivalently from many different points.













Gene Control in Prokaryotes

In bacteria, genes are clustered into operons: gene clusters that encode the proteins necessary to perform a coordinated function, such as biosynthesis of a given amino acid. RNA that is transcribed from prokaryotic operons is polycistronic, a term implying that multiple proteins are encoded in a single transcript.

In bacteria, control of the rate of transcriptional initiation is the predominant site for control of gene expression. For the majority of prokaryotic genes, initiation is controlled by two DNA sequence elements that lie approximately 35 bases and 10 bases, respectively, upstream of the site of transcriptional initiation and as such are identified as the -35 and -10 positions. These two sequence elements are termed promoter sequences because they promote recognition of transcriptional start sites by RNA polymerase. The consensus sequence for the -35 position is TTGACA, and for the -10 position, TATAAT. (The -10 position is also known as the Pribnow-box.) These promoter sequences are recognized and contacted by RNA polymerase.
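The idea of a promoter defined by two consensus hexamers can be sketched in a few lines of Python. The following is only an illustration: the function names, the tolerance of one mismatch, and the allowed spacer of 15 to 19 bases between the two elements are assumptions made for this sketch, not values given in the text, and the test sequence is invented.

```python
# Consensus hexamers quoted in the text.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"  # Pribnow-box


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mismatch=1, spacer_range=(15, 19)):
    """Return (pos35, pos10, spacer) tuples for candidate promoters.

    Positions are 0-based indices of the first base of each hexamer.
    max_mismatch and spacer_range are illustrative assumptions.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        if mismatches(seq[i:i + 6], MINUS_35) > max_mismatch:
            continue
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            j = i + 6 + spacer
            if j + 6 > len(seq):
                break
            if mismatches(seq[j:j + 6], MINUS_10) <= max_mismatch:
                hits.append((i, j, spacer))
    return hits


# Hypothetical sequence: perfect -35 and -10 elements, 17-base spacer.
example = "GGC" + "TTGACA" + "C" * 17 + "TATAAT" + "GCGCGCA"
print(find_promoters(example))
```

A real promoter search would score matches against position weight matrices rather than count mismatches; the sketch only shows how the two elements and their spacing jointly define a candidate site.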

The activity of RNA polymerase at a given promoter is in turn regulated by interaction with accessory proteins, which affect its ability to recognize start sites. These regulatory proteins can act both positively (activators) and negatively (repressors). The accessibility of promoter regions of prokaryotic DNA is in many cases regulated by the interaction of proteins with sequences termed operators. The operator region is adjacent to the promoter elements in most operons and in most cases the sequences of the operator bind a repressor protein. However, there are several operons in E. coli that contain overlapping sequence elements, one that binds a repressor and one that binds an activator.

As indicated above, prokaryotic genes that encode the proteins necessary to perform coordinated function are clustered into operons. Two major modes of transcriptional regulation function in bacteria (E. coli) to control the expression of operons. Both mechanisms involve repressor proteins. One mode of regulation is exerted upon operons that produce gene products necessary for the utilization of energy; these are catabolite-regulated operons. The other mode regulates operons that produce gene products necessary for the synthesis of small biomolecules such as amino acids. Expression from the latter class of operons is attenuated by sequences within the transcribed RNA.

A classic example of a catabolite-regulated operon is the lac operon, responsible for obtaining energy from β-galactosides such as lactose. A classic example of an attenuated operon is the trp operon, responsible for the biosynthesis of tryptophan.


The lac Operon

The lac operon (see diagram below) consists of one regulatory gene (the i gene) and three structural genes (z, y, and a). The i gene codes for the repressor of the lac operon. The z gene codes for β-galactosidase (β-gal), which is primarily responsible for the hydrolysis of the disaccharide lactose into its monomeric units, galactose and glucose. The y gene codes for permease, which increases permeability of the cell to β-galactosides. The a gene encodes a transacetylase.

During normal growth on a glucose-based medium, the lac repressor is bound to the operator region of the lac operon, preventing transcription. However, in the presence of an inducer of the lac operon, the repressor protein binds the inducer and is rendered incapable of interacting with the operator region of the operon. RNA polymerase is thus able to bind at the promoter region, and transcription of the operon ensues.

The lac operon is repressed, even in the presence of lactose, if glucose is also present. This repression is maintained until the glucose supply is exhausted. The repression of the lac operon under these conditions is termed catabolite repression and is a result of the low levels of cAMP that accompany an adequate glucose supply. This repression can be relieved, even in the presence of glucose, if excess cAMP is added. As the level of glucose in the medium falls, the level of cAMP increases. Simultaneously there is an increase in inducer binding to the lac repressor. The net result is an increase in transcription from the operon.

The ability of cAMP to activate expression from the lac operon results from an interaction of cAMP with a protein termed CRP (for cAMP receptor protein). The protein is also called CAP (for catabolite activator protein). The cAMP-CRP complex binds to a region of the lac operon that lies just upstream of the region bound by RNA polymerase and that somewhat overlaps the repressor binding site of the operator region. The binding of the cAMP-CRP complex to the lac operon stimulates RNA polymerase activity 20-to-50-fold.
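The two control inputs described above (inducer release of the repressor, and cAMP-CRP activation) can be summarized as a small truth table. The Python sketch below is a deliberately qualitative model: the function name and the output labels are hypothetical, and it assumes that inducer is present whenever lactose is and that cAMP is high whenever glucose is absent.

```python
def lac_operon_state(lactose: bool, glucose: bool) -> str:
    """Qualitative lac operon output for a given medium.

    Simplifying assumptions of this sketch: inducer is present whenever
    lactose is, and cAMP is high whenever glucose is absent.
    """
    repressor_on_operator = not lactose  # inducer releases the repressor
    camp_crp_bound = not glucose         # low glucose -> high cAMP -> cAMP-CRP binds
    if repressor_on_operator:
        return "off"
    # Repressor released: output depends on cAMP-CRP stimulation.
    return "high" if camp_crp_bound else "low (catabolite repression)"


for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> "
              f"{lac_operon_state(lactose, glucose)}")
```

Only the combination of lactose present and glucose absent yields full expression, which matches the sequence of events described for a culture that exhausts its glucose supply.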

The lac operon of E. coli

Regulation of the lac operon in E. coli. The repressor of the operon is synthesized from the i gene. The repressor protein binds to the operator region of the operon and prevents RNA polymerase from transcribing the operon. In the presence of an inducer (such as the natural inducer, allolactose) the repressor is inactivated by interaction with the inducer. This allows RNA polymerase access to the operon and transcription proceeds. The resultant mRNA encodes the β-galactosidase, permease and transacetylase activities necessary for utilization of β-galactosides (such as lactose) as an energy source. The lac operon is additionally regulated through binding of the cAMP-receptor protein, CRP (also termed the catabolite activator protein, CAP), to sequences near the promoter domain of the operon. The result is as much as a 50-fold enhancement of polymerase activity.


The trp Operon

The trp operon (see diagram below) contains the genes that encode the enzymes for the synthesis of tryptophan. This cluster of genes, like the lac operon, is regulated by a repressor that binds to the operator sequences. The activity of the trp repressor for binding the operator region is enhanced when it binds tryptophan; in this capacity, tryptophan is known as a corepressor. Since the activity of the trp repressor is enhanced in the presence of tryptophan, the rate of expression of the trp operon is graded in response to the level of tryptophan in the cell.

Expression of the trp operon is also regulated by attenuation. The attenuator region, which is composed of sequences found within the transcribed RNA, is involved in controlling transcription from the operon after RNA polymerase has initiated synthesis. The attenuator sequences are found near the 5' end of the RNA, in what is termed the leader region of the RNA. The leader sequences are located prior to the start of the coding region for the first gene of the operon (the trpE gene). The attenuator region contains codons for a small leader polypeptide that contains tandem tryptophan codons. This region of the RNA is also capable of forming several different stable stem-loop structures.

Depending on the level of tryptophan in the cell, and hence the level of charged trp-tRNAs, the position of ribosomes on the leader polypeptide and the rate at which they are translating allow different stem-loops to form. If tryptophan is abundant, the ribosome rapidly covers regions 1 and 2 of the leader, which prevents stem-loop 2-3 from forming and thereby favors stem-loop 3-4. The latter is found near a region rich in uracil and acts as the transcriptional terminator loop as described in the RNA synthesis page. Consequently, RNA polymerase is dislodged from the template.

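The operons coding for genes necessary for the synthesis of a number of other amino acids are also regulated by this attenuation mechanism. It should be clear, however, that this type of transcriptional regulation is not feasible in eukaryotic cells: attenuation depends on ribosomes translating the leader while the RNA is still being transcribed, and the nuclear membrane separates transcription from translation.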

The trp operon of E. coli

Regulation of the trp operon in E. coli. The trp operon is controlled both by a repressor protein binding to the operator region and by translation-induced transcriptional attenuation. The trp repressor binds the operator region of the trp operon only when bound to tryptophan. This makes tryptophan a co-repressor of the operon. The trpL gene encodes a non-functional leader peptide which contains several adjacent trp codons. The structural genes of the operon responsible for tryptophan biosynthesis are trpE, D, C, B and A. When tryptophan levels are high, some binds to the repressor, which then binds to the operator region and inhibits transcription. The mechanism of attenuation of the trp operon is diagrammed below.

Attenuation of the trp operon of E. coli

Attenuation of the trp operon. The attenuation region of the trp operon contains sequences that allow the resulting mRNA to form several different stem-loop structures. These regions are identified as 1 through 4. The stem-loops that determine whether or not transcription is attenuated are formed between regions 2 and 3 or between regions 3 and 4. When tryptophan levels are high there are plenty of charged trp-tRNAs available, and ribosomes translating the leader peptide encoded by the trpL gene do not stall at the repeated trp codons in the leader peptide. Under these conditions the ribosomes rapidly cover regions 1 and 2 of the mRNA, which allows the stem-loop composed of regions 3 and 4 to form. The stem-loop formed by regions 3-4 results in a transcriptional termination structure and transcription of the trp operon ceases, i.e., is attenuated. Conversely, when tryptophan levels are low the level of charged trp-tRNAs will also be low. This leads to a stalling of the ribosomes within the leader peptide when they encounter the trp codon repeats. The ribosome stalls over region 1 of the mRNA, which allows stem-loop 2-3 to form and prevents the transcriptional termination stem-loop 3-4 from forming. The inability of this structure to form allows the entire operon to be transcribed and the tryptophan biosynthetic enzymes to be produced.
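The decision logic of attenuation described above can be written out as a short Python sketch. It is a qualitative model only: the function name and the returned dictionary keys are hypothetical, and the availability of charged trp-tRNA is reduced to a single boolean.

```python
def trp_attenuation(charged_trp_trna_high: bool) -> dict:
    """Qualitative model of trp leader stem-loop choice.

    Regions 1-4 of the leader RNA are numbered as in the text. A region
    covered by the ribosome cannot take part in a stem-loop.
    """
    if charged_trp_trna_high:
        covered = {1, 2}  # ribosome reads through the trp codons
    else:
        covered = {1}     # ribosome stalls at the trp codons in region 1

    # Stem-loop 2-3 forms first if region 2 is free; it sequesters
    # region 3, so the 3-4 terminator cannot form.
    if 2 not in covered:
        stem_loop = (2, 3)
    else:
        stem_loop = (3, 4)

    terminated = stem_loop == (3, 4)
    return {
        "covered_regions": sorted(covered),
        "stem_loop": stem_loop,
        "operon_transcribed": not terminated,
    }


print(trp_attenuation(charged_trp_trna_high=True))
print(trp_attenuation(charged_trp_trna_high=False))
```

In the cell the outcome is graded across a population of transcripts rather than all-or-none, so the boolean input should be read as the limiting cases of very high or very low tryptophan.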


Gene Control in Eukaryotes

In eukaryotic cells, the ability to express biologically active proteins comes under regulation at several points:

1. Chromatin Structure: The physical structure of the DNA, as it exists compacted into chromatin, can affect the ability of transcriptional regulatory proteins (termed transcription factors) and RNA polymerases to find access to specific genes and to activate transcription from them. Histone modifications and CpG methylation are the factors that most affect the accessibility of chromatin to RNA polymerases and transcription factors.

2. Epigenetic Control: Epigenesis refers to changes in the pattern of gene expression that are not due to changes in the nucleotide composition of the genome. Literally, "epi" means "on"; thus, epigenetics means "on" the gene as opposed to "by" the gene.

3. Transcriptional Initiation: This is the most important mode for control of eukaryotic gene expression (see below for more details). Specific factors that exert control include the strength of promoter elements within the DNA sequences of a given gene, the presence or absence of enhancer sequences (which enhance the activity of RNA polymerase at a given promoter by binding specific transcription factors), and the interaction between multiple activator proteins and inhibitor proteins.

4. Transcript Processing and Modification: Eukaryotic mRNAs must be capped and polyadenylated, and the introns must be accurately removed (see RNA Synthesis Page). Several genes have been identified that undergo tissue-specific patterns of alternative splicing, which generate biologically different proteins from the same gene.

5. RNA Transport: A fully processed mRNA must leave the nucleus in order to be translated into protein.

6. Transcript Stability: Unlike prokaryotic mRNAs, whose half-lives are all in the range of 1 to 5 minutes, eukaryotic mRNAs can vary greatly in their stability. Certain unstable transcripts have sequences (predominantly, but not exclusively, in the 3'-non-translated regions) that are signals for rapid degradation.

7. Translational Initiation: Since many mRNAs have multiple methionine codons, the ability of ribosomes to recognize and initiate synthesis from the correct AUG codon can affect the expression of a gene product. Several examples have emerged demonstrating that some eukaryotic proteins initiate at non-AUG codons. This phenomenon has been known to occur in E. coli for quite some time, but only recently has it been observed in eukaryotic mRNAs.

8. Small RNAs and Control of Transcript Levels: Within the past several years a new model of gene regulation has emerged that involves control exerted by small non-coding RNAs. This small RNA-mediated control can be exerted at the level of the translatability of the mRNA, at the level of the stability of the mRNA, or via changes in chromatin structure.

9. Post-Translational Modification: Common modifications include glycosylation, acetylation, fatty acylation, and disulfide bond formation.

10. Protein Transport: In order for proteins to be biologically active following translation and processing, they must be transported to their site of action.

11. Control of Protein Stability: Many proteins are rapidly degraded, whereas others are highly stable. Specific amino acid sequences in some proteins have been shown to bring about rapid degradation.


Chromatin Structure and Control of Gene Expression

In a broad consideration of chromatin structure there are two forms, heterochromatin and euchromatin, which were originally designated based on cytological observations of how darkly the two regions stained. Heterochromatin is more densely packed than euchromatin and is often found near the centromeres of the chromosomes. Heterochromatin is generally transcriptionally silent. Euchromatin, on the other hand, is more loosely packed and is where active gene transcription takes place.

Although it is possible to predict transcriptionally active regions of chromatin based on cytological assays, research over the past few decades has begun to define the molecular basis for chromatin structure in the context of regulation of gene expression. Two primary mechanisms exist that alter chromatin structure and, as a consequence, effect alterations in gene expression. These mechanisms are histone modification and methylation of cytidine residues in the DNA that are found in the dinucleotide CG (most often written as a CpG dinucleotide). Methylation as a modification of DNA was addressed in the DNA Metabolism page; here the discussion expands to define how this modification alters the pattern of gene expression.

When determining which C residues in DNA are targets for methylation it was discovered that greater than 90% of 5-methyl-C (5mC) is found in the dinucleotide CpG. This is not to say that all CpG dinucleotides contain a methylated C residue. When the structure of eukaryotic genes is examined for regions of CpG dinucleotides, the promoter regions of genes are found to contain 10-20 times as many CpGs as the rest of the genome. In a general sense, what is known about DNA methylation and transcriptional status is that when the regions of a gene that can be methylated are methylated, the associated gene is transcriptionally silent, and when those regions are undermethylated the gene is transcriptionally active or can be activated. When cells undergo differentiation it has been observed that genes that become transcriptionally activated exhibit a reduction in methylation status relative to the level prior to activation and that this undermethylation remains even after transcription ceases. The role of DNA methylation in controlling transcriptional activity was first demonstrated by treating cells in culture with the cytidine analog 5-azacytidine (5-azaC), which has a nitrogen at position 5 of the pyrimidine ring instead of a carbon and thus cannot serve as a substrate for methylation. When fibroblasts were grown in the presence of 5-azaC they differentiated into myoblasts. This differentiation was shown to be the result of undermethylation and activation of the MyoD gene (a master regulator of muscle differentiation).
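The comparison of CpG content between promoter regions and the rest of the genome amounts to counting CG dinucleotides per unit length. A minimal Python sketch follows; the function names and both example sequences are invented for illustration and are not real genomic sequences.

```python
def cpg_count(seq: str) -> int:
    """Count CG dinucleotides (CpG sites) on the given strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def cpg_per_100bp(seq: str) -> float:
    """CpG sites per 100 bases of sequence."""
    return 100.0 * cpg_count(seq) / len(seq)


# Hypothetical sequences of equal length (22 bases each).
promoter_like = "GCGCGTACGCGGCGCGATCGCG"
bulk_like = "ATTGCATTACAGTTACGTTAAC"

for name, seq in (("promoter-like", promoter_like), ("bulk-like", bulk_like)):
    print(f"{name}: {cpg_count(seq)} CpG, {cpg_per_100bp(seq):.1f} per 100 bp")
```

Because CG reads the same on both strands (its reverse complement is also CG), counting on one strand is enough to locate every CpG site in the duplex.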

The methylation of DNA is catalyzed by several different DNA methyltransferases (abbreviated Dnmt). The critical role of DNA methylation in controlling developmental fates was demonstrated in mice by inactivating either Dnmt3a or Dnmt3b. Loss of either gene resulted in death shortly after birth. When cells divide the DNA contains one strand of parental DNA and one strand of the newly replicated DNA (the daughter strand). If the DNA contains methylated cytidines in CpG dinucleotides the daughter strand must undergo methylation in order to maintain the parental pattern of methylation. This "maintenance" methylation is catalyzed by Dnmt1 and thus, this enzyme is called the maintenance methylase.

process of post-replicative DNA methylation

Process of DNA methylation following DNA replication. Sites of DNA methylation have two fates following the process of DNA replication: they can be maintained or they can be progressively removed. Following replication the parental (template) strands of DNA contain 5mCpG, whereas the reciprocal C residue in the daughter strand is not methylated. If the methylation state of the gene is to be maintained then the maintenance methylase, Dnmt1, recognizes the hemi-methylated site and incorporates a methyl group into the C residue of the daughter strand CpG dinucleotide.
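The two fates described in the caption can be illustrated with a toy model in which each CpG site is a boolean (methylated or not). This is a sketch under stated assumptions: the function names are hypothetical, only one of the two daughter duplexes is followed, and maintenance is treated as perfectly efficient.

```python
def replicate(parent_sites):
    """Return (template, daughter) methylation states just after replication.

    The template strand keeps the parental marks; the newly synthesized
    daughter strand starts out unmethylated at every CpG.
    """
    return list(parent_sites), [False] * len(parent_sites)


def dnmt1_maintenance(template, daughter):
    """Methylate the daughter-strand C wherever the site is hemi-methylated."""
    return [d or t for t, d in zip(template, daughter)]


parental = [True, False, True, True]  # hypothetical pattern at four CpG sites

template, daughter = replicate(parental)
print("without maintenance:", daughter)
print("with maintenance:   ", dnmt1_maintenance(template, daughter))
```

Without the maintenance step the daughter strand carries no marks, so the methylated fraction is diluted at every round of replication, which corresponds to the progressive removal mentioned in the caption.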

The correlation between DNA methylation and chromatin structure as it relates to transcriptional activity is demonstrated by the observation that there are several proteins that bind to methylated CpGs but not to unmethylated CpGs. One such protein is MeCP2 (methyl-CpG-binding protein 2). When MeCP2 binds to methylated CpG dinucleotides the DNA takes on a closed chromatin structure, which leads to transcriptional repression. The ability of MeCP2 to bind methylated CpGs is in turn controlled by its state of phosphorylation. When MeCP2 is phosphorylated it binds with less affinity and the DNA acquires a more open chromatin state. The importance of MeCP2 in regulating chromatin structure and consequently transcription is demonstrated by the fact that deficiencies in this protein result in Rett syndrome. Rett syndrome is a neurodevelopmental disorder that occurs almost exclusively in females, manifesting as mental retardation, seizures, microcephaly, arrested development, and loss of speech.

As described in the DNA Metabolism page, histone proteins are subject to a number of modifications, and these modifications are known to affect the structure of chromatin. Histone acetylation is known to result in a more open chromatin structure and these modified histones are found in regions of the chromatin that are transcriptionally active. Conversely, underacetylation of histones is associated with closed chromatin and transcriptional inactivity. A direct correlation between histone acetylation and transcriptional activity was demonstrated when it was discovered that protein complexes previously known to be transcriptional activators have histone acetylase activity. And as expected, transcriptional repressor complexes were found to contain histone deacetylase activity. Linkage between DNA methylation and transcriptional silencing was demonstrated by the observation that proteins that bind to methyl CpG dinucleotides can recruit histone deacetylases to the DNA. Certain proteins interact with acetylated lysines in histones, and these interactions contribute to a more open chromatin structure. Proteins that bind to acetylated histones contain a domain called a bromodomain. The bromodomain is composed of a bundle of four α-helices and is a domain involved in protein-protein interactions in a number of cellular systems in addition to acetylated histone binding and chromatin structure modification.

Another histone modification known to affect chromatin structure is methylation. However, with histone methylation there is not a direct correlation between the modification and a specific effect on transcription. The methylation of histone H4 on R3 (arginine at position 3) promotes an open chromatin structure and thereby leads to transcriptional activation. Methylation of histone H3 on K4 and K79 (lysines 4 and 79) has been shown to act similarly to histone H4 R3 methylation. However, methylation of histone H3 on K9 and K27 is known to be associated with transcriptionally inactive genes. At these sites the methylation of histones provides a site for the binding of other proteins, which then leads to alteration of chromatin structure to a more compacted state. Proteins that bind to methylated lysines present in histones (as well as other proteins) contain a domain called a chromodomain. The chromodomain consists of a conserved stretch of 40-50 amino acids and is found in many proteins involved in chromatin remodeling complexes. In addition, chromodomain proteins are found in the RNA-induced transcriptional silencing (RITS) complex, which is involved in small interfering RNA (siRNA)- and microRNA (miRNA)-mediated downregulation of transcription (see below).

Histone proteins can also be modified by addition of the small protein ubiquitin. With respect to the histones, ubiquitin is found only on histones H2A and H2B, and only a small percentage of histone H2A is found ubiquitinated. However, when ubiquitinated, H2A is associated with repression of transcription. The exact opposite effect is observed when histone H2B is ubiquitinated, leading to a stimulation of gene activity. The reason that ubiquitinated histone H2B is associated with transcriptional activity is that this modification promotes the methylation of histone H3 at K4 and K79, which as indicated above is associated with open chromatin structure.

Phosphorylation of histones occurs primarily in response to outside signals such as growth factor stimulation or stress inducers such as heat shock. Phosphorylated histones are localized to genes that become transcriptionally active as a consequence of these outside signals. The importance of histone phosphorylation in control of gene expression can be demonstrated in patients with Coffin-Lowry syndrome. This disease results from defects in the RSK2 gene, which encodes the histone phosphorylating enzyme. Coffin-Lowry syndrome is a rare form of X-linked mental retardation characterized by skeletal malformations, growth retardation, hearing deficit, paroxysmal movement disorders, and cognitive impairment in affected males.


Epigenetic Control of Gene Expression

The term epigenetics was first coined by Conrad Waddington in 1939 to define the unfolding of the genetic program during development. In addition, he coined the term epigenotype to define "the total developmental system consisting of interrelated developmental pathways through which the adult form of an organism is realized". Clearly this definition encompasses a broad range of concepts dealing with genetics, inheritance and development. Today the term epigenetics is used to define the mechanism by which changes in the pattern of inherited gene expression occur in the absence of alterations or changes in the nucleotide composition of a given gene. A literal interpretation is that epigenetics means "in addition to changes in genome sequence." The easiest way to understand this concept is to think about the fertilized egg: at the moment of fertilization that single cell is totipotent, i.e., as it divides the daughter cells ultimately differentiate into all the different cells of the organism. The only differences between the various cells of the resultant organism are the consequences of differential gene expression; they are not due to differences in the sequences of the genes themselves. Evidence indicates that most of the epigenetic modifications are erased during gametogenesis and/or following fertilization.

Several different types of epigenetic events have been identified. As described in the section above on chromatin structure and its control of gene expression, DNA methylation is likely to be the most important epigenetic event controlling, and importantly maintaining, the pattern of gene expression during development. Other modification events are also known to effect epigenetic phenomena, including acetylation, methylation, phosphorylation, ubiquitylation and sumoylation of histone proteins. Thus, it should be clear that the same events that affect chromatin structure can be defined as epigenetic events. An additional process that affects chromatin structure, and therefore gene expression, is also considered an epigenetic event; it involves the small interfering RNAs (siRNAs) described below.

Whereas epigenesis plays a vital role in the regulation, control, and maintenance of gene expression leading to the many differentiation states of cells in an organism, recent evidence has identified a linkage between epigenetic processes and disease. Most significant is the link between epigenesis and cancer: epigenetic change has been suggested to be a contributing factor in nearly half of all cancers. A clear link has been demonstrated between changes in the methylation status of tumor suppressor genes and the development of many types of cancers. Epigenetic effects on immune system function have also been identified. In addition, there is evidence suggesting a link between epigenetic processes and mental health.


Control of Eukaryotic Transcription Initiation

Transcription of the different classes of RNAs in eukaryotes is carried out by three different polymerases (see RNA Synthesis Page). RNA pol I synthesizes the rRNAs, except for the 5S species. RNA pol II synthesizes the mRNAs and some small nuclear RNAs (snRNAs) involved in RNA splicing. RNA pol III synthesizes the 5S rRNA and the tRNAs. The vast majority of eukaryotic RNAs are subjected to post-transcriptional processing.

The most complex controls observed in eukaryotic genes are those that regulate the expression of RNA pol II-transcribed genes, the mRNA genes. Almost all eukaryotic mRNA genes contain a basic structure consisting of coding exons and non-coding introns, basal promoters of two types, and any number of different transcriptional regulatory domains (see diagrams below). The basal promoter elements are termed CCAAT-boxes and TATA-boxes because of their sequence motifs. The TATA-box resides 20 to 30 bases upstream of the transcriptional start site and is similar in sequence to the prokaryotic Pribnow-box (consensus TATAT/AAT/A, where T/A indicates that either base may be found at that position).

Structure of a typical eukaryotic mRNA gene

Typical structure of a eukaryotic mRNA gene. Eukaryotic mRNA genes have a general regulatory structure composed of the two basal promoter elements, the TATA-box and the CCAAT-box. In addition there may be one or more enhancer elements associated with the regulatory region of the gene.

Numerous proteins identified as TFIIA, B, C, etc. (for transcription factors regulating RNA pol II), have been observed to interact with the TATA-box. The CCAAT-box (consensus GGT/CCAATCT) resides 50 to 130 bases upstream of the transcriptional start site. The protein identified as C/EBP (for CCAAT-box/Enhancer Binding Protein) binds to the CCAAT-box element.
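Degenerate consensus sequences such as TATAT/AAT/A and GGT/CCAATCT map naturally onto regular expression character classes. The Python sketch below shows one way to locate both basal elements in an upstream sequence and report their positions relative to the transcriptional start site. The function name, the coordinate convention (the last base of the input is position -1), and the example sequence are assumptions made for illustration.

```python
import re

# T/A and T/C in the consensus become the character classes [TA] and [TC].
TATA_BOX = re.compile(r"TATA[TA]A[TA]")
CCAAT_BOX = re.compile(r"GG[TC]CAATCT")


def find_basal_elements(upstream: str) -> dict:
    """Locate TATA-box and CCAAT-box matches in an upstream sequence.

    `upstream` is the sequence 5' of the start site, so its last base is
    position -1. Each reported position is that of the first base of a match.
    """
    upstream = upstream.upper()
    n = len(upstream)
    elements = (("TATA-box", TATA_BOX), ("CCAAT-box", CCAAT_BOX))
    return {
        name: [m.start() - n for m in pattern.finditer(upstream)]
        for name, pattern in elements
    }


# Hypothetical 125-base upstream region with one copy of each element.
upstream = "G" * 20 + "GGCCAATCT" + "C" * 66 + "TATAAAA" + "G" * 23
print(find_basal_elements(upstream))
```

Exact consensus matching is strict; many real promoters diverge from the consensus or lack one of the elements entirely, so an empty list for a given gene would not be surprising.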

There are many other regulatory sequences in mRNA genes, as well, that bind various transcription factors (see diagram below). These regulatory sequences are predominantly located upstream (5') of the transcription initiation site, although some elements occur downstream (3') or even within the genes themselves. The number and type of regulatory elements to be found varies with each mRNA gene. Different combinations of transcription factors also can exert differential regulatory effects upon transcriptional initiation. The various cell types each express characteristic combinations of transcription factors; this is the major mechanism for cell-type specificity in the regulation of mRNA gene expression.

Structure of the regulatory regions of a typical eukaryotic mRNA gene

Structure of the upstream region of a typical eukaryotic mRNA gene. The diagram indicates that the TATA-box and CCAAT-box basal elements reside near nucleotide positions –25 and –100, respectively. The transcription factor TFIID has been shown to contain the TATA-box binding protein, TBP. Several additional transcription factor binding sites have been included and are shown to reside upstream of the two basal elements and of the transcriptional start site. The location and order of the variously indicated transcription factor-binding sites are only diagrammatic and are not indicative of all eukaryotic mRNA genes. There exists a vast array of different transcription factors that regulate the transcription of all three classes of eukaryotic genes, those encoding the mRNAs, tRNAs and rRNAs. CREB: cAMP response element binding protein. C/EBP: CCAAT-box/enhancer binding protein.


Structural Motifs in Eukaryotic Transcription Factors

Homeodomain: The homeodomain is a highly conserved domain of 60 amino acids found in a large family of transcription factors. This family was first identified in Drosophila as a group of genes that, when altered, would cause transformations of one body part into another (e.g., legs for antennae), so-called homeotic transformations. This class of genes has been identified in both invertebrate and vertebrate organisms. The homeodomain itself forms a structure highly similar to the bacterial helix-turn-helix proteins. The principal function of all homeodomain-containing proteins is in the establishment of pattern in an organism, such as that of the spinal column in vertebrates.

POU Domain: The POU domain is a hybrid between a domain related to the homeodomain and a POU-specific domain. The term POU was derived from the names of the first three factors shown to have a region of similarity: Pit-1 (a pituitary-specific transcription factor), Oct-1 (an octamer binding protein first shown to regulate immunoglobulin gene transcription) and unc-86 (a nematode gene).

Helix-Loop-Helix (HLH): The HLH domain is involved in protein dimerization. The HLH motif is composed of two regions of α-helix separated by a region of variable length which forms a loop between the 2 α-helices. This motif is quite similar to the Helix-turn-helix motif found in several prokaryotic transcription factors such as the CRP protein involved in the regulation of the lac operon. The α-helical domains are structurally similar and are necessary for protein interaction with sequence elements that exhibit a twofold axis of symmetry. This class of transcription factor most often contains a region of basic amino acids located on the N-terminal side of the HLH domain (termed bHLH proteins) that is necessary in order for the protein to bind DNA at specific sequences. The HLH domain is necessary for homo- and heterodimerization. Examples of bHLH proteins include MyoD (a myogenesis inducing transcription factor) and MYC (originally identified as a retroviral oncogene). Several HLH proteins that do not contain the basic region act as repressors because of this lack. These HLH proteins repress the activity of other bHLH proteins by forming heterodimers with them and preventing DNA binding.

Zinc Fingers: The zinc finger domain is a DNA-binding motif consisting of specific spacings of cysteine and histidine residues that allow the protein to bind zinc atoms. The metal atom coordinates the sequences around the cysteine and histidine residues into a finger-like domain. The finger domains can interdigitate into the major groove of the DNA helix. The spacing of the zinc finger domain in this class of transcription factor coincides with a half-turn of the double helix. The classic example is the RNA pol III transcription factor, TFIIIA. Proteins of the steroid/thyroid hormone family of transcription factors also contain zinc fingers.

Leucine Zipper: The leucine zipper domain is necessary for protein dimerization. It is a motif generated by a repeating distribution of leucine residues spaced 7 amino acids apart within α-helical regions of the protein. These leucine residues end up with their R-groups protruding from the α-helical domain in which they reside. The protruding R-groups are thought to interdigitate with the leucine R-groups of another leucine zipper domain, thus stabilizing homo- or heterodimerization. The leucine zipper domain is present in many DNA-binding proteins, such as MYC and C/EBP.
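The 7-residue spacing described above can be illustrated with a short scan of a protein sequence. This is only a minimal sketch: the function name, the threshold of 4 consecutive leucines, and the test peptide are all assumptions made for illustration, and a real zipper prediction would also need to confirm that the region is α-helical.

```python
def find_leucine_heptads(protein: str, min_repeats: int = 4) -> list[int]:
    """Return 0-based start positions of runs in which leucine (L) occurs
    at every 7th residue for at least `min_repeats` consecutive repeats.

    `min_repeats` is an illustrative threshold, not a value from the text.
    """
    protein = protein.upper()
    starts = []
    for i in range(len(protein)):
        # Only consider a leucine that begins a run, not one that
        # continues a run started 7 residues earlier.
        if protein[i] != "L" or (i >= 7 and protein[i - 7] == "L"):
            continue
        count = 0
        j = i
        while j < len(protein) and protein[j] == "L":
            count += 1
            j += 7
        if count >= min_repeats:
            starts.append(i)
    return starts


# Hypothetical peptide: four leucines spaced exactly 7 residues apart.
toy_peptide = "MK" + "LAAAAAA" * 4 + "GS"
print(find_leucine_heptads(toy_peptide))
```

For the toy peptide the run of leucines begins at index 2, so that is the only start position reported.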

Winged Helix: The winged helix is a DNA-binding motif composed of an α/β structure. This structure contains 3 N-terminal α-helices and a 3-stranded antiparallel β-sheet. The folding of the β-sheet region about the α-helices gives the appearance of wings on the helices, hence the term winged helix. This motif was first identified in the transcription factor HNF-3γ. HNF-3γ is a member of a large family of transcription factors that are related to the Drosophila gene forkhead, hence the gene family is termed the fork head (FKH) family. The nomenclature of the fork head family of transcription factors has been changed so that all member names begin with Fox.

Table of Representative Transcription Factors

Factor | Sequence Motif | Comments
--- | --- | ---
MYC and MAX | CACGTG | MYC first identified as retroviral oncogene; MAX specifically associates with MYC in cells
FOS and JUN | TGA(C/G)T(C/A)A | both first identified as retroviral oncogenes; associate in cells, also known as the factor AP-1
CREB | TGACG(C/T)(C/A)(G/A) | binds to the cAMP response element (CRE); family of at least 10 factors resulting from different genes or alternative splicing; can form dimers with JUN
ERBA; also TR (thyroid hormone receptor) | GTGTCAAAGGTCA | first identified as retroviral oncogene; member of the steroid/thyroid hormone receptor superfamily; binds thyroid hormone
ETS | (G/C)(A/C)GGA(A/T)G(T/C) | first identified as retroviral oncogene; predominates in B- and T-cells
GATA | (T/A)GATA | family of erythroid cell-specific factors, GATA-1 to -6
MYB | (T/C)AAC(G/T)G | first identified as retroviral oncogene; hematopoietic cell-specific factor
MYOD | CAACTGAC | master control of muscle cell differentiation
NFκB and REL | GGGA(A/C)TN(T/C)CC (1) | both factors identified independently; REL first identified as retroviral oncogene; predominate in B- and T-cells
RAR (retinoic acid receptor) | ACGTCATGACCT | binds to elements termed RAREs (retinoic acid response elements); also binds to JUN/FOS site
SRF (serum response factor) | GGATGTCCATATTAGGACATCT | exists in many genes that are inducible by the growth factors present in serum

The list is only representative of the hundreds of identified factors. Some emphasis is placed on factors that exhibit oncogenic potential.

(1) N signifies that any base can occupy that position.
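A consensus motif with alternative or unrestricted positions can be searched for in a DNA sequence by translating it into a regular expression. The sketch below is illustrative only. It assumes the motif is written with standard IUPAC nucleotide codes (for example W for "A or T", so the GATA motif becomes WGATA) rather than the slash notation of the table. The function names and the test sequence are hypothetical, and only the strand given is scanned.

```python
import re

# Subset of the standard IUPAC nucleotide codes, mapped to regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "M": "[AC]", "K": "[GT]",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC-coded motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_sites(motif: str, dna: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched sequence) for every occurrence
    of the motif on the given strand, including overlapping ones."""
    # A lookahead with a capture group reports overlapping matches.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna.upper())]


# Hypothetical promoter fragment containing one GATA site and one MYC/MAX site.
fragment = "CCAGATAAGGCACGTGTT"
print(find_sites("WGATA", fragment))   # GATA family motif
print(find_sites("CACGTG", fragment))  # MYC and MAX motif
```

A real search would also scan the reverse-complement strand and would usually score matches against a position weight matrix rather than requiring an exact consensus match.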

Small RNAs and Post-Transcriptional Regulation

As recently as 15 years ago it was believed that the only non-coding RNAs were the tRNAs and the rRNAs of the translational machinery. However, a landmark study published in 1993 on the control of developmental timing in the roundworm Caenorhabditis elegans showed that the control of one gene was exerted by the small non-coding RNA product of another gene. This regulatory gene is identified as lin-4 (lin-4 controls the activity of the lin-14 gene product) and it codes for two RNAs: one of approximately 22 nucleotides (nt) and the other of approximately 61 nt. Examination of the sequence of the larger RNA revealed that it could form a stem-loop structure, which then serves as the precursor for the shorter RNA. The shorter lin-4 RNA is considered the founding member of a class of small regulatory RNAs of approximately 22 nt called microRNAs or miRNAs. It is predicted that at least 250 miRNA genes are present in the human genome.

The processing and functioning of miRNAs is similar to that of the RNA silencing pathway identified in plants, known as the posttranscriptional gene silencing (PTGS) pathway, and the RNA interference (RNAi) pathway in mammals. The RNAi pathway involves the enzymatic processing of double-stranded RNA into small interfering RNAs (siRNAs) of approximately 22–25 nt and may have evolved as a means to degrade the RNA genomes of RNA viruses such as retroviruses. The pathway of processing both miRNAs and siRNAs is diagrammed in the Figure below. The stem-loop of the primary miRNA gene transcript (pri-miRNA) is first cleaved in the nucleus by the RNase III-related activity called Drosha, which generates the precursor miRNA (pre-miRNA). In the siRNA pathway the duplex RNAs are cleaved into 22–25 nt pieces through the action of the enzyme Dicer in the cytosol. Processed miRNA stem-loop structures are transported from the nucleus to the cytosol via the activity of exportin 5. In the cytosol the processed miRNA stem-loop is targeted by Dicer, which removes the loop portion. The nomenclature of the mature miRNA duplex is miRNA:miRNA*, where the miRNA* strand is the non-functional half of the duplex. Ultimately, fully processed miRNAs and siRNAs are engaged by the RNA-induced silencing complex (RISC), which separates the two RNA strands. The active strand of RNA derived from either the miRNA or siRNA pathway is anti-sense to a region of the target mRNA.

Model for processing miRNAs and siRNAs. miRNA genes are transcribed as larger precursor RNAs that are then processed within the nucleus, via the action of the Drosha enzyme, to a pre-miRNA. The pre-miRNA is then transported to the cytosol. Within the cytosol the pre-miRNA is further processed via the actions of the Dicer complex and an RNA helicase to the functional single-stranded miRNA. The miRNA is engaged by the RISC and associates with the appropriate target mRNA. Following mRNA-miRNA interaction the mRNA is degraded, translationally inhibited, or both. The net result is a reduction (knock down) in gene expression at the level of a given mRNA and protein.

Two models exist for how siRNAs and miRNAs interfere with the expression of target genes: directed degradation of the target mRNA, or interference with the translation of the target mRNA. In the case of miRNA-directed mRNA degradation, the proposed model involves the complementary interaction of the miRNA with the mRNA and then the recruitment of the RISC, which ultimately leads to degradation of the target mRNA. In the translation repression model it is believed that the interaction of the miRNA and the RISC with the mRNA inhibits the progression of the ribosomal machinery along the mRNA without leading to mRNA degradation. This latter model was hypothesized because, in the example of lin-4, the amount of lin-14 mRNA does not decrease but the protein product of the lin-14 mRNA is reduced.
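The anti-sense relationship between the active strand and its target can be made concrete with a small sketch that looks for the region of an mRNA complementary to a guide RNA. The guide sequence, the toy mRNA, and the function names here are hypothetical. Requiring perfect Watson-Crick pairing along the full guide is a simplifying assumption of this example, not a claim about how RISC selects its targets.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_target_sites(guide: str, mrna: str) -> list[int]:
    """Return 0-based positions in `mrna` where the guide strand can pair
    along its full length (perfect complementarity only)."""
    site = reverse_complement(guide)
    mrna = mrna.upper()
    return [
        i for i in range(len(mrna) - len(site) + 1)
        if mrna.startswith(site, i)
    ]


# Hypothetical guide strand and a toy mRNA built to contain one target site.
guide = "AUGGCUUACGGA"
toy_mrna = "GGGAAA" + reverse_complement(guide) + "CCCUUU"
print(find_target_sites(guide, toy_mrna))
```

Because the toy mRNA places the complementary site after a 6 nt leader, the single site is reported at position 6.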

Regardless of the mechanism of action, the effect is post-transcriptional regulation of gene expression. To date numerous examples of miRNA-mediated gene regulation have been identified in development, cell survival and metabolic pathways. In addition, the involvement of miRNA processes in human disease has been elucidated or inferred. In the case of cancer it is speculated that some miRNAs can be classified as tumor suppressors, since the loss of their activity is associated with cancer progression. A role for miRNAs in neurodegenerative diseases is also suggested by the example of the fragile X syndrome. Fragile X syndrome is caused by expansion of a trinucleotide repeat in the FMR1 gene, and the product of the FMR1 gene, FMRP, is an RNA-binding protein that associates with miRNAs.

A comprehensive database of miRNA genes and miRNA targets can be found at the miRBase site.

Michael W King, PhD | © 1996–2016, LLC | info @

Last modified: January 28, 2016