
© 1996–2017, LLC | info @


All cells undergo a division cycle during their life span. Some cells are continually dividing (e.g. stem cells), others divide a specific number of times until cell death (apoptosis) occurs, and still others divide a few times before entering a terminally differentiated or quiescent state. Most cells of the body fall into the latter category of cells. During the process of cell division everything within the cell must be duplicated in order to ensure the survival of the two resulting daughter cells. Of particular importance for cell survival is the accurate, efficient and rapid duplication of the cellular genome. This process is termed DNA replication.













Eukaryotic Genomes

Eukaryotic genomes are vastly larger than those of prokaryotes. This is partly due to the greater complexity of eukaryotic organisms. However, the size of a particular eukaryotic genome is not directly correlated with the organism's complexity, largely because of the presence of large amounts of non-coding DNA. The functions of these non-coding nucleic acid sequences are only partly understood. Some sequences are involved in the control of gene expression, while others may simply be present in the genome as an evolutionary buffer able to withstand nucleotide mutation without disrupting the integrity of the organism.

One abundant class of non-coding DNA is termed repetitive DNA. There are two primary sub-classes of repetitive DNA: highly repetitive and moderately repetitive. Highly repetitive DNA can be further divided into two distinct subclasses, termed microsatellite DNA and minisatellite DNA. Microsatellite DNA consists of short repeat sequences 2–6 bp long reiterated 100,000–1,000,000 times. These microsatellite sequences are commonly called short tandem repeats (STR). Because the number of copies of the short repeat sequence is highly variable (i.e. polymorphic) between individuals, these elements are also referred to as short tandem repeat polymorphisms (STRP). Replication of STR DNA frequently results in mismatching of the DNA strands, which is normally corrected by a family of enzymes encoded by the mismatch repair (MMR) genes. Defects in MMR genes are strongly associated with an increased likelihood of certain types of cancer, in particular colorectal carcinomas. The accumulation of uncorrected mismatches in microsatellite DNA, as seen in diseases such as colorectal carcinomas, is known as microsatellite instability (MSI). Minisatellite DNA contains repeat sequences that are 10–60 bp in length. The number of copies of these repeats is also highly variable (polymorphic), and so this repetitive DNA is most commonly referred to as variable number tandem repeat (VNTR) DNA.
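To make the definition of a short tandem repeat concrete, the following Python sketch scans a DNA sequence for perfect tandem repeats of a 2–6 bp unit and reports the copy number at each locus. The function name `find_strs` and the minimum of three copies are illustrative assumptions, not a standard; real STR genotyping tools must also handle interrupted repeats, sequencing errors, and the two alleles of a diploid genome.

```python
import re


def find_strs(seq: str, min_unit: int = 2, max_unit: int = 6, min_copies: int = 3):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    min_copies is an arbitrary illustrative threshold (an assumption).
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # (.{n}) captures one repeat unit; \1{k,} demands k or more further copies
        pattern = re.compile(r"(.{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "CACA" is really the 2 bp unit "CA"; "AA" is a homopolymer)
            if any(
                unit_len % k == 0 and unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                continue
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), unit, copies))
    return hits


# A (CA)n microsatellite embedded in flanking sequence
print(find_strs("GGTCACACACACAGT"))  # [(3, 'CA', 5)]
```

Comparing the copy number reported at the same locus in two individuals is, in essence, what makes STRs useful as polymorphic markers.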

The portion of the genome consisting of the genes (coding sequences) is identified as non-repetitive DNA, since most genes occur only once in an organism's haploid genome. However, several genes exist as tandem clusters of multiple copies of the same gene, ranging from several to hundreds of copies, as is the case for the rRNA genes and the histone genes.

Another characteristic feature that distinguishes eukaryotic from prokaryotic genes is the presence of introns. Introns are stretches of nucleic acid sequence that separate the coding exons of a gene. Introns are extremely rare, or absent, in prokaryotes. Although the vast majority of human genes contain introns, numerous mRNA-encoding genes in the human genome contain none; notable intronless genes are the histone genes. In many genes, introns separate exons that encode distinct functional domains of the protein.
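The relationship between exons and introns can be sketched as a simple string operation: given a pre-mRNA sequence and the coordinates of its exons, splicing keeps the exons in order and discards the introns. This toy Python model assumes 0-based, end-exclusive exon coordinates and uses a made-up sequence; the function name `splice` is hypothetical, and the model ignores splice-site recognition and alternative splicing.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) and drop the introns between them."""
    prev_end = 0
    for start, end in exons:
        # Exons must be in order, non-overlapping, and inside the transcript
        if not (prev_end <= start < end <= len(pre_mrna)):
            raise ValueError(f"invalid exon interval {(start, end)}")
        prev_end = end
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: exon 1 + intron (GU...AG) + exon 2
pre = "AUGGCC" + "GUAAGUCCCAG" + "UUCUAA"
print(splice(pre, [(0, 6), (17, 23)]))  # AUGGCCUUCUAA
```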


Chromatin Structure

Histone Genes and Histone Proteins

Chromatin is a term designating the structure in which DNA exists within cells. The structure of chromatin is determined and stabilized through the interaction of the DNA with DNA-binding proteins. There are two classes of DNA-binding proteins. The histones are the major class of DNA-binding proteins involved in maintaining the compacted structure of chromatin. The other class is a diverse group of proteins called simply the non-histone proteins. This class includes the various transcription factors, polymerases, hormone receptors, and other nuclear enzymes. In any given cell there are more than 1000 different types of non-histone proteins bound to the DNA.

There are five primary histone proteins, identified as H1, H2A, H2B, H3, and H4, all of which are referred to as replication-dependent histones. These histone proteins are derived from large clusters of genes located on chromosomes 6p21–p22 (called the HIST1 cluster and containing 55 genes), 1q21 (called the HIST2 cluster and containing 6 genes), and 1q42 (called the HIST3 cluster and containing 3 genes). All of the histone mRNAs derived from the three histone gene clusters are transcribed from intronless genes and lack a poly(A) tail. Instead, the 3'-end of these replication-dependent histone pre-mRNAs contains a stem-loop structure followed by a purine-rich region that is complementary to the 5'-end of the U7 snRNA; cleavage between these two elements generates a mature mRNA that terminates with the stem-loop. All of the histone H1 genes are located within the HIST1 cluster. An additional histone gene that produces an H4 mRNA with the stem-loop structure is located on chromosome 12. In addition to the genes encoding the replication-dependent histones, there are several genes encoding histones termed replacement variant histones or replication-independent histones. The variant histones are identified as H2A.X (encoded by the H2AFX gene), H2A.Z (encoded by the H2AFZ gene), H3.3A (encoded by the H3F3A gene), H3.3B (encoded by the H3F3B gene), and H1.0 (encoded by the H1F0 gene). There are approximately 10–20 individual genes encoding each of the core histones of the nucleosome. The replication-dependent histone genes encoding the H2A and H2B proteins generate 10–20 different protein isoforms, the H3 genes encode two protein isoforms, and the H4 genes all encode identical proteins.

The binding of DNA by the replication-dependent histones generates a structure called the nucleosome. The nucleosome core contains an octameric protein structure consisting of two copies each of H2A, H2B, H3, and H4. Histone H1 occupies the internucleosomal DNA and is identified as the linker histone. The nucleosome core contains approximately 150 bp of DNA, and the linker DNA between adjacent nucleosomes can vary from 20 to more than 200 bp. If the DNA were pulled into a linear structure and observed under an electron microscope, these nucleosomal core structures would appear as "beads-on-a-string".
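The figures above allow a rough estimate of how many nucleosomes a stretch of DNA can accommodate, since each nucleosome repeat unit is a core (about 150 bp) plus its linker. This Python sketch assumes evenly spaced nucleosomes and a single fixed linker length, both simplifications; the 50 bp default and the function name `nucleosome_count` are illustrative choices, with 50 bp falling within the range given in the text.

```python
def nucleosome_count(dna_bp: int, core_bp: int = 150, linker_bp: int = 50) -> int:
    """Number of complete core + linker repeat units that fit in dna_bp."""
    repeat_bp = core_bp + linker_bp  # nucleosome repeat length
    return dna_bp // repeat_bp


for linker in (20, 50, 200):
    print(linker, nucleosome_count(10_000, linker_bp=linker))
# 20 58
# 50 50
# 200 28
```

Longer linkers mean fewer nucleosomes per unit length of DNA.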

Diagrammatic representation of a nucleosome


The nucleosome cores themselves coil into a solenoid, which in turn coils to further compact the DNA. These final coils are compacted further into the condensed chromatin characteristic of a metaphase karyotype spread. The protein-DNA structure of chromatin is stabilized by attachment to a non-histone protein scaffold called the nuclear matrix.

Hierarchy of chromatin structure


Broadly, chromatin exists in two forms, heterochromatin and euchromatin, which were originally distinguished cytologically by how darkly each stained. Heterochromatin is more densely packed than euchromatin and is often found near the centromeres of the chromosomes; it is generally transcriptionally silent. Euchromatin, on the other hand, is more loosely packed and is where active gene transcription takes place.

Two primary mechanisms operate dynamically to alter the overall structure of chromatin: methylation of cytidine residues found in the dinucleotide –CG– (most often written as a CpG dinucleotide), and modification of histone proteins. Because both DNA methylation and histone modifications can alter chromatin structure and, thereby, the transcriptional activity of genes, both types of modification are termed epigenetic processes. The term epigenetics refers to changes in phenotype that come about not through changes in the DNA sequence itself but through modifications to the DNA and its associated proteins. Methylation of cytidine residues as a post-replicative modification of DNA is examined in the next section.


Nucleotide Modifications Regulating Chromatin Structure

DNA Methylation: Formation of m5C (5mC)

When the C residues in DNA that are targets for methylation were identified, it was discovered that essentially all methyl-C is found in the dinucleotide CpG. The cytidine is methylated at the 5 position of the pyrimidine ring, generating 5-methylcytidine (designated m5C or 5mC). This is not to say that all CpG dinucleotides contain a methylated C residue. Mapping of CpG dinucleotides in eukaryotic genes shows that the promoter regions of genes contain 10–20 times as many CpGs as the rest of the genome. Examination of the global methylation status of the human genome has shown that 60–80% of all the CpG dinucleotides present in the genome contain m5C.
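CpG enrichment of the kind described for promoters is often quantified with an observed/expected ratio: the number of CpG dinucleotides actually present divided by the number expected from the sequence's C and G content. This measure comes from the widely used CpG island definition of Gardiner-Garden and Frommer rather than from this text, and the function names below are hypothetical. Their commonly cited criteria for a CpG island are a region of at least 200 bp with a GC fraction above 0.5 and an observed/expected CpG ratio above 0.6.

```python
def cpg_stats(seq: str) -> tuple[int, float, float]:
    """Return (CpG count, GC fraction, observed/expected CpG ratio)."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpGs = (C count * G count) / length, so obs/exp = CpG * n / (C * G)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return cpg, gc_fraction, obs_exp


def looks_like_cpg_island(seq: str) -> bool:
    # Gardiner-Garden & Frommer-style thresholds (assumed, see lead-in)
    _, gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= 200 and gc_fraction > 0.5 and obs_exp > 0.6


print(cpg_stats("CGCGCG"))    # (3, 1.0, 2.0)
print(cpg_stats("AATTAATT"))  # (0, 0.0, 0.0)
```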

In general, when the regions of a gene that can be methylated are methylated, the associated gene or genes are transcriptionally silent, and when the region is under-methylated, the genes are transcriptionally active or can be activated. When cells undergo differentiation, genes that become transcriptionally activated show reduced methylation relative to the level prior to activation, and this under-methylation persists even after transcription ceases. The correlation between DNA methylation and chromatin structure, as it relates to transcriptional activity, is discussed in greater detail in the Control of Gene Expression page.

The methylation of DNA is catalyzed by several different DNA methyltransferases (abbreviated DNMT). The methyl donor for the methylation reaction is S-adenosylmethionine (SAM; also abbreviated AdoMet). Humans express three DNMT genes, identified as DNMT1, DNMT3a, and DNMT3b. The DNMT1 gene is located on chromosome 19p13.2 and is composed of 41 exons that generate four alternatively spliced mRNAs encoding four distinct protein isoforms; DNMT1 isoform a, the largest, is a 1632 amino acid protein. The DNMT3a gene is located on chromosome 2p23 and is composed of 34 exons that generate six alternatively spliced mRNAs encoding four distinct proteins. The DNMT3b gene is located on chromosome 20q11.2 and is composed of 24 exons that generate six alternatively spliced mRNAs encoding six distinct protein isoforms. Of the three DNA methyltransferases, DNMT1 is the most abundant in all cells. Another gene, identified as DNMT3L (for DNMT3-like), encodes a protein with similarities to the DNA methyltransferases that lacks the catalytic amino acids required for methyltransferase activity. DNMT3L stimulates the DNA methyltransferase activity of DNMT3a and can also affect transcriptional activity through its association with histone deacetylase 1 (HDAC1). Another gene, originally designated DNMT2 and thought to be involved in DNA methylation, in fact encodes an enzyme that methylates a specific aspartic acid tRNA; the designation for this gene is now TRDMT1.

After DNA replication, each daughter DNA duplex contains one strand of parental DNA and one newly synthesized (daughter) strand. If the parental DNA contains methylated cytidines in CpG dinucleotides, the daughter strand must be methylated at the corresponding positions in order to maintain the parental pattern of methylation. This "maintenance" methylation is catalyzed by DNMT1, and thus this enzyme is called the maintenance methylase. Maintenance methylation also requires an obligate partner protein for DNMT1, called ubiquitin-like plant homeodomain and RING finger domain 1, which is encoded by the UHRF1 gene. The RING finger domain is a zinc-finger-like domain whose name derives from the term Really Interesting New Gene. The UHRF1 protein is required for recognition of the hemimethylated sites in the DNA following replication. Consistent with its primary function, DNMT1 has up to 100-fold higher activity toward hemimethylated DNA than toward unmethylated DNA. In contrast, DNMT3a and DNMT3b have roughly equivalent activity toward unmethylated and hemimethylated DNA, consistent with their role as de novo methyltransferases. The critical role of DNA methylation in development was demonstrated in mice by inactivating either the Dnmt3a or the Dnmt3b gene: Dnmt3a-null mice die at about four weeks of age, and Dnmt3b-null mice do not survive to birth.
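The logic of maintenance methylation can be sketched in a few lines of Python. This toy model, whose class and function names are hypothetical, tracks only whether each CpG site is methylated on each strand; it assumes perfect DNMT1/UHRF1 fidelity and ignores de novo methylation by DNMT3a and DNMT3b.

```python
from dataclasses import dataclass


@dataclass
class Duplex:
    """Methylation state at successive CpG sites, one flag per strand."""
    top: list[bool]
    bottom: list[bool]


def replicate(parent: Duplex) -> tuple[Duplex, Duplex]:
    # Semiconservative: each daughter keeps one parental strand and gains an
    # unmethylated new strand, so every methylated CpG becomes hemimethylated
    n = len(parent.top)
    return (Duplex(list(parent.top), [False] * n),
            Duplex([False] * n, list(parent.bottom)))


def hemimethylated_sites(d: Duplex) -> list[int]:
    # Sites where only one strand carries m5C (the sites UHRF1 recognizes)
    return [i for i, (t, b) in enumerate(zip(d.top, d.bottom)) if t != b]


def maintenance_methylate(d: Duplex) -> Duplex:
    # Toy DNMT1 step: methylate the unmethylated C at every hemimethylated site
    state = [t or b for t, b in zip(d.top, d.bottom)]
    return Duplex(list(state), list(state))


parent = Duplex([True, False, True], [True, False, True])
daughter_a, daughter_b = replicate(parent)
print(hemimethylated_sites(daughter_a))             # [0, 2]
print(maintenance_methylate(daughter_a) == parent)  # True
```

Without the maintenance step, each further round of replication would halve the fraction of methylated strands, which is why loss of maintenance activity leads to passive demethylation.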

Reactions of cytidine methylation and demethylation in DNA

Details of the reactions of cytidine methylation and demethylation. Methylation of a cytidine residue in a CpG dinucleotide is a dynamic process involving addition of the methyl moiety and its removal when appropriate or stimulated. The details of the methylation process are described above, while the details of the reactions of methyl removal are described below. The primary enzyme responsible for maintaining 5mC in CpG dinucleotides following replication is the DNMT1 methyltransferase. This enzyme functions in concert with the UHRF1 protein, which physically recognizes the hemimethylated regions in the newly synthesized daughter strands. When removal of the methyl group is required, it occurs in a series of reactions, the first three of which are catalyzed by the TET (ten eleven translocation) enzymes. The product of the three TET-catalyzed reactions is 5-carboxylcytosine (5caC). The modified cytosine is then removed by thymine DNA glycosylase (TDG), leaving an abasic site in the DNA. A cytosine is then restored at this site through the base excision repair (BER) pathway. Abbreviations of the various TET-modified cytosine intermediates are described in the next section.

DNA Demethylation: Removal of m5C (5mC)

Given that chromatin structure, and consequently transcriptional activity, can be modified by the addition of a methyl group to cytidine residues, it is not surprising that cells contain activities responsible for removing the methyl group. However, the enzymes that remove methyl groups from m5C residues were discovered rather serendipitously, via two independent areas of study. A search for mammalian homologs of Trypanosoma brucei genes that oxidize thymidine residues in DNA to 5-hydroxymethyluracil (5hmU) identified a family of three genes. The second area of study, aimed at identifying the pathways that introduce 5-hydroxymethylcytosine (5hmC) into mammalian DNA, identified the same family of genes. These genes are all related to a gene originally identified in rare cases of acute myeloid and lymphocytic leukemia involving the mixed-lineage leukemia (MLL) gene. This form of leukemia results from a translocation between chromosome 10 and chromosome 11 that fuses the mixed-lineage leukemia 1 (MLL1) gene on chromosome 11 to a gene on chromosome 10 that was subsequently named TET (ten eleven translocation). The MLL1 protein is a lysine methyltransferase (KMT) family enzyme encoded by the KMT2A gene. For details on KMT genes go to the Control of Gene Expression page.

Three TET genes are expressed in humans, identified as TET1, TET2, and TET3. The official names for these genes are tet methylcytosine dioxygenase 1, 2, and 3. The TET-encoded enzymes are distantly related to the human ALKB homologs (identified as ALKBH genes), which remove aberrant methylation from damaged DNA bases by an oxidative mechanism. The TET1 gene is located on chromosome 10q21 and is composed of 20 exons that encode a 2136 amino acid protein. The TET2 gene is located on chromosome 4q24 and is composed of 11 exons that generate two alternatively spliced mRNAs encoding two distinct protein isoforms. The TET3 gene is located on chromosome 2p13.1 and is composed of 16 exons that encode a 1795 amino acid protein. All three TET genes encode zinc-finger domain-containing, ferrous iron (Fe2+)- and 2-oxoglutarate (α-ketoglutarate)-dependent dioxygenases. Each of the three TET enzymes successively oxidizes 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) within DNA. All three oxidized forms of methylcytosine have been detected in numerous mammalian tissues.

The primary mechanism for the removal of 5fC and 5caC from DNA involves the action of thymine DNA glycosylase (encoded by the TDG gene), which excises the modified base and leaves an abasic site. A cytosine is then incorporated at this site through the base excision repair (BER) pathway, restoring the original CG base pair. Several other mechanisms have been proposed for the removal of 5caC; however, none has been definitively demonstrated experimentally. These include direct decarboxylation of 5caC to C by an as yet unidentified decarboxylase. Another proposed mechanism involves deamination of 5mC or 5hmC through the action of activation-induced cytidine deaminase (encoded by the AICDA gene) or one of the APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide) family of cytidine deaminases. Deamination of 5mC yields thymine and deamination of 5hmC yields 5hmU; these bases can then be excised by single-strand-selective monofunctional uracil DNA glycosylase (encoded by the SMUG1 gene) or by TDG and replaced with cytosine through BER.
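The principal demethylation route described above can be summarized as a short series of state transitions. In this Python sketch, the `NEXT_STEP` table, the function name, and the `excise_at_5fC` flag are illustrative assumptions: by default TDG acts on 5caC, while setting the flag models TDG excising 5fC instead. The proposed decarboxylation and deamination routes are not modeled.

```python
# Each entry: current base -> (acting enzyme or pathway, product)
NEXT_STEP = {
    "5mC": ("TET", "5hmC"),
    "5hmC": ("TET", "5fC"),
    "5fC": ("TET", "5caC"),
    "5caC": ("TDG", "abasic site"),
    "abasic site": ("BER", "C"),
}


def trace_demethylation(base: str = "5mC", excise_at_5fC: bool = False):
    """Return the list of (substrate, enzyme, product) steps back to cytosine."""
    steps = []
    while base in NEXT_STEP:
        enzyme, product = NEXT_STEP[base]
        if base == "5fC" and excise_at_5fC:
            # TDG can also excise 5fC directly, skipping the last TET oxidation
            enzyme, product = "TDG", "abasic site"
        steps.append((base, enzyme, product))
        base = product
    return steps


for step in trace_demethylation():
    print(step)
```

Each TET step in this chain consumes 2-oxoglutarate, which is the link to the metabolic effects described next.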

As indicated, the TET proteins are Fe2+- and 2-oxoglutarate-dependent dioxygenases. Many other important enzymes, such as the lysine demethylases that demethylate histone proteins (discussed below and in greater detail in the Control of Gene Expression page), are also members of the large family of Fe2+- and 2-oxoglutarate-dependent dioxygenases. It has therefore been proposed, and considerable experimental work supports the idea, that aberrations in the pathways generating 2-oxoglutarate may be important in the development of certain types of tumors. One of the most significant metabolic pathways in which 2-oxoglutarate is a critical intermediate is the TCA cycle. Several genes encoding enzymes that produce or consume TCA cycle intermediates have been shown to be mutated in human cancers. Two of these genes are isocitrate dehydrogenase 1 (IDH1) and IDH2, and mutations in IDH1 and IDH2 have been found in many different types of cancer. These mutations alter the catalytic activity of the enzymes such that, instead of oxidizing isocitrate to 2-oxoglutarate, the mutant enzymes reduce 2-oxoglutarate to 2-hydroxyglutarate. This metabolic change can significantly reduce cellular levels of 2-oxoglutarate, thereby limiting its availability as a cofactor for dependent dioxygenases such as the TET enzymes and the histone demethylases. In addition, the accumulating 2-hydroxyglutarate, which is structurally similar to 2-oxoglutarate, competitively inhibits these enzymes.


Histone Modifications Regulating Chromatin Structure

Histone proteins are subject to a number of modifications and these modifications are known to affect the structure of chromatin. Greater detail relating to the types and transcriptional consequences of the modification of histones is presented in the Control of Gene Expression page.

Histone acetylation results in a more open chromatin structure, and acetylated histones are found in regions of chromatin that are transcriptionally active. Conversely, underacetylation of histones is associated with closed chromatin and transcriptional inactivity. A direct link between histone acetylation and transcriptional activity was demonstrated when protein complexes previously known to be transcriptional activators were found to have histone acetyltransferase activity. As expected, transcriptional repressor complexes were found to contain histone deacetylase activity.

A link between DNA methylation and transcriptional silencing was demonstrated by the observation that proteins that bind to methylated CpG dinucleotides can recruit histone deacetylases to the DNA. Conversely, proteins that interact with acetylated histones contribute to a more open chromatin structure. Proteins that bind to acetylated lysines in histones contain a domain called a bromodomain. The bromodomain is composed of a bundle of four α-helices and mediates protein-protein interactions in a number of cellular systems in addition to acetylated histone binding and chromatin structure modification.

Another histone modification known to affect chromatin structure is methylation. However, histone methylation does not have a single, uniform effect on transcription; the outcome depends on which residue is modified. Methylation of histone H4 on R3 (arginine at position 3) promotes an open chromatin structure and thereby leads to transcriptional activation. Methylation of histone H3 on K4 and K79 (lysines 4 and 79) has been shown to act similarly to histone H4 R3 methylation. In contrast, methylation of histone H3 on K9 and K27 is associated with transcriptionally inactive genes. Methylated histones provide binding sites for other proteins; in the case of H3 K9 and K27 methylation, these proteins promote a more compacted chromatin structure. Proteins that bind to methylated histones often contain a domain called the chromodomain. The chromodomain consists of a conserved stretch of 40–50 amino acids and is found in many proteins involved in chromatin remodeling complexes. In addition, chromodomain proteins are found in the RNA-induced transcriptional silencing (RITS) complex, which is involved in small interfering RNA (siRNA)- and microRNA (miRNA)-mediated downregulation of transcription. For more details on small non-coding RNAs and transcriptional regulation see the Control of Gene Expression page.

Histone proteins can also be modified by the addition of the small protein ubiquitin. Ubiquitination has been observed on all of the nucleosomal histones but is most often found on histones H2A and H2B. Ubiquitination of H2A is associated with repression of transcription, whereas ubiquitination of H2B has the opposite effect, stimulating gene activity. Ubiquitinated H2B is associated with transcriptional activity because this modification promotes the methylation of histone H3 at K4 and K79, which, as indicated above, is associated with an open chromatin structure.

Phosphorylation of histones occurs primarily in response to external signals, such as growth factor stimulation, or to stress inducers such as heat shock. Phosphorylated histones are localized to genes that become transcriptionally active as a consequence of these signals. The importance of histone phosphorylation in the control of gene expression is illustrated by Coffin-Lowry syndrome, which results from defects in the RPS6KA3 (ribosomal protein S6 kinase A3; also known as ribosomal S6 kinase 2, RSK2) gene, whose product is one of the kinases that phosphorylate histone H3. Coffin-Lowry syndrome is a rare form of X-linked intellectual disability characterized by skeletal malformations, growth retardation, hearing deficit, paroxysmal movement disorders, and cognitive impairment in affected males.


DNA Replication

Replication of DNA occurs during the process of normal cell division cycles. Because the genetic complement of the resultant daughter cells must be the same as the parental cell, DNA replication must possess a very high degree of fidelity. The entire process of DNA replication is complex and involves multiple enzymatic activities.

The mechanics of DNA replication were originally characterized in the bacterium E. coli, in which three distinct enzymes capable of catalyzing the replication of DNA were first identified: DNA polymerase (pol) I, II, and III. Pol I is the most abundant of these activities in E. coli, but its primary roles are the removal of RNA primers and the filling of the resulting gaps, and the repair of damaged and mismatched DNA, which together help ensure the fidelity of replication. Replication of the E. coli genome is the job of pol III. This enzyme is much less abundant than pol I; however, its activity is nearly 100 times that of pol I.

Until relatively recently, the use of Greek letters to designate the six known eukaryotic DNA polymerases was sufficient. However, additional members of the eukaryotic DNA polymerase family have since been identified that function in distinct types of DNA synthesis. These DNA polymerases are divided into four large families designated A, B, X, and Y. In addition, there is a reverse transcriptase activity associated with telomerase, as discussed below, and a primase-polymerase enzyme (PrimPol) that functions in both nuclear and mitochondrial DNA maintenance. The original six eukaryotic DNA polymerases are identified as α, β, γ, δ, ε, and ζ. The designation of each enzyme reflects its subcellular localization, its primary replicative activity, and the order in which it was first described. The known eukaryotic DNA polymerases and descriptions of their activities are indicated in the Table below.

Eukaryotic DNA Polymerases
Family | Common Nomenclature | Function, Comments
A | γ (gamma) | mitochondrial DNA replication; encoded by the POLG gene on chromosome 15q26.1
A | θ (theta) | DNA repair; encoded by the POLQ gene on chromosome 3q13.33
A | ν (nu) | DNA repair and homologous recombination; encoded by the POLN gene on chromosome 4p16.3
B | α (alpha) | initiation of chromosomal DNA replication, Okazaki fragment priming, also involved in double-strand break repair; functions as a multisubunit complex that includes two primase proteins (encoded by the PRIM1 and PRIM2A genes), an accessory protein (encoded by the POLA2 gene), and the catalytic subunit encoded by the POLA1 gene on chromosome Xp22.1–p21.3
B | δ (delta) | chromosomal DNA replication elongation, nucleotide excision repair, double-strand break repair, mismatch repair; functions as a multisubunit complex that includes the four-subunit polymerase encoded by the POLD1, POLD2, POLD3, and POLD4 genes as well as the multisubunit replication factor C (RFC) complex and proliferating cell nuclear antigen (PCNA)
B | ε (epsilon) | chromosomal DNA replication elongation, nucleotide excision repair, double-strand break repair, mismatch repair; consists of a 261 kDa catalytic subunit encoded by the POLE gene on chromosome 12q24.33 and a 55 kDa accessory protein encoded by the POLE2 gene; additional proteins in the epsilon complex include two histone-fold proteins encoded by the POLE3 and POLE4 genes
B | ζ (zeta) | bypass (translesion) DNA synthesis; catalytic subunit encoded by the REV3L gene (also known as REV3) on chromosome 6q21; responsible for essentially all DNA damage-induced mutagenesis as well as the majority of spontaneous mutagenesis
X | β (beta) | base-excision repair, required for DNA replication and maintenance, recombination, and drug resistance; encoded by the POLB gene on chromosome 8p11.21
X | λ (lambda) | base-excision repair; encoded by the POLL gene on chromosome 10q24.32
X | μ (mu) | non-homologous end joining (NHEJ); encoded by the POLM gene on chromosome 7p13
X | σ (sigma) | sister chromatid cohesion; encoded by the POLS gene on chromosome 5p15.31; also known as topoisomerase-related function protein 4 (TRF4) and poly(A) polymerase-associated domain-containing protein 7 (PAPD7)
Y | η (eta) | bypass (translesion) DNA synthesis; encoded by the POLH gene on chromosome 6p21.1; required for replication through UV-induced cyclobutane pyrimidine dimers (CPD); mutations in POLH result in xeroderma pigmentosum variant (XP-V)
Y | ι (iota) | bypass (translesion) DNA synthesis; encoded by the POLI gene on chromosome 18q21.2; also known as RAD30B
Y | κ (kappa) | bypass (translesion) DNA synthesis; encoded by the POLK gene on chromosome 5q13.3
Y | Rev1L | bypass (translesion) DNA synthesis; encoded by the REV1 gene; interacts with POLK and is essential for POLK function
n/a | telomerase | telomeric DNA replication (reverse transcriptase activity); encoded by the TERT gene on chromosome 5p15.33
n/a | PrimPol | primase and DNA-directed polymerase; involved in both nuclear and mitochondrial DNA maintenance; functions as a DNA polymerase and is also capable of catalyzing the preferential formation of DNA primers in a zinc finger-dependent manner; encoded by the PRIMPOL gene on chromosome 4q35.1

The ability of DNA polymerases to replicate DNA requires a number of additional accessory proteins. The combination of a polymerase with several of these accessory proteins yields an activity identified as the DNA polymerase holoenzyme. These accessory proteins and complexes include (listed in no particular order of importance):

1. Primase complex (DNA polymerase α complex)

2. Processivity accessory proteins

3. Single strand binding proteins, SSBPs

4. Helicase

5. DNA ligase

6. Topoisomerases

The process of DNA replication begins at specific sites in the chromosomes termed origins of replication, requires a primer bearing a free 3'–OH, proceeds in the 5' → 3' direction on both strands of DNA concurrently, and copies the template strands in a semiconservative manner. The semiconservative nature of DNA replication means that each newly synthesized daughter strand remains associated with its parental template strand, so that each daughter duplex contains one parental strand and one new strand.
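Semiconservative replication can be illustrated with strings: each parental strand serves as the template for a new complementary strand, and each daughter duplex pairs one old strand with one new one. This Python sketch, with hypothetical function names, assumes both strands are written 5' → 3' and uses only the four standard bases; it models the outcome of replication, not the enzymology.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Complementary strand, written 5'->3' like the input."""
    return strand.translate(COMPLEMENT)[::-1]


def replicate_duplex(top: str, bottom: str):
    """Each parental strand (both given 5'->3') templates one new strand.

    Returns two daughter duplexes as (parental strand, new strand) pairs.
    """
    if bottom != reverse_complement(top):
        raise ValueError("strands must be complementary")
    new_on_top = reverse_complement(top)        # copies the top-strand template
    new_on_bottom = reverse_complement(bottom)  # copies the bottom-strand template
    return (top, new_on_top), (bottom, new_on_bottom)


top = "ATGCCG"
bottom = reverse_complement(top)
daughter_1, daughter_2 = replicate_duplex(top, bottom)
print(daughter_1)  # ('ATGCCG', 'CGGCAT')
print(daughter_2)  # ('CGGCAT', 'ATGCCG')
```

Both daughters carry the same sequence information as the parent, but each retains exactly one of the two original strands.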

The large size of eukaryotic chromosomes and the limited rate of nucleotide incorporation during DNA synthesis make multiple origins of replication necessary for replication to be completed in a reasonable period of time. The precise nature of origins of replication in higher eukaryotic organisms is unclear. However, at a replication origin the strands of DNA must dissociate and unwind in order to allow access to all of the accessory proteins and the DNA polymerase complex. Unwinding of the duplex at the origin, as well as along the strands as replication proceeds, is carried out by helicases. The helicases involved in DNA replication are DNA-dependent ATPases with DNA helicase activity. The resulting regions of single-stranded DNA are stabilized by the binding of single-strand binding proteins (SSBPs), which makes them accessible to the enzymatic activities required for replication to proceed. The site of the unwound template strands is termed the replication fork.
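A back-of-the-envelope calculation shows why a single origin would not suffice. The sketch below assumes a hypothetical 100 Mb chromosome and a fork speed of 50 nucleotides per second; both are round illustrative numbers, not values from this text, and the model further assumes evenly spaced origins that all fire at once, whereas real origins fire at different times during S phase.

```python
def replication_time_s(chromosome_bp: float, fork_rate_nt_s: float, n_origins: int) -> float:
    """Idealized time (seconds) to replicate a chromosome.

    Assumes evenly spaced origins that all fire at once, each releasing two
    forks moving apart at a constant rate.
    """
    forks = 2 * n_origins  # replication is bidirectional from each origin
    return chromosome_bp / (forks * fork_rate_nt_s)


chromosome_bp = 100_000_000  # hypothetical 100 Mb chromosome
fork_rate = 50               # assumed fork speed, nucleotides per second
for origins in (1, 1000):
    seconds = replication_time_s(chromosome_bp, fork_rate, origins)
    print(f"{origins} origin(s): {seconds / 3600:.1f} h")
```

Under these assumptions a single origin would need 1,000,000 seconds (about 11.6 days), while 1,000 origins bring the time down to 1,000 seconds (under 17 minutes).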

In order for DNA polymerases to synthesize DNA they must encounter a free 3'–OH, which is the substrate for attachment of the 5'–phosphate of the incoming nucleotide. During repair of damaged DNA, the 3'–OH can arise from hydrolysis of the backbone of one of the two strands. During replication, the 3'–OH is supplied by an RNA primer synthesized by the primase complex. The primase complex is composed of four proteins: two primase proteins identified as p58 and p49, a p68 accessory subunit of DNA polymerase α, and the catalytic subunit of DNA polymerase α. Together these four proteins constitute what is more correctly referred to as the DNA polymerase α complex. The p49 and p58 primase proteins form a heterodimeric complex that interacts with DNA polymerase α and the p68 subunit. The two primase proteins are encoded by the PRIM1 gene (p49) and the PRIM2A gene (p58). The p68 accessory protein of the complex is encoded by the POLA2 gene and the catalytic DNA polymerase α protein is encoded by the POLA1 gene. The primase complex uses the DNA strands as templates to synthesize a short stretch of RNA that serves as a primer for DNA polymerase. The PRIM1 gene is located on chromosome 12q13 and is composed of 13 exons that encode a protein of 420 amino acids. The PRIM2A gene is located on chromosome 6p12–p11.1 and is composed of 19 exons that generate several alternatively spliced mRNAs. The POLA2 gene is located on chromosome 11q13.1 and is composed of 21 exons that encode a protein of 598 amino acids. The POLA1 gene is located on the X chromosome (Xp22.1–p21.3) and is composed of 38 exons that encode a protein of 1462 amino acids.

Synthesis of DNA proceeds in the 5' → 3' direction through the attachment of the 5'–phosphate of an incoming dNTP to the existing 3'–OH of the elongating DNA strand, with the concomitant release of pyrophosphate. Initiation of synthesis at origins of replication occurs simultaneously on both strands of DNA. Synthesis then proceeds bidirectionally, with one strand in each direction being copied continuously and one strand in each direction being copied discontinuously. Because DNA polymerases incorporate dNTPs into the new strand in the 5' → 3' direction, they move in the 3' → 5' direction with respect to the template strand. For DNA synthesis to occur simultaneously on both template strands, and bidirectionally, one strand would appear to have to be synthesized in the 3' → 5' direction. In actuality, that strand of newly synthesized DNA is produced discontinuously, as a series of short pieces each made in the 5' → 3' direction.
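The antiparallel rule can be stated in a few lines of Python. This sketch models only base pairing and ignores every protein involved; both strands are written 5' → 3' as strings of A, C, G and T, and the function name is hypothetical.

```python
# Watson-Crick partner for each DNA base
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_daughter(template_5to3: str) -> str:
    """Return the new strand, written 5'->3', copied from a template written 5'->3'.

    The polymerase reads the template from its 3' end toward its 5' end, and each
    incoming nucleotide is joined to the 3'-OH of the growing chain, so the new
    strand grows 5'->3'.
    """
    new_strand = []
    for base in reversed(template_5to3):    # walk along the template 3'->5'
        new_strand.append(COMPLEMENT[base])  # add to the 3' end of the new strand
    return "".join(new_strand)


print(synthesize_daughter("ATGCGT"))  # ACGCAT
```

Copying the copy returns the original sequence, which is why each daughter duplex carries the same information as the parent.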

The strand of DNA synthesized continuously is termed the leading strand and the discontinuous strand is termed the lagging strand. The lagging strand of DNA is composed of short stretches of RNA primer plus newly synthesized DNA approximately 100–200 bases long (the approximate distance between adjacent nucleosomes). These short lagging-strand segments are called Okazaki fragments. The concept of continuous strand synthesis is somewhat of a misnomer, since DNA polymerases do not remain associated with a template strand indefinitely. The ability of a particular polymerase to remain associated with the template strand is termed its processivity: the longer it remains associated, the higher the processivity of the enzyme. DNA polymerase processivity is enhanced by additional protein activities of the replisome identified as processivity accessory proteins.
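The bookkeeping for the lagging strand can be sketched in Python. The 150-nucleotide fragment length is an assumed value taken from the middle of the 100–200 base range given above, the 10-nucleotide RNA primer length is likewise an illustrative assumption, and the function name is hypothetical.

```python
def okazaki_fragments(strand_length: int, fragment_length: int = 150,
                      primer_length: int = 10) -> list[tuple[int, int, int]]:
    """Split a lagging-strand daughter of `strand_length` nt into Okazaki fragments.

    Returns (start, primer_end, end) tuples using 0-based, half-open coordinates
    along the daughter strand (5'->3'). Positions start..primer_end are RNA
    primer; primer_end..end is DNA made by extending that primer.
    """
    fragments = []
    for start in range(0, strand_length, fragment_length):
        end = min(start + fragment_length, strand_length)
        primer_end = min(start + primer_length, end)
        fragments.append((start, primer_end, end))
    return fragments


frags = okazaki_fragments(400)
print(len(frags), frags[-1])  # 3 (300, 310, 400)
```

A leading strand of the same length needs only the single primer laid down at the origin, whereas in this model the lagging strand needs a new primer for every fragment.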


Diagrammatic representation of one side of a DNA replication fork. Large arrow depicts the overall direction of replication with both the leading strand and lagging strands of replication shown separated from each other. In actuality the process involves a looping of one of the two parental strands in order to allow the simultaneous replication of both strands in what appears to be the same direction. This detail is illustrated in the following Figure.

How is it that DNA polymerase can copy both strands of DNA in the 5' → 3' direction simultaneously? A model has been proposed in which DNA polymerases exist as dimers associated with the other necessary proteins at the replication fork, in a complex identified as the replisome. The template for the lagging strand is temporarily looped through the replisome such that the DNA polymerases move along both template strands in the 3' → 5' direction simultaneously for short distances, the length of an Okazaki fragment. As the replication forks progress along the template strands, the newly synthesized daughter strands and parental template strands reform a DNA double helix. This means that only a small stretch of the template duplex is single-stranded at any given time.


Details of simultaneous DNA strand replication. The Figure illustrates the mechanism by which both strands of DNA are replicated simultaneously in the same direction. A portion of the lagging strand is looped around through the DNA polymerase holoenzyme complex such that short stretches of 500–1000 nucleotides can be continuously replicated in the same direction as the leading strand. Eventually torsional stress will result in dissociation of the enzyme complex and the looping process will need to begin again. This repeated cycle of looping and release determines the average length of the Okazaki fragments generated from the lagging strand. Only DNA helicase, single-strand binding proteins (SSBPs), and the DNA polymerase complex are shown.

The progression of the replication fork requires that the DNA ahead of the fork be continuously unwound. Because eukaryotic chromosomal DNA is attached to a protein scaffold, the progressive movement of the replication fork introduces severe torsional stress into the duplex ahead of the fork. This torsional stress is relieved by DNA topoisomerases, which introduce either double-stranded (type II topoisomerases) or single-stranded (type I topoisomerases) breaks into the backbone of the DNA. These breaks allow unwinding of the duplex and removal of the replication-induced torsional strain. The breaks are then resealed by the topoisomerases.
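The amount of strain topoisomerases must remove can be estimated in a short Python sketch. It assumes the textbook value of about 10.5 base pairs per helical turn of B-DNA and that the DNA ahead of the fork cannot rotate freely, and it simplifies the enzymes to fixed steps in linking number: one per cycle for type I (which cuts one strand) and two per cycle for type II (which passes a duplex through a double-stranded break). Some type I enzymes can relax more than one turn per break, so the type I count is an upper bound. Function names are hypothetical.

```python
import math

BP_PER_TURN = 10.5  # approximate helical repeat of B-DNA


def turns_pushed_ahead(bp_unwound: int) -> float:
    """Helical turns removed at the fork; with the ends fixed, the same number of
    positive supercoils accumulates in the duplex ahead of the fork."""
    return bp_unwound / BP_PER_TURN


def topoisomerase_cycles(bp_unwound: int, linking_change_per_cycle: int) -> int:
    """Catalytic cycles needed to relax that strain, assuming each cycle changes
    the linking number by `linking_change_per_cycle` (1 for type I, 2 for type II)."""
    return math.ceil(turns_pushed_ahead(bp_unwound) / linking_change_per_cycle)


print(turns_pushed_ahead(1050))       # 100.0
print(topoisomerase_cycles(1050, 1))  # 100
print(topoisomerase_cycles(1050, 2))  # 50
```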

The RNA primers of the leading strands and Okazaki fragments are removed by the repair DNA polymerases, which simultaneously replace the ribonucleotides with deoxyribonucleotides. The nicks that remain between the 3'–OH of one leading strand and the 5'–phosphate of another, as well as between one Okazaki fragment and the next, are sealed by DNA ligases, thereby completing the process of replication.
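The end result of primer removal and ligation can be sketched as a string operation. This Python sketch tracks only sequence: each hypothetical fragment is a pair of RNA primer (written with U) and the DNA made from it, both written 5' → 3' and listed in 5' → 3' order. Because the replacement DNA is copied from the same template, it has the primer's sequence with T in place of U; the enzymes themselves are not modeled, and the function name is hypothetical.

```python
def mature_lagging_strand(fragments: list[tuple[str, str]]) -> str:
    """Join Okazaki fragments into one continuous DNA strand.

    Each primer's ribonucleotides are replaced by deoxyribonucleotides (same
    sequence, U becomes T), then adjacent pieces are joined as DNA ligase
    seals each nick.
    """
    pieces = []
    for rna_primer, dna in fragments:
        replaced_primer = rna_primer.replace("U", "T")  # RNA primer swapped for DNA
        pieces.append(replaced_primer + dna)
    return "".join(pieces)  # ligation: one unbroken strand, no nicks left


strand = mature_lagging_strand([("ACGU", "TTAG"), ("GGAU", "CCA")])
print(strand)  # ACGTTTAGGGATCCA
```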


Additional DNA Polymerase Activities

The main enzymatic activity of DNA polymerases is the 5' → 3' synthetic activity. However, DNA polymerases possess two additional activities of importance for both replication and repair. These additional activities include a 5' → 3' exonuclease function and a 3' → 5' exonuclease function. The 5' → 3' exonuclease activity allows the removal of ribonucleotides of the RNA primer, utilized to initiate DNA synthesis, along with their simultaneous replacement with deoxyribonucleotides by the 5' → 3' polymerase activity. The 5' → 3' exonuclease activity is also utilized during the repair of damaged DNA. The 3' → 5' exonuclease function is utilized during replication to allow DNA polymerase to remove mismatched bases and is referred to as the proof-reading activity of DNA polymerase. It is possible (but rare) for DNA polymerases to incorporate an incorrect base during replication. These mismatched bases are recognized by the polymerase immediately due to the lack of Watson-Crick base-pairing. The mismatched base is then removed by the 3' → 5' exonuclease activity and the correct base inserted prior to progression of replication.
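The proof-reading check can be sketched as follows. In this Python sketch the template is written 3' → 5' so that each position lines up with the new strand written 5' → 3'; the `error_positions` argument is a hypothetical way to force misincorporation at chosen sites, the particular wrong base used is arbitrary, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
WRONG_BASE = {"A": "C", "T": "G", "G": "T", "C": "A"}  # arbitrary mispairing partner


def copy_strand(template_3to5: str, error_positions: set[int],
                proofread: bool = True) -> tuple[str, int]:
    """Copy a template, optionally proof-reading each newly added base.

    Returns the new strand (5'->3') and the number of bases excised by the
    3'->5' exonuclease.
    """
    new_strand = []
    excised = 0
    for i, base in enumerate(template_3to5):
        correct = COMPLEMENT[base]
        inserted = WRONG_BASE[base] if i in error_positions else correct
        if proofread and inserted != correct:
            # No Watson-Crick pair at the 3' end: remove it and insert the right base.
            excised += 1
            inserted = correct
        new_strand.append(inserted)
    return "".join(new_strand), excised


print(copy_strand("TACG", {1, 3}))                   # ('ATGC', 2)
print(copy_strand("TACG", {1, 3}, proofread=False))  # ('ACGT', 0)
```

With proof-reading switched off, the forced errors stay in the new strand; with it on, every mismatch is excised and replaced before synthesis continues.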


Telomere Replication: Implications for Aging and Disease

Telomeres are the specialized DNA structures at the ends of all chromosomes that consist of repetitive DNA sequences and nucleoproteins, an overall structure referred to as a nucleoprotein cap. The telomere sequence on the lagging strand is composed of the repeat 5'–TTAGGG–3'. The telomeric repeat sequence spans up to several kilobases and protects the ends of the chromosomes from exonucleolytic activity. The telomeric ends of the lagging strand of each chromosome require a unique method of replication that involves the enzyme complex called telomerase. This is because even if the primase activity placed a primer for DNA polymerase δ at the extreme 3'-end of the lagging strand, the end of that strand would not be fully replicated and would therefore be susceptible to degradation.

Telomerase is a complex composed of two copies each of several proteins, two copies of an RNA with sequences complementary to the telomeric repeats, and two copies of a reverse transcriptase that extends the 3'-end of the lagging strands using the telomerase RNA as the template. The reverse transcriptase activity of telomerase is encoded by the TERT gene (telomerase reverse transcriptase) and the RNA component is encoded by the TERC gene (telomerase RNA component). The TERC RNA contains a short template sequence, 3'–AAUCCC–5', that is complementary to the telomeric repeat. This template sequence in the TERC RNA forms a duplex with the lagging DNA strand at the ends of the chromosomes. The 3'-end of the lagging strand then serves as the primer for the reverse transcriptase activity (TERT), which extends the 3'-end of the chromosome using the TERC RNA as a template. The additional proteins involved in the telomerase complex are dyskerin pseudouridine synthase 1 (commonly referred to simply as dyskerin; encoded by the DKC1 gene) and telomerase associated protein 1 (encoded by the TEP1 gene).
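The templating step can be sketched in Python. The default template is the 3'–AAUCCC–5' sequence given above; the sketch assumes the existing 3' end finishes on a complete repeat and ignores how the RNA re-aligns (translocates) between rounds, so it is a simplification rather than a model of the enzyme's mechanism. The function name is hypothetical.

```python
# DNA base added opposite each RNA template base
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def telomerase_extend(g_strand: str, rounds: int, template_3to5: str = "AAUCCC") -> str:
    """Add `rounds` telomeric repeats to the 3' end of a G-rich strand (written 5'->3').

    Reading the RNA template 3'->5' gives the new DNA 5'->3', so the default
    template 3'-AAUCCC-5' yields the repeat 5'-TTAGGG-3'.
    """
    repeat = "".join(RNA_TO_DNA[base] for base in template_3to5)
    return g_strand + repeat * rounds


print(telomerase_extend("TTAGGG", rounds=2))  # TTAGGGTTAGGGTTAGGG
```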


Steps in telomere replication by telomerase

The telomerase process extends the end of the lagging strand, which can then be replicated by normal DNA polymerase, thereby preserving the length of the chromosome. Numerous lines of evidence strongly implicate telomere shortening in activation of programmed cell death (apoptosis), loss of tissue stem cells, disease progression, and the overall processes of aging. The groundwork for understanding the importance of telomere length and functional telomerase activity was laid in cultures of human fibroblasts as early as the 1960s. Hayflick and co-workers demonstrated that fibroblasts in culture undergo only a finite number of cell divisions before entering a state of proliferative arrest, and this barrier to proliferation was called the Hayflick limit. Later work showed that the telomeres of these cells become progressively shorter with each division and that this shortening induces the arrest. Forced cell division beyond the limit results in further telomere loss culminating in uncontrolled chromosomal instability and the triggering of apoptosis. At the opposite end of the spectrum, forced expression of telomerase, specifically the TERT gene, in cultured fibroblasts preserves telomere length, and the cells gain the ability to divide indefinitely without acquiring malignant properties.
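A toy model makes the logic of the Hayflick limit concrete. All numbers here (a 10,000 bp starting telomere, 100 bp lost per division, arrest below 4,000 bp) are hypothetical values chosen for easy arithmetic, not measurements; the model also assumes a constant loss per division, and the function name is hypothetical.

```python
from typing import Optional


def divisions_until_arrest(initial_bp: int, loss_per_division: int,
                           arrest_threshold_bp: int, telomerase_addition: int = 0,
                           max_divisions: int = 10_000) -> Optional[int]:
    """Count divisions until telomere length drops below the arrest threshold.

    Each division trims `loss_per_division` bp and, if telomerase is active,
    restores `telomerase_addition` bp. Returns None if the cell is still
    dividing after `max_divisions`, i.e. no limit is reached.
    """
    length = initial_bp
    for division in range(1, max_divisions + 1):
        length += telomerase_addition - loss_per_division
        if length < arrest_threshold_bp:
            return division
    return None


print(divisions_until_arrest(10_000, 100, 4_000))                           # 61
print(divisions_until_arrest(10_000, 100, 4_000, telomerase_addition=100))  # None
```

When telomerase replaces what each division removes, the length never falls, which mirrors the effect of forced TERT expression described above.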

Numerous studies have demonstrated a correlation between telomere shortening and human aging and disease. Decreased telomere length in peripheral blood leukocytes has been shown to correlate with higher mortality rates in older individuals (over 60 years of age). Consistent with this, studies in centenarians and their offspring have shown a positive link between telomere length and longevity. These latter studies also demonstrated that individuals with longer telomeres had an overall healthier profile than individuals of similar age with shorter telomeres. There is also an intriguing correlation between telomere length, psychological stress, and the risk for development of psychiatric disease. Studies in women aged 20–50 years have shown that those with the highest levels of psychological stress had the shortest telomeres. In addition, telomerase activity in peripheral blood leukocytes was lowest in the individuals with the highest levels of stress, which also coincided with the highest levels of oxidative stress. This correlation between stress, telomerase activity, and telomere length is notable given that individuals subject to chronic psychological stress show a shortened lifespan and an earlier onset of diseases more typical of an aged population, such as cardiovascular disease.

The link between telomere maintenance and a healthy lifespan is also inferred from studies of various inherited degenerative disorders. For example, individuals carrying a mutation in either the TERT or the TERC gene develop autosomal dominant dyskeratosis congenita, DKS (characterized by a triad of abnormal nails, reticular skin pigmentation, and oral leukoplakia; also called Zinsser-Cole-Engman syndrome). DKS patients have shortened telomeres, a reduced lifespan, and signs of accelerated aging. Another form of DKS is inherited as an X-linked disease resulting from defects in the DKC1 gene, which encodes dyskerin pseudouridine synthase 1 (simply called dyskerin). In addition to being a component of the telomerase complex, dyskerin is also a component of a pseudouridine synthase complex that modifies rRNAs, and of another complex that processes small nucleolar RNAs (snoRNAs) into snoRNA-containing ribonucleoprotein complexes (snoRNPs). Other disorders that manifest with signs of premature aging, such as Werner syndrome and ataxia telangiectasia, also correlate with shortened telomeres. Werner syndrome is a rare autosomal recessive disorder resulting from a deficiency of the WRN protein, a DNA helicase involved in DNA repair, DNA recombination, and telomere maintenance. These patients develop normally until puberty, at which time they begin to manifest multiple progressive premature aging pathologies, including senile cataracts, osteoporosis, skin atrophy, myocardial infarction, and cancer. Werner syndrome fibroblasts show accelerated telomere loss and undergo premature senescence that can be reversed by enforced TERT expression.

In addition to inherited disorders, telomere shortening is also correlated with acquired degenerative conditions associated with chronically elevated tissue turnover. For example, cirrhosis of the liver is associated with a progressive decline in telomere length.


Post-Replicative Modification of DNA, Methylation

One of the major post-replicative reactions that modifies DNA is methylation. DNA methylation can alter chromatin structure, as pointed out above, and can therefore also alter the transcription of genes, as discussed in the Control of Gene Expression page. As pointed out earlier, DNA methylation represents a major epigenetic process. Natural (i.e. not chemically induced) methylation of eukaryotic DNA occurs predominantly on cytosine residues present in CpG dinucleotides. However, not all CpG dinucleotides are methylated at the C residue. The cytosine is methylated at the 5 position of the pyrimidine ring, generating 5-methylcytosine (5mC). Enzymes that incorporate methyl groups into DNA molecules are called DNA methyltransferases, DNMTs. As pointed out earlier, humans express three DNMT genes identified as DNMT1, DNMT3a, and DNMT3b. The enzymes of the DNMT3 family are principally responsible for generating the initial pattern of CpG dinucleotide methylation. The DNMT1 family enzymes are principally tasked with maintaining the DNMT3-established methylation patterns.
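Locating the candidate methylation sites is a simple scan for the dinucleotide. In the Python sketch below (function names hypothetical) sequences are written 5' → 3'. It also shows that CpG is its own reverse complement, so every CpG on one strand faces a CpG on the other; this pairing is what post-replicative maintenance methylation relies on.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """The opposite strand, also written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def cpg_sites(seq: str) -> list[int]:
    """0-based positions of the C in each CpG dinucleotide of a 5'->3' sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


top = "ATCGGACGTTCA"
bottom = reverse_complement(top)
print(cpg_sites(top))     # [2, 6]
print(cpg_sites(bottom))  # [4, 8]
# The CpG at top position i faces the CpG at bottom position len(top) - 2 - i.
```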

Methylation of DNA also occurs in prokaryotic cells. Its function is to protect host DNA from degradation by bacterial enzymes called restriction endonucleases, which recognize specific nucleotide sequences in DNA. The role of this system in prokaryotic cells (called the restriction-modification system) is to degrade invading viral DNAs. Because the viral DNAs are not modified by methylation, they are degraded by the host restriction enzymes, whereas the methylated host genome is resistant to the action of these enzymes.

Methylation of eukaryotic DNA serves two clearly defined and overlapping functions. The methylation of cytosine in CpG dinucleotides affects the overall structure of chromatin, which in turn broadly alters the availability of the chromatin to the transcriptional machinery. This effect of methylation is one mechanism of epigenetics. Epigenetics, as a means of gene control, is discussed in the Control of Gene Expression page. The effects of methylation on the transcription of specific genes were elegantly demonstrated in experiments that led to the undermethylation of the MyoD gene, a master control gene that regulates the differentiation of muscle cells by controlling the expression of muscle-specific genes. Undermethylation of MyoD in fibroblasts results in their conversion to myoblasts. The experiments were carried out by allowing replicating fibroblasts to incorporate 5-azacytidine into their newly synthesized DNA. This analog of cytidine prevents methylation. The net result is that the original pattern of DNA methylation in the fibroblast is lost and numerous genes become undermethylated and consequently transcriptionally activated.

Following DNA replication the original pattern of methylation is copied by the maintenance methylase enzyme, DNMT1. The DNMT1 protein recognizes the pattern of methylated C residues in the parental DNA strand following replication and methylates the C residue present in the corresponding CpG dinucleotide of the daughter strand.


Process of DNA methylation following DNA replication. Sites of DNA methylation have two fates following DNA replication: they can be maintained or they can be progressively removed. Following replication, the parental (template) strands of DNA contain 5mCpG, whereas the reciprocal C residue in the daughter strand is not methylated. If the methylation state of the gene is to be maintained, the maintenance methylase, DNMT1, incorporates a methyl group into the C residue of the daughter strand CpG dinucleotide. Recognition of the hemimethylated CpG dinucleotide requires the DNMT1 accessory protein encoded by the UHRF1 (ubiquitin like with PHD and ring finger domains 1) gene.
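Both fates can be sketched by tracking which CpG sites carry 5mC on each strand of a duplex. In this Python sketch (names hypothetical) a CpG site is identified by an index shared by both strands, replication pairs each parental strand with a new unmethylated strand, and the `maintenance` flag stands in for DNMT1 (with UHRF1) copying each parental mark onto the new strand. Switching maintenance off, roughly as in the 5-azacytidine experiments described above, shows how marks are diluted with each round of replication.

```python
# A duplex is (top-strand 5mC sites, bottom-strand 5mC sites), each a frozenset of CpG site indices.
Duplex = tuple[frozenset[int], frozenset[int]]


def replicate(duplex: Duplex, maintenance: bool = True) -> list[Duplex]:
    """Semi-conservative replication of CpG methylation marks.

    Each parental strand is paired with a newly made, unmethylated strand, so
    every methylated CpG becomes hemimethylated. With maintenance on, the mark
    on each parental strand is copied onto the new strand at the same site.
    """
    top, bottom = duplex
    new_opposite_top = top if maintenance else frozenset()
    new_opposite_bottom = bottom if maintenance else frozenset()
    return [(top, new_opposite_top), (new_opposite_bottom, bottom)]


def methylated_fraction(duplexes: list[Duplex], site: int) -> float:
    """Fraction of all strands that carry 5mC at `site`."""
    strands = [strand for duplex in duplexes for strand in duplex]
    return sum(site in strand for strand in strands) / len(strands)


def grow(duplex: Duplex, rounds: int, maintenance: bool) -> list[Duplex]:
    """Replicate every duplex in the population `rounds` times."""
    population = [duplex]
    for _ in range(rounds):
        population = [d for parent in population for d in replicate(parent, maintenance)]
    return population


parent = (frozenset({1, 3}), frozenset({1, 3}))  # sites 1 and 3 fully methylated
print(methylated_fraction(grow(parent, 3, maintenance=True), 1))   # 1.0
print(methylated_fraction(grow(parent, 3, maintenance=False), 1))  # 0.125
```

Without maintenance the two original methylated strands persist but are shared among ever more duplexes, so the methylated fraction halves with each round.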


DNA Methylation: Role in Genomic Imprinting

The phenomenon of genomic imprinting refers to the fact that some genes are expressed exclusively from either the maternally or the paternally derived allele. In other words, some genes are only expressed from the mother's chromosomes, whereas others are only expressed from the father's chromosomes. Imprinted genes are distributed throughout the genome, but the majority of imprinted loci are organized in clusters that are up to one megabase (Mb) in size. The allele-specific expression of imprinted genes is regulated by epigenetic modifications, of which DNA methylation is the controlling modification. Thus, imprinted alleles are "marked" by their state of methylation.

The CpG-rich regions of imprinted loci are methylated on only one of the two parental chromosomes. For this reason these imprinted domains are referred to as differentially methylated regions, DMR. In several of these DMR the differential methylation is also found when comparing sperm and eggs, indicating that the DMR is gametic in origin. These DMRs are, therefore, referred to as germline or gametic DMR. Most of the gametic DMR are found in the maternal germline; to date only four DMR have been found in the paternal germline. These four loci are identified as H19, DLK1 (delta-like non-canonical Notch ligand 1), RASGRF1 (RAS protein specific guanine nucleotide releasing factor 1), and ZDBF2 (zinc finger DBF-type containing 2).

The significance of CpG methylation to imprinting is shown by the fact that mutations in the maintenance DNA methylase, encoded by the DNMT1 gene, result in loss of the parental-origin-specific patterns of expression of imprinted genes. Although CpG methylation is the predominant regulator of the imprinting phenomenon, other epigenetic modifications, such as histone modifications, and other factors such as insulator proteins (e.g. CTCF) and long non-coding RNAs (lncRNA), are also involved in the overall imprinting process. The CTCF protein, also known as the CCCTC-binding factor, is a DNA-binding protein containing 11 zinc-finger domains. It was first identified as a regulator of MYC gene expression and was subsequently shown to be an important regulator of the expression of numerous genes and to be involved in the regulation of X chromosome inactivation, XCI. CTCF is a major transcriptional regulator, and within the human genome there are several thousand (at least 5,000) identified CTCF binding sites, found between genes (intergenic sites), within genes (intragenic sites), or in promoter regions. Numerous CTCF binding sites contain CpG dinucleotides, and the DNA methylation status of these sites plays an important role in their occupancy by CTCF. The CTCF gene is located on chromosome 16q22.1 and is composed of 13 exons that generate two alternatively spliced mRNAs encoding proteins of 727 amino acids (isoform 1) and 399 amino acids (isoform 2). The isoform 2 mRNA lacks two internal exons, and the encoded protein is initiated at a downstream AUG codon relative to the isoform 1 start site.


To date several hundred imprinted genes have been characterized. One of the very first imprinted genes identified was the insulin-like growth factor-2 gene (symbol: IGF2). The IGF2-encoded protein (IGF-2) is required for normal fetal development and growth. Expression of IGF2 occurs exclusively from the paternal copy of the gene. In the case of IGF2, an element in the paternal locus, called an insulator element, is methylated, which prevents a transcriptional repressor from binding and inhibiting paternal IGF2 gene transcription. The function of the unmethylated insulator is to bind a transcriptional repressor protein that, when bound, blocks activation of IGF2 expression. This protein is CTCF, also known, as indicated earlier, as CCCTC-binding factor. The insulator element contains the sequence CCCTC, to which the factor binds when nearby CpGs are undermethylated. When the CpG-containing insulator region of the IGF2 gene is methylated, as in the case of the paternal gene, CTCF cannot bind the insulator DNA (CCCTC), thus allowing a distant enhancer element to drive expression of the IGF2 gene. In the maternal genome the CpGs in the insulator domain are not methylated; therefore, the CTCF protein binds to it, blocking the action of the distant enhancer element. IGF-2 also binds with high affinity to the IGF-2 receptor, IGF2R (also known as the cation-independent mannose 6-phosphate receptor), which targets IGF-2 for lysosomal degradation. Interestingly, the IGF2R gene is also imprinted, being expressed exclusively from the maternal allele in the mouse. Indeed, the mouse Igf2r gene was the very first gene to be identified as being imprinted. Several imprinted loci have been associated with various diseases that result when there is a disruption in the normal pattern of imprinting. Two of the most clinically interesting disorders are Prader-Willi syndrome (PWS) and Angelman syndrome (AS). These two disorders are phenotypically quite distinct, yet arise due to alterations in the same imprinted locus on chromosome 15 that encompasses several genes.
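The insulator logic for IGF2 can be written as a small decision rule. This Python sketch follows the insulator model described in the paragraph above; the insulator sequence is made up for illustration (it only needs to contain CCCTC), and the class and function names are hypothetical. It reports which alleles the distant enhancer can activate: only the paternal copy with normal imprinting, and both copies in a hypothetical case where the maternal insulator also becomes methylated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    origin: str                 # "paternal" or "maternal"
    insulator_seq: str          # hypothetical insulator DNA, 5'->3'
    insulator_methylated: bool  # methylation state of the nearby CpGs


def ctcf_bound(allele: Allele) -> bool:
    """CTCF occupies the insulator only if a CCCTC site is present and unmethylated."""
    return "CCCTC" in allele.insulator_seq and not allele.insulator_methylated


def igf2_active_alleles(alleles: list[Allele]) -> list[str]:
    """Origins of the alleles on which the distant enhancer can drive IGF2."""
    return [a.origin for a in alleles if not ctcf_bound(a)]


INSULATOR = "GGCGCCCTCGCG"  # made-up sequence containing CCCTC and CpGs
normal = [Allele("paternal", INSULATOR, True), Allele("maternal", INSULATOR, False)]
lost_imprint = [Allele("paternal", INSULATOR, True), Allele("maternal", INSULATOR, True)]
print(igf2_active_alleles(normal))        # ['paternal']
print(igf2_active_alleles(lost_imprint))  # ['paternal', 'maternal']
```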

Genomic imprints that involve CpG methylation undergo a cycle of establishment, maintenance, and erasure. The CpG methylation status is established during spermatogenesis and oogenesis: in males the CpG methylation imprints are established in prospermatogonia, while in females the imprints are established only by the fully grown oocyte stage. The patterns of CpG methylation that arise in the germ cells are maintained following fertilization, throughout early development, and in the adult. During development of the primordial germ cells (PGC), from which sperm and egg will arise, the pattern of CpG methylation is erased. This erasure in the PGC ensures that the sex-dependent imprint pattern can be established in later stages of spermatogenesis and oogenesis. The DNA methyltransferases responsible for the establishment of the germline differential methylation patterns are encoded by the DNMT3A and DNMT3B genes. As pointed out above, the protein encoded by the DNMT3L gene (which is highly expressed in germ cells) functions to enhance the activity of the DNMT3a enzyme. Once established, the maintenance of germ cell CpG methylation is the function of the DNMT1 methylase. The erasure of the CpG methylation imprints that occurs in primordial germ cells is carried out by the TET methylcytosine dioxygenases (TET1, TET2, and TET3) as well as by activation-induced cytidine deaminase (AID), as described above for the general removal of 5mC residues in non-imprinted regions of the DNA.


X Chromosome Inactivation: Contributions from DNA Methylation

In mammals sex is determined by a pair of sex chromosomes identified as the X and Y chromosomes. Mammalian males are heterogametic and contain one copy of the X chromosome and one copy of the Y chromosome, whereas mammalian females are homogametic and contain two copies of the X chromosome. Whereas the X chromosome has retained a conserved high density of genes (approximately 1000 genes), the Y chromosome has lost the vast majority of its ancestral genes, such that modern mammalian Y chromosomes contain few protein-coding genes. Because females harbor two copies of the X chromosome, there is a potential for differences in X-linked gene dosage between females and males. These potential gene dosage differences are compensated for through the process of X chromosome inactivation (XCI), a process first characterized in 1961 by Mary Lyon. As a result of her discovery the process of XCI is often referred to as lyonization. The original observation that led to Mary Lyon's discovery of XCI was of a structurally distinct, densely staining heterochromatic body in the nuclei of cat nerve cells, later shown to be a condensed X chromosome. This work was published in 1949 by Murray Barr and his student Ewart Bertram, and this highly condensed chromosome is, therefore, referred to as the Barr body. Additional research demonstrated that mice with only one X chromosome were viable and phenotypically normal, indicating that only a single active X chromosome was necessary.

The process of XCI takes place very early in development and ensures equivalent X chromosome gene expression levels throughout development in both males and females. Mammalian XCI varies among species: in certain species (e.g. marsupials) a specific X chromosome (the paternal) is inactivated. However, in eutherian mammals, such as humans, the process of XCI is random, such that in human females roughly half of the cells in the body have inactivated the paternal X chromosome (Xp) and the other half have inactivated the maternal X chromosome (Xm). As a result, all human females are mosaics of cell populations within the same tissues in which either the Xp or the Xm chromosome is inactive. Once a human X chromosome is inactivated it remains inactive in all the progeny cells throughout life. Although one of the two X chromosomes in females is randomly inactivated, not all of the genes on the inactive X chromosome (Xi) are in fact transcriptionally silent. Several of the genes that remain active are found in the pseudo-autosomal region, PAR. The X chromosome PAR is homologous to a region of the Y chromosome and allows the X and Y chromosomes to pair during meiosis. In addition to the genes in the PAR, several other genes on the Xi remain transcriptionally active. These latter genes are referred to as escape genes: approximately 15–20% of human X-linked genes completely escape inactivation, while another 10% partially escape inactivation. Because the X chromosome PAR genes are also present on the Y chromosome, these genes are biallelically expressed in both males and females. X chromosome escape genes, on the other hand, do not have Y chromosome counterparts and are, therefore, differentially expressed between the sexes. The expression of PAR genes, along with the escape genes, most likely accounts for the phenotypes observed in Turner syndrome (45,X) females and in triple X (47,XXX) females.
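Random, clonally inherited inactivation produces the mosaic described above. The Python sketch below is a toy: the number of founder cells, the number of divisions, and the fixed random seed are arbitrary assumptions, the function names are hypothetical, and real tissues do not grow by perfectly synchronous doubling.

```python
import random


def choose_inactive_x(n_founder_cells: int, seed: int = 0) -> list[str]:
    """Each founder cell independently picks which X becomes the Xi:
    'Xp' (paternal) or 'Xm' (maternal), with equal probability."""
    rng = random.Random(seed)
    return [rng.choice(["Xp", "Xm"]) for _ in range(n_founder_cells)]


def expand(cells: list[str], divisions: int) -> list[str]:
    """Every daughter keeps its parent's Xi, so the choice is clonally inherited."""
    return [xi for xi in cells for _ in range(2 ** divisions)]


founders = choose_inactive_x(100)
tissue = expand(founders, divisions=5)
share_xp = tissue.count("Xp") / len(tissue)
print(len(tissue), f"{share_xp:.2f} of cells have an inactive paternal X")
```

The share in the expanded tissue equals the share among the founders, because in this model inactivation is decided once and then inherited; with few founders that share can drift noticeably away from one half by chance alone.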

Mechanisms of X Chromosome Inactivation

The overall process of XCI in eutherian mammals is highly similar across species, although there are known species-specific differences. Because mouse embryonic stem cells are easy to work with, most of the understanding of XCI has been obtained with this model system. XCI requires a family of long non-coding RNAs (lncRNA), with the central regulatory lncRNA encoded by the XIST gene (X-inactive specific transcript). Like most lncRNAs, the primary XIST transcript is spliced and polyadenylated. The XIST gene resides in a region of the X chromosome called the X inactivation center (XIC). The human XIC is located on the q arm at around band 13 (Xq13). In addition to the XIST gene, the XIC contains other genes as well as regulatory DNA sequences that are involved in the X inactivation process. The two major XIC genes, in addition to XIST, that regulate the process of XCI also encode lncRNAs: JPX and TSIX. The TSIX gene is transcribed in the antisense direction relative to the XIST gene and is, therefore, referred to as a natural antisense transcript (NAT). There are also protein-coding genes involved in the process of XCI, including the RNF12 (which itself resides in the XIC), REX1, and CTCF genes. The RNF12-encoded protein is an E3 ubiquitin ligase family member and was the first non-lncRNA factor shown to be required for XCI. The REX1 gene encodes a transcription factor whose activity is regulated by the RNF12 protein and whose function is to regulate the expression of the XIST and TSIX genes. The CTCF gene, as described above, encodes a transcriptional regulator that binds to CCCTC sites in target genes. One of the important DNA elements involved in XCI resides just upstream of the transcriptional start site for the TSIX gene and is identified as XITE (X inactivation intergenic transcription elements).


Structure of the X inactivation center, XIC. The region of Xq13 contains the X chromosome inactivation center, XIC, which contains several genes that encode regulators of the X inactivation process. Note that not all the genes identified in the region of the XIC are shown in this Figure, and neither the locations nor the sizes of the genes are drawn to scale. The genes within the XIC generate both proteins and non-coding RNAs. The non-coding RNAs expressed in the XIC are all members of the long non-coding RNA (lncRNA) family. The major lncRNA regulator of XCI is encoded by the XIST gene, which is transcribed in the 5' to 3' direction towards the centromere. The major inhibitor of XIST gene expression is the TSIX gene, which also encodes a lncRNA but is transcribed in the antisense direction relative to XIST. The major protein-coding gene in the XIC that is critical to the process of XCI is the RNF12 gene. T-lines represent inhibitory processes, while green arrows indicate activation functions. As explained in the text, the primary effect of the E3 ubiquitin ligase RNF12 is to ubiquitylate the REX1 protein, resulting in its destruction. Loss of REX1 from the regulatory complex that includes the pluripotency factors MYC and KLF-4 results in activation of XIST gene expression, since the complex normally represses XIST. While repressing XIST, the REX1 complex also activates the expression of the TSIX gene, such that activation of RNF12 expression results in loss of TSIX activity. The primary function of the lncRNA encoded by the JPX gene is to displace the CTCF transcriptional repressor from the promoter region of the XIST gene, allowing its expression to be activated. Not shown in the Figure, but important in the overall control of XCI, is that CTCF also binds to the XITE element, which is required for activation of TSIX expression. The lncRNA encoded by the FTX gene positively regulates XIST gene expression. The XITE regulatory element resides upstream of the transcriptional start site for the TSIX gene. The SLC16A2 gene, which encodes a thyroid hormone transporter, resides within the XIC and is itself subject to X-inactivation.

Expression of the XIST gene is exclusive to the future inactive X chromosome (Xi), and the encoded XIST lncRNA coats that X chromosome. Coating of the future Xi with XIST results in the recruitment of chromatin remodeling complexes that lead to the formation of highly condensed, inactive heterochromatin, the Barr body. One important remodeling complex involved in the formation and stabilization of the Xi following XIST coating is polycomb repressive complex 2 (PRC2). PRC2 is composed of six subunits, two of which (the EZH1 and EZH2 gene-encoded proteins) are enzymes that trimethylate lysine residues in histones, specifically lysine 27 of histone H3. This modified histone is identified as H3K27me3. Once XIST function has silenced one of the X chromosomes, this state is stably inherited such that all progeny cells maintain the same Xi. Inactivation of the other X chromosome in females is prevented by expression of the TSIX gene, which, as indicated earlier, is transcribed in the antisense orientation relative to the XIST gene. The TSIX transcript is also a lncRNA. TSIX expression occurs from the active X chromosome (Xa) prior to and during the process of XCI, and the presence of the TSIX lncRNA on the Xa helps prevent XIST gene expression. Like the XIST lncRNA, the TSIX lncRNA recruits chromatin remodeling complexes, in this case to the XIST promoter region, preventing XIST expression from the Xa.

As indicated in the Figure above, there are additional regulators of XCI that function as either activators or inhibitors. One significant activator of XIST gene expression is the E3 ubiquitin ligase encoded by the RNF12 gene. The RNF12 gene resides in the XIC, toward the telomeric end of the X chromosome, about 500 kb upstream of the XIST transcriptional start site. RNF12 was the first regulator of XCI identified that is not a lncRNA. RNF12 activates XIST expression in a dose-dependent manner. Because RNF12 in males is expressed from a single X chromosome, its level of expression is not sufficient to induce XCI, whereas in female cells the double dose of RNF12 is sufficient to initiate XCI. The major target for ubiquitylation by RNF12 is the zinc-finger protein REX1 (named for Reduced EXpression 1 and encoded by the ZFP42 gene), a component of a pluripotency complex that includes the KLF-4 and MYC proteins. This pluripotency complex represses expression of the XIST gene and activates the TSIX gene. Therefore, when REX1 is degraded following RNF12-mediated ubiquitylation, XIST expression is activated. Once the inactivation process has started on one X chromosome, expression of the RNF12 gene is silenced.

The lncRNAs encoded by the JPX and FTX genes also regulate the expression of XIST. The major function of the JPX lncRNA is to displace the CTCF transcriptional repressor protein from the promoter region of the XIST gene. When bound to multiple sites in the XIST gene, CTCF interferes with its expression, so its displacement by the JPX lncRNA results in activation of XIST expression. The FTX gene encodes not only a lncRNA but also contains several microRNA (miRNA) genes. The FTX lncRNA has been shown to alter chromatin structure around the promoter of the XIST gene by preventing DNA hypermethylation and promoting accumulation of histone H3 lysine 4 dimethylation (H3K4me2). These chromatin changes directly promote XIST expression.

The onset of XCI can be inhibited by autosome-encoded proteins that have been shown to be critical in the establishment of pluripotency in mouse embryonic stem cells and in induced pluripotent stem cells (iPS cells). The key pluripotency proteins include the transcription factors OCT4, REX1, SOX2, KLF-4, and NANOG and the reprogramming transcription factor MYC. Binding of different combinations of these pluripotency factors at different locations within the XIC results in either repression of XIST expression or activation of TSIX expression. With respect to the regulation of XIST, evidence has demonstrated that SOX2, OCT4, and NANOG bind to sequences in intron 1 of the XIST gene, directly repressing XIST expression. Within the promoter of the TSIX gene, both OCT4 and SOX2 have been shown to bind and activate TSIX expression. Repression of XIST expression is relieved upon cellular differentiation, concomitant with the decline in expression of the pluripotency factor genes. In addition to inhibiting XIST expression, OCT4, SOX2, and NANOG have been shown to repress XCI by interfering with the function of RNF12.

Once XCI has been initiated and established, it must be maintained: an X chromosome that has been inactivated (Xi) remains in the inactive state for the remainder of the life of the cell and in all of its progeny. The maintenance of XCI occurs in part through the coating of the Xi with the XIST lncRNA. The coating results in the recruitment of chromatin remodeling complexes that lead to the gradual accumulation of Xi-specific epigenetic marks. In addition, RNA polymerase II (RNA pol II) is excluded from the XIST promoter, limiting the overall level of XIST lncRNA. The exclusion of RNA pol II from the XIST gene is believed to contribute to preventing the spread of XIST RNA to the active X chromosome (Xa) as well as to autosomes. The precise mechanism that prevents XIST lncRNA spreading is not clearly defined but may involve components of the nuclear matrix and heterogeneous nuclear ribonucleoproteins (hnRNPs).

DNA Methylation Role in X Chromosome Inactivation

Within the context of the inactivated X chromosome (Xi) in comparison to the active X chromosome (Xa), the state of CpG methylation can be highly correlated with gene expression levels. For example, the level of CpG methylation in the promoter regions of most genes on the X chromosome is higher on the Xi than for the same genes on the Xa. However, in the promoter regions of genes in the PAR and of genes that escape XCI, the CpG methylation state is lower, overall, than that of XCI-silenced genes. The XIST gene promoter shows the reverse of this pattern, consistent with XIST being expressed only from the Xi: several CpG sites in the XIST promoter are hypermethylated on the Xa when compared to the Xi, and experimental evidence demonstrates that removal of this methylation activates expression of the gene from the Xa. However, CpG hypermethylation alone is not sufficient to silence some of the genes on the Xi, since examination of methylation state has shown several genes on the Xa that have the same level of CpG methylation as their counterparts on the Xi, yet the Xa copies are active while the Xi copies are not. This demonstrates that, whereas CpG methylation can be highly correlated with the transcriptional silencing associated with XCI, chromatin structure also plays an important role in transcriptional silencing as well as in the escape of certain genes from XCI. This latter point is exemplified by the observation that coating of the Xi by the XIST lncRNA is coupled to the recruitment of chromatin remodeling complexes.


DNA Recombination

DNA recombination refers to the phenomenon whereby two parental DNA molecules are broken and joined together in a way that exchanges portions of their respective strands. This process produces new DNA molecules that contain a mix of genetic information from each parental molecule. There are three main forms of genetic recombination: homologous recombination, site-specific recombination, and transposition.

Homologous recombination is the process of genetic exchange that occurs between any two DNA molecules that share a region (or regions) of homologous sequence. This form of recombination occurs frequently while homologous chromosomes are paired during meiosis. Indeed, it is homologous recombination between the maternal and paternal chromosomes that generates genetic diversity among the offspring of an organism. Homologous recombination generally involves exchange of large regions of the chromosomes.

Site-specific recombination involves exchange between much smaller regions of DNA sequence (approximately 20–200 base pairs) and requires the recognition of specific sequences by the proteins involved in the recombination process. Site-specific recombination events occur primarily as a mechanism to alter the program of genes expressed at specific stages of development. The most significant site-specific recombination events in humans are the somatic cell gene rearrangements that take place in the immunoglobulin genes during B-cell differentiation. These rearrangements give the immune system an extremely diverse potential for antibody production. A typical antibody molecule is composed of both heavy and light chains. The genes for both of these polypeptide chains undergo somatic cell rearrangement, yielding the potential for approximately 3,000 different light chain combinations and approximately 5,000 heavy chain combinations. Because any given heavy chain can combine with any given light chain, the potential diversity exceeds 10,000,000 possible different antibody molecules.
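The combinatorial arithmetic behind this estimate can be made explicit. The short Python sketch below simply multiplies the approximate chain counts quoted above; the constant and function names are illustrative, and the counts are the rough figures from the text, not measured values.

```python
# Approximate numbers of distinct chains quoted in the text (rough estimates).
LIGHT_CHAIN_COMBINATIONS = 3_000
HEAVY_CHAIN_COMBINATIONS = 5_000


def antibody_diversity(light_chains: int, heavy_chains: int) -> int:
    """Return the number of possible heavy/light chain pairings.

    Because any heavy chain can pair with any light chain, the total is
    simply the product of the two counts.
    """
    return light_chains * heavy_chains


total = antibody_diversity(LIGHT_CHAIN_COMBINATIONS, HEAVY_CHAIN_COMBINATIONS)
print(f"{total:,} possible antibody molecules")  # 15,000,000 possible antibody molecules
```

The product, 15,000,000, is why the potential diversity is said to exceed 10,000,000; it is a ceiling on possible pairings, not a count of antibodies actually produced.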


DNA Transposition

Transposition is a unique form of recombination in which mobile genetic elements (transposable elements, TEs) can move from one region of a chromosome to another region of the same chromosome or to another chromosome entirely. No sequence homology is required for a transpositional event to occur. Through their mobilization, transposons can reshuffle sequences, promote ectopic DNA rearrangements, and create novel genes. Examples of genes derived through transposition include the paired box 6 (PAX6) gene and the gene encoding the reverse transcriptase subunit (TERT) of the telomerase complex. PAX6 is a DNA-binding protein that regulates transcription, and its gene has been shown to be derived from the transposase of a class II TE, whereas TERT has been shown to be derived from retrotransposons. In rare cases, transposable element insertions have been documented to cause mutations that ultimately lead to disease. To date, at least 65 diseases have been identified as the result of TE insertions.

Transposable elements (TEs) represent up to 50% of the DNA in the human genome. In humans there are two broad classes of transposable elements. Class I TEs pass through an RNA intermediate that is reverse transcribed into DNA prior to insertion into the genome, a process referred to as "copy and paste". Because of the reverse transcription involved in their mobilization, class I TEs are referred to as retrotransposons. Class II TEs are exclusively DNA transposons and mobilize by a "cut and paste" process that requires the enzyme transposase. Class II TEs remain relatively inactive within the human genome. In contrast, class I TEs remain actively mobile and, by generating new transposon insertions, continuously serve as sources of genetic diversity. TEs are further classified as autonomous (able to move by themselves) or non-autonomous (requiring another TE in order to move). Non-autonomous TEs lack the required reverse transcriptase activity (class I TEs) or the transposase enzyme (class II TEs). The most abundant autonomous retrotransposons are the Long Interspersed Nucleotide Elements (LINEs), with LINE-1 (L1) being the common designation for this class of TE. The L1 element is approximately 6,000 bp long and contains two open reading frames (ORFs). One of the two ORFs in L1 encodes an RNA-binding protein, while the other ORF encodes a protein with both endonuclease and reverse transcriptase activities. The non-autonomous TEs are the SINEs (Short Interspersed Nucleotide Elements), Alu repeats, and the SVAs. The SVAs represent a hybrid TE containing DNA sequences of SINEs, variable number tandem repeats (VNTRs), and Alu repeats. The Alu repeats represent the most abundant class of TEs in the human genome. The Alu element is approximately 300 bp in length and is classified as a subclass of the SINEs.
The name Alu repeat stems from the fact that the element contains sequences recognized by the restriction endonuclease, AluI. Alu elements were derived from the small cytoplasmic RNA (7SL RNA) that is a component of the signal recognition particle (SRP) that recognizes the signal peptide of newly translated proteins and then targets those proteins to the endoplasmic reticulum, ER.
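AluI recognizes the four-base sequence AGCT and cleaves it between the G and the C (AG^CT), leaving blunt ends. The Python sketch below, offered only as an illustration, locates AluI sites and splits a sequence at the cut positions; the function names are illustrative, and the example sequence is hypothetical rather than an actual Alu element. Because AGCT is its own reverse complement, scanning one strand finds every site on both strands.

```python
ALUI_SITE = "AGCT"  # AluI recognition sequence
CUT_OFFSET = 2      # AluI cuts AG^CT, i.e. 2 bases into the site (blunt ends)


def find_alui_sites(seq: str) -> list[int]:
    """Return 0-based start positions of every AluI recognition site."""
    seq = seq.upper()
    n = len(ALUI_SITE)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == ALUI_SITE]


def alui_digest(seq: str) -> list[str]:
    """Split seq into the fragments expected after complete AluI digestion."""
    cuts = [start + CUT_OFFSET for start in find_alui_sites(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


example = "GGCCAGCTTTAAGCTGG"  # hypothetical sequence for illustration
print(find_alui_sites(example))  # [4, 11]
print(alui_digest(example))      # ['GGCCAG', 'CTTTAAG', 'CTGG']
```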


Repair of Damaged DNA

Damage to DNA is a constant, ongoing event; therefore, mechanisms are in place to continually surveil the structure of DNA and initiate the appropriate response pathways to effect repair. In most cases not caused by viruses, cancer is the severe, medically relevant consequence of an inability to repair damaged DNA. It is clear that multiple somatic cell mutations in DNA can lead to the genesis of the transformed phenotype. Therefore, a complete understanding of DNA repair mechanisms would be invaluable in the design of potential therapeutic agents for the treatment of cancer. DNA damage can occur as the result of exposure to environmental agents such as alkylating chemicals, ultraviolet (uv) light, and ionizing radiation, as well as to free radicals (e.g. reactive oxygen species, ROS) generated spontaneously in the oxidizing environment of the cell. These phenomena can, and do, introduce mutations into the coding capacity of the DNA. Mutations in DNA can also, but rarely, arise from the spontaneous tautomerization of the bases. The range of DNA damage that must be repaired is vast and includes alterations to DNA structure occurring during replication and recombination, damage to the bases in the DNA (most often due to ROS and ionizing radiation), single- and double-strand breaks in the DNA backbone, and strand cross-linking. Base-substitution mutations in the coding regions of DNA are of two general types. Transition mutations result from the exchange of one purine for another purine, or of one pyrimidine for another pyrimidine. Transversion mutations result from the exchange of a purine for a pyrimidine or vice versa.
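The purine/pyrimidine rule that distinguishes the two mutation types can be stated as a small function. This Python sketch classifies a single-base substitution from the original base to the new base; the function and set names are illustrative. As examples, pairing of O6-methylguanine with T can ultimately convert a G:C pair to an A:T pair (a transition), while pairing of 8-oxoG with A can convert a G:C pair to a T:A pair (a transversion).

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(original: str, new: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    original, new = original.upper(), new.upper()
    bases = PURINES | PYRIMIDINES
    if original not in bases or new not in bases:
        raise ValueError("bases must be one of A, C, G, T")
    if original == new:
        raise ValueError("original and new bases are identical; no substitution")
    # Same chemical class (purine->purine or pyrimidine->pyrimidine) is a transition.
    if (original in PURINES) == (new in PURINES):
        return "transition"
    return "transversion"


for original, new in [("G", "A"), ("C", "T"), ("G", "T"), ("A", "C")]:
    print(f"{original}->{new}: {classify_substitution(original, new)}")
```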

In order for DNA repair systems to effect repair, the alterations in the DNA must first be recognized. Two major damage recognition systems function in human cells. The first is direct damage recognition and repair, in which the repair enzyme or repair complex itself recognizes the damage and repairs it directly. An example of this type of repair, described below along with the base excision repair (BER) process, is the recognition and repair of alkylated guanine residues by O6-methylguanine-DNA methyltransferase. The second major mechanism is a multistep process in which an enzyme or complex recognizes the DNA damage and then recruits the repair enzyme or complex to the site of damage to effect repair.

Base Excision Repair, BER

Cytosine bases in DNA can undergo spontaneous deamination to uracil, a process accelerated by heat and by ionizing radiation. In addition, uracil can be misincorporated (as dUMP in place of dTMP) during DNA replication. Removal of the uracil base from DNA is catalyzed by one of several enzymes that possess uracil-DNA N-glycosylase activity. The major enzyme is encoded by the UNG gene, which is located on chromosome 12q23-q24.1 and is composed of 7 exons. As a result of alternative promoter usage and alternative splicing, two distinct isoforms are generated from the UNG gene. The mitochondrial form of the enzyme is referred to as UNG1, while the nuclear isoform is referred to as UNG2. Removal of the uracil activates the base excision repair (BER) pathway, which replaces the U with the correct C residue.
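The net effect of this repair can be modeled as a toy string operation. The Python sketch below is a deliberate simplification, not a model of the enzymology: it deaminates a chosen C to U, lets a UNG-like step remove the base to leave an abasic site (written "-"), and then restores the base dictated by the G on the undamaged complementary strand, standing in for the downstream BER enzymes that are not modeled individually. All function names are hypothetical, and the two strands are written position by position (their antiparallel orientation is ignored for simplicity).

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def deaminate(strand: str, pos: int) -> str:
    """Convert the C at pos to U, mimicking spontaneous cytosine deamination."""
    if strand[pos] != "C":
        raise ValueError("only C residues are deaminated to U in this model")
    return strand[:pos] + "U" + strand[pos + 1:]


def uracil_glycosylase(strand: str) -> str:
    """Remove every uracil base, leaving an abasic site marked '-'."""
    return strand.replace("U", "-")


def fill_abasic_sites(strand: str, template: str) -> str:
    """Insert the base that pairs with the template at each abasic site."""
    return "".join(
        COMPLEMENT[t] if b == "-" else b for b, t in zip(strand, template)
    )


original = "ATCGCA"
template = "".join(COMPLEMENT[b] for b in original)  # "TAGCGT", position by position
damaged = deaminate(original, 2)                      # "ATUGCA"
abasic = uracil_glycosylase(damaged)                  # "AT-GCA"
repaired = fill_abasic_sites(abasic, template)        # "ATCGCA"
print(damaged, abasic, repaired)
```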

Modification of DNA bases by alkylation (most often the addition of methyl groups) occurs nearly exclusively on purine residues. Methylation of G residues at the O6 position allows them to base pair with T instead of C. A unique activity, O6-methylguanine-DNA methyltransferase (encoded by the MGMT gene), removes the methyl group from these G residues by transferring it to one of its own cysteine residues. The protein thus becomes methylated and is no longer active; consequently, a single protein molecule can remove only one alkyl group. Another modified guanine that is recognized as abnormal and removed is 8-oxoguanine (8-oxoG). 8-oxoG is generated by exposure of DNA to reactive oxygen species, with the consequence that 8-oxoG base pairs with A instead of C. Removal of 8-oxoG is catalyzed by the enzyme 8-oxoguanine DNA glycosylase, which is encoded by the OGG1 gene.
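Because each MGMT molecule is inactivated after accepting a single methyl group, repair capacity is set by the number of active MGMT molecules rather than by a catalytic rate. The Python sketch below illustrates that one-to-one stoichiometry with hypothetical lesion and protein counts; the function name is illustrative, and the model ignores synthesis of new MGMT protein and all other repair pathways.

```python
def mgmt_repair(lesions: int, active_mgmt: int) -> tuple[int, int, int]:
    """Return (repaired, unrepaired, active MGMT remaining).

    Each active MGMT molecule removes exactly one O6-methyl group and is
    then permanently inactivated, so at most `active_mgmt` lesions are repaired.
    """
    if lesions < 0 or active_mgmt < 0:
        raise ValueError("counts must be non-negative")
    repaired = min(lesions, active_mgmt)
    return repaired, lesions - repaired, active_mgmt - repaired


# Two successive exposures with a hypothetical pool of 200 MGMT molecules.
first = mgmt_repair(lesions=150, active_mgmt=200)
second = mgmt_repair(lesions=150, active_mgmt=first[2])
print(first)   # (150, 0, 50)
print(second)  # (50, 100, 0)
```

The second exposure leaves lesions unrepaired only because the first exposure consumed most of the MGMT pool, which is the practical meaning of a single protein molecule removing only one alkyl group.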

Nucleotide Excision Repair

Nucleotide excision repair (NER) removes a wide array of structurally unrelated DNA lesions, including cyclobutane pyrimidine dimers (CPDs, most commonly thymine dimers) and pyrimidine-pyrimidone (6-4) photoproducts (6-4PPs), both produced in human skin by shortwave uv light, as well as helix-distorting DNA adducts induced by xenobiotic chemical carcinogens.

The most prominent product of uv irradiation of DNA is the thymine dimer, which forms between two adjacent T residues on the same DNA strand. Repair of thymine dimers is best understood from the mechanisms used in E. coli, although several mechanisms are common to both prokaryotes and eukaryotes. Some organisms possess specific glycosylases (glycohydrolases) that recognize the dimer as abnormal and cleave the N-glycosidic bond of a base in the dimer, generating an apyrimidinic site in the DNA that is then repaired by DNA polymerase and DNA ligase. In human cells, however, thymine dimers are removed primarily by NER, in which a short single-stranded segment of DNA containing the dimer is excised and the resulting gap is filled by DNA polymerase and sealed by DNA ligase. Glycosylases are also responsible for the removal of other abnormal bases, not just thymine dimers.
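Because a thymine dimer requires two adjacent T residues on the same strand, candidate dimer positions in a sequence can be listed by scanning for TT dinucleotides. This Python sketch does only that: it does not predict which sites actually form dimers, and it ignores the other adjacent-pyrimidine combinations (such as C-T) that can also form cyclobutane dimers. The function name and example sequence are hypothetical.

```python
def candidate_thymine_dimer_sites(strand: str) -> list[int]:
    """Return 0-based positions i where strand[i] and strand[i + 1] are both T."""
    s = strand.upper()
    return [i for i in range(len(s) - 1) if s[i] == "T" and s[i + 1] == "T"]


example = "ATTTGCTTA"  # hypothetical strand
print(candidate_thymine_dimer_sites(example))  # [1, 2, 6]
```

In the run TTT the two candidate pairs overlap and share the middle T, so only one of them can be dimerized at a time.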

Humans with autosomal recessive defects in DNA repair, in particular the repair of uv-induced thymine dimers, suffer from the disease Xeroderma pigmentosum (XP). There are at least eight distinct genetic defects associated with this disease, leading to the designation of eight forms of XP: XPA (XPA gene), XPB (ERCC3 gene), XPC (XPC gene), XPD (ERCC2 gene), XPE (DDB2 gene), XPF (ERCC4 gene), XPG (ERCC5 gene), and XPV (POLH gene). The genes mutated in XP subtypes XPA through XPG are all involved in nucleotide excision repair (NER), while the gene mutated in the XPV form encodes DNA polymerase eta, a polymerase that replicates across uv-damaged DNA (translesion synthesis). The ERCC designation stands for Excision Repair Cross-Complementing. The DDB2 gene encodes damage-specific DNA binding protein 2. Many of the protein products of these genes are found in heterodimeric complexes with other proteins involved in the repair of damaged DNA. There are two major clinical forms of XP: one leads to progressive degenerative changes in the eyes and skin, and the other also includes progressive neurological degeneration.
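The complementation groups and genes listed above form a simple lookup table. The Python sketch below encodes that mapping and the pathway assignment stated in the text (NER for XPA through XPG, replication of damaged DNA for XPV); the dictionary and function names are illustrative.

```python
# XP complementation group -> mutated gene, as listed in the text.
XP_GENES = {
    "XPA": "XPA",
    "XPB": "ERCC3",
    "XPC": "XPC",
    "XPD": "ERCC2",
    "XPE": "DDB2",
    "XPF": "ERCC4",
    "XPG": "ERCC5",
    "XPV": "POLH",
}


def xp_pathway(group: str) -> str:
    """Return the repair process affected in a given XP complementation group."""
    if group not in XP_GENES:
        raise KeyError(f"unknown XP complementation group: {group}")
    # XPV is the only group whose gene encodes a DNA polymerase rather than an NER factor.
    return "replication of damaged DNA" if group == "XPV" else "nucleotide excision repair"


def group_for_gene(gene: str) -> str:
    """Reverse lookup: find the complementation group for a gene symbol."""
    for group, symbol in XP_GENES.items():
        if symbol == gene:
            return group
    raise KeyError(f"gene not associated with an XP group here: {gene}")


ner_groups = [g for g in XP_GENES if xp_pathway(g) == "nucleotide excision repair"]
print(ner_groups)              # ['XPA', 'XPB', 'XPC', 'XPD', 'XPE', 'XPF', 'XPG']
print(group_for_gene("DDB2"))  # XPE
```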

Cockayne syndrome is another inherited disorder affecting DNA repair; patients suffer from sun sensitivity, short stature, and progressive neurological degeneration, but without an increased incidence of skin cancer.

Ataxia telangiectasia (AT) is an autosomal recessive disorder resulting in neurological disability and suppressed immune function. Telangiectasias are dilated superficial blood vessels, like those also seen in patients with rosacea or scleroderma. AT patients develop a disabling cerebellar ataxia early in life and have recurrent infections. Patients with AT have increased sensitivity to x-irradiation resulting from defects in the ATM (ataxia telangiectasia mutated) gene, which encodes a kinase that is activated in response to double-strand breaks in DNA. One important substrate of the ATM kinase is the tumor suppressor protein p53. There are four distinct complementation groups of AT, identified as ATA, ATC, ATD, and ATE; all four are the result of mutations in the ATM gene.

Several diseases associated with defective repair of damaged DNA can be found in the Inborn Errors page.


Chemotherapies Targeting Replication

The alkylating agents are the class of compounds that have been used the longest as anticancer drugs. The major alkylating agents are derived from the nitrogen mustards, which were originally developed for military use. Commonly used alkylating agents include cyclophosphamide (Cytoxan, Neosar), ifosfamide, dacarbazine, chlorambucil (Leukeran), and procarbazine (Matulane, Natulan).

Alkylating agents function by reacting with DNA and disrupting its structure. Some agents add alkyl groups to DNA bases, and the subsequent action of DNA repair enzymes on the alkylated bases results in fragmentation of the DNA. Some agents cross-link bases in the DNA, which prevents separation of the two strands during DNA replication. Some agents induce mispairing of nucleotides, resulting in permanent mutations in the DNA. Because alkylating agents act upon DNA at all stages of the cell cycle, they are potent anticancer drugs. However, because of this potency, prolonged use of alkylating agents can lead to secondary cancers, particularly leukemias.

Several classes of anticancer drugs function by interfering with the actions of the topoisomerases. Two of these classes are the anthracyclines and the camptothecins. The anthracyclines were originally isolated from the soil bacterium Streptomyces. Doxorubicin (Adriamycin, Doxil, Rubex) and daunorubicin (Cerubidine, DaunoXome, Daunomycin, Rubidomycin) have similar modes of action, although doxorubicin is the more potent of the two and is used in the treatment of breast cancers, lymphomas, and sarcomas. The anthracyclines inhibit the actions of topoisomerase II, whose function is to introduce double-strand breaks in DNA during replication as a means to relieve torsional stress. Anthracyclines also induce the formation of oxygen free radicals that cause DNA strand breaks, resulting in inhibition of replication. A plant-derived anticancer compound that also functions through inhibition of topoisomerase II is etoposide (VP-16, Vepesid, Etophos, Eposin). Etoposide is a semisynthetic derivative of podophyllotoxin, a compound isolated from the mandrake plant, and belongs to a class of compounds referred to as the epipodophyllotoxins. The camptothecins were originally found in the bark of the Camptotheca acuminata tree and include irinotecan (Campto, Camptosar) and topotecan (Hycamtin). Camptothecins inhibit the action of topoisomerase I, an enzyme that introduces single-strand breaks in DNA during replication.

Anticancer compounds extracted from the periwinkle plant, Vinca rosea, are called the vinca alkaloids. These compounds bind to tubulin, disrupting the microtubules of the mitotic spindle fibers that are necessary for cell division during mitosis. Four major vinca alkaloids are currently used as chemotherapeutics: vincristine (Oncovin, Vincrex), vinblastine (Velbe, Velban, Velsar), vinorelbine (VIN, Navelbine), and vindesine (Eldisine). The taxanes are another class of plant-derived compounds that act by interfering with microtubule function. Paclitaxel (Taxol) was originally isolated from the Pacific yew tree, Taxus brevifolia, and docetaxel (Taxotere) is a semisynthetic analog. The taxanes function by hyperstabilizing microtubules, which prevents the microtubule disassembly required for mitosis and thus blocks cell division. These compounds are used to treat a wide range of cancers, including head and neck, lung, ovarian, breast, bladder, and prostate cancers. Taxol has proven highly effective in the treatment of certain forms of breast cancer.

For DNA replication to proceed, proliferating cells require a pool of nucleotides. The class of anticancer drugs developed to interfere with aspects of nucleotide metabolism is known as the antimetabolites. Two major types of antimetabolites are used in the treatment of a broad range of cancers: compounds that inhibit thymidylate synthase and compounds that inhibit dihydrofolate reductase (DHFR). Both of these enzymes are involved in thymidine nucleotide biosynthesis (see Figure below). Drugs that inhibit thymidylate synthase include 5-fluorouracil (5-FU, Adrucil, Efudex) and 5-fluorodeoxyuridine. Those that inhibit DHFR are analogs of the vitamin folic acid and include methotrexate (Trexall, Rheumatrex) and trimethoprim (Proloprim, Trimpex); trimethoprim is used as an antibacterial agent because it selectively inhibits bacterial DHFR.

Reactions of thymidine synthesis

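Pathway of thymidine synthesis. The synthesis of thymidine nucleotides begins with deoxyuridine monophosphate (dUMP), which is derived from the action of ribonucleotide reductase on UDP, forming dUDP, followed by its dephosphorylation to dUMP. Thymidylate synthase uses an activated folate derivative, N5,N10-methylenetetrahydrofolate (N5,N10-methylene-THF), as the methyl group donor in the synthesis of dTMP. In this reaction the folate is oxidized to dihydrofolate (DHF), which must be reduced back to tetrahydrofolate (THF) by DHFR before it can again serve as a one-carbon carrier; this is why inhibitors of DHFR, such as methotrexate, also block dTMP synthesis.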
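Why inhibiting either enzyme halts dTMP production can be seen in a toy model of the folate cycle, based on the fact that thymidylate synthase oxidizes its folate cofactor to DHF, which DHFR must reduce back to THF. In this Python sketch (a simplification, not a kinetic model), each round lets thymidylate synthase convert one unit of N5,N10-methylene-THF to DHF while making one dTMP, lets DHFR reduce one unit of DHF back to THF, and then returns THF to N5,N10-methylene-THF. That last step, carried out in cells by serine hydroxymethyltransferase, is assumed here to be non-limiting; the function name, pool sizes, and round counts are arbitrary. Setting `dhfr_active=False` mimics methotrexate, and `ts_active=False` mimics a thymidylate synthase inhibitor such as 5-FU.

```python
def simulate_dtmp_synthesis(rounds: int, folate_pool: int,
                            dhfr_active: bool = True,
                            ts_active: bool = True) -> dict[str, int]:
    """Toy folate-cycle model; returns dTMP made and the final folate pools."""
    pools = {"CH2-THF": folate_pool, "DHF": 0, "THF": 0}
    dtmp = 0
    for _ in range(rounds):
        # Thymidylate synthase: dUMP + N5,N10-methylene-THF -> dTMP + DHF
        if ts_active and pools["CH2-THF"] > 0:
            pools["CH2-THF"] -= 1
            pools["DHF"] += 1
            dtmp += 1
        # DHFR: DHF -> THF (blocked when dhfr_active is False)
        if dhfr_active and pools["DHF"] > 0:
            pools["DHF"] -= 1
            pools["THF"] += 1
        # Assumed non-limiting regeneration: THF -> N5,N10-methylene-THF
        if pools["THF"] > 0:
            pools["THF"] -= 1
            pools["CH2-THF"] += 1
    return {"dTMP": dtmp, **pools}


print(simulate_dtmp_synthesis(20, 5))                     # DHFR and TS active
print(simulate_dtmp_synthesis(20, 5, dhfr_active=False))  # methotrexate-like
print(simulate_dtmp_synthesis(20, 5, ts_active=False))    # 5-FU-like
```

With DHFR blocked, dTMP output stops once the starting N5,N10-methylene-THF is used up, because all of the folate ends up trapped as DHF; with thymidylate synthase blocked, no dTMP is made at all.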

Michael W King, PhD | © 1996–2017, LLC | info @

Last modified: October 17, 2017