All cells undergo a division cycle during their life span. Some cells are continually dividing (e.g. stem cells), others divide a specific number of times until cell death (apoptosis) occurs, and still others divide a few times before entering a terminally differentiated or quiescent state. Most cells of the body fall into the latter category of cells. During the process of cell division everything within the cell must be duplicated in order to ensure the survival of the two resulting daughter cells. Of particular importance for cell survival is the accurate, efficient and rapid duplication of the cellular genome. This process is termed DNA replication.
Eukaryotic genomes are vastly larger than those of prokaryotes. This is partly due to the complexity of eukaryotic organisms compared to prokaryotes. However, the size of a particular eukaryotic genome is not directly correlated with the organism's complexity. This is the result of the presence of a large amount of non-coding DNA. The functions of these non-coding nucleic acid sequences are only partly understood. Some sequences are involved in the control of gene expression, while others may simply be present in the genome to act as an evolutionary buffer able to withstand nucleotide mutation without disrupting the integrity of the organism.
One abundant class of non-coding DNA is termed repetitive DNA. There are two primary sub-classes of repetitive DNA, highly repetitive and moderately repetitive. Highly repetitive DNA can be sub-divided into two distinct subclasses termed microsatellite DNA and minisatellite DNA. Microsatellite DNA consists of short repeat sequences 2–6 bp long reiterated from 100,000–1,000,000 times. These microsatellite sequences of DNA are commonly called short tandem repeats (STR). Since the number of copies of the short repeat sequence is highly variable (i.e. polymorphic) between two different individuals the elements are referred to as short tandem repeat polymorphisms (STRP). Replication of STR DNA frequently results in mismatching of the DNA strands which is normally corrected by a family of enzymes encoded by mismatch repair (MMR) genes. Defects in MMR genes are highly correlated to an increased likelihood of certain types of cancers, in particular colorectal carcinomas. The involvement of mismatch of the microsatellite DNA in diseases such as colorectal carcinomas is known as microsatellite instability, MSI. Minisatellite DNA contains repeat sequences that are 10–60 bp in length. The number of copies of these types of repeats is also highly variable (polymorphic) and so the repetitive DNA is most commonly referred to as variable number tandem repeat (VNTR) DNA.
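The copy-number variability that defines an STR polymorphism can be illustrated with a short sketch. The function name `longest_tandem_run` and the two example alleles are hypothetical, chosen only to show how the number of back-to-back copies of a repeat unit might be counted.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    # Match one or more consecutive copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Two made-up alleles that differ only in the copy number of a CA repeat.
allele_a = "GGTT" + "CA" * 12 + "TTAG"
allele_b = "GGTT" + "CA" * 17 + "TTAG"
print(longest_tandem_run(allele_a, "CA"), longest_tandem_run(allele_b, "CA"))
```

Comparing the two counts is, in spirit, how a difference in repeat number between individuals would show up at a single locus.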
The DNA of the genome consisting of the genes (coding sequences) is identified as non-repetitive DNA since most genes occur but once in an organism's haploid genome. However, it should be pointed out that several genes exist as tandem clusters of multiple copies of the same gene ranging from 50 to 10,000 copies such as is the case for the rRNA genes and the histone genes.
Another characteristic feature that distinguishes eukaryotic from prokaryotic genes is the presence of introns. Introns are stretches of nucleic acid sequences that separate the coding exons of a gene. The existence of introns in prokaryotes is extremely rare or nonexistent. Although nearly all human genes contain introns, numerous mRNA-encoding genes in the human genome contain none. Notable intronless genes are the histone genes. In many genes the presence of introns separates exons into coding regions exhibiting distinct functional domains.
Chromatin is a term designating the structure in which DNA exists within cells. The structure of chromatin is determined and stabilized through the interaction of the DNA with DNA-binding proteins. There are two classes of DNA-binding proteins. The histones are the major class of DNA-binding proteins involved in maintaining the compacted structure of chromatin. There are five different histone proteins identified as H1, H2A, H2B, H3 and H4.
The other class of DNA-binding proteins is a diverse group of proteins called simply, non-histone proteins. This class of proteins includes the various transcription factors, polymerases, hormone receptors and other nuclear enzymes. In any given cell there are greater than 1000 different types of non-histone proteins bound to the DNA.
The binding of DNA by the histones generates a structure called the nucleosome. The nucleosome core contains an octamer protein structure consisting of two subunits each of H2A, H2B, H3 and H4. Histone H1 occupies the internucleosomal DNA and is identified as the linker histone. The nucleosome core contains approximately 150 bp of DNA. The linker DNA between each nucleosome can vary from 20 to more than 200 bp. These nucleosomal core structures would appear as "beads-on-a-string" if the DNA were pulled into a linear structure and observed under an electron microscope.
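The packaging figures above lend themselves to a rough calculation. The sketch below estimates how many nucleosomes a stretch of DNA could hold, using the approximately 150 bp core from the text; the default linker length of 50 bp and the 1,000,000 bp example are assumptions picked from within the stated 20 to 200 bp range, not measured values.

```python
def nucleosome_count(dna_length_bp: int, core_bp: int = 150, linker_bp: int = 50) -> int:
    """Estimate the number of nucleosomes that fit on a DNA segment.

    Each repeat unit is one core (about 150 bp) plus one linker (assumed length).
    """
    if dna_length_bp < 0 or core_bp <= 0 or linker_bp < 0:
        raise ValueError("lengths must be non-negative and core_bp positive")
    repeat_bp = core_bp + linker_bp
    return dna_length_bp // repeat_bp


# Illustrative 1 Mb segment evaluated at three linker lengths.
for linker in (20, 50, 200):
    print(linker, nucleosome_count(1_000_000, linker_bp=linker))
```

The spread between the shortest and longest linker shows how strongly linker length affects nucleosome density.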
The nucleosome cores themselves coil into a solenoid shape which itself coils to further compact the DNA. These final coils are compacted further into the characteristic chromatin seen in a metaphase karyotyping spread. The protein-DNA structure of chromatin is stabilized by attachment to a non-histone protein scaffold called the nuclear matrix.
In a broad consideration of chromatin structure there are two forms: heterochromatin and euchromatin which were originally designated based on cytological observations of how darkly the two regions were stained. Heterochromatin is more densely packed than euchromatin and is often found near the centromeres of the chromosomes. Heterochromatin is generally transcriptionally silent. Euchromatin on the other hand is more loosely packed and is where active gene transcription will be found to be taking place.
There are two primary mechanisms operating in a dynamic manner to alter the overall structure of chromatin. These mechanisms are methylation of cytidine residues in the DNA that are found in the dinucleotide, –CG– (most often written as a CpG dinucleotide) and histone protein modification. Because both DNA methylation and histone modifications can alter chromatin structure and, thereby, alter the transcriptional activity of genes, both of these types of modification are termed epigenetic processes. The term epigenetics means that changes in phenotype can come about, not by changes in the actual DNA sequences but by changes that occur on the genes. Methylation of cytidine residues as a post-replication modification of DNA is discussed below and will be examined here as well.
When determining which C residues in DNA are targets for methylation it was discovered that greater than 90% of methyl-C is found in the dinucleotide, CpG. The cytidine is methylated at the 5 position of the pyrimidine ring generating 5-methylcytidine (designated m5C or 5mC). This is not to say that all CpG dinucleotides contain a methylated C residue. When examining the structure of eukaryotic genes and identifying regions of CpG dinucleotides it is the case that the promoter regions of genes contain 10-20 times as many CpGs when compared to the rest of the genome. In a general sense what is known about DNA methylation and transcriptional status is that when regions of a gene that can be methylated are methylated, the associated gene(s) is(are) transcriptionally silent and when the region is under-methylated the gene(s) is(are) transcriptionally active or can be activated. When cells undergo differentiation it has been observed that genes that become transcriptionally activated exhibit a reduction in methylation status relative to the level prior to activation and that this under-methylation remains even after transcription ceases. The correlation between DNA methylation and chromatin structure, as it relates to transcriptional activity, is discussed in greater detail in the Control of Gene Expression page.
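The enrichment of CpG dinucleotides in promoter regions can be made concrete with a small counting sketch. The observed/expected ratio used here is a commonly applied measure of CpG enrichment and is an assumption of this example rather than something defined in the text; the function name `cpg_stats` and the test sequence are hypothetical.

```python
def cpg_stats(sequence: str) -> dict:
    """Count CpG dinucleotides and compute an observed/expected CpG ratio."""
    seq = sequence.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    # "CG" cannot overlap itself, so str.count finds every CpG.
    cpg = seq.count("CG")
    # Expected CpG count if C and G were distributed independently.
    expected = (c_count * g_count) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return {"cpg": cpg, "obs_exp": obs_exp}


print(cpg_stats("ACGTCG"))
```

Values well above 1 would suggest a CpG-rich region of the kind described for promoters, while bulk genomic DNA would be expected to score lower.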
The methylation of DNA is catalyzed by several different DNA methyltransferases (abbreviated DNMT). Humans express three DNMT genes identified as DNMT1, DNMT3a, and DNMT3b. The DNMT1 gene is located on chromosome 19p13.2 and is composed of 41 exons that generate four alternatively spliced mRNAs that encode four distinct protein isoforms. DNMT1 isoform a is the largest isoform and is a 1632 amino acid protein. The DNMT3a gene is located on chromosome 2p23 and is composed of 34 exons that generate six alternatively spliced mRNAs encoding four distinct proteins. The DNMT3b gene is located on chromosome 20q11.2 and is composed of 24 exons that generate six alternatively spliced mRNAs encoding six distinct protein isoforms. Another gene, identified as DNMT3L (for DNMT3-like), has some similarities to the DNA methyltransferases but lacks the methyltransferase catalytic amino acids. The DNMT3L protein stimulates the DNA methyltransferase activity of DNMT3a. DNMT3L can also affect transcriptional activity through its association with histone deacetylase 1 (HDAC1). Another gene, originally designated DNMT2 and thought to be involved in DNA methylation, in fact encodes an enzyme that methylates a specific aspartic acid tRNA. The designation for this gene is now TRDMT1.
When cells divide the DNA contains one strand of parental DNA and one strand of the newly replicated DNA (the daughter strand). If the DNA contains methylated cytidines in CpG dinucleotides the daughter strand must undergo methylation in order to maintain the parental pattern of methylation. This "maintenance" methylation is catalyzed by DNMT1 and thus, this enzyme is called the maintenance methylase. Of the three DNA methyltransferases DNMT1 is the most abundant in all cells. As might be expected from its characterized primary function, DNMT1 has an up to 100-fold higher level of activity towards hemimethylated DNA compared to unmethylated DNA. The activities of DNMT3a and DNMT3b are relatively equivalent towards unmethylated and hemimethylated DNA. The critical role of DNA methylation in controlling developmental fates was demonstrated in mice by inactivating either DNMT3a or DNMT3b. Loss of either gene resulted in death shortly after birth.
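Maintenance methylation as described above can be sketched as a toy model. The representation is an assumption made for illustration: the parental strand is a string, methylated cytosines are given as string indices, and the function `maintenance_methylation` (a hypothetical name) reports which parental positions pair with a daughter-strand C that would be methylated.

```python
from typing import Set


def maintenance_methylation(parent: str, methylated_c: Set[int]) -> Set[int]:
    """Return parental-strand indices whose paired daughter-strand C gets methylated.

    Toy model: only hemimethylated CpG sites are copied, mimicking the
    preference for hemimethylated DNA described for the maintenance enzyme.
    """
    parent = parent.upper()
    daughter_marks = set()
    for i in methylated_c:
        # A valid target is a methylated C immediately followed by G (a CpG).
        if 0 <= i < len(parent) - 1 and parent[i] == "C" and parent[i + 1] == "G":
            # The daughter-strand C of the same CpG pairs with the parental G at i + 1.
            daughter_marks.add(i + 1)
    return daughter_marks


# CpG sites start at indices 1 and 5; only the first is methylated on the parent.
print(maintenance_methylation("ACGTTCGA", {1}))
```

The unmethylated CpG at index 5 stays unmethylated on both strands, which is how the parental pattern is preserved in this simplified picture.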
Given that chromatin structure, and consequently transcriptional activity, can be modified by the addition of a methyl group to cytidine residues, it is not surprising that there are activities in the cell that are responsible for the removal of the methyl group. However, the discovery of enzymes that can remove methyl groups from m5C residues came about rather serendipitously via two independent areas of study. In a search for mammalian homologs of Trypanosoma brucei genes that oxidize thymidine residues in DNA to 5-hydroxymethyluracil (5hmU), a family of three genes was identified. The second approach that resulted in the identification of the same family of genes involved studies aimed at identifying the pathways that lead to the introduction of 5-hydroxymethylcytosine (5hmC) in mammalian DNA. The genes that were identified are all related to a gene that was originally identified in rare cases of acute myeloid and lymphocytic leukemia (MLL). This form of leukemia results from a translocation between chromosome 10 and chromosome 11. The translocation results in the fusion of the mixed-lineage leukemia 1 (MLL1) gene on chromosome 11 and a gene on chromosome 10 that was subsequently given the name TET (ten eleven translocation). The MLL1 protein is a lysine methyltransferase (KMT) family enzyme encoded by the KMT2A gene. For details on KMT genes go to the Control of Gene Expression page.
Three TET genes are expressed in humans, identified as TET1, TET2, and TET3. The official names for these genes are tet methylcytosine dioxygenase 1, 2, and 3. The TET gene encoded enzymes are distantly related to the human ALKB homologs, which remove aberrant methylation from damaged DNA bases by an oxidative mechanism. The TET1 gene is located on chromosome 10q21 and is composed of 20 exons that encode a 2136 amino acid protein. The TET2 gene is located on chromosome 4q24 and is composed of 11 exons that generate two alternatively spliced mRNAs encoding two distinct protein isoforms. The TET3 gene is located on chromosome 2p13.1 and is composed of 16 exons that encode a 1795 amino acid protein. All three genes encode proteins that are zinc-finger domain containing ferrous iron (Fe2+) and 2-oxoglutarate (α-ketoglutarate)-dependent dioxygenases. Each of the three TET enzymes successively oxidizes 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) within DNA. All three forms of oxidized methylcytosine have been shown to be present in numerous mammalian tissues.
The primary mechanism for the removal of 5fC and 5caC from DNA involves the action of thymine DNA glycosylase (encoded by the TDG gene). A cytosine is then incorporated by the base excision repair (BER) pathway, which restores the original CG base pair. There are several other proposed mechanisms that may function in the removal of 5caC; however, none has been definitively demonstrated experimentally. These mechanisms include direct decarboxylation of 5caC to C by an as yet unknown decarboxylase. Another proposed mechanism involves deamination of 5hmC via the action of activation-induced cytidine deaminase (encoded by the AID gene) or via the action of APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide). The resultant products of these two distinct activities are thymine and 5hmU, which can then be removed and replaced with cytidine via the actions of single-strand-selective monofunctional uracil DNA glycosylase (encoded by the SMUG1 gene) or TDG.
As indicated, the TET proteins are 2-oxoglutarate-dependent dioxygenases. Many other important enzymes, such as the lysine demethylases that demethylate histone proteins (discussed below), also require 2-oxoglutarate as a cofactor. It has therefore been proposed, and considerable work supports the idea, that aberrations in the pathways generating 2-oxoglutarate may be important in the development of certain types of tumors. One of the most significant pathways involves the isocitrate dehydrogenase 1 (IDH1) and IDH2 genes. Mutations in IDH1 and IDH2 have been found in a large number of different types of cancers. The mutations in these genes are associated with a change in catalytic activity such that instead of oxidizing isocitrate to 2-oxoglutarate, the enzymes reduce 2-oxoglutarate to 2-hydroxyglutarate. This metabolic change can significantly reduce the cellular levels of 2-oxoglutarate, thereby decreasing its availability as a cofactor for dependent dioxygenases such as the TET enzymes and the histone demethylases.
Histone proteins are subject to a number of modifications and these modifications are known to affect the structure of chromatin. Greater detail relating to the types and transcriptional consequences of the modification of histones is presented in the Control of Gene Expression page.
Histone acetylation is known to result in a more open chromatin structure and these modified histones are found in regions of the chromatin that are transcriptionally active. Conversely, underacetylation of histones is associated with closed chromatin and transcriptional inactivity. A direct correlation between histone acetylation and transcriptional activity was demonstrated when it was discovered that protein complexes, previously known to be transcriptional activators, were found to have histone acetylase activity. And as expected, transcriptional repressor complexes were found to contain histone deacetylase activity.
Linkage between DNA methylation and transcriptional silencing was demonstrated by the observation that proteins that bind to methyl CpG dinucleotides can recruit histone deacetylases to the DNA. Proteins are known to interact with acetylated histones that together lead to a more open chromatin structure. Proteins that bind to acetylated lysines in histones contain a domain called a bromodomain. The bromodomain is composed of a bundle of four α-helices and is a domain involved in protein-protein interactions in a number of cellular systems in addition to acetylated histone binding and chromatin structure modification.
Another histone modification known to affect chromatin structure is methylation. However, with histone methylation there is not a direct correlation between the modification and a specific effect on transcription. The methylation of histone H4 on R4 (arginine at position 4) promotes an open chromatin structure and thereby leads to transcriptional activation. Methylation of histone H3 on K4 and K79 (lysines 4 and 79) has been shown to act similarly to histone H4 R4 methylation. However, methylation of histone H3 on K9 and K27 is known to be associated with transcriptionally inactive genes. The methylation of histones provides a site for the binding of other proteins which then leads to alteration of chromatin structure to a more compacted state. Proteins that bind to methylated histones contain a domain called a chromodomain. The chromodomain consists of a conserved stretch of 40-50 amino acids and is found in many proteins involved in chromatin remodeling complexes. In addition, chromodomain proteins are found in the RNA-induced transcriptional silencing (RITS) complex, which involves small interfering RNA (siRNA) and microRNA (miRNA)-mediated downregulation of transcription. For more details on small non-coding RNAs and transcriptional regulation see the Control of Gene Expression page.
Histone proteins can also be modified by addition of the small protein ubiquitin. Ubiquitination has been observed to occur on all the nucleosomal histones but is most often found on histones H2A and H2B. When ubiquitinated, H2A is associated with repression of transcription. The exact opposite effect is observed when histone H2B is ubiquitinated, leading to a stimulation of gene activity. The reason that ubiquitinated histone H2B is associated with transcriptional activity is that this modification promotes the methylation of histone H3 at K4 and K79, which as indicated above is associated with open chromatin structure.
Phosphorylation of histones occurs primarily in response to outside signals such as growth factor stimulation or stress inducers such as heat shock. Phosphorylated histones are localized to genes that become transcriptionally active as a consequence of these outside signals. The importance of histone phosphorylation in control of gene expression can be demonstrated in patients with Coffin-Lowry syndrome. This disease results from defects in the RPS6KA3 (ribosomal protein S6 kinase A3; also known as ribosomal S6 kinase 2: RSK2) gene. Coffin-Lowry syndrome is a rare form of X-linked intellectual disability characterized by skeletal malformations, growth retardation, hearing deficit, paroxysmal movement disorders, and cognitive impairment in affected males.
Replication of DNA occurs during the process of normal cell division cycles. Because the genetic complement of the resultant daughter cells must be the same as the parental cell, DNA replication must possess a very high degree of fidelity. The entire process of DNA replication is complex and involves multiple enzymatic activities.
The mechanics of DNA replication were originally characterized in the bacterium E. coli, which contains three distinct enzymes capable of catalyzing the replication of DNA. These have been identified as DNA polymerase (pol) I, II, and III. Pol I is the most abundant replicating activity in E. coli, but its primary role is to ensure the fidelity of replication through the repair of damaged and mismatched DNA. Replication of the E. coli genome is the job of pol III. This enzyme is much less abundant than pol I; however, its activity is nearly 100 times that of pol I.
Up until a few years ago the use of Greek lettering to designate the six known eukaryotic DNA polymerases was sufficient. However, recent evidence indicates that several more members of the eukaryotic DNA polymerase family are present and function in distinct types of DNA replication. These DNA polymerases are divided into four large families designated A, B, X, and Y. In addition, there is a reverse transcriptase activity associated with telomerase as discussed below. The original six distinct eukaryotic DNA polymerases are identified as α, β, γ, δ, ε, and ζ. The designation of each enzyme relates to its subcellular localization, its primary replicative activity, and the order in which it was first described. The known eukaryotic DNA polymerases and descriptions of their activities are indicated in the Table below.
|Eukaryotic DNA Polymerases|
|Polymerase Family||Common Nomenclature||Enzyme Function, Comments|
|A||γ (gamma)||mitochondrial DNA replication; encoded by the POLG gene on chromosome 15q26.1|
|A||θ (theta)||DNA repair; encoded by the POLQ gene on chromosome 3q13.33|
|B||α (alpha)||initiation of chromosomal DNA replication, Okazaki fragment priming, also involved in double-strand break repair; functions as a multisubunit complex that includes two primase proteins (encoded by the PRIM1 and PRIM2A genes), an accessory protein (encoded by the POLA2 gene) and the catalytic subunit encoded by the POLA1 gene on chromosome Xp22.1–p21.3|
|B||δ (delta)||chromosomal DNA replication elongation, nucleotide excision repair, double-strand break repair, mismatch repair; consists of a multisubunit complex that includes the four subunit polymerase complex encoded by the POLD1, POLD2, POLD3, and POLD4 genes as well as the multisubunit replication factor C protein (RFC1) and proliferating cell nuclear antigen (PCNA)|
|B||ε (epsilon)||chromosomal DNA replication elongation, nucleotide excision repair, double-strand break repair, mismatch repair; consists of a 261kDa catalytic subunit encoded by the POLE gene on chromosome 12q24.33 and a 55kDa accessory protein encoded by the POLE2 gene, additional proteins in the epsilon complex include two histone-fold proteins encoded by the POLE3 and POLE4 genes|
|B||ζ (zeta)||bypass (translesion) DNA synthesis; encoded by the POLZ gene on chromosome 6q21; also known as REV3; enzyme responsible for essentially all DNA damage-induced mutagenesis as well as the majority of spontaneous mutagenesis|
|X||β (beta)||base-excision repair, required for DNA replication and maintenance, recombination, and drug-resistance; encoded by the POLB gene on chromosome 8p11.21|
|X||λ (lambda)||base-excision repair; encoded by the POLL gene on chromosome 10q24.32|
|X||μ (mu)||non-homologous end joining (NHEJ); encoded by the POLM gene on chromosome 7p13|
|X||σ (sigma)||sister chromatid cohesion; encoded by the POLS gene on chromosome 5p15.31; also known as topoisomerase-related function protein 4 (TRF4) and poly(A) polymerase-associated domain-containing protein 7 (PAPD7)|
|Y||η (eta)||bypass (translesion) DNA synthesis; encoded by the POLH gene on chromosome 6p21.1; required for replication through uv-induced cyclobutane pyrimidine dimers (CPD); mutations in POLH result in Xeroderma Pigmentosum variant (XP-V)|
|Y||ι (iota)||bypass (translesion) DNA synthesis; encoded by the POLI gene on chromosome 18q21.2; also known as RAD30B|
|Y||κ (kappa)||bypass (translesion) DNA synthesis; encoded by the POLK gene on chromosome 5q13.3|
|Y||Rev1L||bypass (translesion) DNA synthesis; encoded by the REV1 gene; interacts with POLK and is essential for POLK function|
The ability of DNA polymerases to replicate DNA requires a number of additional accessory proteins. The combination of polymerases with several of the accessory proteins yields an activity identified as DNA polymerase holoenzyme. These accessory proteins/complexes include (not ordered with respect to importance):
1. Primase complex (DNA polymerase α complex)
2. Processivity accessory proteins
3. Single strand binding proteins, SSBPs
5. DNA ligase
7. Uracil-DNA N-glycosylase
The process of DNA replication begins at specific sites in the chromosomes termed origins of replication, requires a primer bearing a free 3'–OH, proceeds specifically in the 5' → 3' direction on both strands of DNA concurrently and results in the copying of the template strands in a semiconservative manner. The semiconservative nature of DNA replication means that the newly synthesized daughter strands remain associated with their respective parental template strands.
The large size of eukaryotic chromosomes and the limits of nucleotide incorporation during DNA synthesis, make it necessary for multiple origins of replication to exist in order to complete replication in a reasonable period of time. The precise nature of origins of replication in higher eukaryotic organisms is unclear. However, it is clear that at a replication origin the strands of DNA must dissociate and unwind in order to allow access to all of the accessory proteins and the DNA polymerase complex. Unwinding of the duplex at the origin as well as along the strands as the replication process proceeds is carried out by helicases. Helicases involved in DNA replication are DNA-dependent ATPase with DNA helicase activity. The resultant regions of single-stranded DNA are stabilized by the binding of single-strand binding proteins (SSBPs). The stabilized single-stranded regions are then accessible to the enzymatic activities required for replication to proceed. The site of the unwound template strands is termed the replication fork.
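The need for multiple origins can be illustrated with back-of-the-envelope arithmetic. Every number in this sketch is an assumption for illustration only (a 150,000,000 bp chromosome, a fork rate of 50 bp per second, evenly spaced origins that all fire at the same time); none of these values comes from the text, and the function name `replication_time_hours` is hypothetical.

```python
def replication_time_hours(length_bp: float, n_origins: int, fork_rate_bp_per_s: float) -> float:
    """Estimate the time to replicate a linear chromosome.

    Assumes evenly spaced origins firing simultaneously, each producing
    two forks that move in opposite directions at a constant rate.
    """
    if n_origins < 1 or fork_rate_bp_per_s <= 0:
        raise ValueError("need at least one origin and a positive fork rate")
    # Each fork copies half of the DNA assigned to its origin.
    bp_per_fork = length_bp / (2 * n_origins)
    return bp_per_fork / fork_rate_bp_per_s / 3600


# Hypothetical chromosome and fork rate, compared at 1 and 1000 origins.
for origins in (1, 1000):
    print(origins, round(replication_time_hours(150_000_000, origins, 50), 2))
```

Under these assumed values a single origin would need on the order of weeks, while a thousand origins would bring the estimate under an hour, which is the qualitative point made above.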
In order for DNA polymerases to synthesize DNA they must encounter a free 3'–OH, which is the substrate for attachment of the 5'–phosphate of the incoming nucleotide. During repair of damaged DNA the 3'–OH can arise from the hydrolysis of the backbone of one of the two strands. During replication the 3'–OH is supplied through the use of an RNA primer, synthesized by the activity of the primase complex. The primase complex is composed of four proteins: two primase proteins identified as p58 and p49, a p68 accessory subunit of DNA polymerase α, and the catalytic subunit of DNA polymerase α. Together these four proteins constitute what is more correctly referred to as the DNA polymerase α complex. The p49 and p58 primase proteins form a heterodimeric complex that interacts with DNA polymerase α and the p68 subunit. The two primase proteins are encoded by the PRIM1 gene (p49) and the PRIM2A gene (p58). The p68 accessory protein of the complex is encoded by the POLA2 gene and the catalytic DNA polymerase α protein is encoded by the POLA1 gene. The primase complex utilizes the DNA strands as templates and synthesizes a short stretch of RNA, generating a primer for DNA polymerase. The PRIM1 gene is located on chromosome 12q13 and is composed of 13 exons that encode a protein of 420 amino acids. The PRIM2A gene is located on chromosome 6p12–p11.1 and is composed of 19 exons that generate several alternatively spliced mRNAs. The POLA2 gene is located on chromosome 11q13.1 and is composed of 21 exons that encode a protein of 598 amino acids. The POLA1 gene is located on the X chromosome (Xp22.1–p21.3) and is composed of 38 exons that encode a protein of 1462 amino acids.
Synthesis of DNA proceeds in the 5' → 3' direction through the attachment of the 5'–phosphate of an incoming dNTP to the existing 3'–OH in the elongating DNA strands with the concomitant release of pyrophosphate. Initiation of synthesis, at origins of replication, occurs simultaneously on both strands of DNA. Synthesis then proceeds bidirectionally, with one strand in each direction being copied continuously and one strand in each direction being copied discontinuously. While DNA polymerases incorporate dNTPs into DNA in the 5' → 3' direction, they move in the 3' → 5' direction with respect to the template strand. Because DNA synthesis occurs simultaneously on both template strands as well as bidirectionally, one strand appears to be synthesized in the 3' → 5' direction. In actuality that strand of newly synthesized DNA is produced discontinuously.
The strand of DNA synthesized continuously is termed the leading strand and the discontinuous strand is termed the lagging strand. The lagging strand of DNA is composed of short stretches of RNA primer plus newly synthesized DNA approximately 100–200 bases long (the approximate distance between adjacent nucleosomes). These short stretches of the lagging strand are called Okazaki fragments. The concept of continuous strand synthesis is somewhat of a misnomer since DNA polymerases do not remain associated with a template strand indefinitely. The ability of a particular polymerase to remain associated with the template strand is termed its processivity. The longer it associates, the higher the processivity of the enzyme. DNA polymerase processivity is enhanced by additional protein activities of the replisome identified as processivity accessory proteins.
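The directionality described above can be sketched in a few lines. In this minimal model the template is written 5' → 3' as a string, the polymerase reads it 3' → 5' (from the end of the string backwards), and each complementary base is appended to the growing 3' end of the daughter strand. The function name `synthesize_daughter` and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_daughter(template_5to3: str) -> str:
    """Return the daughter strand, written 5'->3', for a template written 5'->3'."""
    daughter = []
    # Walk the template 3'->5', which is the direction the polymerase moves.
    for base in reversed(template_5to3.upper()):
        # Each new nucleotide is added to the free 3'-OH end of the daughter strand.
        daughter.append(COMPLEMENT[base])
    return "".join(daughter)


print(synthesize_daughter("ATGCCG"))  # prints CGGCAT
```

The result is the reverse complement of the template, which is why a strand can only grow continuously when the fork opens in the direction that exposes the template 3' → 5'.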
Diagrammatic representation of one side of a DNA replication fork. Large arrow depicts the overall direction of replication with both the leading strand and lagging strands of replication shown separated from each other. In actuality the process involves a looping of one of the two parental strands in order to allow the simultaneous replication of both strands in what appears to be the same direction. This detail is illustrated in the following Figure.
How is it that DNA polymerase can copy both strands of DNA in the 5' → 3' direction simultaneously? A model has been proposed in which DNA polymerases exist as dimers associated with the other necessary proteins at the replication fork, a complex identified as the replisome. The template for the lagging strand is temporarily looped through the replisome such that the DNA polymerases are moving along both strands in the 3' → 5' direction simultaneously for short distances, the distance of an Okazaki fragment. As the replication forks progress along the template strands the newly synthesized daughter strands and parental template strands reform a DNA double helix. This means that only a small stretch of the template duplex is single-stranded at any given time.
Details of simultaneous DNA strand replication. Figure illustrates the mechanism by which both strands of DNA are replicated simultaneously in the same direction. A portion of the lagging strand is looped around through the DNA polymerase holoenzyme complex such that short stretches of 500–1000 bases can be continuously replicated in the same direction as the leading strand. Eventually torsional stress will result in dissociation of the enzyme complex and the looping process will need to begin again. This is what determines the average length of the Okazaki fragments generated from the lagging strand. Only DNA helicase, single-strand binding proteins (SSBPs), and the DNA polymerase complex are shown.
The progression of the replication fork requires that the DNA ahead of the fork be continuously unwound. Due to the fact that eukaryotic chromosomal DNA is attached to a protein scaffold the progressive movement of the replication fork introduces severe torsional stress into the duplex ahead of the fork. This torsional stress is relieved by DNA topoisomerases. Topoisomerases relieve torsional stresses in duplexes of DNA by introducing either double- (topoisomerases II) or single-stranded (topoisomerases I) breaks into the backbone of the DNA. These breaks allow unwinding of the duplex and removal of the replication-induced torsional strain. The nicks are then resealed by the topoisomerases.
The RNA primers of the leading strands and Okazaki fragments are removed by the repair DNA polymerases simultaneously replacing the ribonucleotides with deoxyribonucleotides. The gaps that exist between the 3'–OH of one leading strand and the 5'–phosphate of another as well as between one Okazaki fragment and another are repaired by DNA ligases thereby, completing the process of replication.back to the top
The main enzymatic activity of DNA polymerases is the 5' → 3' synthetic activity. However, DNA polymerases possess two additional activities of importance for both replication and repair. These additional activities include a 5' → 3' exonuclease function and a 3' → 5' exonuclease function. The 5' → 3' exonuclease activity allows the removal of ribonucleotides of the RNA primer, utilized to initiate DNA synthesis, along with their simultaneous replacement with deoxyribonucleotides by the 5' → 3' polymerase activity. The 5' → 3' exonuclease activity is also utilized during the repair of damaged DNA. The 3' → 5' exonuclease function is utilized during replication to allow DNA polymerase to remove mismatched bases and is referred to as the proof-reading activity of DNA polymerase. It is possible (but rare) for DNA polymerases to incorporate an incorrect base during replication. These mismatched bases are recognized by the polymerase immediately due to the lack of Watson-Crick base-pairing. The mismatched base is then removed by the 3' → 5' exonuclease activity and the correct base inserted prior to progression of replication.
Telomeres are the specialized DNA structures at the ends of all chromosomes that consist of repetitive DNA sequences and nucleoproteins, the overall structure of which is referred to as a nucleoprotein cap. The telomere sequence on the lagging strand is composed of the repeat 5'–TTAGGG–3'. The telomeric repeat sequence spans up to several kilobases and is involved in protecting the ends of the chromosomes from exonucleolytic activity.
The telomeric ends of the lagging strand of each chromosome require a unique method of replication which involves the activity of the enzyme complex called telomerase. This is due to the fact that even if the primase activity incorporated a primer sequence for DNA polymerase δ on the extreme 3'-end of the lagging strand, the end of that strand would not be fully replicated and would therefore be susceptible to degradation. Telomerase is a complex composed of several proteins, an RNA with sequence complementary to the telomeric repeats, and a reverse transcriptase activity that extends the 3'-end of the lagging strands using the telomerase RNA as the template. The reverse transcriptase activity of telomerase is encoded by the TERT gene (telomerase reverse transcriptase) and the RNA component is encoded by the TERC gene (telomerase RNA component). The TERC RNA contains a repeating hexanucleotide sequence, 3'–AAUCCC–5', that spans between 3 and 20 kilobases. This sequence in the TERC RNA forms a duplex with the lagging DNA strand at the ends of the chromosomes. The 3'-end of the lagging strand then serves as the primer for the reverse transcriptase activity (TERT) which extends the 3'-end of the chromosome using the TERC RNA as a template.
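The templating step described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of the enzyme's kinetics: the function names and the starting DNA end are hypothetical, and the only facts taken from the text are the RNA template 3'–AAUCCC–5' and the resulting DNA repeat 5'–TTAGGG–3'. Because the template is read 3' → 5' while DNA is synthesized 5' → 3', the template string written 3' → 5' maps base by base onto the new DNA written 5' → 3'.

```python
# Watson-Crick pairing: RNA template base -> DNA base synthesized opposite it.
RNA_TO_DNA = {"A": "T", "U": "A", "C": "G", "G": "C"}


def dna_from_rna_template(template_3to5: str) -> str:
    """Return the DNA (written 5'->3') copied from an RNA template written 3'->5'."""
    return "".join(RNA_TO_DNA[base] for base in template_3to5)


def extend_3prime_end(dna_5to3: str, template_3to5: str, rounds: int) -> str:
    """Append `rounds` copies of the templated repeat to the 3'-end of a DNA strand.

    Hypothetical helper: real telomerase translocates along the DNA between
    rounds; here each round is modeled as simply adding one full repeat.
    """
    repeat = dna_from_rna_template(template_3to5)
    return dna_5to3 + repeat * rounds


TERC_TEMPLATE = "AAUCCC"  # written 3'->5', as given in the text

print(dna_from_rna_template(TERC_TEMPLATE))              # TTAGGG
print(extend_3prime_end("GGTTAGGG", TERC_TEMPLATE, 2))   # GGTTAGGGTTAGGGTTAGGG
```

The first call reproduces the telomeric repeat quoted earlier, which is a quick check that the template and repeat sequences given in the text are consistent with each other.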
The telomerase process extends the end of the lagging strand, which can then be replicated by normal DNA polymerase, thereby preserving the length of the chromosome. Numerous lines of evidence strongly link telomere shortening with activation of programmed cell death (apoptosis), loss of tissue stem cells, disease progression, and the overall processes of aging. The importance of telomere length and functional telomerase activity was initially defined in cultures of human fibroblasts as early as the 1960s. Hayflick and co-workers demonstrated that as fibroblasts went through progressive cell cycles in culture their telomeres became progressively shorter and induced a state of proliferative arrest. The fibroblasts exhibited a finite number of cell divisions leading up to the arrest and this barrier to proliferation was called the Hayflick limit. Forced cell division beyond the limit resulted in further telomere loss culminating in uncontrolled chromosomal instability and the triggering of apoptosis. At the opposite end of the spectrum, forced expression of telomerase, specifically the TERT gene, in cultured fibroblasts results in preservation of telomere length and the cells gain the ability to divide indefinitely without acquiring any malignant properties.
Numerous studies have demonstrated a correlation between telomere shortening and human aging and disease. Decreased telomere length in peripheral blood leukocytes has been shown to correlate with higher mortality rates in older (over 60 years of age) individuals. This is contrasted by studies in centenarians and their offspring that have shown a positive link between telomere length and longevity. These latter studies also demonstrated that individuals with longer telomeres had an overall healthier profile relative to individuals of similar age with telomeres of shorter length. There is also an intriguing correlation between telomere length and psychological stress and the risk for development of psychiatric disease. Studies in women aged 20–50 years have shown that those individuals with the highest levels of psychological stress had the shortest telomeres. In addition, the level of telomerase activity in peripheral blood leukocytes was lowest in those individuals with the highest levels of stress which also coincided with the highest levels of oxidative stress. This correlation between stress and telomerase activity and telomere length is quite intriguing given that it is known that individuals subject to chronic psychological stress show a shortened lifespan and more rapid onset of diseases that are more typical of an aged population such as cardiovascular disease.
A link between telomere maintenance and a healthy lifespan is also inferred from studies of various inherited degenerative disorders. As an example, individuals carrying a mutation in either the TERT or TERC genes develop autosomal dominant dyskeratosis congenita, DKS (characterized by a triad of abnormal nails, reticular skin pigmentation, and oral leukoplakia; also called Zinsser-Cole-Engman syndrome). DKS patients have shortened telomeres, a reduced lifespan, and exhibit signs of accelerated ageing. There is another form of DKS that is inherited as an X-linked disease resulting from defects in the DKC1 gene encoding a protein called dyskerin. Dyskerin forms a complex with other proteins generating the telomerase complex as well as another complex that is a pseudouridine synthetase that modifies rRNAs. Other disorders that manifest with signs of premature ageing, such as Werner syndrome and ataxia telangiectasia, also correlate with shortened telomeres. Werner syndrome is a rare autosomal recessive disorder resulting from a deficiency of the WRN protein, which is a DNA helicase involved in DNA repair, DNA recombination and telomere maintenance. These patients develop normally until puberty, at which time they begin to manifest signs of multiple progressive premature ageing pathologies, including senile cataracts, osteoporosis, skin atrophy, myocardial infarction and cancer. Werner syndrome fibroblasts show accelerated telomere loss and undergo premature senescence that can be reversed by enforced TERT expression.
In addition to inherited disorders, telomere shortening is also correlated with acquired degenerative conditions associated with chronically elevated tissue turnover. For example, cirrhosis of the liver is associated with a progressive decline in telomere length.
One of the major post-replicative reactions that modifies the DNA is methylation. DNA methylation can alter chromatin structure, as pointed out above, and can therefore also alter the transcription of genes, as discussed in the Control of Gene Expression page. As pointed out earlier, DNA methylation represents a major epigenetic process. The sites of natural methylation (i.e. not chemically induced) of eukaryotic DNA are always cytosine residues that are present in CpG dinucleotides. However, it should be noted that not all CpG dinucleotides are methylated at the C residue. The cytosine is methylated at the 5 position of the pyrimidine ring generating 5-methylcytosine. Enzymes that incorporate methyl groups into DNA molecules are called DNA methyltransferases, DNMTs. As pointed out earlier, humans express three DNMT genes identified as DNMT1, DNMT3a, and DNMT3b. The enzymes of the DNMT3 family are responsible, principally, for the generation of the initial pattern of CpG dinucleotide methylation. The DNMT1 enzyme is principally tasked with maintaining the DNMT3-established methylation patterns.
Methylation of DNA in prokaryotic cells also occurs. The function of this methylation is to prevent degradation of host DNA in the presence of enzymatic activities synthesized by bacteria called restriction endonucleases. These enzymes recognize specific nucleotide sequences of DNA. The role of this system in prokaryotic cells (called the restriction-modification system) is to degrade invading viral DNAs. Since the viral DNAs are not modified by methylation they are degraded by the host restriction enzymes. The methylated host genome is resistant to the action of these enzymes.
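The logic of a restriction-modification system can be illustrated with a small Python sketch. The text does not name a particular enzyme, so the example assumes EcoRI, which recognizes GAATTC and cuts between the G and the first A. The function name, the example sequence, and the way methylation is represented (a set of protected site positions) are hypothetical simplifications.

```python
SITE = "GAATTC"   # EcoRI recognition sequence (assumed example enzyme)
CUT_OFFSET = 1    # EcoRI cuts after the G: G^AATTC


def digest(seq: str, methylated_sites: frozenset = frozenset()) -> list:
    """Cut `seq` at every unmethylated recognition site.

    `methylated_sites` holds the start indices of sites protected by
    methylation (a simplification of base-level modification).
    """
    fragments = []
    start = 0
    pos = seq.find(SITE)
    while pos != -1:
        if pos not in methylated_sites:
            # Unprotected site: cleave and begin a new fragment.
            fragments.append(seq[start:pos + CUT_OFFSET])
            start = pos + CUT_OFFSET
        pos = seq.find(SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments


dna = "AAGAATTCTTGAATTCGG"  # hypothetical sequence with sites at 2 and 10

# Unmodified (e.g. viral) DNA is cut at both sites.
print(digest(dna))                          # ['AAG', 'AATTCTTG', 'AATTCGG']
# Fully methylated (host) DNA is left intact.
print(digest(dna, frozenset({2, 10})))      # ['AAGAATTCTTGAATTCGG']
```

The same sequence gives very different outcomes depending only on its methylation state, which is the basis on which the host distinguishes its own genome from invading DNA.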
The role of methylation in eukaryotic DNA serves two clearly defined and overlapping functions. The methylation of CpG dinucleotides affects the overall structure of chromatin which in turn broadly alters the availability of the chromatin to the transcriptional machinery. This effect of methylation is one mechanism of epigenesis. Epigenetics as a means of gene control is discussed in the Control of Gene Expression page. The effects of methylation on the transcription of specific genes were elegantly demonstrated in experiments that led to the under-methylation of the MyoD gene (a master control gene regulating the differentiation of muscle cells through the control of the expression of muscle-specific genes). Under-methylation of MyoD in fibroblasts results in their conversion to myoblasts. The experiments were carried out by allowing replicating fibroblasts to incorporate 5-azacytidine into their newly synthesized DNA. This analog of cytidine prevents methylation. The net result is that the maternal pattern of methylation is lost and numerous genes become under-methylated.
The pattern of methylation is copied post-replicatively by the maintenance methylase enzyme, DNMT1. The DNMT1 protein recognizes the pattern of methylated C residues in the maternal DNA strand following replication and methylates the C residue present in the corresponding CpG dinucleotide of the daughter strand.
Process of DNA methylation following DNA replication. Sites of DNA methylation have two fates following the process of DNA replication: they can be maintained or they can be progressively removed. Following replication the parental (template) strands of DNA contain 5mCpG, whereas the reciprocal C residue in the daughter strand is not methylated. If the methylation state of the gene is to be maintained then the maintenance methylase, DNMT1, recognizes the hemi-methylated site and incorporates a methyl group into the C residue of the daughter strand CpG dinucleotide.
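The maintenance step can be made concrete with a short Python sketch. It is a toy model under stated assumptions: the function names and example sequence are hypothetical, methylation is tracked as a set of indices on the parental strand, and daughter-strand positions are reported in parental-strand coordinates. Because CpG is palindromic, the daughter-strand C of a CpG site lies opposite the parental G, one position after the parental C.

```python
def cpg_positions(seq: str) -> list:
    """Indices of the C in every CpG dinucleotide of a 5'->3' sequence."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def maintenance_methylation(parent: str, methylated_c: set) -> set:
    """Model a DNMT1-like maintenance step.

    For each CpG whose parental C is methylated (a hemi-methylated site after
    replication), return the position of the daughter-strand C to methylate,
    expressed in parental coordinates (opposite the parental G, index i + 1).
    Unmethylated parental CpGs are left unmethylated on the daughter strand.
    """
    targets = set()
    for i in cpg_positions(parent):
        if i in methylated_c:
            targets.add(i + 1)
    return targets


parent = "ATCGGACGTTCGA"          # hypothetical sequence, CpGs at 2, 6 and 10
print(cpg_positions(parent))                       # [2, 6, 10]
print(sorted(maintenance_methylation(parent, {2, 10})))   # [3, 11]
```

In this model the CpG at position 6 stays unmethylated on both strands, so the parental pattern, including its unmethylated sites, is what gets copied.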
The phenomenon of genomic imprinting refers to the fact that the expression of some alleles depends on whether they are inherited maternally or paternally. In other words, some genes are only expressed from the mother's chromosomes, whereas some genes are only expressed from the father's chromosomes. Imprinted genes have been found to be distributed throughout the genome. Several imprinted alleles are isolated (i.e. single genes) while others occur in pairs. However, the majority of imprinted loci are organized in clusters that are up to one megabase (Mb) in size. These imprinted domains are very nearly equally distributed between the maternal and paternal genomes. The allele-specific expression of imprinted genes is regulated by epigenetic modifications of which DNA methylation is the major modification. Thus, these imprinted alleles are "marked" by their state of methylation. The vast majority of imprinted genes, or chromosomal loci, are silenced by their state of hypermethylation.
To date several hundred imprinted genes have been characterized. The very first imprinted gene identified was the insulin-like growth factor-2 gene (symbol: IGF2). The IGF2 gene encoded protein (IGF-2) is required for normal fetal development and growth. Expression of IGF2 occurs exclusively from the paternal copy of the gene. In the case of IGF2 an element in the paternal locus, called an insulator element, is methylated, blocking its function. The function of the un-methylated insulator is to bind a protein that, when bound, blocks activation of IGF2 expression. This protein is called CTCF for CCCTC-binding factor. The insulator element contains the sequence CCCTC that the factor binds to when the associated CpGs are not methylated. When the insulator element is methylated CTCF cannot bind the insulator, thus allowing a distant enhancer element to drive expression of the IGF2 gene. In the maternal genome the insulator is not methylated; therefore, the CTCF protein binds to it, blocking the action of the distant enhancer element. The function of IGF-2 is exerted by its binding to a specific receptor identified as the IGF-2 receptor, IGF2R. Interestingly, the IGF2R gene is also imprinted, being expressed exclusively from the maternal gene. Several of these imprinted loci have been associated with various diseases, the two most clinically interesting being Prader-Willi syndrome (PWS) and Angelman syndrome (AS). These two disorders are phenotypically quite distinct yet arise due to alterations in the same imprinted locus on chromosome 15 that encompasses several genes.
DNA recombination refers to the phenomenon whereby two parental strands of DNA are spliced together resulting in an exchange of portions of their respective strands. This process leads to new molecules of DNA that contain a mix of genetic information from each parental strand. There are 3 main forms of genetic recombination. These are homologous recombination, site-specific recombination and transposition.
Homologous recombination is the process of genetic exchange that occurs between any two molecules of DNA that share a region (or regions) of homologous DNA sequences. This form of recombination occurs frequently while homologous chromosomes are paired during meiosis. Indeed, it is the process of homologous recombination between the maternal and paternal chromosomes that imparts genetic diversity to an organism. Homologous recombination generally involves exchange of large regions of the chromosomes.
Site-specific recombination involves exchange between much smaller regions of DNA sequence (approximately 20–200 base pairs) and requires the recognition of specific sequences by the proteins involved in the recombination process. Site-specific recombination events occur primarily as a mechanism to alter the program of genes expressed at specific stages of development. The most significant site-specific recombinational events in humans are the somatic cell gene rearrangements that take place in the immunoglobulin genes during B-cell differentiation in response to antigen presentation. These gene rearrangements in the immunoglobulin genes result in an extremely diverse potential for antibody production. A typical antibody molecule is composed of both heavy and light chains. The genes for both these peptide chains undergo somatic cell rearrangement yielding the potential for approximately 3,000 different light chain combinations and approximately 5,000 heavy chain combinations. Because any given heavy chain can combine with any given light chain, the potential diversity exceeds 10,000,000 possible different antibody molecules.
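The diversity estimate above is a simple product, which the following Python snippet works through. The function name is hypothetical, and the chain counts are the approximate figures quoted in the text rather than measured values.

```python
def antibody_combinations(light_chains: int, heavy_chains: int) -> int:
    """Number of heavy/light pairings if any heavy chain can pair with any light chain."""
    return light_chains * heavy_chains


# Approximate figures from the text.
total = antibody_combinations(light_chains=3_000, heavy_chains=5_000)
print(f"{total:,}")   # 15,000,000
```

The product, about 15 million pairings, is consistent with the statement that the potential diversity exceeds 10,000,000 antibody molecules.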
Transposition is a unique form of recombination where mobile genetic elements can virtually move from one region to another within one chromosome or to another chromosome entirely. There is no requirement for sequence homology for a transpositional event to occur. Because the potential exists for the disruption of a vitally important gene by a transposition event this process must be tightly regulated. The exact nature of how transpositional events are controlled is unclear.
Transposition occurs with a higher frequency in bacteria and yeasts than it does in humans. The identification of the occurrence of transposition in the human genome resulted when it was found that certain processed genes were present in the genome. These processed genes are nearly identical to the mRNA encoded by the normal gene. The processed genes contain the poly(A) tail that would have been present in the RNA and they lack the introns of the normal gene. These particular forms of genes must have arisen through a reverse transcription event, similar to the life cycle of retroviral genomes, and then been incorporated into the genome by a transpositional event. Since most of the processed genes that have been identified are non-functional they have been termed pseudogenes.
Cancer, in most non-viral induced cases, is the severe medically relevant consequence of the inability to repair damaged DNA. It is clear that multiple somatic cell mutations in DNA can lead to the genesis of the transformed phenotype. Therefore, it should be obvious that complete understanding of DNA repair mechanisms would be invaluable in the design of potential therapeutic agents in the treatment of cancer.
DNA damage can occur as the result of exposure to environmental stimuli such as alkylating chemicals or ultraviolet or radioactive irradiation and free radicals generated spontaneously in the oxidizing environment of the cell. These phenomena can, and do, lead to the introduction of mutations in the coding capacity of the DNA. Mutations in DNA can also, but rarely, arise from the spontaneous tautomerization of the bases.
Modification of the DNA bases by alkylation (predominantly the incorporation of –CH3 groups) occurs mainly on purine residues. Methylation of G residues allows them to base pair with T instead of C. A unique activity called O6-alkylguanine transferase removes the alkyl group from G residues. The protein itself becomes alkylated and is no longer active; thus, a single protein molecule can remove only one alkyl group.
Base substitution mutations in DNA are of two types. Transition mutations result from the exchange of one purine for another purine, or of one pyrimidine for another pyrimidine. Transversion mutations result from the exchange of a purine for a pyrimidine or vice versa.
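The two definitions translate directly into a short classification routine. The Python sketch below is illustrative only; the function name is hypothetical, and it handles single-base DNA substitutions among A, G, C and T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and altered base are identical")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine) is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("C", "T"))  # transition
print(classify_substitution("A", "C"))  # transversion
```

Of the twelve possible single-base substitutions, four are transitions (A↔G, C↔T) and eight are transversions.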
The prominent by-product of UV irradiation of DNA is the formation of thymine dimers. These form from two adjacent T residues in the DNA. Repair of thymine dimers is best understood from consideration of the mechanisms used in E. coli. However, several mechanisms are common to both prokaryotes and eukaryotes.
Thymine dimers are removed by several mechanisms. Specific glycohydrolases recognize the dimer as abnormal and cleave the N-glycosidic bond of the bases in the dimer. This results in the base leaving and generates an apyrimidinic site in the DNA. This is repaired by DNA polymerase and ligase. Glycohydrolases are also responsible for the removal of other abnormal bases, not just thymine dimers.
Another widely distributed activity is DNA photolyase, or photoreactivating enzyme. This protein binds to thymine dimers in the dark. In response to visible light stimulation the enzyme cleaves the bonds joining the two pyrimidine rings. The chromophore associated with this enzyme that allows visible light activation is FADH2.
Humans defective in DNA repair (in particular the repair of UV-induced thymine dimers) due to autosomal recessive genetic defects suffer from the disease xeroderma pigmentosum (XP). There are at least eight distinct genetic defects associated with this disease identified as XPA, XPB, XPC, XPD, XPE, XPF, XPG, and XPV. The variants XPA through XPG are all involved in nucleotide excision repair (NER) while the XPV locus is involved in replication of damaged DNA on the leading strand. NER involves the removal of a wide array of structurally unrelated DNA lesions including cyclobutane thymine dimers, other pyrimidine dimers, pyrimidine-pyrimidone photoproducts produced in human skin by shortwave UV, and helix-distorting chemical adducts induced by carcinogens. Many of the protein products of these genes are found in heterodimeric complexes with other proteins involved in repair of damaged DNA. There are two major clinical forms of XP, one which leads to progressive degenerative changes in the eyes and skin and the other which also includes progressive neurological degeneration.
Another inherited disorder affecting DNA repair in which patients suffer from sun sensitivity, short stature and progressive neurological degeneration without an increased incidence of skin cancer is Cockayne Syndrome.
Ataxia telangiectasia (AT) is an autosomal recessive disorder resulting in neurological disability and suppressed immune function. Telangiectasias are dilated superficial blood vessels as also seen in patients with rosacea or scleroderma. AT patients develop a disabling cerebellar ataxia early in life and have recurrent infections. Patients suffering from AT have an increased sensitivity to x-irradiation resulting from defects in the ATM gene (ataxia telangiectasia mutated) which encodes a kinase that is induced in response to double-strand breaks in DNA. One important substrate for the ATM kinase is the tumor suppressor protein, p53. There are four distinct complementation groups resulting in AT identified as ATA, ATC, ATD, and ATE. All four are the result of mutations in the ATM gene.
Several diseases associated with defective repair of damaged DNA can be found in the Inborn Errors page.
The class of compounds that has been used the longest as anticancer drugs is the alkylating agents. The major alkylating agents are derived from nitrogen mustards that were originally developed for use by the military. Commonly used alkylating agents include cyclophosphamide (Cytoxan, Neosar), ifosfamide, dacarbazine, chlorambucil (Leukeran) and procarbazine (Matulane, Natulan).
Alkylating agents function by reacting with and disrupting the structure of DNA. Some agents attach alkyl groups to bases in DNA, resulting in fragmentation of the DNA as a consequence of the action of DNA repair enzymes. Some agents cross-link bases in the DNA, which prevents the separation of the two strands during DNA replication. Some agents induce mis-pairing of nucleotides resulting in permanent mutations in the DNA. Alkylating agents act upon DNA at all stages of the cell cycle, thus they are potent anticancer drugs. However, because of their potency, prolonged use of alkylating agents can lead to secondary cancers, particularly leukemias.
Several classes of anticancer drugs function through interference with the actions of the topoisomerases. Two of these classes are the anthracyclines and the camptothecins.
The anthracyclines were originally isolated from bacteria of the genus Streptomyces. Doxorubicin (Adriamycin, Doxil, Rubex) and daunorubicin (Cerubidine, DaunoXome, Daunomycin, Rubidomycin) have similar modes of action, although doxorubicin is the more potent of the two and is used in the treatment of breast cancers, lymphomas and sarcomas. The anthracyclines inhibit the actions of topoisomerase II, whose function is to introduce double-strand breaks in DNA during the process of replication as a means to relieve torsional stresses. Anthracyclines also function by inducing the formation of oxygen free radicals that cause DNA strand breaks resulting in inhibition of replication.
A plant-derived anticancer compound that also functions through inhibition of topoisomerase II is etoposide (VP-16, Vepesid, Etophos, Eposin). Etoposide is derived from the mandrake plant and is in a class of compounds referred to as epipodophyllotoxins.
The camptothecins were originally found in the bark of the Camptotheca acuminata tree and include irinotecan (Campto, Camptosar) and topotecan (Hycamtin). Camptothecins inhibit the action of topoisomerase I, an enzyme that induces single-strand breaks in DNA during replication.
Anticancer compounds that are extracted from the periwinkle plant, Vinca rosea, are called the vinca alkaloids. These compounds bind to tubulin monomers leading to the disruption of the microtubules of the mitotic spindle fibers that are necessary for cell division during mitosis. There are four major vinca alkaloids that are currently used as chemotherapeutics: vincristine (Oncovin, Vincrex), vinblastine (Velbe, Velban, Velsar), vinorelbine (VIN, Navelbine), and vindesine (Eldisine).
The taxanes are another class of plant-derived compounds that act via interference with microtubule function. These compounds are isolated from the Pacific yew tree, Taxus brevifolia and include paclitaxel (Taxol) and docetaxel (Taxotere). The taxanes function by hyperstabilizing microtubules which prevents cell division (cytokinesis). These compounds are used to treat a wide range of cancers including head and neck, lung, ovarian, breast, bladder, and prostate cancers. Taxol has proven highly effective in the treatment of certain forms of breast cancer.
In order for DNA replication to proceed, proliferating cells require a pool of nucleotides. The class of anticancer drugs that has been developed to interfere with aspects of nucleotide metabolism is known as the antimetabolites. There are two major types of antimetabolites used in the treatment of a broad range of cancers: compounds that inhibit thymidylate synthase and compounds that inhibit dihydrofolate reductase (DHFR). Both of these enzymes are involved in thymidine nucleotide biosynthesis (see Figure below). Drugs that inhibit thymidylate synthase include 5-fluorouracil (5-FU, Adrucil, Efudex) and 5-fluorodeoxyuridine. Those that inhibit DHFR are analogs of the vitamin folic acid and include methotrexate (Trexall, Rheumatrex) and trimethoprim (Proloprim, Trimpex).