Protein Modifications and Protein Targeting

© 1996–2016, LLC | info @

Protein Folding: Co- and Posttranslational Processing

The process of protein synthesis does not, in and of itself, directly result in the generation of functionally and structurally complete macromolecules. Many proteins must undergo one or more forms of modification that can occur cotranslationally, post-translationally, or both, as described in detail in the following sections. However, an equally critical process that must be undertaken to ensure protein function is the folding of the protein into a defined three-dimensional structure. Within the environment of the cell, newly synthesized proteins are at great risk of aberrant folding and aggregation. Improper folding and protein aggregation can lead to the formation of potentially toxic species. To reduce and prevent these negative outcomes, cells harbor a complex network of molecular chaperones whose functions are to promote efficient folding and to prevent protein aggregation. The structure of proteins within the cell is in a highly dynamic state and, therefore, constant molecular chaperone surveillance is required to ensure protein homeostasis.

Human cells express several families of molecular chaperones, the most abundant being those originally identified by their response to the stresses induced by heat in the fruit fly Drosophila melanogaster. These protein families are, therefore, classically termed heat shock proteins (Hsp) and are also referred to as stress proteins. Although defined by the initial observation of heat-induced expression, Hsp genes can be induced by inflammation, ischemia, infections, irradiation, and exposure to heavy metals, oxidants, and organics. Many Hsp proteins possess intrinsic ATPase activity and it is the hydrolysis of ATP that powers the protein folding processes. The Hsp superfamily is composed of multiple protein subfamilies that are loosely designated by molecular weights. These Hsp families include the Hsp40, Hsp60, Hsp70, Hsp90, and Hsp100 families. In addition, there is the small heat shock protein family often referred to as the Hsp25 family or the HSPB family. Protein folding chaperones in the endoplasmic reticulum (ER) belong to the Hsp40, Hsp70, Hsp90, and Hsp100 families. Two additional folding chaperones, calnexin and calreticulin, function in the ER in the processes of glycoprotein maturation. Additional folding chaperones possess distinct (non-ATPase-mediated) enzymatic activity. Protein disulfide isomerases (PDI) catalyze the formation of disulfide bonds between cysteine residues in a substrate protein. Peptidylprolyl cis-trans isomerases (PPI) catalyze isomerization of peptide bonds involving proline residues.

Hsp40 Family

The proteins of the Hsp40 family are also referred to as DnaJ proteins. The DnaJ nomenclature is derived from the E. coli Hsp70 co-chaperone DnaJ. Hsp40 proteins are correctly referred to as obligate co-chaperones since they function in conjunction with proteins of the Hsp70 family, whose ATPase activities require interaction with Hsp40 proteins. The J-domain in the Hsp40/DnaJ family of chaperones is an approximately 70 amino acid domain composed of four α-helices found at the N-terminus of most proteins in the family. The J-domain was originally identified as being involved in the regulation of the ATPase activity of Hsp70 family member proteins. Within the J-domain is an HPD (His-Pro-Asp) tripeptide motif which is essential for stimulation of the ATPase activity of Hsp70 proteins. In addition to the J-domain, Hsp40 proteins possess a zinc-finger domain that is critical for the sequestration of a denatured substrate as well as for assisting Hsp70 proteins in the folding reaction. The Hsp40/DnaJ proteins are involved in processes that regulate gene expression and translational initiation as well as those controlling the folding and unfolding, translocation, and degradation of proteins. Hsp40/DnaJ proteins also bind to unfolded or non-native polypeptides in order to prevent their aggregation. Based primarily upon structural characteristics, the Hsp40/DnaJ protein family can be divided into four subtypes: type I, type II, type III, and type IV. The type I proteins contain all of the domains originally identified in the E. coli DnaJ protein. Type II proteins lack the zinc-finger domain, type III proteins possess only the J-domain, which is in the C-terminus, and type IV proteins also possess a C-terminal J-domain but it lacks the HPD motif. Humans express a total of 49 genes that encode proteins of the Hsp40/DnaJ family.
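The four-subtype scheme just described can be read as a small decision table. The following Python sketch is only an illustration of that table: the function name and its three boolean-style inputs are hypothetical, and any combination of features that the text does not describe is returned as "unclassified" rather than guessed.

```python
def classify_hsp40(j_domain_terminus: str, has_hpd_motif: bool, has_zinc_finger: bool) -> str:
    """Assign an Hsp40/DnaJ subtype from three coarse structural features.

    j_domain_terminus: "N" or "C", the end of the protein carrying the J-domain.
    has_hpd_motif:     True if the J-domain contains the His-Pro-Asp tripeptide.
    has_zinc_finger:   True if the protein contains the zinc-finger domain.
    This is a simplification for illustration, not a curated classifier.
    """
    if j_domain_terminus == "N" and has_hpd_motif:
        # Type I has every E. coli DnaJ domain; type II lacks the zinc finger.
        return "type I" if has_zinc_finger else "type II"
    if j_domain_terminus == "C" and not has_zinc_finger:
        # Type III and type IV both carry a C-terminal J-domain;
        # type IV is the one whose J-domain lacks the HPD motif.
        return "type III" if has_hpd_motif else "type IV"
    return "unclassified"


if __name__ == "__main__":
    print(classify_hsp40("N", True, True))    # type I
    print(classify_hsp40("C", False, False))  # type IV
```

Keeping an explicit "unclassified" branch avoids forcing proteins with unusual domain arrangements into one of the four named subtypes.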

Hsp60 Family

Humans express a single gene of the Hsp60 family identified as the HSPD1 gene. Hsp60 is also referred to as a chaperonin, specifically a group I chaperonin. Proteins that are classified as chaperonins are large (800-900 kDa) double-ring complexes that function by globally enclosing substrate proteins for folding. Proteins up to a size of around 60 kDa can be acted upon by the chaperonins. There are 15 genes in the human genome that encode proteins termed chaperonins, with the HSPE1 encoded protein (chaperonin 10/Hsp10) being required as a co-chaperone for Hsp60 function. Human Hsp60 was originally characterized as a mitochondrial chaperone involved in correct folding of mitochondrial proteins during their import into this organelle. The activity of Hsp60 is driven, in part, by its intrinsic ATPase activity. In addition to its role in mitochondrial protein folding, Hsp60 is also involved in the recognition and ATPase-mediated unfolding of non-native and aggregated proteins so that they can efficiently refold into their native conformations. Non-protein folding functions of Hsp60 include a role in mitochondrial DNA replication. The group II chaperonins are members of the CCT [chaperonins containing TCP1 (T-complex 1)] subfamily and these proteins do not require the Hsp10 co-chaperone for activity.

Hsp70 Family

The Hsp70 family represents the most ubiquitous class of chaperones and is highly conserved in all organisms. The Hsp70 family of chaperones controls all aspects of intracellular protein homeostasis, which includes nascent protein folding, protein import into organelles, unfolding of non-native and aggregated proteins, and the assembly of multi-protein complexes. Several proteins of the Hsp70 family of chaperones also function extracellularly, mediating immune modulation and cytoprotection. All Hsp70 proteins possess intrinsic ATPase activity which regulates their ability to interact with exposed hydrophobic surfaces of substrate proteins. Regulation of Hsp70 function is also effected by other proteins that interact with subdomains, as is the case for the obligate co-chaperone, Hsp40/DnaJ. Humans express 17 genes that encode Hsp70 family member proteins. The major stress-inducible Hsp70 proteins are identified as Hsp70-1 (encoded by the HSPA1A gene) and Hsp70-2 (encoded by the HSPA1B gene). These two proteins differ by only two amino acids and are often collectively referred to as Hsp70 or Hsp70-1. The HSPA8 encoded protein (Hsp70-8) is an essential housekeeping chaperone responsible for the bulk of Hsp70-mediated protein folding and protein transport across membranes. The HSPA9 encoded protein (Hsp70-9) is predominantly found in the mitochondria and the protein contains a 46 amino acid mitochondrial targeting sequence.

Proteins of the Hsp70 family are highly conserved at the amino acid level and possess a set of common domains. The N-terminal nucleotide binding domain (NBD) binds and hydrolyzes ATP. The NBD is composed of four subdomains, termed IA, IB, IIA, and IIB, that surround the ATP-binding pocket. The C-terminal substrate binding domain (SBD) is the domain that binds extended polypeptide substrates. There is also a central domain defined by the presence of several protease sensitive sites. The NBD is conserved in all of the Hsp70 family members except for the proteins encoded by the HSPA12A and HSPA12B genes, which contain a more divergent NBD. The Hsp70 family member proteins that are predominantly cytosolic also contain a glycine and proline-rich (G/P-rich) C-terminal region that contains the tetrapeptide, EEVD (GluGluValAsp). This tetrapeptide motif is involved in binding of co-chaperones and other Hsp proteins. The EEVD motif is present in neither the mitochondrial nor the ER-localized Hsp70 proteins, Hsp70-9 (HSPA9) and Hsp70-5 (HSPA5), respectively. Additionally, Hsp70-5 contains the C-terminal ER retention signal, KDEL (LysAspGluLeu).
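As a rough illustration of how the two C-terminal tetrapeptides distinguish these proteins, the Python sketch below checks a one-letter amino acid sequence for EEVD or KDEL. It assumes that each motif sits at the very end of the chain, and the function name and both test sequences are invented for the example; they are not real Hsp70 sequences.

```python
def c_terminal_motifs(sequence: str) -> dict:
    """Report which of the EEVD and KDEL tetrapeptides end the sequence.

    Assumption for this sketch: each motif occupies the last four residues.
    """
    seq = sequence.strip().upper()
    return {
        "EEVD": seq.endswith("EEVD"),  # co-chaperone binding motif of cytosolic Hsp70s
        "KDEL": seq.endswith("KDEL"),  # ER retention signal, as in Hsp70-5
    }


if __name__ == "__main__":
    cytosolic_like = "MSKGPAIGGGPTIEEVD"  # made-up sequence ending in EEVD
    er_like = "MKLSLVAAEEDKDEL"           # made-up sequence ending in KDEL
    print(c_terminal_motifs(cytosolic_like))
    print(c_terminal_motifs(er_like))
```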

One critically important member of the Hsp70 family is the protein commonly identified as BiP (Binding immunoglobulin Protein; also known as Hsp70-5; also known as glucose-regulated protein 78-kDa, GRP78). The human BiP protein is encoded by the HSPA5 [heat shock protein family A (Hsp70) member 5] gene. The BiP protein is retained in the ER via the C-terminal ER retention signal, KDEL. BiP is referred to as the master regulator of the ER. The function of BiP is not only to participate in protein folding but also to maintain the permeability barrier of the ER by sealing the luminal side of inactive translocons (protein complexes through which nascent polypeptides are extruded), to facilitate the translocation of growing polypeptide chains into the ER lumen, to regulate the aggregation of nonnative polypeptides, and to contribute to calcium homeostasis in the ER.

Proteins of the Hsp70 family also function as potent inhibitors of apoptosis. Hsp70-1 blocks the mitochondrial translocation and activation of the mitochondrial outer membrane-associated BAX complex. As described in detail in the Protein, Organelle, and Cell Turnover page, activation of BAX (as well as the related pore complex BAK) results in the release of cytochrome c which leads to activation of the intrinsic apoptosis pathway. Hsp70-1 also inhibits assembly of the death-inducing signaling complex (DISC) typical of the extrinsic apoptosis pathway. Within the context of the intrinsic apoptosis pathway, Hsp70-1 binds directly to apoptotic protease activating factor 1 (APAF1) and blocks the recruitment of procaspase-9 to the mitochondrial apoptosome which includes cytochrome c and ATP. Hsp70-1 also interacts with the pro-apoptotic mitochondrial protein known as apoptosis-inducing factor (AIF; encoded by the AIFM1 gene) which results in inhibition of caspase-independent apoptosis. Hsp70-1 also regulates ER-stress-mediated apoptosis by interfering with the activities of c-Jun N-terminal kinase (JNK), p38 mitogen-activated protein kinase (p38 MAPK; encoded by the MAPK14 gene), and extracellular signal-regulated kinase (ERK1/ERK2; encoded by the MAPK3 and MAPK1 genes, respectively) in the apoptotic pathway. Hsp70-1 also stimulates proteasomal degradation of apoptosis signal-regulating kinase 1 (ASK1; encoded by the MAP3K5 gene) which normally phosphorylates and activates JNK and p38 MAPK. This latter function of Hsp70-1 occurs in conjunction with the E3 ubiquitin ligase known as C-terminal Hsp70-interacting protein (CHIP).

Hsp90 Family

The Hsp90 family member proteins function as "holdases" (similar to the activity of the small Hsp family members) to keep substrate proteins in the non-aggregated state before transferring the substrate to an Hsp70 protein for refolding. Humans express four genes in the Hsp90 family, HSP90AA1, HSP90AB1, HSP90B1, and TRAP1 (TNF receptor-associated protein 1). The HSP90AA1 and HSP90AB1 encoded proteins are localized to the cytosol, the HSP90B1 encoded protein is localized to the ER, and the TRAP1 encoded protein is localized to the mitochondria. The domain structure of the Hsp90 family proteins is very similar to that of the Hsp70 family proteins and includes an N-terminal nucleotide binding domain (NBD), a central (middle) domain, and a C-terminal substrate/co-factor binding domain. Like the C-terminal domain in most Hsp70 proteins, the C-terminal domain of the Hsp90 family proteins contains an EEVD motif. In the Hsp90 proteins, however, this motif is part of a pentapeptide, MEEVD.

Hsp100 Family

Humans express a single Hsp100 family member identified as Hsp110 which is encoded by the HSPH1 gene. Hsp110 functions in a heterodimeric complex with Hsp70 family member proteins. This complex functions as a nucleotide exchange factor (NEF) whereby one protein serves to exchange ATP for ADP in the other protein of the complex and vice versa. Both proteins function cooperatively in the process of disassembling stable protein aggregates.

Calnexin and Calreticulin

Calnexin is encoded by the CANX gene and is a transmembrane protein chaperone associated with the ER. The function of calnexin is to assist in the folding of N-glycosylated proteins within the ER. This function ensures that only glycoproteins that are properly folded and assembled continue further along the secretory pathway. Calnexin only binds to N-glycoproteins that have a Glc1Man9GlcNAc2 oligosaccharide attached to an Asn residue. This structure results from glucosidase actions on the en bloc oligosaccharide (Glc3Man9GlcNAc2), catalyzed first by glucosidase I (GluI), which is encoded by the mannosyl-oligosaccharide glucosidase (MOGS) gene, and then by glucosidase II (GluII). Functional GluII is composed of an α-subunit and a β-subunit. The GluII α-subunit gene is GANAB (glucosidase II alpha subunit) and the β-subunit gene is PRKCSH (protein kinase C substrate 80K-H). Calreticulin is encoded by the CALR gene. Calreticulin is a multifunctional protein whose primary function is to bind and sequester Ca2+ ions in the lumen of the ER. Calreticulin also binds to misfolded proteins, preventing them from being exported from the ER to the Golgi apparatus.
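The trimming sequence that produces the calnexin ligand can be followed with simple bookkeeping on the sugar counts. The Python sketch below is a toy model only: the Glycan type and the function names are hypothetical, and it assumes that each glucosidase step removes exactly one glucose, so that two steps convert Glc3Man9GlcNAc2 into Glc1Man9GlcNAc2.

```python
from typing import NamedTuple


class Glycan(NamedTuple):
    """Sugar counts of an N-linked oligosaccharide (toy representation)."""
    glc: int
    man: int
    glcnac: int

    def name(self) -> str:
        return f"Glc{self.glc}Man{self.man}GlcNAc{self.glcnac}"


def trim_glucose(glycan: Glycan) -> Glycan:
    """Remove one glucose, standing in for a single glucosidase step."""
    if glycan.glc < 1:
        raise ValueError("no glucose left to remove")
    return glycan._replace(glc=glycan.glc - 1)


def process_for_calnexin(glycan: Glycan) -> Glycan:
    """Apply the glucosidase I step, then the glucosidase II step."""
    after_glu1 = trim_glucose(glycan)      # glucosidase I (assumed: one glucose)
    after_glu2 = trim_glucose(after_glu1)  # glucosidase II (assumed: one glucose)
    return after_glu2


def calnexin_binds(glycan: Glycan) -> bool:
    """Calnexin recognizes only the monoglucosylated Glc1Man9GlcNAc2 form."""
    return glycan == Glycan(glc=1, man=9, glcnac=2)


if __name__ == "__main__":
    en_bloc = Glycan(glc=3, man=9, glcnac=2)
    trimmed = process_for_calnexin(en_bloc)
    print(en_bloc.name(), calnexin_binds(en_bloc))
    print(trimmed.name(), calnexin_binds(trimmed))
```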

Protein Disulfide Isomerases

The protein disulfide isomerase (PDI) family of enzymes consists of 21 members. This large family of oxidoreductases catalyzes exchange reactions between thiols and disulfides. PDI enzymes contain multiple domains initially characterized in thioredoxin and are, therefore, referred to as thioredoxin domains. The N- and C-terminal thioredoxin domains possess the catalytic disulfide/dithiol centers and contain the canonical CXXC motif. The internal thioredoxin domains impart structure to the PDI enzymes as well as providing sites for additional interactions with the protein substrates. A PDI related to the Hsp40 protein family is encoded by the DNAJC10 gene and the encoded protein was originally called PDIA19.
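The CXXC motif is simple enough to locate with a regular expression. The following Python sketch scans a one-letter sequence for two cysteines separated by any two residues; the function name and the short test peptide are hypothetical, and a match indicates only a candidate site, not a functional thioredoxin domain.

```python
import re

# A lookahead is used so that overlapping candidates (e.g. CXXCXXC) are all reported.
CXXC_PATTERN = re.compile(r"(?=(C..C))")


def find_cxxc(sequence: str) -> list:
    """Return (1-based position, matched tetrapeptide) for every CXXC site."""
    seq = sequence.strip().upper()
    return [(m.start() + 1, m.group(1)) for m in CXXC_PATTERN.finditer(seq)]


if __name__ == "__main__":
    toy_peptide = "AWCGHCKAL"  # made-up peptide containing one CXXC site
    print(find_cxxc(toy_peptide))  # [(3, 'CGHC')]
```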

Peptidylprolyl Isomerases

Peptidylprolyl cis-trans isomerases (peptidylprolyl isomerases, PPIases) represent a family of enzymes that catalyze the interconversion of cis and trans isomers of peptide bonds made with the amino acid proline. Humans express PPIase genes that are divided into two large families, the parvulins and the immunophilins. The immunophilin family is further divided into the FKBP prolyl isomerase and the cyclophilin peptidylprolyl isomerase subfamilies. There are two genes in the parvulin family of PPIases. The immunophilins are so-called because they were first identified as modifying the activity of immunosuppressive drugs such as cyclosporin and tacrolimus. The FKBP prolyl isomerase subfamily is composed of 18 proteins. The term FKBP refers to the fact that the founding member was identified by its binding to the immunosuppressant FK506, thus it was called FK506-binding protein (FKBP). The drug FK506 is also known as tacrolimus. The cyclophilin peptidylprolyl isomerase subfamily is composed of 19 proteins where cyclophilin A is the founding member (encoded by the PPIA gene). Whereas both the FKBP and cyclophilin family PPIases bind immunosuppressants, the parvulins do not.

The activity of human PPIases is dependent, in part, on the presence of various functional domains. One of the most common domains found in human PPIases is the TPR (tetratricopeptide repeat) motif. A TPR motif consists of multiple repeats of 34 amino acids defined by a specific pattern of hydrophobic amino acid side chains. Clustered TPR motifs in a protein result in the arrangement of parallel helix-turn-helix domains. In most proteins with TPR motifs it is common to find three consecutive motifs in the organization of the repeat. When present in multiple repeats the TPR motifs form a right-handed superhelix. This superhelix forms a groove with a large surface area to which the appropriate ligand can bind. TPR motifs are found in PPIases of both the cyclophilin and FKBP families. Another domain found in several human PPIases is the Ca2+-binding EF-hand domain. The human PPIases that have an EF-hand domain are all FKBP family members and all of the EF-hand-containing FKBP proteins are localized to the endoplasmic reticulum (ER).

Within eukaryotic proteins the trans isomer of proline peptide bonds is the more common form. However, several important proteins, including several ribonucleases and interleukins, possess cis-prolines in the native state. Given that the energy barrier for cis-trans isomerization is quite high, the transformation will not readily occur spontaneously and represents the rate-limiting step in protein folding. Several PPIases (e.g. Pin1 and FKBP) possess autocatalytic activity, catalyzing intrachain cis-trans isomerizations and thus participating in their own folding.

Secreted and Membrane-Associated Proteins

Proteins that are membrane bound or are destined for secretion (as well as glycoproteins) are synthesized by ribosomes associated with the membranes of the ER. The ER associated with ribosomes is termed rough ER (RER). These classes of protein all contain an N-terminal sequence termed a signal sequence or signal peptide. The signal peptide is usually 13-36 predominantly hydrophobic amino acid residues in length. Proteins that contain a signal peptide are called preproteins to distinguish them from proproteins (proteins that undergo proteolysis to become active). However, some proteins that are destined for secretion are also further proteolyzed following secretion and are, therefore, termed preproproteins. The insulin precursor protein is a prime example of a preproprotein.

The signal peptide is recognized by a ribonucleoprotein complex termed the signal recognition particle (SRP). Recognition occurs as the signal peptide emerges from the exit side of the ribosome. The eukaryotic SRP is composed of six proteins and an RNA termed the 7SL RNA. The six proteins of the SRP are SRP9, SRP14, SRP19, SRP54, SRP68, and SRP72. Humans express at least three genes encoding the 7SL RNA identified as RN7SL1, RN7SL2, and RN7SL3. When SRP binds to the emerging signal peptide it induces a translational elongation arrest until the entire translational complex and SRP bind to the SRP receptor on the ER. The SRP receptor (termed SR) is a heterodimeric complex composed of an α-subunit (SR-α; encoded by the SRPRA gene) and a β-subunit (SR-β; encoded by the SRPRB gene). Associated with the SRP receptor is a translocation channel through which the emerging polypeptide is extruded into the lumen of the ER. The translocation channel is referred to as the translocon. Although the translocon is composed of multiple protein subunits, the critical channel is formed from the heterotrimeric Sec61 complex which contains the Sec61α, Sec61β, and Sec61γ proteins. The Sec61α subunit is encoded by the SEC61A1 gene, the Sec61β subunit is encoded by the SEC61B gene, and the Sec61γ subunit is encoded by the SEC61G gene.

The signal peptide is removed from the elongating protein following passage through the ER membrane. The removal of the signal peptide is catalyzed by enzymes of the serine protease family known as signal peptidases. The human ER signal peptidase is a multiprotein complex identified as the signal peptidase complex, SPC. Human SPC is composed of five subunits encoded by the SPCS1, SPCS2, SPCS3, SEC11A, and SEC11C genes. The protein encoded by the SEC11A gene constitutes the catalytic portion of the SPC.

Synthesis of membrane-associated and secreted proteins on rough ER

Mechanism of synthesis of membrane bound, secreted, and glycoproteins. Ribosomes engage the ER membrane through interaction of the ribosome-bound signal recognition particle (SRP) with the SRP receptor in the ER membrane. As the protein is synthesized the signal sequence is passed through the ER membrane into the lumen of the ER. After sufficient synthesis the signal peptide is removed by the action of signal peptidase. Synthesis will continue and if the protein is secreted it will end up completely in the lumen of the ER. If the protein is membrane associated a stop transfer motif in the protein will stop the transfer of the protein through the ER membrane. This will become the membrane spanning domain of the protein. If the protein is to be decorated with carbohydrates these will be added as the protein progresses through the ER and then the Golgi apparatus.

Tail-Anchoring of Membrane Proteins

Most membrane anchored proteins become embedded in the membrane via the mechanisms just outlined that involve the N-terminal signal peptide targeting the protein to the membrane. However, there is another important class of membrane anchored proteins whose membrane anchoring signal resides in the C-terminus. These membrane proteins are members of the tail-anchored membrane protein family. As described, proteins that are membrane targeted via the presence of an N-terminal signal peptide become embedded co-translationally. Tail-anchoring of membrane proteins, on the other hand, does not occur co-translationally but is, in fact, uncoupled from ongoing protein synthesis. All tail-anchored proteins contain a single transmembrane (TM) sequence at their extreme C-terminus. The presence of this TM sequence not only targets the protein to the appropriate subcellular organelle membrane but it also serves to ensure the protein remains anchored to the membrane. The amino acids adjacent to the TM anchoring segment participate in the correct subcellular localization of a given tail-anchored protein. All tail-anchored proteins expose the larger N-terminal portion of the protein to the cytosol, regardless of subcellular localization. Tail-anchored proteins are found in the mitochondrial membranes, peroxisomal membranes, and the intracellular membrane compartments that are connected to the exocytotic and endocytotic pathways. These latter membranes include endoplasmic reticulum (ER), Golgi, endosome, and lysosome membranes. With respect to the processes of exocytosis and endocytosis, the large family of proteins required for vesicle trafficking and membrane fusion, the SNARE proteins (see Pathways for Protein Exocytosis and Endocytosis section below), are all tail-anchored.

The processes by which tail-anchored proteins become embedded in a membrane are of three main types. In one pathway, first identified in the membrane targeting of SNARE family member protein identified as synaptobrevin 2 (encoded by the VAMP2 gene), the protein becomes transiently associated with the signal peptide-binding domain of the signal recognition particle (SRP) which is involved in the classic co-translational membrane targeting pathway outlined above. The interaction of the tail-anchored protein with SRP is followed by membrane insertion, a process involving the SRP receptor. The SRP-dependent tail-anchoring process is associated with a subset of tail-anchored proteins whose TM segments are highly hydrophobic. In the case of SRP-mediated tail-anchoring, the energy for the process is supplied via GTP hydrolysis. However, the vast majority of tail-anchored proteins are inserted into their target membrane via an ATP-dependent process.

In the second pathway for tail-anchoring, ATPases of the heat shock protein 70 (HSP70) family of molecular chaperones catalyze the targeting and insertion. Humans express 17 different genes that encode proteins of the HSP70 family, with the cytosolic ATPase commonly identified as HSC70 (encoded by the HSPA8 gene) being involved in the tail-anchoring process. HSC70 and its co-chaperone identified as HSP40 (encoded by the DNAJB1 gene) promote the ATP-dependent insertion of a subfamily of tail-anchored proteins. A characteristic of HSC70-HSP40 catalyzed tail-anchoring is that the substrate proteins possess a TM segment that has a low hydrophobic character.

The third major tail-anchoring pathway involves what is called the GET pathway. Proteins of the GET pathway were initially identified in experiments aimed at dissecting the secretory pathways in yeast and their names derive from Golgi-ER Trafficking. Subsequent to these initial characterizations it was found that human homologs of the yeast GET genes encode proteins involved in the process of tail-anchoring membrane proteins. The designation GET is now used to refer to Guided Entry of Tail-anchored proteins. In the GET pathway the membrane proteins identified as Get1 and Get2 form the receptor for a complex referred to as the TM-recognition complex, TRC. In humans a potential Get1 homolog is encoded by the WRB (tryptophan rich basic protein) gene and the Get2 homolog is likely to be encoded by the CAMLG (calcium modulating ligand) gene. Within the cytosol Get3, Get4, and Get5 interact to form the TRC. The targeted tail-anchored protein is bound by the Get3 protein of the TRC. In humans the Get3 protein is encoded by the ASNA1 (arsA arsenite transporter, ATP-binding, homolog 1) gene. Humans express a GET4 gene and the likely gene encoding the human Get5 homolog is the UBL4A (ubiquitin-like 4A) gene. Following binding of the appropriate tail-anchored protein to the TRC, the complex interacts with the Get1/Get2 receptor and protein anchoring ensues.

Targeting Proteins to Specific Subcellular Organelles

Numerous proteins are unique to the organelle in which they serve their functions whether they be structural or enzymatic. Because various proteins are destined for specific subcellular organelles there must be mechanisms in place to accurately and efficiently target them to their functional destination. The process of organelle targeting involves both direct amino acid sequences in the proteins and proteins involved in the recognition and transport of organellar-specific proteins. For example, proteins that are members of the lysosomal hydrolase family are targeted to the lysosomes via a specific carbohydrate modification which is then recognized by a specific receptor in the lysosomal membrane. Proteins destined for the nucleus, the mitochondria, and the peroxisomes are all targeted to these organelles by the presence of specific peptide sequences and protein structures.

Nuclear Protein Targeting and Import

The nucleus is the organelle in which the genome is isolated from the rest of the cellular processes. The nucleus is enclosed by a membrane system composed of two lipid bilayers. The nuclear membrane is most often referred to as the nucleolemma and is composed of closely associated inner and outer lipid bilayers. The space between the inner and outer nuclear membranes is referred to as the perinuclear space. This space is in contact with the lumen of the endoplasmic reticulum (ER) via connections between the outer nuclear membrane and the ER membranes. The inner and outer nuclear membranes are also connected at thousands of locations via multiprotein complexes that generate pores in the nuclear membrane called nuclear pore complexes, NPC. The nuclear pores are the channels through which RNA and proteins are transported. The genome is transcribed into the various RNA forms within the interior of the nucleus, in what is referred to as the nucleoplasm, and then the RNAs that are destined for the cytosol are transported through the NPCs. Nuclear proteins such as histones, RNA polymerases, and transcription factors are all translated in the cytosol and then transported into the nucleus via the actions of the proteins of the NPC. The import and export of nuclear macromolecules through the NPC involves proteins identified as importins and exportins and the monomeric G-protein Ran (with bound GTP; RanGTP).

The NPC is a large protein complex that consists of a central channel surrounded by three large ring-like structures: the cytoplasmic ring, the central spoke ring, and the nuclear ring. In addition to the proteins forming the NPC channel and the ring-like structures there are proteins associated with both the cytoplasmic and nuclear sides of the core structure. The more than 30 proteins of the NPC are referred to as nucleoporins. Molecules that are less than 40 kDa in mass can move through the NPC channel passively. Larger molecules require active transport and this process involves a family of nuclear chaperones called importins and exportins which are members of a family of proteins called the karyopherin-β family. Humans express numerous importin (at least 16) and exportin (at least 7) proteins that are defined, in part, by the type of protein cargo they are involved in actively transporting through the NPC.

Importins bind to cargo proteins on the cytoplasmic side of the NPC and do so through the recognition of a specific nuclear localization sequence, NLS. The NLS is composed of Lys (K)- and Arg (R)-rich peptides in nuclear proteins. The K/R-rich peptides can be composed of a monopartite or a bipartite motif. The prototypical monopartite NLS is K-K/R-X-K/R, where X represents any amino acid. The prototypical bipartite NLS is two clusters of K/R-rich peptides separated by a spacer of approximately 10 amino acids. A more complex NLS, that is recognized by the importin-β2 protein, is called the PY-NLS. At least 100 human proteins have been identified that contain this PY-NLS. Following transport through the NPC the cargo protein is released from the importin via interaction with RanGTP. Following cargo release the importin-RanGTP complex is transported back to the cytosol. In the cytosol the GTP bound to Ran is hydrolyzed to GDP by the GTPase activating protein (GAP) RanGAP. The hydrolysis of GTP causes the release of the importin so that it can be made available for nuclear import.
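The monopartite and bipartite NLS descriptions translate naturally into regular expressions. The Python sketch below is a minimal illustration: the function names are hypothetical, and the bipartite pattern rests on two assumptions that the text does not specify, namely that a "cluster" means at least two consecutive K/R residues and that "approximately 10" means a spacer of 9 to 11 residues. A match flags a candidate only and says nothing about whether the sequence functions as an NLS in a real protein.

```python
import re

# Monopartite consensus K-K/R-X-K/R; the lookahead reports overlapping hits.
MONOPARTITE = re.compile(r"(?=(K[KR].[KR]))")

# Bipartite sketch (assumed definition): >=2 K/R, a 9-11 residue spacer, >=2 K/R.
BIPARTITE = re.compile(r"[KR]{2,}.{9,11}[KR]{2,}")


def find_monopartite_nls(sequence: str) -> list:
    """Return (1-based position, matched tetrapeptide) for each candidate."""
    seq = sequence.strip().upper()
    return [(m.start() + 1, m.group(1)) for m in MONOPARTITE.finditer(seq)]


def has_bipartite_nls(sequence: str) -> bool:
    """True if the sequence contains a candidate under the assumed definition."""
    return BIPARTITE.search(sequence.strip().upper()) is not None


if __name__ == "__main__":
    # PKKKRKV is the widely cited NLS of the SV40 large T-antigen.
    print(find_monopartite_nls("PKKKRKV"))        # [(2, 'KKKR'), (3, 'KKRK')]
    # Made-up test sequence with two K/R clusters around a 10-residue spacer.
    print(has_bipartite_nls("KRPAATKKAGQAKKKK"))  # True
```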

Exportins bind their cargo proteins inside the nucleus, along with RanGTP, for transport out of the nucleus. The recognition of cargo proteins by exportins involves a nuclear export signal (NES) in the target protein. The NES is a short peptide of hydrophobic amino acids with a common consensus of LXXXLXXLXL, where X represents any amino acid. Like the import of cytoplasmic proteins into the nucleus, the export process is coupled to GTP hydrolysis in the RanGTP complex through the action of RanGAP. Following GTP hydrolysis the exported protein is released to the cytosol. The exportin protein is then transported back into the nucleoplasm. Since RNA molecules do not contain amino acid sequences they do not, directly, possess nuclear export signals. Therefore, cytoplasmic RNAs (e.g. mRNA, rRNA, and tRNA) form ribonucleoprotein complexes in the nucleoplasm and the protein components of these complexes are recognized by exportins. For example, the exportin identified as exportin-t is responsible for the nuclear export of tRNAs.
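The NES consensus LXXXLXXLXL can be searched for in the same way. The Python sketch below applies the consensus literally, with leucine required at all four fixed positions; the function name and the test sequence are invented for the example, and natural export signals may deviate from this strict pattern.

```python
import re

# Literal reading of the consensus LXXXLXXLXL; lookahead allows overlapping hits.
NES_PATTERN = re.compile(r"(?=(L...L..L.L))")


def find_nes_candidates(sequence: str) -> list:
    """Return (1-based position, matched decapeptide) for each candidate NES."""
    seq = sequence.strip().upper()
    return [(m.start() + 1, m.group(1)) for m in NES_PATTERN.finditer(seq)]


if __name__ == "__main__":
    toy_sequence = "MSLAAALGGLTLEK"  # made-up sequence containing one consensus match
    print(find_nes_candidates(toy_sequence))  # [(3, 'LAAALGGLTL')]
```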

In addition to directionally specific importins and exportins, all of which are members of the karyopherin-β family of proteins, there are several bidirectional karyopherin-β transporters. Human importin 13, exportin 4, and exportin 5 all have bidirectional nuclear transport capabilities.

Mitochondrial Protein Targeting and Import

The mitochondria are vital organelles, second only to the nucleus in importance, whose functions are required for cell viability. The major function of the mitochondria is to generate the high energy molecule ATP through the utilization of the energy contained in the reduced electron carriers, NADH and FADH2. This vital process is explained in detail in the Oxidative Phosphorylation page. Through the process of oxidative phosphorylation the mitochondria interconnect the metabolic processes of carbohydrate, lipid, and amino acid catabolism. In addition, the mitochondria serve as important conduits in the processes of urea synthesis, heme synthesis, and steroidogenesis. Mitochondria are able to carry out these highly diverse, but interrelated, biochemical processes via the catalytic activity of proteins encoded by both the mitochondrial (mtDNA) and the nuclear genomes. Proteomic analysis has determined that the mitochondrial proteome consists of nearly 1,000 different proteins, the vast majority of which are derived from nuclear genes. The mitochondrial genome consists of a total of 16,569 bp that encode 13 proteins, 22 tRNAs, and 2 rRNAs. The mitochondrial proteins that are encoded by the nuclear genome are translated in the cytoplasm and then these proteins are imported into the mitochondria by specialized recognition and transport proteins and complexes. Mitochondrial protein import and localization is controlled by specific amino acid sequences in the precursor proteins. These sequences not only target the proteins to the mitochondria but also ensure that individual proteins are properly distributed into the four mitochondrial compartments: outer membrane (OM), intermembrane space (IMS), inner membrane (IM), and matrix.

With the exception of the proteins encoded by the mitochondrial genome, mitochondrial proteins are synthesized in the cytosol. Proteins destined for the mitochondria possess a presequence akin to the leader peptide of ER targeted proteins. Transfer of these proteins to their appropriate location in the mitochondria requires specific chaperones and transport complexes. At the level of the outer mitochondrial membrane the transport process involves a complex called the translocase of outer mitochondrial membrane, TOM. The TOM is composed of several proteins that serve as receptors for mitochondrially targeted proteins or that compose the channel itself. The TOM complexes are localized to specialized domains of the outer mitochondrial membrane that are closely associated with openings in the inner mitochondrial membrane that are referred to as cristae junctions. The channel of the TOM complex is formed from the Tom40, Tom22, and Tom7 proteins while the preprotein receptors are Tom20 and Tom70. Accessory subunits of the TOM that are required for assembly and stabilization of the TOM are Tom5 and Tom6. The human Tom proteins are encoded by genes with the designation TOMM. As an example the Tom40 protein is encoded by the TOMM40 gene which is located on chromosome 19q13.32 and is composed of 10 exons that generate three alternatively spliced mRNAs, all of which encode the same 361 amino acid protein.

Mitochondrial precursor proteins that contain an appropriate presequence are initially recognized by Tom20 and then they interact with Tom22 prior to import. Other mitochondrial proteins that are hydrophobic precursors contain an integral targeting sequence that is recognized by Tom70. Whether recognized by Tom20 or Tom70, both classes of mitochondrial protein are imported through the Tom40 channel. Many proteins that are embedded in the outer mitochondrial membrane, such as Tom20 and Tom70, are referred to as signal-anchored proteins. These proteins are embedded in the outer membrane via sequences in the N-terminus and have their C-terminal domains extended into the cytosol. Other classes of outer membrane anchored proteins have a C-terminal transmembrane domain and are referred to as tail-anchored proteins. The proteins of the Bcl-2 family of apoptosis regulating proteins are members of the mitochondrial tail-anchored family of proteins.

Proteins of the intermembrane space (IMS) are generally small soluble proteins that are imported via the mitochondrial IMS import and assembly (MIA) pathway. The MIA pathway is unique in that it couples the process of protein import to the folding and oxidation of the imported protein, leading to the formation of internal disulfide bonds in the process. The introduction of the disulfide bonds in IMS proteins is catalyzed by a protein identified as Mia40, which acts as the chaperone, and by the sulfhydryl oxidase identified as growth factor, augmenter of liver regeneration (encoded by the GFER gene; the human homolog of the yeast Erv1 enzyme). The human Mia40 protein is encoded by the coiled-coil-helix-coiled-coil-helix domain containing 4 (CHCHD4) gene.

Transport of mitochondrial proteins to the inner membrane and the mitochondrial matrix involves complexes identified as translocase of inner mitochondrial membrane, TIM. Like the TOM, the TIM complexes are composed of central channel forming proteins. The TIM that is responsible for matrix targeted proteins contains the Tim23 protein (encoded by the TIMM23 gene), which is the primary channel protein, along with Tim17. Proteins that are transported via the Tim23 mediated TIM contain an amphipathic helix at their N-termini that, as in TOM transport, is referred to as a presequence. The TIM23 complex contains several additional proteins such as Tim50, which regulates channel opening, and Tim21, which regulates docking functions. Matrix proteins that contain the appropriate presequence have it proteolytically removed during the transport process.

Mitochondrial inner membrane proteins also contain a TIM recognized presequence. However, multipass integral inner membrane proteins are inserted into the membrane via the TIM complex identified as TIM22. The central channel of the TIM22 complex is generated from the Tim22 protein encoded by the TIMM22 gene. The Tim22 protein is referred to as the carrier translocase. Like the TIM23 complex, the TIM22 complex is composed of several additional subunits such as Tim12, Tim18, and Tim54. The functionally most significant proteins inserted into the inner mitochondrial membrane via the TIM22 complex are the metabolite transporters such as the dicarboxylate transporter encoded by the SLC25A10 gene.

Peroxisomal Protein Targeting and Import

Peroxisomes are single membrane organelles, similar to lysosomes, present in virtually all eukaryotic cells. The peroxisome is a specialized enzyme "factory" that contains in excess of 50 different enzymes involved in a variety of processes of lipid metabolism that include β-oxidation of very long chain fatty acids, α-oxidation of fatty acids, and synthesis of cholesterol, bile acids, and ether-lipids. Proteins that are involved in, and necessary for, correct peroxisome biogenesis are called peroxins (PEX). At least 16 PEX genes have been identified in humans with three (PEX11A, PEX11B, and PEX11G) comprising subunits of the PEX11 complex. The biogenesis of peroxisomes is directly tied to the endoplasmic reticulum (ER) with the lipids of the membranes of the peroxisomes being derived from the ER and most peroxisomal membrane proteins (PMP) being synthesized in, and trafficking through, the ER. Two PEX proteins, PEX3 and PEX19, are critical for the correct targeting of PMPs to the peroxisomal membrane. PMP that are dependent on PEX19 (nearly all PMP) are referred to as type I PMP while type II PMP utilize PEX3 (as well as PEX22) for correct targeting. PEX3 and PEX19 also are involved in peroxisomal targeting of peroxisomal proteins that are synthesized in the cytosol. Defects in peroxisomal biogenesis genes result in a family of disorders referred to as peroxisomal biogenesis disorders, PBD. The most severe PBD is Zellweger syndrome which represents a cluster of disorders that results from mutations in at least eight different PEX genes.

Enzymes that are targeted to the peroxisomes contain either of two amino acid consensus elements called peroxisome targeting sequences (PTS). The PTS1 is a C-terminal consensus sequence of –(S/A/C)(K/R/H)(L/M) referred to as the SKL motif. This sequence element is recognized by a cytosolic PTS1 receptor encoded by the PEX5 gene. There are two primary isoforms of PEX5 encoded proteins in humans identified as Pex5pS and Pex5pL (for short and long forms, respectively). The Pex5pL protein has an internal 37 amino acid insertion, hence the "long" designation. The PTS2 is an N-terminal consensus sequence of –(R/K)(L/V/I/Q)XX(L/V/I/H/Q)(L/S/G/A/K)X(H/Q)(L/A/F)–, where X represents any amino acid. The PTS2 receptor is encoded by the PEX7 gene and the encoded protein is referred to as Pex7p.
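The two consensus elements described above can be written as regular expressions. The following is a minimal sketch only: the function name, the example sequences, and the PTS2_WINDOW value (how close to the N-terminus a PTS2 must lie) are illustrative assumptions and are not taken from the text, and a match to a consensus does not by itself demonstrate that a protein is imported into the peroxisome.

```python
import re

# PTS1: C-terminal tripeptide (S/A/C)(K/R/H)(L/M), anchored at the end of the sequence.
PTS1_RE = re.compile(r"[SAC][KRH][LM]$")

# PTS2: (R/K)(L/V/I/Q)XX(L/V/I/H/Q)(L/S/G/A/K)X(H/Q)(L/A/F), where X is any residue.
PTS2_RE = re.compile(r"[RK][LVIQ]..[LVIHQ][LSGAK].[HQ][LAF]")

# Assumption: the text only says the PTS2 is N-terminal; this window size is
# a hypothetical cutoff chosen for illustration.
PTS2_WINDOW = 40


def find_pts(seq):
    """Return the list of peroxisome targeting sequence types matched by seq."""
    seq = seq.upper()
    hits = []
    if PTS1_RE.search(seq):
        hits.append("PTS1")
    # The whole nine-residue motif must lie within the first PTS2_WINDOW residues.
    if PTS2_RE.search(seq[:PTS2_WINDOW]):
        hits.append("PTS2")
    return hits


# Made-up sequences, used only to exercise the patterns.
print(find_pts("MAAAGGGSKL"))
print(find_pts("MQRLQVVLGHLAAAA"))
```

The first made-up sequence ends in SKL and the second carries a nine-residue stretch (RLQVVLGHL) that fits the PTS2 consensus.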

Pex5pS, Pex5pL, and Pex7p interact with newly synthesized target proteins in the cytosol and direct them to the peroxisome. On the membrane of the peroxisome is a component of the protein import machinery encoded by the PEX14 gene called Pex14p. Following interaction of Pex5pS or Pex5pL, to which a PTS1-containing protein is bound, with Pex14p, the PTS1 containing protein is transferred into the peroxisome. The activity of Pex7p in peroxisome protein import actually requires Pex5pL as well. PTS2 containing proteins interact with Pex7p and then, in conjunction with Pex5pL, the complex interacts with Pex14p and the PTS2 containing protein is transferred into the peroxisome. Very few proteins contain a PTS2 sequence but one enzyme of note is phytanoyl-CoA hydroxylase (PHYH) which is defective in classic Refsum disease.
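The receptor requirements in the preceding paragraph can be summarized as a small piece of boolean logic. This is a deliberately simplified sketch: the function name and the idea of passing in a set of "available" proteins are hypothetical devices for illustration, and only the components named in the text are considered.

```python
def can_import(signal, available):
    """Return True if the named import components suffice for the given signal.

    signal: "PTS1" or "PTS2"
    available: iterable of protein names present in the cell (hypothetical input)
    """
    available = set(available)
    # Both routes dock at Pex14p on the peroxisomal membrane.
    if "Pex14p" not in available:
        return False
    if signal == "PTS1":
        # Either isoform of the PTS1 receptor can carry the cargo.
        return bool(available & {"Pex5pS", "Pex5pL"})
    if signal == "PTS2":
        # Pex7p binds the cargo but also requires the long Pex5p isoform.
        return {"Pex7p", "Pex5pL"} <= available
    raise ValueError("signal must be 'PTS1' or 'PTS2'")


print(can_import("PTS1", {"Pex5pS", "Pex14p"}))
print(can_import("PTS2", {"Pex7p", "Pex5pS", "Pex14p"}))
```

The second call illustrates the point made in the text that Pex7p alone is not enough: without Pex5pL a PTS2 containing protein is not delivered.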


Pathways for Protein Exocytosis and Endocytosis

The release of cellular substances, particularly proteins, hormones, and neurotransmitters, involves controlled and regulated processes collectively referred to as exocytosis. The reverse process, exemplified by ligand-bound receptor internalization, also involves controlled and regulated processes and these processes are referred to as endocytosis. The membrane vesicles of exocytosis originate from the trans-Golgi network or from the recycling of endosomes. Their migration to the plasma membrane involves the cytoskeletal machinery of the cell. Exocytosis serves numerous biologically important functions in the cell. Exocytosis is the cellular means by which lipids and proteins are delivered to the plasma membrane facilitating cellular growth. Exocytosis allows a cell to signal to the external environment by the release of vesicle contents. The process of exocytosis allows proteins that are embedded in the vesicle membrane to be delivered to the plasma membrane such as is the case for transport proteins, protein channels, and signaling receptors.

The processes of exocytosis and endocytosis are connected, in most instances, since the exocytosed vesicles are retrieved via the processes of endocytosis. The processes of exocytosis and endocytosis can be collectively categorized into three main modes. In one mode (classic) the exocytotic vesicles fuse into the plasma membrane followed by an endocytotic process that involves membrane invagination and vesicle reformation. In the second mode (kiss-and-run) a fusion pore opens to release vesicle contents and then recloses. In the third mode (bulk) giant vesicles, that have formed via vesicle-vesicle fusion, are exocytosed and then the giant vesicles are retrieved by bulk endocytosis. Exocytosis is dependent upon endocytosis in order to prevent vesicle membrane exhaustion. Indeed, endocytosis of neurotransmitter vesicle membranes is required to maintain the size of nerve terminals.

Although the process of exocytosis is coupled to endocytosis as a means to maintain and replenish membranes, there are several highly specific endocytotic processes that are not directly coupled to exocytosis. The controlled uptake of small molecules and fluids by cells within small vesicles is a process of endocytosis termed pinocytosis. One particular type of pinocytotic uptake is receptor-mediated endocytosis where ligand binding to a receptor results in the complex being internalized in the target cell. This process involves receptors that are embedded in membrane domains formed by the protein clathrin, referred to as clathrin-coated pits. The uptake of invading pathogens by cells of the immune system is also a highly specialized form of endocytosis termed phagocytosis.

The overall processes of exocytosis and endocytosis are, in all modes, controlled via calcium influx through voltage dependent calcium channels and via intracellular calcium sensor proteins such as calmodulin (in endocytosis) and the synaptotagmins, SYT (in exocytosis). The role of calcium fluxes and calcium sensors in the regulation of exocytosis and endocytosis can be clearly demonstrated with the use of drugs that inhibit the activity of calmodulin or those that inhibit calcium influx. In either case endocytosis is inhibited leading to a rapid loss in membrane replenishment at synaptic terminals in nerve cells. One of the major downstream targets of calmodulin in the process of endocytosis is the phosphatase, calcineurin (CaN; also known as protein phosphatase 3). Calcineurin functions as a heterodimer composed of a catalytic subunit (calcineurin A) and a Ca2+-binding regulatory subunit (calcineurin B). Humans express three catalytic subunit genes (PPP3CA, PPP3CB, PPP3CC) and two regulatory subunit genes (PPP3R1 and PPP3R2). Humans express 17 synaptotagmin genes identified as SYT1–SYT17.

Equally important to the processes of exocytosis and endocytosis are members of the large RAB family of monomeric G-proteins. There are 65 genes in the human genome that encode RAB family proteins and each protein is known to serve distinct roles in membrane identity, exocytotic and endocytotic vesicle budding, and membrane fusion through the recruitment of various effector proteins. For example the RAB5 protein is associated with clathrin-coated pits and as such is involved in the control of receptor-mediated endocytosis. Another important example is the intracellular vesicles in skeletal muscle and adipose tissue cells that contain the GLUT4 glucose transporter. These vesicles are stimulated to fuse with the plasma membrane in response to insulin signaling and the migration and fusion of these vesicles involves the RAB8, RAB10, and RAB14 proteins.

A large family of proteins, the SNARE family, is required for the processes of membrane fusion that are necessary intermediate steps in the exocytotic and endocytotic pathways. The term SNARE is derived from SNAP REceptor, where SNAP is Soluble NSF (N-ethylmaleimide-sensitive factor) Attachment Protein. The NSF protein is a member of the AAA subfamily of ATPases. The SNARE superfamily contains 15 genes in humans and two additional gene subfamilies identified as the vesicle-associated membrane proteins (VAMP) subfamily and the syntaxins (STX) subfamily. Humans express 8 genes of the VAMP family with the best known member being more commonly called synaptobrevin. Humans express two forms of synaptobrevin identified as isoform 1 (encoded by the VAMP1 gene) and isoform 2 (encoded by the VAMP2 gene). There are 16 genes of the STX family with syntaxin 1A being the most well characterized via its interactions with the calcium sensors of the synaptotagmin family. Many aspects of membrane fusion involve SNARE proteins, not just the processes of exocytosis and endocytosis. For example, intracellular vesicle fusion with target membrane compartments results in the formation of peroxisomes and lysosomes. In addition to specific gene families, the SNARE proteins can be defined as being a v-SNARE (vesicle) or a t-SNARE (target). The v-SNARE proteins are incorporated into the membranes of the transport vesicles, while t-SNARE proteins are found in the target membranes. For example, neurotransmitter vesicles contain v-SNARE proteins while the nerve terminal membrane contains t-SNARE proteins.

The overall process of exocytosis requires multiple distinct steps, each of which can be regulated by many different factors. The transport of exocytotic cargo (be it a secreted hormone or a membrane channel protein) begins when the macromolecules are packaged into vesicles that bud from the trans-Golgi network. These exocytotic vesicles are carried from point of origin to final destination by motor proteins of the myosin and kinesin families. The motor proteins utilize actin filaments and microtubule tracks to guide their movement and the hydrolysis of ATP serves as the source of the energy required for the movement. The specificity of the overall migration is controlled by the RAB proteins. Once a vesicle reaches the plasma membrane, exocytosis involves the formation of a multisubunit tethering complex called the exocyst. The exocyst facilitates SNARE family member protein-mediated membrane fusion. The fusion process involves interactions between the particular vesicle SNARE (v-SNARE) and the particular target SNARE (t-SNARE) which results in SNARE proteins "zippering" together. Following fusion and completion of the exocytosis process the SNARE protein interactions need to be disassembled in order to allow membrane recycling carried out via endocytosis. The two proteins that carry out the disassembly process are NSF and α-SNAP (soluble NSF attachment protein).


Protein Activation via Proteolytic Cleavage

Most proteins undergo proteolytic cleavage following translation. The simplest form of this is the removal of the initiation methionine. Many proteins are synthesized as inactive precursors that are activated under proper physiological conditions by limited proteolysis. Pancreatic enzymes and enzymes involved in clotting are examples of the latter. Inactive precursor proteins that are activated by removal of polypeptides are termed proproteins. If a precursor protein is synthesized via association with the endoplasmic reticulum (ER), as described above, it is targeted to that location by the N-terminal signal sequence (signal peptide or leader peptide) which is proteolytically removed after the association with the ER. These latter proteins are referred to as preproteins. A protein that begins with a leader peptide and also must undergo further proteolysis to be functional is termed a preproprotein.

A complex example of post-translational processing of a preproprotein is the cleavage of prepro-opiomelanocortin (POMC) synthesized in the pituitary (see the Peptide Hormones page for discussion of POMC). This preproprotein undergoes complex cleavages, the pathway of which differs depending upon the cellular location of POMC synthesis.

Another example of a preproprotein is insulin. Since insulin is secreted from the pancreas it has a signal sequence (leader peptide) making it a preprotein. Following cleavage of the 24 amino acid signal peptide the protein folds into proinsulin. Proinsulin is further cleaved yielding active insulin (thus it is synthesized as a preproprotein) which is composed of two peptide chains linked together through disulfide bonds.
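The precursor naming scheme used in this section depends on just two properties: whether the protein begins with a signal (leader) peptide and whether it requires further proteolysis to become functional. A minimal sketch of that decision follows; the function name, the parameter names, and the "mature protein" label for the remaining case are illustrative assumptions rather than terms defined in the text.

```python
def classify_precursor(has_signal_peptide, needs_activating_cleavage):
    """Name a newly synthesized protein using the pre/pro terminology above."""
    if has_signal_peptide and needs_activating_cleavage:
        return "preproprotein"
    if has_signal_peptide:
        return "preprotein"
    if needs_activating_cleavage:
        return "proprotein"
    # Assumption: the text gives no special name for this case.
    return "mature protein"


# Insulin carries a signal peptide and is further cleaved from proinsulin.
print(classify_precursor(has_signal_peptide=True, needs_activating_cleavage=True))
```

Applied to insulin, which has a 24 amino acid signal peptide and is then cleaved from proinsulin to the active hormone, the sketch returns "preproprotein".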

Still other proteins, that are enzymes, are synthesized as inactive precursors called zymogens. Zymogens are activated by proteolytic cleavage such as is the situation for several proteins of the blood clotting cascade.


Protein Methylation

Post-translational methylation of proteins occurs on nitrogens and oxygens. The activated methyl donor for these reactions is S-adenosylmethionine (SAM). The most common methylations are on the ε-amine of the R-group of lysine residues and the guanidino moiety of the R-group of arginine. Methylation of lysine residues in histones in the nucleosome is an important regulator of chromatin structure and consequently of transcriptional activity. Lysine methylation was originally thought to be a permanent covalent mark, providing long-term signaling, including the histone-dependent mechanism for transcriptional memory. However, it has become clear that lysine methylation, similar to other covalent modifications, can be transient and dynamically regulated by an opposing demethylation activity. Methylation of lysine residues affects gene expression not only at the level of chromatin modification, but also by modifying the activity of numerous transcription factors. Histone arginine methylation is also known to regulate chromatin structure and consequently transcriptional activity. Humans express 27 lysine (K) methyltransferases (identified as KMT family enzymes) and nine arginine methyltransferases. The latter family of enzymes is identified as the protein arginine (R) methyltransferase (PRMT) family. Numerous enzymes catalyze lysine demethylation reactions with one of the largest groups being the Jumonji C (JmjC) domain containing demethylases, all of which are members of a large family of at least 80 enzymes that are 2-oxoglutarate and Fe2+-dependent dioxygenases. For more complete information on the functions of protein methylation and demethylation go to the Regulation of Gene Expression page.

Additional nitrogen methylations are found on the imidazole ring of histidine and the R-group amides of glutamine and asparagine. Methylation of the oxygen of the R-group carboxylates of glutamate and aspartate also takes place and forms methyl esters. Proteins can also be methylated on the thiol R-group of cysteine.

As indicated below, many proteins are modified at their C-terminus by prenylation of the cysteine residue in the consensus CAAX. Following the prenylation reaction the protein is cleaved at the peptide bond of the cysteine, which removes the AAX residues, and the carboxylate of the now C-terminal cysteine is methylated by a prenylated protein methyltransferase.
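The order of events at a CAAX C-terminus (prenylation, cleavage at the peptide bond of the cysteine, then carboxyl methylation) can be traced with a short sketch. The function name, the returned dictionary keys, and the example sequences are hypothetical. As a further simplifying assumption, the check only requires a cysteine four residues from the C-terminus and does not restrict which residues occupy the A and X positions.

```python
def process_caax(seq):
    """Trace C-terminal CAAX processing; return None if no CAAX cysteine is found."""
    seq = seq.upper()
    # Assumption: only the position of the cysteine is checked, not the A or X residues.
    if len(seq) < 4 or seq[-4] != "C":
        return None
    # Step 1: the CAAX cysteine is prenylated (sequence unchanged).
    # Step 2: cleavage at the peptide bond of the cysteine removes the last three residues.
    mature = seq[:-3]
    removed = seq[-3:]
    # Step 3: the carboxylate of the new C-terminal cysteine is methylated.
    return {
        "mature_sequence": mature,
        "removed_tripeptide": removed,
        "c_terminal_modifications": ["prenyl group", "carboxyl methyl ester"],
    }


# Made-up sequence used only to exercise the function.
result = process_caax("MKKSCVLS")
print(result["mature_sequence"], result["removed_tripeptide"])
```

In this made-up example the cysteine becomes the new C-terminal residue once the final three residues are removed.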


Protein Acetylation

Post-translational acetylation of proteins occurs on the ε-amine of lysine residues the same as for the methylation of lysines in proteins. In addition, a large number of proteins (more than 80% of human proteins) are acetylated on the N-terminal amino acid. The enzymes that catalyze protein acetylation of lysine residues are classified as lysine (K) acetyltransferases and denoted by the nomenclature KAT. Humans express 17 genes encoding KAT enzymes. The activated acetyl donor for the KAT enzymes is acetyl-CoA. The role of acetyl-CoA in the acetylation of proteins places this post-translational processing event at the crossroads of metabolic regulation. Physiological and pathophysiological conditions that result in increases or decreases in the production and utilization of acetyl-CoA will, therefore, have profound effects on the ability of KAT enzymes to carry out their functions.

Lysine Acetylation

Acetylation of lysine residues in histones in the nucleosome is an important regulator of chromatin structure and consequently of transcriptional activity. Like the reversibility of lysine methylation, protein lysine acetylation is also reversible. The enzymes that carry out removal of the acetyl group are broadly classified into two primary groups. One group is identified as the histone deacetylases (HDAC), which are Zn2+-dependent enzymes and the other group is identified as the sirtuins (SIRT) which are NAD+-dependent enzymes. More than 1,750 proteins in human tissues have been shown to be modified by acetylation. Greater detail on histone acetylation-deacetylation can be found in the Control of Gene Expression page. The discussion here will focus on metabolic regulation via reversible acetylation.

Protein lysine acetylation is observed on proteins in nearly all compartments of the cell. Recent evidence has demonstrated that numerous enzymes that control a vast array of metabolic processes have their activity modulated by reversible lysine acetylation. Within the liver, nearly 1,000 different proteins (not including nuclear proteins) have been shown to be acetylated, with many of the proteins functional in the processes of metabolic regulation. Of these nearly 1,000 proteins, more than 150 are found in the mitochondria of hepatocytes. An astounding outcome of the work on metabolic regulation via protein acetylation is that very nearly all of the enzymes involved in glycolysis, glycogen metabolism, gluconeogenesis, the TCA cycle, fatty acid oxidation, the urea cycle, and nitrogen metabolism have been shown to be acetylated. In addition, several enzymes involved in oxidative phosphorylation and amino acid metabolism have also been found to be acetylated.

The acetylation of metabolic enzymes results in alterations in their activities by several different mechanisms. Acetylation can lead to subsequent ubiquitylation and proteasomal degradation of the modified protein. Acetylation can also result in destruction of the modified protein via the lysosomes. Protein degradation is not the only mechanism whereby lysine acetylation can be used to regulate an enzyme's level of activity. Numerous enzymes, including metabolic enzymes, that are acetylated have altered catalytic activity. Acetylation can lead to neutralization of an active site lysine or the acetylation can lead to blockade of the action of an allosteric activator. Numerous other lysine acetylation-mediated effects on enzyme activity have been documented including the blocking of substrate binding, blocking of metabolite binding, and modifying the subcellular localization of an enzyme.

Several Metabolic Enzymes Regulated by Reversible Acetylation

Enzyme Name | Gene | Acetylase | Deacetylase | Comments
Acetyl-CoA acetyltransferase 1 | ACAT1 | unknown | SIRT3 | mitochondrial enzyme involved in ketone body utilization; major activity is the cleavage of acetoacetyl-CoA into two acetyl-CoA units; acetylation down-regulates the activity of the enzyme; K260 and K265 deacetylated by SIRT3 but K187 is not
Acyl-CoA dehydrogenase, long chain | ACADL | unknown | SIRT3 | mitochondrial fatty acid β-oxidation enzyme; acetylation down-regulates the activity of the enzyme
Acyl-CoA synthetase 1 | ACSL1 | unknown | SIRT3 | major liver and adipose tissue enzyme involved in the activation of fatty acids for β-oxidation; enzyme contains at least 15 sites of acetylation that are acetylated differentially dependent upon physiological status; acetylation of K285 is known to down-regulate the activity of the enzyme
Aldehyde dehydrogenase 2 | ALDH2 | unknown | SIRT3 | the mitochondrial aldehyde dehydrogenase; multiple sites of acetylation; acetylation increases the activity of the enzyme; K370 is deacetylated by SIRT3 but K453 is not
Acyl-CoA synthetase short chain family member 1 | ACSS1 | unknown | SIRT3 | mitochondrial enzyme; also identified as AceCS2; catalyzes conversion of acetate to acetyl-CoA; important in energy homeostasis during periods of fasting; acetylation results in down-regulation of enzyme activity
Acyl-CoA synthetase short chain family member 2 | ACSS2 | KAT3A (CBP) | SIRT1 | cytoplasmic enzyme; also identified as AceCS1; catalyzes conversion of acetate to acetyl-CoA; acetate stimulates interactions between ACSS2, CBP [derived from CREB (cAMP-response element binding protein)-binding protein], and the hypoxia induced factor, HIF-2 (see the Glycolysis page for more details on the hypoxia induced pathway); acetylation of ACSS2 results in down-regulation of enzyme activity by interference with the active site
Argininosuccinate lyase | ASL | unknown | unknown | urea cycle enzyme; acetylation results in down-regulation of enzyme activity by interference with the active site
Carbamoylphosphate synthetase I | CPS1 | unknown | SIRT5 | urea cycle enzyme; acetylation results in down-regulation of enzyme activity
Carnitine palmitoyltransferase 2 | CPT2 | unknown | unknown | mitochondrial enzyme involved in transport of activated fatty acids into the mitochondria for β-oxidation; consequences of acetylation of four sites (K104, K453, K537, and K544) yet to be determined
Glyceraldehyde-3-phosphate dehydrogenase | GAPDH | KAT2B | HDAC5 | glycolytic enzyme; KAT2B was originally identified as PCAF (p300/CBP-associated factor); lysine residues K117, K227, K251 and K254 are acetylated; acetylation of K227 causes an interaction of GAPDH and one of the seven in absentia homolog (SIAH) ubiquitin ligases resulting in cytoplasmic to nuclear translocation; the seven in absentia gene was originally identified in Drosophila as being required for the specification of R7 cell fate in the eye; humans express three SIAH genes identified as SIAH1, SIAH2, and SIAH3; acetylation of K254 results in increased enzyme activity in response to increased glucose concentration
Glutamate dehydrogenase | GLUD1 | unknown | SIRT3 | major enzyme of overall nitrogen homeostasis and regulator of energy status
Glutaminase | GLS2 | unknown | unknown | enzyme involved in overall nitrogen homeostasis; acetylation of K329 results in down-regulation of enzyme activity
3-hydroxy-3-methylglutaryl CoA synthase 2 | HMGCS2 | unknown | SIRT3 | mitochondrial enzyme involved in synthesis of the ketone bodies; acetylation results in down-regulation of enzyme activity; K310 is deacetylated by SIRT3 but K354 is not
Isocitrate dehydrogenase 2 | IDH2 | unknown | SIRT3 | mitochondrial enzyme involved in the production of NADPH in response to oxidative stress; acetylation results in down-regulation of enzyme activity
Malate dehydrogenase 2 | MDH2 | unknown | unknown | mitochondrial enzyme of the TCA cycle; lysines K185, K301, K307, and K314 are acetylated; acetylation results in up-regulation of enzyme activity; acetylation of MDH2 increases under conditions of increased fatty acid intake
Ornithine transcarbamoylase | OTC | unknown | SIRT3 | mitochondrial enzyme involved in urea cycle; lysine K88 in the active site is a primary target for acetylation; acetylation of K88 inhibits enzyme activity by decreasing affinity for substrate, carbamoyl phosphate; mutation of K88 to asparagine (K88N mutation) found in some patients suffering from OTC deficiency
Phosphoenolpyruvate carboxykinase 1 | PCK1 | EP300 | SIRT2 | cytoplasmic form of the enzyme (also known as PEPCK-c) involved in gluconeogenesis; the EP300 gene encodes the p300 protein (adenovirus E1A binding protein p300) that is a close relative of the CBP acetyltransferase; EP300 also identified by the KAT nomenclature as KAT3B; CBP protein is encoded by the CREBBP gene which is also identified by the standard KAT nomenclature as KAT3A; acetylation of PEPCK-c results in down-regulation of enzyme activity via interaction with the UBR5 ubiquitin ligase (ubiquitin ligase E3 component N-recognin 5)
Phosphoglycerate mutase 1 | PGAM1 | unknown | SIRT1 | cytoplasmic enzyme involved in glycolysis; at least nine lysines shown to be acetylated in PGAM1; the major sites of acetylation are K251, K253, and K254; acetylation results in up-regulation of enzyme activity
Pyruvate kinase, muscle isoform | PKM2 | KAT2B | unknown | cytoplasmic enzyme involved in glycolysis; the PKM2 gene produces two PKM isoforms (PKM1 and PKM2) as a result of alternative mRNA splicing; expression of the gene is induced in proliferating cells and all human cancers; expression of PKM2 and synthesis of the PKM2 isoform of the enzyme results in reduced oxidation of glucose to pyruvate resulting in the accumulation of glycolytic intermediates which promotes the production of macromolecules from glucose carbons; acetylation of K305 is stimulated in the presence of high glucose; acetylation results in down-regulation of enzyme activity as a result of the lysosomal degradation pathway referred to as chaperone-mediated autophagy, CMA
Succinate dehydrogenase complex subunit A | SDHA | unknown | SIRT3 | mitochondrial enzyme that is one of four subunits of the SDH complex; involved in the TCA cycle and in oxidative phosphorylation; acetylation results in down-regulation of enzyme activity
Superoxide dismutase 2 | SOD2 | unknown | SIRT3 | mitochondrial matrix enzyme involved in removal of superoxide anions; catalyzes reduction of superoxide anion to hydrogen peroxide; acetylation results in down-regulation of enzyme activity
Sphingosine kinase 1 | SPHK1 | p300/CBP | unknown | cytoplasmic enzyme involved in synthesis of the bioactive lipid sphingosine-1-phosphate, S1P; acetylation results in stabilization of the protein leading to up-regulation of enzyme activity

N-Terminal Acetylation

The acetylation of proteins on the N-terminal amino acid occurs in greater than 80% of all human proteins. The modification is identified as Nt-acetylation. In most proteins where the initiator methionine remains at the N-terminus, this amino acid is acetylated. When the initiator methionine is removed, as is the case for all secreted, transmembrane, and glycoproteins due to removal of the leader peptide in the lumen of the ER, the protein can still be Nt-acetylated. The most commonly occurring amino acids at the N-terminus that are acetylated are alanine (A), serine (S), cysteine (C), threonine (T), and valine (V). In the vast majority of cases, the presence of the Nt-acetylation creates a specific degradation signal referred to as a degron. The presence of the degron signal then targets the protein for ubiquitylation via the ubiquitin-dependent N-end rule pathway. The ubiquitylated proteins are then degraded in the proteasome. Components of the N-end rule pathway are referred to as N-recognins. It is the N-recognins that are the ubiquitin ligases (UBR, for ubiquitin ligase E3 component N-recognin) that ubiquitylate the Nt-acetylated protein. An example of a metabolic enzyme that is targeted for ubiquitylation via the N-recognin pathway is PEPCK as indicated in the Table above. In this example the ubiquitin ligase is UBR5. Humans express five UBR ubiquitin E3 ligases (UBR1–UBR5). When the N-terminal amino acid that is acetylated is a cysteine it can be oxidized by nitric oxide (NO) followed by arginine attachment via the action of an arginyltransferase such as the enzyme encoded by the ATE1 gene.

The enzymes that incorporate an acetyl group onto the N-terminal amino acid of human proteins are referred to as N-acetyltransferases (NAT). These enzymes constitute a family of acetyltransferases that is distinct from the lysine acetyltransferases (KAT). Like the KAT enzymes, the NAT enzymes utilize acetyl-CoA as the acetyl donor for the acetyltransferase reaction. There are six NAT complexes in human cells identified as NatA–NatF. Functional NAT enzymes are heterotrimeric complexes where the α-subunit of the complex is the catalytic protein. The catalytic α-subunits are encoded by a family of 12 genes identified as N(alpha)-acetyltransferases (NAA). The NatA complex can be generated through the association of four different NAA proteins (NAA10, NAA11, NAA15, and NAA16). The NatB complex can contain either the NAA20 or NAA25 protein. The NatC complex can contain the NAA30, NAA35, or NAA38 protein. The NatD, NatE, and NatF complexes each contain a single NAA protein: NAA40, NAA50, and NAA60, respectively.
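The complex-to-subunit assignments described above can be held in a small lookup table. The following Python sketch is only an illustration, not an authoritative database: the names NAT_COMPLEXES and complex_for are hypothetical, and the subunit lists simply restate the assignments given in the preceding paragraph.

```python
# NAT complexes and the NAA proteins that the text associates with each one.
# The dictionary restates the paragraph above; it is not a curated resource.
NAT_COMPLEXES = {
    "NatA": ["NAA10", "NAA11", "NAA15", "NAA16"],
    "NatB": ["NAA20", "NAA25"],
    "NatC": ["NAA30", "NAA35", "NAA38"],
    "NatD": ["NAA40"],
    "NatE": ["NAA50"],
    "NatF": ["NAA60"],
}


def complex_for(naa_gene):
    """Return the NAT complex that lists the given NAA gene, or None."""
    for complex_name, subunits in NAT_COMPLEXES.items():
        if naa_gene in subunits:
            return complex_name
    return None


# Total number of NAA genes across all six complexes.
total_naa_genes = sum(len(subunits) for subunits in NAT_COMPLEXES.values())
```

Summing the subunit lists gives 12, which agrees with the size of the NAA gene family stated in the text.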

Protein Phosphorylation

Post-translational phosphorylation is one of the most common reversible protein modifications that occur in animal cells. The vast majority of phosphorylations occur as a mechanism to regulate the biological activity of an enzyme or protein and as such are transient. In other words, a phosphate (or more than one in many cases) is added by a specific kinase and later removed by a specific phosphatase.

Physiologically relevant examples are the phosphorylations that occur in glycogen synthase and glycogen phosphorylase in hepatocytes in response to glucagon release from the pancreas. Phosphorylation of glycogen synthase inhibits its activity, whereas the activity of glycogen phosphorylase is increased. These two events lead to increased hepatic glucose delivery to the blood.

The enzymes that phosphorylate proteins are termed kinases and those that remove phosphates are termed phosphatases. For a more detailed discussion of kinases and phosphatases go to the Signal Transduction page. Protein kinases catalyze reactions of the following type:

ATP + protein → phosphoprotein + ADP

In animal cells serine, threonine and tyrosine are the amino acids subject to phosphorylation. The largest group of kinases are those that phosphorylate either serine or threonine residues and as such are termed protein serine/threonine kinases. The ratio of phosphorylation of the three different amino acids is approximately 1000/100/1 for serine/threonine/tyrosine.
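To make the 1000/100/1 ratio concrete, it can be converted into the fraction of all phosphorylation events that falls on each residue. The short Python sketch below is only a worked arithmetic example; the function and variable names are illustrative assumptions, and the ratio itself is the approximate figure quoted above.

```python
def phosphorylation_fractions(ratio):
    """Convert a residue -> relative count mapping into fractions of the total."""
    total = sum(ratio.values())
    return {residue: count / total for residue, count in ratio.items()}


# Approximate serine/threonine/tyrosine ratio quoted in the text.
RATIO = {"serine": 1000, "threonine": 100, "tyrosine": 1}

fractions = phosphorylation_fractions(RATIO)
for residue, fraction in fractions.items():
    print(f"{residue}: {fraction:.2%}")
```

With these numbers serine accounts for roughly 91% of phosphorylation events, threonine for about 9%, and tyrosine for less than 0.1%.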

Although the level of tyrosine phosphorylation is minor, the importance of phosphorylation of this amino acid is profound. As an example, the activity of numerous growth factor receptors is controlled by tyrosine phosphorylation.

Protein Fatty Acid Acylation

Many proteins are modified at their N-terminus following synthesis. N-terminal modifications can include acetylation and myristoylation. As indicated above, many proteins are N-terminally acetylated, the consequence of which is targeted degradation via the N-recognin ubiquitin ligase pathway. Despite the fact that the initiator methionine is very often hydrolyzed following protein synthesis (catalyzed by methionine aminopeptidases), acylation of the N-terminus still occurs. N-terminal acetylation is catalyzed by a family of N-terminal acetyltransferases (NATs), as discussed above, using acetyl-CoA as the acetyl donor for these reactions. Protein fatty acylation at the N-terminus most often involves attachment of the 14-carbon fatty acid, myristic acid, to an N-terminal glycine residue, referred to as N-myristoylation. Another common fatty acylation of proteins utilizes the 16-carbon fatty acid palmitic acid, which is attached to the sulfhydryl group of internal and N-terminal cysteine residues and is, therefore, referred to as S-palmitoylation. Although other long, medium, and short chain fatty acids are found attached to either the N-terminal amino acid (such as N-terminal propionylation) or to internal amino acids, N-myristoylation and S-palmitoylation represent the bulk of protein acylations. One physiologically relevant example of internal protein acylation is the hormone ghrelin. Ghrelin is a stomach-derived hormone that is acylated with octanoic acid on a specific serine residue, a modification required for its biological activity. The ghrelin acylation is catalyzed by an enzyme that is a member of the multipass transmembrane acyltransferase family termed MBOAT, for membrane-bound O-acyltransferase. The ghrelin acyltransferase is encoded by the MBOAT4 gene, which was originally identified as ghrelin O-acyltransferase, GOAT.

Protein N-Myristoylation

N-terminal myristoylation is catalyzed by N-terminal myristoyltransferases (NMTs). Humans express two NMT genes identified as NMT1 and NMT2. Incorporation of myristic acid onto an N-terminal glycine residue occurs predominantly as a co-translational event, although post-translational N-myristoylation has been shown to occur in apoptotic cells. Within the human proteome, it has been shown that approximately 0.5% of all proteins are N-myristoylated.

Protein S-Palmitoylation

Although not as common as protein N-myristoylation, protein S-palmitoylation is an important post-translational modification affecting the regulation of membrane attachment, intracellular trafficking, and membrane subdomain localization. The bulk of S-palmitoylation occurs on the sulfhydryl of internal cysteine residues; however, important examples of N-terminal palmitoylated proteins are known. In addition to the attachment of palmitic acid, S-palmitoylation is a term used to describe the S-acylation of proteins with stearic acid (18:0), oleic acid (18:1), arachidonic acid (20:4), and eicosapentaenoic acid (20:5). Palmitoylation of proteins is catalyzed by a family of protein acyltransferases (PATs) that are members of the Asp-His-His-Cys-containing protein acyltransferase family, identified as DHHC-PATs. Because the DHHC motif forms a zinc-finger domain, the genes encoding these enzymes are termed zinc finger DHHC type containing (ZDHHC), with a number designating the specific gene. Currently 23 human ZDHHC genes have been identified and characterized: ZDHHC1–ZDHHC9 and ZDHHC11–ZDHHC24 (there is no ZDHHC10 gene). N-terminal palmitoylation is known to occur on the α-subunit of Gs-type G-proteins as well as on the sonic hedgehog (SHH) protein. N-terminal palmitoylation of SHH is catalyzed by a specific enzyme encoded by the HHAT (hedgehog acyltransferase) gene. The HHAT protein belongs to the MBOAT family of multipass transmembrane acyltransferases. In addition to N-terminal S-palmitoylation, SHH is modified by the attachment of cholesterol to the C-terminus, a modification required to limit the spread of the protein across the anteroposterior axis of the developing neural tube and the developing limb bud. The human homolog of the Drosophila melanogaster segment polarity gene porcupine, encoded by the PORCN gene, is also a member of the MBOAT family of acyltransferases. The PORCN encoded enzyme S-palmitoylates the Wnt family proteins, a modification that is required for correct distribution of the gradients of these important development regulatory growth factors.
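The ZDHHC numbering described above (1 through 24, skipping 10) can be checked with a few lines of Python. This is only a sketch of the naming scheme as stated in the text; the function name is an illustrative assumption.

```python
def zdhhc_gene_names():
    """Build the human ZDHHC gene symbols: ZDHHC1-ZDHHC24 without ZDHHC10."""
    return [f"ZDHHC{n}" for n in range(1, 25) if n != 10]


genes = zdhhc_gene_names()
print(len(genes))  # number of gene symbols generated
```

Skipping the single unused number leaves 23 symbols, matching the gene count given in the paragraph.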

Protein Prenylation

Prenylation refers to the addition of the 15 carbon farnesyl group or the 20 carbon geranylgeranyl group to acceptor proteins, both of which are isoprenoid compounds derived from the cholesterol biosynthetic pathway. The isoprenoid groups are attached to cysteine residues at the carboxy terminus of proteins in a thioether linkage (C–S–C). A common consensus sequence at the C-terminus of prenylated proteins has been identified and is composed of CAAX, where C is cysteine, A is any aliphatic amino acid (except alanine) and X is the C-terminal amino acid. More than 120 human proteins have been identified that are modified by the addition of a prenyl group. These proteins include the γ-subunit of numerous heterotrimeric G-proteins, members of the Ras superfamily of small GTPases, the nuclear lamins, and several protein kinases and protein phosphatases.
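The CAAX consensus lends itself to a simple pattern match against the last four residues of a protein sequence. The Python sketch below is a rough illustration only: the set of residues treated as aliphatic (G, V, L, I) is an assumption made here, since the text says only "any aliphatic amino acid (except alanine)", and the example sequences in the usage note are hypothetical. A match would indicate a candidate motif, not proof of prenylation.

```python
import re

# Assumption: residues counted as "aliphatic, except alanine" for this sketch.
ALIPHATIC = "GVLI"

# C, two aliphatic residues, then any residue, anchored at the C-terminus.
CAAX_PATTERN = re.compile(rf"C[{ALIPHATIC}]{{2}}[A-Z]$")


def has_caax_box(sequence):
    """Return True if the one-letter sequence ends in a CAAX-like motif."""
    return CAAX_PATTERN.search(sequence.upper()) is not None
```

For example, has_caax_box("MKTCVLS") is True, whereas has_caax_box("MKTCAAS") is False because alanine is excluded, and has_caax_box("CVLSMK") is False because the motif is not at the C-terminus.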

In the course of the prenylation reaction, the prenyl group (either farnesyl or geranylgeranyl) is added to the cysteine in the CAAX motif at the C-terminus of target proteins and the AAX tripeptide is subsequently removed. These prenylation reactions are carried out, in humans, by one of several CAAX isoprenylation enzymes. The major isoprenylation enzymes are farnesyltransferase and geranylgeranyltransferase type I. Farnesyltransferase and geranylgeranyltransferase type I function as heterodimers composed of a common α-subunit and a distinct β-subunit. The common α-subunit is encoded by the FNTA gene (farnesyltransferase, CAAX box, alpha). The β-subunit of farnesyltransferase is encoded by the FNTB gene. The β-subunit of geranylgeranyltransferase type I is encoded by the PGGT1B gene (protein geranylgeranyltransferase type I subunit beta). Following protein isoprenylation the AAX tripeptide is removed by CAAX proteases. The major CAAX protease in humans is encoded by the RCE1 gene (Ras and a-factor converting enzyme 1). The last step in protein isoprenylation involves the methylation of the carboxylate group of the prenylated cysteine in a reaction utilizing S-adenosylmethionine as the methyl donor. Humans express three enzymes that carry out the isoprenylcysteine methyltransferase reaction, with the most abundant being encoded by the ICMT gene (isoprenylcysteine carboxyl methyltransferase).

Reactions of protein prenylation. The prenylation of target proteins occurs in three steps. The first step is the isoprenylation of the cysteine residue in the C-terminal CAAX motif. The AAX tripeptide is then removed followed by methylation of the hydroxyl of the carboxylic acid of the now C-terminal prenylated cysteine. The major prenylation reactions involve farnesylation or geranylgeranylation. The major CAAX protease is encoded by the RCE1 gene and the major cysteine carboxymethyltransferase is encoded by the ICMT gene.
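The three processing steps can be traced on a sequence string. The Python sketch below is a bookkeeping illustration under simple assumptions: the class and function names are hypothetical, the only motif check is for a cysteine four residues from the C-terminus, and the chemistry is reduced to recording which prenyl group was attached and whether the new C-terminal carboxyl is methylated.

```python
from dataclasses import dataclass


@dataclass
class PrenylatedCTerminus:
    sequence: str              # residues remaining after AAX removal
    prenyl_group: str          # "farnesyl" or "geranylgeranyl"
    carboxyl_methylated: bool  # True once the final methylation is recorded


def process_caax(sequence, prenyl_group="farnesyl"):
    """Apply the three steps: prenylate Cys, remove AAX, methylate carboxyl."""
    if prenyl_group not in ("farnesyl", "geranylgeranyl"):
        raise ValueError("prenyl_group must be 'farnesyl' or 'geranylgeranyl'")
    if len(sequence) < 4 or sequence[-4] != "C":
        raise ValueError("no cysteine at the CAAX position")
    # Step 1: isoprenylation of the CAAX cysteine (recorded via prenyl_group).
    # Step 2: the CAAX protease removes the AAX tripeptide.
    trimmed = sequence[:-3]
    # Step 3: the now C-terminal prenylated cysteine is carboxyl-methylated.
    return PrenylatedCTerminus(trimmed, prenyl_group, True)
```

With the hypothetical input "MKTCVLS", the result keeps "MKTC" as the sequence, so the prenylated cysteine becomes the C-terminal residue.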

In addition to numerous prenylated proteins that contain the CAAX consensus, prenylation is known to occur on proteins of the RAB family of RAS-related G-proteins. There are 65 proteins in this family that are prenylated at either a CC or CXC element in their C-termini. The RAB family of proteins are involved in signaling pathways that control intracellular membrane trafficking.

Some of the most important proteins whose functions depend upon prenylation are those that modulate immune responses. These include proteins involved in leukocyte motility, activation, and proliferation and in endothelial cell immune functions. These immunomodulatory roles of many prenylated proteins are the basis for a portion of the anti-inflammatory actions of the statin class of cholesterol synthesis-inhibiting drugs: statins reduce the synthesis of farnesylpyrophosphate and geranylgeranylpyrophosphate, which limits protein prenylation and thus reduces the extent of inflammatory events. Other important examples of prenylated proteins include the oncogenic GTP-binding and hydrolyzing protein RAS and the γ-subunit of the visual protein transducin, both of which are farnesylated. In addition, as indicated above, numerous heterotrimeric G-proteins have their γ-subunits modified by geranylgeranylation.

Protein Sulfation

Sulfate modification of proteins occurs at tyrosine residues. As many as 1% of all tyrosine residues present in the eukaryotic proteome are modified by sulfate addition, making this the most common tyrosine modification. Tyrosine sulfation is accomplished via the activity of tyrosylprotein sulfotransferases (TPST), which are membrane-associated enzymes of the trans-Golgi network. There are two known TPSTs identified as TPST-1 and TPST-2. The universal sulfate donor for these TPST enzymes is 3'-phosphoadenosyl-5'-phosphosulphate (PAPS). Addition of sulfate occurs almost exclusively on secreted and membrane-spanning proteins. Because the sulfate is added permanently, it is required for the biological activity of the protein and does not serve as a reversible regulatory modification in the way that tyrosine phosphorylation does.

Synthesis and structure of 3'-phosphoadenosyl-5'-phosphosulphate (PAPS)

Two-step reaction for synthesis of PAPS. The synthesis of PAPS begins with the attachment of sulfate to the α (alpha) phosphate of ATP, with the resultant release of the β (beta) and γ (gamma) phosphates as pyrophosphate, generating adenosine 5'-phosphosulfate, APS. APS is then phosphorylated at the 3'-position of the ribose moiety, forming the ultimate product, PAPS. Synthesis of PAPS in humans is catalyzed by the bi-functional enzyme 3'-phosphoadenosine 5'-phosphosulfate synthase, PAPSS. PAPSS possesses both the ATP sulfurylase and APS kinase activities that are associated with two separate enzymes in yeasts, bacteria, and plants. Humans express two PAPSS genes identified as PAPSS1 and PAPSS2.

At least 34 human proteins have been identified that are tyrosine sulfated although the total number that are predicted is much higher. In all vertebrates a total of 310 tyrosine sulfated proteins have been identified. It is predicted that the mouse proteome is likely to contain over 2000 tyrosine sulfated proteins. The addition of sulfate to tyrosine is believed to play a role in the modulation of protein-protein interactions of secreted and membrane-bound proteins. The process of tyrosine sulfation has been shown to be critical for the processes of blood coagulation, various immune functions, intracellular trafficking, and ligand recognition by several G-protein-coupled receptors (GPCRs). Some well-known tyrosine sulfated proteins are the coagulation protein factor VIII, and the gut peptides gastrin and cholecystokinin (CCK).

Vitamin C-Dependent Protein Modifications

Modifications of proteins that depend upon vitamin C as a cofactor include proline and lysine hydroxylations and carboxy terminal amidation of neuroendocrine peptides. The hydroxylating enzymes are identified as prolyl hydroxylases and lysyl hydroxylases. The most important hydroxylated proteins are the collagens. Within collagens, specific proline residues are hydroxylated by prolyl 3-hydroxylase and prolyl 4-hydroxylase and specific lysine residues are hydroxylated by lysyl hydroxylases.

Humans express three distinct prolyl 3-hydroxylase genes (P3H1, P3H2, and P3H3). Human prolyl 4-hydroxylases function as heterotetrameric enzymes composed of two α-subunits (the catalytic subunits) and two β-subunits. Humans express four distinct prolyl 4-hydroxylase α-subunit genes (P4HA1, P4HA2, P4HA3, and P4HTM) and one β-subunit gene (P4HB). The protein encoded by the P4HTM gene is a transmembrane protein localized to the endoplasmic reticulum. The P4HTM protein functions in the modulation of cellular responses to hypoxia by altering the activity of hypoxia inducible factor 1 (HIF1) as a result of stabilization of one of the specific subunits of HIF termed HIF-1α. The prolyl 3- and prolyl 4-hydroxylases all belong to the large family of 2-oxoglutarate and Fe2+-dependent dioxygenases whose members are most notable for their roles in histone demethylation and the regulation of cellular responses to hypoxia initiated by HIF1.

Humans express three distinct lysyl hydroxylase genes identified as PLOD1, PLOD2, and PLOD3. PLOD stands for procollagen-lysine, 2-oxoglutarate 5-dioxygenase. The PLOD1 gene is located on chromosome 1p36.22 and is composed of 21 exons that encode a precursor protein of 727 amino acids. The PLOD2 gene is located on chromosome 3q24 and is composed of 22 exons that generate two alternatively spliced mRNAs encoding two isoforms of the PLOD2 enzyme. The PLOD3 gene is located on chromosome 7q22 and is composed of 17 exons that encode a 736 amino acid precursor protein. Each of the lysyl hydroxylase enzymes, like the prolyl hydroxylases, is a member of the large family of 2-oxoglutarate and Fe2+-dependent dioxygenases.

During the process of C-terminal protein amidation the modified amino acid is a glycine that is found within the context –XGXX–COOH, where X can be any amino acid. The amidation of the C-terminus results in the neutralization of negative charges. The glycine amidation process is a two-step process carried out by the enzyme peptidylglycine α-amidating monooxygenase, which is encoded by the PAM gene. The PAM encoded protein is expressed as a preproprotein which is proteolytically processed and possesses two distinct enzymatic activities required for protein amidation. The two activities of the PAM encoded protein are contained within the peptidylglycine α-hydroxylating monooxygenase (PHM) domain and the peptidyl-α-hydroxyglycine α-amidating lyase (PAL) domain. The PAM gene is located on chromosome 5q21.1 and is composed of 28 exons that generate six alternatively spliced mRNAs. These mRNAs are predicted to encode precursor proteins that may all be proteolytically processed to an active enzyme similar to that derived from the 973 amino acid precursor identified as isoform e. Several peptide hormones, such as oxytocin and vasopressin, have C-terminal amidation.

Vitamin K-Dependent Protein Modifications

Vitamin K is a cofactor in the carboxylation of glutamic acid residues catalyzed by the enzyme gamma-glutamyl carboxylase (γ-glutamyl carboxylase). The result of this type of reaction is the formation of a γ-carboxyglutamate (gamma-carboxyglutamate), referred to as a gla residue. The gene encoding γ-glutamyl carboxylase is identified as GGCX and is located on chromosome 2p11.2. The GGCX gene spans 13 kbp and consists of 15 exons encoding a 758 amino acid protein. The γ-glutamyl carboxylase protein is an integral membrane protein with three transmembrane spanning domains associated with microsomal membranes.

The overall reaction, resulting in the incorporation of a gla-residue, actually involves a series of three distinct reactions. The reaction catalyzed by γ-glutamyl carboxylase is the one that incorporates the gla-residue but two additional enzyme activities are required to convert vitamin K back to its active hydroquinone (quinol) form. The latter two reactions are catalyzed by vitamin K epoxide reductase (VKORC1). These latter two reactions involve a dithiol conversion to a disulfide. An additional enzyme called vitamin K quinone reductase (VKQR) can also carry out the conversion of the quinone form of vitamin K (as formed by the action of VKORC1 or as obtained from the diet) to the hydroquinone form. This latter reaction utilizes NADH as a co-factor.

Formation of a γ-carboxyglutamate (gla) residue in prothrombin

Incorporation of a gla-residue into prothrombin: The incorporation of a gla-residue into a protein such as prothrombin requires the hydroquinone (KH2) form of vitamin K (either K1, K2, or synthetic K3). The utilization and regeneration of the KH2 form in the overall process of the γ-glutamyl carboxylase (GGCX) reaction is referred to as the vitamin K cycle. Either following the carboxylation, or directly from dietary quinone forms of vitamin K, the action of vitamin K epoxide reductase (VKORC1) is to provide a continuous source of the KH2 form.

The formation of gla residues within several proteins of the blood clotting cascade is critical for their normal function. The presence of gla residues allows the protein to chelate calcium ions and thereby adopt an altered conformation and biological activity. The coumarin-based anticoagulants, warfarin and dicumarol, function by inhibiting the second and third reactions of the overall carboxylation process, the two reactions catalyzed by VKORC1.
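The vitamin K cycle described in this section can be pictured as a three-state loop. The Python sketch below is a simplified illustration under stated assumptions: the names are hypothetical, the epoxide form as the product of the GGCX step is inferred from the name of the epoxide reductase rather than stated explicitly above, and the alternative VKQR route from quinone to hydroquinone is left out for brevity.

```python
# Each vitamin K form maps to (enzyme acting on it, form produced).
VITAMIN_K_CYCLE = {
    "hydroquinone": ("GGCX", "epoxide"),     # carboxylation step consumes KH2
    "epoxide": ("VKORC1", "quinone"),        # first VKORC1 reduction
    "quinone": ("VKORC1", "hydroquinone"),   # second VKORC1 reduction
}


def advance(form, inhibited=frozenset()):
    """Move one step around the cycle unless the required enzyme is inhibited."""
    enzyme, product = VITAMIN_K_CYCLE[form]
    return form if enzyme in inhibited else product


def run_cycle(start="hydroquinone", steps=3, inhibited=frozenset()):
    """Apply `steps` successive reactions and return the final vitamin K form."""
    form = start
    for _ in range(steps):
        form = advance(form, inhibited)
    return form
```

With no inhibitor, three steps return the vitamin to the hydroquinone form. Passing inhibited={"VKORC1"}, as a stand-in for a coumarin anticoagulant, leaves it stalled after the carboxylation step, so the active form is not regenerated.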

Selenium is a trace element and is found as a component of several prokaryotic and eukaryotic enzymes that are involved in redox reactions. Two critical redox enzyme families that require selenocysteine residues are the glutathione peroxidase and thioredoxin reductase families. Glutathione peroxidase is a critical enzyme involved in the protection of red blood cells from reactive oxygen species (ROS). This enzyme is a component of a redox system that also involves the enzyme glutathione reductase and NADPH as the terminal electron donor. This system is required for the continued reduction of oxidized glutathione (GSSG) and represents the single most significant system requiring continued glucose metabolism via the Pentose Phosphate Pathway in erythrocytes as the means for the production of the NADPH. Glutathione (GSH) becomes oxidized in the context of reducing various ROS and peroxides, and to continue in this capacity the oxidized form needs to be continuously reduced.

Humans express eight different glutathione peroxidase genes identified as GPX1 through GPX8. The enzyme encoded by the GPX1 gene (GPx1) is found in the cytosol of nearly all cell types in humans. GPx1 functions almost exclusively to reduce hydrogen peroxide (H2O2) to water. The protein encoded by the GPX3 gene, GPx3, is an extracellular enzyme found primarily in the plasma. The GPX4 encoded enzyme, GPx4, is localized to the intestines and is an extracellular enzyme as well.

The GPX1 gene is located on chromosome 3p21.3 and is composed of 2 exons that generate two alternatively spliced mRNAs. The GPX1 coding region contains a polyalanine tract in the N-terminal region of the protein. There are several alleles of this gene that have five, six, or seven alanine repeats. The allele with five alanine repeats has been shown to be highly correlated with increased risk for development of breast cancer. The GPX2 gene is located on chromosome 14q24.1 and is composed of 4 exons. The GPX3 gene is located on chromosome 5q33.1 and is composed of 5 exons. The GPX4 gene is located on chromosome 19p13.3 and is composed of 8 exons. The GPX5 gene is located on chromosome 6p22.1 and is composed of 7 exons. The resultant GPX5 mRNA does not contain the canonical selenocysteine codon (UGA) and thus the resulting protein does not contain a selenocysteine residue. Expression of the GPX5 gene is regulated by androgens and the gene is expressed exclusively in the epididymis in the male reproductive tract, where the expressed protein, GPx5, is involved in protecting spermatozoa membranes from the damaging effects of lipid peroxidation. The GPX6 gene is located on chromosome 6p22.1 and is composed of 5 exons. GPX6 expression is restricted to embryonic tissues and the adult olfactory system. The GPX7 gene is located on chromosome 1p32 and is composed of 3 exons. The GPX8 gene is located on chromosome 5q11.2 and is composed of 3 exons.

As the name of the enzyme implies, thioredoxin reductase is involved in the reduction of thioredoxin, which itself is principally involved in the reduction of oxidized disulfide bonds in proteins. The reduction of these disulfide bonds results in oxidation of thioredoxin, which then is reduced by thioredoxin reductase. The overall process, like the glutathione peroxidase system, requires NADPH as the terminal electron donor for the reduction process. A critically important reaction that is coupled to the thioredoxin system is the formation of deoxynucleotides. Humans express three thioredoxin reductase genes that encode three distinct enzymes identified as TrxR1, TrxR2, and TrxR3. The TrxR1 enzyme is functional in the cytosol and is primarily involved in the maintenance of the ribonucleotide reductase system. The TrxR2 enzyme is functional in the mitochondria where it is principally involved in the detoxification of reactive oxygen species (ROS) produced in this organelle. TrxR3 is a testes-specific isoform of the enzyme. The TrxR1 enzyme is encoded by the TXNRD1 gene located on chromosome 12q23–q24.1 and is composed of 18 exons that generate several alternatively spliced mRNAs encoding five different isoforms of TrxR1. The TrxR2 enzyme is encoded by the TXNRD2 gene located on chromosome 22q11.21 and is composed of 19 exons that generate two alternatively spliced mRNAs resulting in two different isoforms of TrxR2. The TrxR3 enzyme is encoded by the TXNRD3 gene located on chromosome 3q21.3 and is composed of 16 exons that generate two alternatively spliced mRNAs resulting in two different isoforms of TrxR3.

The enzymes of the deiodinase family are also important selenocysteine-containing enzymes. Clinically relevant enzymes in this family are the thyroid deiodinases that are critical for the maturation and catabolism of the thyroid hormones. Humans express three different thyroid deiodinase genes identified as DIO1, DIO2, and DIO3. The enzyme encoded by the DIO1 gene, thyroxine deiodinase type I (also called iodothyronine deiodinase type I), is involved in the peripheral tissue conversion of thyroxine (T4) to the bioactive form of thyroid hormone, tri-iodothyronine (T3). In addition to its role in the generation of T3, thyroxine deiodinase I is involved in the catabolism of thyroid hormones. The enzyme encoded by the DIO2 gene, iodothyronine deiodinase type II, is also involved in the conversion of T4 to T3 but does so within the thyroid gland itself. The activity of iodothyronine deiodinase II has been associated with the thyrotoxicosis of Graves disease. The enzyme encoded by the DIO3 gene is involved only in the inactivation (catabolism) of T3 and T4. Expression of the DIO3 gene is highest in the uterus during pregnancy and in fetal and neonatal tissue, suggesting a role for this enzyme in the regulation of thyroid hormone levels and functions during early development. The DIO1 gene is located on chromosome 1p33–p32 and is composed of 4 exons that generate four alternatively spliced mRNAs. The DIO2 gene is located on chromosome 14q24.2–q24.3 and is composed of 6 exons that generate four alternatively spliced mRNAs. The DIO3 gene is located on chromosome 14q32 and is an intronless (single-exon) gene that encodes a protein of 304 amino acids.

Selenocysteine incorporation in eukaryotic proteins occurs cotranslationally at UGA codons (normally stop codons) via the interactions of a number of specialized proteins and protein complexes. In addition, there are specific secondary structures in the 3′ untranslated regions of selenoprotein mRNAs, termed SECIS elements, that are required for selenocysteine insertion into the elongating protein. One of the complexes required for this important modification is composed of a selenocysteinyl tRNA [(Sec)-tRNA(Ser)Sec] and its specific elongation factor identified as selenoprotein translation factor B (SelB). SelB is also commonly called eukaryotic elongation factor, selenocysteine-tRNA-specific (EEFsec or EFsec). The protein that is involved in the interaction of the SECIS element with the (Sec)-tRNA(Ser)Sec is referred to as SECIS binding protein, SBP2. Additional proteins involved in the synthesis pathway include two selenophosphate synthetases, SPS1 and SPS2, ribosomal protein L30, and two factors that have been shown to bind (Sec)-tRNA(Ser)Sec identified as soluble liver antigen/liver protein (SLA/LP) and SECp43.
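The recoding of UGA from a stop codon to selenocysteine can be illustrated with a toy translation routine. The Python sketch below is a deliberate simplification and not a model of the actual machinery: the function name and the has_secis flag are hypothetical, every in-frame UGA is read as selenocysteine (one-letter code U) whenever the flag is set, and the roles of SelB, SBP2, and the other factors are not represented.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, has_secis=False):
    """Translate an mRNA coding sequence, reading UGA as Sec if has_secis."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and has_secis:
            protein.append("U")  # selenocysteine inserted instead of stopping
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":       # UAA, UAG, or UGA without a SECIS element
            break
        protein.append(residue)
    return "".join(protein)
```

For the hypothetical message AUGUGAGGUUAA, translation stops after the initiator methionine when has_secis is False, but continues through the UGA codon to give a three-residue product containing selenocysteine when has_secis is True.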

Incorporation of selenocysteine during protein synthesis

Selenocysteine biosynthesis and incorporation. The first steps involve the activation of serine onto the (Sec)-tRNA followed by enzymatic conversion to selenocysteine, generating (Sec)-tRNA(Ser)Sec. Next the (Sec)-tRNA(Ser)Sec is bound by SelB and the complex is incorporated into the translational machinery aided by SBP2 (not shown). The elongating protein is transferred to the selenocysteinyl-tRNA via the action of peptidyltransferase, as for any other incoming amino acid, and normal elongation continues.

Selenocysteine Containing Proteins

The following Table is not intended to be a complete list of all known selenocysteine-containing proteins; it is only a representative list.

Protein Name(s) | Gene(s) | Functions / Comments
glutathione peroxidases 1, 2, 3, 4, and 6 | GPX1, GPX2, GPX3, GPX4, GPX6 | humans express eight glutathione peroxidases; major class of anti-oxidant enzymes; involved in numerous reaction pathways involved in the reduction of hydrogen peroxide (H2O2) such as is critical in the erythrocyte
iodothyronine (thyroxine) deiodinases 1, 2, and 3 | DIO1, DIO2, DIO3 | catalyze the conversion of the thyroid hormone thyroxine (T4) to triiodothyronine (T3); also involved in the catabolism of T3 and T4 to inactive molecules
methionine sulfoxide reductase B1 | MSRB1 | functions in the protection of cells from oxidative stress; catalyzes the reduction of methionine-R-sulfoxides to methionine; highly expressed in the liver and kidneys
selenophosphate synthetase 2 | SEPHS2 | catalyzes the synthesis of selenophosphate from selenium (as the selenide ion: Se2–) and ATP; selenophosphate is the selenium donor in the incorporation of selenocysteine residues
selenoprotein F | SELENOF | exact function has not been determined; found in the endoplasmic reticulum (ER) associated with UDP-glucose:glycoprotein glucosyltransferase (UGTR)
selenoprotein H | SELENOH | a nucleolar localized enzyme involved in regulating redox status of cells
selenoprotein I | SELENOI | CDP-alcohol phosphatidyltransferase class-I family; catalyzes the transfer of phosphoethanolamine from CDP-ethanolamine to diacylglycerol in the synthesis of phosphatidylethanolamine, PE
thioredoxin reductases 1, 2, and 3 | TXNRD1, TXNRD2, TXNRD3 | these enzymes reduce oxidized thioredoxin such as that which is generated during the synthesis of deoxynucleotides via the action of ribonucleotide reductase

Michael W King, PhD | © 1996–2016, LLC | info @

Last modified: January 12, 2017