Funding, Deals & Partnerships: BIOLOGICS & MEDICAL DEVICES; BioMed e-Series; Medicine and Life Sciences Scientific Journal – http://PharmaceuticalIntelligence.com
Advances in RNA sequencing technologies have revealed the complexity of our genome. Non-coding RNAs make up the majority (98%) of the transcriptome, and several different classes of regulatory RNA with important functions are being discovered. Understanding the significance of this RNA world is one of the most important challenges facing biology today, and the non-coding RNAs within it represent a gold mine of potential new biomarkers and drug targets.
lncRNA sequences
Long non-coding RNAs (lncRNAs) are a large and diverse class of transcribed RNA molecules that are more than 200 nucleotides long and do not encode proteins (or lack an open reading frame of more than 100 amino acids). lncRNAs are thought to encompass nearly 30,000 different transcripts in humans, so lncRNA transcripts account for the major part of the non-coding transcriptome. lncRNA discovery is still at a preliminary stage. There are many specialized lncRNA databases, which are organized and centralized through RNAcentral.
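The two thresholds quoted above (length above 200 nucleotides, no open reading frame above 100 amino acids) can be turned into a simple screening rule. The following is only a minimal sketch under stated assumptions: it scans the three forward frames for ATG-to-stop ORFs, and the function names and example sequences are hypothetical. Real annotation pipelines use additional evidence (coding potential scores, homology, ribosome profiling) that is not modeled here.

```python
# Sketch: flag a transcript as a candidate lncRNA using the two thresholds
# quoted in the text (length > 200 nt, no ORF longer than 100 amino acids).
# Assumption: only ATG-initiated ORFs on the given strand are considered.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq: str) -> int:
    """Length in amino acids of the longest ATG-to-stop ORF in three frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # codon index of the first ATG in the current open stretch
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        for idx, codon in enumerate(codons):
            if codon == "ATG" and start is None:
                start = idx
            elif codon in STOP_CODONS and start is not None:
                best = max(best, idx - start)  # residues, stop codon excluded
                start = None
    return best


def is_candidate_lncrna(seq: str, min_len: int = 200, max_orf_aa: int = 100) -> bool:
    """True if the transcript passes both thresholds from the definition."""
    return len(seq) > min_len and longest_orf_aa(seq) <= max_orf_aa


# Hypothetical toy sequences, for illustration only.
coding_like = "ATG" + "GCT" * 150 + "TAA"   # one long ORF
noncoding_like = "ACGT" * 60                # 240 nt, contains no ATG
print(is_candidate_lncrna(coding_like), is_candidate_lncrna(noncoding_like))
```

A rule like this is only a first filter; a transcript that passes it is a candidate, not a confirmed lncRNA.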
lncRNAs can be transcribed as whole or partial natural antisense transcripts (NAT) to coding genes, or located between genes or within introns. Some lncRNAs originate from pseudogenes (Milligan & Lipovich, 2015). lncRNAs may be classified into different subtypes (Antisense, Intergenic, Overlapping, Intronic, Bidirectional, and Processed) according to the position and direction of transcription in relation to other genes (Peschansky & Wahlestedt, 2014; Mattick & Rinn, 2015).
lncRNA expression
Gene expression profiling and in situ hybridization studies have revealed that lncRNA expression is developmentally regulated, can be tissue- and cell-type specific, and can vary spatially, temporally, or in response to stimuli. Many lncRNAs are expressed in a more tissue-specific fashion and with greater variation between tissues compared to protein-coding genes (Derrien et al., 2012).
In general, the expression level of lncRNA is at least one order of magnitude below that of mRNA. Many lncRNAs are located exclusively in the nucleus, but some are cytoplasmic or are located in both nucleus and cytoplasm.
lncRNA functions
To date, very few lncRNAs have been characterized in detail. However, it is clear that lncRNAs are important regulators of gene expression, and lncRNAs are thought to have a wide range of functions in cellular and developmental processes. lncRNAs may carry out both gene inhibition and gene activation through a range of diverse mechanisms, adding yet another layer of complexity to our understanding of genomic regulation. It is estimated that 25 – 40% of coding genes have overlapping antisense transcription, so the impact of lncRNAs on gene regulation is not to be underestimated.
Overview of some of the functions of long non-coding RNA. lncRNAs are involved in gene regulation through a variety of mechanisms. The process of transcription of the lncRNA itself can be a marker of transcription and the resulting lncRNA can function in transcriptional regulation or in chromatin modification (usually via DNA and protein interactions) both in cis and in trans. lncRNAs can bind to complementary RNA and affect RNA processing, turnover or localization. The interaction of lncRNA with proteins can affect protein function and localization as well as facilitate formation of riboprotein complexes. Some lncRNAs are actually precursors for smaller regulatory RNAs such as microRNAs or piwi RNAs. Figure modified from Wilusz et al. Genes Dev. 2009. 23: 1494-1504. PMID: 19571179.
lncRNA mechanisms of gene regulation
lncRNAs are not defined by a common mode of action, and can regulate gene expression and protein synthesis in a number of different ways (Figure 1). Some lncRNAs are relatively highly expressed, and appear to function as scaffolds for specialized subnuclear domains. lncRNAs possess secondary structures which facilitate their interactions with DNA, RNA and proteins. lncRNAs may also bind to DNA or RNA in a sequence-specific manner. Gene regulation may occur in cis (i.e. in close proximity to the transcribed lncRNA) or in trans (at a distance from the transcription site). In the case of chromatin modulation, the effect of lncRNA is typically gene-specific and exerted at a local level (in cis); however, regulation of chromatin can also occur in trans.
A few lncRNAs have had their functions experimentally defined and have been shown to be involved in fundamental processes of gene regulation including:
Chromatin modification and structure
Direct transcriptional regulation
Regulation of RNA processing events such as splicing, editing, localization, translation and turnover/degradation
Post-translational regulation of protein activity and localization
Facilitation of ribonucleoprotein (RNP) complex formation
Modulation of microRNA regulation
Gene silencing through production of endogenous siRNA (endo-siRNA)
Regulation of genomic imprinting
Attempts have recently been made to categorize the various types of molecular mechanisms that may be involved in lncRNA function. lncRNAs may be defined as one or more of the following five archetypes:
The Signal archetype: functions as a molecular signal or indicator of transcriptional activity.
The Decoy archetype: binds to and titrates away other regulatory RNAs (e.g. microRNAs) or proteins (e.g. transcription factors).
The Guide archetype: directs the localization of ribonucleoprotein complexes to specific targets (e.g. chromatin modification enzymes are recruited to DNA).
The Scaffold archetype: has a structural role as a platform upon which relevant molecular components (proteins and/or RNA) can be assembled into a complex or brought into spatial proximity.
The Enhancer archetype: controls higher order chromosomal looping in an enhancer-like model.
lncRNA and disease
With such a wide range of functions, it is not surprising that lncRNAs play a role in the development and pathophysiology of disease. Interestingly, genome-wide association studies have demonstrated that most disease variants are located outside of protein-coding genes.
lncRNAs have been found to be differentially expressed in various types of cancer including leukemia, breast cancer, hepatocellular carcinoma, colon cancer, and prostate cancer. Key oncogenes and tumor suppressors including PTEN and KRAS are now known to be regulated by corresponding lncRNA pseudogenes which also act as competing endogenous RNAs (ceRNAs) or microRNA sponges (Poliseno et al., 2010, Johnsson et al., 2013). This highlights the important role that lncRNAs play in oncogenesis.
Other diseases where lncRNAs are dysregulated include cardiovascular diseases, neurological disorders, immune-mediated diseases and genetic disorders. One of the first lncRNAs to be discovered was the Xist lncRNA, which plays an important role in X chromosome inactivation (Penny et al., 1996), an extreme case of genomic imprinting. lncRNAs are present at almost all imprinted loci, arguing for an important role for lncRNAs in this form of epigenetic regulation.
lncRNAs represent a gold mine of potential new biomarkers and drug targets, as well as a step change in the way we understand mechanisms of disease.
The challenges of studying lncRNA
Only a relatively small proportion of lncRNAs have so far been investigated and although we can start to classify different types of lncRNA functions, we are still far from being able to predict the function of new lncRNAs. This is mainly due to the fact that unlike protein-coding genes whose sequence motifs are indicative of their function, lncRNA sequences are not usually conserved and they don’t tend to contain conserved motifs. Other differences between lncRNA and mRNA are summarized in Table 1.
The main challenges of working with lncRNAs are that they can be present in very low amounts (typically an order of magnitude lower than mRNA expression levels), can overlap with coding transcripts on both strands, and are often restricted to the nucleus.
Table 1
mRNA | lncRNA
Tissue-specific expression | Tissue-specific expression
Form secondary structure | Form secondary structure
Undergo post-transcriptional processing, i.e. 5’ cap, polyadenylation, splicing | Undergo post-transcriptional processing, i.e. 5’ cap, polyadenylation, splicing
Important roles in diseases and development | Important roles in diseases and development
Protein coding transcript | Non-protein coding, regulatory functions
Well conserved between species | Poorly conserved between species
Present in both nucleus and cytoplasm | Many predominantly nuclear, others nuclear and/or cytoplasmic
Total 20-24,000 mRNAs | Currently ~30,000 lncRNA transcripts, predicted 3-100 fold of mRNA in number
Expression level: low to high | Expression level: very low to moderate
Similarities (first four rows) and differences (last five rows) between mRNA and lncRNA
ncRNA discovery and profiling using Next Generation Sequencing
Expression profiling is one way to start to uncover the function of lncRNA. Identifying lncRNAs that are differentially expressed during development or in particular situations can shed light on their potential functions. Alternatively, looking for lncRNAs and protein-coding genes whose expression is correlated, can perhaps indicate co-regulation or related functions.
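The co-expression idea in the paragraph above can be illustrated with a small correlation screen. This is a minimal sketch, not a description of any particular analysis pipeline: it assumes expression values are already normalized across the same ordered set of samples, it uses the Pearson coefficient with a hypothetical cutoff of 0.9, and the gene names and numbers are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(lnc_expr, mrna_expr, threshold=0.9):
    """List (lncRNA, gene, r) for pairs whose |r| meets the threshold."""
    pairs = []
    for lnc, x in lnc_expr.items():
        for gene, y in mrna_expr.items():
            r = pearson(x, y)
            if abs(r) >= threshold:
                pairs.append((lnc, gene, round(r, 3)))
    return pairs


# Hypothetical expression profiles across four samples (illustration only).
lnc_expr = {"lncA": [1, 2, 3, 4]}
mrna_expr = {"geneX": [2, 4, 6, 8], "geneY": [5, 1, 4, 2]}
print(correlated_pairs(lnc_expr, mrna_expr))
```

A high correlation only suggests co-regulation or a related function; it does not establish a mechanism, so candidates found this way still need functional follow-up.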
Whole transcriptome RNA sequencing is the method of choice for comprehensive lncRNA expression profiling, including the discovery of novel lncRNAs. Whole transcriptome sequencing enables the characterization of all RNA transcripts, including both the coding mRNA and non-coding RNA larger than 170 nucleotides in length, regardless of whether they are polyadenylated or not.
Exiqon offers a comprehensive whole transcriptome NGS Service covering everything from RNA isolation to the final report, including advanced data analysis and interpretation.
Advantages of LNA™-enhanced research tools for lncRNA
Exiqon offers a broad range of sensitive and specific tools designed to address the challenges faced when investigating lncRNA expression and function. Exiqon’s tools are based on the locked nucleic acid (LNA™) technology. LNA™ is a class of high-affinity RNA analogs that exhibit unprecedented thermal stability when hybridized to a complementary DNA or RNA strand. Hence, LNA™ enables superior sensitivity and specificity in any hybridization-based approach. We continue to use the LNA™ technology to develop new and innovative ways to improve our understanding of lncRNAs in this rapidly developing field.
Functional analysis of lncRNAs has been revolutionized by the development of Antisense LNA™ GapmeRs, which enable efficient silencing of lncRNA both in vitro and in vivo. Exiqon also offers tools to investigate lncRNA function in other ways, for example using LNA™ oligos to block interactions between lncRNA and DNA, RNA or proteins. ExiLERATE LNA™ qPCR assays have been developed to enable robust detection of even low abundance and challenging lncRNAs by qPCR. Precise subcellular localization of lncRNAs can be studied using LNA™ probes for in situ hybridization.
Silencing lncRNA to disrupt their function
One strategy to study the function of lncRNA is to silence them using specific and potent antisense oligonucleotides (Antisense LNA™ GapmeRs).
Because many lncRNAs are localized in the nucleus, siRNA approaches to knock down lncRNA have met with limited success. The double-stranded siRNA duplex has difficulty crossing the nuclear membrane, and the passenger strand (non-targeting sequence) of the duplex can often elicit its own effect, confounding interpretation of results.
Antisense LNA™ GapmeRs overcome this challenge by enabling highly efficient RNase H mediated silencing of all lncRNA. RNase H is present both in the cytoplasm and in the nucleus, and it has been shown that LNA™ GapmeRs offer significantly better knockdown of nuclear targets than siRNA mediated silencing. In addition, the single stranded LNA™ GapmeRs are an advantage for lncRNAs that are transcribed as antisense transcripts to coding genes because there is no second strand that could compromise specificity.
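The sequence logic behind a single-stranded antisense oligonucleotide can be sketched in a few lines: the oligo is the reverse complement of the target site on the lncRNA. The code below is only an illustration of that base-pairing step. It is not Exiqon's design algorithm, the window length of 16 is a hypothetical default, and real GapmeR design also involves placement of LNA™ modifications, a central DNA gap, and off-target screening, none of which is modeled here.

```python
# Sketch: derive antisense DNA sequences for candidate target sites on an RNA.
# Assumption: plain Watson-Crick pairing only; chemistry is not modeled.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_dna(target_rna: str) -> str:
    """Return the DNA sequence (5'->3') complementary to an RNA target site."""
    return "".join(COMPLEMENT[base] for base in reversed(target_rna.upper()))


def candidate_sites(transcript: str, length: int = 16):
    """Yield (start, site, antisense) for every window of the given length."""
    for i in range(len(transcript) - length + 1):
        site = transcript[i:i + length]
        yield i, site, antisense_dna(site)


# Hypothetical toy transcript, for illustration only.
for start, site, oligo in candidate_sites("AUGGCCAUGG", length=8):
    print(start, site, oligo)
```

Because only one strand is produced, there is no passenger strand to account for, which is the specificity point made in the paragraph above.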
The fact that the molecular mechanism of lncRNAs often relies on sequence specific interaction with DNA, RNA or proteins means that it is possible to design highly specific LNA™-oligonucleotides that can be used to inhibit these interactions and thereby reveal the details of how lncRNAs function. Please contact us and our experts can help you with the design of custom LNA™ oligonucleotides for studying lncRNA interactions.
lncRNA analysis by qPCR
Short, high affinity, LNA™-enhanced qPCR primers offer an advantage for the detection of low abundance targets. In addition, the use of LNA™ to adjust primer melting temperature provides greater flexibility in primer design which is important for qPCR analysis of overlapping transcripts. The ExiLERATE LNA™ qPCR System offers a sophisticated primer design tool combined with highly sensitive and specific qPCR assays for any RNA target.
ExiLERATE LNA™ qPCR assays are ideal to monitor the efficiency of LNA™ GapmeR-mediated RNA knockdown. Validated LNA™ qPCR primer sets are available to detect the lncRNA targeted by Antisense LNA™ GapmeR positive controls.
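Knockdown efficiency from qPCR data is commonly summarized with the comparative Ct (2^-ΔΔCt) calculation. The sketch below assumes that method, which the text does not name, and it also assumes roughly 100% amplification efficiency and a stable reference gene. The function names and Ct values are hypothetical.

```python
# Sketch: estimate knockdown from Ct values using the comparative Ct method.
# Assumptions: ~100% PCR efficiency; the reference gene is unaffected by treatment.


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Return target expression in treated cells relative to control (2^-ddCt)."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_treated - delta_control)


def knockdown_percent(ct_target_treated, ct_ref_treated,
                      ct_target_control, ct_ref_control):
    """Return the percentage reduction of the target relative to control."""
    remaining = relative_expression(ct_target_treated, ct_ref_treated,
                                    ct_target_control, ct_ref_control)
    return (1 - remaining) * 100


# Hypothetical Ct values: the lncRNA crosses threshold 3 cycles later after treatment.
print(knockdown_percent(27.0, 18.0, 24.0, 18.0))
```

A shift of three cycles with an unchanged reference gene corresponds to one eighth of the original level remaining.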
ExiLERATE LNA™ qPCR primer sets also provide a convenient way to validate RNA sequencing data. Our advanced online design algorithm can design LNA™ qPCR assays for novel lncRNA transcripts, isoforms or splice variants. LNA™ qPCR assays for multiple lncRNAs can easily be designed using the batch mode function in our online design algorithm.
Subcellular localization of lncRNA expression by in situ hybridization
Knowing the subcellular localization of a lncRNA is important when starting to hypothesize about the functions that the lncRNA may be performing. LNA™-enhanced probes for in situ hybridization have increased affinity for their target sequence and offer increased sensitivity and an increased signal-to-noise ratio, which is important for detection of rare targets such as lncRNA.
High-throughput RNA sequencing studies have revealed pervasive transcription of the human genome, which generates a variety of long noncoding RNAs (lncRNAs) that have no apparent protein-coding functions (1). Subsequent studies that globally monitor translation have similarly identified numerous translation events outside of canonical protein-coding sequences (2–4), suggesting pervasive translation of the transcriptome. However, only a few examples of functional peptides encoded by RNA regions previously thought to be noncoding have been reported to regulate distinct biological processes (5–9). On page 1140 of this issue, Chen et al. (10) provide evidence for an expanded repertoire of functional peptides encoded by lncRNAs and other “untranslated” RNA regions.
The researchers sequenced ribosome-protected messenger RNA fragments (RFPs) to identify translated open reading frames (ORFs) on a global scale. Many RFPs map to noncanonical ORFs found in lncRNAs and untranslated regions of RNA. Chen et al. therefore used genome-wide loss-of-function screens to assess the effect of noncanonical ORFs on cell growth. They filtered RFPs obtained from several human cell types and identified over 5000 previously unannotated ORFs, including many variants of canonical ORFs, upstream ORFs in 5′ UTRs, and ORFs within transcripts that were annotated as lncRNAs.
Using CRISPR-Cas9, Chen et al. disrupted 2353 unannotated ORFs and identified over 400 that promoted cell growth in human leukemic cells and stem cells. However, only a few lncRNAs showed consistent effects on growth when disrupted, suggesting noncoding functions for the remaining lncRNA loci. Many of the lncRNA ORFs encoded small peptides. Such microproteins are challenging to identify because they show little evolutionary conservation.
Noncanonical translation has been linked to many neurological diseases, including short tandem repeat diseases such as fragile X syndrome and polyglutamine diseases such as Huntington’s chorea.
References cited within this paper include
1. I. Ulitsky, D. P. Bartel, Cell 154, 26 (2013).
2. A. A. Bazzini et al., EMBO J. 33, 981 (2014).
3. N. T. Ingolia et al., Cell Rep. 8, 1365 (2014).
4. Z. Ji, R. Song, A. Regev, K. Struhl, eLife 4, e08890 (2015).
5. D. M. Anderson et al., Cell 160, 595 (2015).
6. T. Kondo et al., Nat. Cell Biol. 9, 660 (2007).
7. A. Pauli et al., Science 343, 1248636 (2014).
8. E. G. Magny et al., Science 341, 1116 (2013).
9. S. R. Starck et al., Science 351, aad3867 (2016).
10. J. Chen et al., Science 367, 1140 (2020).
11. M. Guttman, P. Russell, N. T. Ingolia, J. S. Weissman, E. S. Lander, Cell 154, 240 (2013).
12. T. G. Johnstone, A. A. Bazzini, A. J. Giraldez, EMBO J. 35, 706 (2016).
13. W. F. Doolittle, T. D. Brunet, S. Linquist, T. R. Gregory, Genome Biol. Evol. 6, 1234 (2014).
14. F. B. Gao, J. D. Richter, D. W. Cleveland, Cell 171, 994 (2017).
15. M. G. Kearse et al., Mol. Cell 62, 314 (2016).
Other articles of note on lncRNAs on this Online Open Access Journal Include
This discussion completes, and is an epicrisis (summary and critical evaluation) of, the series of discussions that preceded it.
Innervation of Heart and Heart Rate
Action of hormones on the circulation
Allogeneic Transfusion Reactions
Graft-versus Host reaction
Unique problems of perinatal period
High altitude sickness
Deep water adaptation
Heart-Lung-and Kidney
Acute Lung Injury
The concept inherent in this series is that the genetic code is an imprint that is translated into a message. It is much the same as a blueprint, or a darkroom photographic image that has to be converted to a print. It is biologically an innovation of evolutionary nature because it establishes a simple and reproducible standard for the transcription of the message using strings of nucleotides (oligonucleotides) that systematically transfer the message through ribonucleotides that communicate in the cytoplasm with the cytoskeleton-based endoplasmic reticulum (ER), composing a primary amino acid sequence. This process is a quite simple and convenient method of biological activity. However, the simplicity ends at this step. The metabolic components of the cell are organelles consisting of lipoprotein membranes and a cytosol, which have particularly aligned active proteins, as in the inner membrane of the mitochondrion, the liposome or phagosome, or the structure of the ER. Each of these is critical: the mitochondrion for energy transduction and respiration, the phagosome for cellular remodeling or cell death, the ER for construction of proteins, and the cytoplasmic domain for anaerobic glycolysis and the hexose monophosphate shunt. All of this refers to structure and function, not to leave out the membrane-assigned transport of inorganic and organic ions (electrolytes and metabolites).
I have identified a specific role of the ER, the organelles, and cellular transactions within and between cells that is orchestrated. But what I have outlined is a somewhat limited and rigid model that does not reach into the dynamics of cellular transactions. The DNA has expression that may be old, no longer used messages, and this is perhaps only part of a significant portion of “dark matter”. There is also nuclear DNA that is enmeshed with protein, mRNA that is a copy of DNA, and mDNA is copied to ribosomal RNA (rRNA). There is also rDNA. The classic model is DNA to RNA to protein. However, there is also noncoding RNA, which plays an important role in regulation of transcription.
This has been discussed in other articles. But the important point is that proteins have secondary structure through disulfide bonds, which is determined by the position of sulfur amino acids, and by van der Waals forces, attraction and repulsion. They have tertiary structure, which is critical for 3-D structure. When like subunits associate, or dissimilar oligomers, then you have heterodimers and oligomers. These constructs that have emerged over time interact with metabolites within the cell, and also have an important interaction with the extracellular environment.
When you take this into consideration, a more complete picture emerges. The primitive cell or the multicellular organism lives in an environment that has the following characteristics – air composition, water and salinity, natural habitat, temperature, exposure to radiation, availability of nutrients, and exposure to chemical toxins or to predators. In addition, there is a time dimension that proceeds from embryonic stage to birth in mammals, a rapid growth phase, a tapering, and a decline. The time span is determined by body size, fluidity of adaptation, and environmental factors. This is covered in great detail in this work. The last two pieces, which complete the series, are in the writing stage. Much content has already been presented in previous articles.
The function of the heart, kidneys and metabolism of stressful conditions have already been extensively covered in http://pharmaceuticalintelligence.com in the following and more:
The Amazing Structure and Adaptive Functioning of the Kidneys: Nitric Oxide – Part I
I placed this chapter after signaling, rather than before it. I had already written much of the content before reorganizing the contents. The previous chapters on carbohydrate and on lipid metabolism have already provided much material on proteins and protein function, which was persuasive of the need to introduce signaling, which entails a substantial introduction to conformational changes in proteins that direct the trafficking of metabolic pathways, but more subtly uncovers an important role for microRNAs, not divorced from transcription, but involved in a non-transcriptional role. This is where the classic model of molecular biology lacked any integration with emerging metabolic concepts concerning regulation. Consequently, the science was bereft of understanding the ties between the multiple convergence of transcripts, the selective inhibition of transcription, and the relative balance of aerobic and anaerobic metabolism, the weight of the pentose phosphate shunt, and the utilization of available energy sources for synthetic and catabolic adaptive responses.
The first subchapter serves to introduce the importance of transcription in translational science. The several subtitles that follow are intended to lay out the scope of the transcriptional activity, and also to direct attention toward the huge role of proteomics in the cell construct. As we have already seen, proteins engage with carbohydrates and with lipids in important structural and signaling processes. They are integral to the composition of the cytoskeleton, and also to the extracellular matrix. Many proteins are actually enzymes, carrying out the transformation of some substrate, a derivative of the food we ingest. They have a catalytic site, and they function with a cofactor – either a multivalent metal or a nucleotide.
The amino acids that go into protein synthesis include “indispensable” nutrients that are not made for use, but must be derived from animal protein, although the need is partially satisfied by plant sources. The essential amino acids are classified into well-established groups. There are 20 amino acids commonly found in proteins. They are classified into the following groups based on the chemical and/or structural properties of their side chains:
Aliphatic Amino Acids
Cyclic Amino Acid
AAs with Hydroxyl or Sulfur-containing side chains
Inosine triphosphate pyrophosphatase – Pyrophosphatase that hydrolyzes the non-canonical purine nucleotides inosine triphosphate (ITP), deoxyinosine triphosphate (dITP) as well as 2′-deoxy-N-6-hydroxylaminopurine triphosphate (dHAPTP) and xanthosine 5′-triphosphate (XTP) to their respective monophosphate derivatives. The enzyme does not distinguish between the deoxy- and ribose forms. Probably excludes non-canonical purines from RNA and DNA precursor pools, thus preventing their incorporation into RNA and DNA and avoiding chromosomal lesions.
Genetic variation of inosine triphosphatase (ITPA) causing an accumulation of inosine triphosphate (ITP) has been shown to protect patients against ribavirin (RBV)-induced anemia during treatment for chronic hepatitis C infection by genome-wide association study (GWAS). However, the biologic mechanism by which this occurs is unknown.
Although ITP is not used directly by human erythrocyte ATPase, it can be used for ATP biosynthesis via ADSS in place of guanosine triphosphate (GTP). With RBV challenge, erythrocyte ATP reduction was more severe in the wild-type ITPA genotype than in the hemolysis protective ITPA genotype. This difference also remains after inhibiting adenosine uptake using nitrobenzylmercaptopurine riboside (NBMPR).
ITP confers protection against RBV-induced ATP reduction by substituting for erythrocyte GTP, which is depleted by RBV, in the biosynthesis of ATP. Because patients with excess ITP appear largely protected against anemia, these results confirm that RBV-induced anemia is due primarily to the effect of the drug on GTP and consequently ATP levels in erythrocytes.
Determination of inosine triphosphate pyrophosphatase phenotype in human red blood cells using HPLC.
Citterio-Quentin A, Salvi JP, Boulieu R.
Thiopurine drugs, widely used in cancer chemotherapy, inflammatory bowel disease, and autoimmune hepatitis, are responsible for common adverse events. Only some of these may be explained by genetic polymorphism of thiopurine S-methyltransferase. Recent articles have reported that inosine triphosphate pyrophosphatase (ITPase) deficiency was associated with adverse drug reactions toward thiopurine drug therapy. Here, we report a weak anion exchange high-performance liquid chromatography method to determine ITPase activity in red blood cells and to investigate the relationship with the occurrence of adverse events during azathioprine therapy.
The chromatographic method reported allows the analysis of IMP, inosine diphosphate, and ITP in a single run in <12.5 minutes. The method was linear in the range 5-1500 μmole/L of IMP. Intraassay and interassay precisions were <5% for red blood cell lysates supplemented with 50, 500, and 1000 μmole/L IMP. Km and Vmax evaluated by Lineweaver-Burk plot were 677.4 μmole/L and 19.6 μmole·L⁻¹·min⁻¹, respectively. The frequency distribution of ITPase from 73 patients was investigated.
The method described is useful to determine the ITPase phenotype from patients on thiopurine therapy and to investigate the potential relation between ITPase deficiency and the occurrence of adverse events.
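The Lineweaver-Burk evaluation mentioned in the abstract above rests on the double-reciprocal form of the Michaelis-Menten equation, 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, so a straight-line fit of 1/v against 1/[S] gives Vmax from the intercept and Km from the slope. The sketch below is an illustration only: it generates noise-free synthetic rates from the reported Km and Vmax (in the units reported) and recovers them by ordinary least squares. The substrate concentrations are hypothetical and this is not the authors' data or code.

```python
# Sketch: recover Km and Vmax from a Lineweaver-Burk (double-reciprocal) fit.
# Assumption: synthetic, noise-free rates; real data would scatter around the line.


def michaelis_menten(s, vmax, km):
    """Reaction rate at substrate concentration s."""
    return vmax * s / (km + s)


def lineweaver_burk_fit(substrate, rates):
    """Least-squares line through (1/s, 1/v); returns (km, vmax)."""
    xs = [1 / s for s in substrate]
    ys = [1 / v for v in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    vmax = 1 / intercept      # intercept = 1 / Vmax
    km = slope * vmax         # slope = Km / Vmax
    return km, vmax


# Hypothetical substrate concentrations; Km and Vmax taken from the abstract.
substrate = [50, 100, 250, 500, 1000, 1500]
rates = [michaelis_menten(s, vmax=19.6, km=677.4) for s in substrate]
km, vmax = lineweaver_burk_fit(substrate, rates)
print(round(km, 1), round(vmax, 1))
```

With real measurements the reciprocal transform amplifies error at low substrate concentrations, so a direct nonlinear fit is often preferred when precision matters.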
System wide analyses have underestimated protein abundances and the importance of transcription in mammals
Jingyi Jessica Li, Peter J Bickel and Mark D Biggin
Using individual measurements for 61 housekeeping proteins to rescale whole proteome data from Schwanhausser et al. (2011), we find that the median protein detected is expressed at 170,000 molecules per cell and that our corrected protein abundance estimates show a higher correlation with mRNA abundances than do the uncorrected protein data. In addition, we estimated the impact of further errors in mRNA and protein abundances using direct experimental measurements of these errors. The resulting analysis suggests that mRNA levels explain at least 56% of the differences in protein abundance for the 4,212 genes detected by Schwanhausser et al. (2011), though because one major source of error could not be estimated the true percent contribution should be higher. We also employed a second, independent strategy to determine the contribution of mRNA levels to protein expression. We show that the variance in translation rates directly measured by ribosome profiling is only 12% of that inferred by Schwanhausser et al. (2011), and that the measured and inferred translation rates correlate poorly (R² = 0.13). Based on this, our second strategy suggests that mRNA levels explain 81% of the variance in protein levels. We also determined the percent contributions of transcription, RNA degradation, translation and protein degradation to the variance in protein abundances using both of our strategies. While the magnitudes of the two estimates vary, they both suggest that transcription plays a more important role than the earlier studies implied and translation a much smaller role. Finally, the above estimates only apply to those genes whose mRNA and protein expression was detected. Based on a detailed analysis by Hebenstreit et al. (2012), we estimate that approximately 40% of genes in a given cell within a population express no mRNA. Since there can be no translation in the absence of mRNA, we argue that differences in translation rates can play no role in determining the expression levels for the 40% of genes that are non-expressed.
Related studies that reveal issues that are not part of this chapter:
Ubiquitylation in relationship to tissue remodeling
Post-translational modification of proteins
Glycosylation
Phosphorylation
Methylation
Nitrosylation
Sulfation – sulfotransferases
cell-matrix communication
Acetylation and histone deacetylation (HDAC)
Connecting Protein Phosphatase to 1α (PP1α)
Acetylation complexes (such as CBP/p300 and PCAF)
Sirtuins
Rel/NF-kB Signal Transduction
Homologous Recombination Pathway of Double-Strand DNA Repair
Glycination
cyclin dependent kinases (CDKs)
lyase
transferase
This year, the Lasker award for basic medical research went to Kazutoshi Mori (Kyoto University) and Peter Walter (University of California, San Francisco) for their “discoveries concerning the unfolded protein response (UPR) — an intracellular quality control system that detects harmful misfolded proteins in the endoplasmic reticulum and signals the nucleus to carry out corrective measures.”
About UPR: Approximately a third of cellular proteins pass through the Endoplasmic Reticulum (ER) which performs stringent quality control of these proteins. All proteins need to assume the proper 3-dimensional shape in order to function properly in the harsh cellular environment. Related to this is the fact that cells are under constant stress and have to make rapid, real time decisions about survival or death.
A major indicator of stress is the accumulation of unfolded proteins within the Endoplasmic Reticulum (ER), which triggers a transcriptional cascade in order to increase the folding capacity of the ER. If the metabolic burden is too great and homeostasis cannot be achieved, the response shifts from damage control to the induction of pro-apoptotic pathways that would ultimately cause cell death.
This response to unfolded proteins or the UPR is conserved among all eukaryotes, and dysfunction in this pathway underlies many human diseases, including Alzheimer’s, Parkinson’s, Diabetes and Cancer.
The discovery of a new class of human proteins with previously unidentified activities
In a landmark study conducted by scientists at the Scripps Research Institute, The Hong Kong University of Science and Technology, aTyr Pharma and their collaborators, a new class of human proteins has been discovered. These proteins (nearly 250), called Physiocrines, belong to the aminoacyl tRNA synthetase gene family and carry out novel, diverse and distinct biological functions.
The aminoacyl tRNA synthetase gene family codes for a group of 20 ubiquitous enzymes, almost all of which are part of the protein synthesis machinery. The team made this discovery using recombinant protein purification, deep sequencing, mass spectrometry and cell-based assays. The finding is also significant because it highlights the alternate use of a gene family, whose protein products normally perform catalytic activities, for non-catalytic regulation of basic and complex physiological processes spanning metabolism, vascularization, stem cell biology and immunology.
Muscle maintenance and regeneration – key player identified
Muscle tissue suffers from atrophy with age and its regenerative capacity also declines over time. Most molecules discovered thus far to boost tissue regeneration are also implicated in cancers. During a quest to find safer alternatives that can regenerate tissue, scientists reported that the hormone Oxytocin is required for proper muscle tissue regeneration and homeostasis and that its levels decline with age.
Oxytocin could be an alternative to hormone replacement therapy as a way to combat aging and other organ related degeneration.
Oxytocin is an age-specific circulating hormone that is necessary for muscle maintenance and regeneration (June 2014)
Role of forkhead box protein A3 in age-associated metabolic decline.
Ma X, Xu L, Gavrilova O, Mueller E.
Aging is associated with increased adiposity and diminished thermogenesis, but the critical transcription factors influencing these metabolic changes late in life are poorly understood. We recently demonstrated that the winged helix factor forkhead box protein A3 (Foxa3) regulates the expansion of visceral adipose tissue in high-fat diet regimens; however, whether Foxa3 also contributes to the increase in adiposity and the decrease in brown fat activity observed during the normal aging process is currently unknown. Here we report that during aging, levels of Foxa3 are significantly and selectively up-regulated in brown and inguinal white fat depots, and that midage Foxa3-null mice have increased white fat browning and thermogenic capacity, decreased adipose tissue expansion, improved insulin sensitivity, and increased longevity. Foxa3 gain-of-function and loss-of-function studies in inguinal adipose depots demonstrated a cell-autonomous function for Foxa3 in white fat tissue browning. Furthermore, our analysis revealed that the mechanisms of Foxa3 modulation of brown fat gene programs involve the suppression of peroxisome proliferator activated receptor γ coactivator 1 α (PGC1α) levels through interference with cAMP responsive element binding protein 1-mediated transcriptional regulation of the PGC1α promoter.
Asymmetric mRNA localization contributes to fidelity and sensitivity of spatially localized systems
Although many proteins are localized after translation, asymmetric protein distribution is also achieved by translation after mRNA localization. Why are certain mRNA transported to a distal location and translated on-site? Here we undertake a systematic, genome-scale study of asymmetrically distributed protein and mRNA in mammalian cells. Our findings suggest that asymmetric protein distribution by mRNA localization enhances interaction fidelity and signaling sensitivity. Proteins synthesized at distal locations frequently contain intrinsically disordered segments. These regions are generally rich in assembly-promoting modules and are often regulated by post-translational modifications. Such proteins are tightly regulated but display distinct temporal dynamics upon stimulation with growth factors. Thus, proteins synthesized on-site may rapidly alter proteome composition and act as dynamically regulated scaffolds to promote the formation of reversible cellular assemblies. Our observations are consistent across multiple mammalian species, cell types and developmental stages, suggesting that localized translation is a recurring feature of cell signaling and regulation.
An overview of the potential advantages conferred by distal-site protein synthesis, inferred from our analysis.
Turquoise and red filled circle represents off-target and correct interaction partners, respectively. Wavy lines represent a disordered region within a distal site synthesis protein. Grey and red line in graphs represents profiles of t…
Tweaking transcriptional programming for high quality recombinant protein production
Since overexpression of recombinant proteins in E. coli often leads to the formation of inclusion bodies, producing properly folded, soluble proteins is undoubtedly the most important end goal in a protein expression campaign. Various approaches have been devised to bypass the insolubility issues during E. coli expression and in a recent report a group of researchers discuss reprogramming the E. coli proteostasis [protein homeostasis] network to achieve high yields of soluble, functional protein. The premise of their studies is that the basal E. coli proteostasis network is insufficient, and often unable, to fold overexpressed proteins, thus clogging the folding machinery.
By overexpressing a mutant, negative-feedback deficient heat shock transcription factor [σ32 I54N] before and during overexpression of the protein of interest, reprogramming can be achieved, resulting in high yields of soluble and functional recombinant target protein. The authors explain that this method is better than simply co-expressing or overexpressing chaperones, co-chaperones, foldases or other components of the proteostasis network, because reprogramming readies the folding machinery and upregulates the essential folding components beforehand, thereby maintaining the capacity of the folding machinery.
The Heat-Shock Response Transcriptional Program Enables High-Yield and High-Quality Recombinant Protein Production in Escherichia coli (July 2014)
Unfolded proteins collapse when exposed to heat and crowded environments
Proteins are important molecules in our body and they fulfil a broad range of functions. For instance as enzymes they help to release energy from food and as muscle proteins they assist with motion. As antibodies they are involved in immune defence and as hormone receptors in signal transduction in cells. Until only recently it was assumed that all proteins take on a clearly defined three-dimensional structure – i.e. they fold in order to be able to assume these functions. Surprisingly, it has been shown that many important proteins occur as unfolded coils. Researchers seek to establish how these disordered proteins are capable at all of assuming highly complex functions.
Ben Schuler’s research group from the Institute of Biochemistry of the University of Zurich has now established that an increase in temperature leads to unfolded proteins collapsing and becoming smaller. Other environmental factors can trigger the same effect.
Measurements using the “molecular ruler”
“The fact that unfolded proteins shrink at higher temperatures is an indication that cell water does indeed play an important role as to the spatial organisation eventually adopted by the molecules”, comments Schuler with regard to the impact of temperature on protein structure. For their studies the biophysicists use what is known as single-molecule spectroscopy. Small colour probes in the protein enable the observation of changes with an accuracy better than one millionth of a millimetre. With this “molecular yardstick” it is possible to measure how molecular forces impact protein structure.
With computer simulations the researchers have mimicked the behaviour of disordered proteins.
(Courtesy of Jose EDS Roselino, PhD.)
MLKL compromises plasma membrane integrity
Necroptosis is implicated in many diseases and understanding this process is essential in the search for new therapies. While mixed lineage kinase domain-like (MLKL) protein has been known to be a critical component of necroptosis induction, how MLKL transduces the death signal was not clear. In a recent finding, scientists demonstrated that the full four-helical bundle domain (4HBD) in the N-terminal region of MLKL is required and sufficient to induce its oligomerization and trigger cell death.
They also found a patch of positively charged amino acids on the surface of the 4HBD that bound to phosphatidylinositol phosphates (PIPs) and allowed the recruitment of MLKL to the plasma membrane. There, MLKL proteins formed pores through which cells absorbed excess water until they burst. Detailed knowledge of how MLKL proteins create pores offers possibilities for the development of new therapeutic interventions for tolerating or preventing cell death.
MLKL compromises plasma membrane integrity by binding to phosphatidylinositol phosphates (May 2014)
Mitochondrial and ER proteins implicated in dementia
Mitochondria and the endoplasmic reticulum (ER) form tight structural associations that facilitate a number of cellular functions. However, the molecular mechanisms of these interactions aren’t properly understood.
A group of researchers showed that the ER protein VAPB interacted with mitochondrial protein PTPIP51 to regulate ER-mitochondria associations and that TDP-43, a protein implicated in dementia, disturbs this interaction to regulate cellular Ca2+ homeostasis. These studies point to a new pathogenic mechanism for TDP-43 and may also provide a potential new target for the development of new treatments for devastating neurological conditions like dementia.
ER-mitochondria associations are regulated by the VAPB-PTPIP51 interaction and are disrupted by ALS/FTD-associated TDP-43. Nature (June 2014)
A novel strategy to improve membrane protein expression in Yeast
Membrane proteins play indispensable roles in the physiology of an organism. However, recombinant production of membrane proteins is one of the biggest hurdles facing protein biochemists today. A group of scientists in Belgium showed that increasing intracellular membrane production, by interfering with a key enzymatic step of lipid synthesis, enhances the expression of recombinant membrane proteins in yeast.
Specifically, they engineered the oleotrophic yeast Yarrowia lipolytica by deleting the phosphatidic acid phosphatase gene, PAH1, which led to massive proliferation of endoplasmic reticulum (ER) membranes. For all 8 tested representatives of different integral membrane protein families, they obtained enhanced protein accumulation.
An unconventional method to boost recombinant protein levels
MazF is an mRNA interferase enzyme in E. coli that functions as a toxin and degrades cellular mRNA in a targeted fashion, at the “ACA” sequence. This degradation of cellular mRNA causes a precipitous drop in cellular protein synthesis. A group of scientists at the Robert Wood Johnson Medical School in New Jersey exploited the degeneracy of the genetic code to modify all “ACA” triplets within their gene of interest in a way that left the encoded amino acid sequence unchanged (for example, an in-frame ACA codon for threonine can be replaced by another threonine codon). Consequently, induction of the MazF toxin caused degradation of E. coli cellular mRNA, but the ACA-free recombinant transcript was spared and its protein synthesis continued, causing significant accumulation of high quality target protein. This expression system enables unparalleled signal-to-noise ratios that could dramatically simplify structural and functional studies of difficult-to-purify, biologically important proteins.
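The recoding step can be sketched in a few lines of Python. This is a minimal illustration and not the authors' procedure: the function names `translate` and `remove_aca` are hypothetical, the standard genetic code is assumed, and the sketch keeps the recoding with the fewest changed codons, a criterion the source does not mention. Because an ACA triplet can also arise out of frame, across the junction of two neighbouring codons, the sketch checks every adjacent codon pair and not only in-frame threonine codons.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group synonymous codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def remove_aca(cds):
    """Return a synonymous recoding of `cds` with no ACA triplet in any frame.

    Dynamic programming over codons: for each possible last codon we keep the
    cheapest ACA-free prefix, where cost is the number of changed codons.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]

    best = {"": (0, "")}  # last codon -> (changes so far, recoded prefix)
    for original in codons:
        nxt = {}
        for cand in SYNONYMS[CODON_TABLE[original]]:
            for prev, (cost, prefix) in best.items():
                # Any ACA must lie within two adjacent codons, so this check
                # covers in-frame and junction-spanning occurrences.
                if "ACA" in prev + cand:
                    continue
                total = cost + (cand != original)
                if cand not in nxt or total < nxt[cand][0]:
                    nxt[cand] = (total, prefix + cand)
        if not nxt:
            raise ValueError("no ACA-free synonymous recoding found")
        best = nxt
    return min(best.values())[1]


if __name__ == "__main__":
    gene = "ATGACAAACATGTAA"  # made-up toy sequence: Met-Thr-Asn-Met-stop
    recoded = remove_aca(gene)
    print(recoded, translate(recoded) == translate(gene))
```

In the toy sequence, one ACA is an in-frame threonine codon and a second spans the Asn-Met junction (AAC|ATG), so both codons have to change while the encoded protein stays the same.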
Tandem fusions and bacterial strain evolution for enhanced functional membrane protein production
Producing membrane proteins remains a significant obstacle to their characterization and structure determination. Although a variety of host cell types are available, E. coli remains the popular choice for producing recombinant membrane proteins. A group of scientists in the Netherlands devised a robust strategy to increase the probability of functional membrane protein overexpression in E. coli.
By fusing Green Fluorescent Protein (GFP) and the Erythromycin Resistance protein (ErmC) to the C-terminus of a target membrane protein, they were able to track the folding state of their target protein while using erythromycin to select for increased expression. By increasing the erythromycin concentration in the growth media and testing different membrane targets, they were able to identify four evolved E. coli strains, all of which carried a mutation in the hns gene, whose product is implicated in genome organization and transcriptional silencing. Through their experiments the group showed that partial removal of the transcriptional silencing mechanism was related to production of proteins that were essential for functional overexpression of membrane proteins.
The role of an anti-apoptotic factor in recombinant protein production
In a recent study, scientists at the Johns Hopkins University and Frederick National Laboratory for Cancer Research examined an alternative method of utilizing the benefits of anti-apoptotic gene expression to enhance the transient expression of biotherapeutics, specifically, through the co-transfection of Bcl-xL along with the product-coding target gene.
Chinese Hamster Ovary (CHO) cells were co-transfected with the product-coding gene and a vector containing Bcl-xL, using Polyethylenimine (PEI) reagent. They found that the cells co-transfected with Bcl-xL demonstrated reduced apoptosis, increased specific productivity, and an overall increase in product yield.
B-cell lymphoma-extra-large (Bcl-xL) is a mitochondrial transmembrane protein and a member of the Bcl-2 family of proteins, which are known to act as either pro- or anti-apoptotic proteins. Bcl-xL itself acts as an anti-apoptotic molecule by preventing the release of mitochondrial contents such as cytochrome c, which would lead to caspase activation. Higher levels of Bcl-xL push a cell toward survival mode by making the membrane pores less permeable and leaky.
Introduction to Protein Synthesis and Degradation (Updated 8/31/2019)
N-Terminal Degradation of Proteins: The N-End Rule and N-degrons
In both prokaryotes and the mitochondria and chloroplasts of eukaryotes, the ribosomal synthesis of proteins is initiated with an N-formyl methionine residue. In contrast, proteins made by eukaryotic cytosolic ribosomes were assumed to lack the N-formyl group. The unformylated N-terminal methionine residue of eukaryotic proteins is often N-acetylated (Ac), which creates specific degradation signals recognized by the Ac/N-end rule pathway. The N-end rule pathways are proteolytic systems that recognize such N-degrons, resulting in proteasomal degradation or autophagy. In yeast, cytosolic N-formylation is stimulated by certain amino acid deficiencies, and the degradation of the N-formylated proteins depends on the Psh1 E3 ligase.
Two papers in the journal Science describe this N-degron in more detail.
In both bacteria and eukaryotic mitochondria and chloroplasts, the ribosomal synthesis of proteins is initiated with the N-terminal (Nt) formyl-methionine (fMet) residue. Nt-fMet is produced pretranslationally by formyltransferases, which use 10-formyltetrahydrofolate as a cosubstrate. By contrast, proteins synthesized by cytosolic ribosomes of eukaryotes were always presumed to bear unformylated N-terminal Met (Nt-Met). The unformylated Nt-Met residue of eukaryotic proteins is often cotranslationally Nt-acetylated, a modification that creates specific degradation signals, Ac/N-degrons, which are targeted by the Ac/N-end rule pathway. The N-end rule pathways are a set of proteolytic systems whose unifying feature is their ability to recognize proteins containing N-degrons, thereby causing the degradation of these proteins by the proteasome or autophagy in eukaryotes and by the proteasome-like ClpAP protease in bacteria. The main determinant of an N‑degron is a destabilizing Nt-residue of a protein. Studies over the past three decades have shown that all 20 amino acids of the genetic code can act, in cognate sequence contexts, as destabilizing Nt‑residues. The previously known eukaryotic N-end rule pathways are the Arg/N-end rule pathway, the Ac/N-end rule pathway, and the Pro/N-end rule pathway. Regulated degradation of proteins and their natural fragments by the N-end rule pathways has been shown to mediate a broad range of biological processes.
RATIONALE
The chemical similarity of the formyl and acetyl groups and their identical locations in, respectively, Nt‑formylated and Nt-acetylated proteins led us to suggest, and later to show, that the Nt-fMet residues of nascent bacterial proteins can act as bacterial N-degrons, termed fMet/N-degrons. Here we wished to determine whether Nt-formylated proteins might also form in the cytosol of a eukaryote such as the yeast Saccharomyces cerevisiae and to determine the metabolic fates of Nt-formylated proteins if they could be produced outside mitochondria. Our approaches included molecular genetic techniques, mass spectrometric analyses of proteins’ N termini, and affinity-purified antibodies that selectively recognized Nt-formylated reporter proteins.
RESULTS
We discovered that the yeast formyltransferase Fmt1, which is imported from the cytosol into the mitochondria inner matrix, can generate Nt-formylated proteins in the cytosol, because the translocation of Fmt1 into mitochondria is not as efficacious, even under unstressful conditions, as had previously been assumed. We also found that Nt‑formylated proteins are greatly up-regulated in stationary phase or upon starvation for specific amino acids. The massive increase of Nt-formylated proteins strictly requires the Gcn2 kinase, which phosphorylates Fmt1 and mediates its retention in the cytosol. Notably, the ability of Gcn2 to retain a large fraction of Fmt1 in the cytosol of nutritionally stressed cells is confined to Fmt1, inasmuch as the Gcn2 kinase does not have such an effect, under the same conditions, on other examined nuclear DNA–encoded mitochondrial matrix proteins. The Gcn2-Fmt1 protein localization circuit is a previously unknown signal transduction pathway. A down-regulation of cytosolic Nt‑formylation was found to increase the sensitivity of cells to undernutrition stresses, to a prolonged cold stress, and to a toxic compound. We also discovered that the Nt-fMet residues of Nt‑formylated cytosolic proteins act as eukaryotic fMet/N-degrons and identified the Psh1 E3 ubiquitin ligase as the recognition component (fMet/N-recognin) of the previously unknown eukaryotic fMet/N-end rule pathway, which destroys Nt‑formylated proteins.
CONCLUSION
The Nt-formylation of proteins, a long-known pretranslational protein modification, is mediated by formyltransferases. Nt-formylation was thought to be confined to bacteria and bacteria-descended eukaryotic organelles but was found here to also occur at the start of translation by the cytosolic ribosomes of a eukaryote. The levels of Nt‑formylated eukaryotic proteins are greatly increased upon specific stresses, including undernutrition, and appear to be important for adaptation to these stresses. We also discovered that Nt-formylated cytosolic proteins are selectively destroyed by the eukaryotic fMet/N-end rule pathway, mediated by the Psh1 E3 ubiquitin ligase. This previously unknown proteolytic system is likely to be universal among eukaryotes, given strongly conserved mechanisms that mediate Nt‑formylation and degron recognition.
(Top) Under undernutrition conditions, the Gcn2 kinase augments the cytosolic localization of the Fmt1 formyltransferase, and possibly also its enzymatic activity. Consequently, Fmt1 up-regulates the cytosolic fMet–tRNAi (initiator transfer RNA), and thereby increases the levels of cytosolic Nt-formylated proteins, which are required for the adaptation of cells to specific stressors. (Bottom) The Psh1 E3 ubiquitin ligase targets the N-terminal fMet-residues of eukaryotic cytosolic proteins, such as Cse4, Pgd1, and Rps22a, for the polyubiquitylation-mediated, proteasome-dependent degradation.
The second paper describes a glycine-specific N-degron pathway in humans. Specifically, the authors set up a screen to identify N-terminal degron motifs in human cells. Findings included an expanded repertoire for the UBR E3 ligases, which now includes substrates with arginine or lysine following an intact initiator methionine, and the observation that a glycine at the extreme N-terminus is a potent degron.
Glycine N-degron regulation revealed
For more than 30 years, N-terminal sequences have been known to influence protein stability, but additional features of these N-end rule, or N-degron, pathways continue to be uncovered. Timms et al. used a global protein stability (GPS) technology to take a broader look at these pathways in human cells. Unexpectedly, glycine exposed at the N terminus could act as a potent degron; proteins bearing N-terminal glycine were targeted for proteasomal degradation by two Cullin-RING E3 ubiquitin ligases through the substrate adaptors ZYG11B and ZER1. This pathway may be important, for example, to degrade proteins that fail to localize properly to cellular membranes and to destroy protein fragments generated during cell death.
The ubiquitin-proteasome system is the major route through which the cell achieves selective protein degradation. The E3 ubiquitin ligases are the major determinants of specificity in this system, which is thought to be achieved through their selective recognition of specific degron motifs in substrate proteins. However, our ability to identify these degrons and match them to their cognate E3 ligase remains a major challenge.
RATIONALE
It has long been known that the stability of proteins is influenced by their N-terminal residue, and a large body of work over the past three decades has characterized a collection of N-end rule pathways that target proteins for degradation through N-terminal degron motifs. Recently, we developed Global Protein Stability (GPS)–peptidome technology and used it to delineate a suite of degrons that lie at the extreme C terminus of proteins. We adapted this approach to examine the stability of the human N terminome, allowing us to reevaluate our understanding of N-degron pathways in an unbiased manner.
RESULTS
Stability profiling of the human N terminome identified two major findings: an expanded repertoire for UBR family E3 ligases to include substrates that begin with arginine and lysine following an intact initiator methionine and, more notably, that glycine positioned at the extreme N terminus can act as a potent degron. We established human embryonic kidney 293T reporter cell lines in which unstable peptides that bear N-terminal glycine degrons were fused to green fluorescent protein, and we performed CRISPR screens to identify the degradative machinery involved. These screens identified two Cul2 Cullin-RING E3 ligase complexes, defined by the related substrate adaptors ZYG11B and ZER1, that act redundantly to target substrates bearing N-terminal glycine degrons for proteasomal degradation. Moreover, through the saturation mutagenesis of example substrates, we defined the composition of preferred N-terminal glycine degrons specifically recognized by ZYG11B and ZER1.
We found that preferred glycine degrons are depleted from the native N termini of metazoan proteomes, suggesting that proteins have evolved to avoid degradation through this pathway, but are strongly enriched at annotated caspase cleavage sites. Stability profiling of N-terminal peptides lying downstream of all known caspase cleavage sites confirmed that Cul2ZYG11B and Cul2ZER1 could make a substantial contribution to the removal of proteolytic cleavage products during apoptosis. Last, we identified a role for ZYG11B and ZER1 in the quality control of N-myristoylated proteins. N-myristoylation is an important posttranslational modification that occurs exclusively on N-terminal glycine. By profiling the stability of the human N-terminome in the absence of the N-myristoyltransferases NMT1 and NMT2, we found that a failure to undergo N-myristoylation exposes N-terminal glycine degrons that are otherwise obscured. Thus, conditional exposure of glycine degrons to ZYG11B and ZER1 permits the selective proteasomal degradation of aberrant proteins that have escaped N-terminal myristoylation.
CONCLUSION
These data demonstrate that an additional N-degron pathway centered on N-terminal glycine regulates the stability of metazoan proteomes. Cul2ZYG11B– and Cul2ZER1-mediated protein degradation through N-terminal glycine degrons may be particularly important in the clearance of proteolytic fragments generated by caspase cleavage during apoptosis and in the quality control of protein N-myristoylation.
Stability profiling of the human N-terminome revealed that N-terminal glycine acts as a potent degron. CRISPR screening revealed two Cul2 complexes, defined by the related substrate adaptors ZYG11B and ZER1, that recognize N-terminal glycine degrons. This pathway may be particularly important for the degradation of caspase cleavage products during apoptosis and the removal of proteins that fail to undergo N-myristoylation.
The following is the second article in the second series, which focuses on the impact of genomics and transcriptomics on the evolution of 21st century medicine. Medicine will have to become more efficient and more effective by the end of this decade if, as predicted, the funding of Medicare runs out. Even so, Social Security was devised by none other than Otto von Bismarck, who unified Germany; the United Kingdom has had a charity hospital care system, begun to protect widows from the ravages of war; and nursing was developed by Florence Nightingale as a result of the experience of war. It can only be concluded that care for the elderly, the infirm, and those who have few resources to live on has a long history in western civilization, and it will not cease to exist as a public social obligation anytime soon. The 20th century saw an explosive development of physics; of organic and inorganic chemistry, biochemistry, and medicinal chemistry; and the elucidation of the genetic code and its mechanism of translation in plants, microorganisms, and eukaryotes. All of this occurred irrespective of the most horrendous wars that have reshaped the world map.
The following are the second portions of a puzzle in construction that is intended to move into deeper complexities introduced by proteomics, cell metabolism, metabolomics, and signaling. This is the only manner by which I can begin to appreciate what a wonder it is to view and live in this world with all its imperfections.
We have already visited the transcription process, by which a DNA sequence is read into RNA. This is essential for protein synthesis, because the RNA sequence determines the ordering of the amino acids in the primary structure. However, there are microRNAs and noncoding RNAs, and there are transcription factors. The transcription factors bind to chromatin, and the RNAs also have some role in regulating the transcription process. We shall examine this further.
Quantifying transcription factor kinetics: At work or at play?
Posted online on September 11, 2013. (doi:10.3109/10409238.2013.833891)
Florian Mueller1,2, Timothy J. Stasevich3, Davide Mazza4, and James G. McNally5
1Institut Pasteur, Computational Imaging and Modeling Unit, CNRS, Paris, Fr
2Functional Imaging of Transcription, Institut de Biologie de l’Ecole Normale Supérieure, Paris, Fr
3Graduate School of Frontier Biosciences, Osaka University, Osaka, Jp
4Istituto Scientifico Ospedale San Raffaele, Centro di Imaging Sperimentale e Università Vita-Salute San Raffaele, Milano, It, and
5Fluorescence Imaging Group, National Cancer Institute, NIH, Bethesda, MD, USA
Transcription factors (TFs) interact dynamically in vivo with chromatin binding sites. Here we summarize and compare the four different techniques that are currently used to measure these kinetics in live cells, namely fluorescence recovery after photobleaching (FRAP), fluorescence correlation spectroscopy (FCS), single molecule tracking (SMT) and competition ChIP (CC). We highlight the principles underlying each of these approaches as well as their advantages and disadvantages. A comparison of data from each of these techniques raises an important question: do measured transcription kinetics reflect biologically functional interactions at specific sites (i.e. working TFs) or do they reflect non-specific interactions (i.e. playing TFs)? To help resolve this dilemma we discuss five key unresolved biological questions related to the functionality of transient and prolonged binding events at both specific promoter response elements as well as non-specific sites. In support of functionality, we review data suggesting that TF residence times are tightly regulated, and that this regulation modulates transcriptional output at single genes. We argue that in addition to this site-specific regulatory role, TF residence times also determine the fraction of promoter targets occupied within a cell thereby impacting the functional status of cellular gene networks. Thus, TF residence times are key parameters that could influence transcription in multiple ways.
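The link the authors draw between residence time and the fraction of occupied targets can be illustrated with the simplest possible binding scheme. The sketch below assumes a two-state site (free or bound) at steady state. The function name `occupancy`, the pseudo-first-order on-rate and the numbers are illustrative assumptions and are not taken from the review.

```python
def occupancy(residence_time_s, on_rate_per_s):
    """Steady-state fraction of time a binding site is occupied.

    Two-state model (an assumption for illustration):
      free -> bound with pseudo-first-order rate on_rate_per_s (k_on * [TF])
      bound -> free with rate k_off = 1 / residence_time_s
    """
    k_off = 1.0 / residence_time_s
    return on_rate_per_s / (on_rate_per_s + k_off)


if __name__ == "__main__":
    # Hypothetical values: the same on-rate, increasingly long residence times.
    for tau in (1.0, 10.0, 100.0):
        print(tau, occupancy(tau, on_rate_per_s=0.1))
```

With the on-rate held fixed, a longer residence time raises the occupied fraction, which is the qualitative point made in the abstract. Real promoters involve more states than this sketch.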
The Transcription Factor Titration Effect Dictates Level of Gene Expression
Robert C. Brewster, Franz M. Weinert, Hernan G. Garcia, Dan Song, Mattias Rydenfelt, and Rob Phillips, California Institute of Technology
Cell Mar 13, 2014; 156:1312–1323.
Models of transcription are often built around a picture of RNA polymerase and transcription factors (TFs) acting on a single copy of a promoter. However, most TFs are shared between multiple genes with varying binding affinities. Beyond that, genes often exist at high copy number—in multiple identical copies on the chromosome or on plasmids or viral vectors with copy numbers in the hundreds. Using a thermodynamic model, we characterize the interplay between TF copy number and the demand for that TF. We demonstrate the parameter-free predictive power of this model as a function of the copy number of the TF and the number and affinities of the available specific binding sites; such predictive control is important for the understanding of transcription and the desire to quantitatively design the output of genetic circuits. Finally, we use these experiments to dynamically measure plasmid copy number through the cell cycle.
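To make the titration idea concrete, here is a small statistical-mechanics sketch. It is not the authors' published model. It assumes that the TF is a repressor, that all specific sites are identical, that the promoter is weak (so fold-change equals the probability that a promoter is free of repressor) and that nonspecific sites vastly outnumber repressors. The function name `fold_change`, the binding energy and the default number of nonspecific sites (roughly the E. coli genome length in base pairs) are illustrative assumptions.

```python
import math


def fold_change(repressors, gene_copies, delta_eps_kT, n_nonspecific=4.6e6):
    """Expected fraction of promoter copies not bound by repressor.

    State k = number of repressors bound to specific sites. Its statistical
    weight, relative to the all-nonspecific state, is
        C(N, k) * R! / (R - k)! * (exp(-delta_eps) / N_NS) ** k
    with N gene copies, R repressors and delta_eps in units of kT.
    """
    k_max = min(repressors, gene_copies)
    log_w = []
    for k in range(k_max + 1):
        log_binom = (math.lgamma(gene_copies + 1) - math.lgamma(k + 1)
                     - math.lgamma(gene_copies - k + 1))
        log_perm = math.lgamma(repressors + 1) - math.lgamma(repressors - k + 1)
        log_w.append(log_binom + log_perm
                     - k * (math.log(n_nonspecific) + delta_eps_kT))
    shift = max(log_w)  # subtract the maximum to avoid overflow
    weights = [math.exp(x - shift) for x in log_w]
    z = sum(weights)
    free = sum((1 - k / gene_copies) * w for k, w in enumerate(weights))
    return free / z


if __name__ == "__main__":
    # Hypothetical numbers: 10 repressors, binding energy of -14 kT.
    for copies in (1, 10, 100):
        print(copies, fold_change(10, copies, -14.0))
```

For a single gene copy the expression reduces to the familiar 1 / (1 + (R / N_NS) exp(-Δε)). Once the gene copies outnumber the repressors, the repressors are titrated away and the fold-change stays close to 1.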
Optimal reference genes for normalization of qRT-PCR data from archival formalin-fixed, paraffin-embedded breast tumors controlling for tumor cell content and decay of mRNA.
Gene-expression analysis is increasingly performed on degraded mRNA from formalin-fixed, paraffin-embedded tissue (FFPE), giving the option of examining retrospective cohorts. The aim of this study was to select robust reference genes showing stable expression over time in FFPE, controlling for varying tumor tissue content and for decay of mRNA caused by variable length of tissue storage.
Sixteen reference genes were quantified by qRT-PCR in 40 FFPE breast tumor samples, stored for 1 to 29 years. Samples included 2 benign lesions and 38 carcinomas with varying tumor content. Stability of the reference genes was determined by the geNorm algorithm. mRNA was successfully extracted from all samples, and the 16 genes were quantified in the majority of samples.
Results showed 14% loss of amplifiable mRNA per year, corresponding to a half-life of 4.6 years. The 4 most stably expressed genes were CALM2, RPL37A, ACTB, and RPLP0. Several of the other examined genes showed considerable instability over time (GAPDH, PSMC4, OAZ1, IPO8).
In conclusion, we identified 4 genes robustly expressed over time and independent of neoplastic tissue content in the FFPE block. PMID:23846446
Structures of Cas9 Endonucleases Reveal RNA-Mediated Conformational Activation
1Department of Biochemistry, University of Zurich, CH-8057 Zurich, Switzerland. 2Department of Molecular and Cell Biology, 3Howard Hughes Medical Institute, 4California Institute for Quantitative Biosciences, 5Department of Chemistry, 6Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720. 7The Laboratory for Molecular Infection Medicine Sweden, Umeå University, Umeå S-90187, Sweden. 8Helmholtz Centre for Infection Research, Department of Regulation in Infection Biology, D-38124 Braunschweig, Germany. 9Hannover Medical School, D-30625 Hannover, Germany. 10Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720.
‡ Present address: Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66 CH-4058 Basel, Switzerland.
§ Present address: Department of Agricultural and Biological Engineering, University of Florida, Gainesville, FL 32611, USA.
Type II CRISPR-Cas systems use an RNA-guided DNA endonuclease, Cas9,
to generate double-strand breaks in invasive DNA during an adaptive bacterial immune response.
Cas9 has been harnessed as a powerful tool for genome editing and gene regulation in many eukaryotic organisms.
Here, we report 2.6 and 2.2 Å resolution crystal structures of two major Cas9 enzyme subtypes,
revealing the structural core shared by all Cas9 family members.
The architectures of Cas9 enzymes define nucleic acid binding clefts, and
single-particle electron microscopy reconstructions show that the two structural lobes harboring these clefts undergo guide
RNA-induced reorientation to form a central channel where DNA substrates are bound.
The observation that extensive structural rearrangements occur before target DNA duplex binding
implicates guide RNA loading as a key step in Cas9 activation.
MicroRNA function in endothelial cells
Dr. Virginie Mattot
Angiogenesis, endothelium activation
Solving the mystery of an unknown target gene using microRNA Target Site Blockers
Dr. Virginie Mattot works in the team “Angiogenesis, endothelium activation and Cancer” directed by Dr. Fabrice Soncin at the Institut de Biologie de Lille in France, where she studies the roles played by microRNAs in endothelial cells during physiological and pathological processes such as angiogenesis or endothelium activation. She has been using Target Site Blockers to investigate the role of microRNAs on putative targets whose functions are as yet unknown.
What is the main focus of the research conducted in your lab?
We are studying endothelial cell functions with a particular interest in angiogenesis and endothelium activation during physiological and tumoral vascular development.
How did your research lead to the study of microRNAs?
A few years ago, we identified
an endothelial cell-specific gene which
harbors a microRNA in its intronic sequence.
We have since been working on understanding the functions of
both this new gene and its intronic microRNA in endothelial cells.
What is the aim of your current project?
While we were searching for the functions of the intronic microRNA,
we identified an unknown gene as a putative target.
The aim of my project was to investigate whether this unknown gene was a genuine target and whether regulation of this gene by the microRNA was involved in endothelial cell function. We had already characterized the endothelial cell phenotype associated with the inhibition of our intronic microRNA. We then used miRCURY LNA™ Target Site Blockers to demonstrate that
the expression of this unknown gene is actually controlled by this microRNA, and that
the microRNA regulates specific endothelial cell properties through regulation of this unknown gene.
How did you perform the experiments and analyze the results?
LNA™ enhanced target site blockers (TSB) for our microRNA were designed by Exiqon. We
transfected the TSBs into endothelial cells using our standard procedure and
analysed the induced phenotype.
As a control for these experiments, a mutated version of the TSB was designed by Exiqon and transfected into endothelial cells. We first verified that this TSB was functional by analyzing
the expression of the miRNA target against which the TSB was directed;
we then showed that the TSB induced phenotypes similar to those seen when we inhibited the microRNA in the same cells.
What do you find to be the main benefits/advantage of the LNA™ microRNA target site blockers from Exiqon?
Target Site Blockers are efficient tools to demonstrate the specific involvement of
putative microRNA targets in the function played by this microRNA.
What would be your advice to colleagues about getting started with microRNA functional analysis?
It is essential to perform both gain- and loss-of-function experiments.
Changing the core of transcription
Different members of the TAF family of proteins work in differentiated cells, such as motor neurons or brown fat cells, to control the expression of genes that are specific to each cell type.
Related research articles: Herrera FJ, Yamaguchi T, Roelink H, Tjian R. 2014. Core promoter factor TAF9B regulates neuronal gene expression. eLife 3:e02559. http://dx.doi.org/10.7554/eLife.02559
Zhou H, Wan B, Grubisic I, Kaplan T, Tjian R. 2014. TAF7L modulates brown adipose tissue formation. eLife 3:e02811. http://dx.doi.org/10.7554/eLife.02811
Motor neurons (green) being grown in vitro
In a developing organism, different genes are expressed at different times, and
the pattern of gene expression can often change abruptly.
Expressing a gene involves multiple steps:
the DNA must be transcribed into a molecule of messenger RNA,
which is then translated into a protein.
The mechanisms that start the transcription of protein-coding genes in rapidly growing cells are reasonably well understood: two types of proteins—
DNA-binding activators and general transcription factors—
cooperate to recruit an enzyme called RNA polymerase, which then transcribes the gene (Kadonaga, 2012).
These proteins bind to a region of the gene called the promoter, which is
upstream from the protein-coding region of the gene.
TATA-binding protein is a general transcription factor that
binds to certain sequences of DNA bases found within promoters.
Fourteen TATA-binding protein associated factors (TAFs) are incorporated into two different protein complexes called TFIID and SAGA (Müller et al., 2010), which, in budding yeast, can recruit TATA-binding protein to gene promoters (Basehoar et al., 2004). However, not all genes require all of the general transcription factors, and some genes require both TFIID and SAGA complexes.
Although the steps that are required to switch on genes when cells are rapidly dividing are fairly well known,
the same is not true for cells that are differentiating into specialised cell types.
In these cells, many transcription factors are downregulated and
the entire pattern of gene expression changes dramatically.
Moreover, certain TAFs are strongly up-regulated during differentiation. The core transcriptional machinery is essentially rebuilt at the genes that are expressed in differentiated cells.
Over the years Robert Tjian of the University of California Berkeley and co-workers have illuminated how individual TAFs can affect how a cell differentiates in different contexts (Figure 1). Now, in eLife, Francisco Herrera of UC Berkeley and co-workers—including Teppei Yamaguchi, Henk Roelink and Tjian—have identified a critical role for a TAF called TAF9B in the expression of genes in motor neurons (Herrera et al., 2014).
Herrera et al. found that TAF9B predominantly associates with the SAGA complex, rather than the TFIID complex, in the motor neuron cells. Mice in which the gene for TAF9B had been deleted had less neuronal tissue in the developing spinal cord. Moreover, the genes that are involved in forming the branches of neurons were not properly regulated in these mice.
Recently, in another eLife paper, Tjian and co-workers at Berkeley, Fudan University and the Hebrew University of Jerusalem—including Haiying Zhou as first author, Bo Wan, Ivan Grubisic and Tommy Kaplan—reported that another TAF protein, called TAF7L, works as part of the TFIID complex to up-regulate genes that direct cells to become brown adipose tissue (Zhou et al., 2014).
Figure 1. TATA-binding protein associated factors (TAFs) regulate transcription in specific cell types. TAF3, for example, works with another transcription factor to regulate the expression of genes that are critical for the differentiation of the endoderm in the early embryo (Liu et al., 2011). TAF3 also forms a complex with the TATA-related factor, TRF3, to regulate Myogenin and other muscle-specific genes to form myotubes (Deato et al., 2008). TAF7L interacts with another transcription factor to activate genes involved in the formation of adipocytes (‘fat cells’) and adipose tissue (Zhou et al., 2013; Zhou et al., 2014). Finally, TAF9B is a key regulator of transcription in motor neurons (Herrera et al., 2014). The names of some of the genes regulated by the TAFs are shown in brackets.
TAF9B
Deleting the gene for TAF9B in mouse embryonic stem cells revealed that this TAF
is not needed for the growth of stem cells, or
required for the expression of genes that prevent differentiation:
both of these processes are known to be highly dependent upon the TFIID complex
(Pijnappel et al., 2013). However,
genes that would normally be expressed specifically in neurons were not
up-regulated when cells without the TAF9B gene started to specialise.
Herrera et al. identified numerous genes that can only be switched on when the TAF9B protein is present, which means that it joins a growing list of TAF proteins that are dedicated to controlling the expression of genes in specialised cell types.
TAF9B activates neuron-specific genes by binding to sites that
reside outside of these genes’ core promoters.
Further, many of these sites were also bound by a master regulator of motor neuron-specific genes.
TAF7L
Whilst most of the fat tissue in humans is white adipose tissue, which contains cells that store fatty molecules, some is brown adipose tissue, or ‘brown fat’, that instead generates heat. When TAF7L promotes the differentiation of brown fat, it up-regulates genes that are targeted by a transcription factor called PPAR-γ; last year it was shown that this transcription factor also promotes the differentiation of white adipose tissue (Zhou et al., 2013).
Mice without the TAF7L gene had 40% less brown fat than wild-type mice, and also grew too much skeletal muscle tissue. TAF7L was specifically required to activate genes that control how brown fat develops and functions. Thus TAF7L expression appears to shift the fate of a stem cell towards brown adipose tissue, potentially at the expense of skeletal muscle, as both cell types develop from the same group of stem cells.
When stem cells with less TAF7L than normal are differentiated in vitro, they yield more muscle than fat cells. Conversely, cells with an excess of TAF7L express brown fat-specific genes and switch off muscle-specific genes.
The work of Herrera et al. and Zhou et al. reinforces the idea that different TAFs
provide the flexibility needed to control gene expression in a tissue-specific manner, and
enable differentiating cells to rapidly change which genes they express.
However, many interesting questions remain:
Which signals lead to the destruction of core transcription factors?
Are core promoter elements at tissue-specific genes designed to recognise variant TAFs?
What determines whether variant TAFs are incorporated within TFIID, SAGA, or other complexes?
Shortly after RNA polymerase II starts to transcribe a gene, it briefly pauses. Interestingly, a DNA sequence associated with this pausing, called the pause button, closely matches the sequences that bind to two subunits of TFIID (TAF6 and TAF9; Kadonaga, 2012). Consequently, TAF6 and TAF9 might be involved in pausing transcription, and if so, the variant TAF9B could play a similar role at motor neuron genes.
During RNA synthesis, RNA polymerase moves erratically along DNA, frequently
resting as it produces an RNA copy of the DNA sequence. Such pausing helps coordinate the appearance of a transcript with its utilization by cellular processes; to this end,
the movement of RNA polymerase is modulated by mechanisms that determine its rate. For example,
pausing is critical to regulatory activities of the enzyme such as the termination of transcription. It is also
essential during early modifications of eukaryotic RNA polymerase II that activate the enzyme for elongation.
Two reports analyzing transcription pausing on a global scale in Escherichia coli, by Larson et al. (1) and by Vvedenskaya et al. (2) on page 1285 of this issue, suggest
new functions of pausing and important aspects of its molecular basis.
The studies of Larson et al. and Vvedenskaya et al. follow decades of analysis of
bacterial transcription that has illuminated the molecular basis of polymerase pausing
events that serve critical regulatory functions.
A transcription pause specified by the DNA sequence synchronizes the translation of RNA into protein
with the transcription of leader regions of operons (groups of genes transcribed together) for amino acid biosynthesis;
this coordination controls amino acid synthesis in response to amino acid availability (3).
A protein-induced pause occurs when the E. coli initiation factor σ70 restrains RNA polymerase by binding a second occurrence of the “–10” promoter element.
This paused polymerase provides a structure for engaging a transcription antiterminator (the bacteriophage λ Q protein) (4) that, in turn, inhibits transcription
pauses, including those essential for transcription termination.
Biochemical and structural analyses have identified an endpoint of the pausing process called the “elemental pause” in which the catalytic structure in the active site is distorted,
preventing further nucleotide addition (7).
The elemental paused state also involves distinct
conformational changes in the polymerase that may favor transcription termination
and allow the his and related pauses to be stabilized by RNA hairpins (8).
A consensus sequence for ubiquitous pauses was identified, with two important elements:
a preference for pyrimidine [mostly cytosine (C)] at the newly formed RNA end
followed by G to be incorporated next, just as found for the his pause; and a preference for G at position –10 of the RNA (10 nucleotides before the 3’ end).
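The consensus described above can be turned into a simple sequence scan. The sketch below is illustrative only and is not the analysis used in either report. It assumes the RNA 3' end is numbered –1, so that position –10 lies nine nucleotides upstream of it; the `minus10_offset` parameter can be changed if a different numbering is intended. The function and parameter names are hypothetical.

```python
PYRIMIDINES = {"C", "U", "T"}

def find_consensus_pauses(transcript, minus10_offset=9):
    """Return 0-based indices of candidate pause sites.

    An index i is reported when:
      transcript[i]                  is a pyrimidine (RNA 3' end, position -1),
      transcript[i + 1]              is G (next nucleotide to be added),
      transcript[i - minus10_offset] is G (position -10 under the assumed numbering).
    """
    seq = transcript.upper()
    hits = []
    for i in range(minus10_offset, len(seq) - 1):
        if (seq[i] in PYRIMIDINES
                and seq[i + 1] == "G"
                and seq[i - minus10_offset] == "G"):
            hits.append(i)
    return hits

# Toy sequence: G at -10, C at the 3' end, G incoming.
print(find_consensus_pauses("GAAAAAAAACG"))
```

A match to the motif only marks a candidate site; whether polymerase actually pauses there depends on context that this scan ignores.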
Polymerase, paused. During transcription, RNA exists in two states as RNA polymerase progresses: pretranslocated, just after the addition of the last nucleotide [here, cytosine (C)];
and posttranslocated, after all nucleic acids have shifted in register by one nucleotide relative
to the enzyme, exposing the active site for binding of the next substrate molecule [here, guanine (G)]. The pretranslocated state is dominant in the pause. The critical G-C base (RNA-DNA) pair at position –10 in the pretranslocated state and the nontemplate DNA strand G bound in the
polymerase in the posttranslocated state are marked with an asterisk.
Binding of G at position 1 to CRE only occurs in the posttranslocated state, which would thus
be favored over the pretranslocated state. Hence, if G binding inhibits pausing, then the rate-limiting paused structure must be in the pretranslocated state (a conclusion also made by Larson et al. from biochemical experiments).
This is an important insight into the sequence of protein–nucleic acid interactions that occur in pausing. Vvedenskaya et al. suggest that the actual role of the G binding site is to promote translocation and thus
inhibit pausing, to smooth out adventitious pauses in genomic DNA.
The studies by Larson et al. and Vvedenskaya et al. provide a refined and detailed analysis of DNA sequence–induced transcription pausing.
Processive Antitermination
Robert A. Weisberg1* and Max E. Gottesman2
1Section on Microbial Genetics, Laboratory of Molecular Genetics, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892-2785
2Institute of Cancer Research, Columbia University, New York, New York 10032
Journal of Bacteriology, Jan. 1999; 181(2): 359–367.
After initiating synthesis of RNA at a promoter, RNA polymerase (RNAP) normally continues to elongate the transcript until it reaches a termination site. Important elements of termination sites are transcribed before polymerase translocation stops, and the resulting RNA is an active element of the termination pathway. Nascent transcripts of intrinsic sites can halt transcription without the assistance of additional factors, and
those of Rho-dependent sites recruit the Rho termination protein to the elongation complex. In both cases, RNAP, the transcript, and the template dissociate (reviewed in references 76 and 80).
Termination is rarely, if ever, completely efficient, and the expression of downstream genes can be controlled by altering the efficiency of terminator readthrough. Two distinct mechanisms of elongation control have been reported for bacterial RNA polymerases. In one, exemplified by attenuation of the his and trp operons of Salmonella typhimurium and Escherichia coli, respectively,
a single terminator is inactivated by interaction with an upstream sequence in the transcript, with a terminator-specific protein, or with a translating ribosome that follows closely behind RNAP (reviewed in references 35 and 104).
In a second, whose prototype is antitermination of phage λ early transcription,
polymerase is stably modified to a terminator-resistant form after it leaves the promoter.
In this case, the modified enzyme not only transcribes through sequential downstream terminators,
but is also less sensitive to the pause sites that normally delay transcript elongation.
Both pathways are widespread in nature, but in this minireview we consider only the second,
known as processive antitermination
(for previous reviews, see references 22, 23, 27, and 32).
The recent explosive growth in our understanding of transcription elongation (reviewed in references 57, 96, and 99) makes this an especially appropriate time to survey regulatory elements that target the transcription elongation complex.
Antitermination in λ is induced by two quite distinct mechanisms.
One is the result of interaction between λ N protein and its targets in the early phage transcripts, and
the other is the result of an interaction between the λ Q protein and its target in the late phage promoter.
We describe the N mechanism first. Lambda N, a small basic protein of the arginine-rich motif (ARM) (Fig. 1) family of RNA binding proteins, binds to a 15-nucleotide (nt) stem-loop called BOXB (17) (Fig. 2).
FIG. 1. [not shown] (A) Alignment of phage N proteins and the HK022 Nun protein. The color groupings reflect the frequency of amino acid substitutions in evolutionarily related protein domains: an amino acid is more likely to be replaced by one in the same color group than by one in a different color group in related proteins (34). The amino-proximal ARM regions were aligned by eye and according to the structures of the P22 and λ ARMs complexed to their cognate nut sites (see text and Fig. 2), and the remainder of the proteins was aligned by ClustalW (38). The dots indicate gaps introduced to improve the alignment. Aside from the ARM regions, the proteins fall into three very distantly related (or unrelated) families: (i) λ and phage 21; (ii) P22, phage L, and HK97; and (iii) HK022 Nun.
FIG. 2. [not shown] BOXA and BOXB RNAs and their interaction with the ARM of their cognate N proteins. The amino acid-nucleotide interactions are shown to the left except for BOXB of phage 21, for which the structure of the complex is unknown. The sequences of BOXA and BOXA-BOXB spacer are shown to the right. The dots to the left and right of the spacer sequences are for alignment. (A) λ N-ARM-BOXB complex (adapted from reference 48 with permission of the publisher). Open circles, pentagons, and rectangles represent phosphates, riboses, and bases, respectively. Watson-Crick base pairs are indicated. The zigzag line denotes a sheared G·A base pair. Open circles, open rectangles, and arrowheads depict ionic, hydrophobic, and hydrogen-bonding interactions, respectively. Guanine-11, indicated by a bold rectangle, is extruded from the BOXB loop (see text). (B) P22 N-ARM-BOXB complex (adapted from reference 15 with permission of the publisher). Open circles, pentagons, rectangles, and ovals represent phosphates, riboses, bases, and amino acids, respectively. The solid pentagons indicate riboses with a C2′-endo pucker. Base stacking, intermolecular hydrogen bonding or electrostatic interactions, intermolecular hydrophobic or van der Waals interactions, intramolecular hydrogen bonds, and Watson-Crick base pairs are indicated. Cytosine-11 is extruded from the loop (see text). Note that the amino-terminal amino acid residue in the complex corresponds to Asn-14 in the complete protein (Fig. 1), and the displayed amino acids are numbered accordingly. (C) NUTL site of phage 21. The arrows indicate the inverted sequence repeats of BOXB.
FIG. 3. [not shown] HK022 put sites and folded PUT RNAs. (A) Alignment of putL and putR (43). The numbers give distances from the start sites of the PL and PR promoters, respectively, and the pairs of arrows indicate inverted sequence repeats. (B) Folded PUTL and PUTR RNAs. The structures, which were generated by energy minimization as described (43), have been partially confirmed by genetic and biochemical studies (7, 43).
The active bacterial elongation complex consists of
core RNAP,
template, and
RNA product.
The 3′ end of the RNA
is engaged in the active site of the enzyme,
the following ~8 nt are hybridized to the template strand of the DNA, and
the next ~9 nt remain closely associated with RNAP (64).
About 17 nt of the nontemplate DNA strand are separated from the template strand in the transcription bubble.
Elongation complexes can also contain NusA and/or NusG. These proteins, which
increase the stability of the N-mediated antitermination complex (see above),
have different effects on elongation.
NusA decreases and NusG increases the elongation rate, and
both proteins alter termination efficiency in a terminator-specific manner (13, 14, 86; see reference 76).
An elongation complex, unless located at a terminator, is extraordinarily stable,
even when translocation is prevented by removal of substrates.
Recent observations suggest that this stability depends mainly on
interactions between RNAP and the RNA-DNA hybrid as well as
between polymerase and the downstream duplex DNA template (63, 87).
Nascent RNA emerging from the hybrid region and upstream duplex DNA
do not appear to be required.
The strength of the RNA-DNA hybrid is believed to
assure the lateral stability of the complex.
Reducing the strength of the RNA-DNA bonds, for example
by incorporation of nucleotide analogs,
favors backsliding of RNAP on the template, with consequent
disengagement of the 3′ RNA end from the active site, and
concerted retreat of the RNA-DNA hybrid region from the 3′ end (65).
Such a disengaged complex retains its resistance to dissociation and
is capable of resuming elongation if the original or a newly created 3′ end reengages with the active site (10, 44, 45, 65, 71, 95).
Intrinsic terminators consist of a guanine- and cytosine-rich RNA hairpin stem
immediately followed by a short uracil-rich segment
within which termination can occur.
If termination does not occur at this point,
polymerase continues to elongate the transcript with normal processivity
until it reaches the next terminator.
Neither the stem nor the uracil-rich segment
is sufficient for termination, although
either can transiently slow elongation.
The weakness of base pairing between rU and dA
destabilizes the RNA-DNA hybrid in the uracil-rich segment, and
this probably contributes to termination.
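The two sequence features described above, a G+C-rich hairpin stem followed directly by a uracil-rich stretch, can be searched for with a crude scan. This is a heuristic sketch and not a method from the review. The stem length, loop sizes, GC fraction, and U-tract thresholds are illustrative assumptions, only perfect Watson-Crick stems are recognised, and real terminator predictors rely on thermodynamic folding instead.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def is_stem(left, right):
    """True if `left` pairs base-by-base with `right` read backwards."""
    return len(left) == len(right) and all(
        COMPLEMENT.get(a) == b for a, b in zip(left, reversed(right))
    )

def find_terminator_candidates(rna, stem_len=6, loop_range=(3, 8),
                               min_gc=0.6, u_window=8, min_u=5):
    """Return (stem_start, loop_length) for hairpins followed by a U-rich tail.

    All thresholds are assumed values chosen for illustration.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for start in range(len(rna) - stem_len + 1):
        left = rna[start:start + stem_len]
        gc_fraction = sum(base in "GC" for base in left) / stem_len
        if gc_fraction < min_gc:
            continue
        for loop in range(loop_range[0], loop_range[1] + 1):
            right_start = start + stem_len + loop
            right = rna[right_start:right_start + stem_len]
            tail_start = right_start + stem_len
            tail = rna[tail_start:tail_start + u_window]
            if (len(right) == stem_len and len(tail) == u_window
                    and is_stem(left, right) and tail.count("U") >= min_u):
                hits.append((start, loop))
    return hits

# Hypothetical sequence: 6-bp GC stem, 4-nt loop, 8-nt U tract.
print(find_terminator_candidates("AAGCCGCCUUCGGGCGGCUUUUUUUU"))
```

Requiring both elements together mirrors the observation that neither the stem nor the uracil-rich segment is sufficient for termination on its own.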
Formation of the hairpin stem as nascent terminator RNA emerges from polymerase
destabilizes the RNA-DNA hybrid and
interrupts contacts between the emerging nascent RNA and RNAP (62a).
It might also interfere with the stabilizing interactions between
RNAP and the hybrid or those between RNAP and
the downstream region of the template.
Cross-linking of nucleic acid to RNAP suggests that
both the downstream DNA and the nascent RNA
that emerges from the hybrid region, and
within which the terminator hairpin might form,
are located close to the same regions of the enzyme (64).
Conversely, modifications that render RNAP termination resistant
could prevent the terminator stem from destabilizing one or more of these targets,
at least while the 3′ end of the RNA is within the uracil-rich segment of the terminator.
The λ N and Q proteins and HK022 PUT RNA
also suppress Rho-dependent terminators (43a, 79, 103) which,
in contrast to intrinsic terminators, lack a precisely determined termination point.
Rho is an RNA-dependent ATPase that binds to cytosine-rich, unstructured regions in nascent RNA and acts preferentially
to terminate elongation complexes that are paused at nearby downstream sites
(19, 29, 46, 47, 59, 60).
Rho possesses RNA-DNA helicase activity, and this activity is directional,
unwinding DNA paired to the 3′ end of the RNA molecule (11, 90).
This corresponds to the location of the hybrid and of RNAP
in an active ternary elongation complex.
The ability of antiterminators to suppress Rho-dependent and -independent terminators
suggests that they prevent a step that is common to both classes.
Given the helicase activity of Rho, a likely candidate for this step is disruption of the RNA-DNA
hybrid. However, other candidates, such as destabilization of RNAP-template or RNAP-hybrid interactions, are also plausible.
Alternatively, the ability of N, Q, and PUT to suppress RNAP pausing (31, 43, 54, 74)
suggests that they prevent Rho-dependent termination
by accelerating polymerase away from Rho bound at upstream RNA sites.
This explanation raises the problem of why NusG,
which also accelerates polymerase,
enhances rather than suppresses Rho-dependent termination (see above).
Clearly, the molecular details of processive antitermination remain poorly understood despite the 30 years that have elapsed since its discovery.
System wide analyses have underestimated protein abundances and the importance of transcription in mammals
OPEN ACCESS
Jingyi Jessica Li1, 2, Peter J Bickel1 and Mark D Biggin3
1Department of Statistics, University of California, Berkeley, CA, USA
2Departments of Statistics and Human Genetics, University of California, Los Angeles, CA, USA
3Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Large scale surveys in mammalian tissue culture cells suggest that the protein
expressed at the median abundance is present at 8,000–16,000 molecules per cell and
that differences in mRNA expression between genes explain only 10–40% of the
differences in protein levels. We find, however, that these surveys have significantly
underestimated protein abundances and the relative importance of transcription.
Using individual measurements for 61 housekeeping proteins to rescale whole proteome
data from Schwanhausser et al. (2011), we find that the median protein detected is
expressed at 170,000 molecules per cell and that our corrected protein abundance
estimates show a higher correlation with mRNA abundances than do the uncorrected
protein data. In addition, we estimated the impact of further errors in mRNA and
protein abundances using direct experimental measurements of these errors.
The resulting analysis suggests that mRNA levels explain at least
56% of the differences in protein abundance for the 4,212 genes
detected by Schwanhausser et al. (2011), though because one major source of error
could not be estimated the true percent contribution should be higher.
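The phrase "mRNA levels explain X% of the differences in protein abundance" refers to a coefficient of determination. As a minimal sketch, the uncorrected version of that quantity is the squared Pearson correlation between log-transformed mRNA and protein abundances. The code below computes only this naive value; it does not implement the rescaling or the measurement-error corrections described in the abstract, and the function names and input numbers are hypothetical.

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def variance_explained(mrna_copies, protein_copies):
    """Fraction of variance in log10 protein explained by log10 mRNA."""
    log_mrna = [math.log10(v) for v in mrna_copies]
    log_protein = [math.log10(v) for v in protein_copies]
    return r_squared(log_mrna, log_protein)

# Hypothetical per-gene copy numbers, for illustration only.
mrna = [2, 15, 40, 120, 900]
protein = [9000, 40000, 600000, 250000, 8000000]
print(variance_explained(mrna, protein))
```

Noise in either measurement lowers this value, which is why correcting for measurement error raises the estimated contribution of mRNA levels.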
We also employed a second, independent strategy to
determine the contribution of mRNA levels to protein expression.
The variance in translation rates directly measured by ribosome profiling is only 12%
of that inferred by Schwanhausser et al. (2011), and
the measured and inferred translation rates correlate poorly (R² = 0.13).
Based on this, our second strategy suggests that
mRNA levels explain ~81% of the variance in protein levels.
We also determined the percent contributions of
transcription,
RNA degradation,
translation
and protein degradation
to the variance in protein abundances using both of our strategies.
While the magnitudes of the two estimates vary, they both suggest that
transcription plays a more important role than the earlier studies implied and
translation a much smaller role.
Finally, the above estimates only apply to those genes whose mRNA and protein expression was detected. Based on a detailed analysis by Hebenstreit et al. (2012), we estimate that approximately
40% of genes in a given cell within a population express no mRNA.
Since there can be no translation in the absence of mRNA, we argue that
differences in translation rates can play no role in determining the expression levels for the ~40% of genes that are non-expressed.
Subjects Bioinformatics, Computational Biology
Keywords Transcription, Translation, Mass spectrometry, Gene expression, Protein abundance
How to cite this article Li et al. (2014), System wide analyses have underestimated protein abundances and the importance of transcription in mammals. PeerJ 2:e270;
Background: Pathway databases are becoming increasingly important and almost omnipresent in most types of biological and translational research. However, little is known about the quality and completeness of pathways stored in these databases. The present study conducts a comprehensive assessment of transcriptional regulatory pathways in humans for seven well-studied transcription factors: MYC, NOTCH1, BCL6, TP53, AR, STAT1, and RELA.
The employed benchmarking methodology first
involves integrating genome-wide binding with functional gene expression data to derive direct targets of transcription factors.
Then the lists of experimentally obtained direct targets are compared with relevant lists of transcriptional targets from 10 commonly used pathway databases.
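The comparison step can be sketched as a set-overlap calculation. The code below computes a Jaccard index and a one-sided hypergeometric p-value for the overlap between an experimentally derived target list and a database list. The choice of the hypergeometric test, the function names, and the gene universe size are assumptions made for illustration and are not taken from the study.

```python
from math import comb

def jaccard(a, b):
    """Jaccard index of two gene lists; 0.0 when both are empty (assumed convention)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def hypergeom_pvalue(universe_size, gold, reported):
    """P(overlap >= observed) if `reported` were drawn at random from the universe."""
    gold, reported = set(gold), set(reported)
    k = len(gold & reported)               # observed overlap
    K, n, N = len(gold), len(reported), universe_size
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Hypothetical gene lists and a hypothetical universe of 20,000 genes.
gold_standard = {"CCND2", "CDK4", "NCL", "ODC1"}
database = {"CDK4", "ODC1", "TERT"}
print(jaccard(gold_standard, database))
print(hypergeom_pvalue(20000, gold_standard, database))
```

A small p-value indicates that the database list overlaps the experimental targets more than chance would predict; a large one corresponds to the non-significant overlaps the authors report for most databases.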
Results: The results of this study show that for the majority of pathway databases,
the overlap between experimentally obtained target genes and targets reported in transcriptional regulatory pathway databases is surprisingly small and often is not statistically significant.
The only exception is the MetaCore pathway database, which yields a statistically significant intersection with experimental results in 84% of cases. Additionally, we suggest that
the lists of experimentally derived direct targets obtained in this study can be used to reveal new biological insight in transcriptional regulation and
suggest novel putative therapeutic targets in cancer.
Conclusions: Our study opens a debate on validity of using many popular pathway databases to obtain transcriptional regulatory targets. We conclude that the choice of pathway databases should be informed by solid scientific evidence and rigorous empirical evaluation.
Illustration of statistical methodology
Figure 2 Illustration of statistical methodology for comparison
between a gold-standard and a pathway database
Additional material
Additional file 1: Supplementary Information. Table S1: Functional gene expression data. Table S2: Transcription factor-DNA binding data. Table S3: Most confident direct transcriptional targets of each of the four transcription factors. These targets were obtained by overlapping several gold-standards obtained with different datasets for the same transcription factor. Table S4: Genes directly regulated by two or more of the three transcription factors: MYC, NOTCH1, and RELA. Figure S1: Comparison of gene sets of transcriptional targets derived from ten different pathway databases by Jaccard index. In cases where the Jaccard index of an overlap could not be determined because two empty gene lists were compared, we assigned a value of 0. Cells are colored according to the Jaccard index, from white (Jaccard index equal to 0) to dark-orange (Jaccard index equal to 1). Each sub-figure gives results for a different transcription factor: (a) AR, (b) BCL6, (c) MYC, (d) NOTCH1, (e) RELA, (f) STAT1, (g) TP53.
Cite this article as: Shmelkov et al.: Assessing quality and completeness of human transcriptional regulatory pathways on a genome-wide scale. Biology Direct 2011 6:15
The Functional Consequences of Variation in Transcription Factor Binding
Darren A. Cusanovich1, Bryan Pavlovic1,2, Jonathan K. Pritchard1,2,3*, Yoav Gilad1*
1 Department of Human Genetics, University of Chicago, 2 Howard Hughes Medical Institute, University of Chicago, Chicago,
Illinois, 3 Departments of Genetics and Biology and Howard Hughes Medical Institute, Stanford University, Stanford, California,
One goal of human genetics is to understand how the information for precise and dynamic gene expression programs is encoded in the genome. The interactions of transcription factors (TFs) with DNA regulatory elements clearly play an important role in determining gene expression outputs, yet the regulatory logic underlying functional transcription factor binding is poorly understood. Many studies have focused on characterizing the genomic locations of TF binding, yet it is unclear to what extent TF binding at any specific locus has functional consequences with respect to gene expression output.
To evaluate the context of functional TF binding we knocked down
59 TFs and chromatin modifiers in one HapMap lymphoblastoid cell line.
We identified genes whose expression was affected by the knockdowns.
We intersected the gene expression data with transcription factor binding data
(based on ChIP-seq and DNase-seq) within 10 kb of the transcription start sites.
This combination of data allowed us to infer functional TF binding.
We found that only a small subset of genes bound by a factor were differentially expressed following the knockdown of that factor, suggesting that
most interactions between TF and chromatin do not result in measurable changes in gene expression levels of putative target genes.
Functional TF binding is enriched in regulatory elements that harbor
a large number of TF binding sites,
at sites with predicted higher binding affinity, and
at sites that are enriched in genomic regions annotated as “active enhancers.”
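The intersection of binding and expression data described above can be sketched with plain set operations: a gene counts as functionally bound when it is both bound near its TSS and differentially expressed after the knockdown. This is a minimal illustration, not the authors' pipeline; the function name and gene identifiers are hypothetical.

```python
def summarize_functional_binding(bound_genes, de_genes):
    """Split bound genes into functionally and non-functionally bound sets."""
    bound, differentially_expressed = set(bound_genes), set(de_genes)
    functional = bound & differentially_expressed
    nonfunctional = bound - differentially_expressed
    # Fraction of bound genes that also change expression in the knockdown.
    fraction = len(functional) / len(bound) if bound else 0.0
    return functional, nonfunctional, fraction


# Hypothetical gene identifiers for a single knockdown experiment.
bound = ["G1", "G2", "G3", "G4", "G5"]
changed = ["G2", "G5", "G9"]
functional, nonfunctional, fraction = summarize_functional_binding(bound, changed)
print(sorted(functional), sorted(nonfunctional), fraction)
```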
Author Summary
An important question in genomics is to understand how a class of proteins called “transcription factors” controls the expression level of other genes in the genome in a cell type-specific manner – a process that is essential to human development. One major approach to this problem is to study where these transcription factors bind in the genome, but this does not tell us about the effect of that binding on gene expression levels and it is generally accepted that much of the binding does not strongly influence gene expression. To address this issue, we artificially reduced the concentration of 59 different transcription factors in the cell and then examined which genes were impacted by the reduced transcription factor level. Our results implicate some attributes that might influence what binding is functional, but they also suggest that a simple model of functional vs. non-functional binding may not suffice.
Citation: Cusanovich DA, Pavlovic B, Pritchard JK, Gilad Y (2014) The Functional Consequences of Variation in Transcription Factor Binding. PLoS Genet 10(3):e1004226. http://dx.doi.org/10.1371/journal.pgen.1004226
Editor: Yitzhak Pilpel, Weizmann Institute of Science, Israel
Effect sizes for differentially expressed genes
Figure 2. Effect sizes for differentially expressed genes. Boxplots of absolute Log2(fold-change) between knockdown arrays and control arrays for all genes identified as differentially expressed in each experiment. The gray bar indicates the interquartile range across all genes differentially expressed in all knockdowns. Boxplots are ordered by the number of genes differentially expressed in each experiment. Outliers are not plotted.
Intersecting binding data and expression data for each knockdown
Figure 3. Intersecting binding data and expression data for each knockdown. (a) Example Venn diagrams showing the overlap of binding and differential expression for the knockdowns of HCST and IRF4 (the same genes as in Figure 1). (b) Boxplot summarizing the distribution of the fraction of all expressed genes that are bound by the targeted gene or downstream factors. (c) Boxplot summarizing the distribution of the fraction of bound genes that are classified as differentially expressed, using an FDR of either 5% or 20%.
Figure 4. Degree of binding correlated with function. Boxplots comparing (a) the number of sites bound, and (b) the number of differentially expressed transcription factors binding events near functionally or non-functionally bound genes. We considered binding for siRNA-targeted factor and any factor differentially expressed in the knockdown. (c) Focusing only on genes differentially expressed in common between each pairwise set of knockdowns we tested for enrichments of functional binding (y-axis). Pairwise comparisons between knock-down experiments were binned by the fraction of differentially expressed transcription factors in common between the two experiments. For these boxplots, outliers were not plotted.
Figure 5. Distribution of functional binding about the TSS. (a) A density plot of the distribution of bound sites within 10 kb of the TSS for both functional and non-functional genes. Inset is a zoom-in of the region +/-1 kb from the TSS. (b) Boxplots comparing the distances from the TSS to the binding sites for functionally bound genes and non-functionally bound genes. For the boxplots, 0.001 was added before log10 transforming the distances.
Magnitude and direction of differential expression after knockdown
Figure 6. Magnitude and direction of differential expression after knockdown. (a) Density plot of all Log2(fold-changes) between the knockdown arrays and controls for genes that are differentially expressed at 5% FDR in one of the knockdown experiments as well as bound by the targeted transcription factor. (b) Plot of the fraction of differentially expressed putative direct targets that were up-regulated in each of the knockdown experiments.
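The two quantities plotted in Figure 6 can be illustrated with a short sketch: a Log2(fold-change) between knockdown and control, and the fraction of putative direct targets that are up-regulated. The function names and numbers are hypothetical, and the sketch assumes raw (not already log-transformed) intensities, which may differ from how the array data were actually processed.

```python
from math import log2


def log2_fold_change(knockdown_value, control_value):
    """Log2 ratio of knockdown to control; positive means up-regulated."""
    return log2(knockdown_value / control_value)


def fraction_upregulated(log2_fold_changes):
    """Fraction of genes with a positive Log2(fold-change)."""
    values = list(log2_fold_changes)
    if not values:
        return None  # undefined when no gene is differentially expressed
    return sum(value > 0 for value in values) / len(values)


# Hypothetical (knockdown, control) intensities for four direct targets.
pairs = [(8.0, 2.0), (1.0, 4.0), (3.0, 1.5), (2.0, 16.0)]
changes = [log2_fold_change(kd, ctrl) for kd, ctrl in pairs]
print(changes, fraction_upregulated(changes))
```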
To test whether the number of paralogs or the degree of similarity with the closest paralog for each transcription factor knocked down might influence the number of genes differentially expressed in our experiments, we obtained definitions of paralogy and the calculations of percent identity for 29 different factors from Ensembl’s BioMart (http://useast.ensembl.org/biomart/martview/) [31]. We used genome build GRCh37.p13.
For each gene, we counted the number of paralogs classified as a “within_species_paralog”. After selecting only genes considered a “within_species_paralog”, we also assigned the maximum percent identity as the closest paralog.
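The paralog summary described above amounts to a filter followed by a per-gene count and maximum. A minimal sketch is shown below; the row layout (gene, paralog, homology type, percent identity) and the example values are assumptions made for illustration and do not reproduce the actual BioMart export format.

```python
from collections import defaultdict


def summarize_paralogs(rows):
    """Return {gene: (paralog_count, max_percent_identity)}.

    Only rows whose homology type is "within_species_paralog" are used.
    """
    counts = defaultdict(int)
    closest = {}
    for gene, _paralog, homology_type, percent_identity in rows:
        if homology_type != "within_species_paralog":
            continue
        counts[gene] += 1
        closest[gene] = max(closest.get(gene, 0.0), percent_identity)
    return {gene: (counts[gene], closest[gene]) for gene in counts}


# Hypothetical rows; identifiers and identities are made up.
rows = [
    ("TF_A", "TF_A2", "within_species_paralog", 62.5),
    ("TF_A", "TF_A3", "within_species_paralog", 41.0),
    ("TF_A", "TF_X", "other_paralog", 90.0),
    ("TF_B", "TF_B2", "within_species_paralog", 55.0),
]
print(summarize_paralogs(rows))
```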
To evaluate the effect that an independent assignment of target genes to regulatory regions might have on our analyses, we used the target genes defined by Thurman et al. (ftp://ftp.ebi.ac.uk/pub/databases/…), who used correlations in DNase hypersensitivity between distal and proximal regulatory regions across different cell types to link distal elements to putative target genes [38]. We intersected the midpoints of our called binding events (defined above) with these regulatory elements in order to assign our binding events to specific target genes and then re-analyzed the overlap between binding and differential expression in our experiments.
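Assigning binding events to target genes by midpoint can be sketched as a simple interval lookup. The tuple layouts, the half-open coordinate convention and the example coordinates below are assumptions for illustration; a real analysis would more likely use a dedicated interval tool.

```python
def assign_targets(binding_events, regulatory_elements):
    """Map each binding event to genes linked to the element under its midpoint.

    binding_events: list of (chrom, start, end)
    regulatory_elements: list of (chrom, start, end, target_gene)
    Intervals are treated as half-open [start, end), an assumed convention.
    """
    assignments = []
    for chrom, start, end in binding_events:
        midpoint = (start + end) // 2
        genes = sorted(
            {
                gene
                for e_chrom, e_start, e_end, gene in regulatory_elements
                if e_chrom == chrom and e_start <= midpoint < e_end
            }
        )
        assignments.append(((chrom, start, end), genes))
    return assignments


# Hypothetical coordinates and gene names.
events = [("chr1", 100, 200), ("chr1", 900, 1000), ("chr2", 100, 200)]
elements = [("chr1", 120, 180, "GENE_A"), ("chr1", 140, 160, "GENE_B")]
print(assign_targets(events, elements))
```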
PLOS Genetics 6 Mar 2014; 10 (3), e1004226
The essential biology of the endoplasmic reticulum stress response for structural and computational biologists
Sadao Wakabayashi, Hiderou Yoshida*
Department of Molecular Biochemistry, Graduate School of Life Science,
Abstract: The endoplasmic reticulum (ER) stress response is a cytoprotective mechanism that maintains homeostasis of the ER by
upregulating the capacity of the ER in accordance with cellular demands.
If the ER stress response cannot function correctly, because of reasons such as aging, genetic mutation or environmental stress,
unfolded proteins accumulate in the ER and cause ER stress-induced apoptosis,
resulting in the onset of folding diseases,
including Alzheimer’s disease and diabetes mellitus.
Although the mechanism of the ER stress response has been analyzed extensively by biochemists, cell biologists and molecular biologists, many aspects remain to be elucidated. For example,
it is unclear how sensor molecules detect ER stress, or
how cells choose between the two opposite cell fates
(survival or apoptosis) during the ER stress response.
To resolve these critical issues, structural and computational approaches will be indispensable, although the mechanism of the ER stress response is complicated and difficult to understand holistically at a glance. Here, we provide a concise introduction to the mammalian ER stress response for structural and computational biologists.
The basic mechanism of the mammalian ER stress response
The mammalian ER stress response consists of three pathways: the ATF6, IRE1 and PERK pathways, whose main functions are
augmentation of folding capacity, augmentation of ERAD capacity, and
translational attenuation, respectively.
Although these response pathways cross-talk with each other and have several branched subpathways, we focus on the main pathways in this section.
The ATF6 pathway regulates the transcriptional induction of ER chaperone genes
pATF6(P) is a sensor molecule comprising a type II transmembrane protein residing on the ER membrane (Figure 2).
When pATF6(P) detects ER stress,
the protein is transported to the Golgi apparatus through vesicular transport in a COP-II vesicle
and is sequentially cleaved by two proteases residing in the Golgi,
namely site 1 protease (S1P) and site 2 protease (S2P).
The cytoplasmic portion of pATF6(P) (pATF6(N)) is
released from the Golgi membrane,
translocates into the nucleus,
binds to an enhancer element called the ER stress response element (ERSE),
and activates the transcription of ER chaperone genes,
including BiP, GRP94, calreticulin and protein disulfide isomerase (PDI).
The consensus nucleotide sequence of ERSE is CCAAT(N9)CCACG, and pATF6(N) recognizes both the CCACG portion and another transcription factor, NF-Y,
which binds to the CCAAT portion.
NF-Y is a general transcription factor required for
the transcription of various human genes.
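For computational readers, the ERSE consensus CCAAT(N9)CCACG translates directly into a pattern search. The sketch below scans only the given strand of a DNA string and uses a lookahead so that overlapping candidate sites are also reported; the function name and test sequence are hypothetical, and a match to the consensus is not by itself evidence of a functional element.

```python
import re

# CCAAT, then any nine bases, then CCACG; the lookahead allows overlaps.
ERSE_PATTERN = re.compile(r"(?=(CCAAT[ACGT]{9}CCACG))")


def find_erse_sites(sequence):
    """Return 0-based start positions of ERSE consensus matches."""
    return [match.start() for match in ERSE_PATTERN.finditer(sequence.upper())]


# Hypothetical promoter fragment with one consensus match at position 2.
fragment = "GG" + "CCAAT" + "ACGTACGTA" + "CCACG" + "TT"
print(find_erse_sites(fragment))
```

Scanning the reverse complement as well would be needed to find sites on the opposite strand.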
Figure 2. The ATF6 pathway. The sensor molecule pATF6(P) located on the ER membrane is transported to the Golgi apparatus by transport vesicles in response to ER stress. In the Golgi apparatus, pATF6(P) is sequentially cleaved by two proteases, S1P and S2P, resulting in release of the cytoplasmic portion pATF6(N) from the Golgi membrane. pATF6(N) translocates into the nucleus and activates transcription of ER chaperone genes through binding to the cis-acting enhancer ERSE.
Figure 3. The IRE1 pathway. In normal growth conditions, the sensor molecule IRE1 is an inactive monomer, whereas IRE1 forms an active oligomer in response to ER stress. Activated IRE1 converts unspliced XBP1 mRNA to mature mRNA by cytoplasmic mRNA splicing. From mature XBP1 mRNA, an active transcription factor pXBP1(S) is translated and activates the transcription of ERAD genes through binding to the enhancer UPRE.
Figure 4. The PERK pathway. When PERK detects unfolded proteins in the ER, PERK phosphorylates eIF2α, resulting in translational attenuation and translational induction of ATF4. ATF4 activates the transcription of target genes encoding translation factors, anti-oxidation factors and a transcription factor CHOP. Other kinases such as PKR, GCN2 and HRI also phosphorylate eIF2α, and phosphorylated eIF2α is dephosphorylated by CReP, PP1C-GADD34 and p58IPK.
Figure 7. Three functions of pXBP1(U). pXBP1(U) translated from XBP1(U) mRNA binds to pXBP1(S) and enhances its degradation. The CTR region of pXBP1(U) interacts with the ribosome tunnel and slows translation, while the HR2 region anchors XBP1(U) mRNA to the ER membrane, in order to enhance splicing of XBP1(U) mRNA by IRE1.
Figure 8. Major pathways of ER stress-induced apoptosis. ER stress induces apoptosis through various pathways, including transcriptional induction of CHOP by the PERK and ATF6 pathways, the IRE1-TRAF2 pathway and the caspase-12 pathway.
If strong and sustained ER stress damages cells beyond their capacity to cope, and the persisting stress threatens the survival of the organism, the ER stress response activates the apoptotic pathways and eliminates the damaged cells from the body.
Computational simulation of response pathways to analyze the decision mechanism that determines cell fate (survival or apoptosis) provides a valuable analysis tool, although there have been few such studies to date.
This is the second discussion in a multi-part series leading from the genome to protein synthesis (1), posttranslational modification of proteins (2), examples of protein effects on metabolism and signaling pathways (3), disruption of signaling pathways in disease (4), and effects leading to mutagenesis.
Posttranslational modification (PTM) is a step in protein biosynthesis. Proteins are created by ribosomes translating mRNA into polypeptide chains. These polypeptide chains undergo PTM before becoming the mature protein product.
Protein phosphorylation is one type of post-translational modification. Wikipedia
Acetylation occurs as a co-translational and post-translational modification of proteins, for example, histones, p53, and tubulins.
Post-Translational Modifications
As noted above, the large number of different PTMs precludes a thorough review of all possible protein modifications. Therefore, this overview only touches on a small number of the most common types of PTMs studied in protein research today. Furthermore, greater focus is placed on phosphorylation, glycosylation and ubiquitination, and therefore these PTMs are described in greater detail on pages dedicated to the respective PTM.
Phosphorylation
Reversible protein phosphorylation, principally on serine, threonine or tyrosine residues, is one of the most important and well-studied post-translational modifications. Phosphorylation plays critical roles in the regulation of many cellular processes including cell cycle, growth, apoptosis and signal transduction pathways.
Glycosylation
Protein glycosylation is acknowledged as one of the major post-translational modifications, with significant effects on protein folding, conformation, distribution, stability and activity. Glycosylation encompasses a diverse selection of sugar-moiety additions to proteins that ranges from simple monosaccharide modifications of nuclear transcription factors to highly complex branched polysaccharide changes of cell surface receptors. Carbohydrates in the form of asparagine-linked (N-linked) or serine/threonine-linked (O-linked) oligosaccharides are major structural components of many cell surface and secreted proteins.
Ubiquitination
Ubiquitin is an 8-kDa polypeptide consisting of 76 amino acids that is appended to lysine in target proteins via the C-terminal glycine of ubiquitin. A ubiquitin polymer is formed after initial monoubiquitination. Polyubiquitinated proteins are degraded, and the ubiquitin is recycled.
S-Nitrosylation
Nitric oxide (NO) is produced by three isoforms of nitric oxide synthase (NOS) and is a chemical messenger that reacts with free cysteine residues to form S-nitrosothiols (SNOs). S-nitrosylation is a critical PTM used by cells to stabilize proteins, regulate gene expression and provide NO donors, and the generation, localization, activation and catabolism of SNOs are tightly regulated.
S-nitrosylation is a reversible reaction, and SNOs have a short half-life in the cytoplasm because of the host of reducing agents and enzymes, including glutathione (GSH) and thioredoxin, that denitrosylate proteins. Therefore, SNOs are often stored in membranes, vesicles, the interstitial space and lipophilic protein folds to protect them from denitrosylation (5). For example, caspases, which mediate apoptosis, are stored in the mitochondrial intermembrane space as SNOs. In response to extra- or intracellular cues, the caspases are released into the cytoplasm, and the highly reducing environment rapidly denitrosylates the proteins, resulting in caspase activation and the induction of apoptosis.
Only specific cysteine residues are S-nitrosylated. Proteins may contain multiple cysteines and, due to the labile nature of SNOs, S-nitrosylated cysteines can be difficult to detect and distinguish from non-S-nitrosylated amino acids. The biotin switch assay, developed by Jaffrey et al., is a common method of detecting SNOs, and the steps of the assay are listed below (6):
All free cysteines are blocked.
All remaining cysteines (presumably only those that are S-nitrosylated) are denitrosylated.
The now-free thiol groups are then biotinylated.
Biotinylated proteins are detected by SDS-PAGE and Western blot analysis or mass spectrometry (7).
Methylation
The transfer of one-carbon methyl groups to nitrogen or oxygen (N- and O-methylation, respectively) on amino acid side chains increases the hydrophobicity of the protein and can neutralize a negative amino acid charge when bound to carboxylic acids. Methylation is mediated by methyltransferases, and S-adenosyl methionine (SAM) is the primary methyl group donor.
Methylation occurs so often that SAM has been suggested to be the most-used substrate in enzymatic reactions after ATP (4). Additionally, while N-methylation is irreversible, O-methylation is potentially reversible. Methylation is a well-known mechanism of epigenetic regulation, as histone methylation and demethylation influence the availability of DNA for transcription.
N-Acetylation
N-acetylation, or the transfer of an acetyl group to nitrogen, occurs in almost all eukaryotic proteins through both irreversible and reversible mechanisms. N-terminal acetylation requires the cleavage of the N-terminal methionine by methionine aminopeptidase (MAP) before replacing the amino acid with an acetyl group from acetyl-CoA by N-acetyltransferase (NAT) enzymes. This type of acetylation is co-translational, in that the N-terminus is acetylated on growing polypeptide chains that are still attached to the ribosome.
Acetylation at the ε-NH2 of lysine (termed lysine acetylation) on histone N-termini is a common method of regulating gene transcription. Histone acetylation is a reversible event that reduces chromosomal condensation to promote transcription, and the acetylation of these lysine residues is regulated by transcription factors that contain histone acetyltransferase (HAT) activity. While transcription factors with HAT activity act as transcription co-activators, histone deacetylase (HDAC) enzymes are co-repressors that reverse the effects of acetylation by reducing the level of lysine acetylation and increasing chromosomal condensation.
Sirtuins (silent information regulator) are a group of NAD-dependent deacetylases that target histones. As their name implies, they maintain gene silencing by hypoacetylating histones and have been reported to aid in maintaining genomic stability (8).
Cytoplasmic proteins may also be acetylated, and therefore acetylation seems to play a greater role in cell biology than simply transcriptional regulation (9). Furthermore, crosstalk between acetylation and other post-translational modifications, including phosphorylation, ubiquitination and methylation, can modify the biological function of the acetylated protein (10).
Lipidation
Lipidation is a method to target proteins to membranes in organelles (endoplasmic reticulum [ER], Golgi apparatus, mitochondria), vesicles (endosomes, lysosomes) and the plasma membrane. The four types of lipidation described here are GPI anchoring, N-myristoylation, S-palmitoylation and S-prenylation.
Each type of modification gives proteins distinct membrane affinities, although all types of lipidation increase the hydrophobicity of a protein and thus its affinity for membranes. The different types of lipidation are not mutually exclusive, in that two or more lipids can be attached to a given protein.
GPI anchors tether cell surface proteins to the plasma membrane. These hydrophobic moieties are prepared in the ER, where they are then added to the nascent protein en bloc. GPI-anchored proteins are often localized to cholesterol- and sphingolipid-rich lipid rafts, which act as signaling platforms on the plasma membrane.
N-myristoylation is a method to give proteins a hydrophobic handle for membrane localization. The myristoyl group is a 14-carbon saturated fatty acid (C14), which gives the protein sufficient hydrophobicity and affinity for membranes, but not enough to permanently anchor the protein in the membrane. N-myristoylation can therefore act as a conformational localization switch, in which protein conformational changes influence the availability of the handle for membrane attachment.
N-myristoylation, facilitated specifically by N-myristoyltransferase (NMT), uses myristoyl-CoA to attach the myristoyl group to the N-terminal glycine. This PTM requires methionine cleavage prior to addition of the myristoyl group because methionine is the N-terminal amino acid of all eukaryotic proteins.
S-palmitoylation adds a C16 palmitoyl group from palmitoyl-CoA to the thiolate side chain of cysteine residues via palmitoyl acyl transferases (PATs). Because of the longer hydrophobic group, this anchor can permanently anchor the protein to the membrane. S-palmitoylation is used as an on/off switch to regulate membrane localization.
S-prenylation covalently adds a farnesyl (C15) or geranylgeranyl (C20) group to specific cysteine residues within 5 amino acids from the C-terminus via farnesyl transferase (FT) or geranylgeranyl transferases (GGT I and II). All members of the Ras superfamily are prenylated. These proteins have specific 4-amino acid motifs at the C-terminus that determine the type of prenylation at single or dual cysteines. Prenylation occurs in the ER and is often part of a stepwise process of PTMs that is followed by proteolytic cleavage by Rce1 and methylation by isoprenyl cysteine methyltransferase (ICMT).
Proteolysis
Peptide bonds are indefinitely stable under physiological conditions, and therefore cells require some mechanism to break these bonds. Proteases comprise a family of enzymes that cleave the peptide bonds of proteins and are critical in antigen processing, apoptosis, surface protein shedding and cell signaling.
Degradative proteolysis is critical to remove unassembled protein subunits and misfolded proteins and to maintain protein concentrations at homeostatic levels.
Proteolysis is a thermodynamically favorable and irreversible reaction. Therefore, protease activity is tightly regulated to avoid uncontrolled proteolysis through temporal and/or spatial control mechanisms including regulation by cleavage in cis or trans and compartmentalization (e.g., proteasomes, lysosomes).
The diverse family of proteases can be classified by the site of action, such as aminopeptidases and carboxypeptidases, which cleave at the amino or carboxy terminus of a protein, respectively. Another type of classification is based on the active site groups of a given protease that are involved in proteolysis. Based on this classification strategy, greater than 90% of known proteases fall into one of four categories as follows:
Serine proteases
Cysteine proteases
Aspartic acid proteases
Zinc metalloproteases
References
1. International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature. 431, 931-45.
2. Jensen O. N. (2004) Modification-specific proteomics: Characterization of post-translational modifications by mass spectrometry. Curr Opin Chem Biol. 8, 33-41.
3. Ayoubi T. A. and Van De Ven W. J. (1996) Regulation of gene expression by alternative promoters. FASEB J. 10, 453-60.
4. Walsh C. (2006) Posttranslational modification of proteins: Expanding nature’s inventory. Englewood, Colo.: Roberts and Co. Publishers. xxi, 490 p.
5. Gaston B. M. et al. (2003) S-nitrosylation signaling in cell biology. Mol Interv. 3, 253-63.
6. Jaffrey S. R. and Snyder S. H. (2001) The biotin switch method for the detection of S-nitrosylated proteins. Sci STKE. 2001, pl1.
7. Han P. and Chen C. (2008) Detergent-free biotin switch combined with liquid chromatography/tandem mass spectrometry in the analysis of S-nitrosylated proteins. Rapid Commun Mass Spectrom. 22, 1137-45.
8. Imai S. et al. (2000) Transcriptional silencing and longevity protein SIR2 is an NAD-dependent histone deacetylase. Nature. 403, 795-800.
9. Glozak M. A. et al. (2005) Acetylation and deacetylation of non-histone proteins. Gene. 363, 15-23.
10. Yang X. J. and Seto E. (2008) Lysine acetylation: Codified crosstalk with other posttranslational modifications. Mol Cell. 31, 449-61.
Protein phosphorylation
From Wikipedia, the free encyclopedia
Protein phosphorylation is a post-translational modification of proteins in which a serine, a threonine or a tyrosine residue is phosphorylated by a protein kinase by the addition of a covalently bound phosphate group. Regulation of proteins by phosphorylation is one of the most common modes of regulation of protein function, and is often termed “phosphoregulation”. In almost all cases of phosphoregulation, the protein switches between a phosphorylated and an unphosphorylated form, and one of these two is an active form, while the other one is an inactive form.
In some reactions, the purpose of phosphorylation is to “activate” or “volatize” a molecule, increasing its energy so it is able to participate in a subsequent reaction with a negative free-energy change. All kinases require a divalent metal ion such as Mg2+ or Mn2+ to be present, which stabilizes the high-energy bonds of the donor molecule (usually ATP or an ATP derivative) and allows phosphorylation to occur.
In other reactions, phosphorylation of a protein substrate can inhibit its activity (as when AKT phosphorylates the enzyme GSK-3). One common mechanism for phosphorylation-mediated enzyme inhibition was demonstrated in the tyrosine kinase called “src” (pronounced “sarc”, see: Src (gene)). When src is phosphorylated on a particular tyrosine, it folds on itself, and thus masks its own kinase domain, and is thus turned “off”.
In still other reactions, phosphorylation of a protein causes it to be bound to other proteins which have “recognition domains” for a phosphorylated tyrosine, serine, or threonine motif. As a result of binding a particular protein, a distinct signaling system may be activated or inhibited.
In the late 1990s it was recognized that phosphorylation of some proteins causes them to be degraded by the ATP-dependent ubiquitin/proteasome pathway. These target proteins become substrates for particular E3 ubiquitin ligases only when they are phosphorylated.
Oxidative phosphorylation
From Wikipedia, the free encyclopedia
Oxidative phosphorylation (or OXPHOS in short) is the metabolic pathway in which the mitochondria in cells use their structure, enzymes, and energy released by the oxidation of nutrients to reform ATP. Although the many forms of life on earth use a range of different nutrients, ATP is the molecule that supplies energy to metabolism. Almost all aerobic organisms carry out oxidative phosphorylation. This pathway is probably so pervasive because it is a highly efficient way of releasing energy, compared to alternative fermentation processes such as anaerobic glycolysis.
During oxidative phosphorylation, electrons are transferred from electron donors to electron acceptors such as oxygen, in redox reactions. These redox reactions release energy, which is used to form ATP. In eukaryotes, these redox reactions are carried out by a series of protein complexes within the inner membrane of the cell’s mitochondria, whereas in prokaryotes these proteins are located in the cell’s plasma membrane.
These linked sets of proteins are called electron transport chains. In eukaryotes, five main protein complexes are involved, whereas in prokaryotes many different enzymes are present, using a variety of electron donors and acceptors.
The energy released by electrons flowing through this electron transport chain is used to transport protons across the inner mitochondrial membrane, in a process called electron transport. This generates potential energy in the form of a pH gradient and an electrical potential across this membrane. This store of energy is tapped by allowing protons to flow back across the membrane and down this gradient, through a large enzyme called ATP synthase; this process is known as chemiosmosis. This enzyme uses this energy to generate ATP from adenosine diphosphate (ADP), in a phosphorylation reaction. This reaction is driven by the proton flow, which forces the rotation of a part of the enzyme; the ATP synthase is a rotary mechanical motor.
Oxidative phosphorylation works by using energy-releasing chemical reactions to drive energy-requiring reactions: The two sets of reactions are said to be coupled. This means one cannot occur without the other. The flow of electrons through the electron transport chain, from electron donors such as NADH to electron acceptors such as oxygen, is an exergonic process – it releases energy, whereas the synthesis of ATP is an endergonic process, which requires an input of energy. Both the electron transport chain and the ATP synthase are embedded in a membrane, and energy is transferred from the electron transport chain to the ATP synthase by movements of protons across this membrane, in a process called chemiosmosis.[1] In practice, this is like a simple electric circuit, with a current of protons being driven from the negative N-side of the membrane to the positive P-side by the proton-pumping enzymes of the electron transport chain. These enzymes are like a battery, as they perform work to drive current through the circuit. The movement of protons creates an electrochemical gradient across the membrane, which is often called the proton-motive force. It has two components: a difference in proton concentration (an H+ gradient, ΔpH) and a difference in electric potential, with the N-side having a negative charge.[2]
ATP synthase releases this stored energy by completing the circuit and allowing protons to flow down the electrochemical gradient, back to the N-side of the membrane.[3] This kinetic energy drives the rotation of part of the enzyme’s structure and couples this motion to the synthesis of ATP.
The two components of the proton-motive force are thermodynamically equivalent: In mitochondria, the largest part of energy is provided by the potential; in alkaliphile bacteria the electrical energy even has to compensate for a counteracting inverse pH difference. Inversely, chloroplasts operate mainly on ΔpH. However, they also require a small membrane potential for the kinetics of ATP synthesis. At least in the case of the fusobacterium P. modestum it drives the counter-rotation of subunits a and c of the FO motor of ATP synthase.[2]
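The statement that the two components are thermodynamically equivalent can be made concrete with the standard textbook expression Δp = Δψ − (2.303·R·T/F)·ΔpH, which converts a pH difference into millivolts. This formula is not given in the text above and is supplied here as an assumption, with both differences taken as P-side minus N-side; the function name and the example values are hypothetical.

```python
from math import log

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1
FARADAY = 96485.33212  # C mol^-1


def proton_motive_force_mv(delta_psi_mv, delta_ph, temperature_k=310.15):
    """Proton-motive force in mV from its electrical and chemical parts.

    delta_psi_mv: membrane potential, P-side minus N-side, in mV
    delta_ph: pH difference, P-side minus N-side (negative if P is acidic)
    """
    # ln(10)*R*T/F is the voltage equivalent of one pH unit (about 61.5 mV at 37 C).
    mv_per_ph_unit = 1000.0 * log(10) * GAS_CONSTANT * temperature_k / FARADAY
    return delta_psi_mv - mv_per_ph_unit * delta_ph


# Hypothetical values: 150 mV potential, P-side 0.5 pH units more acidic.
print(proton_motive_force_mv(150.0, -0.5))
```

With these illustrative numbers the electrical term dominates, and the pH term adds roughly 30 mV.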
The amount of energy released by oxidative phosphorylation is high, compared with the amount produced by anaerobic fermentation. Glycolysis produces only 2 ATP molecules, but somewhere between 30 and 36 ATPs are produced by the oxidative phosphorylation of the 10 NADH and 2 succinate molecules made by converting one molecule of glucose to carbon dioxide and water,[4] while each cycle of beta oxidation of a fatty acid yields about 14 ATPs. These ATP yields are theoretical maximum values; in practice, some protons leak across the membrane, lowering the yield of ATP.[5]
The electron transport chain carries both protons and electrons, passing electrons from donors to acceptors, and transporting protons across a membrane. These processes use both soluble and protein-bound transfer molecules. In mitochondria, electrons are transferred within the intermembrane space by the water-soluble electron transfer protein cytochrome c.[6] This carries only electrons, and these are transferred by the reduction and oxidation of an iron atom that the protein holds within a heme group in its structure. Cytochrome c is also found in some bacteria, where it is located within the periplasmic space.[7]
Reduction of coenzyme Q from its ubiquinone form (Q) to the reduced ubiquinol form (QH2).
Within the inner mitochondrial membrane, the lipid-soluble electron carrier coenzyme Q10 (Q) carries both electrons and protons by a redox cycle.[8] This small benzoquinone molecule is very hydrophobic, so it diffuses freely within the membrane. When Q accepts two electrons and two protons, it becomes reduced to the ubiquinol form (QH2); when QH2 releases two electrons and two protons, it becomes oxidized back to the ubiquinone (Q) form. As a result, if two enzymes are arranged so that Q is reduced on one side of the membrane and QH2 oxidized on the other, ubiquinone will couple these reactions and shuttle protons across the membrane.[9] Some bacterial electron transport chains use different quinones, such as menaquinone, in addition to ubiquinone.[10]
Within proteins, electrons are transferred between flavin cofactors,[3][11] iron–sulfur clusters, and cytochromes. There are several types of iron–sulfur cluster. The simplest kind found in the electron transfer chain consists of two iron atoms joined by two atoms of inorganic sulfur; these are called [2Fe–2S] clusters. The second kind, called [4Fe–4S], contains a cube of four iron atoms and four sulfur atoms. Each iron atom in these clusters is coordinated by an additional amino acid, usually by the sulfur atom of cysteine. Metal ion cofactors undergo redox reactions without binding or releasing protons, so in the electron transport chain they serve solely to transport electrons through proteins. Electrons move quite long distances through proteins by hopping along chains of these cofactors.[12] This occurs by quantum tunnelling, which is rapid over distances of less than 1.4×10⁻⁹ m.[13]
Many catabolic biochemical processes, such as glycolysis, the citric acid cycle, and beta oxidation, produce the reduced coenzyme NADH. This coenzyme contains electrons that have a high transfer potential; in other words, they will release a large amount of energy upon oxidation. However, the cell does not release this energy all at once, as this would be an uncontrollable reaction. Instead, the electrons are removed from NADH and passed to oxygen through a series of enzymes that each release a small amount of the energy. This set of enzymes, consisting of complexes I through IV, is called the electron transport chain and is found in the inner membrane of the mitochondrion. Succinate is also oxidized by the electron transport chain, but feeds into the pathway at a different point.
In eukaryotes, the enzymes in this electron transport system use the energy released from the oxidation of NADH to pump protons across the inner membrane of the mitochondrion. This causes protons to build up in the intermembrane space, and generates an electrochemical gradient across the membrane. The energy stored in this potential is then used by ATP synthase to produce ATP. Oxidative phosphorylation in the eukaryotic mitochondrion is the best-understood example of this process. The mitochondrion is present in almost all eukaryotes, with the exception of anaerobic protozoa such as Trichomonas vaginalis that instead reduce protons to hydrogen in a remnant mitochondrion called a hydrogenosome.[14]
Typical respiratory enzymes and substrates in eukaryotes.
NADH-coenzyme Q oxidoreductase, also known as NADH dehydrogenase or complex I, is the first protein in the electron transport chain.[16] Complex I is a giant enzyme; the mammalian complex I has 46 subunits and a molecular mass of about 1,000 kilodaltons (kDa).[17] The structure is known in detail only from a bacterium;[18][19] in most organisms the complex resembles a boot with a large “ball” poking out from the membrane into the mitochondrion.[20][21]
Complex I or NADH-Q oxidoreductase. The abbreviations are discussed in the text. In all diagrams of respiratory complexes in this article, the matrix is at the bottom, with the intermembrane space above.
The genes that encode the individual proteins are contained in both the cell nucleus and the mitochondrial genome, as is the case for many enzymes present in the mitochondrion.
The reaction catalyzed by this enzyme is the two-electron oxidation of NADH by coenzyme Q10, or ubiquinone (represented as Q in the equation below), a lipid-soluble quinone that is found in the mitochondrial membrane:
NADH + Q + 5 H+ (matrix) → NAD+ + QH2 + 4 H+ (intermembrane space)
The start of the reaction, and indeed of the entire electron chain, is the binding of an NADH molecule to complex I and the donation of two electrons. The electrons enter complex I via a prosthetic group attached to the complex, flavin mononucleotide (FMN). The addition of electrons to FMN converts it to its reduced form, FMNH2. The electrons are then transferred through a series of iron–sulfur clusters: the second kind of prosthetic group present in the complex.[18] There are both [2Fe–2S] and [4Fe–4S] iron–sulfur clusters in complex I.
As the electrons pass through this complex, four protons are pumped from the matrix into the intermembrane space. Exactly how this occurs is unclear, but it seems to involve conformational changes in complex I that cause the protein to bind protons on the N-side of the membrane and release them on the P-side of the membrane.[22] Finally, the electrons are transferred from the chain of iron–sulfur clusters to a ubiquinone molecule in the membrane.[16] Reduction of ubiquinone also contributes to the generation of a proton gradient, as two protons are taken up from the matrix as it is reduced to ubiquinol (QH2).
Succinate-Q oxidoreductase, also known as complex II or succinate dehydrogenase, is a second entry point to the electron transport chain.[23] It is unusual because it is the only enzyme that is part of both the citric acid cycle and the electron transport chain. Complex II consists of four protein subunits and contains a bound flavin adenine dinucleotide (FAD) cofactor, iron–sulfur clusters, and a heme group that does not participate in electron transfer to coenzyme Q, but is believed to be important in decreasing production of reactive oxygen species.[24][25]
It oxidizes succinate to fumarate and reduces ubiquinone. As this reaction releases less energy than the oxidation of NADH, complex II does not transport protons across the membrane and does not contribute to the proton gradient.
In some eukaryotes, such as the parasitic worm Ascaris suum, an enzyme similar to complex II, fumarate reductase (menaquinol:fumarate oxidoreductase, or QFR), operates in reverse to oxidize ubiquinol and reduce fumarate. This allows the worm to survive in the anaerobic environment of the large intestine, carrying out anaerobic oxidative phosphorylation with fumarate as the electron acceptor.[26] Another unconventional function of complex II is seen in the malaria parasite Plasmodium falciparum. Here, the reversed action of complex II as an oxidase is important in regenerating ubiquinol, which the parasite uses in an unusual form of pyrimidine biosynthesis.[27]
Electron transfer flavoprotein-Q oxidoreductase
Electron transfer flavoprotein-ubiquinone oxidoreductase (ETF-Q oxidoreductase), also known as electron transferring-flavoprotein dehydrogenase, is a third entry point to the electron transport chain. It is an enzyme that accepts electrons from electron-transferring flavoprotein in the mitochondrial matrix, and uses these electrons to reduce ubiquinone.[28] This enzyme contains a flavin and a [4Fe–4S] cluster, but, unlike the other respiratory complexes, it attaches to the surface of the membrane and does not cross the lipid bilayer.[29]
In mammals, this metabolic pathway is important in beta oxidation of fatty acids and catabolism of amino acids and choline, as it accepts electrons from multiple acetyl-CoA dehydrogenases.[30][31] In plants, ETF-Q oxidoreductase is also important in the metabolic responses that allow survival in extended periods of darkness.[32]
Q-cytochrome c oxidoreductase is also known as cytochrome c reductase, cytochrome bc1 complex, or simply complex III.[33][34] In mammals, this enzyme is a dimer, with each subunit complex containing 11 protein subunits, a [2Fe–2S] iron–sulfur cluster and three cytochromes: one cytochrome c1 and two b cytochromes.[35] A cytochrome is a kind of electron-transferring protein that contains at least one heme group. The iron atoms inside complex III’s heme groups alternate between a reduced ferrous (+2) and oxidized ferric (+3) state as the electrons are transferred through the protein.
The two electron transfer steps in complex III: Q-cytochrome c oxidoreductase. After each step, Q (in the upper part of the figure) leaves the enzyme.
The reaction catalyzed by complex III is the oxidation of one molecule of ubiquinol and the reduction of two molecules of cytochrome c, a heme protein loosely associated with the mitochondrion. Unlike coenzyme Q, which carries two electrons, cytochrome c carries only one electron.
As only one of the electrons can be transferred from the QH2 donor to a cytochrome c acceptor at a time, the reaction mechanism of complex III is more elaborate than those of the other respiratory complexes, and occurs in two steps called the Q cycle.[36] In the first step, the enzyme binds three substrates. The first, QH2, is oxidized, with one electron being passed to the second substrate, cytochrome c. The two protons released from QH2 pass into the intermembrane space. The third substrate is Q, which accepts the second electron from the QH2 and is reduced to Q•−, the ubisemiquinone free radical. The first two substrates are released, but this ubisemiquinone intermediate remains bound. In the second step, a second molecule of QH2 is bound and again passes its first electron to a cytochrome c acceptor. The second electron is passed to the bound ubisemiquinone, reducing it to QH2 as it gains two protons from the mitochondrial matrix. This QH2 is then released from the enzyme.[37]
As coenzyme Q is reduced to ubiquinol on the inner side of the membrane and oxidized to ubiquinone on the other, a net transfer of protons across the membrane occurs, adding to the proton gradient.[3] The rather complex two-step mechanism by which this occurs is important, as it increases the efficiency of proton transfer. If, instead of the Q cycle, one molecule of QH2 were used to directly reduce two molecules of cytochrome c, the efficiency would be halved, with only one proton transferred per cytochrome c reduced.[3]
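The bookkeeping behind this efficiency claim can be followed with a small tally. The sketch below is only an accounting exercise built from the two steps described above, not a kinetic model; the class and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Running totals for one pass through a hypothetical mechanism."""
    cyt_c_reduced: int = 0        # cytochrome c molecules reduced
    protons_released: int = 0     # protons released into the intermembrane space
    net_qh2_consumed: int = 0     # QH2 oxidized minus QH2 regenerated


def q_cycle() -> Tally:
    """Two-step Q cycle as described in the text."""
    t = Tally()
    for _ in range(2):            # each step oxidizes one QH2
        t.net_qh2_consumed += 1
        t.protons_released += 2   # both protons of QH2 go to the intermembrane space
        t.cyt_c_reduced += 1      # first electron goes to cytochrome c
    # The two "second" electrons rebuild one QH2 from bound Q,
    # taking two protons from the matrix.
    t.net_qh2_consumed -= 1
    return t


def direct_transfer() -> Tally:
    """Hypothetical alternative: one QH2 reduces two cytochrome c directly."""
    return Tally(cyt_c_reduced=2, protons_released=2, net_qh2_consumed=1)


def protons_per_cyt_c(t: Tally) -> float:
    return t.protons_released / t.cyt_c_reduced


print(protons_per_cyt_c(q_cycle()))          # Q cycle
print(protons_per_cyt_c(direct_transfer()))  # direct transfer
```

Both routes consume one QH2 net and reduce two cytochrome c molecules, but the Q cycle releases twice as many protons per cytochrome c, which is the halving of efficiency the paragraph refers to.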
Cytochrome c oxidase, also known as complex IV, is the final protein complex in the electron transport chain.[38] The mammalian enzyme has an extremely complicated structure and contains 13 subunits, two heme groups, as well as multiple metal ion cofactors – in all, three atoms of copper, one of magnesium and one of zinc.[39]
This enzyme mediates the final reaction in the electron transport chain and transfers electrons to oxygen, while pumping protons across the membrane.[40] Oxygen, the final (terminal) electron acceptor, is reduced to water in this step. Both the direct pumping of protons and the consumption of matrix protons in the reduction of oxygen contribute to the proton gradient. The reaction catalyzed is the oxidation of cytochrome c and the reduction of oxygen:
4 Fe2+-cytochrome c + 8 H+ (matrix) + O2 → 4 Fe3+-cytochrome c + 2 H2O + 4 H+ (intermembrane space)
The original model for how the respiratory chain complexes are organized was that they diffuse freely and independently in the mitochondrial membrane.[17] However, recent data suggest that the complexes might form higher-order structures called supercomplexes or “respirasomes.”[49] In this model, the various complexes exist as organized sets of interacting enzymes.[50] These associations might allow channeling of substrates between the various enzyme complexes, increasing the rate and efficiency of electron transfer.[51] Within such mammalian supercomplexes, some components would be present in higher amounts than others, with some data suggesting a ratio between complexes I/II/III/IV and the ATP synthase of approximately 1:1:3:7:4.[52] However, the debate over this supercomplex hypothesis is not completely resolved, as some data do not appear to fit with this model.[17][53]
Reversible protein phosphorylation, principally on serine, threonine or tyrosine residues, is one of the most important and well-studied post-translational modifications. Phosphorylation plays critical roles in the regulation of many cellular processes including cell cycle, growth, apoptosis and signal transduction pathways.
Phosphorylation is the most common mechanism of regulating protein function and transmitting signals throughout the cell. While phosphorylation has been observed in bacterial proteins, it is considerably more pervasive in eukaryotic cells. It is estimated that one-third of the proteins in the human proteome are substrates for phosphorylation at some point (1). Indeed, phosphoproteomics has been established as a branch of proteomics that focuses solely on the identification and characterization of phosphorylated proteins.
Mechanism of Phosphorylation
While phosphorylation is a prevalent post-translational modification (PTM) for regulating protein function, it only occurs at the side chains of three amino acids, serine, threonine and tyrosine, in eukaryotic cells. These amino acids have a nucleophilic (–OH) group that attacks the terminal phosphate group (γ-PO₃²⁻) on the universal phosphoryl donor adenosine triphosphate (ATP), resulting in the transfer of the phosphate group to the amino acid side chain. This transfer is facilitated by magnesium (Mg²⁺), which chelates the γ- and β-phosphate groups to lower the threshold for phosphoryl transfer to the nucleophilic (–OH) group. This reaction is unidirectional because of the large amount of free energy that is released when the phosphate-phosphate bond in ATP is broken to form adenosine diphosphate (ADP).
Diagram of serine phosphorylation. Enzyme-catalyzed proton transfer from the (–OH) group on serine stimulates the nucleophilic attack of the γ-phosphate group on ATP, resulting in transfer of the phosphate group to serine to form phosphoserine and ADP. (—B:) indicates the enzyme base that initiates proton transfer.
For a large subset of proteins, phosphorylation is tightly associated with protein activity and is a key point of protein function regulation. Phosphorylation regulates protein function and cell signaling by causing conformational changes in the phosphorylated protein. These changes can affect the protein in two ways. First, conformational changes regulate the catalytic activity of the protein. Thus, a protein can be either activated or inactivated by phosphorylation. Second, phosphorylated proteins recruit neighboring proteins that have structurally conserved domains that recognize and bind to phosphomotifs. These domains show specificity for distinct amino acids. For example, Src homology 2 (SH2) and phosphotyrosine binding (PTB) domains show specificity for phosphotyrosine (pY), although distinctions in these two structures give each domain specificity for distinct phosphotyrosine motifs (2). Phosphoserine (pS) recognition domains include MH2 and the WW domain, while phosphothreonine (pT) is recognized by forkhead-associated (FHA) domains. The ability of phosphoproteins to recruit other proteins is critical for signal transduction, in which downstream effector proteins are recruited to phosphorylated signaling proteins.
Protein phosphorylation is a reversible PTM that is mediated by kinases and phosphatases, which phosphorylate and dephosphorylate substrates, respectively. These two families of enzymes facilitate the dynamic nature of phosphorylated proteins in a cell. Indeed, the size of the phosphoproteome in a given cell is dependent upon the temporal and spatial balance of kinase and phosphatase concentrations in the cell and the catalytic efficiency of a particular phosphorylation site.
Phosphorylation is a reversible PTM that regulates protein function. Left panel: Protein kinases mediate phosphorylation at serine, threonine and tyrosine side chains, and phosphatases reverse protein phosphorylation by hydrolyzing the phosphate group. Right panel: Phosphorylation causes conformational changes in proteins that either activate (top) or inactivate (bottom) protein function.
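The dependence of phosphorylation levels on the kinase/phosphatase balance can be illustrated with a deliberately simple model. The sketch below assumes a single site that switches between two states with first-order rate constants `k_kinase` and `k_phosphatase`; the two-state model, the rate constants and their values are assumptions made for illustration and do not come from the text.

```python
def steady_state_fraction(k_kinase, k_phosphatase):
    """Phosphorylated fraction at steady state for a two-state site.

    ASSUMPTION: first-order kinetics, dP/dt = k_kinase*(1 - P) - k_phosphatase*P.
    Setting dP/dt = 0 gives P = k_kinase / (k_kinase + k_phosphatase).
    """
    return k_kinase / (k_kinase + k_phosphatase)


def simulate_fraction(k_kinase, k_phosphatase, dt=0.01, steps=2000, start=0.0):
    """Integrate the same equation with explicit Euler steps as a cross-check."""
    p = start
    for _ in range(steps):
        p += dt * (k_kinase * (1.0 - p) - k_phosphatase * p)
    return p


# Hypothetical rate constants (arbitrary units): kinase three times faster.
print(steady_state_fraction(3.0, 1.0))
print(simulate_fraction(3.0, 1.0))
```

In this toy picture, raising phosphatase activity or lowering kinase activity shifts the site toward the unphosphorylated state, which is one way to read the statement that the size of the phosphoproteome depends on the balance between the two enzyme families.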
Protein Kinases
Kinases are enzymes that facilitate phosphate group transfer to substrates. Greater than 500 kinases have been predicted in the human proteome; this subset of proteins comprises the human kinome (3). Substrates for kinase activity are diverse and include lipids, carbohydrates, nucleotides and proteins.
ATP is the cosubstrate for almost all protein kinases, although guanosine triphosphate is used by a small number of kinases. ATP is the ideal structure for the transfer of α-, β- or γ-phosphate groups for nucleotidyl-, pyrophosphoryl- or phosphoryltransfer, respectively (4). While the substrate specificity of kinases varies, the ATP-binding site is generally conserved (5).
Protein kinases are categorized into subfamilies that show specificity for distinct catalytic domains and include tyrosine kinases or serine/threonine kinases. Approximately 80% of the mammalian kinome comprises serine/threonine kinases, and >90% of the phosphoproteome consists of pS and pT. Indeed, studies have shown that the relative abundance ratio of pS:pT:pY in a cell is 1800:200:1 (6). Although pY is not as prevalent as pS and pT, global tyrosine phosphorylation is at the forefront of biomedical research because of its relation to human disease via the dysregulation of receptor tyrosine kinases (RTKs).
Protein kinase substrate specificity is based not only on the target amino acid but also on consensus sequences that flank it (7). These consensus sequences allow some kinases to phosphorylate single proteins and others to phosphorylate multiple substrates (>300) (5). Additionally, kinases can phosphorylate single or multiple amino acids on an individual protein if the kinase-specific consensus sequences are available.
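The pS:pT:pY abundance ratio of 1800:200:1 quoted above is easier to compare with the ">90%" figure once it is converted to fractions. The short sketch below does only that conversion; the function name is invented for illustration.

```python
def site_fractions(ratio):
    """Convert a relative abundance ratio into fractions that sum to 1."""
    total = sum(ratio.values())
    return {site: count / total for site, count in ratio.items()}


# Ratio taken from the text: pS:pT:pY = 1800:200:1
fractions = site_fractions({"pS": 1800, "pT": 200, "pY": 1})

for site, fraction in fractions.items():
    print(f"{site}: {fraction * 100:.2f}%")
```

Under this ratio, pS and pT together make up well over 90% of phosphorylated sites, and pY accounts for roughly 0.05%.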
Kinases have regulatory subunits that function as activating or autoinhibitory domains and have various regulatory substrates. Phosphorylation of these subunits is a common approach to regulating kinase activity (8). Most protein kinases are dephosphorylated and inactive in the basal state and are activated by phosphorylation. A small number of kinases are constitutively active and are made intrinsically inefficient, or inactive, when phosphorylated. Some kinases, such as Src, require a combination of phosphorylation and dephosphorylation to become active, indicating the high regulation of this proto-oncogene. Scaffolding and adaptor proteins can also influence kinase activity by regulating the spatial relationship between kinases and upstream regulators and downstream substrates.
Signal Transduction Cascades
The reversibility of protein phosphorylation makes this type of PTM ideal for signal transduction, which allows cells to rapidly respond to intracellular or extracellular stimuli. Signal transduction cascades are characterized by one or more proteins physically sensing cues, either through ligand binding, cleavage or some other response, that then relay the signal to second messengers and signaling enzymes. In the case of phosphorylation, these receptors activate downstream kinases, which then phosphorylate and activate their cognate downstream substrates, including additional kinases, until the specific response is achieved. Signal transduction cascades can be linear, in which kinase A activates kinase B, which activates kinase C and so forth. Signaling pathways have also been discovered that amplify the initial signal; kinase A activates multiple kinases, which in turn activate additional kinases. With this type of signaling, a single molecule, such as a growth factor, can activate global cellular programs such as proliferation (9).
Signal transduction cascades amplify the signal output. External and internal stimuli induce a wide range of cellular responses through a series of second messengers and enzymes. Linear signal transduction pathways yield the sequential activation of a discrete number of downstream effectors, while other stimuli elicit signal cascades that amplify the initial stimulus for large-scale or global cellular responses.
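The difference between a linear pathway and an amplifying cascade can be made concrete with a toy count of activated molecules per tier. The sketch assumes every active kinase activates the same fixed number of downstream kinases (`fanout`); real cascades are not this uniform, and the numbers are purely illustrative.

```python
def activated_per_tier(tiers, fanout):
    """Count active molecules at each tier, starting from a single stimulus.

    ASSUMPTION: each active molecule activates exactly `fanout` molecules
    in the next tier; this is an idealization for illustration.
    """
    counts = [1]
    for _ in range(tiers):
        counts.append(counts[-1] * fanout)
    return counts


linear = activated_per_tier(tiers=3, fanout=1)      # kinase A -> B -> C
amplified = activated_per_tier(tiers=3, fanout=10)  # each kinase activates ten

print("linear:   ", linear)
print("amplified:", amplified)
```

With a fanout of one the signal is relayed without gain, whereas any fanout above one grows geometrically with the number of tiers, which is how a single ligand-binding event can drive a cell-wide response.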
Protein Phosphatases
The intensity and duration of phosphorylation-dependent signaling is regulated by three mechanisms (5):
Removal of the activating ligand
Kinase or substrate proteolysis
Phosphatase-dependent dephosphorylation
The human proteome is estimated to contain approximately 150 protein phosphatases, which show specificity for pS/pT and pY residues (10,11). While dephosphorylation is the end goal of these two groups of phosphatases, they achieve it through separate mechanisms. Serine/threonine phosphatases mediate the direct hydrolysis of the phosphate group, attacking its phosphorus atom using a bimetallic (Fe/Zn) center, while tyrosine phosphatases form a covalent thiophosphoryl intermediate that facilitates removal of the phosphate group from the tyrosine residue.
Phosphorylation and Ubiquitylation
Almost all aspects of biology are regulated by reversible protein phosphorylation and ubiquitylation. Abnormalities in these pathways cause numerous diseases including cancer, neurodegeneration and inflammation – all conditions under intense scrutiny in our Unit. Deciphering how disruptions in phosphorylation and ubiquitin networks lead to disease will reveal novel drug targets and improved strategies to treat these maladies in the future.
Protein ubiquitylation is analogous to protein phosphorylation except that ubiquitin molecules are attached covalently to Lys residues, as opposed to phosphate groups becoming covalently attached to one or more Ser, Thr or Tyr residues. Like phosphorylation, ubiquitylation can alter protein properties and functions in every conceivable way. Ubiquitylation is likely to be a more versatile control mechanism than phosphorylation, as ubiquitin molecules can not only be linked to one or more amino acid residues on the same protein, but can also form ubiquitin chains.
Moreover, there are also several ubiquitin-like modifiers (ULMs), such as Nedd8, SUMO1, SUMO2, SUMO3, FAT10 and ISG15, which can become attached to proteins in reactions termed Neddylation, SUMOylation, Tenylation and ISGylation, while poly-SUMO chains (involving SUMO2 and SUMO3) are also formed in cells. Recent research has highlighted an exquisite interplay between phosphorylation and ubiquitin pathways that regulate many physiological systems. This includes pathways of relevance to understanding innate immunity, Parkinson’s disease and cancer, emphasising the importance of integrating phosphorylation and ubiquitylation research, and not considering these separate areas to be studied in isolation.
Protein ubiquitylation is an even more versatile control mechanism than protein phosphorylation
Phosphorylation | Ubiquitylation
Discovered 1955 | Discovered 1978
>500 protein kinases | ~10 E1s, ~40 E2s, >600 E3 ligases
140 protein phosphatases | ~100 deubiquitylases
Nobel Prize 1992 | Nobel Prize 2004
First drug approval 2001 (Gleevec) | First drug approval 2003 (Bortezomib)
16 drugs approved, >150 in clinical trials | 15 drugs in Phase I/II
Current sales of US$15 billion p.a. | Current sales of US$1.5 billion p.a.
30% of Pharma R&D | <<1% of Pharma R&D
History of the development of protein phosphorylation and ubiquitylation
The MRC-PPU research focuses on unravelling the roles of protein phosphorylation and ubiquitylation pathways that have strong links to understanding human disease. This is where we can make the best use of our expertise, grasp opportunities emerging from the golden era of genetic analysis of human disease, and make a significant contribution to medical research.
Our Principal Investigators (PIs) deploy a blend of creativity, curiosity, expertise and state-of-the-art technology to tackle their selected projects. Their aim is to uncover fundamentally new knowledge on how biological systems are controlled, hopefully shedding novel insights into the understanding and treatment of disease. Effective translation of our research will also be impossible without robust interactions with drug discovery units such as the MRC Technology Centre for Therapeutics Discovery, the University of Dundee’s Drug Discovery Unit and close collaboration with pharmaceutical companies.
The latter will be greatly enhanced by major collaborations with the six pharmaceutical companies that support the Division of Signal Transduction Therapy. Access to the exceptional support services available within the MRC-PPU and DSTT also helps to maximise the competitiveness of our research groups and reinforce collaborations with our external partners.
Central questions being addressed by our PIs include understanding how ubiquitin and phosphorylation pathways are organised, characterising the interplay between these pathways, determining how they recognise and respond to signals, and uncovering how disruption of these networks causes disease. The expectation is that the data, reagents and expertise emerging from our research and working effectively with clinicians and pharmaceutical industry will enable us to devise new
MIT Scientists on Proteomics: All the Proteins in the Mitochondrial Matrix identified