Schizophrenia genomics

Larry H. Bernstein, MD, FCAP, Curator

LPBI

Histone Methylation at H3K9; Evidence for a Restrictive Epigenome in Schizophrenia

Epigenetic changes are stable and long-lasting chromatin modifications that regulate genome-wide and local gene activity. The addition of two methyl groups to the 9th lysine of histone H3 (H3K9me2) by histone methyltransferases (HMT) leads to a restrictive chromatin state, and thus reduced levels of gene transcription. Given the numerous reports of transcriptional down-regulation of candidate genes in schizophrenia, we tested the hypothesis that this illness can be characterized by a restrictive epigenome.

METHODS   We obtained parietal cortical samples from the Stanley Foundation Neuropathology Consortium and lymphocyte samples from the University of Illinois at Chicago (UIC). In both tissues we measured mRNA expression of HMTs GLP, SETDB1 and G9a via real-time RT-PCR and H3K9me2 levels via western blot. Clinical rating scales were obtained from the UIC cohort.

RESULTS   A diagnosis of schizophrenia is a significant predictor for increased GLP, SETDB1 mRNA expression and H3K9me2 levels in both postmortem brain and lymphocyte samples. G9a mRNA is significantly increased in the UIC lymphocyte samples as well. Increased HMT mRNA expression is associated with worsening of specific symptoms, longer durations of illness and a family history of schizophrenia.

CONCLUSIONS   These data support the hypothesis of a restrictive epigenome in schizophrenia, one that may be associated with symptoms that are notoriously treatment resistant. The histone methyltransferases measured here are potential future therapeutic targets for small-molecule pharmacology and for improving patient prognosis.

Schizophrenia is conceptualized as a disorder of gene transcription and regulation. Consequently, chromatin is the ideal scaffold on which to examine the pathophysiology of schizophrenia, as it constitutes the interface between the underlying genetic code and its surrounding biochemical environment. Through post-translational modifications of histone proteins, a gene can be transcriptionally active in a ‘euchromatic’ environment, temporarily quieted in ‘facultative heterochromatin,’ or completely silenced in ‘constitutive heterochromatin’ (Zhang and Reinberg 2001). Post-translational modifications to lysine 9 of the H3 protein (H3K9) are uniquely able to reflect these three levels of transcriptional regulation. H3K9 residues located in the promoter regions of actively transcribed genes are often acetylated (H3K9acetyl). Conversely, quieted transcription in gene-rich areas of the genome is often associated with mono- or dimethylated H3K9 (H3K9me2), while completely silenced areas of the genome are associated with trimethylated H3K9 (H3K9me3). In particular, the formation of H3K9me2 is catalyzed by histone methyltransferases (HMTs), including Eu-HMTase2 (G9a), Eu-HMTase1 (GLP), and SETDB1 (Krishnan et al. 2011). The different degrees of lysine methylation are possible due to the cooperation of these HMTs, which are able to form large heteromeric complexes (Fritsch et al. 2010).

H3K9 methylation has not been extensively studied in the brain, and until recently the regulation and role of the enzymes responsible for its formation were not known. Postnatal, neuron-specific GLP/G9a knockdown in mice produces a significant decrease in global H3K9me2 levels and inappropriate gene expression, leading to deficits in learning and reductions in exploratory behavior and motivation (Schaefer et al. 2009; Shinkai and Tachibana 2011; Tachibana et al. 2005; Tzeng et al. 2007). In humans, deletions or loss-of-function mutations of GLP result in Kleefstra Syndrome, characterized by a severe learning disability and developmental delay (Nillesen et al. 2011; Kleefstra et al. 2005). Also in humans, increased SETDB1 mRNA expression and resultant elevated H3K9me3 levels have been documented in Huntington Disease (HD) (Ryu et al. 2006; Fox et al. 2004).

A hallmark of schizophrenia is aberrant gene regulation, with the vast majority of studies reporting a down-regulation of gene transcription, suggesting that the epigenome of patients with schizophrenia is restrictive (Akbarian et al. 1995; Guidotti et al. 2000; Fatemi et al. 2005; Impagnatiello et al. 1998; Jindal et al. 2010). Postmortem brain studies indicate a reduction of an open histone modification, H3K4me3, and elevated mRNA expression of the histone deacetylase HDAC1 (Cheung et al. 2010; Sharma et al. 2008). Peripheral blood mononuclear cells have been used successfully in clinical studies of subjects with depression, alcoholism, and schizophrenia as a reflection of chromatin state, either overall or at particular gene promoters. Peripheral blood cell studies have indicated that schizophrenia is associated with an abnormally condensed chromatin structure (Issidorides et al. 1975; Kosower et al. 1995), specifically increased restrictive H3K9me2 and reduced H3K9 acetylation (Gavin et al. 2009b). Additionally, H3K9 acetylation in schizophrenia patients is less responsive to in vivo treatment with HDAC inhibitors when compared to both patients with bipolar disorder and nonpsychiatric controls (Sharma et al. 2006; Gavin et al. 2008). Finally, a correlation exists between age of onset of psychiatric symptoms of schizophrenia and baseline levels of H3K9me2 (Gavin et al. 2009b). It is the hypothesis of this paper that schizophrenia can be characterized by a restrictive epigenome, which is observable in both brain and peripheral blood, and has specific and observable effects on psychopathology. We have focused on levels of H3K9me2, indicative of facultative heterochromatin, and the enzymes that catalyze this modification, in patients with schizophrenia to examine their role in this illness.

3.1. mRNA Levels of HMT Gene Expression

We performed a multiple linear regression with each HMT gene of interest as the dependent variable. For postmortem brain tissue the explanatory variables were sex, age, pH, RNA integrity number (RIN) and diagnosis, whereas for lymphocytes they were sex, age, and diagnosis. In these two cohorts, we found that a diagnosis of schizophrenia is a significant predictor for GLP mRNA expression in both postmortem brain samples (β=0.44, F(1,24)=5.80, p<0.05) and lymphocytes (β=−0.41, F(1,40)=7.91, p<0.01), indicating that patients with schizophrenia demonstrated increased levels compared to nonpsychiatric controls (Fig. 1a). Similarly, a diagnosis of schizophrenia is also a significant predictor for increased SETDB1 mRNA levels in both postmortem brain samples (β=0.39, F(1,24)=4.33, p<0.05) and lymphocytes (β=0.37, F(1,40)=6.19, p<0.05; Fig. 1b). A diagnosis of schizophrenia is not a significant predictor for elevated G9a mRNA levels in postmortem brain samples (β=0.22, F(1,24)=1.22, p=ns), but is for lymphocytes (β=−0.317, F(1,40)=4.46, p<0.05; Fig. 1c).
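The kind of regression described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code. It assumes that the reported β values are standardized coefficients, which this excerpt does not state explicitly. The function names (`zscore`, `solve`, `standardized_betas`) and all data values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def standardized_betas(predictors, outcome):
    """Return one standardized regression coefficient per predictor.

    predictors: dict mapping a name to a list of values, each the same
    length as outcome. Every variable is z-scored first, so no intercept
    term is needed.
    """
    names = list(predictors)
    zx = [zscore(predictors[name]) for name in names]
    zy = zscore(outcome)
    n, p = len(zy), len(names)
    # Normal equations on the z-scores: (Z'Z) beta = Z'y
    xtx = [[sum(zx[i][k] * zx[j][k] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(zx[i][k] * zy[k] for k in range(n)) for i in range(p)]
    return dict(zip(names, solve(xtx, xty)))


# Hypothetical toy data: diagnosis coded 0 = control, 1 = schizophrenia.
diagnosis = [0, 0, 0, 1, 1, 1]
age = [30, 40, 50, 30, 40, 50]
glp_mrna = [1.0, 1.1, 0.9, 1.6, 1.4, 1.5]
print(standardized_betas({"diagnosis": diagnosis, "age": age}, glp_mrna))
```

Note that the sign of a coefficient for a binary predictor such as diagnosis depends on how the two groups are coded, so signs should be read together with the coding scheme.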

Interestingly, in both postmortem tissue (r=0.79, p<0.001) and lymphocytes (r=0.54, p<0.001), GLP and SETDB1 mRNA expression are positively correlated (data not shown).

Fig. 1. mRNA expression in postmortem parietal cortical samples from the Stanley Foundation Neuropathology Consortium (left) and lymphocyte samples from the University of Illinois at Chicago (right): a. GLP mRNA levels, b. G9a mRNA levels and …

To establish whether HMT mRNA levels differ between schizophrenia patients who were taking psychotropic medication and those who were not, we performed a second multiple linear regression analysis on each individual cohort. Neither the overall nor the type-specific use of antipsychotic, antidepressant or mood-stabilizing medication is a significant predictor of HMT mRNA levels in either the postmortem or the lymphocyte cohort.

3.2. H3K9me2 levels in the Postmortem Brain

In a previously published study we documented elevated global H3K9me2 levels in lymphocytes obtained from schizophrenia patients compared to nonpsychiatric controls (Gavin et al. 2009b). In the current study we attempted to discern whether this abnormality in a restrictive histone modification is present in brain tissue from the Stanley Foundation Neuropathology Consortium (SFNC) cohort as well. We performed a multiple linear regression with H3K9me2 levels as the dependent variable, with sex, age, and diagnosis as explanatory variables. We found that a diagnosis of schizophrenia is a significant predictor of H3K9me2 levels in postmortem brain tissue (β=0.40, F(1,24)=4.58, p<0.05; Fig. 2). GLP (r=0.65, p<0.001) and SETDB1 (r=0.44, p<0.05) mRNA levels are positively correlated with H3K9me2 levels, as determined by Pearson correlation (data not shown).
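For readers unfamiliar with the statistic, a Pearson correlation and the t statistic used to test it against zero can be sketched as follows. This is an illustrative sketch only. The function names (`pearson_r`, `r_to_t`) and the data are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic with n - 2 degrees of freedom for testing r = 0."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical values: HMT mRNA level and H3K9me2 optical-density ratio.
hmt_mrna = [0.8, 1.0, 1.1, 1.3, 1.6, 1.7]
h3k9me2 = [0.9, 0.8, 1.2, 1.1, 1.5, 1.4]
r = pearson_r(hmt_mrna, h3k9me2)
print(r, r_to_t(r, len(hmt_mrna)))
```

The t statistic is then compared against a t distribution with n − 2 degrees of freedom to obtain the p-value, so the same r can be significant in a larger sample and non-significant in a smaller one.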

Fig. 2. H3K9me2 levels are significantly increased in parietal cortical samples from patients with schizophrenia when compared to nonpsychiatric controls. Below the graph, a representative western blot image is shown. All data are shown as a ratio of optical density …

3.3. Clinical Correlates with Lymphocyte HMT mRNA Levels

Lymphocyte levels of G9a mRNA demonstrated a positive correlation with the Positive and Negative Syndrome Scale (PANSS) negative subscale total (r=0.61, p<0.05; Fig. 3a), GLP mRNA is positively correlated with the PANSS general subscale total (r=0.64, p<0.01; Fig. 3b), and SETDB1 mRNA is more highly expressed in patients with longer durations of illness compared to both normal controls and patients in the ‘first episode psychosis’ group (ANOVA, F(2,30)=3.66, p<0.05; Fig. 3c). Patients with a family history of schizophrenia also had significantly increased levels of lymphocyte SETDB1 mRNA (t18=2.52, p<0.05; Fig. 3d).
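The three-group comparison above is a one-way ANOVA. As a rough sketch of how its F statistic and degrees of freedom arise, consider the following. The function name `one_way_anova` and the group values are hypothetical. Degrees of freedom of (2, 30), as reported above, correspond to three groups and 33 subjects in total.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical SETDB1 mRNA values for three groups.
controls = [1.0, 1.1, 0.9, 1.0]
first_episode = [1.1, 1.0, 1.2, 1.1]
chronic = [1.5, 1.4, 1.6, 1.3]
print(one_way_anova([controls, first_episode, chronic]))
```

The F statistic is referred to an F distribution with the two degrees-of-freedom values to obtain the p-value. A significant result says only that the group means differ somewhere, so pairwise follow-up comparisons are needed to say which groups differ.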

Fig. 3. Clinical Correlates with Lymphocyte HMT mRNA Levels. a. A rise in G9a mRNA is significantly correlated with increasing PANSS negative subscale totals; p<0.05. b. GLP mRNA is significantly increased upon worsening of PANSS general subscale scores; …

4. Discussion

The current paper demonstrates an increase in GLP and SETDB1 mRNA in both postmortem parietal cortex and lymphocyte samples from patients with schizophrenia, as well as an increase in G9a mRNA in lymphocytes. G9a and GLP are responsible for the bulk of H3K9me2 modifications across the genome (Shinkai and Tachibana 2011; Tachibana et al. 2005), and SETDB1 is the only euchromatic HMT to specifically di- and tri-methylate H3K9 (Zee et al. 2010; Wang et al. 2003), but all three of these HMTs are able to form large heteromeric complexes, thus allowing for the sequential degrees of lysine methylation (Fritsch et al. 2010). Further, we demonstrate that the ultimate outcome of their catalytic activity, H3K9me2, is significantly increased in patients with schizophrenia as compared to nonpsychiatric controls. Moreover, GLP and SETDB1 mRNA are positively correlated with H3K9me2 levels. These findings add weight to our previous demonstration of increased H3K9me2 levels in lymphocytes from schizophrenic patients (Gavin et al. 2009b).

Our investigations into the role of H3K9me2 in schizophrenia pathophysiology, as opposed to other H3K9 modifications, were motivated by the hypothesis that initial inactivation of gene promoter activity at various schizophrenia candidate genes can result in gradual entrenchment of the heterochromatin state as a result of disease chronicity and disuse (Sharma et al. 2012). Areas of H3K9me2 can then act as a platform for additional restrictive adaptors, thus resulting in the spreading of heterochromatin across previously unmodified gene-rich areas. As such, the gene-altering effects of medications are unable to overcome this restrictive burden, leading to repeated medication failures (Sharma et al. 2012). Support for this hypothesis has been previously demonstrated (Sharma et al. 2008; Benes et al. 2007), including the finding that schizophrenia patients clinically treated for four weeks with the HDAC inhibitor valproic acid displayed no increase in peripheral blood cell acetylated histones 3 or 4 as compared to bipolar patients (Sharma et al. 2006). Here, we find an increase in both H3K9me2 levels and the enzymes which catalyze this modification, providing additional evidence supporting an increased heterochromatin state in schizophrenia.

The major role of the parietal cortex is to integrate and evaluate sensory information (Andersen & Buneo, 2003; Cohen & Andersen, 2002). It is one of the last areas of the human brain to fully mature (Geschwind, 1965); thus early-life environmental insults could have a profound effect. Disordered thought, a common symptom in schizophrenia, is most likely explainable through disruption of this system (Torrey, 2007). Patients with schizophrenia report either acute (McGhie & Chapman, 1961) or blunted (Freedman, 1974) sensitivity to sensory stimuli, and demonstrate overall impairment of sensory integration (Manschreck & Ames, 1984; Torrey, 1980). Similar patterns of transcriptional regulation are observed across the cortex; consequently, results from the parietal cortex likely reflect patterns of gene transcription in other brain regions (Hawrylycz et al., 2012).

Because schizophrenia is heterogeneous, treating it as a binary measure of illness when examining biological relevance can be limiting (Arango et al. 2000; Buchanan and Carpenter 1994). By utilizing the PANSS, biological underpinnings that do not align cleanly with diagnostic categories can be correlated directly with specific symptomatology. Correlations between methyltransferase enzymes and clinical symptomatology indicate that these restrictive enzymes could contribute to specific facets of the illness, particularly negative and general symptoms, which are especially resistant to improvement. Increased severity of negative symptoms is correlated with poorer disease prognosis (Wieselgren et al. 1996), and these symptoms are not alleviated by our current regimen of psychotropics.

Additionally, SETDB1 mRNA levels are correlated with other markers of a worse disease prognosis, including a more chronic form of the illness and a history of schizophrenia in the family. Pharmacological targeting of increased levels of SETDB1 improves motor performance and extends survival in HD mice, indicating the promise of treatments that modulate gene silencing mechanisms in neuropsychiatric disorders (Ryu et al. 2006).

The main weakness of the current study is that clinical symptoms were correlated with mRNA extracted from peripheral tissue. We did not examine enzymes relating specifically to synaptic function, but rather overall mechanisms of epigenetic regulation that are not tissue specific. While postmortem investigations are able to serve as a useful snapshot at the time of death, measuring and monitoring histone marks over time as a marker of disease progression or improvement, or as a predictor of pharmacological response, is only possible using peripheral blood cells. A strong rationale for the use of blood chromatin ‘levels’ as a type of biosensor that registers the epigenetic milieu has been proposed elsewhere (Sharma 2012). Furthermore, previous studies have indicated that mRNA expression patterns in lymphocytes are capable of distinguishing between psychiatric diagnostic groups (Middleton et al. 2005).

The present study hypothesized that schizophrenia may be due to abnormal regulation of fundamental epigenetic mechanisms. We therefore chose to measure overall levels of H3K9me2, as opposed to levels at specific gene promoters, based on the assumption that while the individual genes silenced in the brain and blood may not be the same, similar global pathogenic processes may be occurring in both tissues.

The results of this paper indicate that chromatin is more restrictive in patients with schizophrenia, and that this may contribute significantly to disease pathology. If pharmacological interventions can relax this hyper-restrictive histone insult in schizophrenia, inducing a type of “genome softening,” then neuronal gene expression could be enhanced, allowing for increased plasticity and improved therapeutic response (Sharma 2005).

• Akbarian S, Kim JJ, Potkin SG, Hagman JO, Tafazzoli A, Bunney WE Jr, Jones EG. Gene expression for glutamic acid decarboxylase is reduced without loss of neurons in prefrontal cortex of schizophrenics. Arch. Gen. Psychiatry. 1995;52(4):258–266.
• Andersen RA, Buneo CA. Sensorimotor integration in posterior parietal cortex. Advances in Neurology. 2003;93:159–177.
• Arango C, Kirkpatrick B, Buchanan RW. Neurological signs and the heterogeneity of schizophrenia. Am. J. Psychiatry. 2000;157(4):560–565.
• Benes FM, Lim B, Matzilevich D, Walsh JP, Subburaju S, Minns M. Regulation of the GABA cell phenotype in hippocampus of schizophrenics and bipolars. Proc. Natl. Acad. Sci. U.S.A. 2007;104(24):10164–10169.
• Buchanan RW, Carpenter WT. Domains of psychopathology: an approach to the reduction of heterogeneity in schizophrenia. J. Nerv. Ment. Dis. 1994;182(4):193–204.
• Chase KA, Sharma RP. Nicotine induces chromatin remodelling through decreases in the methyltransferases GLP, G9a, Setdb1 and levels of H3K9me2. Int. J. Neuropsychopharmacol. 2012:1–10.
• Cheung I, Shulha HP, Jiang Y, Matevossian A, Wang J, Weng Z, Akbarian S. Developmental regulation and individual differences of neuronal H3K4me3 epigenomes in the prefrontal cortex. Proc. Natl. Acad. Sci. U.S.A. 2010;107(19):8824–8829.
• Cohen YE, Andersen RA. A common reference frame for movement plans in the posterior parietal cortex. Nature Reviews. Neuroscience. 2002;3(7):553–562.
• Fatemi SH, Stary JM, Earle JA, Araghi-Niknam M, Eagan E. GABAergic dysfunction in schizophrenia and mood disorders as reflected by decreased levels of glutamic acid decarboxylase 65 and 67 kDa and Reelin proteins in cerebellum. Schizophr. Res. 2005;72(2–3):109–122.
• Fox JH, Barber DS, Singh B, Zucker B, Swindell MK, Norflus F, Buzescu R, Chopra R, Ferrante RJ, Kazantsev A, Hersch SM. Cystamine increases L-cysteine levels in Huntington’s disease transgenic mouse brain and in a PC12 model of polyglutamine aggregation. J. Neurochem. 2004;91(2):413–422.

Balancing Histone Methylation Activities in Psychiatric Disorders

Alterations in histone lysine methylation and other epigenetic regulators of gene expression contribute to changes in brain transcriptomes in mood and psychosis spectrum disorders, including depression and schizophrenia. Genetic association studies and animal models implicate multiple lysine methyltransferases (KMTs) and demethylases (KDMs) in the neurobiology of emotion and cognition. Here, we review the role of histone lysine methylation and transcriptional regulation in normal and diseased neurodevelopment and discuss various KMTs and KDMs as potential therapeutic targets in the treatment of neuropsychiatric disease.

Schizophrenia and depression are major psychiatric disorders that lack consensus neuropathology and, in a large majority of cases, a straightforward genetic risk architecture. Furthermore, many patients on the mood and psychosis spectrum show an incomplete response to conventional pharmacological treatments which are mainly aimed at monoamine signaling pathways in the brain (Box 1).

Box 1  Schizophrenia and Depression

Schizophrenia affects 1% of the general population and typically begins during young-adult years, although cognitive disturbances could be evident much earlier. The disease is, in terms of genetics and etiology, highly heterogeneous, and increasingly defined as different and partially independent symptom complexes: (i) psychosis with delusions, hallucinations and disorganized thought; (ii) cognitive dysfunction including deficits in attention, memory and executive function; and (iii) depressed mood and negative symptoms including inability to experience pleasure (anhedonia), social withdrawal and poor thought and speech output [42]. Currently prescribed antipsychotics, which are mainly aimed at dopaminergic and/or serotonergic receptor systems, exert therapeutic effects on psychosis in approximately 75% of patients. However, it is the cognitive impairment which is often the more disabling and persistent feature of schizophrenia [42]. Currently there are no established pharmacological treatments for this symptom complex. However, given that cognitive dysfunction is an important predictor for long-term outcome, this area is considered a high priority in schizophrenia research, as reflected by initiatives combining efforts from government agencies, academia and industry, including MATRICS (the Measurement and Treatment Research to Improve Cognition in Schizophrenia) [42].

Affective disorders as a group show, in terms of genetic risk architecture, some overlap with schizophrenia. For example, rare structural variants, including the balanced translocation at the Disrupted-in-Schizophrenia 1 (DISC-1) locus (1q42) or the 22q11 deletion are, in different individuals, associated with either mood disorder or schizophrenia [81, 82].

Depression, including its more severe manifestation, major depressive disorder, which has a lifetime risk of 10–15% for the U.S. general population, is associated with excessive sadness, anhedonia, negative thoughts, and neurovegetative symptoms including changes in sleep pattern and appetite [1]. The disorder, which in more severe cases is accompanied by delusions, hallucinations and other symptoms of psychosis, often takes a chronic and recurrent course. Conventional antidepressant therapies primarily target monoamine metabolism and reuptake mechanisms at the terminals of serotonergic, noradrenergic and dopaminergic neurons. Unfortunately, up to 40% of cases show an insufficient response to these pharmacological treatments [1]. In addition, many antipsychotic and antidepressant drugs carry a significant side effect burden, including weight gain, diabetes and metabolic defects, extrapyramidal symptoms and sexual dysfunction [83, 84].

However, there is evidence that dysregulated gene transcription, indicative of compromised neural circuitry, contributes to disordered brain function in psychosis and mood spectrum disorder [1, 2]. While no single gene transcript is consistently affected, alterations in RNA levels contribute to defects in GABAergic inhibitory neurotransmission and, more generally, synapse organization and function, metabolism and mitochondrial functions, and oligodendrocyte pathology [3–5]. While a number of transcriptional and post-transcriptional mechanisms may contribute to these changes, chromatin-associated proteins and epigenetic regulators implicated in sustained alterations of gene expression and function (Box 2) could play a critical role in the pathophysiology or treatment of mental illness [6, 7]. Indeed, there is evidence that changes in acetylation of histone lysine residues, which are broadly associated with active gene expression [8] and considered a potential therapeutic target for cancer and other medical conditions [9], also impact gene expression patterns in the brain and thereby influence emotional and cognitive functions. For example, mice or rats exposed to systemic treatment with, or localized intracranial injections of, class I/II histone deacetylase inhibitors (HDACi) exhibit behavioral changes reminiscent of those elicited by conventional antidepressant drugs [10–13]. The short chain fatty acid derivative valproic acid, widely prescribed for its mood-stabilizing and anticonvulsant effects, induces brain histone hyperacetylation at a select set of gene promoters when administered to animals at comparatively high doses [14]. Conversely, overexpression of selected HDACs in neuronal structures implicated in the neurobiology of depression, including the hippocampus, elicits a pro-depressant behavioral phenotype [12].
Similarly, animals treated with class I/II HDACi often show improved performance in learning and memory paradigms and, furthermore, drug-induced inhibition or activation of class III HDACs (also known as sirtuins) elicits changes in motivational and reward-related behaviors [15]. Therefore, the orderly balance between histone acetyl-transferase and deacetylase activities is critical for cognitive performance and synaptic and behavioral plasticity [16]. However, HDACi interfere with acetylation of many non-histone proteins in the nucleus and cytoplasm [16], and moreover, some of these drugs carry a significant side effect burden [9]. Therefore, in light of the emerging role of epigenetic mechanisms in the neurobiology of these and other psychiatric conditions [6], the therapeutic potential of chromatin modifying drugs, other than the HDACi, warrants further investigation. This review will focus on histone lysine methylation, one of the most highly regulated chromatin markings in brain and other tissues. Multiple methyltransferases (KMTs) and demethylases (KDMs) were recently implicated in emotional and cognitive disorders (Fig. 1), and these types of chromatin modifying enzymes could emerge as novel targets in the treatment of mood and psychosis spectrum disorders.

Box 2  Epigenetic regulators and chromatin structure and function

Epigenetics, in the broader sense, applies both to dividing and postmitotic cells, and refers to a type of cellular memory that involves sustained changes in chromatin structure and function, including gene expression, in the absence of DNA sequence alterations (for in-depth discussion, see [85]). Chromatin is essentially a repeating chain of nucleosomes composed of genomic DNA wrapped around an octamer of core histones H2A/H2B/H3/H4. The histone proteins are intensely decorated with epigenetic information, with more than 70 (amino acid) residue-specific sites subject to various types of post-translational modifications (PTM). These include lysine (K) acetylation, methylation and poly ADP-ribosylation, arginine (R) methylation, and serine (S), threonine (T), tyrosine (Y) and histidine (H) phosphorylation [86]. In addition, a subset of the histone H2A, H2B and H4 lysines are covalently linked to the small protein modifiers ubiquitin and SUMO [87, 88]. Finally, epigenetic markings in genomic DNA include 5-methyl-cytosine and the related form, 5-hydroxy-methyl-cytosine [85]. These DNA and nucleosomal histone markings define the functional architecture of chromatin (see main text).

Proteins associated with methylation and other histone PTM are typically defined either as ‘writers’, ‘erasers’ or ‘readers’, essentially differentiating between the process of establishing or removing a mark as opposed to providing a docking site for chromatin remodeling complexes that regulate transcription, or induce and maintain chromatin condensation [18, 86, 89]. As it pertains to the brain, especially in the context of neuropsychiatric disease, a substantial body of knowledge has been generated for a select set of site-specific (K) methyltransferases and demethylases (Fig. 1A). In contrast, many PTMs are recognized by large numbers of reader proteins [90], but to date only very few of these readers have been explored in the brain. To mention just two examples, there are approximately 75 reader proteins specifically associated with histone H3-trimethyl-lysine 4 (H3K4me3), including several components of the SAGA complex ascribed with a key role for transcriptional initiation at RNA polymerase II target genes [90]. In contrast, H3K9me3, generally considered a repressive mark, provides a central hub for heterochromatin (associated) proteins including several members of the HP1 family and zinc finger domain containing molecules [90]. There is additional complexity because pluripotent stem cells and additional cell types decorate many of their promoters with ‘bivalent domains’ which include both open chromatin-associated (methylated H3K4 and H3/H4 acetylation) and repressive (methylated H3K27) marks [91, 92].

Regulation of histone (K) methylation. (A) Listings of residue-specific KMTs and KDMs for H3K4/9/27/36/79 and H4K20. The majority of KMT and KDM are highly specific for a single histone residue, while a few enzymes target multiple residues, as indicated. Red marked KMT/KDM are implicated in neurodevelopment or psychiatric disease as discussed in main text. The non-catalytic JARID2 regulates activity and function of related KMTs. (B) Simplified scheme for selected mono- and trimethylated histone lysine markings implicated in transcriptional regulation, silencing and enhancer function.

The methylation of lysine and arginine residues, like other histone PTMs, defines chromatin states and function [8, 17]. To date, more than 20 methyl-marks on K and R residues have been described [18, 19]. As it pertains to the lysines, the majority of studies have focused on the regulation and methylation-related functions of six specific sites: H3K4, H3K9, H3K27, H3K36, H3K79 and H4K20 [18]. For H3K4 and H3K9/K27, there is additional complexity because specific information is also conveyed (i) for H3K4, by the unmethylated lysine, which effectively serves as a DNA methylation signal [20, 21], and (ii) for H3K9/K27, by acetylation as an alternative PTM [22, 23] (Fig. 1A). For the aforementioned H3/H4 residues, specific biological functions and their interrelations with functional chromatin states, including transcriptional initiation and elongation, heterochromatic silencing and other mechanisms, have been described for the trimethyl-, and for some of the mono- and dimethyl-modifications (representative examples are provided in Fig. 1B; see ref [17] for a detailed description of the histone methylation code and its relation to other types of histone PTM).

The following examples further illustrate the complex regulation of histone lysine methylation. Monomethylation of histone H3-lysine 4 (H3K4me1) plays an important role in neuronal activity-induced transcription at enhancer sequences [24], but the related forms, H3K4me3/2, are primarily found at the 5′ end of genes, with H3K4me3 mostly arranged as distinct and sharp peaks within 1–2 kb of transcription start sites. The H3K4me3 mark provides a docking site at the 5′ end of genes for chromatin remodeling complexes that either facilitate or repress transcription [25]. Furthermore, mono-methyl-H4-K20 shows a strong positive correlation with gene expression at promoters enriched with CpGs, in contrast to the trimethylated form of the same residue, which is generally associated with repressed chromatin [23]. Taken together, these examples illustrate that even closely related histone lysine methylation markings are potentially associated with very different chromatin states.

To date, H3K4, H3K9, H3K27 and H4K20 methylation signals have been measured at specific loci and genome-wide in human brain, essentially confirming that each of these epigenetic markings defines the same type of chromatin as in peripheral tissues or animal brain [26–30]. Interestingly, a subset of psychotherapeutic drugs, including the mood-stabilizer valproate, the atypical antipsychotic clozapine and some monoamine oxidase inhibitors and stimulant drugs, interferes with brain histone methylation (Table 1).

Molecular mechanisms of histone (lysine) methylation

A complex system of site-specific methyltransferases, which transfer the methyl-group of S-Adenosyl-Methionine (SAM) to lysine residues, has evolved in the vertebrate cell. There are an estimated 70 human genes harboring the Su(var)3–9, Enhancer of Zeste, Trithorax (SET) domain, which spans approximately 130 amino acids essential for KMT enzymatic activity [31]. The only known exception is the H3K79-specific methyltransferase, KMT4/DOT1L [31, 32], which lacks a SET domain. Each of the histone K residues discussed above is the preferential target of a distinct set of methyltransferase proteins (Fig. 1A) [19]. Of note, these histone-modifying enzymes are thought not to access histone substrates directly unless recruited by DNA-bound activators and repressors, a mechanism which could target each methyltransferase to a highly specific set of genomic loci [19].

An equally complex system exists for the site-specific lysine demethylases (Fig. 1A). There are at least two different mechanisms for active histone demethylation. The first enzyme type, represented by lysine-specific demethylase 1 (LSD1/KDM1A), contains an amine oxidase domain and requires flavin adenine dinucleotide (FAD) as a cofactor to demethylate di- and mono-methylated lysines. LSD1 and its homologue, LSD2, are primarily H3K4 demethylases, although, depending on species and context, activity against H3K9 has also been described [18]. Interestingly, monoamine oxidase inhibitors (MAOi) such as tranylcypromine or phenelzine (powerful antidepressants that exert their therapeutic effects mainly by elevating brain monoamine levels through inhibition of MAO-A/B) also block LSD1 type histone demethylases [18]. While LSD1 is thought to regulate histone methylation at promoters, LSD2 is bound to transcriptional elongation complexes and removes H3K4 methyl markings in gene bodies, thereby facilitating gene expression by reducing spurious transcriptional initiation outside of promoters [33]. The second type of demethylase, which in contrast to LSD1/LSD2 is capable of demethylating trimethyl markings, involves Fe2+-dependent dioxygenation by Jumonji-C (JmjC) domain-mediated catalysis [18]. Given that each of the KMTs and KDMs described has a different combinatorial set of functional domains and (protein) binding partners [18, 34], it is likely that the various site-specific methyltransferases and demethylases are largely non-redundant in function.

KMTs and KDMs with a role in cognition and neuropsychiatric disease

An increasing number of KMTs and KDMs are implicated in neurodevelopment and major psychiatric diseases (marked in red in Fig. 1A).

H3K4

The first histone lysine methyltransferase explored in the nervous system was KMT2A/MLL1, a member of the mixed-lineage leukemia (MLL) family of molecules. Mice heterozygous for an insertional (lacZ) loss-of-function Mll1 mutation show distinct abnormalities in hippocampal plasticity and signaling [35], in conjunction with defects of learning and memory [36]. Of note, the hippocampus, and other portions of the forebrain including prefrontal cortex and ventral striatum, are frequently implicated in the neural circuitry of mood and psychosis spectrum disorders [1]. Furthermore, conditional deletion of Mll1 resulted in defective neurogenesis during the early postnatal period [37]. While the full spectrum of MLL1 target genes in neurons and glia awaits further investigation, dysregulated expression of certain transcription factors such as DLX2, a key regulator for the differentiation of forebrain GABAergic neurons (which are essential for inhibitory neurotransmission and orderly synchronization of neural networks) [38], may contribute to the cognitive phenotype of the Mll1 mutant mice. These observations may be relevant for the pathophysiology of schizophrenia, because some patients show a deficit in H3K4-trimethylation and gene expression at a subset of GABAergic promoters in the prefrontal cortex, including GAD1 encoding a GABA synthesis enzyme [28]. While the timing and age-of-onset for this ‘molecular lesion’ remain unknown, it is of interest that in the normal PFC, H3K4 methylation at the site of GABAergic genes progressively increases during the transition from fetal period to childhood to adulthood [28]. The epigenetic vulnerability of the Gad1 promoter during such prolonged developmental periods is further emphasized by recent animal studies demonstrating that Gad1 DNA methylation and histone acetylation are heavily influenced by the level of maternal care in the neonatal/pre-weanling period [39].

There is additional evidence that epigenetic fine-tuning of the brain’s H3K4 methyl-markings is critical for orderly neurodevelopment. Of note, loss-of-function mutations in KDM5C/JARID1C/SMCX, an X-linked gene encoding a H3K4 demethylase, have been linked to mental retardation [40] and autism spectrum disorders [41]. The KDM5C gene product operates in a chromatin remodeling complex together with HDAC1/2 histone deacetylases and the transcriptional repressor REST, thereby poising neuron-restrictive silencer elements for H3K4 demethylation and decreased expression of target genes including synaptic proteins and sodium channels [40]. However, because this study was conducted with the HeLa cell line, it remains to be determined whether similar mechanisms operate in the nervous system.

In addition to its role in neurodevelopment, MLL-mediated H3K4 methylation could play a role in the treatment of psychosis. The atypical antipsychotic clozapine, which has a somewhat higher therapeutic efficacy when compared to conventional antipsychotics that function primarily as dopamine D2 receptor antagonists [42], upregulates H3K4 tri-methylation at the Gad1/GAD1 GABA synthesis enzyme gene promoter. These effects were not mimicked in dopamine receptors D2/D3 (Drd2/3) compound null mutant mice, suggesting that blockade of dopamine D2-like receptors is not sufficient for clozapine-induced H3K4 methylation [28]. In the human PFC, GAD1-associated H3K4 methylation was increased in subjects exposed to clozapine, as compared to subjects treated with conventional antipsychotics. Conversely, mice heterozygous for the H3K4-specific KMT, mixed-lineage leukemia 1 (MLL1), exhibited decreased H3K4 methylation at brain Gad1 [28]. Therefore, it is possible that MLL1, which is highly expressed in GABAergic and other neurons of the adult cerebral cortex [28], will in the future emerge as a novel target for the treatment of psychosis. Questions that remain to be resolved include (i) the molecular pathways linking clozapine (a drug that impacts dopaminergic, serotonergic, muscarinic and other signaling pathways) to MLL1-mediated histone methylation, and (ii) whether the clozapine-induced changes in H3K4 methylation are restricted to GABAergic gene promoters or, alternatively, reflect more widespread epigenetic changes throughout the genome. Of note, clozapine’s effects on H3K4 methylation require intact brain circuitry and cannot be mimicked in cultured neurons differentiated from forebrain progenitor cells [43]. This finding is in good agreement with the recent observation that some of clozapine’s therapeutic effects require an intact serotonergic system, particularly its presynaptic components [44].

H3K9

The 9q34 subtelomeric deletion syndrome, which includes mental retardation and other developmental defects, is caused by deleterious mutations and haploinsufficiency of euchromatin histone methyltransferase 1 (EHMT1, also known as GLP and KMT1D) [45]. This gene encodes a H3K9-specific methyltransferase that operates in a multimeric complex that includes its closest homologue, G9a/KMT1C, and additional H3K9-specific HMTs [46]. Studies in mutant mice suggest that the GLP/G9a complex is important for suppression of non-neuronal and progenitor genes in mature neurons, and loss of this complex has deleterious effects on cognition and other higher brain functions [47]. Furthermore, G9a-mediated H3K9 methylation events within the reward circuitry, including the ventral striatum, are critical intermediates for the long-term effects of cocaine on reward behavior and neuronal morphology [48]. This would suggest that GLP/G9a, and proper regulation of H3K9 methylation levels, are important for orderly brain function in both the developing and the mature brain.

Furthermore, changes in motivational and affective behaviors could be elicited by overexpression of the H3K9-HMT, SET domain bifurcated 1 (KMT1E/SETDB1/ESET), in adult forebrain neurons [49]. Interestingly, SETDB1 occupancy in neuronal chromatin is highly restricted, and may be confined to less than 0.75% of annotated genes [49]. However, among these are several NMDA and other ionotropic glutamate receptor subunit genes, including Grin2a/b (Nr2a/b) [49]. Mild to moderate inhibition of NMDA receptor-mediated (including Grin2b) neurotransmission elicits a robust improvement of depressive symptoms in some mood disorder patients [50], and, indeed, SETDB1-mediated H3K9 methylation and repressive chromatin remodeling at the Grin2b locus was associated with antidepressant-like behavioral phenotypes in the Setdb1 transgenic mice [49]. Of note, NMDA receptor antagonists, including GRIN2B-specific drugs, elicit significant therapeutic benefits even in subjects who failed multiple trials of selective serotonin reuptake inhibitors (SSRI) and other conventional antidepressants [50]. However, drugs directly acting at the NMDA receptor site have an unfavorable side effect profile, and approaches aimed at SETDB1 expression and activity may therefore provide an alternative therapeutic strategy.

Interestingly, mice with a genetic ablation of Kap1, encoding the SETDB1 binding partner KRAB-associated protein 1 (also known as TRIM28/TIF1b/KRIP1) [51], show increased anxiety and deficits in cognition and memory [52], which are phenotypes that are broadly opposite from those observed in mice with increased Setdb1 expression in brain [49]. These findings further speak to the therapeutic potential of the Kap1-Setdb1 repressor complex in the context of neuropsychiatric disease.

Finally, the H3K9-specific demethylase, KDM3A/Jmjd1A, showed increased H3K9-methylation at its own promoter in the ventral striatum of mice exposed to social defeat (a type of stressor associated with a depression-like syndrome in these animals), while mice that were treated with a conventional antidepressant or that were resilient to this type of stress did not show changes in KDM3A promoter methylation [53]. While it is unclear whether KDM3A or some other demethylase activity is altered in the depressed animals, the same study [53] reported widespread repressive histone methylation changes, including increased dimethyl-H3K9 and methylated H3K27 at hundreds of gene promoters in stress-susceptible animals, which further emphasizes the importance of these PTMs for the epigenetics of mood disorder.

H3K27

The H3K27-selective methyltransferase, KMT6A, also known as Enhancer of zeste homolog 2 (EZH2), is associated with the polycomb repressive chromatin remodeling complex 2 (PRC2) [54], and is essential for cortical progenitor cell and neuron production. Consequently, loss of EZH2 function is associated with severe thinning of the cerebral cortex and a disproportionate loss of neurons residing in upper cortical layers I–IV [55]. Likewise, the H3K27-specific demethylase, JMJD3, is important for neurogenesis and neuronal lineage commitment [56]. Furthermore, H3K27 methylation is dynamically regulated in mature brain and involved in the neurobiology of major psychiatric disease. For example, changes in expression of brain-derived neurotrophic factor (Bdnf) in hippocampus of mice exposed to environmental enrichment or chronic stress are associated with opposite changes in the H3K27me3 mark at a subset of Bdnf gene promoters [12, 57]. In addition, acute stress leads to an overall decrease in hippocampal H3K27me3 and H3K9me3 [58]. Furthermore, in the orbitofrontal cortex of suicide completers, alterations in H3K27 methylation were described at the TRKB gene, encoding the high affinity receptor for the nerve growth factor molecule, BDNF [27]. Changes in the balance between histone H3K4 and H3K27 methylation, or DNA cytosine and H3K27 methylation, may also contribute to GABAergic gene expression deficits in schizophrenia [28, 43]. To date it remains unclear which of the various H3K27-specific KMTs and KDMs (Fig. 1) are involved in these disease-related alterations in postmortem brain tissue. Of note, the Jumonji and Arid containing protein 2 (JARID2), which by itself lacks catalytic activity but is crucial for subsequent H3K27 or H3K9 methylation by recruiting the polycomb PRC2 complex to its target genes [59, 60], is located within the schizophrenia susceptibility locus on chromosome 6p22 and confers genetic risk in multiple populations of different ethnic origin [61, 62]. While the biological functions of JARID2 have been studied primarily in the context of transcriptional regulation in stem cells [63, 64], this gene shows widespread expression in the mature nervous system [65], implying JARID2-mediated control over polycomb repressive chromatin remodeling in the adult brain.

H3K36 and H4K20

Epigenetic dysregulation of nuclear receptor-binding SET domain containing protein 1/KMT3B could play a role in some neuro- and glioblastomas [66], but as for other H3K36 and H4K20 regulating enzymes (Fig. 1), to date little is known about their role in neurodevelopment, cognition and psychiatric disease. Strikingly, however, KMT3A/HYPB/SETD2, a member of the SET2 family of KMTs mediating H3K36 methylation [67], is also known as huntingtin-interacting protein 1 (HIP-1) or huntingtin(yeast)-interacting protein B (HYPB) [68]. Huntington’s is a triplet repeat disorder and chronic neurodegenerative condition with motor symptoms and cognitive defects, and significant changes in mood and affect [69]. Whether or not there is altered H3K36 methylation in the neuronal populations that are at risk for degeneration is unclear. Furthermore, the huntingtin/KMT3A interaction has been documented for yeast [68] but not brain. Of note, wildtype huntingtin is a facilitator of polycomb complex PRC2-mediated H3K27 methylation [70], and furthermore, H3K4 and H3K9 methylation changes have been reported in preclinical model systems and postmortem brains with Huntington’s disease [71, 72]. Therefore, it is possible that transcriptional dysregulation in this condition is associated with aberrant methylation patterns of multiple lysine residues.

KMTs and KDMs as Novel Drug Targets

Given the emerging role of histone methylation in the neurobiology of psychiatric disease, the next obvious question is whether this type of PTM could provide a target for a new generation of psychotropic therapeutics. In principle, KMTs and KDMs should provide fertile ground for the development of novel drugs: these enzymes are considered more specific than, for example, HDACs, because each HDAC enzyme is likely to affect a much larger number of histone residues than a given KMT or KDM [73]. However, as for other histone modifying enzymes, the specificity of KMTs and KDMs is not limited to histones but includes the (de)methylation of lysines of non-histone proteins, including the p53 tumor suppressor protein and the VEGF growth factor [74]. Druggable domains within the KMTs and KDMs could involve not only their catalytic sites, such as the SET domain for the KMTs or the amine oxidase and JmjC domains for the LSD1 and JMJD subtypes of KDMs, respectively, but also some of the many other functional domains that are specific to subsets of these proteins [75]. One potential candidate would be the bromodomain of the MLLs and other H3K4-specific methyltransferases [75]. Bromodomains, which are present in many different types of nuclear proteins, bind to acetylated histones, and small molecules interfering with some of these interactions recently emerged as powerful modulators of systemic inflammation [76].

The catalytic activity of the SET domain containing KMTs requires the universal methyl donor, S-adenosyl-methionine (SAM, also known as AdoMet). Crystallographic and functional studies revealed that the SAM binding pocket of KMTs is different from the SAM pockets of other proteins, which may increase the chance of developing compounds that specifically target histone methyltransferases but not other enzymes and proteins [31]. Currently, no KMT- or KDM-related drug is in clinical trials. However, several such compounds show therapeutic promise in preclinical studies. For example, the S-adenosylhomocysteine hydrolase inhibitor, 3-deazaneplanocin A (DZNep), induces apoptosis in breast cancer cells [77]. This drug alters H3K27 and H4K20 trimethylation via interference with polycomb PRC2 repressive chromatin remodeling [73]. Antioncogenic effects were also observed with BIX-01294, a drug that downregulates H3K9 methylation levels by binding to the SET domain of the G9a/GLP(EHMT1) methyltransferases [73]. The same drug was shown to alter addictive behaviors and H3K9 methylation when infused locally into the brain of cocaine-exposed mice [48]. As discussed above, while tranylcypromine and other monoamine oxidase inhibitors used for the treatment of depression are weak inhibitors of the LSD1 type of KDM, several compounds have recently emerged with much stronger activity against LSD1/LSD2 [18]. It will be extremely interesting to explore these drugs in preclinical models for mood and psychosis spectrum disorders. Finally, microRNA-based therapeutic strategies, aimed at decreasing levels and expression of chromatin remodeling complexes, including some of the histone modifying enzymes discussed here, are gaining increasing prominence in the field of cancer therapy [73] and may in the future emerge as a novel therapeutic option in the context of neuropsychiatric disease.

Emerging Concept in DNA Methylation: Role of Transcription Factors in Shaping DNA Methylation Patterns
CLAIRE MARCHAL AND BENOIT MIOTTO*   Journal of Cellular Physiology Volume 230, Issue 4

DNA methylation in mammals is a key epigenetic modification essential to normal genome regulation and development. DNA methylation patterns are established during early embryonic development, and subsequently maintained during cell divisions. Yet, discrete site-specific de novo DNA methylation or DNA demethylation events play a fundamental role in a number of physiological and pathological contexts, leading to critical changes in the transcriptional status of genes such as differentiation, tumor suppressor or imprinted genes. How the DNA methylation machinery targets specific regions of the genome during early embryogenesis and in adult tissues remains poorly understood. Here, we report advances being made in the field with a particular emphasis on the implication of transcription factors in establishing and in editing DNA methylation profiles. J. Cell. Physiol. 230: 743–751, 2015.

DNA methylation is a well-studied epigenetic modification in mammalian genomes, discovered in 1948. It is involved in a number of essential cellular processes such as transcription regulation, cellular differentiation, cellular identity maintenance, X inactivation, gene imprinting, and the cellular response to environmental changes (Klose and Bird, 2006; Guibert and Weber, 2013; Smith and Meissner, 2013; Subramaniam et al., 2014). DNA methylation has proved to be a dynamic process, requiring continuous regulation and potentially having an important regulatory role for tissue-specific differentiation or cellular signaling. Indeed, the analysis of the distribution of DNA methylation at the genome scale, and nowadays at single-base resolution, in different physiological and pathological states, revealed that local changes in DNA methylation contribute to cell-type specific variation in gene expression. Furthermore, aberrant DNA methylation patterns are documented in a number of human diseases from Immunodeficiency, Centromere instability, and Facial anomalies (ICF) syndrome to cancer, and contribute to the onset or development of these diseases (Smith and Meissner, 2013; Weng et al., 2013; Subramaniam et al., 2014). Needless to say, these discoveries also fuel the promising idea that therapeutic strategies targeting DNA methylation can be used in the prevention and the treatment of cancer and other human diseases, including neuro-developmental disorders (Weng et al., 2013; Subramaniam et al., 2014). As an example, the antipsychotic drugs clozapine and sulpiride, combined with the histone deacetylase inhibitor valproate, have a beneficial action in schizophrenia and bipolar patients, possibly because they revert the aberrant DNA methylation status at GABAergic gene promoters (Dong et al., 2008). In 2004, 5-azacytidine (Vidaza™, Celgene Corporation, Summit, NJ), a drug blocking DNA methylation, received approval by the Food and Drug Administration for the treatment of myelodysplastic syndromes (Kaminskas et al., 2005).

Figure 1. Overview of the DNA methylation and demethylation pathway. (A) DNMT1 is responsible for the maintenance of DNA methylation during DNA replication. It recognizes hemi-methylated CpG, thanks to its interaction with co-factor UHRF1, and it adds methylation on the un-methylated strand. Black bubbles: methylated CpG. Empty bubbles: un-methylated CpG. (B) DNMT3A/B are responsible for de novo DNA methylation. They establish new patterns of methylation directly from unmethylated CpG-containing sequences. In the embryo, their activity is modulated by a catalytically inactive family member DNMT3L. (C) Passive demethylation occurs through loss of DNMT1/3 activity in actively dividing cells. Loss can be attributed to post-translational modifications, gene mutations, gene silencing or any other mechanism that will eventually lead to DNMT activity inhibition. (D) Active DNA demethylation is catalyzed by the TET family of enzymes. TET1, 2 and 3 can oxidize 5mC into 5hmC (represented in grey bubbles), and eventually oxidize 5hmC into 5-formylcytosine and 5-carboxy-cytosine. None of these bases is recognized by DNMTs, causing loss of DNA methylation during DNA replication. In addition, these oxidized bases are recognized by the base-excision repair (BER) pathway and catalytically removed.
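
As a rough illustration of the passive route described in panel (C), the sketch below tracks the expected fraction of methylated strands at a single CpG site across successive cell divisions. It is a toy model, not taken from the article: it assumes semi-conservative replication, no de novo methylation and no active (TET-mediated) demethylation, and it uses a hypothetical parameter `maintenance_efficiency` for the probability that DNMT1 copies a parental methyl mark onto the newly synthesized strand.

```python
def methylated_fraction(divisions, maintenance_efficiency=0.0, start=1.0):
    """Expected fraction of methylated strands at a CpG after each division.

    Toy model (assumption, not from the source): after replication half of
    the strands are parental and keep their methylation state, and half are
    newly synthesized and become methylated only if maintenance methylation
    copies the mark from the parental strand.
    """
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must be between 0 and 1")
    fractions = [start]
    for _ in range(divisions):
        previous = fractions[-1]
        # parental strands contribute `previous`; new strands contribute
        # `previous * maintenance_efficiency`; average over both strand types
        fractions.append(previous * (1.0 + maintenance_efficiency) / 2.0)
    return fractions


if __name__ == "__main__":
    # Complete loss of maintenance activity: the mark is diluted by half per division.
    print(methylated_fraction(3, maintenance_efficiency=0.0))
    # Fully efficient maintenance: the methylation level is preserved.
    print(methylated_fraction(3, maintenance_efficiency=1.0))
```

Under these assumptions, complete loss of maintenance activity halves the methylated fraction at each division, whereas fully efficient maintenance keeps it constant; intermediate values give a slower dilution.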

Figure 2. Summary of the nuclear factors and epigenetic marks involved in the maintenance of DNA methylation status in different regions of the genome. The table recapitulates our current knowledge on transcription factors, chromatin remodellers and histone marks contributing to the establishment of DNA methylation and its erasure. The information is presented according to genomic features, sharing common regulators, such as promoters/enhancers, tumor suppressor genes, germline gene promoters, imprinted regions, DNA repeats, and retroviral elements and peri-centromeric regions.

…….

t KAP1/DNMTs control the maintenance of DNA methylation, independently of DNA replication, on a number of genomic targets. Yet, DNMTs have been shown to be recruited onto the chromatin by other chromatin remodelers, such as SETDB1 or G9a, or secondary to gene silencing (Gibbons et al., 2000; Dennis et al., 2001; Guibert and Weber, 2013; Pacaud et al., 2014). Thus, only the identification of the full spectrum of transcription factors involved in the regulation of DNA methylation will tell whether this function is predominantly conferred to KRAB-ZNF factors. This systematic analysis might help understand why only a limited number of factors per family are involved in the shaping of DNA methylation. In the case of ZNF factors several explanations have been postulated. The resolution of the structure of the ZNF fingers of Zfp57 bound onto methylated DNA indicated that a specific amino-acid sequence in the DNA binding ZNF fingers might be required for the recognition and binding of methylated CpG sequences (Liu et al., 2012; Buck-Koehntop and Defossez, 2013). Using this knowledge, researchers have postulated that ZNF factors containing this motif are likely to contribute to shaping the DNA methylation profile (Liu et al., 2013). An alternative hypothesis relies on the observation that KRAB-ZNF factors are present uniquely in vertebrate genomes and have expanded quite dramatically in mammalian genomes. As DNA repeat sequences also evolved quickly in mammalian genomes, it is suggested that human-specific KRAB-ZNF factors might primarily contribute to DNA repeat silencing (Lukic et al., 2014).

……

DNA methylation plays an important role in the control of gene expression and cell fate in mammals. Its regulation and function have been under intense scrutiny since its discovery in the mid-1900s. Yet, how DNA methylation patterns are established during embryogenesis, and edited in adult tissue, remains a matter of intense debate. Profiling of DNA methylation in many cell types, species and environmental settings indicates that the DNA methylation profile is tightly correlated with the cell type and its environment. As a consequence, de novo methylation and DNA demethylation events are not randomly distributed but are actually targeted to particular regulatory DNA elements in the genome, including promoters, enhancers or repeated DNAs. For this latter reason researchers have focused on the role of transcription factors in these DNA methylation events. Yet, it is also recognized that non-coding RNAs, short and long, contribute to the establishment and editing of DNA methylation profiles in mammals. Non-coding RNAs may directly interact with and control methylation and demethylation activities and, as a consequence, the pattern of DNA methylation in the genome (Di Ruscio et al., 2013; Arab et al., 2014; Castro-Diaz et al., 2014; Molaro et al., 2014; Turelli et al., 2014). For instance, the antisense long non-coding RNA TARID (TCF21 antisense RNA inducing demethylation) activates TCF21 expression by inducing promoter demethylation. The TARID sequence is complementary to the sequence of the TCF21 promoter. Its transcription causes the anchoring of GADD45A (growth arrest and DNA-damage-inducible, alpha), a regulator of DNA demethylation, at the TCF21 promoter and its subsequent chromatin remodelling (Arab et al., 2014). Understanding the interplay between non-coding RNAs and transcription factors in the establishment and the maintenance of DNA methylation is therefore an important challenge for the future.

The Expanding Role of MBD Genes in Autism: Identification of a MECP2 Duplication and Novel Alterations in MBD5, MBD6, and SETDB1

The methyl-CpG-binding domain (MBD) gene family was first linked to autism over a decade ago when Rett syndrome, which falls under the umbrella of autism spectrum disorders (ASDs), was revealed to be predominantly caused by MECP2 mutations. Since that time, MECP2 alterations have been recognized in idiopathic ASD patients by us and others. Individuals with deletions across the MBD5 gene also present with ASDs, impaired speech, intellectual difficulties, repetitive behaviors, and epilepsy. These findings suggest that further investigations of the MBD gene family may reveal additional associations related to autism. We now describe the first study evaluating individuals with ASD for rare variants in four autosomal MBD family members, MBD5, MBD6, SETDB1, and SETDB2, and expand our initial screening in the MECP2 gene. Each gene was sequenced over all coding exons and evaluated for copy number variations in 287 patients with ASD and an equal number of ethnically matched control individuals. We identified 186 alterations through sequencing, approximately half of which were novel (96 variants, 51.6%). We identified seventeen ASD-specific, nonsynonymous variants, four of which were concordant in multiplex families: MBD5 Tyr1269Cys, MBD6 Arg883Trp, MECP2 Thr240Ser, and SETDB1 Pro1067del. Furthermore, a complex duplication spanning the MECP2 gene was identified in two brothers who presented with developmental delay and intellectual disability. From our studies, we provide the first examples of autistic patients carrying potentially detrimental alterations in MBD6 and SETDB1, thereby demonstrating that the MBD gene family potentially plays a significant role in rare and private genetic causes of autism.
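
The proportion of novel variants quoted in the abstract can be reproduced from the two counts it reports. The short sketch below is only an arithmetic check; the function name `percent_novel` is a hypothetical helper introduced for illustration, and the counts (186 alterations, 96 novel) are the ones stated above.

```python
def percent_novel(novel, total, digits=1):
    """Return the share of novel variants as a percentage, rounded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * novel / total, digits)


if __name__ == "__main__":
    # Counts as reported in the abstract: 96 novel variants out of 186 alterations.
    print(percent_novel(96, 186))
```

With the reported counts this gives 51.6, matching the percentage in the abstract.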

There is growing evidence of the involvement of the methyl-CpG-binding domain (MBD) genes in neurological disorders. To date, pathogenic mutations have been found in patients with clinical features along the autism continuum for two genes in this family, methyl-CpG-binding domain protein 5 (MBD5) and methyl-CpG-binding protein 2 (MECP2). Both genes carry an MBD domain, the unifying feature for the family that includes nine additional genes: BAZ1A, BAZ1B, MBD1, MBD2, MBD3, MBD4, MBD6, SETDB1 and SETDB2 (Roloff et al., 2003). The MBD genes are involved in a variety of functions, including chromatin remodeling (BAZ1A, BAZ1B, MBD1, MBD2, MBD3, and MECP2), DNA damage repair (BAZ1A and MBD4), histone methylation (SETDB1 and SETDB2), and X chromosome inactivation (MBD2; Roloff et al., 2003; Bogdanovic & Veenstra, 2009). There is also functional interplay among members of this family as they have been found to bind at the same promoter regions (MBD1, MBD2, MBD3, and MECP2), partner with each other in complexes (MBD1 and SETDB1), or act in the same complexes in a mutually exclusive manner (MBD2 and MBD3; Sarraf & Stancheva, 2004; Ballestar et al., 2005; Le Guezennec et al., 2006; Matarazzo et al., 2007). Little is known thus far about the functions of MBD5 and MBD6; they each encode proteins that localize to chromatin but fail to bind methylated DNA (Laget et al., 2010).

One specific disorder in the autism spectrum, Rett syndrome, is caused almost exclusively by alterations in MECP2 (Amir et al., 1999). Due to the location of MECP2 on the X chromosome, mutations in females can lead to Rett syndrome while males with the same genetic changes typically present with neonatal encephalopathy (Moretti & Zoghbi, 2006). Further investigations have demonstrated that MECP2 misregulation can lead to a wide range of clinical features including autism, Angelman-like symptoms, mental retardation with or without infantile seizures, mild learning disabilities, and schizophrenia (Watson et al., 2001; Klauck et al., 2002; Carney et al., 2003; Shibayama et al., 2004; Coutinho et al., 2007; Harvey et al., 2007; Lugtenberg et al., 2009). Our group previously evaluated the MECP2 gene in a dataset of female ASD patients and identified two mutations reported in classic Rett syndrome patients: an Arg294X mutation and a 41 base pair deletion (Leu386fs) predicted to generate a truncated protein (Carney et al., 2003). Furthermore, while point mutations in MECP2 were first recognized to result in abnormal clinical phenotypes, increased expression of the wild type protein due to gene duplication also results in neurodevelopmental disorders (Meins et al., 2005; Van Esch et al., 2005; del Gaudio et al., 2006; Ramocki et al., 2009).

A second gene in the MBD family, MBD5, was tied to neurodevelopmental disorders following the identification of microdeletions on chromosome 2q22–2q23 (Vissers et al., 2003; Koolen et al., 2004; de Vries et al., 2005; Wagenstaller et al., 2007; Jaillard et al., 2008; van Bon et al., 2009; Williams et al., 2009; Chung et al., 2011; Talkowski et al., 2011; Noh & Graham, 2012). The minimal region for these nonrecurrent deletions covers only a single gene, MBD5 (van Bon et al., 2009; Williams et al., 2009; Talkowski et al., 2011). This suggests that the common features found across microdeletion patients (ASDs, delayed or impaired speech, intellectual disability, epilepsy, and stereotypic hand movements) manifest due to decreased expression of this critical gene (van Bon et al., 2009; Williams et al., 2009; Talkowski et al., 2011). Notably, two individuals with duplications across the critical MBD5 region also presented with autistic features and developmental delay (Chung et al., 2012). This demonstrates that precise regulation of both MBD5 and MECP2 must be maintained, as either increased or decreased expression of each gene can result in a range of neurodevelopmental disorders.

Supplementing clinical evidence, mouse models have reiterated the potential significance of the MBD family in autism etiology. Mbd1 and Mecp2 null models have abnormal neurobehavioral phenotypes including increased anxiety, and impaired social interactions and synaptic plasticity (Guy et al., 2001; Shahbazian et al., 2002b; Zhao et al., 2003; Allan et al., 2008). Furthermore, a transgenic Setdb1 model established a link between this gene and behavior (Jiang et al., 2010a). Additionally, Setdb1 plays a role in the repression of Grin2b, a gene linked to autism, bipolar disorder, intellectual disability, and schizophrenia (Avramopoulos et al., 2007; Allen et al., 2008; Endele et al., 2010; Jiang et al., 2010a; Myers et al., 2011; O’Roak et al., 2011).

Studies have demonstrated that each of the MBD genes is expressed in the brain, although specific functions have been determined for only a subset of them (Shahbazian et al., 2002a, Bogdanovic & Veenstra, 2009, Jiang et al., 2010b, Laget et al., 2010, Safran et al., 2010). MeCP2 is a transcriptional regulator believed to act in neuronal maturation as levels increase over time (Shahbazian et al., 2002a, Chahrour et al., 2008). Stable levels of MeCP2 are required through adulthood, as elimination of this protein in adult mice mimics features seen in knockout Mecp2 mice (McGraw et al., 2011). The H3K9 methyltransferase SETDB1 acts both in early development and in later stages of life (Jiang et al., 2010a, Cho et al., 2012). Removal of Setdb1 in mice results in peri-implantation lethality (Dodge et al., 2004). Studies in the forebrain of transgenic Setdb1 mice demonstrate that it targets the NMDA receptors Grin2a and Grin2b as well as the glutamate receptor Grid2 (Yang et al., 2002, Jiang et al., 2010a).

While there is clinical evidence of MECP2 and MBD5 playing a role in autism, only two studies to date have evaluated patients with ASD for mutations in additional MBD family members (Li et al., 2005; Cukier et al., 2010). Previous work in our laboratory analyzed the coding regions of MBD1, MBD2, MBD3, and MBD4 in over 200 individuals with ASD of African and European ancestry and identified multiple variants that altered the amino acid sequence, were unique to patients with autism, and concordant with disease in multiplex families (Cukier et al., 2010). In contrast, a study by Li and colleagues was restricted to a dataset of 65 Japanese autistic patients and reported only a single variation that might be related to autism (Li et al., 2005). We now expand our initial study of MECP2 to a larger dataset that includes male patients and perform the first study evaluating patients with ASD for alterations in four additional MBD family members: MBD5, MBD6, SETDB1 and SETDB2.

Sequencing across the five MBD genes in 287 patients with ASD and 288 ethnically matched control individuals identified a total of 186 unique variations (Table 1, Supplemental Tables 3–7). These variants included 177 single nucleotide polymorphisms (SNPs), five deletions and four insertions. Ninety (48.4%) of the variations have been previously reported in either the dbSNP 134 database (http://www.ncbi.nlm.nih.gov/projects/SNP/) or RettBASE (http://mecp2.chw.edu.au/), while the remaining 96 variants (51.6%) are novel. Fifty-six variations are predicted to alter the amino acid sequence. Fifty-three of the changes were found solely in patients with ASD and absent from controls. To determine variants most likely to contribute to ASD susceptibility, we prioritized changes that were either unique to affected individuals or that had an increased frequency in cases when compared to controls. The 17 most interesting variants were nonsynonymous and unique to our ASD population (Table 1). We utilized four distinct programs to characterize the variants: GERP (Cooper et al., 2005) and PhastCons (Siepel et al., 2005) to measure the level of amino acid conservation across species, and PolyPhen (Adzhubei et al., 2010) and SIFT (Kumar et al., 2009) to predict which alterations might have damaging consequences for protein function.
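The variant tallies quoted above can be cross-checked with a few lines of arithmetic. The snippet below is only an illustrative consistency check on the numbers given in the text; the variable names and the helper `pct` are introduced here for illustration and are not part of the original analysis.

```python
# Counts quoted in the paragraph above (nothing else is assumed).
snps, deletions, insertions = 177, 5, 4
known, novel = 90, 96

total = snps + deletions + insertions


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# The three variant classes should add up to the reported total,
# and the known/novel split should account for every variant.
print("total variants:", total)
print("known + novel :", known + novel)
print("known %:", pct(known, total))
print("novel %:", pct(novel, total))
```

The class counts and the known/novel split both sum to the reported total, and the two percentages agree with those given in the text.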

ASD Unique, Nonsynonymous Variations

The mutational burden between cases and controls of African or European ancestry for each gene was not statistically significant by the chi-squared test (Supplemental Table 8). This was determined for the overall load of all variants as well as nonsynonymous alterations (Supplemental Table 9).

MBD5

Thirty-two changes were identified in MBD5, 18 of which have been previously reported (Supplemental Table 3). A distinct set of 11 alterations were nonsynonymous, four of which were only identified in patients with ASD (Val443Met, Ile1247Thr, Tyr1269Cys, and Arg1299Gln, Figure 1A–D, Table 1). Three of these four alterations (75%) are predicted to be damaging by SIFT, as compared to only two of seven nonsynonymous variants (28.6%) identified solely in control individuals (Supplemental Table 3). One alteration of high interest, MBD5 Tyr1269Cys, was inherited paternally in all three ASD children in multiplex family 7763 (Figure 1C). Two of the affected individuals (0001 and 0100) were intellectually impaired with measured IQ in the moderate to severe range (Full Scale IQ: 40 and 50, respectively), while the remaining brother with autism (0101) had borderline intellectual functioning (Full Scale IQ=78). Furthermore, all three siblings had a delay in language and displayed self-injurious behaviors. Two individuals presented with macrocephaly (0100 and 0101), and individual 0100 has a history of epilepsy (recurrent non-febrile seizures).

Pedigrees of ASD families carrying alterations in MBD5 and MBD6

MBD6

A total of 44 alterations were detected in MBD6, two of which were single base pair insertions and the remainder SNPs (Supplemental Table 4). Sixteen of the single nucleotide changes have been previously reported and 28 are novel. A subset of 17 alterations was identified only in individuals with ASD, seven of which are predicted to cause missense changes (Table 1, Figure 1E–K). While each of these changes was only identified in a single proband, three of the alterations have high PolyPhen and SIFT scores and are novel (Arg883Trp, Pro943Arg and Arg967Cys), suggesting a strong functional consequence. Furthermore, one of these alterations, Arg883Trp, was identified in multiplex family 7979 and passed maternally to both affected children (Figure 1I). Individual 0001 has a diagnosis of autism and is nonverbal with moderate intellectual disability. His sister (0100) has a diagnosis of Pervasive Developmental Disorder-Not Otherwise Specified and mild intellectual disability, displaying some phrase speech. Both siblings have a history of self-injurious behavior. Their mother (1001), who also carries the alteration, was diagnosed with anxiety/panic disorders, depression, and obsessive compulsive disorder, and has a history of epilepsy (adolescent onset seizures).

Along with novel variations of interest in MBD6, we found that two known SNPs occur at a higher frequency within our affected population compared to our control population. The first variation, rs61741508 (c.-2C>A), was recognized in sixteen patients with ASD and five controls and is located just upstream of the ATG start site in the Kozak consensus sequence. This variation also has high conservation scores (Supplemental Table 4). The second SNP, rs117084250 (c.2407-64C>T), falls within intron nine and was found in twelve individuals with ASD but only four controls. However, the conservation scores were relatively low, thereby making this a variant of lesser interest (Supplemental Table 4).
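As a rough illustration of how such a case/control imbalance might be assessed, the sketch below computes a two-sided Fisher exact test from carrier counts. It is not the authors' analysis. It assumes one carrier count per individual, uses the cohort sizes of 287 patients and 288 controls stated earlier, assumes every individual was successfully genotyped at the site, and ignores ancestry stratification. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 carriers fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Assumed layout: rows = ASD / control, columns = carrier / non-carrier.
N_CASES, N_CONTROLS = 287, 288
p_rs61741508 = fisher_exact_two_sided(16, N_CASES - 16, 5, N_CONTROLS - 5)
p_rs117084250 = fisher_exact_two_sided(12, N_CASES - 12, 4, N_CONTROLS - 4)
print(f"rs61741508:   p = {p_rs61741508:.4f}")
print(f"rs117084250:  p = {p_rs117084250:.4f}")
```

A nominal p-value of this kind does not account for the number of variants tested across the five genes, so it should be read as descriptive only.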

MECP2

Twenty-eight alterations were identified in MECP2 (Supplemental Table 5). Sixteen of these are currently in the dbSNP database and another one has been previously reported in RettBASE, leaving 11 novel variations. While none of the frequently recurring, classic Rett syndrome variations were identified in this study, there are two previously reported MeCP2 alterations of undetermined pathogenicity (Thr240Ser and Ala370Thr) that may cause clinical phenotypes. The first variation, MeCP2 Thr240Ser (rs61749738), was identified in two families of African ancestry (1072 and 17130) and was absent in control individuals (Figure 2A,B). Further investigation into additional family members showed that the variation was inherited maternally in both cases and was concordant with disease in multiplex family 1072. The second alteration, Ala370Thr (rs147017239), was also inherited maternally in a single proband of African ancestry (family 18024, Figure 2C).

Pedigrees of ASD families carrying alterations in MeCP2

SETDB1

A total of 44 changes were found in SETDB1, comprising 19 known and 25 novel alterations (Supplemental Table 6). Eight changes are predicted to be nonsynonymous, but only one of these, Pro1067del, was found solely in patients with ASD. This change is also the only ASD-specific, nonsynonymous deletion identified in the entire study. The variant removes three nucleotides and predicts an in-frame deletion of a single amino acid. This deletion falls within the SET domain of the protein and was inherited maternally in both affected sons in family 17187 (Figure 3A).
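The distinction between an in-frame deletion such as Pro1067del and a frameshift can be illustrated with a short sketch. This is a simplified model: it assumes the deletion lies entirely within coding sequence and ignores whether it is aligned to a codon boundary. The function name `deletion_effect` is hypothetical.

```python
def deletion_effect(n_deleted_nt):
    """Classify a coding deletion by its length in nucleotides.

    Returns a (label, codons_removed) tuple. A length divisible by three
    preserves the reading frame; any other length shifts it, so the number
    of codons removed is not meaningful and is reported as None.
    """
    if n_deleted_nt <= 0:
        raise ValueError("deletion length must be positive")
    if n_deleted_nt % 3 == 0:
        return ("in-frame", n_deleted_nt // 3)
    return ("frameshift", None)


# Three nucleotides removed, as described for SETDB1 Pro1067del.
print(deletion_effect(3))
# The 41 base pair MECP2 deletion (Leu386fs) mentioned earlier in the text.
print(deletion_effect(41))
```

Under this model the three-nucleotide deletion removes one codon, consistent with the loss of a single amino acid, whereas a 41 base pair deletion cannot preserve the reading frame.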

Pedigrees of ASD families carrying alterations in SETDB1 and SETDB2

Another novel SETDB1 variation of interest, Pro529Leu, occurred in a higher proportion of cases than controls: it was identified in five ASD families of European ancestry and only a single control (Figure 3B–F). This variant was inherited paternally in one family and maternally in the remaining four families. In family 37265, the variation was passed from the father, who has dyslexia, to both the female proband with autism (0001), who was diagnosed with developmental and language delays, and her brother (0100), who presented with ADHD, anxiety/panic disorder, language delay and macrocephaly (Figure 3E). In two of the families with maternal inheritance (17663 and 37673), the mothers presented with anxiety/panic disorder. In family 17663, the mother also presented with a history of seizures, sleep disorder and self-reported depression, while the mother in family 37673 reported a history of adolescent onset anorexia nervosa. The increased incidence of this alteration in cases versus controls, along with neuropsychiatric and neurodevelopmental disorders in parents carrying the alteration, suggests that this variation may confer a variety of clinical consequences.

SETDB2

Thirty-eight single base pair alterations were identified in the SETDB2 gene, 21 of which have been previously reported and the remaining 17 are novel (Supplemental Table 7). Eight SNPs are predicted to alter amino acids and three of these were unique to affected individuals: Ile425Thr, Thr475Met and Pro536Arg (Table 1, Figure 3C,G,H). However, these alterations are not predicted to have a highly detrimental effect on the protein and occur within singleton families, making it difficult to determine whether they may play a pathogenic role in ASD.

Along with isolating additional variations in MBD5 and MECP2 that may contribute to neuropsychiatric disease, this study is the first to report prospective pathogenic variations in MBD6 and SETDB1. These include two novel, nonsynonymous alterations in MBD6 (Arg883Trp and Pro943Arg) and one more in SETDB1 (Pro1067del). Furthermore, the MBD6 Arg883Trp and SETDB1 Pro1067del variations each segregated with ASD in the multiplex families. Potential for SETDB1 to play a role in neurobehavioral phenotypes is supported by results from transgenic Setdb1 mice demonstrating a role in mood behaviors (Jiang et al., 2010a).

To date, MBD5 mutations have been identified in individuals presenting a range of clinical phenotypes including ASD, developmental delay, intellectual disability, epilepsy, repetitive movements, and language impairments (Vissers et al., 2003; Koolen et al., 2004; de Vries et al., 2005; Wagenstaller et al., 2007; Jaillard et al., 2008; van Bon et al., 2009; Williams et al., 2009; Chung et al., 2011; Talkowski et al., 2011; Noh & Graham Jr 2012). These results suggest a significant role for the MBD5 isoform 1, which presents with increased expression in the brain (Laget et al., 2010). It has been estimated that between microdeletions and point mutations of MBD5, this gene may play a contributing genetic role in up to 1% of individuals with ASD (Talkowski et al., 2011). Of the nonsynonymous alterations identified in this study, ASD-specific changes were more likely to be predicted to be damaging than variations found in control individuals (Supplemental Table 3). MBD5 Tyr1269Cys is a strong candidate for a pathogenic change due to its co-segregation with ASD in a multiplex family of three affected children, high conservation of this amino acid across species and altered function in the luciferase transcriptional activation assay. While this alteration does not fall in a known protein domain, it is specific to isoform 1, the isoform predominantly expressed in brain (Laget et al., 2010). It seems likely that most alterations in MBD5 related to disease will be rare and unique, as the one alteration previously reported to have an increased frequency in patients with ASD, Gly79Glu, was only identified in a single control in the current study (Talkowski et al., 2011).

The role of MECP2 in developmental disorders is undisputed (Samaco & Neul 2011). Our study supports the possible pathogenicity of two specific MeCP2 alterations: Thr240Ser and Ala370Thr. The first variant, Thr240Ser, was identified in two male probands from families of African ancestry, including the multiplex family 1072 where the variant segregated with ASD (Figure 2A,B). The maternal inheritance in family 17130 and presence of an unaffected carrier sister suggests that the variation may only present with a clinical phenotype in a hemizygous state. This variant falls within the transcriptional repression domain and has been previously reported in four studies; three cases of males with intellectual disability and one female with Rett syndrome (Yntema et al., 2002; Bourdon et al., 2003; Bienvenu & Chelly 2006; Campos et al., 2007; Bunyan & Robinson 2008). The second alteration, Ala370Thr, was identified in a singleton family of African ancestry and previously reported in three Chinese individuals: one female with Rett syndrome, her unaffected mother and a male presenting with epileptic encephalopathy (Figure 2C, Li et al., 2007; Wong & Li 2007). Both of these alterations must be further evaluated to isolate their potential functional consequences.

Finally, while we did identify variants of interest in four of the genes studied, SETDB2 alterations did not appear to be related to the occurrence of ASDs.

This is the first study to evaluate the coding regions of MBD5, MBD6, SETDB1, and SETDB2 for rare alterations in individuals with ASD. We identified novel point mutations predicted to be damaging and concordant with disease in multiplex families, as well as a complex duplication encompassing MECP2. Additional studies, ideally both in patients and animal models, are required to determine the precise consequences of these alterations. The results described here compound the evidence of MECP2 and MBD5’s involvement in ASDs and neurodevelopmental disorders and provide the first examples of autistic patients carrying potentially detrimental alterations in MBD6 and SETDB1. This study demonstrates the expanding role MBD genes play in autism etiology.

PI3K/Akt: getting it right matters

T F Franke. Oncogene (2008) 27, 6473–6488; http://dx.doi.org:/10.1038/onc.2008.313

The Akt serine/threonine kinase (also called protein kinase B) has emerged as a critical signaling molecule within eukaryotic cells. Significant progress has been made in clarifying its regulation by upstream kinases and identifying downstream mechanisms that mediate its effects in cells and contribute to signaling specificity. Here, we provide an overview of present advances in the field regarding the function of Akt in physiological and pathological cell function within a more generalized framework of Akt signal transduction. An emphasis is placed on the involvement of Akt in human diseases ranging from cancer to metabolic dysfunction and mental disease.

The molecular mechanisms of Akt regulation are summarized in Figure 1.

Figure 1.

Canonical schematic depicting the present state of our understanding of Akt activation and regulation of downstream biological responses. Autophosphorylation of RTKs induces the recruitment of p85 regulatory subunits leading to PI3K activation. Once activated, p110 catalytic subunits phosphorylate plasma membrane-bound phosphoinositides (PI-4-P and PI-4,5-P2) on the D3-position of their inositol rings. The second messengers resulting from this PI3K-dependent reaction are PI-3,4-P2 and PI-3,4,5-P3 (also called PIP3). PIP3, in turn, is the substrate for the phosphoinositide 3-phosphatase PTEN, an endogenous inhibitor of PI3K signaling in cells. The phosphoinositide products of PI3K form high-affinity binding sites for the PH domains of intracellular molecules. PDK1 and Akt are two of the many targets of PI3K products in cells. Following binding of the Akt PH domain to PI3K products, Akt is phosphorylated by PDK1 on a critical threonine residue in its kinase domain. mTORC2 is the main kinase activity that, through phosphorylation of a C-terminal HM serine residue, locks the Akt enzyme into an active conformation. Other kinases such as DNA-PK and ILK1 are also capable of phosphorylating Akt at the HM site but may do so in a cell- or context-dependent manner. Akt activation is blunted by phosphatases including PP2A and PHLPP that inhibit Akt activity by dephosphorylation. Studies examining Akt-interacting proteins such as CTMP or second messengers such as Ins(1,3,4,5)P4 suggest that this common pathway of Akt regulation may be further specified within certain functional contexts or during development. Once activated, Akt signals to a plethora of downstream biological responses ranging from angiogenesis, cell survival, proliferation and translation to metabolism.


…….

Consequences of Akt activation include diverse biological responses, ranging from primarily metabolic functions such as glucose transport, glycolysis, glycogen synthesis and the suppression of gluconeogenesis to protein synthesis, increased cell size, cell-cycle progression and apoptosis suppression.

Insights into the molecular consequences of increased Akt activation were derived from seminal studies that ultimately identified the ‘orphan’ proto-oncogene as an obligate intermediate downstream of PI3K in the insulin-dependent metabolic control of glycogen synthesis. When searching for kinases that could regulate GSK3, the groups of Brian Hemmings and Phil Cohen realized that Akt inhibited GSK3 activity in an insulin-stimulated and PI3K-dependent manner by direct phosphorylation of an N-terminal regulatory serine residue (Cross et al., 1995). By systematically permutating the amino-acid sequence surrounding the Akt phosphorylation site in GSK3, Alessi et al. (1996b) derived an optimal peptide sequence for Akt phosphorylation (R-X-R-X-X-S/T; where R is an arginine residue, S is serine, T is threonine and X is any amino acid). This Akt consensus motif is a common feature of known substrates of Akt, and its presence predicts reasonably well whether a given protein may be phosphorylated by Akt enzyme in vitro (for review, see Manning and Cantley, 2007). Experiments using randomized permutations on the basis of the motif to optimize substrate peptides have defined the requirement for optimal phosphorylation by Akt further (Obata et al., 2000). The preferred phosphoacceptor for Akt-dependent phosphorylation is a serine residue, but a synthetic substrate peptide with a threonine residue as the phosphoacceptor instead (R-P-R-A-A-T-F; P=proline, A=alanine, F=phenylalanine) is also easily phosphorylated. For achieving optimal phosphorylation efficiency, the phosphoacceptor is best followed by a hydrophobic residue with a large side-chain in the p+1 position, and preceded by a serine or threonine at the p−2 position.
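The minimal consensus motif described above (R-X-R-X-X-S/T) lends itself to a simple sequence scan. The sketch below is illustrative only: it encodes just the minimal motif, not the additional preferences at the p+1 and p−2 positions, and, as the text notes, a match predicts in vitro phosphorylation only reasonably well. The function name `find_akt_sites` is hypothetical, and the only sequence taken from the text is the synthetic substrate peptide R-P-R-A-A-T-F.

```python
import re

# R-X-R-X-X-S/T in one-letter code. The lookahead lets overlapping
# motifs be reported rather than only the first of an overlapping pair.
AKT_MOTIF = re.compile(r"(?=(R.R..[ST]))")


def find_akt_sites(sequence):
    """Return 0-based indices of candidate Akt phosphoacceptor residues.

    The phosphoacceptor (S or T) is the sixth residue of each motif match,
    i.e. five positions after the first arginine.
    """
    return [m.start() + 5 for m in AKT_MOTIF.finditer(sequence.upper())]


# Synthetic substrate peptide quoted in the text: R-P-R-A-A-T-F.
peptide = "RPRAATF"
for pos in find_akt_sites(peptide):
    print(f"candidate phosphoacceptor {peptide[pos]} at position {pos + 1}")
```

For the synthetic peptide this reports the threonine at position 6 as the candidate phosphoacceptor; a sequence lacking the two arginines at the expected spacing yields no sites.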

One of the first targets of Akt to be identified that has direct implications for regulating cell survival is the pro-apoptotic BCL2-antagonist of death (BAD) protein. BAD regulation by Akt has exemplified the molecular pathways linking survival factor signaling to apoptosis suppression (for review, see Franke and Cantley, 1997). When BAD is not phosphorylated, it will inhibit Bcl-xL and other anti-apoptotic Bcl-2 family members by direct binding of its Bcl-2 homology domain to their hydrophobic grooves (Gajewski and Thompson, 1996). Once phosphorylated, these phospho-serine residues of BAD form high-affinity binding sites for cytoplasmic 14-3-3 molecules. As a result, phosphorylated BAD is retained in the cytosol where its pro-apoptotic activity is effectively neutralized (Zha et al., 1996). The importance of BAD as an integration point of survival signaling is underscored by the fact that it is a substrate for multiple independent kinase pathways in cells, not all of which phosphorylate BAD at the same site(s) as Akt (Datta et al., 2000). The mechanisms of 14-3-3-dependent regulation of BAD function hereby resemble the Akt-dependent inhibition of FoxO transcription factors that regulate the transcription of pro-apoptotic genes (Brunet et al., 1999).

The function of Akt extends beyond maintaining mitochondrial integrity to keep cytochrome c and other apoptogenic factors in the mitochondria (Kennedy et al., 1999). Akt activity also mitigates the response of cells to the release of cytochrome c into the cytoplasm. Although caspase-9 is an Akt substrate in human cells, where it may explain cytochrome c resistance (Cardone et al., 1998), it may not be the only, or even the most important, target because Akt-dependent cytochrome c resistance can be observed in animal species where caspase-9 lacks a potential Akt phosphorylation site (Fujita et al., 1999; Zhou et al., 2000). Not surprisingly, other components of the post-mitochondrial machinery such as the X-linked inhibitor of apoptotic proteins (XIAP) have been suggested as potential Akt substrates (Dan et al., 2004).

Another important class of Akt targets are proteins involved in the stress-activated/mitogen activated protein kinase (SAPK/MAPK) cascades. Growing experimental evidence points to a close functional relationship between the Akt survival pathway and SAPK/MAPK cascades that are activated by various cellular stresses and are linked to apoptosis. Increased Akt activity has been shown to suppress the JNK and p38 pathways (Berra et al., 1998; Cerezo et al., 1998; Okubo et al., 1998). It has been shown that apoptosis signal-regulating kinase 1 (ASK1) is regulated by Akt and contains an Akt-specific phosphorylation site (Kim et al., 2001). These findings have been confirmed independently by other groups (Yuan et al., 2003; Mabuchi et al., 2004). Thus, ASK1 is likely to be one of the points of convergence between PI3K/Akt signaling and stress-activated kinase cascades, although probably not the only one. Akt also phosphorylates the small G protein Rac1 (Kwon et al., 2000), the MAP2K stress-activated protein kinase kinase-1 (SEK1; also known as JNKK1 or MKK4) (Park et al., 2002) and the MAP3K mixed lineage kinase 3 (MLK3) (Barthwal et al., 2003; Figueroa et al., 2003). Using yeast-2-hybrid screens to identify interacting partners for Akt kinases, Figueroa et al. (2003) found binding of Akt2 to the JNK adaptor POSH. These authors showed that the binding of Akt2 to POSH results in an inhibition of JNK activity, and that this inhibition is mediated by phosphorylation of the upstream kinase MLK3 and leads to the disassembly of the JNK signaling complex. In turn, POSH is also an Akt substrate (Lyons et al., 2007). Taken together, these findings point to an intriguing model for the regulation of the JNK pathway by Akt, in which the Akt-dependent phosphorylation of specific components can block signal transduction through the stress-regulated kinase cascade.
In spite of this, it has been reported that Akt also blocks the pro-apoptotic activity of other MAP3Ks such as MLK1 and MLK2 that act in parallel to MLK3 but do not contain a typical Akt consensus phosphorylation motif (Xu et al., 2001). Thus, phosphorylation-based mechanisms may be limited in explaining the role of Akt in blocking JNK signaling.

Although many of its substrates are involved in clearly defined biological functions within a circumscribed context such as cell proliferation, a more thorough analysis of Akt signaling has suggested that the boundaries between metabolic processes and apoptosis suppression may be artificial. For example, the Akt target GSK3 has been implicated both in the regulation of glucose metabolism and cell survival (Pap and Cooper, 1998). These findings suggest that the distinctions between cell growth, survival, metabolism and apoptosis regulation do not properly reflect functional interactions between concurrent biological processes in cells. This shift in perception has been fueled by studies from the Korsmeyer laboratory that have demonstrated a canonical function for the pro-apoptotic Bcl-2 family member BAD in the regulation of glucokinase activity (Danial et al., 2003). It is conceivable that findings of PKA-dependent regulation of BAD in glucose metabolism can be extrapolated to BAD inhibition by Akt. Still, a formal confirmation for a role of Akt in this process has yet to be presented (for review, see Downward, 2003).

The critical importance of Akt signaling for neuronal function is implied from several lines of in vitro evidence using neuronal cell lines and dispersed primary neuronal cultures that have demonstrated a requirement for Akt in the protection against trophic factor deprivation, oxidative stress and ischemic injury (Dudek et al., 1997; Salinas et al., 2001; Noshita et al., 2002). Dysregulation of Akt activity is observed in neurodegenerative diseases including Alzheimer’s disease (Rickle et al., 2004; Ryder et al., 2004), Parkinson’s disease (Hashimoto et al., 2004) and Huntington’s disease (Humbert et al., 2002), and it is also associated with the pathobiological mechanisms underlying spinocerebellar ataxia (Chen et al., 2003). A mechanistic involvement of impaired Akt signaling in neurodegeneration is further supported by the Akt-dependent phosphorylation of the disease-related proteins huntingtin (Humbert et al., 2002) and ataxin (Chen et al., 2003).

Other studies suggest that the involvement of Akt in brain function extends beyond the protection of neuronal cells against apoptotic insults. Indeed, pathological changes in Akt signal transduction have been described that are associated with mental diseases. Significantly decreased Akt1 expression has been reported in patients suffering from familial schizophrenia (Emamian et al., 2004). Decreased Akt1 levels are correlated with increased GSK3 activity, presumably because of the lack of the Akt-dependent inhibitory input on GSK3. In support of AKT1 being a susceptibility gene for schizophrenia, Akt1(−/−) mice exhibit increased sensitivity to the sensorimotor disruptive effect of amphetamine, which is partly reversed by the treatment of mutant mice with the antipsychotic drug haloperidol (Emamian et al., 2004). Additional support for a contribution of impaired Akt signaling in the pathogenesis of schizophrenia derives from the finding of mutant PI3K signaling in schizophrenia (for review, see Arnold et al., 2005). A direct involvement of Akt in dopaminergic action is indicated by the observation that Akt1(−/−) mutant mice exhibit a behavioral phenotype resembling enhanced dopaminergic transmitter function (Emamian et al., 2004). By interacting with the GSK3 pathway, Akt modulates the suppression of dopamine (DA)-associated behaviors after treatment with the mood stabilizer lithium (Beaulieu et al., 2004). Furthermore, a β-arrestin 2-mediated kinase/phosphatase scaffold of Akt and protein phosphatase 2A (PP2A) is required for the regulation of Akt downstream of DA receptors (Beaulieu et al., 2005). Still, the role of Akt in dopaminergic responses by far exceeds actions downstream of DA receptors: the insulin-dependent regulation of the DA transporter also depends on Akt activity (Garcia et al., 2005).

Since the field of Akt signaling in psychiatric disorders is still emerging, it may be too early to speculate about the molecular involvement of Akt in regulating higher brain function. Possible functional outlets of Akt include some of the substrates mentioned above, including mTORC1 and GSK3. In addition, substrates of Akt related to synaptic plasticity and transmission have been described. One such novel substrate of Akt related to neuronal excitability is the β2-subunit of the type A γ-aminobutyric acid receptor (GABAA-R) (Lin et al., 2001). In support of a direct involvement of Akt in synaptic function, studies directed at working memory performance performed in Akt1(−/−) mice (Lai et al., 2006) and in healthy individuals carrying the AKT1 coding variation observed in familial schizophrenia (Tan et al., 2008) find a strong correlation with cognitive performance. Additional roles for Akt in higher brain function are suggested by studies from the Nestler laboratory that have explored the IRS2-Akt pathway during the development of tolerance to opiate reward (Russo et al., 2007). By using viral-mediated gene transfer to express mutant Akt in midbrain neurons, these authors demonstrate that downregulation of the IRS2-Akt pathway mediates morphine-induced decreases in cell size of DA neurons in brain regions that are critically involved in the reward circuitry and affected in individuals addicted to drugs of abuse.

Finally, TSC patients show an increased incidence of autism spectrum disorders (ASD) ranging from 25 to 50% (for review, see Wiznitzer, 2004). Individuals with macrocephaly due to Lhermitte–Duclos disease are prone to ASD and show a pronounced incidence of mutations in the PTEN tumor suppressor gene (Butler et al., 2005). Additional experimental support for a possible involvement of PTEN/Akt in ASD is provided by data from the Parada laboratory examining the morphology and behavior of mutant mice with neuron-specific knockout of PTEN (Kwon et al., 2006). Future studies will be required to clarify the function of Akt in cognition and characterize the underlying molecular mechanisms. In spite of this, these initial studies suggest a complex function of Akt in conditions affecting brain function and mental health.

The emerging involvement of Akt in higher brain function is summarized in Figure 2.

Figure 2.

Akt kinase regulates diverse aspects of neuronal cell function. Akt activation in neuronal cells follows similar mechanisms to those outlined in Figure 1, including activation of PI3K by RTKs including IGF-1/insulin and nerve growth factor (BDNF/NT-3) receptors. Other mechanisms governing Akt activity in neuronal cells include G protein-coupled receptors for the monoamine neurotransmitters serotonin (5-HT) (for review, see Raymond et al., 2001) and dopamine (DA) (for review, see Beaulieu et al., 2007). Depending on receptor type (D2-DA and 5-HT1A receptors vs D1-DA receptors), binding of 5-HT or DA decreases or increases the activity of adenylyl cyclase (AC), respectively. Changes in cAMP second messenger levels, in turn, alter PKA and PP2A activity. PP2A is inhibited by increased PKA activity, thus maintaining Akt in an activated state after 5-HT1A receptor stimulation (Hsiung et al., 2008). After binding of DA to D2-DA receptor, following initial inhibition of AC, a secondary internalization complex is formed between β-arrestin 2, PP2A and Akt leading to the inhibition of Akt. In neuronal cells, activated Akt regulates diverse targets that have been implicated in the regulation of protein translation and cell size (mTORC1), axonal outgrowth (GSK3), apoptosis suppression (BAD) and synaptic plasticity (GABAA-R). Details regarding functional consequences of Akt regulation for higher brain function are discussed in the text.


When considering the present understanding of all the signals leading to and from Akt, we face a growing complexity that is in part compounded by the intersection of multiple signaling cascades. Many substrates of Akt are shared with other kinases that have similar specificities. Moreover, signals originating from activated Akt do not simply lead to changes in the biological activity of specific downstream substrates, but affect entire signaling networks. In spite of this, there is hope that there is order to the far-reaching physiological involvement of Akt. One possibility is that differential regulation of the binding partners of Akt may determine cell- and context-specific signaling by Akt. Studies are now needed to elucidate the physiological functions of the binding partners of Akt in mammalian physiology.

A second challenge that the field is facing arises from the involvement of Akt in multiple areas of physiology. These now exceed cancer and diabetes and, as briefly outlined above, include higher brain functions related to cognition.

SETDB1 in Early Embryos and Embryonic Stem Cells

Yong-Kook Kang

The histone methyltransferase SETDB1 contributes to the silencing of local chromatin and the target specificity appears to be determined through various proteins that SETDB1 interacts with. This fundamental function endows SETDB1 with specialized roles in embryonic cells. Keeping the genomic and transcriptomic integrity via proviral silencing and maintaining the pluripotency by repressing the differentiation-associated genes have been demonstrated as the roles of SETDB1 in embryonic stem cells. In early developing embryos, SETDB1 exhibits characteristic nuclear mobilizations that might account for its pleiotropic roles in these rapidly changing cells as well. Early lethality of SETDB1-null embryos, along with other immunolocalization findings, suggests that SETDB1 is necessary for reprogramming and preparing the genomes of zygotes and pluripotent cells for the post-implantation developmental program.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3706629/

Exome sequencing of probands with autism has revealed broadly similar results: de novo mutations in a large set of genes occur in a significant fraction of patients, with relatively high odds ratios (ORs) for damaging mutations in genes expressed in the brain [9,19–21]. Most interestingly, CHD8, which like CHD7 reads H3K4me marks, is frequently mutated in autism [22], raising the question of whether the H3K4me pathway may play a role in many congenital diseases. Among 249 protein-altering de novo mutations in congenital heart disease (CHD; Supplementary Table 4) and 570 such mutations in autism [9,19,20,23], there were two genes, CUL3 and NCKAP1, with damaging mutations in both CHD and autism and none in controls (P = 0.001, Monte Carlo simulation), and several others with mutations in both (e.g., SUV420H1 and CHD7). Similarly, rare copy number variants at 22q11.2, 1q21, and 16p11 are found in patients with autism, CHD or both diseases [24–26]. These observations suggest variable expressivity of mutations in key developmental genes. Identification of the complete set of these developmental genes and the full spectrum of the resulting phenotypes will likely be important for patient care and genetic counseling.

Context-specific microRNA function in developmental complexity

Adam P. Carroll, Paul A. Tooney and Murray J. Cairns      http://jmcb.oxfordjournals.org/content/early/2013/03/01/jmcb.mjt004.full     J Mol Cell Biol (2013)    doi: 10.1093/jmcb/mjt004

Since their discovery, microRNAs (miRNA) have been implicated in a vast array of biological processes in animals, from fundamental developmental functions including cellular proliferation and differentiation, to more complex and specialized roles such as long-term potentiation and synapse-specific modifications in neurons. This review recounts the history behind this paradigm shift, which has seen small non-coding RNA molecules coming to the forefront of molecular biology, and introduces their role in establishing developmental complexity in animals. The fundamental mechanisms of miRNA biogenesis and function are then considered, leading into a discussion of recent discoveries transforming our understanding of how these molecules regulate gene network behaviour throughout developmental and pathophysiological processes. The emerging complexity of this mechanism is also examined with respect to the influence of cellular context on miRNA function. This discussion highlights the absolute imperative for experimental designs to appreciate the significance of context-specific factors when determining what genes are regulated by a particular miRNA. Moreover, by establishing the timing, location, and mechanism of these regulatory events, we may ultimately understand the true biological function of a specific miRNA in a given cellular environment.

It was once considered the central dogma of molecular biology that gene expression was regulated in a unidirectional manner whereby cellular instructions were encoded in DNA to be transcribed to produce RNA, which simply acted as a messenger molecule to produce the protein end-products that executed these cellular instructions. In fact, signs of a biological phenomenon whereby non-protein-coding RNA molecules could interfere with this very process were not even realized until the 1970s and early 1980s, when exogenous oligonucleotides complementary to ribosomal RNA were found to interfere with ribosome function (Taniguchi and Weissmann, 1978; Eckhardt and Luhrmann, 1979; Jayaraman et al., 1981). A number of experiments in both prokaryotes and eukaryotes further supported the notion of antisense RNA as an antagonist to RNA function (Chang and Stoltzfus, 1985; Ellison et al., 1985; Harland and Weintraub, 1985; Izant and Weintraub, 1985; Melton, 1985), and one such experiment elegantly demonstrated that the introduction of synthetic oligonucleotides complementary to 3′- and 5′-terminal repeats of Rous sarcoma virus 35S RNA not only attenuated viral replication and cell transformation, but also inhibited viral RNA translation in vitro (Stephenson and Zamecnik, 1978; Zamecnik and Stephenson, 1978).

In addition to this, the successful inhibition of thymidine kinase gene expression by antisense RNA in eukaryotic cells precipitated the concept of antisense RNA not only as an experimental tool, but also as a therapeutic design (Izant and Weintraub, 1984). Determining the functionality of a previously identified gene sequence without identifying, isolating, or characterizing the protein product; interfering with RNAs that are never translated; and silencing the expression of disease-associated transcripts in a sequence-specific manner: these were some very appealing prospects. By the late 1980s and early 1990s, a variety of techniques had evolved in the field of molecular and applied genetics whereby various antisense DNA and RNA construct designs were employed to efficiently downregulate target gene expression (Fire et al., 1991).

Meanwhile, the scientific community was also beginning to appreciate a role for endogenous antisense RNA. Short antisense transcripts were found to form an RNA–RNA duplex with the 5′ end of the replication primer of the ColE1 plasmid (Tomizawa et al., 1981; Tomizawa and Itoh, 1982). Endogenous antisense RNA control in prokaryotes was also linked with various biological processes such as plasmid replication, transposition, temporal bacteriophage development, and catabolite repression in bacteria (Light and Molin, 1983; Simons and Kleckner, 1983; Kumar and Novick, 1985). Evidence was also beginning to mount to implicate antisense control mechanisms in eukaryotic organisms (Adeniyi-Jones and Zasloff, 1985; Farnham et al., 1985; Heywood, 1986; Spencer et al., 1986; Williams and Fried, 1986; Stevens et al., 1987), including the demonstration that antisense transcripts in the bovine papillomavirus type 1 (BPV-1) genome prevented episomal replication (Bergman et al., 1986).

It was only a matter of time before phenomena of gene silencing began to unfold in animals. Previous work in the 1980s with Caenorhabditis elegans had established that mutations in the genes for lin-4, lin-14, lin-28, lin-29, and lin-41 altered the heterochronic lineage of developing larvae, resulting in a failure to control temporal aspects of post-embryonic development (Chalfie et al., 1981; Ambros and Horvitz, 1984; 1987; Ambros, 1989); thus, these genes were referred to as being ‘heterochronic’. However, in 1993 it was discovered that lin-4 was located within an intron and was thus unlikely to encode a protein. More significantly, two lin-4 transcripts ∼22 and 61 nucleotides in length were identified that exhibited complementarity to a repeat sequence element in the 3′ untranslated region (UTR) of lin-14 mRNA (Lee et al., 1993). With another report soon replicating this finding in C. elegans and Caenorhabditis briggsae (Wightman et al., 1993), the notion was set forth that the 22-nucleotide lin-4 transcript represented an active mature form of the 61-nucleotide transcript and functioned to control worm larval development by binding to the 3′-UTR of lin-14, thereby negatively regulating its function via an antisense RNA–RNA interaction. Furthermore, lin-4 exhibited complementarity to seven regions within the 3′-UTR of lin-14, demonstrating that gene expression was more potently inhibited as more of these non-coding transcripts bound to the mRNA (He and Hannon, 2004). Retrospectively, we can identify the lin-4 gene in C. elegans as the pioneer of a new class of small, non-coding RNAs called microRNA (miRNA) (Lee et al., 1993), which utilize the RNA interference (RNAi) pathway to regulate the expression of protein-encoding genes at the post-transcriptional level (He and Hannon, 2004).

The following few years were somewhat quiet at the forefront of miRNA research, with the lin-4 mechanism assumed to be a unique event. Meanwhile, RNAi was coming to prominence in 1998 with Fire and Mello (along with their colleagues) reporting double-stranded RNA (dsRNA) to be far more potent at mediating gene suppression in C. elegans than single-stranded antisense RNA (Fire et al., 1998). Interestingly, only small quantities of dsRNA were required to induce post-transcriptional gene silencing (PTGS), and it was hypothesized that an endogenous catalytic or amplification component was mediating mRNA degradation prior to translation (Montgomery et al., 1998). RNAi was soon thereafter reported as an ATP-dependent process in an in vitro Drosophila embryo lysate system where dsRNA was processed into 21–23-nucleotide species that appeared to guide sequence-specific mRNA cleavage (Zamore et al., 2000). When dsRNA was shown by the Tuschl laboratory to be processed into 21–22-nucleotide short interfering RNA (siRNA) by a ribonuclease III enzyme to mediate sequence-specific RNAi in human embryonic kidney HEK-293 cells, the prospect was set forth for exogenous 21–22-nucleotide siRNA to be developed as gene-specific therapeutic molecules (Elbashir et al., 2001a).

With incredible excitement surrounding the implications of RNAi, Ruvkun and colleagues discovered a second miRNA in C. elegans in 2000. Like lin-4, the newly discovered let-7 exhibited complementarity to the 3′-UTR of heterochronic genes, in this case lin-14, lin-28, lin-41, lin-42, and daf-12 (Reinhart et al., 2000). Moreover, they discovered that let-7 was highly conserved in its temporal regulation across phylogeny (Pasquinelli et al., 2000), refuting the widely believed concept that lin-4 and let-7 were a worm-specific oddity and propelling miRNA to significance as native endogenous clients of the RNAi machinery. This catalysed intense genome-wide searches for the discovery of more endogenous small regulatory RNAs in numerous species, to the point that miRBase Release 19 currently contains sequence data for 25141 mature miRNA products in 193 organism species (Kozomara and Griffiths-Jones, 2011).

The significance of non-coding RNA was further illuminated in 2001 when the completion of the human genome project revealed that <2% of the human genome encoded proteins (Lander et al., 2001). It has been realized that the ratio of non-coding to protein-coding DNA in the genome correlates with developmental complexity (Mattick, 2004), and a recent publication has reported on the exponential correlation of miRNA gene number and 3′-UTR length—but not 5′-UTR or coding sequence length—with morphological complexity in animals (Chen et al., 2012). This was measured according to the number of cell types within each organism, and also confirmed earlier observations that 3′-UTR length in housekeeping genes has remained short across organisms, thereby minimizing miRNA-binding site potential and reducing the complexity with which these constitutively expressed genes are regulated (Stark et al., 2005). Today we certainly have a stronger appreciation for RNA molecules to function not only as messengers of protein production, but also as complex regulatory molecules facilitating the intricate control of gene expression required for developmental complexity (Kosik, 2009).

Mechanisms of miRNA function

When considering non-coding RNA function, miRNAs constitute one of the largest classes of endogenous, non-coding regulatory RNA molecules in animals. In their mature form they are ∼19–22 nucleotides in length, and they interact via Watson–Crick binding with regions of complementarity primarily within the 3′-UTR of mRNA transcripts. In doing so, miRNAs act as sequence-specificity guides for the RNAi machinery to mediate repression of target gene expression at post-transcriptional level by negatively regulating mRNA stability and/or protein translation.
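The idea of a miRNA pairing with complementary regions in a 3′-UTR can be illustrated with a minimal sketch. It assumes a simple rule that is not stated in the text above: a site counts as a match when the UTR contains the exact reverse complement of miRNA nucleotides 2–8. The sequences and the function names are hypothetical and serve only as a toy illustration; real target prediction considers far more than exact pairing.

```python
# Watson-Crick pairing for RNA (G:U wobble pairs are ignored in this sketch).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[nt] for nt in reversed(rna.upper()))


def find_complementary_sites(mirna, utr, start=1, length=7):
    """Return 0-based UTR positions that pair perfectly with a miRNA window.

    The default window (0-based slice 1:8, i.e. nucleotides 2-8) is an
    assumption made for illustration, not a rule taken from the text.
    """
    probe = mirna.upper()[start:start + length]
    site = reverse_complement(probe)
    utr = utr.upper()
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)
    return positions


# Hypothetical toy sequences, not real miRNA or UTR sequences.
toy_mirna = "AUGGCUAACGUUCGAUCCGGA"
toy_utr = "CCCUUAGCCAGGGAAAUUAGCCAUU"

sites = find_complementary_sites(toy_mirna, toy_utr)
print(sites)  # [3, 16]
```

A UTR with several such sites, like the seven lin-4-complementary regions reported for lin-14, would simply return a longer list of positions.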

miRNA biogenesis

miRNAs are typically transcribed by RNA polymerase II (pol II) as long primary miRNA (pri-miRNA) transcripts, which undergo sequential cleavage into a precursor miRNA (pre-miRNA) transcript before being cleaved again into the mature miRNA duplex (Figure 1). These pri-miRNA transcripts range in length from several hundred nucleotides to several kilobases, can contain either a single miRNA or clusters of several miRNAs, and originate from intronic regions of protein-coding and non-coding genes, as well as from intergenic and exonic regions (Rodriguez et al., 2004; Saini et al., 2007). The microprocessor complex is responsible for mediating pri-miRNA cleavage, with the dsRNA-binding protein DGCR8 (DiGeorge syndrome critical region gene 8) binding the pri-miRNA and positioning the catalytic site of Drosha, a ribonuclease III (RNase III) dsRNA-specific endonuclease, 11 nucleotides from the base of the duplex stem to mediate nuclear processing to the pre-miRNA transcript (Denli et al., 2004; Han et al., 2006). This produces a pre-miRNA hairpin typically 55–70 nucleotides in length with a two-nucleotide 3′ overhang, characteristic of RNase III-mediated cleavage (Lee et al., 2003). This two-nucleotide overhang facilitates the subsequent export of the pre-miRNA from the nucleus to the cytoplasm by a RanGTP/Exportin5-dependent mechanism and is suspected to also facilitate subsequent cleavage by the RNase III endonuclease Dicer (Yi et al., 2003; Bohnsack et al., 2004; Lund et al., 2004). This cleavage requires the interaction of Dicer with the dsRNA-binding protein TRBP [HIV-1 transactivating response (TAR) RNA-binding protein] (Forstemann et al., 2005), and as a result of Dicer processing the terminal base pairs and the loop of the pre-miRNA are excised. This produces a 19–22-nucleotide mature miRNA duplex, which possesses two-nucleotide overhangs at each 3′ end (Lee et al., 2002).

http://jmcb.oxfordjournals.org/content/early/2013/03/01/jmcb.mjt004/F1.medium.gif

Figure 1

A model for canonical miRNA biogenesis and function in animals. After their transcription by RNA polymerase II, pri-miRNAs are cleaved in the nucleus by Drosha, which forms a microprocessor complex with DGCR8. This generates the pre-miRNA, which is actively exported into the cytoplasm via a RanGTP/Exportin 5-dependent mechanism. In the cytoplasm, Dicer binds the base of the pre-miRNA stem defined in the nucleus by Drosha. Dicer cleavage liberates a mature miRNA duplex that exhibits imperfect complementarity. This miRNA duplex is assembled into the miRISC loading complex, in which the passenger strand is discarded. The miRNA guides the mature miRISC to regions of complementarity within mRNA transcripts, thereby mediating post-transcriptional gene silencing through translational repression and/or mRNA degradation.

After their maturation into small RNA duplexes, miRNAs are loaded into ribonucleoprotein (RNP) complexes, often referred to as miRNA-induced silencing complexes (miRISCs), RISCs, or miRNPs. The signature components of each miRISC are the miRNA and an Argonaute (AGO) protein. In humans, there are four AGO proteins (AGO1-4), each consisting of the highly conserved P-element-induced wimpy testes (PIWI), middle (MID), and PIWI-AGO-Zwille (PAZ) domains, along with a less-conserved terminal domain. The loading of the miRNA into this protein complex has been proposed to occur in tandem with Dicer-mediated miRNA maturation (Gregory et al., 2005; Maniataki and Mourelatos, 2005) and requires ATP hydrolysis with additional chaperone proteins to create an open conformation to facilitate loading of the miRNA duplex (Liu et al., 2004; Yoda et al., 2010).

A key feature of miRNA is that while both strands of a small RNA duplex are capable of activating the miRISC, typically only one strand will induce silencing (Khvorova et al., 2003). This asymmetry is primarily governed by the relative thermodynamic properties of the RNA duplex, such that the miRISC-associated helicase preferentially unwinds the miRNA duplex from the end with least resistance in terms of inter-strand hydrogen bonding. The strand with its 5′ end at this less thermodynamically stable end is selected as the guide strand, and proteins such as TRBP or protein kinase, interferon-inducible dsRNA-dependent activator (PACT) are proposed to interact with Dicer to sense this thermodynamic asymmetry (Schwarz et al., 2003; Noland et al., 2011). In doing so, the guide strand is retained in the miRISC, while the other strand (the passenger, or the miRNA* strand) is discarded (Hutvagner, 2005; Matranga et al., 2005). miRNA strand selection also appears to be independent of Dicer processing polarity (Preall et al., 2006); where both ends of a duplex have similar thermodynamic properties, both the miRNA and miRNA* act as the guide strand with similar frequencies (Schwarz et al., 2003). However, strand selection does not always occur according to the axiom of thermodynamic strand asymmetry, with tissue-specific factors appearing to play a role in enabling both the miRNA and miRNA* strands to co-accumulate and function as the guide strand (Ro et al., 2007). For this reason, miRNA nomenclature has advanced beyond the miRNA* system, with the adoption of miRNA-5p and -3p names to indicate whether the mature miRNA sequence is derived from the 5′ or 3′ end of the pre-miRNA transcript.
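The thermodynamic asymmetry rule described above can be sketched in a few lines. This is only a rough illustration under loud assumptions: duplex-end stability is approximated by counting Watson–Crick hydrogen bonds (2 for A:U, 3 for G:C) over the first four nucleotides at each strand's 5′ end, the duplex is treated as perfectly paired and blunt, and mismatches, wobble pairs, overhangs and tissue-specific factors are ignored. The window size, function names and sequences are hypothetical.

```python
# Hydrogen bonds contributed by each nucleotide when Watson-Crick paired.
H_BONDS = {"A": 2, "U": 2, "G": 3, "C": 3}


def five_prime_end_score(strand, window=4):
    """Crude stability proxy: hydrogen bonds over the 5'-terminal window."""
    return sum(H_BONDS[nt] for nt in strand[:window].upper())


def select_guide(strand_5p, strand_3p, window=4):
    """Return the strand label(s) favoured as guide under the asymmetry rule.

    The strand whose 5' end sits at the less stable duplex end is chosen.
    If both ends score the same, both strands are returned, mirroring the
    observation that either strand can then act as guide.
    """
    score_5p = five_prime_end_score(strand_5p, window)
    score_3p = five_prime_end_score(strand_3p, window)
    if score_5p < score_3p:
        return ["5p"]
    if score_3p < score_5p:
        return ["3p"]
    return ["5p", "3p"]


# Hypothetical toy duplex; the second strand is the reverse complement
# of the first, so each 5' window describes one end of the duplex.
print(select_guide("UAAUGGCACGGC", "GCCGUGCCAUUA"))  # ['5p']
print(select_guide("GCAU", "AUGC"))                  # ['5p', '3p']
```

Published strand-selection analyses rely on measured free-energy parameters rather than hydrogen-bond counts; the count is used here only to keep the sketch self-contained.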

Once the mature miRNA strand has been isolated in the mature miRISC, the AGO protein functions as an interface for the miRNA to interact with its mRNA targets. Recent characterization of human AGO2 has revealed that the 3′ hydroxyl of the miRNA inserts into a hydrophobic pocket of AGO such that the terminal nucleotide stacks against the aromatic ring of a conserved phenylalanine residue in the AGO PAZ domain (Jinek and Doudna, 2009). Meanwhile, the MID domain forms a binding pocket that anchors the miRNA 5′ phosphate such that this terminal nucleotide is distorted and does not interact with the target mRNA (Ma et al., 2005; Parker et al., 2005).

…….

Since miRNAs were discovered as regulators of developmental timing in C. elegans, it has become widely established that miRNA-mediated regulation of gene expression is a fundamental biological phenomenon required to facilitate key developmental processes such as cellular proliferation, programmed cell death, and cell lineage determination and differentiation (Bartel, 2009; Ambros, 2011). Their significance is such that 60% of the human genome is predicted to be regulated by miRNA function (Friedman et al., 2009), with each miRNA estimated to regulate around 200 target genes (Krek et al., 2005).

http://jmcb.oxfordjournals.org/content/early/2013/03/01/jmcb.mjt004/F2.medium.gif

Figure 2

Characteristic miRNA associated with the proliferation and differentiation of specialized cell types. A number of distinct miRNAs are expressed at specific stages through development to play a vital role in mediating cell proliferation, specification, and differentiation. A number of miRNAs involved in the establishment of specialized cell types are illustrated for neurogenesis (Smirnova et al., 2005; Makeyev et al., 2007; Shen and Temple, 2009; Shi et al., 2010; Zhao et al., 2010), myogenesis (Chen et al., 2006; Kim et al., 2006), haematopoiesis (Chen et al., 2004; Georgantas et al., 2007; Vasilatou et al., 2010), oligodendrocyte differentiation (Lau et al., 2008; Dugas et al., 2010), as well as induced pluripotent stem (iPS) cell reprogramming (Miyoshi et al., 2011).

miRNAs play a central role in establishing the spatiotemporal gene expression patterns required to establish specialized cell types and promote developmental complexity. The inherent complexity of miRNA function, however, requires a scientific approach in which context-specific miRNA function must be acknowledged if advancements are to be made in understanding how these small regulatory RNA molecules function in various developmental and pathophysiological processes. While this requires an appreciation for mechanistic aspects such as non-redundant miRISC function and the dynamic regulatory outcomes this facilitates, arguably the greatest challenge facing miRNA biology is the identification of the many genes that each miRNA targets and an understanding of the context-specific factors that determine when and how these genes are regulated.

Variability of Gene Expression and Drug Resistance

Larry H. Bernstein, MD, FCAP, Curator

LPBI

New Data Suggest Extreme Genetic Diversity of Tumors May Impart Drug Resistance

NEW YORK (GenomeWeb) – Researchers from the University of Chicago and the Beijing Institute of Genomics have undertaken one of the most extensive analyses of the genome of a single tumor and found far greater genetic diversity than anticipated. Such variation, they said, may enable even small tumors to resist treatment.

“With 100 million mutations, each capable of altering a protein in some way, there is a high probability that a significant minority of tumor cells will survive, even after aggressive treatment,” Chung-I Wu, a University of Chicago researcher and senior author of the study, said in a statement. “In a setting with so much diversity, those cells could multiply to form new tumors, which would be resistant to standard treatments.”

Extremely high genetic diversity in a single tumor points to prevalence of non-Darwinian cell evolution

Shaoping Ling,  PNAS   http://dx.doi.org:/10.1073/pnas.1519556112      http://www.pnas.org/content/early/2015/11/11/1519556112

A tumor comprising many cells can be compared to a natural population with many individuals. The amount of genetic diversity reflects how it has evolved and can influence its future evolution. We evaluated a single tumor by sequencing or genotyping nearly 300 regions from the tumor. When the data were analyzed by modern population genetic theory, we estimated more than 100 million coding region mutations in this unexceptional tumor. The extreme genetic diversity implies evolution under the non-Darwinian mode. In contrast, under the prevailing view of Darwinian selection, the genetic diversity would be orders of magnitude lower. Because genetic diversity accrues rapidly, a high probability of drug resistance should be heeded, even in the treatment of microscopic tumors.

The prevailing view that the evolution of cells in a tumor is driven by Darwinian selection has never been rigorously tested. Because selection greatly affects the level of intratumor genetic diversity, it is important to assess whether intratumor evolution follows the Darwinian or the non-Darwinian mode of evolution. To provide the statistical power, many regions in a single tumor need to be sampled and analyzed much more extensively than has been attempted in previous intratumor studies. Here, from a hepatocellular carcinoma (HCC) tumor, we evaluated multiregional samples from the tumor, using either whole-exome sequencing (WES) (n = 23 samples) or genotyping (n = 286) under both the infinite-site and infinite-allele models of population genetics. In addition to the many single-nucleotide variations (SNVs) present in all samples, there were 35 “polymorphic” SNVs among samples. High genetic diversity was evident as the 23 WES samples defined 20 unique cell clones. With all 286 samples genotyped, clonal diversity agreed well with the non-Darwinian model with no evidence of positive Darwinian selection. Under the non-Darwinian model, MALL (the number of coding region mutations in the entire tumor) was estimated to be greater than 100 million in this tumor. DNA sequences reveal local diversities in small patches of cells and validate the estimation. In contrast, the genetic diversity under a Darwinian model would generally be orders of magnitude smaller. Because the level of genetic diversity will have implications on therapeutic resistance, non-Darwinian evolution should be heeded in cancer treatments even for microscopic tumors.
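One way to picture how multiregional samples "define" unique clones is to group samples that share an identical presence/absence profile across the polymorphic SNVs. The sketch below assumes exactly that grouping rule, which is an illustrative simplification and not necessarily the procedure used in the study; the sample names and genotype calls are hypothetical toy data.

```python
def group_samples_into_clones(genotypes):
    """Group samples by their SNV presence/absence profile.

    genotypes maps a sample name to a sequence of 0/1 calls, one per
    polymorphic SNV. Samples with identical profiles are treated as the
    same clone (an assumption made for this sketch).
    """
    clones = {}
    for sample, profile in genotypes.items():
        clones.setdefault(tuple(profile), []).append(sample)
    return clones


# Hypothetical toy data: four samples genotyped at three polymorphic SNVs.
toy_genotypes = {
    "S1": (1, 0, 0),
    "S2": (1, 0, 0),
    "S3": (1, 1, 0),
    "S4": (0, 0, 1),
}

clones = group_samples_into_clones(toy_genotypes)
print(len(clones))          # 3
print(clones[(1, 0, 0)])    # ['S1', 'S2']
```

Under this rule, 23 samples yielding 20 distinct profiles would mean that almost every sampled region carries its own combination of polymorphic SNVs.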

The findings, which appeared in the Proceedings of the National Academy of Sciences this week, also call into question the widely held view that evolution at the cellular level is driven by Darwinian selection, revealing a level of rapid and extensive genetic diversity beyond what would be expected under this model.

In the study, the researchers focused on a single hepatocellular carcinoma tumor, roughly the size of a ping pong ball. They sampled 286 regions from a single slice of the tumor, studying each one with either whole-exome sequencing or genotyping under both the infinite-site and infinite-allele models of population genetics.

Based on their analyses, the team estimated more than 100 million coding region mutations in what they called an “unexceptional” tumor — more mutations than would ordinarily be expected by orders of magnitude, according to Wu.

This extreme genetic diversity, the study’s authors wrote, implies evolution under the non-Darwinian mode, which is driven by random mutations largely unaffected by natural selection. It also raises the question of why there is so little apparent Darwinian selection in the tumor.

The scientists speculated that in solid tumors, cells remain together and do not migrate, “so that when an advantageous mutation indeed emerges, cells carrying it are competing mostly with themselves. These mutations may confer advantages in fighting for space or extracting nutrients, but they are stifled by their own advantages,” they wrote.

Beneficial mutations may emerge on occasion, but in solid tumors the cell populations are “so structured that selection may often be blunted,” they stated. “The physiological effect has to be very strong to overcome those constraints.” Cancer drugs could remove those constraints, loosening up a cell population and allowing competition to occur, the investigators added.

Wu and his colleagues see the presence of so many mutations in a tumor as creating problems when it comes to treatment. “It almost guarantees that some cells will be resistant,” study co-author and University of Chicago oncologist Daniel Catenacci said in the statement. “But it also suggests that aggressive treatment could push tumor cells into a more Darwinian mode.”

Overall, the findings highlight the need to consider non-Darwinian evolution and the vast genetic diversity it can confer as factors when developing treatment strategies, even for small tumors, the researchers concluded.

Size Matters

Larry H. Bernstein, MD, FCAP, Curator

LPBI

MinION Sequencing Untangles RNA Transcripts in a Difficult Gene

By Aaron Krol

http://www.bio-itworld.com/2015/11/3/minion-sequencing-untangles-rna-transcripts-difficult-gene.html

RNA isoforms are distinct versions of the same gene. Through a process called alternative splicing, the different subunits, or “exons,” that make up a gene can be reshuffled in new combinations. Many genes have two or more mutually exclusive exons, and which ones are actually expressed as RNA and protein can have big effects on cellular behavior ― in effect, expanding the protein arsenal of the genome.

November 3, 2015 | Brenton Graveley received his first MinION shipment in April 2014, at his lab at the University of Connecticut’s Institute of Systems Genomics. His lab was among the first to unwrap one of the candy bar-sized DNA sequencers made by Oxford Nanopore Technologies, and although its accuracy was shaky and its throughput low, right away Graveley and his colleagues could see it was producing real DNA data.

“I’m still amazed to this day that it works at all,” Graveley says. “It’s like Star Trek.”

A lot of buzz around the MinION has focused on its tiny size: early adopters have plotted to take MinIONs into outbreak zones and species-hunting tromps through the rainforest, working with bare-bones labs and laptop computers. But for Graveley, the size of the DNA strands the MinION reads is just as exciting as the size of the sequencer itself. That’s because most other sequencers rely on picking up chemical reactions that become more error-prone over time, meaning DNA can only be read in short fragments. The MinION, which reads genetic material by observing single molecules of DNA as they pass through extremely narrow “nanopores,” keeps producing data for as long as DNA is moving through the pore.

“You get the read length of whatever fragment you put into the MinION,” he says. “We’ve gotten reads that are over 100 kilobases,” hundreds or even thousands of times longer than researchers can expect with most other technologies.

Now, in a paper published in Genome Biology, Graveley and two of his lab members, post-doc Mohan Bolisetty and PhD student Gopinath Rajadinakaran, have shown how these read lengths can help explain the cellular behavior of Dscam1, one of the most difficult-to-study genes known to science. Related to a gene in humans that has been linked to Down syndrome (the name stands for “Down Syndrome Cell Adhesion Molecule”), Dscam1 plays a fundamental role in forming the architecture of insect brains. This single gene can produce thousands of subtly different proteins, an ability that makes it both a fascinating subject of research and almost impossible to understand using standard sequencing technology.

Determining exon connectivity in complex mRNAs by nanopore sequencing

Mohan T. Bolisetty, Gopinath Rajadinakaran and Brenton R. Graveley
Genome Biology 2015, 16:204       http://dx.doi.org:/10.1186/s13059-015-0777-z                    http://genomebiology.com/2015/16/1/204

Short-read high-throughput RNA sequencing, though powerful, is limited in its ability to directly measure exon connectivity in mRNAs that contain multiple alternative exons located farther apart than the maximum read length. Here, we use the Oxford Nanopore MinION sequencer to identify 7,899 ‘full-length’ isoforms expressed from four Drosophila genes, Dscam1, MRP, Mhc, and Rdl. These results demonstrate that nanopore sequencing can be used to deconvolute individual isoforms and that it has the potential to be a powerful method for comprehensive transcriptome characterization.

High throughput RNA sequencing has revolutionized genomics and our understanding of the transcriptomes of many organisms. Most eukaryotic genes encode pre-mRNAs that are alternatively spliced [1]. In many genes, alternative splicing occurs at multiple places in the transcribed pre-mRNAs that are often located farther apart than the read lengths of most current high throughput sequencing platforms. As a result, several transcript assembly and quantitation software tools have been developed to address this [2], [3]. While these computational approaches do well with many transcripts, they generally have difficulty assembling transcripts of genes that express many isoforms. In fact, we have been unable to successfully assemble transcripts of complex alternatively spliced genes such as Dscam1 or Mhc using any transcript assembly software (data not shown). These software tools also have difficulty quantitating transcripts that have many isoforms, and for genes with distantly located alternatively spliced regions, they can only infer, and not directly measure, which isoforms may have been present in the original RNA sample [4]. For example, consider a gene containing two alternatively spliced exons located 2 kbp away from one another in the mRNA. If each exon is observed to be included at a frequency of 50 % from short read sequence data, it is impossible to determine whether there are two equally abundant isoforms that each contain or lack both exons, or four equally abundant isoforms that contain both, neither, or only one or the other exon.
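The two-exon example above can be made concrete with a minimal sketch. It builds the two isoform mixtures described in the text and shows that both give the same per-exon inclusion frequencies, which is all that short reads can measure when the exons lie farther apart than the read length. The function name and data layout are assumptions chosen for illustration.

```python
from itertools import product


def exon_inclusion(isoform_fractions):
    """Return the marginal inclusion frequencies of exon A and exon B.

    isoform_fractions maps (has_exon_a, has_exon_b) to relative abundance.
    """
    total = sum(isoform_fractions.values())
    incl_a = sum(f for (a, _), f in isoform_fractions.items() if a) / total
    incl_b = sum(f for (_, b), f in isoform_fractions.items() if b) / total
    return incl_a, incl_b


# Scenario 1: two equally abundant isoforms (both exons, or neither).
two_isoforms = {(True, True): 0.5, (False, False): 0.5}

# Scenario 2: four equally abundant isoforms (every combination).
four_isoforms = {combo: 0.25 for combo in product([True, False], repeat=2)}

print(exon_inclusion(two_isoforms))   # (0.5, 0.5)
print(exon_inclusion(four_isoforms))  # (0.5, 0.5)
```

Only a read that spans both exons in the same molecule reports the joint combination directly and so distinguishes the two scenarios.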

Pacific Biosciences sequencing can generate reads long enough to sequence full-length cDNA isoforms, and several groups have recently reported using this approach to characterize the transcriptome [5]. However, the large capital expense of this platform can be a prohibitive barrier for some users. Thus, it remains difficult to accurately and directly determine the connectivity of exons within the same transcript. The MinION nanopore sequencer from Oxford Nanopore requires a small initial financial investment, can generate extremely long reads, and has the potential to revolutionize transcriptome characterization, as well as other areas of genomics.

Several eukaryotic genes can encode hundreds to thousands of isoforms. For example, in Drosophila, 47 genes encode over 1,000 isoforms each [6]. Of these, Dscam1 is the most extensively alternatively spliced gene known and contains 115 exons, 95 of which are alternatively spliced and organized into four clusters [7]. The exon 4, 6, 9, and 17 clusters contain 12, 48, 33, and 2 exons, respectively. The exons within each cluster are spliced in a mutually exclusive manner and Dscam1 therefore has the potential to generate 38,016 different mRNA and protein isoforms. The variable exon clusters are also located far from one another in the mRNA and the exons within each cluster are up to 80 % identical to one another at the nucleotide level. Together, these characteristics present numerous challenges for characterizing exon connectivity within full-length Dscam1 transcripts on any sequencing platform. Furthermore, though no other gene is as complex as Dscam1, many other genes have similar issues that confound the determination of exon connectivity.

We are interested in developing methods to perform simple and robust long-read sequencing of individual isoforms of Dscam1 and other complex alternatively spliced genes. Here, we use the Oxford Nanopore MinION to sequence ‘full-length’ cDNAs from four Drosophila genes – Rdl, MRP, Mhc, and Dscam1 – and identify a total of 7,899 distinct isoforms expressed by these four genes.

Similarity between alternative exons

We were interested in determining the feasibility of using the MinION nanopore sequencer to characterize the connectivity of distantly located exons in the mRNAs expressed from genes with complex splicing patterns. For the purposes of these experiments, we have focused on four Drosophila genes with increasingly complex patterns of alternative splicing (Fig. 1). Resistant to dieldrin (Rdl) contains two clusters, each containing two mutually exclusive exons, and therefore has the potential to generate four different isoforms (Fig. 1a). Multidrug-Resistance like Protein 1 (MRP) contains two mutually exclusive exons in cluster 1 and eight mutually exclusive exons in cluster 2, and can generate 16 possible isoforms (Fig. 1b). Myosin heavy chain (Mhc) can potentially generate 180 isoforms due to five clusters of mutually exclusive exons – clusters 1 and 5 each contain two exons, clusters 2 and 3 each contain three exons, and cluster 4 contains five exons (Fig. 1c). Finally, Dscam1 contains 12 exon 4 variants, 48 exon 6 variants, 33 exon 9 variants (Fig. 1d), and two exon 17 variants (not shown) and can potentially express 38,016 isoforms. For this study, however, we have focused only on the exon 3 through exon 10 region of Dscam1, which encompasses the 93 exon 4, 6, and 9 variants, and 19,008 potential isoforms (Fig. 1d).
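The potential isoform counts quoted above follow from multiplying the number of mutually exclusive exons in each cluster, since exactly one exon is chosen per cluster. A minimal sketch of that arithmetic, using only the cluster sizes given in the text (the dictionary keys are labels chosen here for illustration):

```python
from math import prod

# Number of mutually exclusive exons in each cluster, as stated in the text.
CLUSTER_SIZES = {
    "Rdl": [2, 2],
    "MRP": [2, 8],
    "Mhc": [2, 3, 3, 5, 2],
    "Dscam1 (exon 4, 6, 9, 17 clusters)": [12, 48, 33, 2],
    "Dscam1 (exon 3-10 region)": [12, 48, 33],
}


def potential_isoforms(cluster_sizes):
    """One exon is chosen per cluster, so the choices multiply."""
    return prod(cluster_sizes)


for gene, sizes in CLUSTER_SIZES.items():
    print(gene, potential_isoforms(sizes))
# Rdl 4
# MRP 16
# Mhc 180
# Dscam1 (exon 4, 6, 9, 17 clusters) 38016
# Dscam1 (exon 3-10 region) 19008
```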

Fig. 1. Schematic of the exon-intron structures of the genes examined in this study. a The Rdl gene contains two clusters (cluster one and two) which each contain two mutually exclusive exons. b The MRP gene contains two and eight mutually exclusive exons in clusters 1 and 2, respectively. c Mhc contains two mutually exclusive exons in clusters 1 and 5, three mutually exclusive exons in clusters 2 and 3, and five mutually exclusive exons in cluster 4. d The Dscam1 gene contains 12, 48, and 33 mutually exclusive exons in the exon 4, 6, and 9 clusters, respectively. For each gene, the constitutive exons are colored blue, while the variable exons are colored yellow, red, orange, green, or light blue

Because our nanopore sequence analysis pipeline uses LAST to perform alignments [8], we aligned all of the Rdl, MRP, Mhc, and Dscam1 exons within each cluster to one another using LAST to determine the extent of discrimination needed to accurately assign nanopore reads to a specific exon variant. For Rdl, each variable exon was only aligned to itself, and not to the other exon in the same cluster (data not shown). For MRP, the two exons within cluster 1 only align to themselves, and though the eight variable exons in cluster 2 do align to other exons, there is sufficient specificity to accurately assign nanopore reads to individual exons (Fig. 2a). For Mhc, the variable exons in cluster 1 and cluster 5 do not align to other exons, and the variable exons in cluster 2, cluster 3, and cluster 4 again align with sufficient discrimination to identify the precise exon present in the nanopore reads (Fig. 2b). Finally, for Dscam1, the difference in the LAST alignment scores between the best alignment (each exon to itself) and the second, third, and fourth best alignments are sufficient to identify the Dscam1 exon variant (Fig. 2c). This analysis indicates that for each gene in this study, LAST alignment scores are sufficiently distinct to identify the variable exons present in each nanopore read.
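The assignment logic described above (pick the exon variant with the best alignment score, provided it is clearly separated from the runner-up) can be sketched as follows. This is an illustration only: the scores are invented, the `min_margin` threshold is an assumption introduced here, and the function does not call LAST itself; it operates on scores that an aligner would already have produced.

```python
def assign_exon(scores, min_margin=0):
    """Pick the exon variant with the best alignment score.

    scores: dict mapping exon variant name -> alignment score for one read.
    min_margin: hypothetical threshold; the best score must exceed the
    second-best by more than this, otherwise the read is left unassigned.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    return best if best_score - second_score > min_margin else None


# Invented scores for one read against three exon 6 variants.
clear_read = {"6.12": 118, "6.13": 74, "6.44": 61}
ambiguous_read = {"6.12": 90, "6.13": 85}

print(assign_exon(clear_read, min_margin=20))      # 6.12
print(assign_exon(ambiguous_read, min_margin=20))  # None
```

The larger the gap between the first and second best alignments in Fig. 2, the more sequencing error a read can carry before an assignment of this kind becomes unreliable.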

Fig. 2. Similarity distance between the variable alternative exons of MRP, Mhc, and Dscam1. a Violin plots of the LAST alignment scores of each variable exon within MRP cluster 1 and MRP cluster 2 to themselves and the second (2nd) best alignments. b Violin plots of the LAST alignment scores of each variable exon within each Mhc cluster to themselves and the second (2nd) best alignments. c Violin plots of the LAST alignment scores of each variable exon within each Dscam1 cluster to themselves (1st), and to the exons with the second (2nd), third (3rd) and fourth (4th) best alignments

Optimizing template switching in Dscam1 cDNA libraries

Template switching can occur frequently when libraries are prepared by PCR and can confound the interpretation of results [9], [10]. For example, CAM-Seq [11] and a similar method we independently developed called Triple-Read sequencing [12] to characterize Dscam1 isoforms were found to have excessive template switching due to amplification during the library prep protocols. To assess template switching in our current study, we generated a spike-in mixture of in vitro transcribed RNAs representing six unique Dscam1 isoforms – Dscam1 4.2,6.32,9.31; Dscam1 4.1,6.46,9.30; Dscam1 4.3,6.33,9.9; Dscam1 4.12,6.44,9.32; Dscam1 4.7,6.8,9.15; and Dscam1 4.5,6.4,9.4. We used 10 pg of this control spike-in mixture and prepared libraries for MinION sequencing by amplifying the exon 3 through exon 10 region for 20, 25, or 30 cycles of RT-PCR. We then end-repaired and dA-tailed the fragments, ligated adapters, and sequenced the samples on a MinION (7.3) for 12 h each. We obtained 33,736, 8,961, and 7,511 base-called reads from the 20, 25, and 30 cycle libraries, respectively. Consistent with the size of the exon 3 to 10 cDNA fragment being 1,806–1,860 bp in length, depending on the precise combination of exons it contains, most reads we observed were in this size range (Fig. 3a). We used Poretools [13] to convert the raw output files into fasta format and then used LAST to align the reads to a LAST database containing each variable exon. From these alignments, we identified reads that mapped to all three exon clusters, as well as the exon with the best alignment score within each cluster. When examining the alignments to each cluster independently, we found that for these spike-in libraries, all reads mapped uniquely to the exons present in the input isoforms. Therefore, any observed isoforms that were not present in the input pool were a result of template switching during the RT-PCR and library prep protocol and not due to false alignments or sequencing errors.

Fig. 3. Optimized RT-PCR minimizes template-switching for MinION sequencing. a Histogram of read lengths from MinION sequencing of Dscam1 spike-ins from the library generated using 25 cycles of PCR. b Bar plot indicating the extent of template switching in Dscam1 spike-ins at different PCR cycles (left). The blue portions indicate the fraction of reads corresponding to input isoforms while the red portions correspond to the fraction of reads corresponding to template-switched isoforms. On the right, plots of the rank order versus number of reads (log10) for the 20, 25, and 30 cycle libraries. The blue dots indicate input isoforms while the red dots correspond to template-switched isoforms

When comparing the combinations of exons within each read to the input isoforms, we observed that 32 % of the reads from the 30 cycle library corresponded to isoforms generated by template switching (Fig. 3b). The template-switched isoforms observed by the greatest number of reads in the 30 cycle library were due to template switching between the two most frequently sequenced input isoforms. In most cases, template switching occurred somewhere within exon 7 or 8 and resulted in a change in exon 9. However, the extent of template switching was reduced to only 1 % in the libraries prepared using 25 cycles, and to 0.2 % in the libraries prepared using 20 cycles of PCR (Fig. 3b). Again, for these two libraries the most frequently sequenced template-switched isoforms involved the input isoforms that were also the most frequently sequenced. These experiments demonstrate that the MinION nanopore sequencer can be used to sequence ‘full length’ Dscam1 cDNAs with sufficient accuracy to identify isoforms and that the cDNA libraries can be prepared in a manner that results in a very small amount of template switching.
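The template-switching fraction reported above amounts to counting reads whose exon 4, 6, and 9 combination is not one of the six spike-in isoforms. A minimal sketch, assuming each read has already been reduced to an (exon 4, exon 6, exon 9) tuple; the input isoforms are the six listed in the text, while the example reads are invented for illustration:

```python
from collections import Counter

# The six spike-in isoforms from the text, as (exon 4, exon 6, exon 9) variants.
INPUT_ISOFORMS = {
    (2, 32, 31),
    (1, 46, 30),
    (3, 33, 9),
    (12, 44, 32),
    (7, 8, 15),
    (5, 4, 4),
}


def template_switch_fraction(read_isoforms):
    """Fraction of reads whose exon combination is absent from the input pool."""
    counts = Counter(read_isoforms)
    total = sum(counts.values())
    switched = sum(n for iso, n in counts.items() if iso not in INPUT_ISOFORMS)
    return switched / total if total else 0.0


# Invented reads: eight match input isoforms, two carry exon 4.2 and 6.32 from
# one input isoform joined to exon 9.30 from another (a switch before exon 9).
example_reads = [(2, 32, 31)] * 5 + [(1, 46, 30)] * 3 + [(2, 32, 30)] * 2

print(template_switch_fraction(example_reads))  # 0.2
```

This only works as a measure of template switching because, as noted earlier, every spike-in read mapped uniquely to exons present in the input isoforms, so novel combinations cannot be blamed on misalignment.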

Fig. 4. MinION sequencing of Dscam1 identified 7,874 isoforms. a Histogram of read length distribution for Drosophila head samples. b The total number of Dscam1 isoforms identified from MinION sequencing. c Cumulative distribution of Dscam1 isoforms with respect to expression. d Violin plot of the number of isoforms identified using 100 random pools of the indicated number of reads. e Plot of the estimated number of total isoforms present in the library using the capture-recapture method with two random pools of the indicated number of reads. The shaded blue area indicates the 95 % confidence interval. f Deconvoluted expression of Dscam1 exon cluster variants (top) and the isoform connectivity of two highly expressed Dscam1 isoforms (bottom)

Fig. 5. Accuracy of Dscam1 sequencing results. a Comparison of the frequency of variable exon inclusion for the Dscam1 exon 4 (yellow), 6 (red), and 9 (orange) clusters as determined by nanopore sequencing or by amplicon sequencing using an Illumina MiSeq. b Percent identities (left) or LAST alignment scores (right) of full-length template, complement, and two directions (sequencing both template and complements) nanopore read alignments

Over their entire lengths, the 2D reads that map specifically to one variant in each of the exon 4, 6, and 9 clusters map with an average 90.37 % identity and an average LAST score of approximately 1,200 (Fig. 5b). The 16,450 full-length reads correspond to 7,874 unique isoforms, or 42 % of the 18,612 possible isoforms given the exon 4, 6, and 9 variants observed. We note, however, that while 4,385 isoforms were represented by more than one read, 3,516 isoforms were represented by only one read, indicating that the depth of sequencing has not reached saturation (Fig. 4b and c). This was further confirmed by performing a bootstrapped subsampling analysis (Fig. 4d) and by using the capture-recapture method to attempt to assess the complexity of isoforms present in the library (Fig. 4e), which suggests that over 11,000 isoforms are likely to be present, though even this analysis has not yet reached saturation. The most frequently observed isoforms were Dscam1 4.1,6.12,9.30 and Dscam1 4.1,6.1,9.30, which were observed with 30 and 25 reads, respectively (Fig. 4e). In conclusion, these results demonstrate the practical application of using the MinION nanopore sequencer to identify thousands of distinct Dscam1 isoforms in a single biological sample.
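The capture-recapture idea can be sketched with the classic Lincoln-Petersen estimator: if two independent pools of reads contain n1 and n2 distinct isoforms and share m of them, the total is estimated as n1 × n2 / m. The text does not say which estimator was used, so the choice of Lincoln-Petersen here is an assumption, and the function names and toy pools are hypothetical.

```python
import random


def lincoln_petersen(pool_a, pool_b):
    """Estimate the total number of distinct isoforms from two read pools."""
    seen_a, seen_b = set(pool_a), set(pool_b)
    shared = len(seen_a & seen_b)
    if shared == 0:
        raise ValueError("no isoform shared between pools; estimate undefined")
    return len(seen_a) * len(seen_b) / shared


def estimate_from_reads(reads, pool_size, seed=0):
    """Draw two non-overlapping random pools of reads and apply the estimator."""
    rng = random.Random(seed)
    sample = rng.sample(reads, 2 * pool_size)  # sampling without replacement
    return lincoln_petersen(sample[:pool_size], sample[pool_size:])


# Toy pools: 4 distinct isoforms each, 2 in common -> 4 * 4 / 2.
print(lincoln_petersen(["a", "b", "c", "d"], ["c", "d", "e", "f"]))  # 8.0
```

As a side check on the coverage figure, 12 × 47 × 33 = 18,612, so the stated number of possible isoforms is what one would expect if a single exon 6 variant went unobserved; that reading is an inference from the arithmetic rather than something the text states.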

Nanopore sequencing of ‘full-length’ Rdl, MRP, and Mhc isoforms

To extend this approach to other genes with complex splicing patterns, we focused on Rdl, MRP, and Mhc, which have the potential to generate four, 16, and 180 isoforms, respectively. We prepared libraries for each of these genes by RT-PCR using primers in the constitutive exons flanking the most distal alternative exons using 25 cycles of PCR, pooled the three libraries, and sequenced them together on the MinION nanopore sequencer for 12 h, obtaining a total of 22,962 reads. The input libraries for Rdl, MRP, and Mhc were 567 bp, 1,769-1,772 bp, and 3,824 bp, respectively. The raw reads were aligned independently to LAST indexes of each cluster of variable exons. The alignment results were then used to assign reads to their respective libraries, identify reads that mapped to all variable exon clusters for each gene, and identify the exon with the best alignment score within each cluster. In total, we obtained 301, 337, and 112 full-length reads for Rdl (Fig. 6), MRP (Fig. 7), and Mhc (Fig. 8), respectively. For Rdl, both variable exons in each cluster were observed, and accordingly all four possible isoforms were observed, though in each case the first exon was observed at a much higher frequency than the second exon (Fig. 6d). Interestingly, the ratio of isoforms containing the first versus second exon in the second cluster is similar for isoforms containing either the first exon or the second exon in the first cluster, indicating that the splicing of these two clusters may be independent. For MRP, both exons in the first cluster were observed and all but one of the exons in the second cluster (exon B) were observed, though the frequency at which the exons in both clusters were used varied dramatically (Fig. 7d). For example, within the first cluster, exon B was observed 333 times while exon A was observed only four times.
Similarly, in the second cluster, exon A was observed 157 times whereas exons B, E, F, and G were observed zero, three, one, and two times, respectively, and exons C, D, and H were observed between 40 and 76 times. As a result, we observed only nine MRP isoforms. For Mhc, we again observed strong biases in the exons observed in each of the five clusters (Fig. 8d). In the first cluster, exon B was observed more frequently than exon A. In the second cluster, 109 of the reads corresponded to exon A, while exons B and C were observed by only two and one read, respectively. In the third cluster, exon A was not observed at all while exons B and C were observed in roughly 80 % and 20 % of reads, respectively. In the fourth cluster, exon A was observed only once, exons B and C were not observed at all, exon E was observed 13 times, and exon D was present in all of the remaining reads. Finally, in the fifth cluster, only exon B was observed. As with MRP, these strong biases and near or complete absences of exons in some of the clusters severely reduce the number of possible isoforms that can be observed. In fact, of the 180 potential isoforms encoded by Mhc, we observed only 12 isoforms. Various Mhc isoforms are known to be expressed in strikingly spatially and temporally restricted patterns [14], and thus it is likely that other Mhc isoforms that we did not observe could be observed by sequencing other tissue samples.
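To see how strongly the missing exons cap the number of isoforms that could have been observed, one can multiply the number of exons actually seen in each cluster. The per-cluster counts below are read off the description above (for Mhc cluster 1, it is assumed that exon A was seen at least once, since the text says only that exon B was more frequent); this is a back-of-the-envelope sketch, not an analysis from the study.

```python
from math import prod

# Exons observed at least once in each cluster, as read from the text.
# MRP: both cluster 1 exons; 7 of 8 cluster 2 exons (exon B never seen).
# Mhc: clusters 1-5 -> {A, B}, {A, B, C}, {B, C}, {A, D, E}, {B}.
OBSERVED_EXONS_PER_CLUSTER = {
    "MRP": [2, 7],
    "Mhc": [2, 3, 2, 3, 1],
}
OBSERVED_ISOFORMS = {"MRP": 9, "Mhc": 12}


def max_observable(observed_per_cluster):
    """Upper bound on distinct isoforms, given the exons seen in each cluster."""
    return prod(observed_per_cluster)


for gene, observed in OBSERVED_EXONS_PER_CLUSTER.items():
    print(gene, OBSERVED_ISOFORMS[gene], "of at most", max_observable(observed))
# MRP 9 of at most 14
# Mhc 12 of at most 36
```

With only 337 and 112 full-length reads and several exons seen one to three times, falling short of even these reduced ceilings is unsurprising.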

Fig. 6. MinION sequencing of Rdl identified four isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)

Fig. 7. MinION sequencing of MRP identified nine isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)

Fig. 8. MinION sequencing of Mhc identified 12 isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)

Conclusions

Here we have demonstrated that nanopore sequencing with the Oxford Nanopore MinION can be used to easily determine the connectivity of exons in a single transcript, including Dscam1, the most complicated alternatively spliced gene known in nature. This is an important advance for several reasons. First, because short-read sequence data cannot be used to conclusively determine which exons are present in the same RNA molecule, especially for complex alternatively spliced genes, long-read sequence data are necessary to fully characterize the transcript structure and exon connectivity of eukaryotic transcriptomes. Second, although the Pacific Biosciences platform can perform long-read sequencing, there are several differences between it and the Oxford Nanopore MinION that could cause users to choose one platform over the other. In general, the quality of the sequence generated by the Pacific Biosciences platform is higher than that currently generated by the Oxford Nanopore MinION. This is largely due to the fact that each molecule is sequenced multiple times on the Pacific Biosciences platform, yielding a high-quality consensus sequence, whereas on the Oxford Nanopore MinION each molecule is sequenced at most twice (in the template and complement). We have previously used the Pacific Biosciences platform to characterize Dscam1 isoforms and found that it works well, though due to the large amount of cDNA needed to generate the libraries, many cycles of PCR are necessary and we observed an extensive amount of template switching, making it impractical to use for these experiments (BRG, unpublished data). However, over the past year that we have been involved in the MinION Access Programme (MAP), the quality of sequence has steadily increased. As this trend is likely to continue, the difference in sequence quality between these two platforms is almost certain to shrink.
Nonetheless, as we demonstrate, the current quality of the data is more than sufficient to allow us to accurately distinguish between highly similar alternatively spliced isoforms of the most complex gene in nature. Third, the ability to accurately characterize alternatively spliced transcripts with the Oxford Nanopore MinION makes this technology accessible to a much broader range of researchers than was previously possible. This is in part due to the fact that, in contrast to all other sequencing platforms, very little capital expense is needed to acquire the sequencer. Moreover, the MinION is truly a portable sequencer that could literally be used in the field (provided one has access to an Internet connection), and due to its size, almost no laboratory space is required for its use.

Although nanopore sequencing has many exciting and potentially disruptive advantages, there are several areas in which improvement is needed. First, although we were able to accurately identify over 7,000 Dscam1 isoforms with an average identity of full-length alignments >90 %, there are several situations in which this level of accuracy will be insufficient to determine transcript structure. For instance, there are many micro-exons in the human genome [15], and these exons would be difficult to identify if they overlapped a portion of a read that contained errors. Additionally, small unannotated exons could be difficult to identify for similar reasons. Second, the current number of usable reads is lower than that which will be required to perform whole transcriptome analysis. One issue that plagues transcriptome studies is that the majority of the sequence generated comes from the most abundant transcripts. Thus, with the current throughput, numerous runs would be needed to generate a sufficient number of reads necessary to sample transcripts expressed at a low level. In fact, this is one reason that we chose in this study, to begin by targeting specific genes rather than attempting to sequence the entire transcriptome. We do note, however, that over the past year of our participation in the MAP, the throughput of the Oxford Nanopore MinION has increased, and it is reasonable to expect additional improvements in throughput that should make it possible to generate a sufficient number of long reads to deeply interrogate even the most complex transcriptome.

In conclusion, we anticipate that nanopore sequencing of whole transcriptomes, rather than targeted genes as we have performed here, will be a rapid and powerful approach for characterizing isoforms, especially with improvements in the throughput and accuracy of the technology, and the simplification and/or elimination of the time-consuming library preparations.

The Tangled Transcriptome

Graveley’s lab studies the transcriptome, the mass of RNA molecules in living cells whose job is to translate DNA into proteins. The transcriptome is a sort of snapshot of which parts of the genome are active at a given time and place. Which genes are transcribed into RNA, and in what quantities, changes from organ to organ and even cell to cell, and can vary over an organism’s lifetime or in response to environmental changes.

Of particular interest to Graveley are those RNA molecules that can take different shapes, or “isoforms,” depending on random chance or what the cell needs at a particular time. RNA isoforms are distinct versions of the same gene. Through a process called alternative splicing, the different subunits, or “exons,” that make up a gene can be reshuffled in new combinations. Many genes have two or more mutually exclusive exons, and which ones are actually expressed as RNA and protein can have big effects on cellular behavior, in effect expanding the protein arsenal of the genome.

“For the entire field of transcriptomics and gene function, knowing what isoforms are expressed is critical,” says Graveley. “Most genes are complicated, especially in humans, and have alternative splicing that occurs at multiple places.”

That brings us to the challenge of Dscam1, the world record holder for alternative splicing. In fruit flies, a particularly well-studied model organism, Dscam1 is made up of 115 exons, only 20 of which are always transcribed into RNA. The other 95 exist in four “clusters” of mutually exclusive exons, and as a result, over 38,000 possible isoforms of Dscam1 have been predicted.

“This is by far, an order of magnitude, more than any other gene,” Graveley explains. This flexibility makes sense in light of Dscam1’s function. The protein it makes helps to “identify” single neurons in the insect brain, making them distinct enough from their neighbors for these cells to assemble a neural circuit on principles of like avoiding like. In experiments where Dscam1 has been altered to make fewer RNA isoforms, the neural wiring breaks down during development, sometimes severely enough to kill the flies.

Dscam1 also plays a role in the insect immune system, another reason for it to produce a huge variety of isoforms. Each of these molecules might be more or less effective at fighting certain pathogens.

It’s frustratingly hard, however, to figure out exactly which isoforms are in a specific sample. Graveley has been working on Dscam1 in fruit flies for more than a decade, but very basic questions remain unanswered: are some isoforms more common, or more important, than others? Are all the theoretical isoforms expressed? Do the isoforms have different behaviors, or are they just arbitrary ways of tagging neurons?

Size Matters

The trouble is the current state of the art in sequencing technology, which reads just a couple of hundred DNA bases at a time. That works great for identifying which exons are present in the transcriptome, but it’s no good for saying which mix of exons any specific strand of RNA is carrying. Different exons can lie thousands of bases apart on the RNA molecule, and there’s no way to bridge the gap between reads.

Graveley has tried a lot of solutions. He’s used the outdated Sanger sequencing method, which is much slower and more labor-intensive than modern sequencers, but does span longer reads. His lab also worked out a roundabout way of reconstructing RNA transcripts with contemporary Illumina sequencers, through a combination of chemistry and computational approaches.

“It worked,” he says, “but it was complicated by a lot of library preparation artifacts, and you basically had to jury-rig a genome analyzer to do something it was not supposed to do.”

Graveley’s preferred method is to use a sequencer produced by Pacific Biosciences, which, like the MinION, is built on long-read, single-molecule technology. PacBio sequencing is much better established than nanopores, and its results are known to be reliable; it also has the high throughput typical of modern instruments. For researchers working on alternative splicing, it’s clearly the technology to beat.

Unfortunately, it’s also very expensive. So Graveley’s team set out to learn whether the MinION, a low-throughput but extremely cheap alternative, could be an adequate substitute.

For the Genome Biology paper, the team focused on a 1.8-kilobase region of Dscam1 RNA that covers 93 of the gene’s 95 alternatively spliced exons. To get their samples, they crushed fruit fly heads, isolated Dscam1 RNA from the sample using a polymerase, and reverse-transcribed it into cDNA for sequencing. They also sequenced transcripts of three other alternatively spliced genes, Rdl, MRP, and Mhc.

The biggest concern for new applications of the MinION is its shaky accuracy. While most sequencers can achieve comfortably over 99% consensus with reference sequences, Graveley’s group has seen only about 90% identity with the MinION. That’s actually a little better than most MinION users have managed, although the device’s accuracy has been steadily improving. Users have had to pick their projects carefully to account for this: the device is pretty reliable in resequencing studies that map DNA reads to known references, but it’s still a dubious choice for sequencing unknown genetic material from scratch (although it’s been tried).

To accurately pin down the exact isoforms in the transcriptome, the MinION didn’t have to read every RNA molecule perfectly, but it did have to come close enough to decisively tell one exon from another, and in Dscam1, those exons could be as much as 80% identical.

In fact, Graveley and his co-authors found that the MinION was very capable of this. Out of around 33,000 high-quality Dscam1 reads pulled off the sequencer, almost 29,000 were a strong match for one and only one combination of exons. To further check their accuracy, the team also sequenced the same sample on Illumina technology. While the Illumina sequencer could not give whole isoforms, it did show the same proportions of different exons, suggesting that the MinION gave a complete and unbiased picture of the sample.

“Alternative splicing, it turns out, is probably one of the ideal applications for this platform,” Graveley says. “Even with a gene as complicated as this one, we’re able to accurately distinguish the isoforms from one another. Unless you have very, very small exons, or two exons that are almost identical to each other, the accuracy is good enough.”

Make Way for PromethION

The results are good news for researchers studying the transcriptome, but the MinION probably won’t push out other methods for dealing with alternative splicing just yet. Its low throughput means that at best it can cover a very small portion of the transcriptome with each run ― and that means isolating targeted RNA transcripts, a process that can introduce new biases into the data.

“You need a lot of reads to get the whole transcriptome, and what happens is you end up sequencing boring genes like actin and tubulin, the really abundantly expressed things,” Graveley explains. Still, his data from this experiment was good enough to replicate a few earlier findings: for instance, that Dscam1 does appear to make every predicted isoform. In this experiment, his lab observed almost half the possible isoforms, containing 92 of 93 possible exons.

Meanwhile, Oxford Nanopore Technologies is working on a new instrument, the PromethION, which will contain 48 MinION-style flow cells in a battery. Graveley has already signed on to be one of the first recipients, in an access program that is likely to start in the winter.

Judging by studies like this one, the PromethION stands a good chance of becoming the instrument of choice for large-scale RNA sequencing. With Dscam1, Graveley hopes to reach high enough throughput to do functional studies, seeking to learn whether different combinations of isoforms give rise to physical or behavioral differences. He also wants to look at human genes with high levels of alternative splicing, and to test whether the MinION can accurately count total numbers of RNA isoforms.

“The fact that you can use this technology to characterize whole isoforms is very exciting,” Graveley says. “It’s going to help us start characterizing the transcriptome in ways that have been very difficult.”

Human Genetics and Childhood Diseases

Curator: Larry H. Bernstein, MD, FCAP

Publication Roundup: HGMD

HGMD®, the Human Gene Mutation Database, is used by scientists around the world to find information on reported genetic mutations. The papers below use the database to advance our understanding of disease, DNA dynamics, and more.

https://www.qiagenbioinformatics.com/blog/translational/publication-roundup-hgmd

Local DNA dynamics shape mutational patterns of mononucleotide repeats in human genomes
First author: Albino Bacolla

Scientists in the US and UK published results in Nucleic Acids Research of a detailed analysis of single-base substitutions and indels in the human genome. Their findings show that certain base positions are more susceptible to mutagenesis than others. They used HGMD Professional to find mutations in specific genomic regions for analysis; the paper includes charts showing mutation patterns, germline SNPs, and more from HGMD data.

High prevalence of CDH23 mutations in patients with congenital high-frequency sporadic or recessively inherited hearing loss
First author: Kunio Mizutari

In this Orphanet Journal of Rare Diseases paper, scientists in Japan sequenced 72 patients with unexplained hearing loss, finding several CDH23 mutations, some of which were novel. Mutations in the gene have been linked to Usher syndrome and other forms of hereditary hearing loss. The scientists used HGMD to find all known CDH23 mutations within nearly 70 coding regions.

Mutation analyses and prenatal diagnosis in families of X-linked severe combined immunodeficiency caused by IL2Rγ gene novel mutation
First author: Q.L. Bai

In Genetics and Molecular Research, scientists report the utility of mutation analysis of the interleukin-2 receptor gamma gene to assess carrier status and perform prenatal diagnosis for X-linked severe combined immunodeficiency. They studied two high-risk families, along with 100 controls, to evaluate the approach. Sequence variation was determined using HGMD Professional and an X-SCID database, and a new mutation was discovered in the project.

Impact of glucocerebrosidase mutations on motor and nonmotor complications in Parkinson’s disease
First author: Tomoko Oeda

Researchers from three hospitals in Japan published this Neurobiology of Aging report that may help stratify Parkinson’s disease patients by prognosis. They sequenced mutations in the GBA gene in 215 patients, finding that those who had mutations associated with Gaucher disease suffered dementia and psychosis much earlier than those who didn’t. The team found previously reported GBA mutations using HGMD Professional.

Comprehensive Genetic Characterization of a Spanish Brugada Syndrome Cohort
First author: Elisabet Selga

In this PLoS One publication, scientists from a number of institutions in Spain examined genetic variation among patients with Brugada syndrome, a rare genetic cardiac arrhythmia. They sequenced 14 genes in 55 patients, identifying 61 variants and finding the subset that appear pathogenic. Variants were filtered against a number of databases, including HGMD.

Local DNA dynamics shape mutational patterns of mononucleotide repeats in human genomes

Nucl. Acids Res. (26 May 2015) 43(10): 5065-5080.   http://dx.doi.org/10.1093/nar/gkv364

Single base substitutions (SBSs) and insertions/deletions are critical for generating population diversity and can lead both to inherited disease and cancer. Whereas on a genome-wide scale SBSs are influenced by cellular factors, on a fine scale SBSs are influenced by the local DNA sequence-context, although the role of flanking sequence is often unclear. Herein, we used bioinformatics, molecular dynamics and hybrid quantum mechanics/molecular mechanics to analyze sequence context-dependent mutagenesis at mononucleotide repeats (A-tracts and G-tracts) in human population variation and in cancer genomes. SBSs and insertions/deletions occur predominantly at the first and last base-pairs of A-tracts, whereas they are concentrated at the second and third base-pairs in G-tracts. These positions correspond to the most flexible sites along A-tracts, and to sites where a ‘hole’, generated by the loss of an electron through oxidation, is most likely to be localized in G-tracts. For A-tracts, most SBSs occur in the direction of the base-pair flanking the tracts. We conclude that intrinsic features of local DNA structure, i.e. base-pair flexibility and charge transfer, render specific nucleotides along mononucleotide runs susceptible to base modification, which then yields mutations. Thus, local DNA dynamics contributes to phenotypic variation and disease in the human population.

INTRODUCTION

Changes in human genomic DNA in the form of base substitutions and insertions/deletions (indels) are essential to ensure population diversity, adaptation to the environment, defense from pathogens and self-recognition; they are also a critical source of human inherited disease and cancer. On a genome-wide scale, base substitutions result from the combined action of several factors, including replication fidelity, lagging versus leading strand DNA synthesis, repair, recombination, replication timing, transcription, nucleosome occupancy, etc., both in the germline and in cancer (1–4). On a much finer scale (over a few base pairs [bp]), rates of base substitutions may be strongly influenced by interrelationships between base–protein and base–base interactions. For example, the mutator role of activation-induced deaminase (AID) in B-cells during class-switch recombination and somatic hypermutation (5) targets preferentially cytosines within WRC (W: A|T; R: A|G) sequences (6), whereas apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC) overexpression displays a preference for base substitutions at cytosines in TCW contexts (7). Other examples, such as the induction of C→T transitions at CG:CG dinucleotides by cytosine-5-methylation and the role of UV light in promoting base substitutions at pyrimidine dimers have been well documented (reviewed in (4,8)). More recently, complex patterns of base substitution at guanosines in cancer genomes have been found to correlate with changes in guanosine ionization potentials as a result of electronic interactions with flanking bases (9), suggesting a role for electron transfer and oxidation reactions in sequence-dependent mutagenesis. However, despite these advances, the increasing number of sequence-dependent patterns of mutation noted in genome-wide sequencing studies has met with a lack of understanding of most of the underlying mechanisms (10). Thus, a picture is emerging in which mutations are often heavily dependent on sequence-context, but for which our comprehension is limited.

Mononucleotide repeats comprise blocks of identical base pairs (A|T or C|G; hereafter referred to as A-tracts and G-tracts) and display distinct features: they are abundant in vertebrate genomes; mutations within the tracts occur more frequently than the genome-wide average; mutations generally increase with increasing tract length; length instability is a hallmark of mismatch repair-deficiency in cancers; and sequence polymorphism within the general population has been linked to phenotypic diversity (11–15). Thus, mononucleotide repeats appear ideal for addressing the question of sequence-dependent mutagenesis since base pairs within the tracts are flanked by identical neighbors. Both historic and recent investigations concur with the conclusion that a major source of mononucleotide repeat polymorphism is the occurrence of slippage (i.e. repeat misalignment) during semiconservative DNA replication, which gives rise to the addition or deletion of repeat units (11,12). An additional and equally important source of mutation has recently been suggested to arise from errors in DNA replication by translesion synthesis DNA polymerases, such as pol η and pol κ (13), also on slipped intermediates, leading to single base substitutions.

A key question that remains unanswered in these studies and which is relevant to the issue of sequence context-dependent mutagenesis is whether all base pairs within mononucleotide repeats display identical susceptibility to single base changes and whether indels (which are consequent to DNA breakage) occur randomly within the tracts.

Herein, we combine bioinformatics analyses on mononucleotide repeat variants from the 1000 Genomes Project and cancer genomes with molecular dynamics simulations and hybrid quantum mechanics/molecular mechanics calculations to address the question of sequence-dependent mutagenesis within these tracts. We show that mutations along both A-tracts and G-tracts are highly non-uniform. Specifically, both base substitutions and indels occur preferentially at the first and last bp of A-tracts, whereas they are concentrated between the second and third G:C base pairs in G-tracts. These positions coincide with the most flexible base pairs for A-tracts and with the preferential localization of a ‘hole’ that results when one electron is lost due to an oxidation reaction anywhere along G-tracts. Thus, despite the uniformity of sequence composition, mutations occur in a sequence-dependent context at homopolymeric runs according to a hierarchy that is imposed by both local DNA structural features and long-range base–base interactions. We also show that the repair processes leading to base substitution must differ between A- and G-tracts, since in the former, but not in the latter, base substitutions occur predominantly in the direction of the base immediately flanking the tracts. Additional sequence-dependent patterns of mutation are likely to arise from studies of more heterogeneous sequence combinations, possibly involving other aspects intrinsic to the structure of DNA.

RESULTS

Mononucleotide repeat variation is defined by tract length and flanking base composition

We define mononucleotide repeats in the GRCh37/hg19 (hg19) human genome assembly as uninterrupted runs of A:T and G:C base pairs (hereafter referred to as A-tracts and G-tracts, respectively) from 4 to 13 base pairs in length (Figure 1A). We retrieved a total of 48,767,945 A-tracts and 13,633,781 G-tracts, both of which displayed a biphasic distribution with an inflection point between tract lengths of 8 and 9 (bp) and with the number of runs declining with length more dramatically for G-tracts than for A-tracts (Figure 1B), as noted previously (29). Both the number of short tracts and the extent of decline varied with flanking base composition, TA[n]T runs being two- to three-fold more abundant than CA[n]Cs (Supplementary Figure S1A) and AG[n]As declining the most rapidly (Supplementary Figure S1B). Thus, mononucleotide runs exist as a collection of separate pools of sequences in extant human genomes, each maintained at distinctive rates of sequence stability, as determined by factors such as bp composition (A:T versus G:C), tract length and flanking sequence composition.
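The tract definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' search algorithm: the function name `find_tracts`, the toy sequence and the output tuple layout are hypothetical, and the sketch reads a single strand without reorienting tracts to the purine-rich strand as the paper's numbering does.

```python
import re
from collections import Counter

# Runs of A or T count as A-tracts; runs of G or C count as G-tracts.
# The 4-13 bp window follows the text; everything else is illustrative.
MIN_LEN, MAX_LEN = 4, 13
RUN = re.compile(r"A+|T+|G+|C+")  # each match is a maximal uninterrupted run

def find_tracts(seq):
    """Yield (kind, start, length, five_prime, three_prime) for each run.

    Coordinates are 0-based on the given strand; flanking bases are
    reported as read on that strand (None at the sequence ends).
    """
    seq = seq.upper()
    for m in RUN.finditer(seq):
        n = m.end() - m.start()
        if not MIN_LEN <= n <= MAX_LEN:
            continue
        kind = "A-tract" if m.group()[0] in "AT" else "G-tract"
        five = seq[m.start() - 1] if m.start() > 0 else None
        three = seq[m.end()] if m.end() < len(seq) else None
        yield kind, m.start(), n, five, three

# Hypothetical toy sequence, not taken from hg19.
toy = "CGTAAAAAC" + "TTGGGGGGA" + "CATTTTG" + "AAA"
tracts = list(find_tracts(toy))
counts = Counter((kind, n) for kind, _, n, _, _ in tracts)
```

Grouping the results by `(kind, length, five_prime, three_prime)` would give the kind of flank-specific pools (for example TA[n]T versus CA[n]C) that the paragraph compares.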

Figure 1.

Mononucleotide repeat variation, evolutionary conservation and association with transcription. (A) The search algorithm was designed to retrieve runs of As or Ts (A-tracts) and Gs or Cs (G-tracts) of length n (n = 4 to 13), along with their 5′ (n = 0) and 3′ (n = n + 1) nearest neighbors from hg19. Tract bases were numbered 5′ to 3′ with respect to the purine-rich sequence. The panel exemplifies the nomenclature for A- and G-tracts of length 4. (B) Logarithmic plot of the number of A-tracts (closed circles) and G-tracts (open circles) in hg19 as a function of length. (C) Normalized fractions of polymorphic tracts (F SNV) (number of SNVs divided by both hg19 number of tracts and n) from the 1KGP for A-tracts (closed circles) and G-tracts (open circles). (D) Radial plot of SNVs in the 1KGP at the 5′ and 3′ nearest neighbors of A-tracts. Periphery, tract length; horizontal axis, scale for the fraction of SNVs (F SNV). (E) Radial plot of SNVs in the 1KGP at the 5′ and 3′ nearest neighbors of G-tracts. (F) Percent difference in the numbers of A-tracts (closed circles) and G-tracts (open circles) between syntenic regions of hg19 and HN genomes. (G) The exponents of Benjamini-corrected P-values for A-tract-containing genes enriched in transcription-factor binding sites plotted as a function of A-tract length (triangles); each value represents the median of the top 11 UCSC_TFBS terms. The percent A-tracts (closed circles) and G-tracts (open circles) intersecting genomic regions pulled-down by chromatin immunoprecipitation using antibodies against transcription factors are plotted as a function of tract length. (H) List of gene enrichment terms with a Benjamini-corrected P-value of <0.05 in common between genes containing A- and G-tracts of lengths 4–13, excluding the UCSC_TFBS terms.

We examined the extent of sequence variation in the human population by mapping 38,878,546 single nucleotide variants (SNVs) from 1092 haplotype-resolved genomes (the 1000 Genomes Project, 1KGP) (30) to the hg19 A- and G-tracts. The normalized fractions of polymorphic tracts (F SNV) were greater for G-tracts than A-tracts and both displayed Gaussian-type distributions, with maxima of 0.067 for G-tracts of length 8 and 0.017 for A-tracts of length 9 (Figure 1C). CA[n]C and AG[n]A runs displayed the highest F SNV values for A- and G-tracts, respectively (Supplementary Figure S1C and D), with F SNV values for AG[n]As attaining ∼0.10 at length 8. We conclude that flanking base composition influences the rates of SNV within mononucleotide runs and, as a consequence, their representation in the reference human genome.
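The normalization behind F SNV (number of SNVs divided by both the number of tracts and the tract length n, per the Figure 1C legend) can be written out as a small helper. The function name and the counts below are hypothetical and chosen only so that the toy peak lands at length 8; they are not values from the 1KGP data.

```python
def f_snv(n_snvs, n_tracts, tract_length):
    """SNVs per tract per base pair: n_snvs / (n_tracts * tract_length)."""
    if n_tracts <= 0 or tract_length <= 0:
        raise ValueError("n_tracts and tract_length must be positive")
    return n_snvs / (n_tracts * tract_length)

# Hypothetical counts per tract length: (number of SNVs, number of tracts).
toy_counts = {7: (2100, 10000), 8: (5360, 10000), 9: (3600, 10000)}

profile = {n: f_snv(snvs, tracts, n) for n, (snvs, tracts) in toy_counts.items()}
peak_length = max(profile, key=profile.get)  # length with the highest F SNV
```

Dividing by n as well as by the tract count is what allows tracts of different lengths to be compared on a per-base-pair basis.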

F SNV values at the flanking 5′ and 3′ bp were similar between A- and G-tracts, except for minor differences for the least represented (i.e. longest) tracts and did not exceed 0.02 (Supplementary Figure S1E). These fractions are expected to be greater than at more distant positions from the tracts, based on previous data (29). SNVs at G-tracts, but not at A-tracts, were more frequent than at flanking base pairs. F SNVs for base pairs flanking short (≤8 bp) tracts were at least twice as high as those flanking long tracts; F SNVs also displayed distinct sequence preference with most (∼0.1) variants occurring at Ts 3′ of G-tracts (Figure 1D and E). In summary, SNVs at mononucleotide runs do not increase monotonically with length but peak at 8–9 bp. This behavior mirrors the genomic distributions, both with respect to the total number of tracts (Figure 1B) and the subsets flanked by specific-sequence combinations (Supplementary Figure S1A–D). Variation at flanking base pairs also displayed a biphasic pattern centered at a length of 8–9 bp, with a greater chance of variation adjacent to G- than A-tracts and with characteristic sequence preferences.

Long tracts are evolutionarily conserved and associated with high transcription

To assess whether more variable monosatellite runs (Figure 1C) might have undergone a greater reduction in number in extant humans relative to extinct hominids, we compared the number of A- and G-tracts between syntenic regions of five individuals comprising hg19 and three Neanderthal (HN) specimens (31). The difference between hg19 and HN was very small (<±2%) for the short tracts, but it displayed more negative values in hg19 with increasing tract length, which reached a maximum of −11.8 and −32.7% for A- and G-tracts, respectively, of length 9. Beyond this threshold, the numbers of tracts converged for A-tracts, whereas they were more abundant in hg19 for G-tracts >11 bp (Figure 1F). In summary, the largest difference in the number of mononucleotide runs between hg19 and HN sequences was centered at 9 bp for both A- and G-tracts, suggesting that the length distributions (Figure 1A and Supplementary Figure S1A and B) reflect distinct rates of evolutionary gains and losses due to differential sequence mutability (Figure 1C) as a function of length and flanking sequence composition (12).

The fact that long (>9 bp) mononucleotide runs display low variability in the human population (Figure 1C) and sequence conservation during evolutionary divergence (Figure 1F) raises the possibility that they might serve functional roles. Through gene enrichment analyses, we found that genes containing A- and G-tracts were enriched for genes associated with the term ‘UCSC_TFBS’, which pertains to transcripts harboring frequent transcription factor binding sites (32,33). For A-tract-containing genes, the median P-values for the top 11 UCSC_TFBS terms decreased from 2.95E-26 for tracts of length 4 to 5.22E-241 for tracts of length 13 (Figure 1G). The percent of A-tracts intersecting genomic fragments amplified from chromatin immunoprecipitation using transcription-factor binding antibodies (32,33) also increased from 8.7 to 9.9 from length 6 to 13, whereas it was constant (mean ± SD, 22.4 ± 1.1) for G-tracts (Figure 1G). For gene classes excluding ‘UCSC_TFBS’, a search for categories enriched at P < 0.05 and common to all A- and G-tract-containing genes returned a set of 25 terms, 22 of which were associated with high levels of tissue-specific gene expression (Figure 1H). In summary, these analyses extend prior work (14) supporting a role for mononucleotide tracts in enhancing gene expression, a function that for A-tracts appears to increase with increasing tract length.

Repeat variability is highly skewed

Next we addressed whether bp along A- and G-tracts display equal probability and type of variation. In the 1KGP dataset, the number of SNVs at each position along both A- and G-tracts of length 4 was within a two-fold difference (144,000–240,000); for both types of sequence, transitions (i.e. A→G and G→A) were the predominant (51–78%) type of base substitution (Supplementary Figure S2A and B). However, with increasing length, the number of SNVs decreased up to 30-fold more drastically for G-tracts than for A-tracts, with increasing numbers of transversions (A→T and G→C|T) being predominant. Normalizing the data for the number of tracts genome-wide revealed that the extent of SNV varied by up to 10-fold, depending upon tract length and bp position. Specifically, the highest degree of variation was observed at the first and last A within the A-tracts (i.e. A1 and An), which underwent up to 61% A→T and 43% A→C transversions, respectively, at length 9 (Figure 2A). Likewise, for G-tracts, the most polymorphic sites were G3, followed by G2, for mid-size tracts of 8–10 bp, with 44% G→C transversions at G3 for tracts of length 8 (Figure 2B). Thus, the extent of SNV at mononucleotide runs is grossly skewed in human genomes, both along the sequence itself and across tract length, which must account for the bell-shape behavior in F SNV for the tracts as a whole (Figure 1C).
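The transition/transversion distinction and the per-position normalization used in this paragraph can be illustrated as follows. This is a minimal sketch with hypothetical names (`substitution_class`, `position_spectrum`) and made-up toy counts; it normalizes by the number of tracts and multiplies by 100, following the Figure 2A legend.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_class(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine is a transition; the rest are transversions."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    same_family = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_family else "transversion"

def position_spectrum(snvs, n_tracts):
    """Percent of tracts carrying each substitution type at each tract position.

    snvs: iterable of (position, ref, alt), position numbered 5' to 3' from 1.
    """
    counts = Counter((pos, f"{ref}>{alt}") for pos, ref, alt in snvs)
    return {key: 100.0 * c / n_tracts for key, c in counts.items()}

# Hypothetical SNVs observed across 50 A-tracts of length 9.
toy_snvs = [(1, "A", "T")] * 3 + [(1, "A", "G")] + [(9, "A", "C")] * 2
spectrum = position_spectrum(toy_snvs, n_tracts=50)
```

In this toy example the variation is concentrated at positions 1 and 9, mimicking the edge bias reported for A-tracts.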

Figure 2.

Population variation spectra. (A) Variation spectra of A-tracts. Percent (number of SNVs at each position divided by the number of tracts in hg19 × 100) of A→T (black), A→C (red) and A→G (green) SNVs in the 1KGP dataset (left). Percent SNVs at A1 as a function of tract length (right). (B) Variation spectra of G-tracts. As in panel A with G→T (black), G→C (red) and G→A (cyan) (left). Percent SNVs at G3 as a function of tract length (right). (C) Percent A→T, A→C and A→G substitutions at each position along A-tracts (stars) preceded and followed by a T (TA[n]T, left), C (CA[n]C, center) and G (GA[n]G, right) as a function of tract length. (D) Percent G→T, G→C and G→A substitutions at each position along G-tracts (stars) preceded and followed by a T (TG[n]T, left), C (CG[n]C, center) and A (AG[n]A, right) as a function of tract length. (E) Percent substitutions at base pairs (stars) preceding or following A-tracts (left) and G-tracts (right) as a function of tract length (n). *, mutated position.

We assessed whether SNV hypervariability was associated with specific combinations of nearest neighbors. For A-tracts flanked 5′ by a T, C or G, the highest percentage of SNVs was observed at A1 when preceded by a T, which reached 7.9% for TA[n] tracts of length 9 (Supplementary Figure S2C). By contrast, for 3′ T, C or G, the greatest effect was elicited by a C, with the highest percentage (7.1%) of SNVs at An for A[n]C tracts of length 9 (Supplementary Figure S2D). Therefore, flanking base pairs play a critical role both in the spectra and frequencies of SNVs at A-tracts. More detailed plots along A-tracts either preceded (Supplementary Figure S2E), followed (Supplementary Figure S2F) or preceded and followed (Figure 2C) by a T, C or G revealed the dramatic and long-range (up to 9–10 bp for the longest tracts, higher than the value of 4 bp predicted by mathematical models of slippage (11)) influence of flanking base pairs on variation spectra, in which up to 95% of the changes were in the direction of the base flanking the tract. Because the number of A-tracts preceded or followed by a specific base varies by up to three-fold (Supplementary Figure S2G), we conclude that for A-tracts, the overall mutation fractions and spectra are the result of at least three variables; length, position along the tract, and base composition of the 5′ and 3′ nearest-neighbors.

For G-tracts flanked 5′ by a T, C or A, high percentages (10–12%) of SNVs were observed at G1 for tracts preceded by a C, an effect that decreased with increasing tract length (Supplementary Figure S3A). This result, together with an exceedingly low number of G→A transitions at G1 for tracts not preceded by a C (Supplementary Figure S3C) relative to all tracts (Supplementary Figure S2B), is consistent with the known high mutability of CG:CG dinucleotides as a result of cytosine-5 methylation (9). The hypermutability at G2 was observed preferentially for tracts preceded by an A, and to a lesser extent T, whereas that at G3 was insensitive to flanking sequence composition. Likewise, G-tracts flanked 3′ by a T, C or A did not display marked sequence-dependent effects (Supplementary Figure S3B). Detailed plots of the SNV spectra along G-tracts either preceded (Supplementary Figure S3D), followed (Supplementary Figure S3E), or preceded and followed (Figure 2D) by a T, C or A revealed a noticeable effect only for 5′ T in association with G→T substitutions at G1 for tracts of length ≥8. Thus, despite a consistent over-representation of G-tracts flanked 5′ by a T (Supplementary Figures S3F and S1B), which must account for the high absolute number of SNVs at G1 for TG[n] relative to AG[n] and CG[n] (Supplementary Figure S3G), nearest-neighbor base composition seems to play a lesser role in SNV spectra at G-tracts than at A-tracts.

With respect to SNVs at the flanking 5′ and 3′ nearest positions, no B→A or H→G substitutions (Figure 1A) were found above a length threshold of 9 for A-tracts and 8 for G-tracts (Figure 2E, gray shading) out of 5969 SNVs, implying that tract expansion by recruiting flanking base pairs is disfavored at these lengths. In summary, base substitution along mononucleotide repeats is strongly skewed towards the edges of A-tracts and within the 5′ half of G-tracts, with frequencies that peak at midsize lengths (8–9 bp). For A-tracts ≥7 bp, base substitution occurred almost exclusively in the direction of the flanking nearest-neighbors. Finally, base substitution at flanking bases did not contribute to tract expansion for mononucleotide runs longer than 8–9 bp.

Insertions and deletions display length and positional preference

In addition to SNVs, mononucleotide runs are polymorphic in length as a result of indels. Herein, we consider separately two types of indels: one in which tract length changes by ±1 and flanking bp composition is not altered (slippage); the other comprising all other cases involving the addition or removal of 1–200 bp (indels). Slippage is a widely accepted mutational mechanism (11,12,34), whereby DNA replication errors at reiterated DNA motifs cause changes in the number of motifs (most often +/−1). The normalized fractions of slippage in the 1KGP dataset peaked at lengths of 8 bp for A-tracts and 9 bp for G-tracts (Figure 3A), generating bell-shaped curves similar to those observed for SNVs (Figure 1C) and with no differences in the highest fraction of ‘slipped’ tracts, which peaked at ∼0.02. By contrast, +1 slippage occurred more frequently than −1 slippage at A-tracts (Figure 3B). These results support recent studies on microsatellite repeats (12) and contrast with previous conclusions that slippage increases monotonically with tract length, and that the extent of slippage differs between A- and G-tracts (35,36).
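The two categories defined above (slippage versus all other indels) can be expressed as a simple classifier. This is a sketch under an explicit assumption: an event inside a tract is treated as slippage when exactly one copy of the repeated base is gained or lost, which leaves the flanking bases unchanged. The function name, the event encoding and the toy events are hypothetical and do not reproduce the authors' pipeline.

```python
from collections import Counter

def classify_event(tract_base, inserted="", deleted=""):
    """Classify a length-changing variant located within a mononucleotide tract.

    tract_base: the repeated base on the strand being read (e.g. 'A' or 'G').
    inserted / deleted: bases gained or lost; exactly one must be non-empty.
    """
    if bool(inserted) == bool(deleted):
        raise ValueError("give exactly one of inserted or deleted")
    change = inserted or deleted
    if not 1 <= len(change) <= 200:
        raise ValueError("outside the 1-200 bp range considered in the text")
    if change == tract_base:  # tract length changes by exactly one repeat unit
        return "slippage +1" if inserted else "slippage -1"
    return "indel"

# Hypothetical events: (tract base, inserted bases, deleted bases).
toy_events = [("A", "A", ""), ("A", "A", ""), ("A", "", "A"),
              ("A", "AC", ""), ("G", "", "GG")]
tally = Counter(classify_event(*event) for event in toy_events)

# Ratio of +1 to -1 slippage, the quantity plotted in Figure 3B.
plus_to_minus = tally["slippage +1"] / tally["slippage -1"]
```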

Figure 3.

Population insertions and deletions. (A) Normalized fractions of A-tracts (closed circles) and G-tracts (open circles) displaying +/−1 bp slippage in the 1KGP dataset as a function of tract length. Data were obtained by dividing the number of events by both the number of hg19 tracts and tract length (n). (B) Ratio of the number of +1 to −1 slippage for A-tracts (closed circles) and G-tracts (open circles). (C) Indels at A-tracts. For positions along the tracts (‘Tract’), ‘F Indel’ is the ratio between the number of indels and the number of tracts in hg19 multiplied by tract length. For the positions immediately flanking the tracts genomic coordinates (‘Before tract’ and ‘After tract’), ‘F Indel’ is the ratio between the number of indels and the number of tracts in hg19. (D) Indels at G-tracts, calculated as described in panel C. (E) Heatmap representation of insertions along A-tracts. The percent insertions (i.e. the number of insertions at each position divided by the number of tracts in hg19) (y-axis) plotted as a function of location (x-axis) from position 0 (insertion between the bp 5′ to the tract and the first bp of the tract) to position n + 1 (insertion between the bp 3′ to the last bp of the tract and the following bp) (see Figure 1A) and as a function of tract length (z-axis). (F) Heatmap representation of insertions along G-tracts.

With respect to indels, the normalized fractions were low (<1 × 10⁻³) along short (4–6 bp) A- and G-tracts, but rose to a plateau for longer tracts as reported earlier (11); this plateau was 10-fold higher for G-tracts (∼0.03) than for A-tracts (∼0.003) (Figure 3C and D). Indels also occurred more frequently (up to six-fold for A-tracts of length 11) at nearest-neighboring base pairs (‘Before tract’ and ‘After tract’ in Figure 3C and D) than along the tracts. Thus, contrary to SNVs and slippage, indels increased to a plateau with mononucleotide tract length.

We analyzed in detail the locations of insertions along the tracts and the flanking positions with respect to the 5′ to 3′ orientation of the tracts (Figure 1A). The normalized fractions demonstrated that insertions peaked at the 3′, and to a lesser extent 5′, ends of the longest A-tracts (Figure 3E), but remained low. For G-tracts, insertions occurred most efficiently at two locations (G2–3 and G5) (Figure 3F), they increased with tract length (up to ∼0.04), and attained ∼10-fold higher values than for A-tracts. In conclusion, insertion sites at A- and G-tracts followed the patterns observed for SNVs (Figure 2A and B), suggesting that factors associated with local DNA dynamics sensitize specific bases along the tracts to genetic alteration, inducing both SBS and indels.

Base pair flexibility and charge localization map to sites of sequence changes

To elucidate elements of intrinsic DNA dynamics that may be responsible for the biases in SNV and insertion sites, we performed molecular dynamics (MD) and hybrid quantum mechanics/molecular mechanics (QM/MM) simulations on model A[6], A[9], G[6] and G[9] duplex DNA fragments. We focused on water bridge coordination (Figure 4A), bp step flexibility, and for the G[6] and G[9], charge localization, as these properties are known to impact the susceptibility of DNA to base damage, repair and mutation. The fractions of one water coordination increased along the A[9] and A[6] structures in a 5′ to 3′ direction, irrespective of flanking sequence composition, in concert with a decrease in minor groove width (Figure 4B and Supplementary Figure S4A) as predicted (37). Vstep, a measure of bp structural fluctuation, displayed a prominent peak of ∼40 Å³deg³ at the 5′-TA-3′ step for both structures (Figure 4C and Supplementary Figure S4B), which together with low water occupancy points to 5′-TA-3′ being a preferred location for base modification and mutation. In the G[9] and G[6] structures water coordination involved mostly two-water bridges due to wide (∼14 Å) minor grooves (Figure 4D and Supplementary Figure S4C), whereas flexibility was modest (∼20–22 Å³deg³, Figure 4E and Supplementary Figure S4D). Thus, bp dynamics are likely to impact mutations at A-tracts to a greater extent than at G-tracts. Guanine has the lowest ionization potential (IP) of all four bases and IP further decreases at guanine runs, rendering them targets for electron loss, charge localization, oxidation and eventually mutation (4,38). Because after electron loss the ensuing charge (hole) can migrate along the DNA double-helix and relocalize at specific guanines, we addressed whether the preferred sites of mutation along G-tracts, i.e. G2–3 and G5, would also be preferred sites for charge localization. The QM/MM determinations indicated that whereas for the short G[6] fragment the difference in the density-derived atomic partial charges (DDAPC) (i.e. the hole) localized most often (∼50%) to the first position (Figure 4F), for the long G[9] fragment charge localization shifted downstream (mostly to the second, but also to positions 6–7, Figure 4G). Importantly, the charge was found exclusively around the guanine rings (Figure 4H). Thus, the two main sites of sequence change along G-tracts, i.e. G2–3 and G5, coincide with positions where charge localization and hence one-electron oxidation reactions are predicted to occur most frequently. In summary, bp flexibility at A-tracts and charge transfer at G-tracts likely represent intrinsic DNA features underlying the bias in SNV and insertions at mononucleotide runs in human genomes.

Figure 4.

MD and QM/MM simulations. (A) Molecular modeling of one (left) and two (right) minor groove water bridge coordination. (B) Fraction of one-water bridge occupancy (left axis) at A[9] DNA sequences flanked 5′ and 3′ by a T (black circles), C (red circles) or G (green circles). Minor groove widths (right axis), as determined from intrastrand phosphate-to-phosphate distances. (C) Vstep for A[9] DNA sequences, determined as the product of the square root of the eigenvalues (λi) described by the six bp step parameters shift, slide, rise, tilt, roll and twist; i.e. $V_{step} = \prod_{i=1}^{6} \sqrt{\lambda_i}$. (D) Fraction of one- (black circles) and two-water (red circles) bridge occupancy (left axis) at G[9] DNA sequences. Minor groove widths (right axis), as assessed from intrastrand phosphate-to-phosphate distances. (E) Vstep for G[9] DNA sequences. (F) Average charge redistribution (open circles and right axis) for G[6] DNA structures upon vertical ionization, examined by calculating the difference on the density-derived atomic partial charges (DDAPC) for the neutral and negatively charged states. Histogram of the number of instances (left axis) in which the largest charge redistribution occurred at a specific position along the G[6] structures. (G) DDAPC for G[9] DNA structures (open circles and right axis) and histogram of the number of instances (left axis) in which the largest charge redistribution occurred at a specific position. (H) VMD rendering of a G[9] DNA structure displaying hole localization at G2. Capped base pairs were removed for clarity.
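The Vstep quantity in panel C (the product of the square roots of six eigenvalues) can be sketched numerically. The code below assumes, as is common for this kind of flexibility measure but not stated explicitly in the legend, that the eigenvalues are those of the covariance matrix of the six step parameters sampled along a trajectory. Because the product of the eigenvalues of a matrix equals its determinant, the sketch computes the square root of the determinant instead of diagonalizing. All function names and the toy snapshots are hypothetical.

```python
import math

def covariance(samples):
    """Sample covariance matrix; one row per snapshot, one column per parameter."""
    n, k = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(k)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in samples) / (n - 1)
             for b in range(k)] for a in range(k)]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det

def v_step(samples):
    """prod_i sqrt(lambda_i) = sqrt(det(covariance)) for the six step parameters."""
    return math.sqrt(max(determinant(covariance(samples)), 0.0))

# Toy snapshots: each parameter (shift, slide, rise in Å; tilt, roll, twist in deg)
# fluctuates independently by +/- its scale, so the covariance matrix is diagonal.
scales = [0.5, 0.4, 0.3, 4.0, 5.0, 6.0]
samples = []
for j, s in enumerate(scales):
    for sign in (1.0, -1.0):
        row = [0.0] * 6
        row[j] = sign * s
        samples.append(row)

toy_v_step = v_step(samples)
```

With three translational parameters in Å and three rotational parameters in degrees, the result carries units of Å³deg³, matching the values quoted in the text.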

Position and orientation along nucleosome core particles modulate sequence variation

DNA wrapped around histones in nucleosomes is subject to local deformation (39), which may impact mutation. Thus, we analyzed the 1KGP SNVs at A- and G-tracts predicted to overlap with well-positioned nucleosome core particles (NCPs) (16). In hg19, the percentage of tracts that overlap with NCPs decreased moderately from ∼90% at length of 4 to 81% and 71% for A- and G-tracts of length 13, respectively (Figure 5A), suggesting that mononucleotide runs are not depleted in NCPs in human genomes as previously proposed (40). A-tracts of lengths 4–8 base pairs displayed distinctive peaks along the NCP surface in phase with the helical repeat of DNA (10.5 bp) and with minor grooves facing toward the inner protein core (lengths 4–5) (16) (Figure 5B and Supplementary Figure S5A). A-tracts of length of 9–13 bp exhibited only half (six) the peaks evident for the shorter tracts. For the G-tracts, only small peaks with no clear minor groove-inward-facing regions were detected (Supplementary Figure S5B).

Figure 5.

Positioning along nucleosome core particles. (A) Percent of A-tract (open circles) and G-tract (closed circles) base pairs in hg19 overlapping with well-positioned NCP genomic coordinates as a function of tract length. (B) Counts of base pairs in hg19 A-tracts of length 5 overlapping with NCPs genomic regions as a function of distance from the histone octamer dyad axis. Minor groove-inward-facing regions (gray) were derived from the X-ray crystal structure of NCP147 (41). (C) Percent SNVs in the 1KGP dataset (left axis) at every bp along A-tracts of length 5 for tracts centered at maxima (black) and minima (gray) along NCPs (Figure 5B). Percent increase (right axis) of SNVs at minima relative to maxima (green). P-values for paired t-tests: 0.013 (*), 0.002 (**) and 4.7 × 10⁻⁶ (***). (D) Whisker plots of %SNVs (left axis) at A1 for A-tracts of length 5 centered at maxima and minima (black) along NCPs (Figure 5B). Percent difference (right axis) in the number of A-tracts of length 5 in hg19 preceded by C, T or G (red) between those centered at minima and those centered at maxima (Figure 5B). (E) C-containing/G-containing ratios (see text) for G-tracts of length 5 in hg19 as a function of distance from the NCP dyad axis (black) and location of core histones (maroon and green). Peaks correspond to negative iSAT (i.e. tilt parameters multiplied by the corresponding sin θ) values (gray) (39). Ratios of %SNV at G1 (upshifted by 0.5 for clarity) between C-containing (5′-CCCCCG-3′ sequences on the hg19 forward strand) and G-containing (5′-CGGGGG-3′ sequences on the hg19 forward strand) (Figure 1A) CG[5] tracts mapping NCP ChIP-seq genomic intervals (red) fitted by a non-parametric local regression (loess; sampling proportion, 0.100; polynomial degree, 3). (F) VMD rendering (top) of TATTT residues 34–38 (yellow) and the complementary AAATA residues 672–753 (pink) from the 1EQZ pdb nucleosomal crystal structure, corresponding to peak area from −40 to −36 in Figure 5E. The switch in G-tract (lengths of 5 and 7) orientation along NCPs (bottom) serves to position the C-containing strand on the outside (yellow) and, correspondingly, the G-containing strand on the inside (pink).

To assess whether tract positioning along NCPs influences SNVs, we selected A-tracts of lengths 5, 7 and 9 bp and G-tracts of lengths 5 and 7 bp whose central positions coincided with either the maxima or minima (41) (Figure 5B and Supplementary Figure S5A and B) and conducted pairwise t-tests (330 total) between permutations of ‘categories’, including ‘tracts centered at maxima versus minima’, ‘position along the tracts’, ‘flanking sequence composition’, ‘specific NCP locations’ and ‘tract orientation’. For A-tracts, 79/207 (38%) significant pairs were found, 68 (86%) of which were related to differences between tracts centered at maxima versus minima, with a preponderance (63%) of tests displaying increased %SNVs at minima (Supplementary Figure S5C and E). For example, %SNVs at length 5 bp were greater at minima than at maxima at each position along the A-tracts (Figure 5C). A→C substitutions at A1 were more abundant at maxima than at minima (mean ± SD, 18.7 ± 0.7% at max and 17.6 ± 0.8% at min; P-value 0.001), whereas A→T substitutions at the same position displayed the opposite trend (mean ± SD, 18.4 ± 0.5% at max and 19.8 ± 1.1% at min; P-value 0.0005) (Figure 5D). A-tracts of length 7 exhibited a similar pattern at A7 (Supplementary Figure S5H). The percentages of CA[5] and A[7]C tracts in hg19 centered at maxima were greater than at minima, and the reverse was observed for the TA[5] and A[7]T tracts (Figure 5D and Supplementary Figure S5H). Thus, we conclude that the positioning along the NCP surface of both the double-helical grooves and the junctions with flanking base pairs influences SNVs along A-tracts. However, this influence is complex and, for the most part, difficult to predict.

For G-tracts, most pairwise comparisons (18/34, 53%) indicated SNV variation according to sequence orientation (Supplementary Figure S5F and G). In hg19, the ratio of the numbers of G-tracts of lengths 5 and 7 for which the C-containing strand coincided with the forward sequence (downstream example sequence in Figure 1A) to the numbers of G-tracts for which the G-containing strand coincided with the forward sequence (upstream example sequence in Figure 1A) (C-containing/G-containing ratios) displayed a prominent 10.5-bp oscillation in phase with iSAT (Figure 5E), a measure of ‘inside’ and ‘outside’ bases, according to the bp step tilt parameter (39). Analysis of the helical path of a 146-bp DNA fragment wrapped around histones showed that the oscillation in the C-containing/G-containing ratios corresponds to a preference for guanine bases to face the protein core (Figure 5F). We analyzed the subset of G-tracts preceded by a 5′ C (i.e. CG[5]) to assess whether SNVs at G1, the position known to be mutable due to CpG methylation also oscillated with the C-containing/G-containing ratios. Oscillation in SNV-C-containing/SNV-G-containing values was evident, with peaks aligning to the hg19 troughs (Figure 5E) implying that the cytosines facing the protein surface harbor more variants than those facing away. We conclude that A- and G-tracts display preferential positioning (the former) and orientation (the latter) along NCPs, which in turn modulate the rate of sequence variation.
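
The C-containing/G-containing ratio described above can be illustrated with a minimal sketch. It assumes each G-tract has already been reduced to a pair (distance from the NCP dyad, which strand lies on the forward sequence); the function name, the input layout and the toy values are hypothetical and are not taken from the study.

```python
from collections import Counter


def orientation_ratio_by_offset(tracts):
    """Return {offset: C-containing count / G-containing count}.

    `tracts` is an iterable of (offset_from_dyad, strand) pairs, where
    strand is "C" if the C-containing strand coincides with the forward
    sequence and "G" if the G-containing strand does (assumed encoding).
    """
    c_counts = Counter()
    g_counts = Counter()
    for offset, strand in tracts:
        if strand == "C":
            c_counts[offset] += 1
        elif strand == "G":
            g_counts[offset] += 1
        else:
            raise ValueError(f"unknown strand label: {strand!r}")

    ratios = {}
    for offset in sorted(set(c_counts) | set(g_counts)):
        if g_counts[offset] == 0:
            continue  # ratio undefined when no G-containing tracts map here
        ratios[offset] = c_counts[offset] / g_counts[offset]
    return ratios


# Toy input only: not real hg19 counts.
toy_tracts = [(-40, "C"), (-40, "C"), (-40, "G"),
              (-35, "C"), (-35, "G"), (-35, "G")]
ratios = orientation_ratio_by_offset(toy_tracts)
print(ratios)
```

Plotting such ratios against the offset from the dyad is what would reveal a periodicity like the 10.5-bp oscillation reported in the text.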

Mutations associated with human disease

Knowing that the first and last As of long A-tracts and G2–3 in G-tracts are the major sites of SNV in the human population, we addressed whether these features are also discernible in mutated mononucleotide tracts associated with human genetic disease. We collected 9,450,456 unique SBSs (both SBSs and SNVs refer to single base changes) from sequenced cancer genomes and normalized the percent mutations along A- and G-tracts to enable a direct comparison with the 1KGP dataset. For A-tracts (Figure 6A and Supplementary Figure S6A), SBSs displayed the same trend as the 1KGP data (Figure 2A) with respect to the bell-shape increase in mutations at A1 and An and the mutation spectra, although the susceptibility to mutation as a function of tract length attained greater values (6.36% for length 11 in cancer versus 4.15% for length 9 in the 1KGP datasets at A1). The first and last 3 bp also harbored more SBSs than in the 1KGP dataset for tracts >7 bp, a feature that we found to be due exclusively to a large cancer dataset (42) containing high-level microsatellite instability (MSI) samples (Supplementary Figure S6B and C), which are known to result from mismatch-repair deficiency (15). Thus, A-tracts display similar patterns of base substitution between the germline and somatic cancer tissues. For G-tracts, mutation spectra were characterized by G→T transversions at tract lengths >7, particularly at G1, the most frequently mutated position for tracts lengths up to 11 bp (Figure 6B and Supplementary Figure S6D). This trend persisted even when the high rates of methylation-mediated deamination mutations at the CG dinucleotide were removed (Supplementary Figure S6E). Thus, mutation patterns in cancer genomes contrast with those observed in the germline, both with respect to the most mutable position (G1 versus G2–3) and the types of base substitution (G→T in cancer genomes versus G→T and G→C in the germline).

Figure 6.

Mutation patterns in cancer genomes. (A) Mutation spectra for SBSs at A-tracts. Percent values were obtained by dividing the total number of SBSs at each position by the number of tracts in hg19 and then multiplying by 3.2516 to equalize the percentage of A-tracts of length 4 between the cancer genomes and the 1KGP datasets. (B) Mutation spectra for SBSs at G-tracts in cancer genomes. Percent values were obtained as in (A) using a multiplication factor of 3.7419. (C) Normalized fractions of A-tracts (closed circles) and G-tracts (open circles) displaying +/−1 bp slippage, obtained by dividing the number of events by both the number of tracts in hg19 and tract length. (D) Indels at A-tracts, calculated as described in Figure 3C. (E) Indels at G-tracts, calculated as described in Figure 3C. (F) Heatmap representation of insertions along G-tracts, as described in Figure 3E.
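
The two normalizations stated in this legend can be written out as a short sketch. The function names and the toy counts are hypothetical, and the factor of 100 used to express the result as a percentage is an assumption; only the scale factors 3.2516 and 3.7419 come from the legend.

```python
A_TRACT_SCALE = 3.2516  # from the legend, panel (A)
G_TRACT_SCALE = 3.7419  # from the legend, panel (B)


def normalized_percent_sbs(sbs_count, n_tracts, scale):
    """Percent SBSs at one tract position, rescaled to the 1KGP baseline.

    Divides the SBS count by the number of tracts in the reference
    genome, converts to percent (assumed), and applies the scale factor.
    """
    if n_tracts <= 0:
        raise ValueError("n_tracts must be positive")
    return 100.0 * sbs_count / n_tracts * scale


def slippage_fraction(n_events, n_tracts, tract_length):
    """Slippage events normalized by tract count and by tract length."""
    if n_tracts <= 0 or tract_length <= 0:
        raise ValueError("n_tracts and tract_length must be positive")
    return n_events / (n_tracts * tract_length)


# Toy numbers only: not taken from the cancer or 1KGP datasets.
pct = normalized_percent_sbs(sbs_count=50, n_tracts=10_000, scale=A_TRACT_SCALE)
frac = slippage_fraction(n_events=90, n_tracts=1_000, tract_length=9)
print(pct, frac)
```

Dividing slippage by tract length as well as by tract count keeps long tracts from appearing more unstable simply because they contain more positions.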

With respect to slippage, the fractions for A-tracts elicited an excess at lengths 9 and 10 bp relative to the 1KGP dataset, which was also due to the MSI-containing dataset. For G-tracts, the fractions peaked at length 8, as for the 1KGP dataset (Figures 3A and 6C), implying that the propensity to undergo slippage is indistinguishable between the germline and soma. Indels were also more abundant at flanking base pairs than along the tracts (Figure 6D and E), particularly for G-tracts of length >7, similar to the 1KGP dataset (Figure 3C and D). Detailed analyses of insertions revealed that both G1 and the preceding position were the most significant sites of mutation (F-values up to 0.08 at G1 for tracts of length 8) (Figure 6F). Thus, the 5′ end of long G-tracts is the most susceptible site for both SBSs and insertions in cancer genomes, in contrast to the germline where these occur within the runs, typically at G2–3.

We also extracted the mutated A- and G-tracts from the Human Gene Mutation Database (HGMD), a collection of >150,000 germline gene mutations associated with human inherited disease. A total of 1519 genes were mutated at A- or G-tracts out of a total of 3972 (38%); 3480 SBSs and 2866 slippage events were noted within these tracts, 85 and 46% of which were predicted to be disease-causing, respectively (Figure 7A and Supplementary Table S1). Ranking genes by the number of literature reports indicated that among the top 10 entries three were associated with cancer (BRCA1, BRCA2 and APC), two with hemophilia (F8 and F9), four with debilitating lesions of the skin (COL7A1), muscle (DMD), lung (CFTR) and kidney (PKD1), and one with hypercholesterolemia (LDLR) (Figure 7B). Thus, mutations within A- and G-tracts carry a high social burden by contributing to some of the most common human pathological conditions.

Figure 7.

Mutation patterns in HGMD and model for sequence context-dependent changes. (A) Number of germline SBSs and slippage events (Slip.) at A- and G-tracts in HGMD. Gene alterations were classified as disease-causing mutation (DM), likely disease-causing mutation (DM?), disease-associated and putatively functional polymorphism (DFP), disease-associated polymorphism with additional supporting functional evidence (DP) and in vitro/laboratory or in vivo functional polymorphism (FP). Codon changes (SIFT predictor) were classified as damaging (d), null (n), tolerated (t) and low-confidence prediction (l). (B) The 10 most commonly reported genes in HGMD with mutations at A- and G-tracts. Various mutated tracts were generally reported for the same gene in different reports. (C) Mutation spectra for SBSs at A- (left) and G-tracts (right) in HGMD. Percent values were obtained by dividing the total number of SBSs at each position by the number of tracts in hg19 exons. A|G→T (black), A|G→C (red), A→G (green), G→A (cyan). (D) Normalized fractions of A-tracts (closed circles) and G-tracts (open circles) displaying +/−1 bp slippage, obtained by dividing the total number of events by the number of tracts in hg19 exons and by tract length. (E) Model for sequence context-dependent changes at A-tracts (left) and G-tracts (right). *, site of base modification.

For both A- and G-tracts, SBSs occurred mostly at tract lengths of 4–7, with patterns more similar to those in the 1KGP than in the cancer datasets, both with respect to the location of the most mutable positions (first and last As and first/second Gs) and the types of base substitution (A→T and G→H) (Figure 7C and Supplementary Figure S6F). Likewise, slippage events peaked at tract lengths of 7–9 as observed in the 1KGP dataset (Figure 7D). In summary, the patterns of both SBSs and slippage in the HGMD dataset followed the trend observed in the 1KGP dataset, suggesting that germline variants at mononucleotide repeats leading to either population variation or human inherited disease may have arisen through similar mechanisms.
DISCUSSION

Why are specific A:T and G:C base pairs within A- and G-tracts more susceptible to sequence changes than their identical neighbors? For A-tracts, bp flexibility may play a role. Chemical damage to DNA, such as by hydroxyl radicals has been shown to be proportional to the geometrical solvent-accessible surface of the atomic groups, which increases with DNA flexibility (43). Along A-tracts flexibility is restricted, but it is high at both the 5′ and 3′ junctions. Thus, the fact that the highest rates of mutation coincide with the highest degree of flexibility at the 5′-TA-3′ bp step is consistent with the view that this position may be susceptible to DNA damage as a result of flexibility. Other sources of DNA dynamics are also likely to be relevant, such as sugar flexibility at the junctions, which increases with tract length (44). Chemical modification at these junctions may then lead to base substitution and indels, the latter as a result of strand breaks.

With respect to SNV mutation spectra, these were found mostly in the direction of flanking base composition above a length of 7–8 bp. We interpret this behavior in terms of DNA slippage along A-tracts when attempts are made during translesion synthesis (TLS) to bypass a damaged site (Figure 7Ei). Two scenarios may be considered to account for A→T transversions at A1. In the first, the last tract-template base would loop out into the polymerase active site, permitting base-pairing and strand elongation (Figure 7Eii) using the tract-flanking base as a template (34,45,46). In the second (Figure 7Eiii), slippage would occur behind the polymerase, prompting extension past the newly created A*:T mispair generated by primer/template misalignment. Either pathway would yield a common intermediate (Figure 7Eiv) that contains the base complementary to the junction across from the damaged site upon slippage resolution (34). Following DNA synthesis (S) and/or repair (R) (Figure 7Ev and vi), this mispair will generate a base change that is always identical to the tract-flanking base.

For G-tracts, the high rates of G→T transversions at G1 in cancer genomes are also consistent with preferred chemical attack at this site due to high flexibility (Figure 7F top). Direct chemical attack at a guanine is known to result in stable products, such as 8-oxo-G and Fapy-G, both of which are known to yield G→T transversions (47–50). Thus, G1 may be the most susceptible site for such reactions for G-tracts of lengths ≥7 (Figure 7F right), which in cancer genomes would become a mutation hotspot. In the germline, SNVs peaked inside G-tract base pairs, while mutational spectra were insensitive to flanking base composition; these events are inconsistent with a role for template misalignment and slippage as noted for A-tracts. Rather, the correspondence between hotspot mutations at G2–3 and G5 and the QM/MM simulations suggests a role for charge transfer. A large body of work during the past 20 years, using computational, theoretical chemistry and biophysical techniques on short oligonucleotides, has shown that guanine is the most easily oxidizable base in DNA and that a guanine radical cation can be generated through long-range hole transfer from an oxidant via one-electron oxidation mechanisms (51–55). GGG triplets were found to act as the most effective traps in hole transfer by both experimental and theoretical work (56–59), demonstrating that the resulting guanine radical cation (or its neutral deprotonated form) became rather delocalized but was preferentially centered at the first and second G. These well-established patterns of chemical reactivity are consistent with our experimental observation of high mutation frequencies at G1 for short G-tracts and the results from QM/MM simulations on G6. For longer tracts, the downstream shift in mutation hotspots, i.e., G2–3 and G5, also correlates well with the charge localization predicted from QM/MM simulations, which explicitly included solvent effects and structural fluctuations. Thus, in conjunction with the constrained density functional theory (60), both the neutral and oxidized forms of a guanine nucleobase can be reliably constructed to infer the accurate determination of mutational patterns of mononucleotide repeats in human genomic DNA.

The compact organization of the sperm genome (61), and presumably low levels of oxidative stress in the germline, may enable guanine oxidation through one-electron oxidation reactions rather than by direct chemical attack, thereby favoring the formation of radical cations. A charge injected at G1 by electron loss would then migrate to neighboring guanines and localize at sites of low IP, such as G2 (Figure 7F left). Guanine radical cations are known to readily undergo further chemical modification leading to products such as 8-oxo-G, oxazolone, imidazolone, guanidinohydantoin, and spiroiminodihydantoin (62) (M in Figure 7F), to yield G→T, G→C and G→A substitutions (4,63). Our model is in line with recent observations in which mutations at guanines within short G-runs (1–4 bp) correlate with sequence-dependent IPs at the target guanine in cancer genomes (9). Interestingly, these correlations were not observed in the germline (9). We interpret these composite observations as follows. The IP values for G-runs have been shown to decrease asymptotically with tract length, although the absolute values vary according to the methods and assumptions used (we obtained a value of 5.43 eV for both G[6] and G[9]) (64,65). We suggest that short G-runs with high IPs undergo one-electron oxidation reactions in the oxidative environment of cancer cells but would be refractory to such a mechanism in the germline (Figure 7F right yellow and left white sectors). As length increases and IP values fall, G-runs would be attacked directly by oxidants abundant in tumor cells (Figure 7F orange sector), whereas oxidation will be limited to electron loss in the germline environment (Figure 7F left yellow sector).

These models (template misalignment for A-tracts and charge transfer for G-tracts) suggest a more complex scenario for the mechanisms underlying mononucleotide repeat polymorphism in the human population than the one recently proposed (13), in which nucleotide misincorporation by error-prone polymerases is the primary source of mutations at both A- and G-tracts. As already stated, the directionality of SNVs toward tract-flanking bases in A-tracts and the hotspot mutations at G2–3 support multiple and distinct mechanisms of base substitution at mononucleotide repeats.

Our analyses highlight additional information, including the lack of mutations in the direction of tract-base composition for base pairs flanking long tracts, the association with gene expression and the preference of guanines for the inner NCP surface, and extend prior observations (12) such as the bell-shape character of base substitution and slippage, whose mechanisms remain to be fully clarified. Finally, we document the contribution of mononucleotide mutagenesis to key aspects of human pathology beyond the well-established MSI instability in cancer (15), including hemophilia and tissue degeneration. Our collective work supports the conclusion that as the human genome undergoes evolutionary diversification and along the way suffers disease-associated mutations, oxidation reactions including charge transfer may play a prominent role.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

Mutation analyses and prenatal diagnosis in families of X-linked severe combined immunodeficiency caused by IL2Rγ gene novel mutation

Genet. Mol. Res. 14 (2): 6164 – 6172   DOI: 10.4238/2015.June.9.2
Severe combined immunodeficiency diseases (SCIDs) are a group of primary immunodeficiency diseases characterized by a severe lack of T cells (or T cell dysfunction) caused by various gene abnormalities and accompanied by B cell dysfunction (WHO, 1992; Buckley et al., 1997). The incidence in infants is 1/75,000-1/100,000 (WHO, 1992), but no morbidity statistics are available in China. The 2 genetic modes of SCID are X-linked recessive and autosomal recessive inheritance. X-linked severe combined immunodeficiency (X-SCID) is the most common form, accounting for 50-60% of SCID cases (Noguchi et al., 1993). The immune phenotype of patients with X-SCID is T-B+NK-, in which T cells (CD3+) and natural killer (NK) cells (CD16+/CD56+) are absent or significantly reduced, and the number of B cells (CD19+) is normal or increased, causing reduced immunoglobulin production and class switching disorder (Buckley, 2004; Fischer et al., 2005). The IL-2Rg gene mutation has been confirmed to be a major cause of X-SCID (Noguchi et al., 1993). In recent years, great progress has been made in understanding the pathogenesis of primary immunodeficiency disease and its application in clinical treatment, particularly regarding the development of critical care medicine and immune reconstruction technology. With timely control of infection and early bone marrow or stem cell transplantation, X-SCID patients can be treated, prolonging survival time. Therefore, early diagnosis of X-SCID is very important for patient treatment. Gene diagnosis has become a better method for early or differential diagnosis. In addition, familial X-SCID places a great psychological burden on the relatives of patients. Ordinary chromosome analysis and immunological evaluation cannot be used for female carrier identification and fetal diagnosis, and gene diagnosis is the most effective method of carrier detection and prenatal diagnosis. 
In this study, we detected mutations in 2 families with X-SCID and identified 2 novel mutations, confirming the X-SCID pedigrees. Based on gene diagnosis, prenatal diagnosis was performed for the fetus carried by the mother of one of the probands. Female individuals in this family were subjected to carrier detection.
IL2Rg gene mutation test
Direct sequencing of exons 1-8 and the flanking regions of the IL2Rg gene by PCR in family 1 showed that the 3rd exon of the proband contained the c.361-363delGAG heterozygous deletion mutation, which led to deletion of the 121st amino acid, glutamate (p.E121del), in its coding product. There were no sequence variations in other coding regions or in the splice regions. The proband’s mother carried the same heterozygous mutation, while his father did not carry the mutation (Figure 2a, b, c). This mutation was not observed in any member of the control group, and this family was identified as an X-SCID family. The c.510-511insGAACT heterozygous insertion mutation was present in the 4th exon of the proband’s mother in family 2. This mutation was a 5-base repeat of GAACT, resulting in a change of amino acid 173 from tryptophan into a stop codon (p.W173X). There were no sequence variations in other coding regions or in the splice regions, and the patient’s father did not carry the mutation (see Figure 2d, e). We did not find this mutation in the healthy control group. We presumed that the 4th exon of the deceased child in family 2 contained the c.510-511insGAACT insertion mutation, leading to X-SCID symptoms, and thus we speculated that this family was an X-SCID pedigree.
Prenatal diagnosis
We verified the chorionic villus status of the fetus in family 1 using the PowerPlex 16 HS System kit. The results showed that the fetal tissue contained no maternal contamination, that the fetus was female, and that the female fetus of family 1 did not carry the c.361-363delGAG (p.E121del) heterozygous mutation.
Figure 2. Sequencing graph of the IL2Rg gene in 2 pedigrees with X-linked severe combined immunodeficiency. a.-c. Family 1. a. Normal control (rectangle indicates the 3 bases deleted in this patient). b. Proband carrying the c.361-363delGAG (p.E121del) mutation (arrow indicates the junction created by the deletion). c. The proband’s mother carried the c.361-363delGAG (p.E121del) heterozygous mutation (arrow). d.-e. Family 2. d. The proband’s mother carried the c.510-511insGAACT (p.W173X) heterozygous mutation (arrow indicates that the reverse sequencing graph was positive). e. Normal control (rectangular box indicates 2 normal copies of GAACT; the mutant fragment had 3 copies).
Carrier detection results
For the c.361-363delGAG (p.E121del) site, gene analysis of the female individuals in family 1 showed that I2 (the proband’s grandmother) was a heterozygous carrier and that II3 (the proband’s aunt) was a non-carrier.
IL-2 binds the IL-2 receptor (IL-2R) on the immune cell membrane. IL-2R is composed of 3 subunits: the IL-2Ra chain (CD25), the IL-2Rb chain (CD122), and the IL-2Rg chain (CD132). IL-2Rg is a functional unit shared with the IL-4, IL-7, IL-9, IL-15, IL-21, and other cytokine receptors, and is therefore referred to as the common chain (Li et al., 2000). The IL-2Rg chain maintains the integrity of the IL-2R complex and is required for the internalization of the IL-2/IL-2R complex; it is also the link between the cytokine-binding region on the cell membrane surface and downstream cell signal transduction molecules. Therefore, the integrity of the IL-2Rg chain is vital for the immune function of an organism (Malka et al., 2008; Shi et al., 2009).
Mutations in the IL2Rg gene, which encodes IL-2Rg, were identified as a major cause of X-SCID in 1993 (Noguchi et al., 1993). The IL2Rg gene is located on chromosome X q21.3-22, is 37.5 kb in length, and contains 8 exons, which encode the 369 amino acids of IL-2Rg. The IL-2Rg chain comprises several structural regions: the signal peptide [amino acids (AA) 1-22], extracellular domain (AA 23-262), transmembrane region (AA 263-283), and intracellular region (AA 284-369). The WSXWS motif is located in the extracellular region (AA 237-241), while Box 1 is located in the intracellular region (AA 286-294).
By the end of 2013, the Human Gene Mutation Database contained a total of 200 mutations in the IL2Rg gene (HGMD Professional 2013.4). The most common mutation types in the IL2Rg gene were missense or nonsense mutations, which result from single base changes. A total of 100 missense or nonsense mutations have been identified, followed by 50 types of insertion or deletion mutations. The 3rd most common type comprises splicing mutations, of which there are approximately 30. All eight exons contained mutations, and mutations were most frequent in the 3rd and 4th exons, which together accounted for 43% (86/200) of the total. According to the X-SCID gene database (IL2RGbase) (http://research.nhgri.nih.gov/scid/), the gene mutations in IL2Rg mainly occurred in the extracellular region of the IL-2Rg chain (Fugmann et al., 1998). Zhang et al. (2013) reported that the IL2Rg gene mutations in 10 patients with X-SCID in China were located in the extracellular region. The two mutations reported in our study were also located in the extracellular region. The mutation of the IL2Rg gene in family 1 was a 3-base deletion of one codon in the 3rd exon. The c.361-363delGAG (p.E121del) mutation was located in the extracellular region of the IL-2Rg subunit, and we inferred that the deletion of glutamate 121 caused by the mutation would alter the structure of the peptide chain, affecting signal transmission and resulting in serious symptoms. The mutation in family 2 was a GAACT repeat in the IL2Rg gene; this 5-base repeat changed codon 173 from tryptophan into a stop codon. The peptide chain generated from the mutant allele lacked 196 amino acids compared to the normal chain, including the intracellular, transmembrane, and some extracellular regions, directly affecting the structure and function of the receptor and causing disease. No studies have been reported regarding these 2 mutations. 
Combining the mutation characteristics with the clinical manifestations, we diagnosed family 1 as an X-SCID pedigree. Although the patients in family 2 were deceased, it can be speculated that the 2 deceased patients in family 2 had X-SCID caused by c.510-511insGAACT (W173X).
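
The positional reasoning above (cDNA coordinate to codon number, residue to structural region) can be checked with a short sketch. The region boundaries and the 369-residue length are those quoted in the text; the function names are hypothetical, and the sketch assumes cDNA numbering starts at the first base of the start codon.

```python
# Region boundaries (inclusive amino acid positions) as quoted in the text.
REGIONS = [
    ("signal peptide", 1, 22),
    ("extracellular domain", 23, 262),
    ("transmembrane region", 263, 283),
    ("intracellular region", 284, 369),
]


def codon_number(cdna_position):
    """Codon index (1-based) containing a 1-based cDNA coordinate."""
    if cdna_position < 1:
        raise ValueError("cDNA positions are 1-based")
    return (cdna_position - 1) // 3 + 1


def region_of(residue):
    """Name of the structural region containing a residue."""
    for name, start, end in REGIONS:
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} is outside 1-369")


def regions_at_or_after(stop_residue):
    """Regions lost, fully or partly, when translation stops at a residue."""
    return [name for name, _start, end in REGIONS if end >= stop_residue]


# c.361-363delGAG removes exactly one codon, number 121 (p.E121del).
deleted_codons = {codon_number(p) for p in (361, 362, 363)}
# p.W173X: premature stop at residue 173.
lost = regions_at_or_after(173)
print(deleted_codons, region_of(121), region_of(173), lost)
```

Both residues fall in the extracellular domain, and a stop at residue 173 removes part of that domain together with the whole transmembrane and intracellular regions, in line with the description above.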
Prenatal diagnosis can accurately determine the status of the fetus and be used to avoid birth defects, which can also ease the anxiety of the pregnant mother. Gene diagnosis for pedigrees of patients based on DNA samples has advanced recently, particularly with the application of high-throughput sequencing technology (Alsina et al., 2013). We can now perform gene analysis for varied clinical infectious diseases for differential diagnosis. However, the effectiveness of prenatal diagnosis for pedigrees in which the proband is dead remains unclear. Because the gene mutation in the proband is unknown in these cases, the patient’s status can only be inferred from his mother’s genotype. However, we considered that when the proband is deceased, if the mother can be shown to be a carrier of a pathogenic gene, then even if the proband did not have X-SCID, the woman is still at risk of having children with X-SCID, and the pedigree may show X-linked recessive inheritance. Prenatal diagnosis may provide a choice for preventing the birth of patients in these families, on the premise of informed consent.
Gene diagnosis of IL2Rg can also be used for carrier detection of suspected females in the family.
In the present study, we performed carrier detection of the patient’s grandmother and aunt in family 1 and determined that the patient’s pathogenic mutations were from his grandmother. His aunt did not inherit the pathogenic gene, and thus she was a non-carrier and her fertility will not be affected. In this study, we used direct sequencing of PCR products and identified IL2Rg gene mutations in 2 pedigrees with X-SCID. We found 2 unreported mutations in the IL2Rg gene, and prenatal diagnosis and carrier detection were conducted in 1 X-SCID family. Because the incidence rate of X-SCID is extremely low, it is difficult to promote the widespread use and application of genetic diagnosis. However, this study may provide some implications for the diagnosis of infants with immunodeficiency, and gene diagnosis techniques such as conventional or high-throughput sequencing should be used as soon as possible during pregnancy, which can be used to guide treatment. This method can also provide reliable prenatal diagnosis and carrier detection service for these families.
MEF2A gene mutations and susceptibility to coronary artery disease in the Chinese population
J. Li1 , H.-X. Chen2 , J.-G. Yang3 , W. Li3 , R. Du3 and L. Tian3       DOI http://dx.doi.org/10.4238/2014.October.20.15
Coronary artery disease (CAD) has high morbidity and mortality rates worldwide. Thus, the pathogenesis of CAD has long been the focus of medical studies. Myocyte enhancer factor 2A (MEF2A) was first discovered as a CAD-related gene by Wang (2005) and Wang et al. (2003, 2005). Three mutation points in exon 7 of MEF2A were subsequently identified by Bhagavatula et al. (2004); however, Altshuler and Hirschhorn (2005) and Weng et al. (2005) predicted that the MEF2A gene lacked mutations. Zhou et al. (2006a,b) analyzed the mutations and polymorphisms in exons 7 and 11 of the MEF2A gene in the Han population in Beijing, and various rare mutations were found in exon 11 rather than in exon 7. The clinical significance of specific 21-bp deletions in MEF2A was also explored, and previous studies have shown mixed results. In this study, polymerase chain reaction-single-strand conformation polymorphism (PCR-SSCP) and DNA sequencing were used to examine exon 11 of the MEF2A gene in samples collected from 210 CAD patients and 190 healthy controls, in order to investigate the function of the MEF2A gene in CAD pathogenesis and the correlation between the two.
CAD, a common disease in China, is induced by multiple factors, such as genetics, the environment, and lifestyle. Thus, a multi-faceted approach is necessary in the study of CAD pathogenesis, particularly in molecular biology research, which is important for developing comprehensive treatment of CAD based on gene therapy. The MEF2A gene was first identified as a CAD-related gene through linkage analysis of a large family with CAD (9 of 13 patients developed MI) in 2003.
In this study, we found the following mutations: 1) codon 451G/T (147191) heterozygous or homozygous mutation; 2) loss of 1 (Q), 2 (QQ), 3 (QQP), 6 (425QQQQQQ430), and 7 (424QQQQQQQ430) amino acids (147108-147131); and 3) codon 435G/A (147143) heterozygous mutation. Among these mutations, the synonymous mutation at locus 147191 was confirmed by reference to the National Center for Biotechnology Information (NCBI) database to be a single nucleotide polymorphism, which was also demonstrated in our study by the extensive presence of this polymorphism in healthy controls. However, the heterozygous mutation at locus 147143 was only found in the genomes of CAD patients, and was therefore identified as a mutation.
Impact of glucocerebrosidase mutations on motor and nonmotor complications in Parkinson’s disease

Homozygous and compound heterozygous mutations in GBA encoding glucocerebrosidase lead to Gaucher disease (GD). A link between heterozygous GBA mutations and Parkinson’s disease (PD) has been suggested (Bembi et al., 2003, Goker-Alpan et al., 2004, Halperin et al., 2006, Machaczka et al., 1999, Neudorfer et al., 1996, Tayebi et al., 2001 and Tayebi et al., 2003). In 2009, a 16-center worldwide analysis of GBA revealed that heterozygous GBA mutation carriers have a strong risk of PD (Sidransky et al., 2009).

In addition, heterozygous GBA mutations may not only carry a risk for PD development but also place some burden on the progression of the PD clinical course. In cross-sectional analyses of GBA mutations in PD patients, earlier disease onset, increased cognitive impairment, a greater family history of PD, and more frequent pain were reported in patients with mutations than in those without (Chahine et al., 2013, Clark et al., 2007, Gan-Or et al., 2008, Kresojevic et al., 2015, Lwin et al., 2004, Malec-Litwinowicz et al., 2014, Mitsui et al., 2009, Neumann et al., 2009, Nichols et al., 2009, Seto-Salvia et al., 2012, Sidransky et al., 2009, Swan and Saunders-Pullman, 2013 and Wang et al., 2012). Recently, a few prospective studies have investigated clinical features of PD with GBA mutations and showed a more rapid progression of motor impairment and cognitive decline in GBA mutation cases than in PD controls (Beavan et al., 2015, Brockmann et al., 2015 and Winder-Rhodes et al., 2013). However, no studies have examined motor complications such as wearing-off and dyskinesia over the longitudinal course of PD with GBA mutations.

Here, we conducted a multicenter retrospective cohort analysis, and the data were investigated by survival time analysis to show the impact of GBA mutations on PD clinical course. We also investigated regional cerebral blood flow (rCBF) and cardiac sympathetic nerve degeneration of subjects with GBA mutations, compared with matched PD controls.

3.1. Subjects

Among the 224 eligible PD patients (the subjects were not related to each other), 9 subjects were excluded from the analysis (4 due to multiple system atrophy findings on subsequent brain MRI and 5 because of insufficient clinical information). Therefore, 215 PD patients [female, 52.1%; age, 66.7 ± 10.8 (mean ± standard deviation)] were analyzed. For non-PD healthy controls, 126 patients’ spouses (female, 58.7%; age, 67.3 ± 10.3) without a family history of PD or GD were enrolled.

3.2. GBA mutations and risk ratios for PD

In the PD subjects, we identified 10 nonsynonymous and 2 synonymous GBA variants. Of the nonsynonymous variants, 7 had previously been reported in GD as GD-associated mutations [R120W, L444P-A456P-V460 (RecNciI), L444P, D409H, A384D, D380N, and 444L(1447-1466 del 20, insTG)]. The remaining 3 nonsynonymous mutations have never been reported in GD patients [I(-20)V, I489V, and 1 novel mutation (Y11H)].

GD-associated GBA mutations were found in 19 of the 215 (8.8%) PD patients but in none of the healthy controls. The risk of PD development relative to these GD-associated mutations was estimated as an OR of 25.1 [95% confidence interval (CI), 1.50–420, p = 0.0001] with 0-cell correction. The nonsynonymous mutations that were not reported in GD patients had no association with PD development (p = 0.506; OR, 1.3; 95% CI, 0.7–2.6) (Table 1). Four subjects had double mutations. For subsequent analyses, the 2 subjects with double mutations of I(-20)V and K466K were assigned to the group of mutations unreported in GD, and the 2 subjects with double mutations of R120W and I(-20)V, and of R120W and L336L, were assigned to the group of GD-associated mutations.
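The 0-cell correction can be illustrated with a short calculation. The sketch below adds 0.5 to every cell of the 2×2 table (as footnote b of Table 1 describes) and builds a 95% CI on the log-odds scale. The excerpt does not say how the interval was derived, so the log-scale (Woolf-type) interval is an assumption here, and the function name is hypothetical.

```python
import math

def odds_ratio_with_correction(case_carriers, case_noncarriers,
                               control_carriers, control_noncarriers,
                               correction=0.5, z=1.96):
    """Odds ratio and approximate 95% CI after adding `correction` to each cell.

    ASSUMPTION: the CI uses the standard error of log(OR); the source only
    states that 0.5 was added to each value.
    """
    a, b, c, d = (x + correction for x in (case_carriers, case_noncarriers,
                                           control_carriers, control_noncarriers))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts from the text: 19 of 215 PD patients and 0 of 126 controls carried
# GD-associated mutations.
or_, lower, upper = odds_ratio_with_correction(19, 215 - 19, 0, 126)
print(f"OR = {or_:.1f}, 95% CI {lower:.2f} to {upper:.0f}")
```

Under these assumptions the result lands close to the reported OR of 25.1 and its very wide interval, which reflects the empty control cell.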

Table 1.Frequency of glucocerebrosidase gene allele in Parkinson’s disease patients and controls

| Allele name | PD (n = 215) | Controls (n = 126) | p | Odds ratio (95% CI) |
|---|---|---|---|---|
| GD-associated mutations | | | | |
| R120W | 7a | 0 | 0.050 | 9.1 (0.5–160.8) |
| RecNciI (L444P-A456P-V460) | 4 | 0 | | |
| L444P | 4 | 0 | | |
| D409H | 1 | 0 | | |
| A384D | 1 | 0 | | |
| D380N | 1 | 0 | | |
| 444L(1447-1466 del 20, insTG) | 1 | 0 | | |
| Subtotal, n (%) | 19 (8.8%) | 0 (0%) | <0.001 | 25.1 (1.5–419.8)b |
| Nonsynonymous mutations not reported in GD | | | | |
| I(-20)V | 27a | 13 | 0.603 | 1.3 (0.6–2.5) |
| I489V | 3 | 0 | | |
| Y11Hc | 0 | 1 | | |
| Subtotal, n (%) | 30 (14.0%) | 14 (11.1%) | 0.506 | 1.3 (0.7–2.6) |
| Synonymous, n | | | | |
| K466K | 2a | 1 | | |
| L336L | 1a | 0 | | |

Allele names refer to the processed protein (excluding the 39-residue signal peptide).

Key: CI, confidence interval; GD, Gaucher disease; PD, Parkinson’s disease.

a Four subjects had double mutations; 2 of I(-20)V and K466K, 1 of I(-20)V and R120W, and 1 of R120W and L336L.
b Odds ratio was calculated by adding 0.5 to each value.
c Novel mutation.

3.3. Clinical features of PD patients by GBA mutation groups

The clinical features of PD patients with GD-associated mutations, those with mutations unreported in GD, and those without mutations are shown in Table 2. In the GD-associated mutation group, females, those with a family history, and those with dementia (DSM-IV) were significantly more frequent than in the no-mutation group (p = 0.047, 0.012, and 0.020, respectively). The age of PD onset was lower in patients with GD-associated mutations (55.2 ± 9.9 years, mean ± standard deviation) than in those without mutations (59.3 ± 11.5), although the difference was not statistically significant. There were no differences in clinical manifestations between subjects with mutations unreported in GD and those without mutations, except for dopamine agonist dosage (p = 0.026) (Table 2).

Table 2.Epidemiological and clinical features of PD patients with Gaucher disease–associated GBA mutations, those with mutations previously unreported in GD and those without mutations

Total n = 215.

| Variables | Statistic | Mutation (-) (n = 167) | GD-associated mutations (n = 19)a | pb | Mutations unreported in GD (n = 29)c | pd |
|---|---|---|---|---|---|---|
| Sex | Female, n (%) | 83 (49.7) | 14 (73.7) | 0.047 | 15 (51.7) | ns |
| Age | Mean (SD) | 67.0 (10.8) | 62.2 (10.7) | 0.063e | 67.5 (11.2) | nsf |
| Disease duration (y) | Mean (SD) | 7.7 (5.5) | 6.9 (4.6) | nsf | 7.2 (4.9) | nsf |
| Onset age | Mean (SD) | 59.3 (11.5) | 55.2 (9.9) | ns | 60.3 (11.8) | ns |
| Family history | Yes, n (%) | 17 (11.0)g | 6 (31.6) | 0.012 | 0 (0.0) | ns |
| Dementia (DSM-IV) | Yes, n (%) | 29 (17.4) | 9 (47.4) | 0.020 | 5 (17.2) | ns |
| MMSE | Mean (SD) | 25.8 (5.4)h | 23.3 (7.7) | nsf | 27.0 (3.4)i | nsf |
| Onset symptom (tremor vs. others) | Tremor, n (%) | 78 (46.8) | 9 (47.4) | ns | 15 (51.7) | ns |
| Modified H-Y on (<3 vs. ≥3) | ≥3, n (%) | 82 (49.1) | 14 (73.7) | 0.042 | 16 (55.2) | ns |
| UPDRS part 3 | Mean (SD) | 23.6 (12.2)j | 28.5 (13.8) | nsf | 21.9 (8.7) | nsf |
| Wearing off | Yes, n (%) | 70 (41.9) | 9 (47.4) | ns | 13 (44.8) | ns |
| Dyskinesia | Yes, n (%) | 49 (29.3) | 8 (42.1) | ns | 8 (27.6) | ns |
| Mood disorder | Yes, n (%) | 43 (25.7) | 8 (42.1) | ns | 7 (24.1) | ns |
| Orthostatic hypotension symptom | Yes, n (%) | 21 (12.6) | 5 (26.3) | ns | 7 (24.1) | ns |
| Psychosis history | Yes, n (%) | 59 (35.3) | 10 (52.6) | ns | 7 (24.1) | ns |
| ICD history | Yes, n (%) | 8 (4.8) | 1 (5.3) | ns | 1 (3.4) | ns |
| Stereotactic brain surgery for PD | Yes, n (%) | 4 (2.4) | 0 (0.0) | ns | 0 (0.0) | ns |
| Agonist LED mg/d | Mean (SD) | 92.8 (114.2) | 72.1 (137.7) | nse | 163.7 (155.6) | 0.026e |
| Levodopa LED mg/d | Mean (SD) | 400.7 (184.2) | 456.7 (206.9) | nsf | 369.2 (230.3) | nse |
| Total LED mg/d | Mean (SD) | 496.4 (233.7) | 537.9 (258.9) | nsf | 525.7 (287.4) | nsf |

Categorical data were examined by Fisher’s exact test.

Key: DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; GBA, glucocerebrosidase gene; GD, Gaucher disease; H-Y, Hoehn and Yahr; ICD, impulse control disorder; LED, levodopa equivalent dose; ns, not significant; MMSE, Mini-Mental State Examination; PD, Parkinson’s disease; SD, standard deviation; UPDRS, Unified Parkinson’s Disease Rating Scale.

a Including a double-mutation subject (with a mutation unreported in GD).
b GD-associated mutations versus mutation (-).
c Two subjects with double mutation, including GD-associated mutations, were assigned to GD-associated mutation group.
d Other mutations versus mutation (-).
e Examined by Student t test after Levene’s test for equality of variances.
f Examined by Mann-Whitney U-test because of non-Gaussian distribution.
g n = 155 due to 10 missing data.
h n = 164 due to 3 missing data.
i n = 28 due to 1 missing datum.
j n = 165 due to 2 missing data.

3.4. Survival time analyses to develop dementia, psychosis, dyskinesia, and wearing-off

Time to develop clinical outcomes (dementia, psychosis, dyskinesia, and wearing-off) was compared in 19 subjects with GD-associated mutations, 29 with mutations unreported in GD, and 167 without mutation. The median observation time was 6.0 years. The subjects with GD-associated mutations showed a significantly earlier development of dementia and psychosis, compared with subjects without mutation (p < 0.001 and p = 0.017) (Supplementary Table e-1, Fig. 1A and B). We re-reviewed the clinical record of the subject who showed early dementia (defined by DSM-IV) (Fig. 1A) and confirmed that this subject did not satisfy the criteria for dementia with Lewy bodies (DLB) (McKeith et al., 2005).

The associations of GBA mutations with these symptoms were estimated as HRs, adjusting for sex and age at PD onset. HRs were 8.3 for dementia (95% CI, 3.3–20.9; p < 0.001) and 3.1 for psychosis (95% CI, 1.5–6.4; p = 0.002). The times until development of wearing-off and dyskinesia did not differ significantly, with HRs of 1.5 (95% CI, 0.8–3.1; p = 0.219) and 1.9 (95% CI, 0.9–4.1; p = 0.086), respectively (Table 3).
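A quick consistency check on reported hazard ratios can be sketched as follows. It assumes the intervals are Wald-type intervals from a proportional hazards model, which the excerpt does not state explicitly. Under that assumption the CI is symmetric on the log scale, so the geometric mean of the two bounds should sit near the point estimate, and the standard error of log(HR) can be recovered from the width. The dictionary and function names are illustrative only.

```python
import math

# (HR, lower, upper) as reported in the text and in Table 3.
REPORTED = {
    "Dementia (DSM-IV)": (8.3, 3.3, 20.9),
    "Psychosis": (3.1, 1.5, 6.4),
    "Wearing-off": (1.5, 0.8, 3.1),
    "Dyskinesia": (1.9, 0.9, 4.1),
}

def log_scale_midpoint(lower, upper):
    """Geometric mean of the CI bounds (the midpoint on the log scale)."""
    return math.sqrt(lower * upper)

def se_of_log_hr(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a 95% Wald-type interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

for outcome, (hr, lower, upper) in REPORTED.items():
    mid = log_scale_midpoint(lower, upper)
    se = se_of_log_hr(lower, upper)
    print(f"{outcome}: reported HR {hr}, CI midpoint {mid:.2f}, SE(log HR) {se:.2f}")
```

Small gaps between the midpoint and the reported HR are expected because the published bounds are rounded to one decimal place.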

Table 3.Hazard ratios of GBA pathogenic mutations for clinical symptoms

| Model | Clinical feature | Hazard ratio | 95% CI | p |
|---|---|---|---|---|
| 1 | Dementia (DSM-IV) | 8.3 | 3.3–20.9 | <0.001 |
| 2 | Psychosis | 3.1 | 1.5–6.4 | 0.002 |
| 3 | Wearing-off | 1.5 | 0.8–3.1 | 0.219 |
| 4 | Dyskinesia | 1.9 | 0.9–4.1 | 0.086 |

Each model was adjusted for sex and age at onset.

Key: CI, confidence interval; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; GBA, glucocerebrosidase.

Subjects with mutations unreported in GD did not differ significantly from subjects without mutations in time to develop any of the 4 outcomes. Therefore, subjects with GD-unreported mutations were regarded as subjects without GBA mutations in further analyses.

3.5. rCBF on SPECT in patients with GD-associated GBA mutations

We conducted pixel-by-pixel comparisons of rCBF on SPECT between PD subjects with mutations (cases) and sex-, age-, and disease duration-matched PD subjects without any mutations in GBA (controls). Four controls were matched to each case (except for a 34-year-old female case who was matched to 1 control), and in total 12 cases (female 50%, age at SPECT mean ± standard error (SE); 58.9 ± 3.3 years, disease duration at SPECT 7.3 ± 1.5 years) and 45 controls (female 64.4%, age at SPECT mean ± SE; 61.0 ± 1.3 years, disease duration at SPECT 7.1 ± 0.7 years) were analyzed. A significantly lower rCBF was seen in the cases than in the controls in the bilateral parietal cortex, including the precuneus (Fig. 2).

3.6. H/M ratios on MIBG scintigraphy in patients with GD-associated GBA mutations

Cardiac MIBG scintigraphy visualizes in vivo the catecholaminergic terminals, which are reduced in PD patients, as are brain dopaminergic neurons. We also compared MIBG scintigraphy between 16 cases (female 68.8%, age at examination mean ± SE; 60.2 ± 2.6 years, disease duration at examination 6.2 ± 1.2 years) and 61 sex-, age-, and disease duration-matched controls (female 63.8%, age 62.0 ± 1.1 years, disease duration 5.5 ± 0.6 years), matched 1:4 except for one 34-year-old female case who was matched to 1 control. Both early and late H/M ratios were reduced in both groups and did not differ significantly between them (p = 0.309 and 0.244) (Supplementary Table e-2).

4. Discussion

4.1. Contributions of GD-associated GBA mutations to the development of PD

In the analysis of 215 PD patients and 126 non-PD controls, we identified 10 nonsynonymous heterozygous GBA mutations, including 1 novel mutation. Among these mutations, 7 were GD-associated, and the patients carrying these mutations represented 8.8% of the PD cohort. No significant association was found between the GD-unreported mutations and PD development, which suggests that only the GD-associated mutations are a genetic risk for PD. According to a worldwide multicenter analysis of 1883 fully sequenced PD patients, GD-associated mutations are found in 7% of non-Ashkenazi Jewish PD patients (Sidransky et al., 2009). Although the mutation frequency in the present study was similar to previous results, the OR of GD-associated heterozygous mutations (25.1) was considerably greater than the OR (5.43) of other ethnic cohorts (Sidransky et al., 2009) and was consistent with an OR of 28.0 from a previous Japanese report (Mitsui et al., 2009). These results, taken together, suggest the possibility that GBA mutations confer a distinct risk for PD in the Japanese population. However, a larger Japanese cohort study is required to confirm this.

4.2. Cross-sectional clinical figures of PD with GBA mutations

Before the survival time analyses, we compared clinical features at enrollment between mutation groups. The lower onset age, more frequent family history and dementia, and worse disease severity of PD in patients with GBA mutations, compared with those without mutations, were consistent with previous cross-sectional case-control reports (Anheim et al., 2012, Brockmann et al., 2011, Chahine et al., 2013, Lesage et al., 2011, Li et al., 2013, Mitsui et al., 2009, Neumann et al., 2009, Seto-Salvia et al., 2012 and Sidransky et al., 2009). In contrast, the female predominance (73.7%, p = 0.047) in patients with mutations observed in the present study is inconsistent with previous reports (Neumann et al., 2009 and Seto-Salvia et al., 2012).

4.3. Impact of GBA mutations on the clinical course of PD

To investigate the impact of GBA mutations on the clinical course of PD, a prospectively designed study over a long period is preferred. Although there have been few longitudinally designed studies to date, follow-up clinical data over a median of 6 years for 121 PD cases from a community-based incident cohort were recently reanalyzed; the results demonstrate that progression to dementia defined by DSM-IV (HR 5.7) and to Hoehn and Yahr stage 3 (HR 3.2) is significantly earlier in 4 GBA mutation-carrier patients compared with 117 patients with wild-type GBA (Winder-Rhodes et al., 2013). A 2-year follow-up clinical report of 28 heterozygous GBA carriers who were recruited from relatives of GD patients shows slight but significant deterioration of cognition and sense of smell, compared with healthy controls (Beavan et al., 2015). Brockmann et al. (2015) assessed motor and nonmotor symptoms, including cognitive and mood disturbances, for 3 years in 20 PD patients with GBA mutations and showed a more rapid progression of motor impairment and cognitive decline in GBA mutation cases compared with sporadic PD controls. The current long-term retrospective cohort study of up to 12 years reinforced these results. It revealed that dementia and psychosis developed significantly earlier in subjects with GD-associated mutations compared with those without mutation, and the HRs of GBA mutations were estimated at 8.3 for dementia and 3.1 for psychosis, with adjustments for sex and PD onset age. In contrast, the results showed no significant difference in developing wearing-off and dyskinesia.

In this study, we also investigated whether GD-unreported mutations affected the clinical course of PD. In both cross-sectional and survival time analyses, the mutations unreported in GD carried no increased burden on clinical symptoms such as dementia, psychosis, wearing-off, and dyskinesia.

4.4. Reduced rCBF in PD with GBA mutations compared with matched PD controls

We found a significantly decreased rCBF, reflecting decreased synaptic activity, in the bilateral parietal cortex including the precuneus, in subjects with GD-associated mutations compared with matched subjects without mutations. The pattern of reduced rCBF was very similar to the H₂¹⁵O positron emission tomography pattern that Goker-Alpan et al. (2012) reported, showing decreased resting rCBF in the lateral parietal association cortex and the precuneus bilaterally in GD subjects with parkinsonism (7 subjects with homozygous or compound heterozygous GBA mutations), compared with 11 PD patients without GBA mutations. These results suggest that PD with heterozygous GBA mutations and GD with parkinsonism share a common pattern of reduced rCBF. Interestingly, in their study, rCBF in the precuneus, but not in the lateral parietal cortex, correlated with IQ, suggesting that the involvement of the precuneus is critical for defining GBA-associated patterns.

4.5. Reduced cardiac MIBG H/M ratios, as in matched PD controls

We also showed that cardiac MIBG H/M ratios in subjects with GD-associated mutations were lower than the cutoff point for PD discrimination (Sawada et al., 2009), suggesting that postganglionic sympathetic nerve terminals to the epicardium were denervated, as in PD without mutations.

4.6. Mechanisms of impact on PD clinical course by GD-associated GBA mutations

Experimental studies suggesting a bidirectional pathogenic loop between α-synuclein and glucocerebrosidase have accumulated (Fishbein et al., 2014, Gegg et al., 2012, Mazzulli et al., 2011, Noelker et al., 2015, Schondorf et al., 2014 and Uemura et al., 2015). Loss of glucocerebrosidase function compromises α-synuclein degradation in the lysosome, whereas aggregated α-synuclein inhibits the normal lysosomal function of glucocerebrosidase. This pathogenic loop may facilitate neurodegeneration in the GD-associated PD brain, resulting in the early development of dementia or psychosis shown in the present study. Several recent studies propose that a mechanism similar to that in PD with GBA mutations exists even in the idiopathic PD brain (Alcalay et al., 2015, Chiasserini et al., 2015, Gegg et al., 2012 and Murphy et al., 2014). On the other hand, the impact of GD-associated GBA mutations on the development of motor complications such as wearing-off and dyskinesia was not statistically significant, suggesting that other pathophysiological mechanisms in the striatal circuit emerge after long-term therapy, especially with l-dopa.

4.7. Limitations

Our study has several limitations. In the design of the study, we assumed a sample size of 215 PD patients for survival time analyses and investigated 224 PD patients. We assumed that the mutation prevalence would be 9.4%, and in fact we found 19 patients with mutations (8.5%) among the 224 patients. Based on these figures, we estimated the risk ratios of heterozygous GBA mutations for the risk of PD development and PD clinical symptoms as ORs in the cross-sectional multivariate analyses, although the 95% CIs were broad. Larger numbers of subjects will be needed to determine robust risk ratios.

Comprehensive Genetic Characterization of a Spanish Brugada Syndrome Cohort

Published: July 14, 2015 DOI: http://dx.doi.org/10.1371/journal.pone.0132888

Brugada syndrome (BrS) was identified as a new clinical entity in 1992 [1]. Six years later, the first genetic basis for the disease was identified, with the discovery of genetic variations in SCN5A [2]. Nowadays, more than 300 pathogenic variations in this first gene are known to be associated with BrS [3]. SCN5A encodes the α subunit of the cardiac voltage-dependent sodium channel (Nav1.5), which is responsible for the inward sodium current (INa), and thus plays an essential role in phase 0 of the cardiac action potential (AP). Genetic variations in this gene can explain around 20–25% of BrS cases [3].

Since BrS was classified as a genetic disease, several other genes have been described to confer BrS susceptibility [4–7]. Pathogenic variations have been mainly described in: 1) genes encoding proteins that modulate Nav1.5 function, and 2) other calcium and potassium channels and their regulatory subunits. All these proteins participate, either directly or indirectly, in the development of the cardiac AP. Although the incidence of pathogenic variations in these BrS-associated genes is low [6], it is considered that, taken together, they could provide a genetic diagnosis for up to an extra 5–10% of BrS cases. Hence, altogether, a genetic diagnosis can be achieved in approximately 35% of clinically diagnosed BrS patients.
Other types of genetic abnormalities have been suggested to explain the remaining percentage of undiagnosed patients. Indeed, multiplex ligation-dependent probe amplification (MLPA) has allowed the detection of large-scale gene rearrangements involving one or several exons of SCN5A in BrS cases. However, the low proportion of BrS patients carrying large genetic imbalances identified to date suggests that this type of rearrangement will provide a genetic diagnosis for only a modest percentage of BrS cases [8–10].
BrS has been associated with an increased risk of sudden cardiac death (SCD), despite the reported variability in disease penetrance and expressivity [11]. The prevalence of BrS is estimated at about 1.34 cases per 100 000 individuals per year, with a higher incidence in Asia than in the United States and Europe [12]. However, the dynamic nature of the typical electrocardiogram (ECG) pattern, and the fact that it is often concealed, hinder the diagnosis of BrS. Therefore, exhaustive genetic testing and subsequent family screening may prove crucial in identifying silent carriers. A large percentage of these pathogenic variation carriers are clinically asymptomatic and may be at risk of SCD, which is sometimes the first manifestation of the disease [13].
In the present work, we aimed to determine the spectrum and prevalence of genetic variations in BrS-susceptibility genes in a Spanish cohort diagnosed with BrS, and to identify variation carriers among relatives, which would enable the adoption of preventive measures to avoid SCD in their families.

Results
Study population

Sequencing of genes associated with BrS

We performed a genetic screening of 14 genes (SCN5A, CACNA1C, CACNB2, GPD1L, SCN1B, SCN2B, SCN3B, SCN4B, KCNE3, RANGRF, HCN4, KCNJ8, KCND3, and KCNE1L), which allowed the identification of 61 genetic variations in our cohort. Of these, 20 were classified as potentially pathogenic variations (PPVs), one as a variation of unknown significance, and 40 as common or synonymous variants considered benign.

The 20 PPVs were found in 18 of the 55 patients (32.7% of the patients, 83.3% males; Table 2). Sixteen patients (88.9%) carried one PPV, and two patients (11.1%) carried two different PPVs each. Nineteen out of the 20 PPVs identified were localized in SCN5A and one in SCN2B.

The vast majority of the PPVs identified were missense (70%). We also detected 2 nonsense variations (10%), 3 insertions or deletions causing frameshifts (15%), and one splicing variation (5%). The three frameshifts (p.R569Pfs*151, p.E625Rfs*95 and p.R1623Efs*7) were identified in SCN5A. These were not found in any of the databases consulted (see Methods), and were thus considered potentially pathogenic (see below). The other 16 rare variations identified in SCN5A had been previously described, and hence were also considered potentially pathogenic. Fourteen of them had been identified in BrS patients. Of these, 6 had also been identified in individuals diagnosed with other cardiac electric diseases (i.e. Sick Sinus Syndrome, Long QT Syndrome, Sudden Unexplained Nocturnal Death Syndrome or Idiopathic Ventricular Fibrillation [2,15,16,20,21,25]). The other 2, p.P1725L and p.R1898C, had only been associated with Long QT Syndrome or found in Exome Variant Server with a MAF of 0.0079%, respectively. Furthermore, we identified a variation in SCN2B (c.632A>G in exon 4 of the gene, resulting in p.D211G) which was considered pathogenic. This patient was included within our cohort, but the functional characterization of channels expressing SCN2B p.D211G was the object of a previous study from our group [7]. We also identified a nonsense variation in RANGRF which has been formerly reported as a rare genetic variation of unknown significance [29].

Additionally, we screened the relatives of those probands carrying a PPV. We analysed a total of 129 relatives, 69 of whom (53.5%) were variation carriers. Genotype-phenotype correlations evidenced that 8 of the families displayed complete penetrance (S3 Table). Additionally, no relatives were available for one of the probands carrying a PPV, thus hampering genotype-phenotype correlation assessment. The other 12 families showed incomplete penetrance.

MLPA analysis

The 37 patients with negative results after the genetic screening of the 14 BrS-associated genes underwent MLPA analyses of SCN5A. This technique did not reveal any large exon deletion or duplication in this gene for any of the patients.

SCN5A p.R569Pfs*151 (c.1705dupC), a novel PPV

A 41-year-old asymptomatic male presented a type 3 BrS ECG which was suggestive of BrS. Flecainide challenge unmasked a type 1 BrS ECG (Fig 1A, left), which was also sometimes observed spontaneously during medical follow-up. Sequencing of SCN5A revealed a duplication of a cytosine at position 1705 (c.1705dupC; Fig 1A, right), which caused a frameshift that led to a truncated Nav1.5 channel (p.R569Pfs*151). The proband’s sister also carried this duplication, but had never presented signs of arrhythmogenesis. The proband’s twin daughters were also variation carriers, displayed normal ECGs and, to date, are asymptomatic (Fig 1A, middle). Thus, p.R569Pfs*151 represents a novel genetic alteration in the Nav1.5 channel that could potentially lead to BrS, but with incomplete penetrance.

SCN5A p.E625Rfs*95 (c.1872dupA), a novel PPV

A 51-year-old asymptomatic male was diagnosed with BrS since he presented a spontaneous ST segment elevation in leads V1 and V2 characteristic of a type 1 BrS ECG (Fig 1B, left). The sequencing of SCN5A evidenced an adenine duplication at position 1872 (c.1872dupA, Fig 1B, right). This genetic variation results in a truncated Nav1.5 channel (p.E625Rfs*95). The genetic analysis of the proband’s relatives proved that only his mother carried the variation (Fig 1B, middle). She was asymptomatic, but a BrS ECG was unmasked upon ajmaline challenge. The proband’s sister was found dead in her crib at 6 months of age, which suggests that her death might be compatible with BrS. Therefore, the p.E625Rfs*95 variation in the Nav1.5 channel represents a novel genetic alteration potentially causing BrS.

SCN5A p.R1623Efs*7 (c.4867delC), a novel PPV

The proband, a 31-year-old male, was admitted to hospital after suffering a syncope. His baseline 12-lead ECG showed an ST segment elevation in leads V1 and V2 that strongly suggested BrS type 1 (Fig 1C, left). A deletion of the cytosine at position 4867 (c.4867delC) was observed upon SCN5A sequencing (Fig 1C, right). This base deletion leads to a frameshift that produces a truncated Nav1.5 channel (p.R1623Efs*7). Genetic screening of his parents and sisters evidenced that none of them carried this novel variation (Fig 1C, middle). None of them had presented any signs of arrhythmogenicity, nor had a BrS ECG. Nevertheless, in utero genetic analysis of one of his daughters proved that she had inherited the variation. She died at 1 year of age of non-arrhythmogenic causes. Hence, the p.R1623Efs*7 variation in the Nav1.5 channel is a novel genetic alteration that originated de novo in the proband and could potentially lead to BrS.
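The relation between the coding-DNA positions and the protein-level names of these variants can be sketched with simple codon arithmetic. The helper below is a hypothetical illustration, not a variant annotation tool. It assumes positions are numbered from the A of the start codon, that a duplicated base is inserted immediately after the named position, and that the first codon touched by the change is also the first amino acid to change, which need not hold in general.

```python
def first_affected_codon(cdna_pos: int, kind: str) -> int:
    """Return the 1-based codon number first touched by a single-base change.

    kind: "sub" (substitution), "del" (deletion) or "dup" (duplication).
    ASSUMPTION: for a duplication, the extra copy occupies position cdna_pos + 1.
    """
    if kind == "dup":
        changed_pos = cdna_pos + 1
    elif kind in ("sub", "del"):
        changed_pos = cdna_pos
    else:
        raise ValueError(f"unsupported variant kind: {kind}")
    return (changed_pos - 1) // 3 + 1

# Variants named in the text, with the residue number given in their protein name.
VARIANTS = [
    ("SCN5A c.1705dupC", 1705, "dup", 569),    # p.R569Pfs*151
    ("SCN5A c.1872dupA", 1872, "dup", 625),    # p.E625Rfs*95
    ("SCN5A c.4867delC", 4867, "del", 1623),   # p.R1623Efs*7
    ("SCN2B c.632A>G", 632, "sub", 211),       # p.D211G
    ("RANGRF c.181G>T", 181, "sub", 61),       # p.E61X
]

for name, pos, kind, reported in VARIANTS:
    codon = first_affected_codon(pos, kind)
    print(f"{name}: codon {codon} (protein name uses residue {reported})")
```

For the five variants listed, the computed codon matches the residue number in the reported protein name.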

Synonymous and common genetic variations portrayal

In our cohort, we identified 40 single nucleotide variations which were common genetic variants and/or synonymous variants (S2 Table). Twenty-nine had a minor allele frequency (MAF) over 1%, and were thus considered common genetic variants.

We also identified 11 variants with MAF less than 1%. Of them, 9 were synonymous variants, which led us to assume that they were not disease-causing. Four of these synonymous variants were not found in any of the databases consulted, and thus their MAF was considered to be less than 1%. Each of these synonymous variations was identified in 1 patient of the cohort. A similar proportion of individuals carrying these novel variations was detected upon sequencing of 300 healthy Spanish individuals (600 alleles). The remaining 2 variants were missense, and although they had either a MAF of less than 1% or an unknown MAF according to the Exome Variant Server and dbSNP websites, they were common in our cohort (29.2 and 50%, respectively; S2 Table), and a similar MAF was detected in a Spanish cohort of healthy individuals (26.7% and 48.8%, respectively).

Influence of phenotype and age on PPV discovery

To assess whether a connection existed between the probands’ phenotype and the PPV detection yield, we classified the patients in our cohort according to their ECG (spontaneous or induced type 1), the presence of BrS cases within their families, and the presence/absence of symptoms. Even though the overall PPV detection yield was 32.7%, it was even higher for symptomatic patients (Fig 2). Indeed, in this group of patients, having a family history of BrS was identified as a factor for increased PPV discovery yield. In the absence of BrS in the family, the variation discovery yield was almost double for patients with a spontaneous type 1 BrS ECG compared with patients with a drug-induced type 1 ECG (45.5% vs 25%, respectively). In addition, we identified a PPV in 44.4% of the asymptomatic patients who presented a family history of BrS and a spontaneous type 1 BrS ECG. When the patient presented a drug-induced type 1 ECG or had no family history of BrS, the PPV discovery yield was around 15%.

We also investigated the role of age on the PPV occurrence. No significant age differences were observed between variation carriers and non-carriers (38.6±10.3 and 43.5±14.4, respectively, p = 0.16). However, the PPV discovery yield was higher for patients with ages between 30 and 50 years: out of the total of patients carrying a PPV, 83.3% of the patients were in this age range, while 11.1% were younger and 5.6% were older patients (Fig 3A, upper panel). The PPV discovery yield was significantly higher for symptomatic than for asymptomatic patients (42.3% vs 24.1%, respectively; Fig 3A, lower panels).

Notably, in the 30–50 age range, 52.9% (9/17) of the symptomatic patients and 35.3% (6/17) of the asymptomatic patients carried one PPV (Fig 3B, middle). Additionally, 40% (2/5) of the symptomatic young patients (< 30 years) were variation carriers, while no PPVs were identified in asymptomatic patients within this age range.
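The yields quoted in this section can be cross-checked from the counts given in the text. The sketch below is illustrative only. The group sizes of 26 symptomatic and 29 asymptomatic patients are not stated directly and are inferred here from the reported percentages, and the single carrier older than 50 is assumed to be asymptomatic.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * numerator / denominator, 1)

TOTAL_CARRIERS = 18  # stated: PPVs were found in 18 of the 55 patients

# Counts stated in the text for each age band.
mid_sympt, mid_asympt, mid_group_size = 9, 6, 17   # 30-50 years: 9/17 and 6/17
young_sympt, young_asympt, young_sympt_group = 2, 0, 5  # < 30 years: 2/5, none asymptomatic

mid_carriers = mid_sympt + mid_asympt
young_carriers = young_sympt + young_asympt
older_carriers = TOTAL_CARRIERS - mid_carriers - young_carriers  # derived, not stated

# ASSUMPTION: group sizes inferred from the reported 42.3% and 24.1% yields.
N_SYMPT, N_ASYMPT = 26, 29
sympt_carriers = mid_sympt + young_sympt           # assumes the older carrier was asymptomatic
asympt_carriers = TOTAL_CARRIERS - sympt_carriers

print("30-50 y, symptomatic:", pct(mid_sympt, mid_group_size))
print("30-50 y, asymptomatic:", pct(mid_asympt, mid_group_size))
print("Carriers aged 30-50:", pct(mid_carriers, TOTAL_CARRIERS))
print("Carriers younger / older:", pct(young_carriers, TOTAL_CARRIERS),
      pct(older_carriers, TOTAL_CARRIERS))
print("Yield symptomatic / asymptomatic:", pct(sympt_carriers, N_SYMPT),
      pct(asympt_carriers, N_ASYMPT))
```

With these inferred group sizes, the counts reproduce the percentages reported for the age bands and for symptomatic versus asymptomatic patients.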

Overall, 55 unrelated Spanish patients clinically diagnosed with BrS were included in our study. Table 1 shows the demographics of this cohort, and Table 2 and S1 Table show the clinical and genetic characteristics of all the patients included in the study. The mean age at clinical diagnosis was 41.9±13.3 years. Although the majority of patients were males (74.5%), their age at diagnosis was not different from that of females (41.8±12.1 years and 42.3±16.3 years, respectively; p = 0.92). A type 1 BrS ECG was present spontaneously in 37 patients (67.3%), and drug challenge revealed a type 1 BrS ECG for the remaining 18 patients (32.7%). Almost half of the patients had experienced symptoms, including 2 SCD and 4 aborted SCD. Patients who had not previously experienced any signs of arrhythmogenicity despite having a BrS ECG were considered asymptomatic. Comparison of symptomatic vs asymptomatic patients evidenced a similar percentage of males (73.1% and 75.9%, respectively). However, the mean age at diagnosis was different between the two groups of patients (37.7±14.3 and 45.7±11.4, respectively; p<0.05).

Discussion

To the best of our knowledge, this is the first comprehensive genetic evaluation of 14 BrS-susceptibility genes and MLPA of SCN5A in a Spanish cohort. Well delimited BrS cohorts from Japan, China, Greece and even Spain have been genetically studied [24,30–32]. Additionally, an international compendium of BrS genetic variations identified in more than 2100 unrelated patients from different countries was published in 2010 [3]. However, all these studies screened SCN5A exclusively. In 2012, Crotti et al. reported the spectrum and prevalence of genetic variations in 12 BrS-susceptibility genes in a BrS cohort [5]. However, this study included patients of different ethnicity. Here, we report the analysis of 14 genes, conducted on a well-defined BrS cohort of the same ethnicity.

Our results confirm that SCN5A is still the most prevalent gene associated with BrS. Indeed, the proportion of SCN5A-mediated BrS in our cohort (30.9%) is higher than that described in other European reports [3,23], where a potentially causative variation is identified in only 20–25% of BrS patients. The reason for this discrepancy is unclear but could point towards a higher prevalence of SCN5A PPVs in the Spanish population or to selection bias. Additionally, we identified a genetic variation in SCN2B (c.632A>G, which results in p.D211G). We have formerly published the comprehensive electrophysiological characterization of this variation, and showed that this variation could indeed be responsible for the phenotype of the patient, thus linking SCN2B with BrS for the first time [7]. Also, we identified a variation in RANGRF. This variation (c.181G>T leading to p.E61X) had been previously reported in a Danish atrial fibrillation cohort [33]. Surprisingly, the authors reported an incidence of 0.4% for this variation in the healthy Danish population, which brought into question its pathogenicity. Our finding of this variation in an asymptomatic patient displaying a type 2 BrS ECG also points toward considering it a rare genetic variation with a potential modifier effect on the phenotype but not clearly responsible for the disease [29].

No PPVs were identified in the other genes tested. Certainly, it is well accepted that the contribution of these genes to the disease is minor, and thus should only be considered under special circumstances [13,34]. In addition, recent studies have questioned the causality of variations identified in some of these minority genes [35].

We also used the MLPA technique for the detection of large exon duplications and/or deletions in SCN5A in patients without PPVs, and no large rearrangements were identified. This is in accordance with previous reports, which revealed that such imbalances are uncommon [8–10].

Kapplinger et al. [3] reported a predominance of PPVs in transmembrane regions of Nav1.5. Indeed, it has been proposed that most rare genetic variations in interdomain linkers may be considered non-pathogenic [36]. In contrast, the PPVs identified in this study are mainly located in extracellular loops and cytosolic linker regions of Nav1.5 (Fig 4). Additionally, 2 of our previously unreported frameshifts are located in the DI-DII linker. These 2 genetic variations lead to truncated proteins, which would lack around 75% of the protein sequence, and are thus presumed to be pathogenic.

In our cohort, we have identified 40 synonymous or common genetic variations, 4 of which have not been previously reported. These variations are gradually becoming more and more important in the explanation of certain phenotypes of genetic diseases. Only a few common variations identified here are already published as phenotypic modifiers [37,38]. The effect of these and other common variants identified in our cohort on BrS phenotype should be further studied.

Unexpectedly, almost 40% (7/18) of the PPV carriers did not present signs of arrhythmogenicity. We also performed genotype-phenotype correlations of the PPVs identified in the families (S3 Table). These studies uncovered relatives, most of whom were young individuals, who carried a familial variation but had never exhibited any clinical manifestations of the disease. This is in agreement with Crotti et al. and Priori et al. [5,23], who postulated that a positive genetic testing result is not always associated with the presence of symptoms. Indeed, the existence of asymptomatic patients carrying genetic variations described to cause a severe Nav1.5 channel dysfunction has been reported [39]. The identification of silent carriers is of paramount importance since it allows the adoption of preventive measures before any lethal episode takes place. Unknown environmental factors, medication and modifier genes have been suggested to influence and/or predispose to arrhythmogenesis [11]. Hence, this group of patients has to be cautiously followed in order to avoid fatal events.

Our analysis of the relationship between patient phenotype and PPV detection yield showed that the presence of symptoms is associated with a higher variation discovery yield. Within the group of symptomatic individuals, a PPV was identified in a higher proportion of patients displaying a spontaneous type 1 BrS ECG than of patients showing a drug-induced ECG. Likewise, among asymptomatic patients with a family history of BrS, those who presented a spontaneous type 1 BrS ECG carried a PPV more often than those with a drug-induced ECG (Fig 2). With regard to age, the vast majority (17/20, 85%) of the PPVs were identified in patients around their fourth decade of age (30–50 years). This is in accordance with the accepted mean age of disease manifestation. Moreover, in this age range, more than 50% of the patients who presented symptoms carried a variation that could be pathogenic (Fig 3). Importantly, 35.3% of asymptomatic patients of around 40 years of age also carried one such variation. These data highlight the importance of performing a genetic test even in the absence of clinical manifestations of the disease, particularly in the 30–50 years range, which is in accordance with consensus recommendations [13,34].

In conclusion, we have analysed for the first time 14 BrS-susceptibility genes and performed MLPA of SCN5A in a Spanish BrS cohort. Our cohort showed male prevalence with a mean age of disease manifestation around 40 years. BrS in this cohort was almost exclusively SCN5A-mediated. The mean PPV discovery yield in our Spanish BrS patients is higher than that described for other BrS cohorts (32.7% vs 20–25%, respectively), and is even higher for patients in the 30–50 years age range (up to 53% for symptomatic patients). All this evidence supports genetic testing, at least of SCN5A, in all clinically well-diagnosed BrS patients.

Study Limitations

First of all, drug challenge tests were not performed for all the relatives who were asymptomatic variation carriers. This hampered their clinical diagnosis and prevents a definitive assessment of the link between PPVs and BrS. These patients are currently under follow-up.

New PPVs have been identified in our cohort. The clinical information available for the families suggests that these new variations could be pathogenic. Still, in vitro studies of these variations are required in order to evaluate their functional effects and verify their pathogenic role. Additionally, genotyping in an independent cohort would help reduce the likelihood of type I (false positive) error in genetic variant discovery.

We have to acknowledge that the study set is relatively small. Consequently, the classification of patients according to the different clinical categories rendered rather small sub-groups, which may lead to over-interpretation of the results. Future studies will be directed to the genetic screening of additional Spanish BrS patients, which will probably reinforce the significance of the tendencies observed here.
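The caution about small sub-groups can be made concrete with a short calculation. The sketch below is not part of the original study: it takes two proportions quoted in the discussion above (7/18 PPV carriers without signs of arrhythmogenicity, and 17/20 PPVs found in patients aged 30–50 years) and attaches an approximate 95% Wilson score interval to each. The choice of the Wilson method, the function name and the labels are assumptions made purely for illustration.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Approximate Wilson score interval for a binomial proportion.

    z=1.96 corresponds to a 95% interval. This is an illustrative
    choice, not the statistical method used in the original study.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Counts quoted in the discussion; the labels are paraphrases.
counts = {
    "PPV carriers without signs of arrhythmogenicity": (7, 18),
    "PPVs identified in patients aged 30-50 years": (17, 20),
}

for label, (k, n) in counts.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With sub-groups of this size the intervals are wide, which is consistent with the authors' warning that the observed tendencies need confirmation in a larger cohort.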

Imaging Schizophrenia Brain

Schizophrenia Brain

Larry H. Bernstein, MD, FCAP, Curator

LPBI

http://health-innovations.org/2015/10/27/neuroimaging-matches-specific-schizophrenia-behaviour-to-the-brains-anatomy/

Neuroimaging studies using fMRI and PET to examine functional differences in brain activity in patients with schizophrenia have shown that differences seem to most commonly occur in the frontal lobes, hippocampus, and temporal lobes. These differences are heavily linked to the neurocognitive deficits which often occur with schizophrenia, particularly in areas of memory, attention, problem solving, executive function and social cognition.

Earlier studies from the researchers reported evidence suggesting that schizophrenia is not a single disease but a group of eight genetically distinct disorders, each with its own set of symptoms. Results found that distinct sets of genes were strongly associated with particular clinical symptoms.

The current study investigates the brain’s anatomy and shows that there are distinct subgroups of patients with a schizophrenia diagnosis that correlate with symptoms. This also explains why past studies had difficulty identifying a single set of biomarkers for a single type of schizophrenia.

The current study evaluated scans taken with magnetic resonance imaging (MRI) and a technique called diffusion tensor imaging in 36 healthy volunteers and 47 people with schizophrenia. Results show that the scans of patients with schizophrenia had various abnormalities in portions of the corpus callosum, a bundle of fibers that connects the left and right hemispheres of the brain and is considered critical to neural communication. Characteristics across the corpus callosum revealed in the brain scans matched specific symptoms of schizophrenia. Patients with specific features in one part of the corpus callosum typically displayed bizarre and disorganized behaviour. In other patients, irregularities in a different part of that structure were associated with disorganized thinking and speech and symptoms such as a lack of emotion; other brain abnormalities in the corpus callosum were associated with delusions or hallucinations. The lab concludes that its findings provide further evidence that schizophrenia is a heterogeneous group of disorders rather than a single disorder.

The team explain that they did not start with people who had certain symptoms and then look to see whether they had corresponding abnormalities in the brain. They note that they simply looked at the data, and the patterns began to emerge. They go on to add that this kind of granular information, combined with data about the genetics of schizophrenia, will one day help physicians treat the disorder in a more precise way.

Many genes responsible for the creation of synaptic proteins have previously been shown to be strongly linked to schizophrenia and other brain disorders; however, until now the reasons have not been understood. Now, researchers from Cardiff University have identified a critical function of what they believe to be schizophrenia’s ‘Rosetta Stone’ gene that could hold the key to decoding the function of all genes involved in the disease. The team state that the breakthrough has revealed a vulnerable period in the early stages of the brain’s development that they hope can be targeted in future efforts to reverse schizophrenia. The study is published in the journal Science.

The gene identified in the current study is known as ‘disrupted in schizophrenia-1’ (DISC-1). Earlier studies have shown that when mutated, the gene is a high risk factor for mental illness including schizophrenia, major clinical depression and bipolar disorder.  The aim of the current study was to determine whether DISC-1’s interactions with other proteins early on in the brain’s development had a bearing on the brain’s ability to adapt its structure and function, also known as ‘plasticity’, later on in adulthood.

In order for healthy development of the brain’s synapses to take place, the DISC-1 gene first needs to bind with two other molecules known as ‘Lis’ and ‘Nudel’. The experiments in mice revealed that preventing DISC-1 from binding with these molecules stops cortical neurons in the brain’s largest region from forming synapses. As a consequence, the ability to form coherent thoughts and to properly perceive the world is damaged.

Preventing DISC-1 from binding with ‘Lis’ and ‘Nudel’ molecules when the brain was fully formed had no effect on its plasticity. However, the researchers were able to pinpoint a seven-day window early on in the brain’s development, one week after birth, where failure to bind had an irreversible effect on the brain’s plasticity later on in life.

The researchers hypothesize that DISC-1 is schizophrenia’s Rosetta Stone gene and could hold the master key to help unlock the understanding of the role played by all risk genes involved in the disease.  They go on to add that they have identified a critical period during brain development that will assist in testing whether other schizophrenia risk genes affecting different regions of the brain create their malfunction during their own critical period.

McEwen Award for Innovation: Irving Weissman, M.D., Stanford School of Medicine, and Hans Clevers, M.D., Ph.D., Hubrecht Institute

Larry H. Bernstein, MD, FCAP, Curator

Series E. 2; 7.3

Past winners include Azim Surani, James Thomson, Rudolf Jaenisch and Kazutoshi Takahashi with Shinya Yamanaka

The International Society for Stem Cell Research (ISSCR) has presented EuroStemCell partner Hans Clevers with the McEwen Award for Innovation at the opening of its annual meeting, today (24 June) in Stockholm, Sweden.

The prizes awarded by ISSCR in 2015 are:

McEwen Award for Innovation: Irving Weissman, M.D., Stanford School of Medicine, and Hans Clevers, M.D., Ph.D., Hubrecht Institute

ISSCR-BD Biosciences Outstanding Young Investigator Award: Paul Tesar, Ph.D., Case Western Reserve University School of Medicine

ISSCR Public Service Award: Alan Trounson, Ph.D., MIMR-PHI Institute of Medical Research

In 2015, the ISSCR recognizes long-standing contributors to the field, Weissman and Clevers, for the identification, prospective purification and characterization of somatic (adult) tissue-associated stem cells and advancement of their research findings toward clinical applications.

Award recipient Weissman’s many discoveries have helped map the direction of the stem cell field and have served as the basis for important research and work by scientists all over the world.  He was the first to isolate and characterize hematopoietic (blood) stem cells from mice and humans. He developed the approaches and technologies, now widely used within the field, for isolating blood stem and progenitor cells and defining their properties. Weissman pioneered the extension of his approaches to isolation of other stem cell types, including human nervous system cells and skeletal muscle myogenic stem/progenitor cells. Further, he discovered several independent leukemia stem cells and, more recently, bladder cancer stem cells, head and neck cancer stem cells and malignant melanoma stem cells. Weissman has pursued these discoveries to develop several promising means of cancer therapy.

Award recipient Clevers has been a leader in biomedical sciences and the area of Wnt signaling in colon cancer for more than three decades. He and his lab developed tools to identify and track an adult stem cell population able to give rise to the entire lining of the gut and later to demonstrate that these cells can be isolated and grown in culture as “miniguts,” recapitulating the normal structure and function of the gut. These discoveries are a move toward promising therapies for colon conditions, like ulcers, in which the lining of the intestine has been destroyed in patches, and provide a powerful resource for modeling disease pathology and for drug screening.

“Irv Weissman and Hans Clevers have made enormous contributions to stem cell science. Working in the blood and gut systems, respectively, and extending their findings in different tissues, they have defined the concepts and technologies that underpin many avenues of research,” Hans Schöler, chair of the ISSCR’s McEwen Awards selection committee, said. “Each has made pioneering conceptual advances in disease modeling and regenerative medicine.”

The ISSCR-BD Biosciences Outstanding Young Investigator Award recognizes exceptional achievements by an ISSCR member and investigator in the early part of their independent career in stem cell research. The winner receives a $7,500 USD personal award and is invited to present at the ISSCR’s annual meeting. Past winners include Valentina Greco, Marius Wernig, Cédric Blanpain, Robert Blelloch, Joanna Wysocka and Konrad Hochedlinger.

Award recipient Tesar established his independent laboratory five years ago and has rapidly risen to his current position as the Dr. Donald and Ruth Weber Goodman Professor of Innovative Therapeutics and tenured Associate Professor in the Department of Genetics and Genome Sciences at Case Western Reserve University School of Medicine. Tesar’s studies have shaped the global understanding of both pluripotent stem cell and oligodendrocyte biology. His seminal and highly cited report on epiblast stem cells, published in Nature in 2007, along with similar findings by Pedersen, Vallier and colleagues, led to a complete shift in the understanding of how pluripotency is regulated in the mammalian embryo. He has continued to provide high impact contributions to the field, pioneering new methods to generate and mature oligodendrocyte progenitor cells, and to use these to enhance repair in animal models of multiple sclerosis.

Stanford stem cell pioneer Irving Weissman wins international honors

by Krista Conger on Feb 10, 2015

http://news.stanford.edu/thedish/2015/02/10/stanford-stem-cell-pioneer-irving-weissman-wins-international-honors/

IRVING WEISSMAN, a professor of pathology and of developmental biology at Stanford Medical School, was recently awarded the Charles Rodolphe Brupbacher Prize for Cancer Research in Zurich.
Weissman, who directs the Stanford Institute for Stem Cell Biology and Regenerative Medicine, was honored for his role in identifying and isolating the first hematopoietic, or blood-forming, stem cell in mice in 1988, and then in humans in 1992. In 2000, he also isolated leukemia cancer stem cells from humans. Recently, he and his colleagues have devoted themselves to understanding how cancer cells escape destruction by the immune system by expressing a “don’t eat me” signal on their cell membranes.

“His discoveries on aging processes in stem-cell systems and ultimately his contribution toward understanding cancer stem cells and the way in which the immune system can control these cells are pioneering achievements with far-reaching clinical implications,” Markus Manz, director of the Department of Hematology at the University Hospital Zurich, said of Weissman at a symposium titled “Breakthroughs in Cancer Research and Therapy” where the prize was announced.

Weissman also is the director of Stanford’s Ludwig Center for Cancer Stem Cell Research and Medicine and holds the Virginia and Daniel K. Ludwig Professorship in Clinical Investigation in Cancer Research. The prize, presented by the Charles Rodolphe Brupbacher Foundation, included 100,000 Swiss francs, or about $108,000.

The Charles Rodolphe Brupbacher Foundation was founded in 1991 by Brupbacher’s wife, Frederique, in honor of her late husband. This is the 12th time the prize, which is meant to recognize internationally acknowledged achievements in fundamental cancer research, has been awarded. Brupbacher was a Swiss banker, economist and international currency expert.

In addition to the Brupbacher Prize, it was recently announced that Weissman will receive the McEwen Award for Innovation, supported by the McEwen Centre for Regenerative Medicine in Toronto. The award will be presented in June at the annual meeting of the International Society for Stem Cell Research in Stockholm. It recognizes the work of Weissman and Hans Clevers, of the Hubrecht Institute in the Netherlands, in the identification, purification and characterization of adult stem cells from a variety of human tissues and cancers. Weissman and Clevers will share a 100,000 award.

Anti-CD47 antibody may offer new route to successful cancer vaccination

Scientists at the School of Medicine have shown that their previously identified therapeutic approach to fight cancer via immune cells called macrophages also prompts the disease-fighting killer T cells to attack the cancer. The research, published online May 20 in the Proceedings of the National Academy of Sciences, demonstrates that the approach may be a promising strategy for creating custom cancer vaccines.

Various researchers have been working over the years to create vaccines against cancer, but the resulting vaccines have not been highly effective. Current approaches to developing the vaccines rely on using immune cells called dendritic cells to introduce cancer protein fragments to T cells, a process known as antigen presentation. The hope has been that the process would stimulate the body’s T cells to identify cancer cells as diseased or damaged and target them for elimination. However, this process often only modestly activates the most potent cancer-fighting kind of T cell, called killer T cells or CD8+ T cells. The Stanford team discovered that there was another viable vaccine approach, using the macrophage pathway to program killer T cells against cancer.
Irving Weissman, MD, professor of pathology and of developmental biology, and his team previously showed that nearly all cancers use the molecule CD47 as a “don’t-eat-me” signal to escape from being eaten and eliminated by macrophages. The researchers found that anti-CD47 antibodies, which can block the “don’t-eat-me” signal and enable macrophages to engulf cancer cells, eliminated or inhibited the growth of various blood cancers and solid tumors. In the new study, the Stanford team showed that after engulfing the cancer cells, the macrophages presented pieces of the cancer to CD8+ T cells, which, in addition to attacking cancer, are also potent attackers of virally infected or damaged cells. As a result, the CD8+ T cells were activated to attack the cancer cells on their own. “It was completely unexpected that CD8+ T cells would be mobilized when macrophages engulfed the cancer cells in the presence of CD47-blocking antibodies,” said MD/PhD student Diane Tseng, the lead author of the study. Following engulfment of cancer cells, macrophages activate T cells to mobilize their own immune attack against cancer, she said. The Stanford group plans to start human clinical trials of the anti-CD47 cancer therapy in 2014. The new research provides hope that the therapy will cause the immune system to wage a two-pronged attack on cancer — through both macrophages and T cells. The approach may also give physicians early indicators of how the treatment is working in patients. “Monitoring T-cell parameters in patients receiving anti-CD47 antibody may help us identify the immunological signatures that tell us whether patients are responding to therapy,” said co-author Jens Volkmer, MD, an instructor at the Stanford Institute for Stem Cell Biology and Regenerative Medicine. The research revives interest in an aspect of macrophages that has been neglected for decades: their role in presenting antigens to T cells. 
For many years, researchers have focused on the dendritic cell as the main antigen-presenting cell, and have generally believed that macrophages specialize in degrading antigens rather than presenting them. This research shows that macrophages can be effective at antigen presentation and are powerful initiators of the CD8+ T cell response. The fact that T cells become involved in fighting cancer as a result of CD47-blocking antibody therapy could have important clinical implications. The antibody might be used as a personalized cancer vaccine allowing T cells to recognize the unique molecular markers on an individual patient’s cancer. “Because T cells are sensitized to attack a patient’s particular cancer, the administration of CD47-blocking antibodies in a sense could act as a personalized vaccination against that cancer,” Tseng added.

Weissman, who is senior author of the new study, is the director of the Stanford Institute for Stem Cell Biology and Regenerative Medicine and the director of the Stanford Ludwig Center for Cancer Stem Cell Research and Medicine. Other Stanford investigators involved in the research were senior scientist Stephen Willingham, PhD; postdoctoral scholars John Fathman, PhD, Nathaniel Fernhoff, PhD, Matthew Inlay, PhD, and Masanori Miyanishi, MD, PhD; instructor Jun Seita, MD, PhD; graduate student Kipp Weisskopf, MPhil; and life sciences research associate Humberto Contreras-Trujillo.

The research was supported by the Virginia and D.K. Ludwig Fund for Cancer Research, the Joseph and Laurie Lacob Gynecologic/Ovarian Cancer Fund, the National Institutes of Health (grants R01CA86017, P01CA139490, P30CA124435 and F30CA168059), and the Student Training and Research in Tumor Immunology Program of the Cancer Research Institute.

Christopher Vaughan is communications manager at the Stanford Institute for Stem Cell Biology and Regenerative Medicine.
Clinical Investigation of a Humanized Anti-CD47 Antibody in Targeting Cancer Stem Cells in Hematologic Malignancies and Solid Tumors

Funding Type: Disease Team Therapy Development III
Grant Number: DR3-06965
Investigator(s): Irving Weissman – PI
Institution: Stanford University
Disease Focus: Cancer, Solid Tumor, Blood Cancer

Most normal tissues are maintained by a small number of stem cells that can both self-renew to maintain stem cell numbers, and also give rise to progenitors that make mature cells. We have shown that normal stem cells can accumulate mutations that cause progenitors to self-renew out of control, forming cancer stem cells (CSC). CSC make tumors composed of cancer cells, which are more sensitive to cancer drugs and radiation than the CSC. As a result, some CSC survive therapy, and grow and spread. We sought to find therapies that include all CSC as targets. We found that all cancers and their CSC protect themselves by expressing a ‘don’t eat me’ signal, called CD47, that prevents the innate immune system macrophages from eating and killing them. We have developed a novel therapy (anti-CD47 blocking antibody) that enables macrophages to eliminate both the CSC and the tumors they produce. This anti-CD47 antibody eliminates human cancer stem cells when patient cancers are grown in mice.

At the time of funding of this proposal, we will have fulfilled FDA requirements to take this antibody into clinical trials, showing in animal models that the antibody is safe and well-tolerated, and that we can manufacture it to FDA specifications for administration to humans. Here, we propose the initial clinical investigation of the anti-CD47 antibody with parallel first-in-human Phase 1 clinical trials in patients with either Acute Myelogenous Leukemia (AML) or separately a diversity of solid tumors, who are no longer candidates for conventional therapies or for whom there are no further standard therapies.
The primary objectives of our Phase I clinical trials are to assess the safety and tolerability of anti-CD47 antibody. The trials are designed to determine the maximum tolerated dose and optimal dosing regimen of anti-CD47 antibody given to up to 42 patients with AML and up to 70 patients with solid tumors. While patients will be clinically evaluated for halting of disease progression, such clinical responses are rare in Phase I trials due to the advanced illness and small numbers of patients, and because it is not known how to optimally administer the antibody. Subsequent progression to Phase II clinical trials will involve administration of an optimal dosing regimen to larger numbers of patients. These Phase II trials will be critical for evaluating the ability of anti-CD47 antibody to either delay disease progression or cause clinical responses, including complete remission. In addition to its use as a stand-alone therapy, anti-CD47 antibody has shown promise in preclinical cancer models in combination with approved anti-cancer therapeutics to dramatically eradicate disease. Thus, our future clinical plans include testing anti-CD47 antibody in Phase IB studies with currently approved cancer therapeutics that produce partial responses. Ultimately, we hope anti-CD47 antibody therapy will provide durable clinical responses in the absence of significant toxicity. New insights into the biology of cancer have provided a potential explanation for the challenge of treating cancer. An increasing number of scientific studies suggest that cancer is initiated and maintained by a small number of cancer stem cells that are relatively resistant to current treatment approaches. Cancer stem cells have the unique properties of continuous propagation, and the ability to give rise to all cell types found in that particular cancer. 
Such cells are proposed to persist in tumors as a distinct population, and because of their increased ability to survive existing anti-cancer therapies, they regenerate the tumor and cause relapse and metastasis. Cancer stem cells and their progeny produce a cell surface ‘invisibility cloak’ called CD47, a ‘don’t eat me signal’ for cells of the native immune system to counterbalance ‘eat me’ signals which appear during cancer development. Our anti-CD47 antibody counters the ‘cloak’, enabling the patient’s natural immune system to eliminate the cancer stem cells and cancer cells. Our preclinical data provide compelling support that anti-CD47 antibody might be a treatment strategy for many different cancer types, including breast, bladder, colon, ovarian, glioblastoma, leiomyosarcoma, squamous cell carcinoma, multiple myeloma, lymphoma, and acute myelogenous leukemia. Development of specific therapies that target all cancer stem cells is necessary to achieve improved outcomes, especially for sufferers of metastatic disease. We hope our clinical trials proposed in this grant will indicate that anti-CD47 antibody is a safe and highly effective anti-cancer therapy that offers patients in California and throughout the world the possibility of increased survival and even complete cure.

We have previously developed a new therapeutic candidate, the anti-CD47 humanized antibody, Hu5F9-G4, which demonstrates potent anti-cancer activity in animal models of malignancy. The goal of CIRM DTIII Grant DR3-06965 is to conduct initial phase I clinical trials of this antibody in advanced cancer patients. We originally proposed to conduct two separate Phase I clinical trials: one in solid tumor patients with advanced malignancy (commenced in August 2014), the other in relapsed, refractory AML patients (anticipated to start in September 2015).
The primary endpoints for these trials will be to assess safety and tolerability, and additional endpoints include obtaining information about the dosing regimen for subsequent clinical investigations, and initial efficacy assessments. CD47 is a dominant anti-phagocytosis signal that is expressed on all types of human cancers assessed thus far. It binds to SIRPα, an inhibitory receptor on macrophages, and in so doing, blocks the ability of macrophages to engulf and eliminate cancer cells. Hu5F9-G4 blocks binding of CD47 to SIRPα, and restores the ability of macrophages to engulf or phagocytose cancer cells. In pre-clinical cancer models, treatment with Hu5F9-G4 shrunk tumors, eliminated metastases, and in some cases resulted in long-term protection from cancer recurrence. These results suggest that Hu5F9-G4 leads to elimination of cancer stem cells in addition to differentiated cancer cells. We have developed Hu5F9-G4 for human clinical trials by demonstrating safety and tolerability in pre-clinical toxicology studies. These studies also indicated that we can achieve serum levels associated with potent efficacy in pre-clinical models. The regulatory agencies (FDA in the U.S., and MHRA in the U.K.) reviewed the large package of pre-clinical data describing Hu5F9-G4, and approved our requests to commence separate Phase I clinical trials in solid tumor and AML patients. The solid tumor trial commenced at Stanford in August 2014 and has been designed to assess patients in separate groups, or cohorts, treated with increasing doses of Hu5F9-G4. The trial is ongoing as primary endpoints have not been met. The acute myeloid leukemia trial has been given regulatory approval in the U.K., and will start enrolling patients in September 2015. In summary, during the last year, the Hu5F9-G4 clinical trials have made substantial progress and all milestones have been met. 
Stem Cell Research: Promise and Progress

Hans Clevers: “Every day new research is showing us that many types of cancers are fed by tumour stem cells”

http://www.irbbarcelona.org/en/news/hans-clevers-every-day-new-research-is-showing-us-that-many-types-of-cancers-are-fed-by-tumour

The biggest challenge in designing new cancer therapies lies in successfully identifying and targeting tumour stem cells, which are responsible for the regrowth of the tumour. The Barcelona BioMed Conference on “Normal and Tumour Stem Cells” aims to analyze the function of stem cells in cancer. The conference, which begins today and runs until November 14 at the Institut d’Estudis Catalans, is co-organized by colon cancer research experts Eduard Batlle (IRB Barcelona) and Hans Clevers (Hubrecht Institute, the Netherlands), with the support of the BBVA Foundation. During the three-day event, 21 world experts in the field will meet with a further 130 participants to share their latest research findings on tumour stem cells.

“In 2007 we held the first Barcelona BioMed Conference on this topic. At the time there was only very preliminary data on the relationship between stem cells and cancer. Five years on, many convincing data have emerged to indicate that the majority of tumours are indeed fed by tumour stem cells,” explains Hans Clevers, the scientist who first identified stem cells in the intestine and who today is one of the world leaders in research on normal stem cells and their potential for regenerative therapy.

A number of important studies have demonstrated that at the heart of cancers of the breast, colon, skin, brain and lung, and of leukemias, lies a small group of malignant cells that have retained the properties of the stem cell that gave rise to the cancers in the first place. It is these cells that allow the tumour to grow and can regenerate it.
The efforts of many research groups worldwide now focus on unraveling this process, identifying the specific genes that allow it to occur, and finding ways to detect and eliminate these malignant stem cells.

Stem cells and the origin of tumours

One of the principal characteristics of stem cells is that they are able to copy themselves indefinitely, giving rise to one stem cell and one specialized cell. This capacity for unlimited replication ensures the constant renewal of healthy tissues, which is fundamental for survival and is the basis of regenerative medicine. When the stem cells undergo cancerous mutations or when normal tumour cells acquire stem cell properties, however, this can lead to the formation of tumours.

“This conference gives us a valuable opportunity to learn about the latest work on the two types of stem cells, normal and tumour, in different tissues. What we have been observing over recent years is that the tumour mimics the hierarchies that exist in normal tissues. In order to understand the tumour, we need to understand the healthy tissue. Most of the scientists invited to the conference are working on both aspects,” explains Batlle.

The list of speakers includes pioneers in the field, such as Irving L. Weissman, director of the Institute for Stem Cell Biology & Regenerative Medicine in Stanford, California. Weissman, known as the “father of haematopoiesis”, first identified stem cells in the blood and determined how they give rise to the different types of blood cells, making major contributions to our understanding of leukemias and other ‘liquid’ tumours.

Stem cells and metastasis

In addition to being at the root of the tumour and allowing it to grow, stem cells may also cause metastasis. In order for metastasis to occur, cells from the original tumour must escape into the blood stream and invade new organs to seed new tumours there.
“Only cells with stem cell properties are able to make this happen, since they are the only type of cell that can generate all the cell types of the tumor,” explains Batlle. But in order to cause metastasis, these cells also need to be able to do other things. “We have discovered that in the case of colon cancer, stem cells must be able to trick the healthy tissue of the organ they have invaded into helping them survive in this hostile environment.” Batlle’s study, to be published tomorrow in Cancer Cell, will be presented during the conference. This is the first piece of work to reveal a key role for the tumour microenvironment in fostering the process of metastasis, a discovery which will open doors to similar findings in other types of tumours.

Normal stem cells vs. tumour stem cells

One of the keys in the fight against cancer is the ability to identify tumour stem cells and differentiate them from healthy stem cells. The conference co-organizers maintain that “this is still a central question. We don’t yet know enough about normal stem cells, and technical issues make things difficult. We are making rapid progress, however, and in the next few years we expect to be able to make great strides both in figuring out the similarities and differences in the two types of cells, and in coming up with new strategies to fight the growth and spread of tumours.”

PROFILES OF CONFERENCE CO-ORGANIZERS

EDUARD BATLLE – Group Leader of the Colorectal Cancer Laboratory and Coordinator of the Oncology Programme at IRB Barcelona. ICREA Research Professor (Instituto Catalán para la Investigación y Estudios Avanzados). Dr. Batlle’s research over the past decade has focused on the characterization of the mechanisms that cause the initiation, progression and metastasis of colon cancer. He has published studies in several high-impact journals such as Cell, Nature, Nature Genetics and Cancer Cell.
His achievements include the discovery of the transcription factor Snail in tumour cells and the elucidation of the function of EphB membrane receptors in colorectal cancer. During the Barcelona BioMed Conference, Dr. Batlle will present the results of a study to be published in Cancer Cell on a process indispensable for colon cancer metastasis. Among his recognitions, Batlle has received the Banc Sabadell Prize for Biomedical Research (2010) and the “Debiopharm Life Sciences Award for Outstanding Research in Oncology” given by the Ecole Polytechnique Fédérale de Lausanne in Switzerland (2006). He is the recipient of an ERC Starting Grant awarded by the European Research Council in 2007.

HANS CLEVERS – Group leader at the Hubrecht Institute (director 2002-2012) and President of the Royal Netherlands Academy of Arts and Sciences. Dr. Clevers was the first scientist to identify intestinal stem cells and remains one of the leading researchers in this field. His discoveries have had significant impact in cancer as well as in regenerative therapy with stem cells and in vitro organ culture. Clevers’ work in developmental biology and cancer led him to discover the beta-catenin/Tcf4 transcriptional complex, which causes the majority of colorectal cancer.

http://apoorvamandavilli.com/wp-content/uploads/2010/10/2010stem-cells-and-cancer.pdf

In 1991 Clevers became a professor of immunology at the University Medical Center in Utrecht. Since 2002 he has been a professor of molecular genetics at UMC Utrecht. Also in 2002 he became director of the Hubrecht Institute for Developmental Biology and Stem-Cell Research at the Royal Dutch Academy of Sciences, where until May 2012 he led the WNT Signaling and Cancer research group and was project leader of the Netherlands Proteomics Centre and Cancer Genomics Centre. Clevers discovered similarities between the normal renewal of intestinal tissue and the onset of colon cancer.
In 2007 he received a grant of two million euros from the KWF Cancer Society to study the function of stem cells in the normal intestines and in colon cancer, and in 2008 he received an ERC Advanced Investigator Grant. In March 2012, Clevers, who since 2000 had been a member of the Royal Netherlands Academy of Arts and Sciences, was elected its president, a position he assumed on June 1 of that year, succeeding Robbert Dijkgraaf. In connection with his election to this position, he resigned from the Hubrecht Institute and began to carry out research two days a week at the UMC-U.[4][5][6][7][9] Asked in a 2008 interview what had been the highlights of his research up to that point, Clevers said “there would probably be three. There was a first one, when I just started my lab, within the first few months we cloned the gene that they call TCF1, t-cell factor 1, I used to be a t-cell embryologist when we first started out. And that paper was published in EMBO in ’91, first author. So in that paper we described cloning of this vector, which at that time maybe on the world scale was not great but for my own lab to clone this gene was my first thing I ever did alone. This gene then in ’96 we found to be the crucial missing component of what’s called the Wnt signaling pathway, and this [was] generally seen as a major breakthrough we had. There were papers in ’96 and ’97 in Cell, and we had two papers in Science in the same two years.” Clevers and his team thus showed that “there is that this TCF transcription factor, there is a small family of them, they occur in every animal on the planet, they are the end point of the signal transcription cascade, and they control virtually every decision in a developing animal. When we realized this we started changing our model systems, we used to work on lymphocytes, and we changed it, first to frogs and flies, drosophila, where the Wnt pathway had been studied by many other people that way we could use assays of those people. 
We then realized that in mammals Wnt signaling…was not only important in embryos but also crucial in adults, which is novel. And we switched to the gut, we found that one of our knockouts, the TCF4 knockout, one of the four members of that family had no stem cells in the gut. And this is the first link in the literature, this was also a ’97 paper in Nature Genetics, between Wnt signaling and stem cells in adults. And in that same year we found that colon cancer comes about by the dysregulation of TCF4, and those two phenomena are really linked. So stem cells need TCF4, cancers dysregulate TCF4 by mutating a gene upstream in that pathway called APC.” After this Clevers’s team “continued to work on the intestine and on the physiology of the intestine, which was essentially an unstudied field, much to my surprise. May I emphasize, there are thousands of very competent embryologists, and they work on tiny details, and they fight over the smallest details, are extremely competent. In this intestinal field there are thousands of gastroenterologists that study cancer or colitis or Crohn’s Disease, but there are very few, if any, labs studying normal tissue, which is amazing because that is a tissue that we use every five days. It’s the most rapidly proliferating tissue in a normal body. So my lab actually build up a lot of mouse models and we learn a lot about how that’s being done, and then finally…last year we finally identified the stem cells in the gut. And we now can purify them in large numbers and study their characteristics.”[4]

A recent posting at the website of the Royal Netherlands Academy of Arts and Sciences provides a capsule summary of Clevers’s research to date: “His research deals with the intestine, in both its healthy and diseased state. He has discovered that there are numerous similarities between the normal process whereby intestinal tissue is renewed and the development of intestinal cancer.
Improved understanding of these processes is crucial to developing new ways of treating cancer. Hans Clevers has described the molecular signalling pathways that are disrupted by cancer and has identified a protein that is specific to stem cells in the intestine. He has then been able to grow ‘mini-intestines’ from individual stem cells. These are the first steps on the road to regenerative medicine, in this case the regeneration of intestinal tissue.”[7]

Q&A: Hans Clevers

Eric Bender

Nature 521, S15 (14 May 2015) http://dx.doi.org://10.1038/521S15a

In 2009, Hans Clevers and Toshiro Sato (then a postdoc in Clevers’ lab) demonstrated a powerful new model to study development and disease: a three-dimensional ‘organoid’ derived from adult stem cells that replicates the structure of cells lining the intestine. More than 100 labs worldwide are now working with different types of organoid to study cancer and other diseases. Clevers, at the Hubrecht Institute in Utrecht, the Netherlands, discusses the potential of this approach.

Why might it be better to screen drugs in organoids rather than in cell lines?

We don’t currently understand why certain tumours are sensitive or resistant to particular drugs. With targeted therapies, you can make a prediction, but for classical chemotherapy drugs, such as cisplatin or 5-fluorouracil, it is totally unpredictable which tumours will respond. Tumours can be sequenced in great detail, but drugs against them cannot be tested effectively other than in clinical trials. Organoids are a very good genetic representation of the tumour, so they let us bridge the gap between deep-sequencing efforts and patient outcomes.

How do you see organoids contributing to the study of colorectal cancer?

We are collaborating with groups at the Broad Institute in Cambridge, Massachusetts, and the Sanger Institute in Hinxton, UK, to build a biobank of organoids from 20 or so people with colon cancer.
We have organoids of the cancer and of normal cells from individual patients, as well as sequences of their protein-coding genes. We have established the non-profit Hubrecht Organoid Technology (HUB) to expand our organoid biobanks. The HUB shares these biobanks with academic groups around the world, and now works with about 15 companies on drug-development programmes. We can culture tumours from almost every person with colon cancer, sequence them and test them against drugs. Additionally, we can use research techniques that have been developed for cell lines, such as genetic tools, fluorescence-activated cell sorting and microarrays.

Is this research moving towards clinical trials?

Yes, my group and the HUB are collaborating with Emile Voest at the Netherlands Cancer Institute in Amsterdam on an observational trial. We already have some organoid models from people with colon cancer who receive chemotherapy. The organoids are screened against a panel of common colon-cancer drugs. The patients will be treated the same way the oncologists would normally treat them, but we’ll see if we could have predicted the response from our organoids. We’re also starting another trial in which we will enrol advanced-colon-cancer patients, for whom there is no standard treatment. We will make organoids, test drug sensitivity and resistance, and then advise the oncologists as to what drug to use for that particular patient. We will be looking at multiple drugs, so we need large numbers of patients; that’s the only way we will be able to produce enough data to help us match drugs to tumour types.

To benefit individual patients, won’t you need to test the drugs very quickly?

Yes, and that’s really where we want to take this technology. When you have pneumonia, your bacterial cultures are tested and you get answers in three days. With this technology, we can tell the oncologist the best odds for a combination of therapeutics, maybe not in three days, but in several weeks.
We have an organoid-based test in cystic fibrosis that gives us a result in about two weeks.

How does the organoid approach differ from patient-derived xenografts, in which patients’ tumours are transplanted into immune-suppressed mice for testing drugs?

It’s the same principle: you get a functional readout of the patient’s tumour. But organoids can be tested against an unlimited number of compounds and combinations. Furthermore, in contrast to xenografts, organoids can be established from almost all patients.

What are some of the next steps in your cancer research?

Organoids model the key component of the tumour but they lack some important elements. We want to combine organoids with other elements to make more-complete tools. For instance, we would like to introduce the immune system so that we can study the effects of the fantastic new immunotherapy drugs. We think that we can build it up in a reductionist way: take lymphocytes isolated from a tumour, bring these together with cancer organoids derived from the same tumour and watch what happens. And maybe we can also put microorganisms in these organoids. For example, we could add Helicobacter, a major cause of stomach cancer, to stomach organoids.

Can organoids also help to test drug combinations?

Yes, tumours are genetically heterogeneous, and there can be vast differences in drug sensitivity between clones for the same tumour. We can possibly advance sequence-based therapy by testing millions of drug combinations in organoids.

Single Lgr5 stem cells build crypt–villus structures in vitro without a mesenchymal niche

Toshiro Sato, Robert G. Vries, Hugo J. Snippert, Marc van de Wetering, Nick Barker, Daniel E. Stange, Johan H. van Es, Arie Abo, Pekka Kujala, Peter J. Peters & Hans Clevers

Nature 459, 262-265 (14 May 2009) | http://dx.doi.org:/10.1038/nature07935

Received 16 July 2008; Accepted 24 February 2009

The intestinal epithelium is the most rapidly self-renewing tissue in adult mammals.
We have recently demonstrated the presence of about six cycling Lgr5+ stem cells at the bottoms of small-intestinal crypts [1]. Here we describe the establishment of long-term culture conditions under which single crypts undergo multiple crypt fission events, while simultaneously generating villus-like epithelial domains in which all differentiated cell types are present. Single sorted Lgr5+ stem cells can also initiate these crypt–villus organoids. Tracing experiments indicate that the Lgr5+ stem-cell hierarchy is maintained in organoids. We conclude that intestinal crypt–villus units are self-organizing structures, which can be built from a single stem cell in the absence of a non-epithelial cellular niche.

A Model for Life

Dis. Model. Mech. September 2013, doi: 10.1242/dmm.013367 vol. 6 no. 5 1053-1056

A gutsy approach to stem cells and signalling: an interview with Hans Clevers

Hans Clevers, Professor of Molecular Genetics at Utrecht University, began his career in immunology and developmental biology, but a shift towards intestinal research in the late 1990s led to his group’s pioneering discovery that Lgr5 is a marker of tissue stem cells – a finding that paved the way for a cascade of key insights into the molecular signalling pathways that are dysregulated in cancer. Interviewed here by Ross Cagan, Editor-in-Chief of Disease Models & Mechanisms, Hans recalls the mentors and discoveries that motivated his transition from basic to applied science, discusses his style of lab management and mentorship, and highlights the potential of organoid-based therapy for personalised medicine.

Johannes (Hans) Clevers was born in 1957 in Eindhoven, home to Philips Electronics, in the south of The Netherlands. From a young age he showed enthusiasm and a natural talent for science, and as an undergraduate became fascinated with molecular biology. He obtained his PhD in immunology from Utrecht University during the mid-1980s, and simultaneously studied medicine.
Making the pivotal decision to move back into the lab after completing his clinical training, he undertook postdoctoral research in Cox Terhorst’s lab at the Dana-Farber Cancer Institute at Harvard University. He then returned to Utrecht to set up his own lab, and was a Professor of Immunology at the university between 1991 and 2002. From 2002 to 2012 he was Director of the nearby Hubrecht Institute for Stem Cell Research. During this time, Hans moved gradually into the gastroenterology field, and made groundbreaking discoveries regarding the role of Wnt signalling in stem cells and colon cancer. His unique contributions to cancer, stem cell research and regenerative medicine have been recognised in the form of numerous awards, and in 2013 he was one of the eleven winners of a $3 million award from the Breakthrough Prize in Life Sciences Foundation. Currently, he is Professor of Molecular Genetics at Utrecht University, and is also President of the Royal Netherlands Academy of Arts and Sciences (KNAW). Hans has also been involved in setting up several biotechnology companies.

Before we get to your background, I want to congratulate you on being, unsurprisingly, one of the Breakthrough Prize award winners. You have a long list of prizes now – is it something you’ve gotten used to?

This last one was unusual for me – prior to the Breakthrough award I had only ever received one American prize and that was in gastroenterology. To be the only researcher in Europe awarded, and to see my name on the list together with people like Robert Weinberg and Bert Vogelstein, who were the big shots when I was a postdoc, was a truly great honour. I went to the ceremony for the physics prize in Geneva, and it was like being at the Oscars – very surreal, as a scientist.

The first thing I did when I found out about my award was to invite the current and previous members of my lab to a huge party in Amsterdam, which will take place in September [2013]. There will be around 100 attendees – most of whom are still in science. There will be good food and drink, stand-up comedy, and a small symposium.

Taking a step back into your past, why did you choose a career in science and medicine?

My high school system was very geared towards languages. I started learning biology at university in 1975 at the age of 18, and I was disappointed. Molecular biology was being developed in England, Switzerland and the US, but in Dutch universities there was no legal framework to do this, and so the courses – where available – focused only on technical details. Biology in general lacked charisma. At the time, my friends and brothers were junior medics, and as I had an interest in medicine I decided to take it on in addition to biology. I ended up spending a year in Nairobi and half a year at NIH for my biology rotations, and essentially I never went to any lectures (although this is something I never tell my students!). Anyway, I really started getting sucked into the clinical training, and found that working in a clinical environment is much more sociable than being in a lab. You’re part of a big organisation and there are lots of people to talk to, whereas in the lab there are only a few people, and small issues – such as somebody not cleaning up – can really cause friction. After medical school, I was picked, mainly because of my research background, for a training position in paediatrics. They suggested that I should start work for a PhD, so I went back into the lab. That’s when I realised that, despite the social attractiveness of working in a hospital, I was much more of a scientist than a doctor. I got my PhD – together with four published papers – in just 1 year. However, it was during my first postdoc position in Boston that I think I was really exposed to science for the first time. It was tough, but I knew I’d made the right decision.

Are there particular mentors who influenced your decision to choose the lab over clinics, and shaped your career moves?

When I received the Heineken Prize from the Royal Netherlands Academy of Arts and Sciences in 2012, I had to think deeply about my mentors and realised that there were two that I had almost forgotten. The first was my high school chemistry teacher, who sold laboratory chemicals to students from his home, during the evenings (in a well-regulated way). I had built a small lab in the attic of my parents’ house and I really had fun mixing things together and doing all the experiments that are possible to do at home. Because of this chemistry teacher, I learned the joy of being in a lab.

The second crucial mentor was my thesis advisor, who didn’t supervise me very much but did give me key advice that has stayed with me until now. He taught me that it’s important to trust everybody you work with, at least until they show you that they can’t be trusted. I emphasize this in my own lab – I encourage my students and postdocs to be open and transparent and to discuss their work. Some scientists are intuitively secretive and paranoid – cultural differences perhaps play a part in this. In my view, only when someone damages your trust can you justify being paranoid, and until then it is important to share information.

“…it’s important to trust everybody you work with, at least until they show you that they can’t be trusted”

There are many ways to run a lab; for example, you can micro-manage it or you can focus on the big picture and step back from the day-to-day issues. What is your style of running a lab?

When I first became a PI, I really liked doing experimental work. Even after 5 years as a postdoc, I enjoyed doing minipreps! As a consequence, I really micro-managed the few lab members I had, and I’m sure they were ultimately happy to get away from me. But when the lab grew a little bigger and I became Head of Department, it took me away from the lab much of the time. Nowadays, I informally talk with my lab colleagues as much as I can, preferably at the bench. As we speak, I know that there is someone in my group who will find out the results of a 3-month effort, today. I always insist on looking at the raw data, never the digested, analysed data. It could be 5 minutes or 2 hours, but when I’m needed in the lab I will always try to make time for it and be part of the troubleshooting process. When you can no longer troubleshoot in your own lab, you’re lost.

Well clearly success builds on success – some impressive scientists have come out of your lab. Do you encourage all of your group members to pursue academic positions?

I’ve had many ‘super postdocs’ in my lab but some of these individuals would not be happy as PIs. It’s not about capability, but about wanting to deal with the paperwork, the responsibility and the decision-making that come with being a PI. Such individuals can make a valuable contribution to a lab, given their years of experience, as well as acting as great mentors and role models for the newer group members. When, having gained experience in the pharmaceutical industry, Nick Barker re-joined my group in 2006 as Senior Staff Scientist, we spent 6–7 years looking for stem cell markers, and then broke open the field by identifying Lgr5 as a marker of cancer stem cell populations. Nick has now set up his own group in Singapore, but I have had several other very talented experimentalists in my lab for many years. Overall, I think that intermediate positions are fantastic for successful postdocs who might end up unhappy as PIs.

How did you get involved with intestinal stem cell research? You didn’t start in this field but somehow ended up there.

As an undergraduate student, I did a brief rotation project on T cells. This led to a PhD and postdoc focused on T cells. I learned molecular biology, which inspired me to clone a T-lymphocyte transcription factor, TCF-1, when I subsequently set up my own lab in Holland. We (Marc van de Wetering and I) cloned TCF-1 within a few months and showed that it binds DNA; but, despite trying all kinds of functional assays, we couldn’t show that it regulates transcription. It took 6 or 7 years to figure out that β-catenin, a signal transducer in the Wnt signalling pathway, was needed. We heard that Walter Birchmeier had made a complementary discovery in Berlin, and our papers came out at the same time.

Around that time, I was Clinical Professor in Immunology at Utrecht, and I started studying TCFs in mice, frogs, flies and worms. We soon established that TCFs are always the endpoint of the Wnt pathway. In 1996–1997, we knocked out TCF-4 in mice and, remarkably, observed a gut phenotype – the mice had no crypts. Simultaneously, we realised that the pathway is overactivated in colon cancer. That’s when I decided to move into studying the gut. It wasn’t easy as an immunologist, but I gradually got to know the gastroenterology field. At the time, this field was dominated by clinical research, and in fact our work didn’t really become known to gastroenterologists until around 3–4 years ago. They were totally unaware that mice could give clues about human disease, which surprised me, as in haematology and immunology, there is a good balance between basic and clinical science. There are other clinically well-developed fields, such as prostate and lung cancer research, that could really benefit from a stronger basic approach.

A key discovery for you was that Lgr5 is a marker of stem cells. When did you realise the implications of this discovery?

There were two ‘eureka’ moments with the stem cell story. The dogma at the time was the ‘+4’ stem cell model, which was pioneered by Chris Potten, who recently passed away. I tried to provide experimental support for this model, together with Nick Barker, but it never really went anywhere. Having realised that β-catenin and TCFs controlled crypts in the gut and cancer, we set out to determine the genetic programme controlled by this pathway. At the time (1997), there was no technology to do this properly, but in 2000 we performed one of the first microarrays with Pat Brown. Our array looked at expression in a colon cancer cell line. The array contained only two samples – plus or minus the Wnt pathway – but it opened the field for us by providing a list of markers to investigate further. This was the first, key step. From the list of markers, we picked a few that we thought were marking +4 cells, but these led us nowhere. Eventually, based on its unique expression pattern, we came up with Lgr5. We made numerous mouse strains, including Lgr5-GFP tagged mice. The moment we saw tiny cells lighting up under the microscope, I started writing our next ten big papers in my head. It was a remarkable moment – the cells exist, and we could visualise them using these mice.

And why exactly is Lgr5 so important, both from a basic and an applied standpoint?

Lgr5 is an exquisite protein. We and several other labs have shown that it is a marker for stem cells in many tissues. Originally, we saw it only in spontaneously dividing tissues, but we’ve recently found that it also appears in organs that have undergone damage. Lgr5 is unique in that, on its own, it specifically marks homogeneous populations of stem cells but not their progenitors, unlike most other markers. We now know that this is because it is a cell surface receptor protein in the Wnt pathway, and only stem cells require Wnts. In the gut, the stem cells are particularly active – in mice, they divide every day for 2.5 years, so they go through a thousand cell divisions.

Discovering Lgr5 led to another eureka moment: the generation of long-term culture systems that maintain crypt physiology. A Japanese gastroenterologist whom I invited to my lab, Toshiro Sato, was the first to set up the right culture conditions, and now multiple labs are creating these systems, which are called organoids or ‘mini-guts’. Once the system was up and running, Toshiro showed that Paneth cells provide the niche for stem cells at crypt bottoms, and that stem cells produce their own daughters which then produce growth factors. With his former Japanese lab, we showed that normal tissue can be generated from a single stem cell, and it can survive in a mouse for as long as you want. Based on this finding, our lab evolved and now we’re culturing prostate, liver, pancreas, kidney, lung and breast tissue, all for prolonged periods of time, all from humans. There are no changes in chromosomal structure in the cultured cells, and deep sequencing reveals very few mutations. The next step will be to take single cells, genetically modify them like we do with embryonic stem cells, pick a safe clone, expand it and use it for therapy, particularly transplantation.

Do you think we will be able to take organoid-based therapy to the personalised level? Colorectal cancer, for example, only has a 3% success rate in clinical trials. Are organoids going to provide the answer?

We’re finalising a pilot sequencing study now involving 20 patients with normal crypts and colon cancer. With the wild-type and colon cancer organoids, we can potentially predict patient outcome and response to drugs. In the future, we hope to rapidly build large, living biobanks for other cancers, too. In line with this, we’re building up a ‘Stand Up 2 Cancer’ dream team involving several American labs and the Sanger Institute, with the aim of taking the organoid approach to the next level in cancer therapy. Sanger has robotised screening set-ups that allow thousands of compounds to be screened across hundreds of cell lines. We can now do this with patient-derived organoids. From these tests we could establish new effective drug combinations, and we could link genetics to function to help design smarter trials. The great thing about organoids is that they contain only epithelium – there is no immune system, no blood system, only the diseased tissue, making it a very clean system.

We’ve also recently collaborated with clinicians on a cystic fibrosis project. We can predict using cystic fibrosis ‘mini-guts’ that certain drugs that are currently in trials will work for one patient and not for another, and that certain drug combinations work better than others. From biopsy to drug response, it takes only 10 days. Industry is now very interested in using this assay to pre-screen and design trials.

In the past, you’ve suggested that classic hypothesis-driven science isn’t the right way to do science. Could you say a little bit more about this?

Now that I’m a bit older I’m more interested in how the process of science works. I always ask my colleagues: how do you run the lab and how do you make discoveries? In my lab, I try to establish a reproducible, quantitative system, like GFP mice and arrays. Then, I throw something at the system and look, without formulating a hypothesis. This is difficult because our brains like to produce causal relationships, even though these are often wrong. I’m constantly telling my group members that they should keep their minds open and make observations without assuming that they know what’s going on. In molecular biology, we can go anywhere we want and there are billions of effects to discover. You cannot do this in a hypothesis-driven way because you’re essentially retracing evolution. There are many solutions to a particular problem but evolution picked one – it’s very arrogant to think we can reconstruct this in our minds.

Some of my most elegant hypotheses have fallen by the wayside. The importance of establishing formal rules for innovation is a discussion worth having in biology.

I understand that you have embraced movies to explain scientific concepts. What’s the story behind this?

I was inspired by Leonard Zon – I came across one of his movies about 8 years ago. I realised it’s much easier to convey messages visually than in words so I started working with a small company in Holland to produce science movies. The lab provides the idea and the images, and the company writes the script. We end up going back and forth a few times to make the message as accurate as possible, and it really shows us as scientists how ambiguous language can be. Often, feedback from the company sends us back into the lab to find out something we hadn’t looked into, for example how fast do the cells move, how many cells are there? Gradually, the movie comes together. Nowadays, I typically use the movies in my talks to explain a problem, and I’ve found that it’s much more effective to show the movie before explaining the experiments. People understand the experiments much better that way, and listen effortlessly. Now, whenever we have a story to write up I try to turn it into a 30-second movie before putting pen to paper. This really forces us to think about the core of the paper.

Science is frustrating because things don’t work 90% of the time: ideas are wrong, experiments fail. You have to have the personality that thrives by those few fantastic moments of success that you have once a year or even once a career. Moving from being a clinician to being a scientist was one of the hardest decisions I ever made. A clinician gets rewards multiple times a day, so if you’re a person who needs that kind of reward and social interaction, then you shouldn’t be a scientist. Luckily there are now many alternative careers, such as pharma, government and teaching, that didn’t exist when I was a young scientist. However, there needs to be a radical change in the way we view these alternative routes. Maybe in the US it’s different, but here, if you step out of the system you are treated like a failure. I tell young scientists that failure comes with ending up as a miserable PI, with no funding and no papers.

PhD students and junior postdocs have to be aware that the people they see at meetings who give the great talks are in the minority – as scientists we have to be ready to do something else at any point during our career. I think the whole system has to realise that every other job can be as interesting as a job in science. That’s not what we always convey to young people – we describe academia as where it’s happening and everything else as dull or uncreative.

If you hadn’t chosen science as a career, what would you have done instead?

I would probably be a novelist. It’s even more competitive than being a scientist, but it’s also creative, so the perfect blend for me.

Targeted gene modification

Larry H Bernstein, MD, FCAP, Curator

Series E. 2: 7.8

Mario R. Capecchi won a 2007 Nobel Prize for his work on targeted gene modification.

Born in Italy in 1937, scientist Mario R. Capecchi emigrated to the United States after World War II and later became a geneticist and professor. His groundbreaking work on targeted gene modification won him a Nobel Prize in 2007.

The Making of a Scientist II

In 1996, as a Kyoto Prize laureate, I was asked to write an autobiographical sketch of my early upbringing. Through this exercise, shared by all of the laureates, the hope was to uncover potential influences or experiences that may have been key to fostering the creative spirit within us. In my own case, what I saw was that, despite the complete absence of an early nurturing environment, the intrinsic drive to make a difference in our world is not easily quenched and that given an opportunity, early handicaps can be overcome and dreams achieved. This was intended as a message of hope for those who have struggled early in their lives. As I have previously noted, the genetic and environmental factors that contribute to talents such as creativity are too complex for us to currently identify or predict. In the absence of such wisdom our only recourse is to provide all children with the opportunities to pursue their passions and dreams. Our understanding of human development is too meager to allow us to predict the next Beethoven, Modigliani, or Martin Luther King.

The content of the autobiographical sketch was based on my own memories, on conversations with my aunt and uncle, who raised me once I arrived in the United States, and on conversations with my mother. Because of the added exposure resulting from the winning of the Nobel Prize, I have received letters from people who knew me in Italy during those formative early years. In addition members of the press have taken an interest in my story and have sought independent corroboration. An amazing and wonderful surprise is that they have discovered a half-sister of whom I was completely unaware. She is two years younger than I, and was given up for adoption before she was one year old. Most recently I had the opportunity to meet my half-sister. She was a very nice person, as a sister should be. I am grateful for all of these new sources of information and revelation. Where appropriate, I will weave the new information into this retelling of my story.

Autobiographical Sketch
I was born in Verona, Italy on October 6, 1937. Fascism, Nazism, and Communism were raging through the country. My mother, Lucy Ramberg, was a poet; my father, Luciano Capecchi, an officer in the Italian Air Force. This was a time of extremes, turmoil and juxtapositions of opposites. They had a passionate love affair, and my mother wisely chose not to marry him. This took a great deal of courage on her part. It embittered my father.

 Figure 1. A photograph of my mother, Lucy Ramberg, at age 19.

I have only a few pictures of my mother. She was a beautiful woman with a passion for languages and a flair for the dramatic (see Figure 1). This picture was taken when she was 19. She grew up, with her two brothers, in a villa in Florence, Italy. There were magnificent gardens, a nanny, gardeners, cooks, house cleaners, and private tutors for languages, literature, history, and the sciences. She was fluent in half a dozen languages. Her father, Walter Ramberg, was an archeologist specializing in Greek antiquities, born and trained in Germany. Her mother was a painter born and raised in Oregon, USA. In her late teens, my grandmother, Lucy Dodd, packed up her steamer trunks and sailed with her mother from Oregon to Florence, Italy, where they settled. My grandmother was determined to become a painter. This occurred near the end of the 19th century, a time when young women were not expected to set off on their own with strong ambitions of developing their own careers.

 Figure 2. A painting done by my grandmother, Lucy Dodd Ramberg, of her three children, left to right, Edward, Lucy, and Walter. It was painted at their villa in Florence, Italy in 1913.
 Figure 3. A painting by Lucy Dodd Ramberg of my mother, Lucy, and uncle Edward having tea at the villa in Florence, Italy (1913).

My grandmother became a very gifted painter. Let me share with you a couple of her paintings, which also illustrate the young lives of her children. These paintings are very large, approximately seven feet by five feet. The first painting (Figure 2) is the center panel of a triptych depicting my mother and her two brothers Walter and Edward (both of whom became physicists) surrounded by olive trees at the villa in Florence. The influence of the French impressionist painters is evident. The second painting (Figure 3) is of my mother, age 8, and her younger brother Edward, age 6, having a tea party, again at the villa in Florence. Their father, the German archeologist, was killed as a young man in World War I. My grandmother finished raising her three children on her own by painting, mostly portraits, and by converting the family villa into a finishing school for young women, primarily from the United States.

 Figure 4. A photograph of the chalet where my mother and I lived in Wolfgrübben just north of Bolzano, Italy. In the foreground is my mother, Lucy.

My mother’s love and passion was poetry. She published in German. She received her university training at the Sorbonne in Paris and was a lecturer at that university in literature and languages. At that time, she joined with a group of poets, known as the Bohemians, who were prominent for their open opposition to Fascism and Nazism. In 1937, my mother moved to the Tyrol, the Italian Alps. Figure 4 shows the chalet north of Bolzano, in Wolfgrübben, with my mother in the foreground. We lived in this chalet until I was 3½ years old. In the spring of 1941, German officers came to our chalet and arrested my mother. This is one of my earliest memories. My mother had taught me to speak both Italian and German, and I was quite aware of what was happening. I sensed that I would not see my mother again for many years, if ever. She was incarcerated as a political prisoner in Germany.

I have believed that her place of incarceration was Dachau. This was based on conversations with my uncle Edward, my mother’s younger brother. During World War II, my uncle lived in the United States. Throughout these war years, he made many attempts to locate where my mother was being held. The most reliable information indicated that the location was near Munich. Dachau is located near Munich and was built to hold political prisoners. My mother survived her captivity, but after the war, despite my prodding, she refused to talk about her war experiences.

Reporters from the Associated Press (AP) have found records that my mother was indeed a prisoner during the war in Germany. In fact, they have found records of German interest in my mother’s political activities preceding 1939. In that year, they had her arrested by the Italian authorities and jailed in Perugia and subsequently released. However, the AP reporters did not find records indicating that my mother was incarcerated in Dachau. Though Germans were noted for their meticulous record keeping, it would be difficult now to evaluate the accuracy of the existing war records, particularly for cases where data is missing. It is clear, however, that exactly where in Germany my mother was held has not yet been determined. Regardless of which prison camp was involved, her experiences were undoubtedly more horrific than mine. She had aged beyond recognition during those five years of internment. Following her release, though she lived until she was 82 years old, she never psychologically recovered from her wartime experiences.

My mother had anticipated her arrest by German authorities. Prior to their arrival, she had sold most of her possessions and given the proceeds to an Italian peasant family in the Tyrol so that they could take care of me. I lived on their farm for one year. It was a very simple life. They grew their own wheat, harvested it, and took it to the miller to be ground. From the flour they made bread which they took to the baker to be baked. During this time, I spent most of my time with the women of the farm. In the late fall, the grapes were harvested by hand and put into enormous wooden vats. The children, including me, stripped, jumped into the vats and mashed the grapes with our feet. We became squealing masses of purple energy. I still remember the pungent odor and taste of the fresh grapes. Most recently, members of the Dolomiten Press have located this farm and I had the opportunity to visit it. It is still owned by the same family that occupied it when I was there. The old farm house has been taken down and a new one erected. However, the pictures of the old farm house, as well as the surrounding land are remarkably consistent with my memories.

World War II was now fully under way. The American and British forces had landed in Southern Italy and were proceeding northward. Bombings of northern Italian cities were a daily occurrence. As constant reminders of the war, curfews and blackouts were in effect every night; no lights were permitted. In the night we could hear the drone of presumed American and British reconnaissance planes which we nicknamed “Pepe.” One hot afternoon, American planes swooped down from the sky and began machine gunning the peasants in the fields. A senseless exercise. A bullet grazed my leg, fortunately not breaking any bones. I still have the scar, which, many years later my daughter proudly had me display to her third-grade class in Utah.

For reasons that have never been clear to me, my mother’s money ran out after one year and, at age 4½, I set off on my own. I headed south, sometimes living in the streets, sometimes joining gangs of other homeless children, sometimes living in orphanages, and most of the time being hungry. My recollections of those four years are vivid but not continuous, rather like a series of snapshots. Some of them are brutal beyond description, others more palatable.

There are records in the archives of Ritten, a region of the Southern Alps of Italy, that I left Bozen to go to Reggio Emilia on July 18, 1942. AP reporters exploring this history have suggested that my father came to the farm, picked me up, and that we went together to Reggio Emilia where he was living. I have no memory of his coming to the farm, nor of having travelled with him to Reggio Emilia. I have recently received a letter from a man who remembers me as the youngest member of his street gang operating in Bolzano, which is on the way to Reggio Emilia.

I did end up in Reggio Emilia, which is approximately 160 miles south of Bolzano. I knew that my father lived in Reggio Emilia and I have previously noted that I had lived with him a couple of times from 1942-1946, for a total period of approximately three weeks. The question has been raised why I didn’t live with him for a much longer period. The reason was that he was extremely abusive. Amidst all of the horrors of war, perhaps the most difficult for me to accept as a child was having a father who was brutal to me.

Recently, I have also received a very nice letter from the priest in Reggio Emilia who ran the orphanage in which I was eventually placed. I remember him because he was one of the very few men I encountered in Reggio Emilia who showed compassion for the children and took an interest in me. I am surprised, but pleased, that after all these years he still remembers me among the thousands of children he was responsible for over the years. Further, I believe I was at that orphanage for only several months, the first time in the fall of 1945, after which I ran away, followed by a second period, in the same orphanage, in the spring of 1946. But his memory is genuine, for he recounts incidents consistent with my memories that could only have been known through our common experience.

In the spring of 1945, Munich was liberated by the American troops. My mother had survived her captivity and set out to find me. In October 1946, she succeeded. As an example of her flair for the dramatic, she found me on my ninth birthday, and I am sure that this was by design. I did not recognize her. In five years she had aged a lifetime. I was in a hospital when she found me. All of the children in this hospital were there for the same reasons: malnutrition, typhoid, or both. The prospects for most of those children ever leaving that hospital were slim because they had no nourishing food. Our daily diet consisted of a bowl of chicory coffee and a small crust of old bread. I had been in that hospital in Reggio Emilia for what seemed like a year. Scores of beds lined the rooms and corridors of the hospital, one bed touching the next. There were no sheets or blankets. It was easier to clean without them. Our symptoms were monotonously the same. In the morning we awoke fairly lucid. The nurse, Sister Maria, would take our temperature. She promised me that if I could go through one day without a high fever, I could leave the hospital. She knew that without any clothes I was not likely to run away. By late morning, the high, burning fever would return and we would pass into oblivion. Consistent with the diagnosis of typhoid, many years later I received a typhoid/paratyphoid shot, went into shock, and passed out.

 Figure 5. A photograph of my uncle Edward Ramberg working in his laboratory at RCA Princeton, New Jersey.

The same day that my mother arrived at the hospital, she bought me a full set of new clothes, a Tyrolean outfit complete with a small cap with a feather in it. I still have the hat. We went to Rome to process papers, where I had my first bath in six years, and then on to Naples. My mother’s younger brother, Edward, had sent her money to buy two boat tickets to America. I was expecting to see roads paved with gold in America. As it turned out, I found much more: opportunities.

On arriving in America, my mother and I lived with my uncle and aunt, Edward and Sarah Ramberg. Edward, my mother’s younger brother, was a brilliant physicist. He was a Ph.D. student in quantum mechanics with Arnold Sommerfeld and translated one of Sommerfeld’s major texts into English. Among Edward’s many contributions was his discovery of how to focus electrons, knowledge which he used in helping to build the first electron microscope at RCA. Edward’s books on electron optics have been published in many languages. During my visit to Japan to celebrate the Kyoto Prize, several Japanese physicists approached me to express how grateful they were for my uncle’s texts from which they learned electron optics. Another achievement, of which he was less proud, was being a principal contributor to the development of both black and white and color television. While I grew up in his home, television was not allowed. Figure 5 shows a photograph of my uncle working in his laboratory.

My aunt and uncle were Quakers and they did not support violence as solutions to political problems anywhere in the world. During World War II, my uncle did alternative service rather than bear arms. He worked in a mental institution in New Hampshire, cleared swamps in the south, and was a guinea pig for the development of vaccines against tropical diseases. After the war he settled in a commune in Pennsylvania, called Bryn Gweled, which he helped found. People of all races and religious affiliations were welcomed in this community. It was a marvelous place for children: it contained thick woods for exploration and had communal activities of all kinds – painting, dance, theater, sports, electronics, and many sessions devoted to the discussion of the major religious philosophies of the world. Every week there were communal work parties, putting in roads, phone lines, and electrical lines, building a community center and so on.

The contrast between living primarily alone in the streets of Italy and living in an intensely cooperative and supportive community in Pennsylvania was enormous. Time was needed for healing and for erasing the images of war from my mind. I remember that for many years after coming to the United States I would go to sleep tossing and turning with such force that by morning the sheets were torn and the bed frame broken. This activity disturbed my aunt and uncle to the extent that Sarah would take me from one child psychologist or psychiatrist, to another. These professionals were not very helpful, but the support of the community was. The nightly activity eventually subsided. There may be lessons to be learned from such experiences for the treatment of the children from Darfur, the Congo, and now Kenya.

Sarah and Edward took on the challenge of converting me into a productive human being. This, I am sure, was a very formidable task. I had received little or no formal education or training for living in a social environment. Quakers do not believe in frills, but rather in a life of service. My aunt and uncle taught me by example. I was given few material goods, but every opportunity to develop my mind and soul. What I made of myself would be entirely up to me. The day after I arrived in America, I went to school. I started in the third grade in the Southampton public school system. Sarah also took on the task of teaching me to read, starting from the very beginning.

The first task was to learn English. I had a marvelous third grade teacher. She was patient and encouraging. The class was studying Holland, so I started participation in class functions by painting a huge mural on butcher block paper with tulips, windmills, children ice skating, children in Dutch costumes, and ships. It was a collage of activities and colors. This did not require verbal communication.

I was a good, but not serious, student in grade school and high school. Academics came easily to me. I attended an outstanding high school, George School, a Quaker school north of Philadelphia. The teachers were superb, challenging, enthusiastic, competent, and caring. They enjoyed teaching. The campus was also magnificent, particularly in the spring when the cherry and dogwood trees were bursting with blossoms. An emphasis on Quaker beliefs permeated all of the academic and sports programs. A favorite period for many, including me, was Quaker meeting, a time set aside for silent meditation, and taking stock of where we were going. My wife and I sent our daughter to George School for her own last two years in high school so that she might also benefit from the personal virtues it promotes, and we think she has.

Sports were very important to me at George School, and physical activity has remained an important activity for me to this day. I played varsity football, soccer, and baseball, and wrestled. I was particularly proficient at wrestling. I enjoyed the drama of a single opponent, as well as the physical and psychological challenges of the sport. After George School, I went to Antioch, a small liberal arts college in Ohio.

At Antioch College I became a serious student, converting to academics all of the energy I had previously devoted to sports. Coming from George School, I carried the charge of making this a better, more equitable world for all people. Most of the problems appeared to be political, so I started out at Antioch majoring in political science. However, I soon became disillusioned with political science since there appeared to be little science to this discipline, so I switched to the physical sciences – physics and chemistry. I found great pleasure in the simplicity and elegance of mathematics and classical physics. I took almost every mathematics, physics, and chemistry course offered at Antioch, including Boolean algebra and topology, electrodynamics, and physical chemistry.

Although I found physics and mathematics intellectually satisfying, it was becoming apparent that what I was learning came from the past. The newest physics that was taught at Antioch was quantum mechanics, a revolution that had occurred in the 1920’s and earlier. Also, many frontiers of experimental physics, particularly experimental particle physics, were requiring the use of larger and larger accelerators, which involved bigger and bigger teams of scientists and support groups to execute the experiments. I was looking for a science in which the individual investigator had a more intimate, hands-on involvement with the experiments. Fortunately, Antioch had an outstanding work-study program; one quarter we studied on campus, the next was spent working on jobs related to our fields of interest. The jobs, in my case laboratory jobs, were maintained all over the country, and every three months we packed up our bags and set off for a new city and a new work experience. So one quarter off I went to Boston and the Massachusetts Institute of Technology (MIT).

There I encountered molecular biology as the field was being born (late 1950’s). This was a new breed of science and scientist. Everything was new. There were no limitations. Enthusiasm permeated this field. Devotees from physics, chemistry, genetics, and biology joined its ranks. The common premises were that the most complex biological phenomena could, with persistence, be understood in molecular terms and that biological phenomena observed in simple organisms, such as viruses and bacteria, were mirrored in more complex ones. Implicit corollaries to this premise were that whatever was learned in one organism was likely to be directly relevant to others and that similar approaches could be used to study biological phenomena in many organisms. Genetics, along with molecular biology, became the principal means for dissecting complex biological phenomena into workable subunits. Soon all organisms came under the scrutiny of these approaches.

I became a product of the molecular biology revolution. The next generation. As an Antioch college undergraduate, I worked several quarters in Alex Rich’s laboratory at MIT. He was an x-ray crystallographer, with very broad interests in molecular biology. While at MIT I was also fortunate to be influenced by Salvador Luria, Cyrus Levinthal and Boris Magasanik, through courses, seminars, and personal discussions. At that time Sheldon Penman and Jim Darnell were also working in Alex Rich’s laboratory. When placed in the same room, these two were particularly boisterous, providing comic relief to the fast moving era.

After Antioch, I set off for what I perceived as the “Mecca” of molecular biology, Harvard University. I had interviewed with Professor James D. Watson, of “Watson and Crick” fame, and asked him where should I do my graduate studies. His reply was curt and to the point: “Here. You would be fucking crazy to go anywhere else.” The simplicity of the message was very persuasive.

 Figure 6. A photograph of James D. Watson.

James D. Watson had a profound influence on my career (see Figure 6). He was my mentor. He did not teach me how to do molecular biology; because of my Antioch job experiences, I had already become a proficient experimenter. Jim instead taught me the process of science – how to extract the questions in a field that are critical to it and at the same time approachable through current technology. As an individual, he personified molecular biology, and, as his students, we were its eager practitioners. His bravado encouraged self-confidence in those around him. His stark honesty made our quest for truth uncompromising. His sense of justice encouraged compassion. He taught us not to bother with small questions, for such pursuits were likely to produce small answers. At a critical time, when I was contemplating leaving Harvard as a faculty member and going to Utah, he, being familiar with my self-sufficiency, counseled me that I could do good science anywhere. The move turned out to be a good decision. In Utah I had the luxury to pursue long-term projects that were not readily possible at Harvard, which, in too many cases had become a bastion of short-term gratification.

The summer before I started graduate school, Marshall Nirenberg had announced that polyU directs the synthesis of polyphenylalanine in a cell free protein synthesizing extract. That paper was a bombshell! I decided I would generate a cell-free extract capable of synthesizing real, functional proteins. Jim’s laboratory had started working on the RNA bacteriophage, R17. Its genome also served as messenger RNA to direct the synthesis of its viral proteins. That would be my message. The cell-free protein synthesizing extract worked beautifully. Authentic viral coat protein and replicase were shown to be synthesized in the extract [1]. Further, the coat protein was functional, it bound to a specific sequence of the R17 genome, thereby modulating the synthesis of the replicase. To this day, the high affinity of the viral coat protein for this RNA sequence is exploited as a general reporter system to track RNA trafficking within living cells and neuronal axons. In collaboration with Gary Gussin, also a graduate student in Jim’s laboratory, this system was used to determine the molecular mechanism of genetic suppression of nonsense mutations [2]. In collaboration with Jerry Adams, another graduate student in Jim’s laboratory, the system was also used to determine that initiation of the synthesis of all proteins in bacteria proceeded through the use of formyl-methionine-tRNA [3,4]. A similar mechanism is involved in the initiation of protein synthesis in all eukaryotic organisms. Finally, I used the same in vitro system to show that termination of protein synthesis unexpectedly utilized protein factors, rather than tRNA, to accomplish this end [5,6]. Jim Watson would later offer the very complimentary comment “that Capecchi accomplished more as a graduate student than most scientists accomplish in a lifetime.” It was, indeed, a productive time, but it wasn’t work; it was sheer joy.

While a graduate student in Jim’s laboratory, I was invited to become a junior fellow of the Society of Fellows at Harvard. Being a junior fellow was very special. The society’s membership, junior and senior fellows, represented a broad spectrum of disciplines; all the members were talented, and most of them were much more verbal than I. Social discourse centered around meals, prepared by an exquisite French chef and ending with fine brandy and Cuban cigars. Frequent guests at these dinners were the likes of Leonard Bernstein. Surreal maybe, but also very special.

 Figure 7. A photograph of Karl G. Lark.

From Jim’s laboratory, I joined the faculty in the Department of Biochemistry at Harvard Medical School, across the river in Boston. During my four years at Harvard Medical School I quickly rose through the ranks, but then, I unexpectedly decided to go to Utah. I was looking for something different. There were excellent scientists in the department I was in at Harvard Medical School, but the department was not built with synergy in mind. Each research group was an island unto itself. At that time, they were also unwilling to hire additional young faculty and thereby provide the department with a more youthful, energetic character. At the University of Utah, I would be joining a newly formed department that was being assembled by a very talented scientist and administrator, Karl G. Lark (Figure 7). He had excellent taste in scientists and a vision of assembling a faculty that would enjoy working together and striving together for excellence. I could be a participant in the growth of that department and help shape its character. Furthermore, the University’s administration, led then by President David P. Gardner, was in synchrony with this vision and a strong supporter. Gordon had already attracted Baldomero (Toto) Olivera, Martin Rechsteiner, Sandy Parkinson, and Larry Okun to Utah. After I arrived at Utah, we were able to bring to Utah such outstanding scientists as Ray Gesteland, John Roth, and Mary Beckerle. Utah also provided wide open space, an entirely new canvas upon which to create a new career (see Figure 8). These are views from one of the homes in Utah which I have shared with my wife, Laurie Fraser, and daughter, Misha. The air is clean, and I can look for long distances. The elements of nature are all around us. What a place to begin a new life!

 Figure 8. Views from one of our homes in Utah and a photograph of my wife, Laurie Fraser, and daughter, Misha, just after she was born. Misha is now graduating from the University of California, Santa Cruz as an arts major.
References
1. Capecchi, M. R. (1966). Cell-free protein synthesis programmed with R17 RNA: Identification of two phage proteins. J. Mol. Biol. 21:173–193.
2. Capecchi, M. R. and Gussin, G. N. (1965). Suppression in vitro: Identification of a serine-tRNA as a “Nonsense Suppressor.” Science 149:417–422.
3. Adams, J. M. and Capecchi, M. R. (1966). N-formylmethionine-tRNA as the initiator of protein syntheses. Proc. Natl. Acad. Sci. USA 55:147–155.
4. Capecchi, M. R. (1966). Initiation of E. coli proteins. Proc. Natl. Acad. Sci. USA 55:1517–1524.
5. Capecchi, M. R. (1967). Polypeptide chain termination in vitro: Isolation of a release factor. Proc. Natl. Acad. Sci. USA 58:1144–1151.
6. Capecchi, M. R. and Klein, H. A. (1970). Release factors mediating termination of complete proteins. Nature 226:1029–1033.

From Les Prix Nobel. The Nobel Prizes 2007, Editor Karl Grandin, [Nobel Foundation], Stockholm, 2008

This autobiography/biography was written at the time of the award and later published in the book series Les Prix Nobel/ Nobel Lectures/The Nobel Prizes. The information is sometimes updated with an addendum submitted by the Laureate.

Dr. Capecchi is a member of the National Academy of Sciences (1991) and the European Academy of Sciences (2002). He has won numerous awards, including the Bristol-Myers Squibb Award for Distinguished Achievement in Neuroscience Research (1992), the Gairdner Foundation International Award for Achievements in Medical Sciences (1993), the General Motors Corporation’s Alfred P. Sloan Jr. Prize for Outstanding Basic Science Contributions to Cancer Research (1994), the German Molecular Bioanalytics Prize (1996), the Kyoto Prize in Basic Sciences (1996), the Franklin Medal for Advancing Our Knowledge of the Physical Sciences (1997), the Feodor Lynen Lectureship (1998), the Rosenblatt Prize for Excellence (1998), the Baxter Award for Distinguished Research in the Biomedical Sciences (1998), the Helen Lowe Bamberger Colby and John E. Bamberger Presidential Endowed Chair in the University of Utah Health Sciences Center (1999), a lectureship in the Life Sciences for the Collège de France (2000), the Horace Mann Distinguished Alumni Award, Antioch College (2000), the Italian Premio Phoenix-Anni Verdi for Genetics Research Award (2000), the Spanish Jiménez-Díaz Prize (2001), the Pioneers of Progress Award (2001), the Albert Lasker Award for Basic Medical Research (2001), the National Medal of Science (2001), the John Scott Medal Award (2002), the Massry Prize (2002), the Pezcoller Foundation-AACR International Award for Cancer Research (2003), the Wolf Prize in Medicine (2002/03), the March of Dimes Prize in Developmental Biology (2005), and the Nobel Prize in Physiology or Medicine (2007) with Oliver Smithies and Martin Evans.

Research interests include: the molecular genetic analysis of early mouse development, neural development in mammals, production of murine models of human genetic diseases, gene therapy, homologous recombination and programmed genomic rearrangements in the mouse.

http://www.hhmi.org/news/making-scientist

Mario Capecchi received a Kyoto Prize from the Inamori Foundation in 1996. The lecture he delivered when he accepted the prize in Japan in November 1996 tells the story of his remarkable life. The text of the lecture has been edited for length.

Radoslav Bozov commented on Targeted gene modification


Larry, same thing, data redundancy of data mining issues, of what data is in reality of physics beyond nano space in time! Working on something that does not exist in space and time, but computable mass of ‘designed’ energy formulated systems: Hox gene does not exist: It is a piece of time we perceive through some kind of imagination +1,

The data generated through m/z methods is space-time inaccurate! Guess what is double R doing here instead of double Y, wonder why miRNA are objective to polymer degradation process ??!! what are we really seeing is not what is really in there!

Score Expect Method Identities Positives Gaps
19.2 bits(38) 0.076 Composition-based stats. 8/17(47%) 10/17(58%) 0/17(0%)

Query  511  LTEDRRAFAARMAEIGE  527
            LT DRR   AR+  + E
Sbjct  38   LTRDRRYEVARLLNLTE  54

Breaking news about genomic engineering, T2DM and cancer treatments

Larry H. Bernstein, MD, FCAP, Curator