Healthcare analytics, AI solutions for biological big data, providing an AI platform for the biotech, life sciences, medical and pharmaceutical industries, as well as for related technological approaches, i.e., curation and text analysis with machine learning and other activities related to AI applications to these industries.
Nicole Kidman’s new role shines a light on genetics
Nicole Kidman has taken to the London stage to play Rosalind Franklin, one of the most important yet overshadowed scientists of the 20th century. The impact of her work is still revolutionizing genetics work in modern pathology.
Photograph 51 relates Franklin’s contribution to the discovery of the double helix structure of DNA in the 1950s. The play depicts the sometimes confrontational working relationship between the talented Franklin and her laboratory partner, Maurice Wilkins.
The play’s name comes from the X-ray image of DNA that Franklin created. It was this image that led scientists James Watson and Francis Crick to determine the chemical structure of DNA, ushering in the age of modern genetics.
Franklin died of cancer in 1958, never having been recognised for her work. In 1962, the Nobel Prize in Physiology or Medicine was awarded to Watson, Crick and Wilkins, with Franklin notably overlooked. Photograph 51 attempts to bring Franklin’s role to light.
Dr Melody Caramins is a genetic pathologist working in Sydney. She says modern medicine would look very different without the discovery.
“Genetic testing is widely used, particularly for screening; for example prenatal testing for Down Syndrome and newborn bloodspot testing for life-threatening conditions like Cystic Fibrosis.
Genetic testing can also suggest if a particular cancer drug is likely to be effective for an individual patient. Testing can also indicate an elevated risk of developing a hereditary cancer.”
Dr Caramins says that genetics is an exciting and rapidly developing area to work in as there are so many questions to be answered.
“I encourage anyone willing to work hard to consider pathology and genetics in particular. There is great variety in the work on offer, including lab work and consulting directly with patients.”
This burgeoning profession owes much to genetic pioneers like Rosalind Franklin.
The Adenosine Receptor Agonist 5’-N-Ethylcarboxamide-Adenosine Increases Mouse Serum Total Homocysteine Levels, Which Is a Risk Factor for Cardiovascular Diseases
Spring Zhou, Editor at Scientific Research Publishing
I would like to share this paper with you. Any comments on this article are welcome.
An increase in total homocysteine (Hcy) levels (protein-bound and free Hcy in the serum) has been identified as a risk factor for vascular diseases. Hcy is a product of the methionine cycle and is a precursor of glutathione in the transsulfuration pathway. The methionine cycle mainly occurs in the liver, with Hcy being exported out of the liver and subsequently bound to serum proteins. When the non-specific adenosine receptor agonist 5’-N-ethylcarboxamide-adenosine (NECA; 0.1 or 0.3 mg/kg body weight) was intraperitoneally administered to mice that had been fasted for 16 h, total Hcy levels in the serum significantly increased 1 h after its administration. The NECA treatment may have inhibited transsulfuration because glutathione levels were significantly decreased in the liver. After the intraperitoneal administration of a high dose of NECA (0.3 mg/kg body weight), elevations in total Hcy levels in the serum continued for up to 10 h. The mRNA expression of methionine metabolic enzymes in the liver was significantly reduced 6 h after the administration of NECA. NECA-induced elevations in total serum Hcy levels may be maintained in the long term through the attenuated expression of methionine metabolic enzymes.
Comments:
Is level of protein consumption a factor?
Is reliance on plant food products a factor?
What are the levels of transthyretin?
Is there a concomitant decrease in vitamin A or vitamin D?
Sakata, S. , Matsuda, K. , Horikawa, Y. and Sasaki, Y. (2015) The Adenosine Receptor Agonist 5’-N-Ethylcarboxamide-Adenosine Increases Mouse Serum Total Homocysteine Levels, Which Is a Risk Factor for Cardiovascular Diseases. Pharmacology & Pharmacy, 6, 461-470. doi: 10.4236/pp.2015.610048.
An increase in total serum homocysteine levels (total Hcy: serum protein-bound and free Hcy) has been identified as a risk factor for cardiovascular disease [1] [2] and liver fibrosis [3]. The normal range of total Hcy in adults is typically 5 – 15 μM, with the mean level being approximately 10 μM [2]. Plasma Hcy concentrations were previously found to be strongly associated with the presence and number of small infarctions, or infarction of the putamen in elderly diabetic patients [4]. High levels of Hcy have been shown to induce endoplasmic reticulum (ER) stress and increase the production of reactive oxygen species (ROS) [5]. Hcy is a strong reducing agent and modifies disulfide bonds in proteins. Only 1% to 2% of Hcy occurs as thiol homocysteine in the serum; 75% of Hcy has been suggested to bind to proteins through disulfide bonds with protein cysteines [6]. Hcy is formed as an intermediary in methionine metabolism [7] [8]. Methionine metabolism mainly occurs in the livers of mammals. Methionine receives an adenosyl group from ATP to become S-adenosylmethionine (AdoMet) in the methionine cycle. This reaction is catalyzed in the liver by liver-specific methionine adenosyltransferase I/III (MAT I/III), which is encoded by the methionine adenosyltransferase 1A (MAT1A) gene [9]. AdoMet then transfers its methyl group to a large number of compounds, a process that is catalyzed by various methyltransferases (e.g., glycine N-methyltransferase: GNMT; DNA methyltransferase; phosphatidylethanolamine N-methyltransferase), to produce S-adenosylhomocysteine (AdoHcy). Hcy is formed from AdoHcy by AdoHcy hydrolase (SAHH). The reaction that generates Hcy from AdoHcy is reversible, and the synthesis of AdoHcy from Hcy and adenosine has been shown to be thermodynamically favored over the synthesis of Hcy [10]. A previous study reported that Hcy levels were very low in the liver [11].
This reaction then proceeds toward the synthesis of Hcy when the products (Hcy and adenosine) are removed by further metabolism [12]. Three enzymes metabolize Hcy, with the betaine-homocysteine S-methyltransferase (BHMT) and methionine synthase (MS) reactions both yielding methionine. A large proportion of Hcy in the liver is remethylated by BHMT [3]. The third enzyme, cystathionine β-synthase (CBS), catalyzes the conversion of Hcy to cystathionine in the transsulfuration pathway. Previous studies of whole body methionine kinetics demonstrated that 62% of Hcy was converted to cystathionine during each cycle in males fed a basal diet, resulting in the production of glutathione (GSH), while 38% of Hcy was remethylated to methionine [13]. Hcy is located at an important regulatory branch point, with three possible fates: remethylation to methionine, conversion to cystathionine, or export from the cells.
A decrease in intracellular ATP levels, accompanied by the accumulation of 5’-AMP and subsequently adenosine, is known to follow ischemia. Adenosine levels in interstitial fluids were shown to increase 100 – 1000-fold from basal levels (10 – 300 nM) with ischemia [14]. Furthermore, adenosine levels in hepatocytes were increased by a hypoxic challenge, with excess amounts of adenosine being exported out of cells [14]. Adenosine levels were also found to increase 10-fold due to hypoxia, stress, and inflammation [15]. Adenosine has been shown to activate A1, A2a, and A3 receptors with EC50 values in the range of 0.2 – 0.7 μM, and also A2b receptors with an EC50 of 24 μM [16]. A1 and A3 receptors have been classified as adenylate cyclase inhibitory receptors, and A2a and A2b receptors as adenylate cyclase-activating receptors [17]. The activation of adenosine receptors accompanied by ischemia may increase total Hcy levels in the serum because hepatic ischemia is known to decrease the content of GSH and activity of MAT [18].
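To make the EC50 figures above more concrete, the following sketch estimates how strongly each receptor class might respond at basal and elevated adenosine concentrations. It assumes a simple Hill-type dose-response curve with a Hill coefficient of 1, which is an illustrative assumption and is not stated in the paper; the function name and the chosen concentrations are likewise hypothetical.

```python
def fractional_activation(conc_um: float, ec50_um: float, hill: float = 1.0) -> float:
    """Fraction of maximal response for a Hill-type dose-response curve.

    Assumption: a Hill coefficient of 1 (simple one-site behaviour).
    """
    return conc_um**hill / (ec50_um**hill + conc_um**hill)


# EC50 values quoted in the text (in micromolar).
# 0.7 is the upper end of the 0.2 - 0.7 range given for A1, A2a and A3.
EC50_UM = {"A1/A2a/A3 (upper bound)": 0.7, "A2b": 24.0}

# Illustrative adenosine concentrations (micromolar): the upper basal
# value of 300 nM, and a 100-fold rise from that value.
SCENARIOS = [("basal", 0.3), ("100-fold rise", 30.0)]

for label, conc in SCENARIOS:
    for receptor, ec50 in EC50_UM.items():
        frac = fractional_activation(conc, ec50)
        print(f"{label:>14}  {receptor:<24}  {frac:.2f}")
```

Under these assumptions, A2b receptors are barely engaged at basal adenosine levels and only approach half-maximal activation once concentrations reach the tens-of-micromolar range, which is the situation the text associates with ischemia.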
We previously reported that the non-specific adenosine receptor agonist 5’-N-ethylcarboxamide-adenosine (NECA) increased serum glucose levels and the expression of a glucogenic enzyme (glucose 6-phosphatase) in the liver [19] [20]. Based on the dose of NECA administered in these studies and plasma concentrations after the administration of other adenosine agonists [21], it was inferred that the serum NECA concentration was in the μM range and also that NECA activated adenosine A2b receptors. In the present study, we measured methionine metabolites, including Hcy, in NECA-treated mice in order to determine whether the activation of adenosine receptors increased total Hcy levels in the serum. The results obtained clearly demonstrated that NECA increased total Hcy levels in the serum.
Measurement of Methionine Metabolites
AdoMet and AdoHcy levels in the liver were measured using an HPLC method [25] and total GSH in the liver was measured using a microtiter plate assay [26], as described previously [23]. Total Hcy and total cysteine levels (total Cys: free and protein-bound cysteine) in the serum were measured using an HPLC method [27]. Briefly, a mixture of 50 μL of serum, 25 μL of an internal standard, and 25 μL of phosphate-buffered saline (PBS, pH 7.4) was incubated with 10 μL of 100 mg/mL TCEP for 30 min at room temperature in order to reduce and release protein-bound thiols. After this incubation, 90 μL of 100 mg/mL trichloroacetic acid containing 1 mmol/L EDTA was added for deproteinization, and the mixture was centrifuged at 15,000 × g for 10 min. Then, 50 μL of the supernatant was added to a tube containing 10 μL of 1.55 mol/L NaOH; 125 μL of 0.125 mol/L borate buffer containing 4 mmol/L EDTA, pH 9.5; and 50 μL of 1 mg/mL SBD-F in the borate buffer. The sample was then incubated for 60 min at 60 °C. HPLC was performed on a Waters M-600 pump equipped with a Waters 2475 Multi λ Fluorescence Detector (385 nm excitation, 515 nm emission). The separation of SBD-derivatized thiols was performed on a μ-BONDASPHERE C18 column (Waters, 5 μm, 100 Å, 150 × 3.9 mm) with a 20-μL injection volume and 0.1 mol/L acetate buffer, pH 5.5, containing 30 mL/L methanol as the mobile phase at a flow rate of 1.0 mL/min and column temperature of 29 °C.
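The sample preparation above dilutes the serum in two stages, and the overall dilution factor is needed to convert a measured concentration back to a serum concentration. The arithmetic can be sketched as follows, assuming that volumes are additive and that the volume lost to the protein pellet is negligible; both are simplifying assumptions, and the variable names are illustrative.

```python
# Volumes from the protocol, in microliters.
serum = 50.0
internal_standard = 25.0
pbs = 25.0
tcep = 10.0
tca = 90.0

# Stage 1: reduction mixture plus trichloroacetic acid.
# Assumption: volumes are additive and the protein pellet volume is ignored.
deproteinized_volume = serum + internal_standard + pbs + tcep + tca  # 200 uL

# Stage 2: an aliquot of supernatant goes into the derivatization mixture.
aliquot = 50.0
naoh = 10.0
borate_buffer = 125.0
sbd_f = 50.0
derivatization_volume = aliquot + naoh + borate_buffer + sbd_f  # 235 uL

# Fraction of the final mixture that is original serum.
serum_fraction = (serum / deproteinized_volume) * (aliquot / derivatization_volume)
dilution_factor = 1.0 / serum_fraction

# Serum equivalent contained in one 20 uL injection.
injection_volume = 20.0
serum_per_injection = injection_volume * serum_fraction

print(f"overall dilution factor: {dilution_factor:.1f}")
print(f"serum per injection: {serum_per_injection:.2f} uL")
```

In practice the internal standard corrects for most of this, since it passes through the same dilution steps as the analytes; the calculation is mainly useful for judging whether the detector is working in a sensible concentration range.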
3.1. Effects of NECA on Total Hcy and Total Cys Levels in the Serum
As shown in Table 1, serum total Hcy and total Cys levels significantly increased after 16 h of fasting. The administration of a low dose of NECA (NECA0.1 group) to mice fasted for 16 h resulted in higher serum total Hcy levels than those in the control group at 1 h (Experiment 1). Serum total Hcy levels were also significantly elevated at 3 h (Experiment 2), but were not significantly different from those in the control group at 6 h (Experiment 3). The administration of a high dose of NECA (NECA0.3 group) resulted in significantly higher serum total Hcy levels than those in the control group at 1 h, 3 h, 6 h, and 10 h (Experiments 4, 5, 6, and 7), with Hcy levels gradually increasing to 19.7 μM. The effects of NECA on serum total Cys levels were the same as those on total Hcy levels.
Table 1. Effects of NECA on the content of total homocysteine and total cysteine in the serum.
3.2. Effects of NECA on Other Methionine Metabolite Levels in the Liver
We previously reported that fasting for 16 h decreased AdoMet and GSH levels, and increased AdoHcy levels in the livers of mice [23]. In the present study, as shown in Table 2, the administration of a low dose of NECA (NECA0.1 group) to mice fasted for 16 h resulted in lower liver GSH levels than those in the control group at 1 h (Experiment 1). Liver GSH levels were also significantly lower at 3 h (Experiment 2), while GSH levels were not significantly different from those in the control group at 6 h (Experiment 3). The administration of a high dose of NECA (NECA0.3 group) resulted in liver GSH levels that were significantly lower than those in the control group at 1 h, 6 h, and 10 h (Experiments 4, 6, and 7). The effects of NECA on total Hcy levels in the serum and GSH levels in the liver were similar at each dose and time. Furthermore, the low and high doses of NECA both led to significantly higher AdoMet levels than those in the control group at 1 h (Experiments 1 and 4). AdoMet levels at 3 h, 6 h, and 10 h were not significantly different from those in the control group (Experiments 2, 3, 5, 6, and 7). AdoHcy levels were significantly lower in the NECA0.3 group than in the control group 6 h and 10 h after the administration of NECA (Experiments 6 and 7), while the administration of a low dose of NECA had less of an impact on AdoHcy levels.
Table 2. Effects of NECA on the content of methionine metabolites in the liver.
3.3. Effects of NECA on mRNA Expression of Methionine Cycle Enzymes in the Liver
Figure 1 shows changes in the mRNA expression of methionine cycle enzymes in Experiments 4, 5, and 6. The expression of methionine cycle enzymes did not significantly change 1 h after the administration of NECA. The expression of MAT1A mRNA was significantly decreased in the liver 6 h after the NECA treatment, while that of MAT2A was increased. The changes observed in the expression of MAT in the present study were consistent with previous findings obtained in ischemic livers [18] or with liver regeneration [28]. The expression of GNMT, which eliminates excess AdoMet, was significantly decreased 6 h after the NECA treatment. The expression of CBS, which converts Hcy to cystathionine through the transsulfuration pathway, and BHMT, which converts Hcy to methionine, was also decreased at 6 h.
Figure 1. Effects of NECA on the mRNA expression of methionine cycle enzymes in the mouse liver. Northern hybridization was performed on the liver RNA of mice in Experiments 4, 5, and 6. The mean ± SEM of the ratio of each enzyme mRNA to the level of the 18S rRNA signal is shown as an arbitrary unit. Unpaired Student’s t-tests were used to compare NECA-treated groups with the control groups. *p < 0.05, **p < 0.01: significantly different from each control.
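The statistics named in the Figure 1 legend (mean ± SEM and an unpaired Student's t-test) can be sketched with the Python standard library alone. The expression ratios below are invented purely for illustration and are not data from the paper; the helper names are also hypothetical.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of measurements."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def student_t(a, b):
    """Unpaired two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups.
    """
    na, nb = len(a), len(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical mRNA / 18S rRNA ratios (arbitrary units), NOT from the paper.
control = [1.0, 1.1, 0.9, 1.0]
neca = [0.6, 0.7, 0.5, 0.6]

t_stat, df = student_t(neca, control)
m, sem = mean_sem(neca)
print(f"NECA group: {m:.2f} +/- {sem:.3f} (mean +/- SEM)")
print(f"t = {t_stat:.2f}, df = {df}")
```

The standard library does not provide the t distribution, so the sketch stops at the statistic. For 6 degrees of freedom, the two-tailed critical values from standard t tables are 2.447 (p < 0.05) and 3.707 (p < 0.01), against which the absolute value of t can be compared.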
4. Discussion
In the present study, an increase in total Hcy levels and AdoMet levels, and a decrease in GSH levels, occurred 1 h after the NECA treatment. These results were not due to changes in the expression of methionine metabolic enzymes, which remained unchanged 1 h after the NECA treatment (Figure 1). The effects of NECA on methionine metabolism are summarized in Figure 2. No previous study has demonstrated that adenosine has the ability to directly affect CBS; however, the overproduction of carbon monoxide (CO), which is generated by heme oxygenase (HO), is found to inhibit transsulfuration [11]. CO has been shown to inhibit CBS activity and increase AdoMet concentrations [11]. Adenosine and NECA were previously reported to markedly induce HO in macrophages [29]. Hcy, which is a substrate of CBS, may be increased by NECA via the CO-induced inhibition of CBS, and GSH may be decreased by the CO-induced inhibition of transsulfuration. However, the mechanism by which NECA affects transsulfuration in the short term has not yet been elucidated.
Figure 2. Effects of NECA on the methionine metabolic pathway. MAT: methionine adenosyltransferase, GNMT: glycine N-methyltransferase, CBS: cystathionine β-synthase, BHMT: betaine-homocysteine S-methyltransferase, MS: methionine synthase (Map is based on Sakata SF 2005).
GSH was maintained at a low level for up to 10 h by the NECA0.3 treatment and transsulfuration may have been continuously inhibited by the NECA0.3 treatment. Total Hcy levels were also continuously increased for up to 10 h by the NECA0.3 treatment, and decreased AdoHcy levels were observed 6 h and 10 h after the NECA0.3 treatment. Long-term elevations in serum total Hcy levels by NECA may be maintained by attenuating the expression of methionine metabolic enzymes via the following mechanisms: the expression of methionine metabolic enzymes in the liver was reduced 6 h after the NECA0.3 treatment (Figure 1); the flow of the methionine cycle may have been decreased by changes in the expression of MAT (decreased liver-specific MAT1A expression and increased non-liver type MAT2A expression) because MATIII (Km for methionine: 215 μM – 7 mM) is the true liver-specific isoform responsible for methionine metabolism [30] and the generation rate of AdoMet by MATII (non-liver type enzyme) was modest with a low Km (80 μM for methionine) [31]; inhibition of the methyltransferases, BHMT [32] and GNMT [33], induces hyperhomocysteinemia; decreases in AdoHcy levels may be caused by reductions in methyltransferase levels. However, the mechanisms by which NECA continuously increased total Hcy levels have not yet been elucidated in detail.
5. Conclusion
The present study confirmed that the non-specific adenosine receptor agonist NECA continuously increased total Hcy levels in the serum. The inhibition of adenosine receptors may decrease the risk of cardiovascular diseases because an increase in serum total Hcy levels is a known risk factor.
Antoniades, C., Antonopoulos, A.S., Tousoulis, D., Marinou, K. and Stefanadis, C. (2009) Homocysteine and Coronary Atherosclerosis: from Folate Fortification to the Recent Clinical Trials. European Heart Journal, 30, 6-15. http://dx.doi.org/10.1093/eurheartj/ehn515
Garcia-Tevijano, E.R., Berasain, C., Rodriguez, J.A., Corrales, F.J., Arias, R., Martin-Duce, A., Caballeria, J., Mato, J.M. and Avila, M.A. (2001) Hyperhomocysteinemia in Liver Cirrhosis: Mechanisms and Role in Vascular and Hepatic Fibrosis. Hypertension, 38, 1217-1221. http://dx.doi.org/10.1161/hy1101.099499
Araki, A., Ito, H., Majima, Y., Hosoi, T. and Orimo, H. (2003) Association between Plasma Homocysteine Concentrations and Asymptomatic Cerebral Infarction or Leukoaraiosis in Elderly Diabetic Patients. Geriatrics & Gerontology International, 3, 15-23. http://dx.doi.org/10.1046/j.1444-1586.2003.00051.x
De La Haba, G. and Cantoni, G.L. (1959) The Enzymatic Synthesis of S-Adenosyl-L-Homocysteine from Adenosine and Homocysteine. The Journal of Biological Chemistry, 234, 603-608. http://www.jbc.org/content/234/3/603.short
Variability of Gene Expression and Drug Resistance, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 1: Next Generation Sequencing (NGS)
Variability of Gene Expression and Drug Resistance
Larry H. Bernstein, MD, FCAP, Curator
LPBI
New Data Suggest Extreme Genetic Diversity of Tumors May Impart Drug Resistance
NEW YORK (GenomeWeb) – Researchers from the University of Chicago and the Beijing Institute of Genomics have undertaken one of the most extensive analyses of the genome of a single tumor and found far greater genetic diversity than anticipated. Such variation, they said, may enable even small tumors to resist treatment.
“With 100 million mutations, each capable of altering a protein in some way, there is a high probability that a significant minority of tumor cells will survive, even after aggressive treatment,” Chung-I Wu, a University of Chicago researcher and senior author of the study, said in a statement. “In a setting with so much diversity, those cells could multiply to form new tumors, which would be resistant to standard treatments.”
Extremely high genetic diversity in a single tumor points to prevalence of non-Darwinian cell evolution
A tumor comprising many cells can be compared to a natural population with many individuals. The amount of genetic diversity reflects how it has evolved and can influence its future evolution. We evaluated a single tumor by sequencing or genotyping nearly 300 regions from the tumor. When the data were analyzed by modern population genetic theory, we estimated more than 100 million coding region mutations in this unexceptional tumor. The extreme genetic diversity implies evolution under the non-Darwinian mode. In contrast, under the prevailing view of Darwinian selection, the genetic diversity would be orders of magnitude lower. Because genetic diversity accrues rapidly, a high probability of drug resistance should be heeded, even in the treatment of microscopic tumors.
The prevailing view that the evolution of cells in a tumor is driven by Darwinian selection has never been rigorously tested. Because selection greatly affects the level of intratumor genetic diversity, it is important to assess whether intratumor evolution follows the Darwinian or the non-Darwinian mode of evolution. To provide the statistical power, many regions in a single tumor need to be sampled and analyzed much more extensively than has been attempted in previous intratumor studies. Here, from a hepatocellular carcinoma (HCC) tumor, we evaluated multiregional samples from the tumor, using either whole-exome sequencing (WES) (n = 23 samples) or genotyping (n = 286) under both the infinite-site and infinite-allele models of population genetics. In addition to the many single-nucleotide variations (SNVs) present in all samples, there were 35 “polymorphic” SNVs among samples. High genetic diversity was evident as the 23 WES samples defined 20 unique cell clones. With all 286 samples genotyped, clonal diversity agreed well with the non-Darwinian model with no evidence of positive Darwinian selection. Under the non-Darwinian model, MALL (the number of coding region mutations in the entire tumor) was estimated to be greater than 100 million in this tumor. DNA sequences reveal local diversities in small patches of cells and validate the estimation. In contrast, the genetic diversity under a Darwinian model would generally be orders of magnitude smaller. Because the level of genetic diversity will have implications on therapeutic resistance, non-Darwinian evolution should be heeded in cancer treatments even for microscopic tumors.
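The statement that 23 WES samples defined 20 unique cell clones can be illustrated with a small counting sketch. It treats each distinct presence/absence pattern across the polymorphic SNVs as one clone, which is a simplification for illustration and not the authors' actual pipeline; the toy genotypes and the function name are hypothetical.

```python
from collections import Counter


def count_clones(genotypes):
    """Count distinct genotypes.

    Each genotype is a sequence of 0/1 calls (absent/present) at the
    polymorphic SNV sites; identical patterns are treated as one clone.
    """
    return Counter(tuple(g) for g in genotypes)


# Hypothetical toy data: 5 samples scored at 4 polymorphic SNVs.
samples = [
    (0, 0, 0, 0),
    (1, 0, 0, 0),
    (1, 1, 0, 0),
    (1, 0, 0, 0),  # same pattern as the second sample
    (0, 0, 1, 1),
]

clones = count_clones(samples)
print(f"{len(clones)} unique clones among {len(samples)} samples")
for pattern, n in clones.most_common():
    print(pattern, n)
```

With real data the ratio of unique clones to samples gives a quick sense of diversity: a ratio close to 1, as in the 20 of 23 reported here, means that nearly every sampled region carries its own combination of variants.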
Data deposition: The sequence data reported in this paper have been deposited in the genome sequence archive of Beijing Institute of Genomics, Chinese Academy of Sciences, gsa.big.ac.cn (accession no. PRJCA000091).
Aziz Belkadi, Alexandre Bolze, Yuval Itan, Aurélie Cobat, Quentin B. Vincent, Alexander Antipenko, Lei Shang, Bertrand Boisson, Jean-Laurent Casanova, and Laurent Abel
Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. PNAS 2015 112 (17) 5473–5478; published ahead of print March 31, 2015, doi:10.1073/pnas.1418631112
Estimating the total genetic diversity of a spatial field population from a sample and implications of its dependence on habitat area. PNAS 2005 102 (28) 9826–9829; published ahead of print July 5, 2005, doi:10.1073/pnas.0408471102
The findings, which appeared in the Proceedings of the National Academy of Sciences this week, also call into question the widely held view that evolution at the cellular level is driven by Darwinian selection, revealing a level of rapid and extensive genetic diversity beyond what would be expected under this model.
In the study, the researchers focused on a single hepatocellular carcinoma tumor, roughly the size of a ping pong ball. They sampled 286 regions from a single slice of the tumor, studying each one with either whole-exome sequencing or genotyping under both the infinite-site and infinite-allele models of population genetics.
Based on their analyses, the team estimated more than 100 million coding region mutations in what they called an “unexceptional” tumor — more mutations than would ordinarily be expected by orders of magnitude, according to Wu.
This extreme genetic diversity, the study’s authors wrote, implies evolution under the non-Darwinian mode, which is driven by random mutations largely unaffected by natural selection. It also raises the question of why there is so little apparent Darwinian selection in the tumor.
The scientists speculated that in solid tumors, cells remain together and do not migrate, “so that when an advantageous mutation indeed emerges, cells carrying it are competing mostly with themselves. These mutations may confer advantages in fighting for space or extracting nutrients, but they are stifled by their own advantages,” they wrote.
Beneficial mutations may emerge on occasion, but in solid tumors the cell populations are “so structured that selection may often be blunted,” they stated. “The physiological effect has to be very strong to overcome those constraints.” Cancer drugs could remove those constraints, loosening up a cell population and allowing competition to occur, the investigators added.
Wu and his colleagues see the presence of so many mutations in a tumor as creating problems when it comes to treatment. “It almost guarantees that some cells will be resistant,” study co-author and University of Chicago oncologist Daniel Catenacci said in the statement. “But it also suggests that aggressive treatment could push tumor cells into a more Darwinian mode.”
Overall, the findings highlight the need to consider non-Darwinian evolution and the vast genetic diversity it can confer as factors when developing treatment strategies, even for small tumors, the researchers concluded.
Human Genetics and Childhood Diseases, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 1: Next Generation Sequencing (NGS)
Human Genetics and Childhood Diseases
Curator: Larry H. Bernstein, MD, FCAP
Publication Roundup: HGMD
HGMD®, the Human Gene Mutation Database is used by scientists around the world to find information on reported genetic mutations. The papers below use the database to advance our understanding of disease, DNA dynamics, and more.
Local DNA dynamics shape mutational patterns of mononucleotide repeats in human genomes First author: Albino Bacolla
Scientists in the US and UK published results in Nucleic Acids Research of a detailed analysis of single-base substitutions and indels in the human genome. Their findings show that certain base positions are more susceptible to mutagenesis than others. They used HGMD Professional to find mutations in specific genomic regions for analysis; the paper includes charts showing mutation patterns, germline SNPs, and more from HGMD data.
High prevalence of CDH23 mutations in patients with congenital high-frequency sporadic or recessively inherited hearing loss First author: Kunio Mizutari
In this Orphanet Journal of Rare Diseases paper, scientists in Japan sequenced 72 patients with unexplained hearing loss, finding several CDH23 mutations, some of which were novel. Mutations in the gene have been linked to Usher syndrome and other forms of hereditary hearing loss. The scientists used HGMD to find all known CDH23 mutations within nearly 70 coding regions.
Mutation analyses and prenatal diagnosis in families of X-linked severe combined immunodeficiency caused by IL2Rγ gene novel mutation First author: Q.L. Bai
In Genetics and Molecular Research, scientists report the utility of mutation analysis of the interleukin-2 receptor gamma gene to assess carrier status and perform prenatal diagnosis for X-linked severe combined immunodeficiency. They studied two high-risk families, along with 100 controls, to evaluate the approach. Sequence variation was determined using HGMD Professional and an X-SCID database, and a new mutation was discovered in the project.
Impact of glucocerebrosidase mutations on motor and nonmotor complications in Parkinson’s disease First author: Tomoko Oeda
Researchers from three hospitals in Japan published this Neurobiology of Aging report that may help stratify Parkinson’s disease patients by prognosis. They sequenced mutations in the GBA gene in 215 patients, finding that those who had mutations associated with Gaucher disease suffered dementia and psychosis much earlier than those who didn’t. The team found previously reported GBA mutations using HGMD Professional.
Comprehensive Genetic Characterization of a Spanish Brugada Syndrome Cohort First author: Elisabet Selga
In this PLoS One publication, scientists from a number of institutions in Spain examined genetic variation among patients with Brugada syndrome, a rare genetic cardiac arrhythmia. They sequenced 14 genes in 55 patients, identifying 61 variants and finding the subset that appear pathogenic. Variants were filtered against a number of databases, including HGMD.
Local DNA dynamics shape mutational patterns of mononucleotide repeats in human genomes
Single base substitutions (SBSs) and insertions/deletions are critical for generating population diversity and can lead both to inherited disease and cancer. Whereas on a genome-wide scale SBSs are influenced by cellular factors, on a fine scale SBSs are influenced by the local DNA sequence-context, although the role of flanking sequence is often unclear. Herein, we used bioinformatics, molecular dynamics and hybrid quantum mechanics/molecular mechanics to analyze sequence context-dependent mutagenesis at mononucleotide repeats (A-tracts and G-tracts) in human population variation and in cancer genomes. SBSs and insertions/deletions occur predominantly at the first and last base-pairs of A-tracts, whereas they are concentrated at the second and third base-pairs in G-tracts. These positions correspond to the most flexible sites along A-tracts, and to sites where a ‘hole’, generated by the loss of an electron through oxidation, is most likely to be localized in G-tracts. For A-tracts, most SBSs occur in the direction of the base-pair flanking the tracts. We conclude that intrinsic features of local DNA structure, i.e. base-pair flexibility and charge transfer, render specific nucleotides along mononucleotide runs susceptible to base modification, which then yields mutations. Thus, local DNA dynamics contributes to phenotypic variation and disease in the human population.
INTRODUCTION
Changes in human genomic DNA in the form of base substitutions and insertions/deletions (indels) are essential to ensure population diversity, adaptation to the environment, defense from pathogens and self-recognition; they are also a critical source of human inherited disease and cancer. On a genome-wide scale, base substitutions result from the combined action of several factors, including replication fidelity, lagging versus leading strand DNA synthesis, repair, recombination, replication timing, transcription, nucleosome occupancy, etc., both in the germline and in cancer (1–4). On a much finer scale [over a few base pairs (bp)], rates of base substitutions may be strongly influenced by interrelationships between base–protein and base–base interactions. For example, the mutator role of activation-induced deaminase (AID) in B-cells during class-switch recombination and somatic hypermutation (5) targets preferentially cytosines within WRC (W: A|T; R: A|G) sequences (6), whereas apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC) overexpression displays a preference for base substitutions at cytosines in TCW contexts (7). Other examples, such as the induction of C→T transitions at CG:CG dinucleotides by cytosine-5-methylation and the role of UV light in promoting base substitutions at pyrimidine dimers have been well documented (reviewed in (4,8)). More recently, complex patterns of base substitution at guanosines in cancer genomes have been found to correlate with changes in guanosine ionization potentials as a result of electronic interactions with flanking bases (9), suggesting a role for electron transfer and oxidation reactions in sequence-dependent mutagenesis. However, despite these advances, the increasing number of sequence-dependent patterns of mutation noted in genome-wide sequencing studies has met with a lack of understanding of most of the underlying mechanisms (10).
Thus, a picture is emerging in which mutations are often heavily dependent on sequence-context, but for which our comprehension is limited.
Mononucleotide repeats comprise blocks of identical base pairs (A|T or C|G; hereafter referred to as A-tracts and G-tracts) and display distinct features: they are abundant in vertebrate genomes; mutations within the tracts occur more frequently than the genome-wide average; mutations generally increase with increasing tract length; length instability is a hallmark of mismatch repair-deficiency in cancers; and sequence polymorphism within the general population has been linked to phenotypic diversity (11–15). Thus, mononucleotide repeats appear ideal for addressing the question of sequence-dependent mutagenesis since base pairs within the tracts are flanked by identical neighbors. Both historic and recent investigations concur with the conclusion that a major source of mononucleotide repeat polymorphism is the occurrence of slippage (i.e. repeat misalignment) during semiconservative DNA replication, which gives rise to the addition or deletion of repeat units (11,12). An additional and equally important source of mutation has recently been suggested to arise from errors in DNA replication by translesion synthesis DNA polymerases, such as pol η and pol κ (13), also on slipped intermediates, leading to single base substitutions.
A key question that remains unanswered in these studies and which is relevant to the issue of sequence context-dependent mutagenesis is whether all base pairs within mononucleotide repeats display identical susceptibility to single base changes and whether indels (which are consequent to DNA breakage) occur randomly within the tracts.
Herein, we combine bioinformatics analyses on mononucleotide repeat variants from the 1000 Genomes Project and cancer genomes with molecular dynamics simulations and hybrid quantum mechanics/molecular mechanics calculations to address the question of sequence-dependent mutagenesis within these tracts. We show that mutations along both A-tracts and G-tracts are highly non-uniform. Specifically, both base substitutions and indels occur preferentially at the first and last bp of A-tracts, whereas they are concentrated between the second and third G:C base pairs in G-tracts. These positions coincide with the most flexible base pairs for A-tracts and with the preferential localization of a ‘hole’ that results when one electron is lost due to an oxidation reaction anywhere along G-tracts. Thus, despite the uniformity of sequence composition, mutations occur in a sequence-dependent context at homopolymeric runs according to a hierarchy that is imposed by both local DNA structural features and long-range base–base interactions. We also show that the repair processes leading to base substitution must differ between A- and G-tracts, since in the former, but not in the latter, base substitutions occur predominantly in the direction of the base immediately flanking the tracts. Additional sequence-dependent patterns of mutation are likely to arise from studies of more heterogeneous sequence combinations, possibly involving other aspects intrinsic to the structure of DNA.
RESULTS
Mononucleotide repeat variation is defined by tract length and flanking base composition
We define mononucleotide repeats in the GRCh37/hg19 (hg19) human genome assembly as uninterrupted runs of A:T and G:C base pairs (hereafter referred to as A-tracts and G-tracts, respectively) from 4 to 13 base pairs in length (Figure 1A). We retrieved a total of 48,767,945 A-tracts and 13,633,781 G-tracts, both of which displayed a biphasic distribution with an inflection point between tract lengths of 8 and 9 (bp) and with the number of runs declining with length more dramatically for G-tracts than for A-tracts (Figure 1B), as noted previously (29). Both the number of short tracts and the extent of decline varied with flanking base composition, TA[n]T runs being two- to three-fold more abundant than CA[n]Cs (Supplementary Figure S1A) and AG[n]As declining the most rapidly (Supplementary Figure S1B). Thus, mononucleotide runs exist as a collection of separate pools of sequences in extant human genomes, each maintained at distinctive rates of sequence stability, as determined by factors such as bp composition (A:T versus G:C), tract length and flanking sequence composition.
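The tract definition above (uninterrupted runs of 4 to 13 identical base pairs, reported together with their 5′ and 3′ nearest neighbors and oriented on the purine-rich strand) can be illustrated with a minimal Python sketch. The function name `find_tracts` and the output tuple layout are hypothetical and are not taken from the search algorithm actually used in this study; runs touching either end of the input are skipped here because one flanking base would be missing.

```python
import re

# Complement table used to report flanks on the purine-rich strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def find_tracts(seq, min_len=4, max_len=13):
    """Return (start, length, kind, flank5, flank3) for each mononucleotide run.

    start is the 0-based forward-strand coordinate of the first base of the run.
    flank5/flank3 are given relative to the purine-rich (A or G) strand, so a
    forward-strand T-run or C-run is reported as its reverse complement.
    """
    seq = seq.upper()
    tracts = []
    # Each match is a maximal run of one base, so runs are uninterrupted.
    for match in re.finditer(r"A+|C+|G+|T+", seq):
        start, end = match.start(), match.end()
        length = end - start
        if not (min_len <= length <= max_len):
            continue
        if start == 0 or end == len(seq):
            continue  # a flanking base is missing at the sequence boundary
        base = seq[start]
        left, right = seq[start - 1], seq[end]
        kind = "A-tract" if base in "AT" else "G-tract"
        if base in "TC":
            # Pyrimidine run on the forward strand: swap and complement flanks.
            flank5 = right.translate(COMPLEMENT)
            flank3 = left.translate(COMPLEMENT)
        else:
            flank5, flank3 = left, right
        tracts.append((start, length, kind, flank5, flank3))
    return tracts
```

For instance, the forward-strand sequence GTTTTTC is reported as an A-tract of length 5 flanked 5′ by G and 3′ by C, since its purine-rich strand reads GAAAAAC.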
Mononucleotide repeat variation, evolutionary conservation and association with transcription. (A) The search algorithm was designed to retrieve runs of As or Ts (A-tracts) and Gs or Cs (G-tracts) of length n (n = 4 to 13), along with their 5′ (n = 0) and 3′ (n = n + 1) nearest neighbors from hg19. Tract bases were numbered 5′ to 3′ with respect to the purine-rich sequence. The panel exemplifies the nomenclature for A- and G-tracts of length 4. (B) Logarithmic plot of the number of A-tracts (closed circles) and G-tracts (open circles) in hg19 as a function of length. (C) Normalized fractions of polymorphic tracts (F SNV) (number of SNVs divided by both hg19 number of tracts and n) from the 1KGP for A-tracts (closed circles) and G-tracts (open circles). (D) Radial plot of SNVs in the 1KGP at the 5′ and 3′ nearest neighbors of A-tracts. Periphery, tract length; horizontal axis, scale for the fraction of SNVs (F SNV). (E) Radial plot of SNVs in the 1KGP at the 5′ and 3′ nearest neighbors of G-tracts. (F) Percent difference in the numbers of A-tracts (closed circles) and G-tracts (open circles) between syntenic regions of hg19 and HN genomes. (G) The exponents of Benjamini-corrected P-values for A-tract-containing genes enriched in transcription-factor binding sites plotted as a function of A-tract length (triangles); each value represents the median of the top 11 UCSC_TFBS terms. The percent A-tracts (closed circles) and G-tracts (open circles) intersecting genomic regions pulled down by chromatin immunoprecipitation using antibodies against transcription factors are plotted as a function of tract length. (H) List of gene enrichment terms with a Benjamini-corrected P-value of <0.05 in common between genes containing A- and G-tracts of lengths 4–13, excluding the UCSC_TFBS terms.
We examined the extent of sequence variation in the human population by mapping 38,878,546 single nucleotide variants (SNVs) from 1092 haplotype-resolved genomes (the 1000 Genomes Project, 1KGP) (30) to the hg19 A- and G-tracts. The normalized fractions of polymorphic tracts (F SNV) were greater for G-tracts than A-tracts and both displayed Gaussian-type distributions, with maxima of 0.067 for G-tracts of length 8 and 0.017 for A-tracts of length 9 (Figure 1C). CA[n]C and AG[n]A runs displayed the highest F SNV values for A- and G-tracts, respectively (Supplementary Figure S1C and D), with F SNV values for AG[n]As attaining ∼0.10 at length 8. We conclude that flanking base composition influences the rates of SNV within mononucleotide runs and, as a consequence, their representation in the reference human genome.
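The normalization behind F SNV (number of SNVs divided by both the number of hg19 tracts and the tract length n, as stated in the legend to Figure 1C) amounts to a per-base-pair fraction. A minimal sketch follows; the function name is hypothetical and the counts in the usage note are invented purely for illustration, not data from this study.

```python
def normalized_fraction(n_events, n_tracts, tract_length):
    """Events per tract per base pair: n_events / (n_tracts * tract_length)."""
    if n_tracts <= 0 or tract_length <= 0:
        raise ValueError("n_tracts and tract_length must be positive")
    return n_events / (n_tracts * tract_length)
```

With made-up counts of 536 SNVs mapped to 1000 tracts of length 8, `normalized_fraction(536, 1000, 8)` gives 0.067. Dividing by n is what makes values comparable across tract lengths, since longer tracts offer more positions at which a variant can fall.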
F SNV values at the flanking 5′ and 3′ bp were similar between A- and G-tracts, except for minor differences for the least represented (i.e. longest) tracts and did not exceed 0.02 (Supplementary Figure S1E). These fractions are expected to be greater than at more distant positions from the tracts, based on previous data (29). SNVs at G-tracts, but not at A-tracts, were more frequent than at flanking base pairs. F SNVs for base pairs flanking short (≤8 bp) tracts were at least twice as high as those flanking long tracts; F SNVs also displayed distinct sequence preference with most (∼0.1) variants occurring at Ts 3′ of G-tracts (Figure 1D and E). In summary, SNVs at mononucleotide runs do not increase monotonically with length but peak at 8–9 bp. This behavior mirrors the genomic distributions, both with respect to the total number of tracts (Figure 1B) and the subsets flanked by specific-sequence combinations (Supplementary Figure S1A–D). Variation at flanking base pairs also displayed a biphasic pattern centered at a length of 8–9 bp, with a greater chance of variation adjacent to G- than A-tracts and with characteristic sequence preferences.
Long tracts are evolutionarily conserved and associated with high transcription
To assess whether more variable monosatellite runs (Figure 1C) might have undergone a greater reduction in number in extant humans relative to extinct hominids, we compared the number of A- and G-tracts between syntenic regions of five individuals comprising hg19 and three Neanderthal (HN) specimens (31). The difference between hg19 and HN was very small (<±2%) for the short tracts, but it displayed more negative values in hg19 with increasing tract length, which reached a maximum of −11.8 and −32.7% for A- and G-tracts, respectively, of length 9. Beyond this threshold, the numbers of tracts converged for A-tracts, whereas they were more abundant in hg19 for G-tracts >11 bp (Figure 1F). In summary, the largest difference in the number of mononucleotide runs between hg19 and HN sequences was centered at 9 bp for both A- and G-tracts, suggesting that the length distributions (Figure 1A and Supplementary Figure S1A and B) reflect distinct rates of evolutionary gains and losses due to differential sequence mutability (Figure 1C) as a function of length and flanking sequence composition (12).
The fact that long (>9 bp) mononucleotide runs display low variability in the human population (Figure 1C) and sequence conservation during evolutionary divergence (Figure 1F) raises the possibility that they might serve functional roles. Through gene enrichment analyses, we found that genes containing A- and G-tracts were enriched for genes associated with the term ‘UCSC_TFBS’, which pertains to transcripts harboring frequent transcription factor binding sites (32,33). For A-tract-containing genes, the median P-values for the top 11 UCSC_TFBS terms decreased from 2.95E-26 for tracts of length 4 to 5.22E-241 for tracts of length 13 (Figure 1G). The percent of A-tracts intersecting genomic fragments amplified from chromatin immunoprecipitation using transcription-factor binding antibodies (32,33) also increased from 8.7 to 9.9 from length 6 to 13, whereas it was constant (mean ± SD, 22.4 ± 1.1) for G-tracts (Figure 1G). For gene classes excluding ‘UCSC_TFBS’, a search for categories enriched at P < 0.05 and common to all A- and G-tract-containing genes returned a set of 25 terms, 22 of which were associated with high levels of tissue-specific gene expression (Figure 1H). In summary, these analyses extend prior work (14) supporting a role for mononucleotide tracts in enhancing gene expression, a function that for A-tracts appears to increase with increasing tract length.
Repeat variability is highly skewed
Next we addressed whether bp along A- and G-tracts display equal probability and type of variation. In the 1KGP dataset, the number of SNVs at each position along both A- and G-tracts of length 4 was within a two-fold difference (144,000–240,000); for both types of sequence, transitions (i.e. A→G and G→A) were the predominant (51–78%) type of base substitution (Supplementary Figure S2A and B). However, with increasing length, the number of SNVs decreased up to 30-fold more drastically for G-tracts than for A-tracts, with increasing numbers of transversions (A→T and G→C|T) being predominant. Normalizing the data for the number of tracts genome-wide revealed that the extent of SNV varied by up to 10-fold, depending upon tract length and bp position. Specifically, the highest degree of variation was observed at the first and last A within the A-tracts (i.e. A1 and An), which underwent up to 61% A→T and 43% A→C transversions, respectively, at length 9 (Figure 2A). Likewise, for G-tracts, the most polymorphic sites were G3, followed by G2, for mid-size tracts of 8–10 bp, with 44% G→C transversions at G3 for tracts of length 8 (Figure 2B). Thus, the extent of SNV at mononucleotide runs is grossly skewed in human genomes, both along the sequence itself and across tract length, which must account for the bell-shape behavior in F SNV for the tracts as a whole (Figure 1C).
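The two quantities discussed here, the transition/transversion class of each substitution and the percent of SNVs per tract position (number of SNVs at a position divided by the number of tracts, times 100), can be sketched as follows. This is an illustrative sketch only: the function names and the `(position, ref, alt)` input layout are assumptions, not the pipeline used for Figure 2.

```python
from collections import defaultdict

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def substitution_class(ref, alt):
    """Transition = purine<->purine or pyrimidine<->pyrimidine; else transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

def percent_by_position(snvs, n_tracts):
    """Percent SNVs per tract position and substitution type.

    snvs: iterable of (position, ref, alt), with position numbered 5' to 3'
    on the purine-rich strand; n_tracts: number of tracts of that length.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for position, ref, alt in snvs:
        counts[position][f"{ref}>{alt}"] += 1
    return {
        position: {sub: 100.0 * c / n_tracts for sub, c in by_sub.items()}
        for position, by_sub in counts.items()
    }
```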
Population variation spectra. (A) Variation spectra of A-tracts. Percent (number of SNVs at each position divided by the number of tracts in hg19 × 100) of A→T (black), A→C (red) and A→G (green) SNVs in the 1KGP dataset (left). Percent SNVs at A1 as a function of tract length (right). (B) Variation spectra of G-tracts. As in panel A with G→T (black), G→C (red) and G→A (cyan) (left). Percent SNVs at G3 as a function of tract length (right). (C) Percent A→T, A→C and A→G substitutions at each position along A-tracts (stars) preceded and followed by a T (TA[n]T, left), C (CA[n]C, center) and G (GA[n]G, right) as a function of tract length. (D) Percent G→T, G→C and G→A substitutions at each position along G-tracts (stars) preceded and followed by a T (TG[n]T, left), C (CG[n]C, center) and A (AG[n]A, right) as a function of tract length. (E) Percent substitutions at base pairs (stars) preceding or following A-tracts (left) and G-tracts (right) as a function of tract length (n). *, mutated position.
We assessed whether SNV hypervariability was associated with specific combinations of nearest neighbors. For A-tracts flanked 5′ by a T, C or G, the highest percentage of SNVs was observed at A1 when preceded by a T, which reached 7.9% for TA[n] tracts of length 9 (Supplementary Figure S2C). By contrast, for 3′ T, C or G, the greatest effect was elicited by a C, with the highest percentage (7.1%) of SNVs at An for A[n]C tracts of length 9 (Supplementary Figure S2D). Therefore, flanking base pairs play a critical role both in the spectra and frequencies of SNVs at A-tracts. More detailed plots along A-tracts either preceded (Supplementary Figure S2E), followed (Supplementary Figure S2F) or preceded and followed (Figure 2C) by a T, C or G revealed the dramatic and long-range (up to 9–10 bp for the longest tracts, higher than the value of 4 bp predicted by mathematical models of slippage (11)) influence of flanking base pairs on variation spectra, in which up to 95% of the changes were in the direction of the base flanking the tract. Because the number of A-tracts preceded or followed by a specific base varies by up to three-fold (Supplementary Figure S2G), we conclude that for A-tracts, the overall mutation fractions and spectra are the result of at least three variables; length, position along the tract, and base composition of the 5′ and 3′ nearest-neighbors.
For G-tracts flanked 5′ by a T, C or A, high percentages (10–12%) of SNVs were observed at G1 for tracts preceded by a C, an effect that decreased with increasing tract length (Supplementary Figure S3A). This result, together with an exceedingly low number of G→A transitions at G1 for tracts not preceded by a C (Supplementary Figure S3C) relative to all tracts (Supplementary Figure S2B), is consistent with the known high mutability of CG:CG dinucleotides as a result of cytosine-5 methylation (9). The hypermutability at G2 was observed preferentially for tracts preceded by an A, and to a lesser extent T, whereas that at G3 was insensitive to flanking sequence composition. Likewise, G-tracts flanked 3′ by a T, C or A did not display marked sequence-dependent effects (Supplementary Figure S3B). Detailed plots of the SNV spectra along G-tracts either preceded (Supplementary Figure S3D), followed (Supplementary Figure S3E), or preceded and followed (Figure 2D) by a T, C or A revealed a noticeable effect only for 5′ T in association with G→T substitutions at G1 for tracts of length ≥8. Thus, despite a consistent over-representation of G-tracts flanked 5′ by a T (Supplementary Figures S3F and S1B), which must account for the high absolute number of SNVs at G1 for TG[n] relative to AG[n] and CG[n] (Supplementary Figure S3G), nearest-neighbor base composition seems to play a lesser role in SNV spectra at G-tracts than at A-tracts.
With respect to SNVs at the flanking 5′ and 3′ nearest positions, no B→A or H→G substitutions (Figure 1A) were found above a length threshold of 9 for A-tracts and 8 for G-tracts (Figure 2E, gray shading) out of 5969 SNVs, implying that tract expansion by recruiting flanking base pairs is disfavored at these lengths. In summary, base substitution along mononucleotide repeats is strongly skewed towards the edges of A-tracts and within the 5′ half of G-tracts, with frequencies that peak at midsize lengths (8–9 bp). For A-tracts ≥7 bp, base substitution occurred almost exclusively in the direction of the flanking nearest-neighbors. Finally, base substitution at flanking bases did not contribute to tract expansion for mononucleotide runs longer than 8–9 bp.
Insertions and deletions display length and positional preference
In addition to SNVs, mononucleotide runs are polymorphic in length as a result of indels. Herein, we consider separately two types of indels: one in which tract length changes by ±1 and flanking bp composition is not altered (slippage); the other comprising all other cases involving the addition or removal of 1–200 bp (indels). Slippage is a widely accepted mutational mechanism (11–12,34), whereby DNA replication errors at reiterated DNA motifs cause changes in the number of motifs (most often +/−1). The normalized fractions of slippage in the 1KGP dataset peaked at lengths of 8 bp for A-tracts and 9 bp for G-tracts (Figure 3A), generating bell-shaped curves similar to those observed for SNVs (Figure 1C) and with no differences in the highest fraction of ‘slipped’ tracts, which peaked at ∼0.02. By contrast, +1 slippage occurred more frequently than −1 slippage at A-tracts (Figure 3B). These results support recent studies on microsatellite repeats (12) and contrast with previous conclusions that slippage increases monotonically with tract length, and that the extent of slippage differs between A- and G-tracts (35,36).
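The distinction drawn above between slippage (tract length changes by ±1 with unaltered flanking base pairs) and all other length changes (indels) can be expressed as a small classifier. The sketch below is hypothetical: it assumes that each variant has already been reduced to a reference and an alternate string, each spanning the 5′ flanking base, the tract and the 3′ flanking base, which is not necessarily how the 1KGP calls were processed in this study.

```python
def classify_length_change(ref, alt):
    """Classify a length change at a mononucleotide tract.

    ref and alt each span: 5' flanking base + tract + 3' flanking base.
    Returns 'slippage +1', 'slippage -1', 'indel' or 'no length change'.
    """
    if len(ref) < 3:
        raise ValueError("ref must contain two flanking bases and a tract")
    flank5, tract, flank3 = ref[0], ref[1:-1], ref[-1]
    base = tract[0]
    if set(tract) != {base}:
        raise ValueError("ref tract must be an uninterrupted run of one base")
    delta = len(alt) - len(ref)
    if delta == 0:
        return "no length change"
    # Slippage: same flanks, tract still a pure run of the same base, |delta| = 1.
    same_shape = (
        len(alt) >= 3
        and alt[0] == flank5
        and alt[-1] == flank3
        and set(alt[1:-1]) == {base}
    )
    if same_shape and abs(delta) == 1:
        return "slippage +1" if delta == 1 else "slippage -1"
    return "indel"
```

Under this definition CAAAAT → CAAAAAT is +1 slippage, whereas CAAAAT → CAAGAAT (an inserted G interrupting the run) and CAAAAT → CAAAAAAT (a 2-bp gain) both count as indels.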
Population insertions and deletions. (A) Normalized fractions of A-tracts (closed circles) and G-tracts (open circles) displaying +/−1 bp slippage in the 1KGP dataset as a function of tract length. Data were obtained by dividing the number of events by both the number of hg19 tracts and tract length (n). (B) Ratio of the number of +1 to −1 slippage for A-tracts (closed circles) and G-tracts (open circles). (C) Indels at A-tracts. For positions along the tracts (‘Tract’), ‘F Indel’ is the ratio between the number of indels and the number of tracts in hg19 multiplied by tract length. For the positions immediately flanking the tracts genomic coordinates (‘Before tract’ and ‘After tract’), ‘F Indel’ is the ratio between the number of indels and the number of tracts in hg19. (D) Indels at G-tracts, calculated as described in panel C. (E) Heatmap representation of insertions along A-tracts. The percent insertions (i.e. the number of insertions at each position divided by the number of tracts in hg19) (y-axis) plotted as a function of location (x-axis) from position 0 (insertion between the bp 5′ to the tract and the first bp of the tract) to position n + 1 (insertion between the bp 3′ to the last bp of the tract and the following bp) (see Figure 1A) and as a function of tract length (z-axis). (F) Heatmap representation of insertions along G-tracts.
With respect to indels, the normalized fractions were low (<1 × 10⁻³) along short (4–6 bp) A- and G-tracts, but rose to a plateau for longer tracts as reported earlier (11); this plateau was 10-fold higher for G-tracts (∼0.03) than for A-tracts (∼0.003) (Figure 3C and D). Indels also occurred more frequently (up to six-fold for A-tracts of length 11) at nearest-neighboring base pairs (‘Before tract’ and ‘After tract’ in Figure 3C and D) than along the tracts. Thus, contrary to SNVs and slippage, indels increased to a plateau with mononucleotide tract length.
We analyzed in detail the locations of insertions along the tracts and the flanking positions with respect to the 5′ to 3′ orientation of the tracts (Figure 1A). The normalized fractions demonstrated that insertions peaked at the 3′, and to a lesser extent 5′, ends of the longest A-tracts (Figure 3E), but remained low. For G-tracts, insertions occurred most efficiently at two locations (G2–3 and G5) (Figure 3F), they increased with tract length (up to ∼0.04), and attained ∼10-fold higher values than for A-tracts. In conclusion, insertion sites at A- and G-tracts followed the patterns observed for SNVs (Figure 2A and B), suggesting that factors associated with local DNA dynamics sensitize specific bases along the tracts to genetic alteration, inducing both SBS and indels.
Base pair flexibility and charge localization map to sites of sequence changes
To elucidate elements of intrinsic DNA dynamics that may be responsible for the biases in SNV and insertion sites, we performed molecular dynamics (MD) and hybrid quantum mechanics/molecular mechanics (QM/MM) simulations on model A[6], A[9], G[6] and G[9] duplex DNA fragments. We focused on water bridge coordination (Figure 4A), bp step flexibility, and for G[6] and G[9], charge localization, as these properties are known to impact the susceptibility of DNA to base damage, repair and mutation. The fractions of one water coordination increased along the A[9] and A[6] structures in a 5′ to 3′ direction, irrespective of flanking sequence composition, in concert with a decrease in minor groove width (Figure 4B and Supplementary Figure S4A) as predicted (37). Vstep, a measure of bp structural fluctuation, displayed a prominent peak of ∼40 Å³deg³ at the 5′-TA-3′ step for both structures (Figure 4C and Supplementary Figure S4B), which together with low water occupancy points to 5′-TA-3′ being a preferred location for base modification and mutation. In the G[9] and G[6] structures water coordination involved mostly two-water bridges due to wide (∼14 Å) minor grooves (Figure 4D and Supplementary Figure S4C), whereas flexibility was modest (∼20–22 Å³deg³, Figure 4E and Supplementary Figure S4D). Thus, bp dynamics are likely to impact mutations at A-tracts to a greater extent than at G-tracts. Guanine has the lowest ionization potential (IP) of all four bases and IP further decreases at guanine runs, rendering them targets for electron loss, charge localization, oxidation and eventually mutation (4,38). Because after electron loss the ensuing charge (hole) can migrate along the DNA double-helix and relocalize at specific guanines, we addressed whether the preferred sites of mutation along G-tracts, i.e. G2–3 and G5, would also be preferred sites for charge localization.
The QM/MM determinations indicated that whereas for the short G[6] fragment the difference in the density-derived atomic partial charges (DDAPC) (i.e. the hole) localized most often (∼50%) to the first position (Figure 4F), for the long G[9] fragment charge localization shifted downstream (mostly to the second, but also to positions 6–7, Figure 4G). Importantly, the charge was found exclusively around the guanine rings (Figure 4H). Thus, the two main sites of sequence change along G-tracts, i.e. G2–3 and G5, coincide with positions where charge localization, and hence one-electron oxidation reactions, are predicted to occur most frequently. In summary, bp flexibility at A-tracts and charge transfer at G-tracts likely represent intrinsic DNA features underlying the bias in SNV and insertions at mononucleotide runs in human genomes.
MD and QM/MM simulations. (A) Molecular modeling of one (left) and two (right) minor groove water bridge coordination. (B) Fraction of one-water bridge occupancy (left axis) at A[9] DNA sequences flanked 5′ and 3′ by a T (black circles), C (red circles) or G (green circles). Minor groove widths (right axis), as determined from intrastrand phosphate-to-phosphate distances. (C) Vstep for A[9] DNA sequences, determined as the product of the square root of the eigenvalues (λi) described by the six bp step parameters shift, slide, rise, tilt, roll and twist; i.e. $V_{\mathrm{step}} = \prod_{i=1}^{6} \sqrt{\lambda_i}$. (D) Fraction of one- (black circles) and two-water (red circles) bridge occupancy (left axis) at G[9] DNA sequences. Minor groove widths (right axis), as assessed from intrastrand phosphate-to-phosphate distances. (E) Vstep for G[9] DNA sequences. (F) Average charge redistribution (open circles and right axis) for G[6] DNA structures upon vertical ionization, examined by calculating the difference on the density-derived atomic partial charges (DDAPC) for the neutral and negatively charged states. Histogram of the number of instances (left axis) in which the largest charge redistribution occurred at a specific position along the G[6] structures. (G) DDAPC for G[9] DNA structures (open circles and right axis) and histogram of the number of instances (left axis) in which the largest charge redistribution occurred at a specific position. (H) VMD rendering of a G[9] DNA structure displaying hole localization at G2. Capped base pairs were removed for clarity.
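As a rough illustration of the Vstep quantity in panel C, the sketch below assumes that the eigenvalues λi are those of the covariance matrix of the six bp step parameters sampled over an MD trajectory; this interpretation is an assumption made here, consistent with the Å³deg³ units (three translations, three rotations) but not spelled out in the legend. Because the product of the eigenvalues of a matrix equals its determinant, the product of their square roots can be computed as the square root of the covariance determinant, which avoids an explicit eigendecomposition. All function names are hypothetical.

```python
import math

def covariance_matrix(samples):
    """Sample covariance (n - 1 denominator) of equal-length parameter tuples."""
    n, dim = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(dim)]
    return [
        [sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
         for j in range(dim)]
        for i in range(dim)
    ]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size, det = len(a), 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-300:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def v_step(samples):
    """prod_i sqrt(lambda_i) = sqrt(det(covariance)); assumes covariance eigenvalues."""
    return math.sqrt(max(determinant(covariance_matrix(samples)), 0.0))
```

Here each element of `samples` would be one trajectory frame given as (shift, slide, rise, tilt, roll, twist) for a single bp step; at least seven frames are needed for a non-singular 6 × 6 covariance matrix.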
Position and orientation along nucleosome core particles modulate sequence variation
DNA wrapped around histones in nucleosomes is subject to local deformation (39), which may impact mutation. Thus, we analyzed the 1KGP SNVs at A- and G-tracts predicted to overlap with well-positioned nucleosome core particles (NCPs) (16). In hg19, the percentage of tracts that overlap with NCPs decreased moderately from ∼90% at length of 4 to 81% and 71% for A- and G-tracts of length 13, respectively (Figure 5A), suggesting that mononucleotide runs are not depleted in NCPs in human genomes as previously proposed (40). A-tracts of lengths 4–8 base pairs displayed distinctive peaks along the NCP surface in phase with the helical repeat of DNA (10.5 bp) and with minor grooves facing toward the inner protein core (lengths 4–5) (16) (Figure 5B and Supplementary Figure S5A). A-tracts of length of 9–13 bp exhibited only half (six) the peaks evident for the shorter tracts. For the G-tracts, only small peaks with no clear minor groove-inward-facing regions were detected (Supplementary Figure S5B).
Positioning along nucleosome core particles. (A) Percent of A-tract (open circles) and G-tract (closed circles) base pairs in hg19 overlapping with well-positioned NCP genomic coordinates as a function of tract length. (B) Counts of base pairs in hg19 A-tracts of length 5 overlapping with NCPs genomic regions as a function of distance from the histone octamer dyad axis. Minor groove-inward-facing regions (gray) were derived from the X-ray crystal structure of NCP147 (41). (C) Percent SNVs in the 1KGP dataset (left axis) at every bp along A-tracts of length 5 for tracts centered at maxima (black) and minima (gray) along NCPs (Figure 5B). Percent increase (right axis) of SNVs at minima relative to maxima (green). P-values for paired t-tests: 0.013 (*), 0.002 (**) and 4.7 × 10⁻⁶ (***). (D) Whisker plots of %SNVs (left axis) at A1 for A-tracts of length 5 centered at maxima and minima (black) along NCPs (Figure 5B). Percent difference (right axis) in the number of A-tracts of length 5 in hg19 preceded by C, T or G (red) between those centered at minima and those centered at maxima (Figure 5B). (E) C-containing/G-containing ratios (see text) for G-tracts of length 5 in hg19 as a function of distance from the NCP dyad axis (black) and location of core histones (maroon and green). Peaks correspond to negative iSAT (i.e. tilt parameters multiplied by the corresponding sin θ) values (gray) (39). Ratios of %SNV at G1 (upshifted by 0.5 for clarity) between C-containing (5′-CCCCCG-3′ sequences on the hg19 forward strand) and G-containing (5′-CGGGGG-3′ sequences on the hg19 forward strand) (Figure 1A) CG[5] tracts mapping NCP ChIP-seq genomic intervals (red) fitted by a non-parametric local regression (loess; sampling proportion, 0.100; polynomial degree, 3). (F) VMD rendering (top) of TATTT residues 34–38 (yellow) and the complementary AAATA residues 672–753 (pink) from the 1EQZ pdb nucleosomal crystal structure, corresponding to peak area from −40 to −36 in Figure 5E. The switch in G-tract (lengths of 5 and 7) orientation along NCPs (bottom) serves to position the C-containing strand on the outside (yellow) and, correspondingly, the G-containing strand on the inside (pink).
To assess if tract-positioning along NCPs influences SNVs, we selected A-tracts of lengths 5, 7 and 9 bp and G-tracts of lengths 5 and 7 bp whose central positions coincided with either the maxima or minima (41) (Figure 5B and Supplementary Figure S5A and B) and conducted pair-wise t-tests (330 total) between permutations of ‘categories’, including ‘tracts centered at maxima versus minima’, ‘position along the tracts’, ‘flanking sequence composition’, ‘specific NCP locations’ and ‘tract orientation’. For A-tracts, 79/207 (38%) significant pairs were found, 68 (86%) of which were related to differences between tracts centered at maxima versus minima, with a preponderance (63%) of tests displaying increased %SNVs at minima (Supplementary Figure S5C and E). For example, %SNVs at length 5 bp were greater at minima than at maxima at each position along the A-tracts (Figure 5C). A→C substitutions at A1 were more abundant at maxima than at minima (mean ± SD, 18.7 ± 0.7% at max and 17.6 ± 0.8% at min; P-value 0.001), whereas A→T substitutions at the same position displayed the opposite trend (mean ± SD, 18.4 ± 0.5% at max and 19.8 ± 1.1% at min; P-value 0.0005) (Figure 5D). A-tracts of length 7 also exhibited a similar pattern at A7 (Supplementary Figure S5H). The percentages of CA[5] and A[7]C tracts in hg19 centered at maxima were greater than at minima and the reverse was observed for the TA[5] and A[7]T tracts (Figure 5D and Supplementary Figure S5H). Thus, we conclude that positioning along the NCP surface of both the double-helical grooves and junctions with flanking base pairs influence SNVs along A-tracts. However, this influence is complex and for the most part, difficult to predict.
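For readers unfamiliar with the pair-wise t-tests used above, the following sketch computes the paired t statistic and its degrees of freedom from two matched series, for example %SNVs at NCP maxima versus minima for the same set of locations. The function name and the input values in the usage note are hypothetical, and the sketch stops short of a P-value, which requires the Student t distribution (available, for instance, through `scipy.stats.ttest_rel`).

```python
import math
import statistics

def paired_t_statistic(first, second):
    """Return (t, degrees_of_freedom) for a paired t-test of second vs first.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the pairwise differences.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    if len(first) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [b - a for a, b in zip(first, second)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

With the invented series `[1, 2, 3, 4]` and `[2, 4, 5, 7]`, the differences are 1, 2, 2 and 3, giving t ≈ 4.90 on 3 degrees of freedom.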
For G-tracts, most pairwise comparisons (18/34, 53%) indicated SNV variation according to sequence orientation (Supplementary Figure S5F and G). In hg19, the ratio of the numbers of G-tracts of lengths 5 and 7 for which the C-containing strand coincided with the forward sequence (downstream example sequence in Figure 1A) to the numbers of G-tracts for which the G-containing strand coincided with the forward sequence (upstream example sequence in Figure 1A) (C-containing/G-containing ratios) displayed a prominent 10.5-bp oscillation in phase with iSAT (Figure 5E), a measure of ‘inside’ and ‘outside’ bases, according to the bp step tilt parameter (39). Analysis of the helical path of a 146-bp DNA fragment wrapped around histones showed that the oscillation in the C-containing/G-containing ratios corresponds to a preference for guanine bases to face the protein core (Figure 5F). We analyzed the subset of G-tracts preceded by a 5′ C (i.e. CG[5]) to assess whether SNVs at G1, the position known to be mutable due to CpG methylation, also oscillated with the C-containing/G-containing ratios. Oscillation in SNV-C-containing/SNV-G-containing values was evident, with peaks aligning to the hg19 troughs (Figure 5E), implying that the cytosines facing the protein surface harbor more variants than those facing away. We conclude that A- and G-tracts display preferential positioning (the former) and orientation (the latter) along NCPs, which in turn modulate the rate of sequence variation.
Mutations associated with human disease
Knowing that the first and last As of long A-tracts and G2–3 in G-tracts are the major sites of SNV in the human population, we addressed whether these features are also discernible in mutated mononucleotide tracts associated with human genetic disease. We collected 9,450,456 unique SBSs (both SBSs and SNVs refer to single base changes) from sequenced cancer genomes and normalized the percent mutations along A- and G-tracts to enable a direct comparison with the 1KGP dataset. For A-tracts (Figure 6A and Supplementary Figure S6A), SBSs displayed the same trend as the 1KGP data (Figure 2A) with respect to the bell-shape increase in mutations at A1 and An and the mutation spectra, although the susceptibility to mutation as a function of tract length attained greater values (6.36% for length 11 in cancer versus 4.15% for length 9 in the 1KGP datasets at A1). The first and last 3 bp also harbored more SBSs than in the 1KGP dataset for tracts >7 bp, a feature that we found to be due exclusively to a large cancer dataset (42) containing high-level microsatellite instability (MSI) samples (Supplementary Figure S6B and C), which are known to result from mismatch-repair deficiency (15). Thus, A-tracts display similar patterns of base substitution between the germline and somatic cancer tissues. For G-tracts, mutation spectra were characterized by G→T transversions at tract lengths >7, particularly at G1, the most frequently mutated position for tract lengths up to 11 bp (Figure 6B and Supplementary Figure S6D). This trend persisted even when the high rates of methylation-mediated deamination mutations at the CG dinucleotide were removed (Supplementary Figure S6E). Thus, mutation patterns in cancer genomes contrast with those observed in the germline, both with respect to the most mutable position (G1 versus G2–3) and the types of base substitution (G→T in cancer genomes versus G→T and G→C in the germline).
Mutation patterns in cancer genomes. (A) Mutation spectra for SBSs at A-tracts. Percent values were obtained by dividing the total number of SBSs at each position by the number of tracts in hg19 and then multiplying by 3.2516 to equalize the percentage of A-tracts of length 4 between the cancer genomes and the 1KGP datasets. (B) Mutation spectra for SBSs at G-tracts in cancer genomes. Percent values were obtained as in (A) using a multiplication factor of 3.7419. (C) Normalized fractions of A-tracts (closed circles) and G-tracts (open circles) displaying +/−1 bp slippage, obtained by dividing the number of events by both the number of tracts in hg19 and tract length. (D) Indels at A-tracts, calculated as described in Figure 3C. (E) Indels at G-tracts, calculated as described in Figure 3C. (F) Heatmap representation of insertions along G-tracts, as described in Figure 3E.
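The two normalizations described in this legend can be written out as a minimal sketch. The function names and the counts in the usage lines are hypothetical and serve only to illustrate the arithmetic; the factor of 100 that converts a fraction to a percentage is an assumption, since the legend does not state it explicitly. Only the scaling factors 3.2516 and 3.7419 come from the text.

```python
# Sketch of the normalizations described in the figure legend.
# All counts used below are hypothetical placeholders, not data from the study.

A_TRACT_SCALE = 3.2516  # scaling factor quoted for A-tracts
G_TRACT_SCALE = 3.7419  # scaling factor quoted for G-tracts


def scaled_percent(n_sbs: int, n_tracts: int, scale: float) -> float:
    """Percent of tracts mutated at one position, rescaled to match 1KGP.

    Assumes percent = 100 * (SBSs at position / tracts in hg19) * scale.
    """
    return 100.0 * n_sbs / n_tracts * scale


def slippage_fraction(n_events: int, n_tracts: int, tract_length: int) -> float:
    """Slippage events divided by both the number of tracts and tract length."""
    return n_events / (n_tracts * tract_length)


# Hypothetical example: 50 SBSs at position A1 among 10,000 A-tracts.
example_percent = scaled_percent(50, 10_000, A_TRACT_SCALE)
# Hypothetical example: 90 slippage events among 1,000 tracts of length 9.
example_fraction = slippage_fraction(90, 1_000, 9)
```

Dividing slippage counts by tract length as well as by tract number gives a per-base-pair rate, so that longer tracts are not favored simply because they contain more repeat units.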
With respect to slippage, the fractions for A-tracts elicited an excess at lengths 9 and 10 bp relative to the 1KGP dataset, which was also due to the MSI-containing dataset. For G-tracts, the fractions peaked at length 8, as for the 1KGP dataset (Figures 3A and 6C), implying that the propensity to undergo slippage is indistinguishable between the germline and soma. Indels were also more abundant at flanking base pairs than along the tracts (Figure 6D and E), particularly for G-tracts of length >7, similar to the 1KGP dataset (Figure 3C and D). Detailed analyses of insertions revealed that both G1 and the preceding position were the most significant sites of mutation (F-values up to 0.08 at G1 for tracts of length 8) (Figure 6F). Thus, the 5′ end of long G-tracts is the most susceptible site for both SBSs and insertions in cancer genomes, in contrast to the germline where these occur within the runs, typically at G2–3.
We also extracted the mutated A- and G-tracts from the Human Gene Mutation Database (HGMD), a collection of >150,000 germline gene mutations associated with human inherited disease. A total of 1519 genes were mutated at A- or G-tracts out of a total of 3972 (38%); 3480 SBSs and 2866 slippage events were noted within these tracts, 85 and 46% of which were predicted to be disease-causing, respectively (Figure 7A and Supplementary Table S1). Ranking genes by the number of literature reports indicated that among the top 10 entries three were associated with cancer (BRCA1, BRCA2 and APC), two with hemophilia (F8 and F9), four with debilitating lesions of the skin (COL7A1), muscle (DMD), lung (CFTR) and kidney (PKD1), with one causing hypercholesterolemia (LDLR) (Figure 7B). Thus, mutations within A- and G-tracts carry a high social burden by contributing to some of the most common human pathological conditions.
Mutation patterns in HGMD and model for sequence context-dependent changes. (A) Number of germline SBSs and slippage events (Slip.) at A- and G-tracts in HGMD. Gene alterations were classified as disease-causing mutation (DM), likely disease-causing mutation (DM?), disease-associated and putatively functional polymorphism (DFP), disease-associated polymorphism with additional supporting functional evidence (DP) and in vitro/laboratory or in vivo functional polymorphism (FP). Codon changes (SIFT predictor) were classified as damaging (d), null (n), tolerated (t) and low-confidence prediction (l). (B) The 10 most commonly reported genes in HGMD with mutations at A- and G-tracts. Various mutated tracts were generally reported for the same gene in different reports. (C) Mutation spectra for SBSs at A- (left) and G-tracts (right) in HGMD. Percent values were obtained by dividing the total number of SBSs at each position by the number of tracts in hg19 exons. A|G→T (black), A|G→C (red), A→G (green), G→A (cyan). (D) Normalized fractions of A-tracts (closed circles) and G-tracts (open circles) displaying +/−1 bp slippage, obtained by dividing the total number of events by the number of tracts in hg19 exons and by tract length. (E) Model for sequence context-dependent changes at A-tracts (left) and G-tracts (right). *, site of base modification.
For both A- and G-tracts, SBSs occurred mostly at tract lengths of 4–7, with patterns more similar to those in the 1KGP than in the cancer datasets, both with respect to the location of the most mutable positions (first and last As and first/second Gs) and the types of base substitution (A→T and G→H) (Figure 7C and Supplementary Figure S6F). Likewise, slippage events peaked at tract lengths of 7–9 as observed in the 1KGP dataset (Figure 7D). In summary, the patterns of both SBSs and slippage in the HGMD dataset followed the trend observed in the 1KGP dataset, suggesting that germline variants at mononucleotide repeats leading to either population variation or human inherited disease may have arisen through similar mechanisms.
DISCUSSION
Why are specific A:T and G:C base pairs within A- and G-tracts more susceptible to sequence changes than their identical neighbors? For A-tracts, bp flexibility may play a role. Chemical damage to DNA, such as by hydroxyl radicals has been shown to be proportional to the geometrical solvent-accessible surface of the atomic groups, which increases with DNA flexibility (43). Along A-tracts flexibility is restricted, but it is high at both the 5′ and 3′ junctions. Thus, the fact that the highest rates of mutation coincide with the highest degree of flexibility at the 5′-TA-3′ bp step is consistent with the view that this position may be susceptible to DNA damage as a result of flexibility. Other sources of DNA dynamics are also likely to be relevant, such as sugar flexibility at the junctions, which increases with tract length (44). Chemical modification at these junctions may then lead to base substitution and indels, the latter as a result of strand breaks.
With respect to SNV mutation spectra, these were found mostly in the direction of flanking base composition above a length of 7–8 bp. We interpret this behavior in terms of DNA slippage along A-tracts when attempts are made during translesion synthesis (TLS) to bypass a damaged site (Figure 7Ei). Two scenarios may be considered to account for A→T transversions at A1. In the first, the last tract-template base would loop out into the polymerase active site permitting base-pairing and strand elongation (Figure 7Eii) using the tract-flanking base as a template (34,45–46). In the second (Figure 7Eiii), slippage would occur behind the polymerase, prompting extension past the newly created A*:T mispair generated by primer/template misalignment. Either pathway would yield a common intermediate (Figure 7Eiv) that contains the base complementary to the junction across from the damaged site upon slippage resolution (34). Following DNA synthesis (S) and/or repair (R) (Figure 7Ev and vi), this mispair will generate a base change that is always identical to the tract-flanking base.
For G-tracts, the high rates of G→T transversions at G1 in cancer genomes are also consistent with preferred chemical attack at this site due to high flexibility (Figure 7F top). Direct chemical attack at a guanine is known to result in stable products, such as 8-oxo-G and Fapy-G, both of which are known to yield G→T transversions (47–50). Thus, G1 may be the most susceptible site for such reactions for G-tracts of lengths ≥7 (Figure 7F right), which in cancer genomes would become a mutation hotspot. In the germline, SNVs peaked inside G-tract base pairs, while mutational spectra were insensitive to flanking base composition; these events are inconsistent with a role for template misalignment and slippage as noted for A-tracts. Rather, the correspondence between hotspot mutations at G2–3 and G5 and the QM/MM simulations suggests a role for charge transfer. A large body of work during the past 20 years using computational, theoretical chemistry and biophysical techniques on short oligonucleotides has shown that guanine is the most easily oxidizable base in DNA and that indeed a guanine radical cation can be generated through long-range hole transfer from an oxidant via one-electron oxidation mechanisms (51–55). GGG triplets were found to act as the most effective traps in hole transfer by both experimental and theoretical work (56–59), demonstrating that the resulting guanine radical cation (or its neutral deprotonated form) became rather delocalized, but it preferentially centered at the first and second G. These well-established patterns of chemical reactivity are consistent with our experimental observation of high mutation frequencies at G1 for short G-tracts and the results from QM/MM simulations on G6. For longer tracts, the downstream shift in mutation hotspots, i.e., G2–3 and G5, also correlates well with the charge localization predicted from QM/MM simulations, which explicitly included solvent effects and structural fluctuations.
Thus, in conjunction with the constrained density functional theory (60), both the neutral and oxidized forms of a guanine nucleobase can be reliably constructed to infer the accurate determination of mutational patterns of mononucleotide repeats in human genomic DNA.
The compact organization of the sperm genome (61), and presumably low levels of oxidative stress in the germline, may enable guanine oxidation through one-electron oxidation reactions rather than by direct chemical attack, thereby favoring the formation of radical cations. A charge injected at G1 by electron loss would then migrate to neighboring guanines and localize at sites of low IP, such as G2 (Figure 7F left). Guanine radical cations are known to readily undergo further chemical modification leading to products such as 8-oxo-G, oxazolone, imidazolone, guanidinohydantoin, and spiroiminodihydantoin (62) (M in Figure 7F), to yield G→T, G→C and G→A substitutions (4,63). Our model is in line with recent observations in which mutations at guanines within short G-runs (1–4 bp) correlate with sequence-dependent IPs at the target guanine in cancer genomes (9). Interestingly, these correlations were not observed in the germline (9). We interpret these composite observations as follows. The IP values for G-runs have been shown to decrease asymptotically with tract length, although the absolute values vary according to the methods and assumptions used (we obtained a value of 5.43 eV for both G[6] and G[9]) (64,65). We suggest that short G-runs with high IPs undergo one-electron oxidation reactions in the oxidative environment of cancer cells but would be refractory to such a mechanism in the germline (Figure 7F right yellow and left white sectors). As length increases and IP values fall, G-runs would be attacked directly by oxidants abundant in tumor cells (Figure 7F orange sector), whereas oxidation will be limited to electron loss in the germline environment (Figure 7F left yellow sector).
These models (template misalignment for A-tracts and charge transfer for G-tracts) suggest a more complex scenario for mechanisms underlying mononucleotide repeat polymorphism in the human population than recently proposed (13), in which nucleotide misincorporation by error-prone polymerases is proposed as a primary source of mutations at both A- and G-tracts. As already stated, the directionality of SNVs toward tract-flanking bases in A-tracts and the hotspot mutations at G2–3 support multiple and distinct mechanisms of base substitution at mononucleotide repeats.
Our analyses highlight additional information, including the lack of mutations in the direction of tract-base composition for base pairs flanking long tracts, the association with gene expression and the preference of guanines for the inner NCP surface, and extend prior observations (12) such as the bell-shape character of base substitution and slippage, whose mechanisms remain to be fully clarified. Finally, we document the contribution of mononucleotide mutagenesis to key aspects of human pathology beyond the well-established MSI in cancer (15), including hemophilia and tissue degeneration. Our collective work supports the conclusion that as the human genome undergoes evolutionary diversification and along the way suffers disease-associated mutations, oxidation reactions including charge transfer may play a prominent role.
Severe combined immunodeficiency diseases (SCIDs) are a group of primary immunodeficiency diseases characterized by a severe lack of T cells (or T cell dysfunction) caused by various gene abnormalities and accompanied by B cell dysfunction (WHO, 1992; Buckley et al., 1997). The incidence in infants is 1/75,000-1/100,000 (WHO, 1992), but no morbidity statistics are available in China. The 2 genetic modes of SCID are X-linked recessive and autosomal recessive inheritance. X-linked severe combined immunodeficiency (X-SCID) is the most common form, accounting for 50-60% of SCID cases (Noguchi et al., 1993). The immune phenotype of patients with X-SCID is T-B+NK-, in which T cells (CD3+) and natural killer (NK) cells (CD16+/CD56+) are absent or significantly reduced, and the number of B cells (CD19+) is normal or increased, causing reduced immunoglobulin production and class switching disorder (Buckley, 2004; Fischer et al., 2005). Mutation of the IL-2Rg gene has been confirmed to be a major cause of X-SCID (Noguchi et al., 1993). In recent years, great progress has been made in understanding the pathogenesis of primary immunodeficiency disease and its application in clinical treatment, particularly regarding the development of critical care medicine and immune reconstruction technology. With timely control of infection and early bone marrow or stem cell transplantation, X-SCID patients can be treated, prolonging survival time. Therefore, early diagnosis of X-SCID is very important for patient treatment. Gene diagnosis has become a better method for early diagnosis and differential diagnosis. In addition, familial X-SCID brings a great psychological burden to the relatives of patients. Ordinary chromosome analysis and immunological evaluation cannot be used for female carrier identification and fetal diagnosis, and gene diagnosis is the most effective method of carrier detection and prenatal diagnosis.
In this study, we detected mutations in 2 families with X-SCID and identified 2 novel mutations, confirming the X-SCID pedigrees. Based on gene diagnosis, prenatal diagnosis was performed for the fetus carried by the mother of one of the probands. Female individuals in this family were subjected to carrier detection.
IL2Rg gene mutation test
Direct sequencing of PCR-amplified exons 1-8 and the flanking regions of the IL2Rg gene in family 1 showed that the 3rd exon of the proband contained the c.361-363delGAG heterozygous deletion mutation, which led to deletion of the 121st amino acid, glutamate (p.E121del), in its coding product. There were no sequence variations in other coding regions or in the splice regions. The proband’s mother carried the same heterozygous mutation, while his father did not carry the mutation (Figure 2a, b, c). This mutation was not observed in any member of the control group, and this family was identified as an X-SCID family. The c.510-511insGAACT heterozygous insertion mutation was present in the 4th exon of the proband’s mother in family 2. This mutation was a 5-base repeat of GAACT, resulting in a change of amino acid 173 from tryptophan into a stop codon (p.W173X). There were no sequence variations in other coding regions or in the splice regions, and the patient’s father did not carry the mutation (see Figure 2d, e). We did not find this mutation in the healthy control group. We presumed that the 4th exon of the deceased child in family 2 contained the c.510-511insGAACT insertion mutation, leading to X-SCID symptoms, and thus we speculated that this family was an X-SCID pedigree.
Prenatal diagnosis
We verified the chorionic villus sample of the fetus in family 1 using the PowerPlex 16 HS System kit. The results showed that the fetal tissue contained no maternal contamination, that the fetus was female, and that the c.361-363delGAG (p.E121del) heterozygous mutation was absent in this fetus.
Figure 2. Sequencing graph of IL2Rg gene in 2 pedigrees with X-linked severe combined immunodeficiency. a.-c. Family 1. a. Normal control (rectangle indicates the 3 bases missing in the patient). b. Proband carrying the c.361-363delGAG (p.E121del) mutation (arrow indicates the junction created by the deletion). c. The proband’s mother carried the c.361-363delGAG (p.E121del) heterozygous mutation (arrow). d.-e. Family 2. d. The proband’s mother carried the c.510-511insGAACT (p.W173X) heterozygous mutation (arrow indicates that the reverse sequencing graph was positive). e. Normal control (rectangular box indicates 2 normal copies of GAACT; the mutant fragment had 3 copies).
Carrier detection results
For the c.361-363delGAG (p.E121del) site, gene analysis of the female individuals in family 1 showed that I2 (proband’s grandmother) was a heterozygous carrier and that II3 (proband’s aunt) was a non-carrier and had no mutations.
IL-2 binds to the IL-2 receptor (IL-2R) on the immune cell membrane. IL-2R is composed of 3 subunits: the IL-2Ra chain (CD25), the IL-2Rb chain (CD122), and the IL-2Rg chain (CD132). The IL-2Rg chain is a functional unit shared with the IL-4, IL-7, IL-9, IL-15, IL-21, and other cytokine receptors, and is therefore referred to as the common chain (Li et al., 2000). The IL-2Rg chain maintains the integrity of the IL-2R complex and is required for the internalization of the IL-2/IL-2R complex; it also links the cell-surface receptor region to downstream signal transduction molecules. Therefore, the integrity of the IL-2Rg chain is vital for the immune function of an organism (Malka et al., 2008; Shi et al., 2009).
Mutations in the IL2Rg gene, which encodes IL-2Rg, were identified as a major cause of X-SCID in 1993 (Noguchi et al., 1993). The IL2Rg gene is located on chromosome Xq21.3-22, is 37.5 kb in length, and contains 8 exons, which encode the 369 amino acids of IL-2Rg. The IL-2Rg chain comprises several structural regions: the signal peptide [amino acids (AA) 1-22], extracellular domain (AA 23-262), transmembrane region (AA 263-283), and intracellular region (AA 284-369). The WSXWS motif is located in the extracellular region (AA 237-241), while Box 1 is located in the intracellular region (AA 286-294).
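The domain boundaries listed above make it straightforward to place a given residue in its structural region. The following is a minimal sketch; the table and function names are illustrative only, and the boundaries are taken exactly as quoted in the text.

```python
# Structural regions of the IL-2Rg chain, with inclusive residue ranges
# as quoted in the text. Names of the table and function are illustrative.
IL2RG_DOMAINS = [
    ("signal peptide", 1, 22),
    ("extracellular domain", 23, 262),
    ("transmembrane region", 263, 283),
    ("intracellular region", 284, 369),
]


def domain_of(residue: int) -> str:
    """Return the structural region containing a 1-based residue number."""
    for name, start, end in IL2RG_DOMAINS:
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} is outside the 369-residue chain")


# Residues affected by the two mutations reported in this study.
for residue in (121, 173):
    print(residue, domain_of(residue))
```

Both residue 121 (p.E121del) and residue 173 (p.W173X) fall within the 23-262 range, which is why both mutations are assigned to the extracellular domain.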
By the end of 2013, the Human Gene Mutation Database contained a total of 200 mutations in the IL2Rg gene (HGMD Professional 2013.4). The most common mutation types in the IL2Rg gene were missense or nonsense mutations, which result from single base changes. A total of 100 missense or nonsense mutations have been identified, followed by 50 insertion or deletion mutations. The 3rd most common type comprises approximately 30 splice-site mutations. Mutations were found in all 8 exons and were most frequent in the 3rd and 4th exons, which together accounted for 43% (86/200) of all mutations. According to the X-SCID gene database (IL2RGbase) (http://research.nhgri.nih.gov/scid/), the gene mutations in IL2Rg mainly occurred in the extracellular region of the IL-2Rg chain (Fugmann et al., 1998). Zhang et al. (2013) reported that the IL2Rg gene mutations in 10 patients with X-SCID in China were located in the extracellular region. The two mutations reported in our study were also located in the extracellular region. The IL2Rg mutation in family 1 was a 3-base deletion that removed one codon in the 3rd exon. The c.361-363delGAG (p.E121del) mutation was located in the extracellular region of the IL-2Rg subunit, and we inferred that the deletion of glutamate 121 caused by the mutation would lead to changes in the structure of the peptide chain, affecting signal transmission and resulting in serious symptoms. The mutation in family 2 was a GAACT repeat in the IL2Rg gene; this 5-base repeat changed codon 173 from tryptophan into a stop codon. The resulting peptide chain lacked 196 amino acids compared to the normal chain, including the intracellular, transmembrane, and some extracellular regions, directly affecting the structure and function of the receptor and causing disease. No studies have been reported regarding these 2 mutations.
Combining the mutation characteristics with the clinical manifestations, we diagnosed family 1 as an X-SCID pedigree. Although the patients in family 2 were deceased, it can be speculated that the 2 deceased patients in family 2 had X-SCID caused by c.510-511insGAACT (p.W173X).
Prenatal diagnosis can accurately identify the status of the fetus and be used to avoid birth defects, which can also ease the anxiety of the pregnant mother. Gene diagnosis for pedigrees of patients based on DNA samples has advanced recently, particularly with the application of high-throughput sequencing technology (Alsina et al., 2013). We can now perform gene analysis for varied clinical infectious diseases for differential diagnosis. However, the effectiveness of prenatal diagnosis for pedigrees in which the proband is dead remains unclear. Because the gene mutations in the proband are unknown in these cases, the patient’s status can only be inferred from his mother’s genotype. However, we consider that if the mother of a deceased proband can be shown to carry a pathogenic mutation, she is at risk of having children with X-SCID even if the proband did not have X-SCID, and the pedigree may show X-linked recessive inheritance. Prenatal diagnosis may provide a choice for preventing the birth of patients in these families on the premise of informed consent.
Gene diagnosis of IL2Rg can also be used for carrier detection of suspected females in the family.
In the present study, we performed carrier detection of the patient’s grandmother and aunt in family 1 and determined that the patient’s pathogenic mutations were from his grandmother. His aunt did not inherit the pathogenic gene, and thus she was a non-carrier and her fertility will not be affected. In this study, we used direct sequencing of PCR products and identified IL2Rg gene mutations in 2 pedigrees with X-SCID. We found 2 unreported mutations in the IL2Rg gene, and prenatal diagnosis and carrier detection were conducted in 1 X-SCID family. Because the incidence rate of X-SCID is extremely low, it is difficult to promote the widespread use and application of genetic diagnosis. However, this study may provide some implications for the diagnosis of infants with immunodeficiency, and gene diagnosis techniques such as conventional or high-throughput sequencing should be used as soon as possible during pregnancy, which can be used to guide treatment. This method can also provide reliable prenatal diagnosis and carrier detection service for these families.
MEF2A gene mutations and susceptibility to coronary artery disease in the Chinese population
Coronary artery disease (CAD) has high morbidity and mortality rates worldwide. Thus, the pathogenesis of CAD has long been the focus of medical studies. Myocyte enhancer factor 2A (MEF2A) was first discovered as a CAD-related gene by Wang (2005) and Wang et al. (2003, 2005). Three mutation points in exon 7 of MEF2A were subsequently identified by Bhagavatula et al. (2004); however, Altshuler and Hirschhorn (2005) and Weng et al. (2005) predicted that the MEF2A gene lacked mutations. Zhou et al. (2006a,b) analyzed the mutations and polymorphisms in exons 7 and 11 of the MEF2A gene in the Han population in Beijing, and various rare mutations were found in exon 11 rather than in exon 7. The clinical significance of specific 21-bp deletions in MEF2A was also explored, and previous studies have shown mixed results. In this study, polymerase chain reaction-single-strand conformation polymorphism (PCR-SSCP) and DNA sequencing were used to analyze exon 11 of the MEF2A gene in samples collected from 210 CAD patients and 190 healthy controls, in order to investigate the role of the MEF2A gene in CAD pathogenesis and the correlation between its variants and CAD.
CAD, a common disease in China, is induced by multiple factors, such as genetics, the environment, and lifestyle. Thus, a multi-faceted approach is necessary in the study of CAD pathogenesis, particularly in molecular biology research, which is important for developing comprehensive treatment of CAD based on gene therapy. The MEF2A gene was first identified as a CAD-related gene through linkage analysis of a large family with CAD (9 of 13 patients developed MI) in 2003.
In this study, we found the following mutations: 1) codon 451G/T (147191) heterozygous or homozygous mutation; 2) loss of 1 (Q), 2 (QQ), 3 (QQP), 6 (425QQQQQQ430), and 7 (424QQQQQQQ430) amino acids (147108-147131); and 3) codon 435G/A (147143) heterozygous mutation. Among these mutations, the synonymous mutation at locus 147191 was confirmed by reference to the National Center for Biotechnology Information (NCBI) database to be a single nucleotide polymorphism, which was also demonstrated in our study by the extensive presence of this polymorphism in healthy controls. However, the heterozygous mutation at locus 147143 was only found in the genomes of CAD patients, and was therefore identified as a mutation.
Whether MEF2A is a CAD-related gene remains controversial, with studies from several countries reporting conflicting results. Weng et al. (2005) screened gene mutations in exon 11 of the MEF2A gene from 300 CAD patients and 1500 healthy controls. They hypothesized that the changes in 5-12 CAG repeats are genetic polymorphisms and that the 21-base deletion in exon 11 of the MEF2A gene did not induce autosomal dominant genetic CAD. Gonzalez et al. (2006) suggested that the CAG repeat polymorphism was independent of MI susceptibility in Spanish patients. Kajimoto et al. (2005) reported that the CAG repeat sequence was not correlated with MI susceptibility in Japanese patients. Horan et al. (2006) also found that the CAG repeat sequence was not associated with the susceptibility to early-onset familial CAD in an Irish population. Hsu et al. (2010) identified no correlation between the CAG repeat sequence and CAD susceptibility in the Taiwanese population. Dai et al. (2010) found that the structural change in exon 11 was not related to CAD in the Chinese Han population. Lieb et al. (2008) and Guella et al. (2009) hypothesized that MEF2A was independent of CAD. However, Yuan et al. (2006) and Han et al. (2007) suggested that the CAG repeat sequence was correlated with CAD because 9 CAG repeats was an independent predictor of CAD. Elhawari et al. (2010) and Maiolino et al. (2011) suggested that MEF2A is a susceptibility gene for CAD. Dai et al. (2013) showed that mutations in exon 12 are associated with the early onset of CAD in the Chinese population. Liu et al. (2012) failed to demonstrate a correlation between the CAG repeat sequence and CAD through case-control analysis, systematic review, and meta-analysis, but found that the 21-base deletion in exon 11 was strongly associated with CAD, and that genetic variations in MEF2A may be a relatively rare, but specific, pathogenic gene for CAD/MI. Kajimoto et al. (2005) reported 4-15 CAG repeats.
However, only 4-11 CAG repeats were observed in our study, possibly because of genetic differences in patients in this study. Eleven CAG repeats were observed in most samples from the control group, and the proportion of 10, 9, and 8 repeats exceeded 1%. The heterozygous mutation at 147143, as well as the 4 and 5 CAG repeats, was only observed in CAD patients. Thus, we speculated that the CAG repeat sequence is correlated with CAD susceptibility, and the presence of 4 or 5 repeats may be a risk factor for CAD, which was inconsistent with the results obtained by Han et al. (2007). The inconsistency in these results may be explained by the differences in subjects and sample sizes among studies.
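To make the repeat counts concrete, the sketch below counts the longest uninterrupted run of CAG codons in a sequence. The sequences are toy constructs with arbitrary flanking bases, not MEF2A exon 11 sequence, and the function name is illustrative; the sketch also assumes that the repeat is a pure CAG run read in frame. It shows how a 21-base deletion (7 codons) takes an 11-repeat allele down to 4 repeats, the smallest allele observed here.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Number of CAG units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Toy alleles (hypothetical flanks "GCC" and "CCG", not real MEF2A sequence).
allele_11 = "GCC" + "CAG" * 11 + "CCG"
# Remove 21 bases (7 CAG codons) from inside the repeat.
allele_after_21bp_deletion = "GCC" + "CAG" * (11 - 7) + "CCG"

print(longest_cag_run(allele_11))
print(longest_cag_run(allele_after_21bp_deletion))
```

In practice, genotyping from sequencing traces would also need to handle interrupted repeats and heterozygous samples, which this sketch does not attempt.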
Impact of glucocerebrosidase mutations on motor and nonmotor complications in Parkinson’s disease
Here, we conducted a multicenter retrospective cohort analysis, and the data were investigated by survival time analysis to show the impact of GBA mutations on PD clinical course. We also investigated regional cerebral blood flow (rCBF) and cardiac sympathetic nerve degeneration of subjects with GBA mutations, compared with matched PD controls.
3.1. Subjects
Among the 224 eligible PD patients (the subjects were not related to each other), 9 subjects were excluded from the analysis (4 due to multiple system atrophy findings on subsequent brain MRI and 5 because of insufficient clinical information). Therefore, 215 PD patients [female, 52.1%; age, 66.7 ± 10.8 (mean ± standard deviation)] were analyzed. For non-PD healthy controls, 126 patients’ spouses (female, 58.7%; age, 67.3 ± 10.3) without a family history of PD or GD were enrolled.
3.2. GBA mutations and risk ratios for PD
In the PD subjects, we identified 10 nonsynonymous and 2 synonymous GBA variants. Within the nonsynonymous variants, 7 mutations were previously reported in GD [R120W, L444P-A456P-V460 (RecNciI), L444P, D409H, A384D, D380N, and 444L (1447-1466 del 20, insTG)] as GD-associated mutations. Three nonsynonymous mutations have never been reported in GD patients [I(-20)V, I489V, and one novel mutation (Y11H)].
GD-associated GBA mutations were found in 19 of the 215 (8.8%) PD patients but in none of the healthy controls. The risk of PD development relative to these GD-associated mutations was estimated as an OR of 25.1 [95% confidence interval (CI), 1.50–420, p = 0.0001] with 0-cell correction. The nonsynonymous mutations that were not reported in GD patients had no association with PD development (p = 0.506; OR, 1.3; 95% CI, 0.7–2.6) (Table 1). Four subjects had double mutations. For subsequent analyses, 2 subjects with double mutations of I(-20)V and K466K were assigned to the group of mutations unreported in GD, and 2 subjects with double mutations of R120W and I(-20)V, and of R120W and L336L, were assigned to the group of GD-associated mutations.
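The odds ratio with a zero cell can be reproduced from the counts given above (19 of 215 PD patients and 0 of 126 controls carrying a GD-associated mutation). The sketch below assumes that the "0-cell correction" is the common Haldane-Anscombe approach of adding 0.5 to every cell and that the interval is a Wald interval on the log scale; the text does not name the method, so both are assumptions, and the function name is illustrative.

```python
import math


def odds_ratio_with_correction(a, b, c, d, z_crit=1.96):
    """Odds ratio and Wald 95% CI after adding 0.5 to each cell.

    a, b: carriers and non-carriers among cases
    c, d: carriers and non-carriers among controls
    The 0.5 continuity correction is an assumed choice (Haldane-Anscombe).
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log_or)
    return odds_ratio, lower, upper


# Counts from the text: 19/215 PD patients, 0/126 controls.
odds_ratio, ci_lower, ci_upper = odds_ratio_with_correction(19, 215 - 19, 0, 126 - 0)
print(f"OR = {odds_ratio:.1f}, 95% CI {ci_lower:.2f}-{ci_upper:.0f}")
```

Under these assumptions the result agrees with the reported OR of 25.1 and interval of 1.50–420. The very wide interval reflects the empty control cell: the standard error is dominated by the 1/0.5 term.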
Table 1. Frequency of glucocerebrosidase gene alleles in Parkinson’s disease patients and controls
3.3. Clinical features of PD patients by GBA mutation groups
The clinical features of PD patients with GD-associated mutations, those with mutations unreported in GD, and those without mutations are shown in Table 2. In the GD-associated mutation group, females, those with a family history and those with dementia (DSM IV) were significantly more frequent than in the no-mutation group (p = 0.047, 0.012, and 0.020, respectively). The age of PD onset was lower in patients with GD-associated mutations (55.2 ± 9.9 years; mean ± standard deviation) than in those without mutations (59.3 ± 11.5 years), although the difference was not statistically significant. There were no differences in clinical manifestations between subjects with mutations unreported in GD and those without mutations, except for dopamine agonist dosage (p = 0.026) (Table 2).
Table 2. Epidemiological and clinical features of PD patients with Gaucher disease–associated GBA mutations, those with mutations previously unreported in GD and those without mutations
3.4. Survival time analyses to develop dementia, psychosis, dyskinesia, and wearing-off
Time to develop clinical outcomes (dementia, psychosis, dyskinesia, and wearing-off) was compared in 19 subjects with GD-associated mutations, 29 with mutations unreported in GD, and 167 without mutation. The median observation time was 6.0 years. The subjects with GD-associated mutations showed a significantly earlier development of dementia and psychosis, compared with subjects without mutation (p < 0.001 and p = 0.017) (Supplementary Table e-1, Fig. 1A and B). We re-reviewed the clinical record of the subject who showed early dementia (defined by DSM IV) (Fig. 1A) and confirmed that it did not satisfy the criteria for DLB (McKeith et al., 2005).
Fig. 1.
Kaplan–Meier curves of dementia and psychosis in Parkinson’s disease (PD) patients with Gaucher disease (GD)-associated glucocerebrosidase gene (GBA) mutations and those without mutations. PD patients with GD-associated GBA mutations and those without GBA mutations were compared to investigate the time taken to develop dementia (A) and psychosis (B). Because of insufficient information in several patients, the numbers in each analysis were different. The patients with and without mutations were 17 and 165 (A), 18 and 165 (B) against a total of 19 and 167. DSM IV, Diagnostic and Statistical Manual of Mental Disorders, revised fourth edition. p-Values were calculated by log-rank tests.
The associations between GBA mutations and these symptoms were estimated as HRs, adjusted for sex and age at PD onset. HRs were 8.3 for dementia (95% CI, 3.3–20.9; p < 0.001) and 3.1 for psychosis (95% CI, 1.5–6.4; p = 0.002). The times to development of wearing-off and dyskinesia did not differ significantly, with HRs of 1.5 (95% CI, 0.8–3.1; p = 0.219) and 1.9 (95% CI, 0.9–4.1; p = 0.086), respectively (Table 3).
Table 3. Hazard ratios of GBA pathogenic mutations for clinical symptoms
Model | Clinical feature | Hazard ratio | 95% CI | p
1 | Dementia (DSM-IV) | 8.3 | 3.3–20.9 | <0.001
2 | Psychosis | 3.1 | 1.5–6.4 | 0.002
3 | Wearing-off | 1.5 | 0.8–3.1 | 0.219
4 | Dyskinesia | 1.9 | 0.9–4.1 | 0.086
Each model was adjusted for sex and age at onset.
Key: CI, confidence interval; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, fourth edition; GBA, glucocerebrosidase.
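As a reading aid for Table 3, the standard error of each log hazard ratio and an approximate p-value can be recovered from the published interval. The sketch below assumes the intervals are symmetric Wald intervals on the log scale and uses a normal approximation; both are assumptions, since the text does not state how the intervals were computed. The function name is illustrative only.

```python
import math

def wald_p_from_ci(hr, lower, upper, z_crit=1.96):
    """Approximate SE(log HR), z statistic, and two-sided p from an HR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return se, z, p

# Values copied from Table 3
table3 = {
    "Dementia (DSM-IV)": (8.3, 3.3, 20.9),
    "Psychosis": (3.1, 1.5, 6.4),
    "Wearing-off": (1.5, 0.8, 3.1),
    "Dyskinesia": (1.9, 0.9, 4.1),
}

results = {}
for feature, (hr, lower, upper) in table3.items():
    results[feature] = wald_p_from_ci(hr, lower, upper)
    se, z, p = results[feature]
    print(f"{feature}: SE(log HR) = {se:.2f}, z = {z:.2f}, p = {p:.3f}")
```

Because the inputs are rounded to one decimal place, the recovered p-values only approximate the reported ones, but they fall on the same side of 0.05 for all four outcomes.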
Subjects with mutations unreported in GD did not differ significantly from subjects without mutations in time to develop any of the 4 outcomes. Therefore, subjects with GD-unreported mutations were regarded as subjects without GBA mutations in further analyses.
3.5. rCBF on SPECT in patients with GD-associated GBA mutations
We conducted pixel-by-pixel comparisons of rCBF on SPECT between PD subjects with mutations (cases) and sex-, age-, and disease duration-matched PD subjects without any mutations in GBA (controls). Four controls were selected for each case (except for a 34-year-old female case who was matched to a single control), and in total 12 cases (female 50%; age at SPECT, mean ± standard error [SE], 58.9 ± 3.3 years; disease duration at SPECT, 7.3 ± 1.5 years) and 45 controls (female 64.4%; age at SPECT, mean ± SE, 61.0 ± 1.3 years; disease duration at SPECT, 7.1 ± 0.7 years) were analyzed. A significantly lower rCBF was seen in the cases than in the controls in the bilateral parietal cortex, including the precuneus (Fig. 2).
Fig. 2.
Regional cerebral blood flow in the group with GD-associated mutations compared with the matched Parkinson’s disease group without mutations. Regions with lower regional cerebral blood flow in the group with GD-associated mutations displayed on an anatomic reference map. Abbreviation: GD, Gaucher disease.
3.6. H/M ratios on MIBG scintigraphy in patients with GD-associated GBA mutations
Cardiac MIBG scintigraphy visualizes catecholaminergic terminals in vivo, which are reduced in PD patients, as are brain dopaminergic neurons. We also compared MIBG scintigraphy results between 16 cases (female 68.8%; age at examination, mean ± SE, 60.2 ± 2.6 years; disease duration at examination, 6.2 ± 1.2 years) and 61 sex-, age-, and disease duration-matched controls (female 63.8%; age, 62.0 ± 1.1 years; disease duration, 5.5 ± 0.6 years), matched 1:4 except for one 34-year-old female case who was matched to a single control. Both early and late H/M ratios were reduced in both groups and did not differ significantly between them (p = 0.309 and 0.244, respectively) (Supplementary Table e-2).
4. Discussion
4.1. Contributions of GD-associated GBA mutations to the development of PD
In the analysis of 215 PD patients and 126 non-PD controls, we identified 10 nonsynonymous heterozygous GBA mutations, including 1 novel mutation. Among these mutations, 7 were GD-associated, and the patients carrying these mutations represented 8.8% of the PD cohort. No significant association was found between the GD-unreported mutations and PD development, which suggests that only the GD-associated mutations are a genetic risk for PD. According to a worldwide multicenter analysis of 1883 fully sequenced PD patients, GD-associated mutations are found in 7% of non-Ashkenazi Jewish PD patients (Sidransky et al., 2009). Although the mutation frequency in the present study was similar to previous results, the OR of GD-associated heterozygous mutations (25.1) was significantly greater than the OR (5.43) of other ethnic cohorts (Sidransky et al., 2009) and was consistent with an OR of 28.0 from a previous Japanese report (Mitsui et al., 2009). These results, taken together, suggest the possibility that GBA mutations confer a distinct risk for PD in the Japanese population. However, a larger Japanese cohort study is required to confirm this.
4.2. Cross-sectional clinical figures of PD with GBA mutations
4.3. Impact of GBA mutations on the clinical course of PD
To investigate the impact of GBA mutations on the clinical course of PD, a prospectively designed study over a long period is preferred. Although there have been few longitudinally designed studies to date, follow-up clinical data over a median of 6 years for 121 PD cases from a community-based incident cohort were recently reanalyzed; the results demonstrate that progression to dementia defined by DSM IV (HR 5.7) and to Hoehn and Yahr stage 3 (HR 3.2) was significantly earlier in 4 GBA mutation-carrier patients than in 117 patients with wild-type GBA (Winder-Rhodes et al., 2013). A 2-year follow-up clinical report of 28 heterozygous GBA carriers who were recruited from relatives of GD patients shows slight but significant deterioration of cognition and olfaction compared with healthy controls (Beavan et al., 2015). Brockmann et al. (2015) assessed motor and nonmotor symptoms, including cognitive and mood disturbances, for 3 years in 20 PD patients with GBA mutations and showed more rapid progression of motor impairment and cognitive decline in GBA mutation cases compared with sporadic PD controls. The current long-term retrospective cohort study of up to 12 years reinforces these results. It revealed that dementia and psychosis developed significantly earlier in subjects with GD-associated mutations than in those without mutations, and the HRs of GBA mutations were estimated at 8.3 for dementia and 3.1 for psychosis, with adjustments for sex and age at PD onset. In contrast, the results showed no significant difference in the development of wearing-off and dyskinesia.
In this study, we also investigated whether GD-unreported mutations affected the clinical course of PD. In both cross-sectional and survival time analyses, the mutations unreported in GD carried no increased burden on clinical symptoms such as dementia, psychosis, wearing-off, and dyskinesia.
4.4. Reduced rCBF in PD with GBA mutations compared with matched PD controls
We found a significantly decreased rCBF, reflecting decreased synaptic activity, in the bilateral parietal cortex including the precuneus, in subjects with GD-associated mutations compared with matched subjects without mutations. The pattern of reduced rCBF was very similar to the pattern on H₂¹⁵O positron-emission tomography that Goker-Alpan et al. (2012) reported, showing decreased resting rCBF in the lateral parietal association cortex and the precuneus bilaterally in GD subjects with parkinsonism (7 subjects with homozygous or compound heterozygous GBA mutations), compared with 11 PD patients without GBA mutations. These results suggest that PD patients with heterozygous GBA mutations and GD patients presenting with parkinsonism share a common pattern of reduced rCBF. Interestingly, in their study, rCBF in the precuneus, but not in the lateral parietal cortex, correlated with IQ, suggesting that the involvement of the precuneus is critical for defining GBA-associated patterns.
4.5. Reduced cardiac MIBG H/M ratios, as in matched PD controls
We also showed that cardiac MIBG H/M ratios in subjects with GD-associated mutations were lower than the cutoff point for PD discrimination (Sawada et al., 2009), suggesting that postganglionic sympathetic nerve terminals to the epicardium were denervated, as they are in PD without mutations.
4.6. Mechanisms of impact on PD clinical course by GD-associated GBA mutations
Experimental studies suggesting a bidirectional pathogenic loop between α-synuclein and glucocerebrosidase have accumulated (Fishbein et al., 2014, Gegg et al., 2012, Mazzulli et al., 2011, Noelker et al., 2015, Schondorf et al., 2014 and Uemura et al., 2015). Loss of glucocerebrosidase function compromises α-synuclein degradation in the lysosome, whereas aggregated α-synuclein inhibits the normal lysosomal function of glucocerebrosidase. This pathogenic loop may facilitate neurodegeneration in the GD-associated PD brain, resulting in the early development of dementia or psychosis shown in the present study. Several recent studies propose that a mechanism similar to that in PD with GBA mutations exists even in the idiopathic PD brain (Alcalay et al., 2015, Chiasserini et al., 2015, Gegg et al., 2012 and Murphy et al., 2014). On the other hand, the impact of GD-associated GBA mutations on the development of motor complications such as wearing-off and dyskinesia was not statistically significant, suggesting that other pathophysiological mechanisms in the striatal circuit emerge after long-term therapy, especially with l-dopa.
4.7. Limitations
Our study has several limitations. In the design of the study, we assumed a sample size of 215 PD patients for survival time analyses and investigated 224 PD patients. We assumed that the mutation prevalence would be 9.4%, and in fact we found 19 patients with mutations (8.5%) among the 224 patients. Based on these figures, we estimated the risk ratios of heterozygous GBA mutations for the risk of PD development and PD clinical symptoms as ORs in the cross-sectional multivariate analyses, although the 95% CIs were broad. Larger numbers of subjects will be needed to determine robust risk ratios.
Comprehensive Genetic Characterization of a Spanish Brugada Syndrome Cohort
Brugada syndrome (BrS) was identified as a new clinical entity in 1992 [1]. Six years later, the first genetic basis for the disease was identified, with the discovery of genetic variations in SCN5A [2]. Nowadays, more than 300 pathogenic variations in this first gene are known to be associated with BrS [3]. SCN5A encodes the α subunit of the cardiac voltage-dependent sodium channel (Nav1.5), which is responsible for inward sodium current (INa), and thus plays an essential role in phase 0 of the cardiac action potential (AP). Genetic variations in this gene can explain around 20–25% of BrS cases [3].
Since BrS was classified as a genetic disease, several other genes have been described to confer BrS-susceptibility [4–7]. Pathogenic variations have been mainly described in: 1) genes encoding proteins that modulate Nav1.5 function, and 2) other calcium and potassium channels and their regulatory subunits. All these proteins participate, either directly or indirectly, in the development of the cardiac AP. Although the incidence of pathogenic variations in these BrS-associated genes is low [6], it is considered that, among all of them, they could provide a genetic diagnosis for up to an extra 5–10% of BrS cases. Hence, altogether, a genetic diagnosis can be achieved approximately in 35% of clinically diagnosed BrS patients.
Other types of genetic abnormalities have been suggested to explain the remaining percentage of undiagnosed patients. Indeed, multiplex ligation-dependent probe amplification (MLPA) has allowed the detection of large-scale gene rearrangements involving one or several exons of SCN5A in BrS cases. However, the low proportion of BrS patients carrying large genetic imbalances identified to date suggests that this type of rearrangement will provide a genetic diagnosis for a modest percentage of BrS cases [8–10].
BrS has been associated with an increased risk of sudden cardiac death (SCD), despite the reported variability in disease penetrance and expressivity [11]. The prevalence of BrS is estimated at about 1.34 cases per 100 000 individuals per year, with a higher incidence in Asia than in the United States and Europe [12]. However, the dynamic nature of the typical electrocardiogram (ECG) and the fact that it is often concealed, hinder the diagnosis of BrS. Therefore, an exhaustive genetic testing and subsequent family screening may prove to be crucial in identifying silent carriers. A large percentage of these pathogenic variation carriers are clinically asymptomatic, and may be at risk of SCD, which is, sometimes, the first manifestation of the disease [13].
In the present work, we aimed to determine the spectrum and prevalence of genetic variations in BrS-susceptibility genes in a Spanish cohort diagnosed with BrS, and to identify variation carriers among relatives, which would enable the adoption of preventive measures to avoid SCD in their families.
Table 1. Demographics of the 55 Spanish BrS patients included in the study.
The table shows the demographic characteristics of all the patients included in the study. Numbers in parentheses represent the relative percentages for each condition. T1 ECG refers to Type 1 BrS diagnostic electrocardiogram (ECG), obtained either spontaneously, or after drug challenge. The information regarding both the electrophysiological studies (EPS) and the treatment was not available for all the patients. Two of the patients who did not receive any treatment died, and were not taken into account for the calculations of percentages (+2 dead). ICD, implantable cardioverter defibrillator.
Table 2. Characteristics of the Spanish BrS patients carrying rare genetic variations.
The table shows the clinical characteristics of the probands who carried rare genetic variations in SCN5A, SCN2B, or RANGRF. All of them are potentially pathogenic except that found in RANGRF, which is of unknown significance (see discussion). All the potentially pathogenic variations (PPVs) that had been previously reported, except p.P1725L and p.R1898C, had been identified in BrS patients. p.P1725L had been associated with Long QT Syndrome and p.R1898C was found in Exome Variant Server with a MAF of 0.0079%. No rare variations were identified in the control population. Patient’s age is expressed in years. Bold identifies the patients carrying variations that had not been described previously. M, male; F, female; S, syncope; ICD, implantable cardioverter defibrillator; UK, unknown; EPS, electrophysiological studies (+, positive response; -, negative response; N/P, not performed). The two patients who carried two PPVs each are identified by a and b, respectively.
We performed a genetic screening of 14 genes (SCN5A, CACNA1C, CACNB2, GPD1L, SCN1B, SCN2B, SCN3B, SCN4B, KCNE3, RANGRF, HCN4, KCNJ8, KCND3, and KCNE1L), which allowed the identification of 61 genetic variations in our cohort. Of these, 20 were classified as potentially pathogenic variations (PPVs), one as a variation of unknown significance, and 40 as common or synonymous variants considered benign.
The 20 PPVs were found in 18 of the 55 patients (32.7% of the patients, 83.3% males; Table 2). Sixteen patients (88.9%) carried one PPV, and two patients (11.1%) carried two different PPVs each. Nineteen out of the 20 PPVs identified were localized in SCN5A and one in SCN2B.
The vast majority of the PPVs identified were missense (70%). We also detected 2 nonsense variations (10%), 3 insertions or deletions causing frameshifts (15%), and one splicing variation (5%). The three frameshifts (p.R569Pfs*151, p.E625Rfs*95 and p.R1623Efs*7) were identified in SCN5A. These were not found in any of the databases consulted (see Methods), and were thus considered potentially pathogenic (see below). The other 16 rare variations identified in SCN5A had been previously described, and hence were also considered potentially pathogenic. Fourteen of them had been identified in BrS patients. Of these, 6 had also been identified in individuals diagnosed with other cardiac electric diseases (i.e. Sick Sinus Syndrome, Long QT Syndrome, Sudden Unexplained Nocturnal Death Syndrome or Idiopathic Ventricular Fibrillation [2,15,16,20,21,25]). The other 2, p.P1725L and p.R1898C, had only been associated with Long QT Syndrome or found in Exome Variant Server with a MAF of 0.0079%, respectively. Furthermore, we identified a variation in SCN2B (c.632A>G in exon 4 of the gene, resulting in p.D211G) which was considered pathogenic. This patient was included within our cohort, but the functional characterization of channels expressing SCN2B p.D211G was the object of a previous study from our group [7]. We also identified a nonsense variation in RANGRF which has previously been reported as a rare genetic variation of unknown significance [29].
Additionally, we screened the relatives of those probands carrying a PPV. We analysed a total of 129 relatives, 69 of whom (53.5%) were variation carriers. Genotype-phenotype correlations showed that 8 of the families displayed complete penetrance (S3 Table). Additionally, no relatives were available for one of the probands carrying a PPV, thus hampering genotype-phenotype correlation assessment. The other 12 families showed incomplete penetrance.
MLPA analysis
The 37 patients with negative results after the genetic screening of the 14 BrS-associated genes underwent MLPA analyses of SCN5A. This technique did not reveal any large exon deletion or duplication in this gene for any of the patients.
SCN5A p.R569Pfs*151 (c.1705dupC), a novel PPV
A 41-year-old asymptomatic male presented a type 3 BrS ECG which was suggestive of BrS. Flecainide challenge unmasked a type 1 BrS ECG (Fig 1A, left), which was also occasionally observed spontaneously during medical follow-up. Sequencing of SCN5A revealed a duplication of a cytosine at position 1705 (c.1705dupC; Fig 1A, right), which causes a frameshift that leads to a truncated Nav1.5 channel (p.R569Pfs*151). The proband’s sister also carried this duplication, but had never presented signs of arrhythmogenesis. The proband’s twin daughters were also variation carriers, displayed normal ECGs and, to date, are asymptomatic (Fig 1A, middle). Thus, p.R569Pfs*151 represents a novel genetic alteration in the Nav1.5 channel that could potentially lead to BrS, but with incomplete penetrance.
Fig 1. Characteristics of the probands carrying non-reported potentially pathogenic variations (PPVs) in SCN5A and their families.
Left: Electrocardiograms of the probands: (A) patient carrying the p.R569Pfs*151 variation, showing the ST elevation characteristic of BrS in V1 at the time of the flecainide test; (B) patient carrying the p.E625Rfs*95 variation, showing the spontaneous ST elevation characteristic of BrS in V1 and V2; and (C) patient carrying the p.R1623Efs*7 variation, showing the spontaneous ST elevation characteristic of BrS in V1 and V2. Middle: Family pedigrees. Open symbols designate clinically normal subjects, filled symbols mark clinically affected individuals and question marks identify subjects without an available clinical diagnosis. Plus signs indicate the carriers of the PPVs and minus signs, non-carriers. The crosses mark deceased individuals and arrows identify the proband. Right: Detail of the electropherograms obtained after SCN5A sequence analysis of a control subject (left panels) and of the probands (right panels).
SCN5A p.E625Rfs*95 (c.1872dupA), a novel PPV
A 51-year-old asymptomatic male was diagnosed with BrS because he presented a spontaneous ST segment elevation in leads V1 and V2 characteristic of a type 1 BrS ECG (Fig 1B, left). The sequencing of SCN5A revealed an adenine duplication at position 1872 (c.1872dupA, Fig 1B, right). This genetic variation results in a truncated Nav1.5 channel (p.E625Rfs*95). The genetic analysis of the proband’s relatives showed that only his mother carried the variation (Fig 1B, middle). She was asymptomatic, but a BrS ECG was unmasked upon ajmaline challenge. The proband’s sister was found dead in her crib at 6 months of age, which suggests that her death might be compatible with BrS. Therefore, the p.E625Rfs*95 variation in the Nav1.5 channel represents a novel genetic alteration potentially causing BrS.
SCN5A p.R1623Efs*7 (c.4867delC), a novel PPV
The proband, a 31-year-old male, was admitted to hospital after suffering a syncope. His baseline 12-lead ECG showed an ST segment elevation in leads V1 and V2 that strongly suggested BrS type 1 (Fig 1C, left). A deletion of the cytosine at position 4867 (c.4867delC) was observed upon SCN5A sequencing (Fig 1C, right). This base deletion leads to a frameshift that produces a truncated Nav1.5 channel (p.R1623Efs*7). Genetic screening of his parents and sisters showed that none of them carried this novel variation (Fig 1C, middle). None of them had presented any signs of arrhythmogenicity, nor had a BrS ECG. Nevertheless, in utero genetic analysis of one of his daughters showed that she had inherited the variation. She died at 1 year of age of non-arrhythmogenic causes. Hence, the p.R1623Efs*7 variation in the Nav1.5 channel is a novel genetic alteration that originated de novo in the proband and could potentially lead to BrS.
Synonymous and common genetic variations portrayal
In our cohort, we identified 40 single nucleotide variations which were common genetic variants and/or synonymous variants (S2 Table). Twenty-nine had a minor allele frequency (MAF) over 1%, and were thus considered common genetic variants.
We also identified 11 variants with MAF less than 1%. Of them, 9 were synonymous variants, which led us to assume that they were not disease-causing. Four of these synonymous variants were not found in any of the databases consulted, and thus their MAF was considered to be less than 1%. Each of these synonymous variations was identified in 1 patient of the cohort. A similar proportion of individuals carrying these novel variations was detected upon sequencing of 300 healthy Spanish individuals (600 alleles). The remaining 2 variants were missense, and although they had either a MAF of less than 1% or an unknown MAF according to the Exome Variant Server and dbSNP websites, they were common in our cohort (29.2 and 50%, respectively; S2 Table), and a similar MAF was detected in a Spanish cohort of healthy individuals (26.7% and 48.8%, respectively).
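The filtering logic described in the two paragraphs above can be summarised as a small decision rule. The Python sketch below is illustrative only: the `Variant` fields, the `classify` function and the category labels are hypothetical names, and the rule is a simplified reading of the text (1% MAF threshold, variants absent from databases treated as MAF below 1%, rare synonymous variants assumed benign, and missense variants that are frequent in the local population treated as common), not the authors' actual pipeline.

```python
from dataclasses import dataclass
from typing import Optional

COMMON_MAF = 0.01  # 1% threshold stated in the text

@dataclass
class Variant:
    name: str                           # hypothetical label
    database_maf: Optional[float]       # None if absent from the databases consulted
    synonymous: bool
    local_maf: Optional[float] = None   # frequency in the cohort / local controls

def classify(v: Variant) -> str:
    # Variants absent from public databases are treated as having MAF < 1%
    maf = v.database_maf if v.database_maf is not None else 0.0
    if maf > COMMON_MAF:
        return "common"
    if v.synonymous:
        return "rare synonymous (assumed benign)"
    if v.local_maf is not None and v.local_maf > COMMON_MAF:
        return "common in local population"
    return "rare non-synonymous (candidate)"

# Hypothetical examples, not variants from the study
examples = [
    Variant("frequent SNV", database_maf=0.20, synonymous=False),
    Variant("novel synonymous", database_maf=None, synonymous=True),
    Variant("locally frequent missense", database_maf=0.005, synonymous=False, local_maf=0.292),
    Variant("rare missense", database_maf=0.0001, synonymous=False, local_maf=0.009),
]
labels = [classify(v) for v in examples]
for v, label in zip(examples, labels):
    print(f"{v.name}: {label}")
```

The order of the checks matters: the database frequency is tested first, so the local-frequency check only rescues variants that public databases report as rare or do not report at all.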
Influence of phenotype and age on PPV discovery
To assess whether a connection existed between the probands’ phenotype and the PPV detection yield, we classified the patients in our cohort according to their ECG (spontaneous or induced type 1), the presence of BrS cases within their families, and the presence/absence of symptoms. Even though the overall PPV detection yield was 32.7%, it was even higher for symptomatic patients (Fig 2). Indeed, in this group of patients, having a family history of BrS was identified as a factor for increased PPV discovery yield. In the absence of BrS in the family, the variation discovery yield was almost double for those patients having a spontaneous type 1 BrS ECG than for patients with drug-induced type 1 ECG (45.5% vs 25%, respectively). In addition, we identified a PPV in 44.4% of the asymptomatic patients who presented a family history of BrS and a spontaneous type 1 BrS ECG. When the patient presented a drug-induced type 1 ECG or in the absence of family history of BrS, the PPV discovery yield was around 15%.
Fig 2. Influence of the phenotype on PPV discovery yield.
Bar graph comparing the PPV detection yield in 8 different clinical categories (stated below the graph). Each bar shows the total number of patients for each clinical category divided in those with a PPV (black) and those without an identified PPV (white). The number of patients (in brackets) and percentages are given. Pos, positive; Neg, negative; Spont, spontaneous type 1 BrS ECG; Drug, drug-induced type 1 BrS ECG; n, number of patients.
We also investigated the role of age on the PPV occurrence. No significant age differences were observed between variation carriers and non-carriers (38.6±10.3 and 43.5±14.4, respectively, p = 0.16). However, the PPV discovery yield was higher for patients with ages between 30 and 50 years: out of the total of patients carrying a PPV, 83.3% of the patients were in this age range, while 11.1% were younger and 5.6% were older patients (Fig 3A, upper panel). The PPV discovery yield was significantly higher for symptomatic than for asymptomatic patients (42.3% vs 24.1%, respectively; Fig 3A, lower panels).
Fig 3. Influence of age on PPV discovery yield.
(A) Pie charts showing the distribution of patients in the overall population as well as in the categories of symptomatic and asymptomatic patients regarding PPV discovery. The percentage and the number of patients (in brackets) are given for each group. The small pie charts correspond to the age distribution of patients with an identified PPV. (B) Bar graphs of the PPV detection yields obtained for each of the age groups (< 30 years, 30–50 years and > 50 years). Numbers inside each bar correspond to the number of patients carrying a PPV for each category and the percentages represent the variation detection yield.
Notably, in the 30–50 age range, 52.9% (9/17) of the symptomatic patients and 35.3% (6/17) of asymptomatic patients carried one PPV (Fig 3B, middle). Additionally, 40% (2/5) of the symptomatic young patients (< 30 years) were variation carriers, while no PPVs were identified in asymptomatic patients within this age range.
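The detection yields quoted in this section are simple proportions of PPV carriers within each group. The short Python sketch below recomputes them from the counts given in the text. The symptomatic and asymptomatic group sizes (26 and 29) and carrier counts (11 and 7) are not stated directly; they are back-calculated here from the reported percentages and the 7/18 asymptomatic carriers mentioned in the Discussion, and should be read as an assumption.

```python
def detection_yield(carriers: int, total: int) -> float:
    """PPV detection yield as a percentage of patients in the group."""
    return 100 * carriers / total

groups = {
    "overall": (18, 55),
    "symptomatic (back-calculated)": (11, 26),
    "asymptomatic (back-calculated)": (7, 29),
    "symptomatic, 30-50 years": (9, 17),
    "asymptomatic, 30-50 years": (6, 17),
    "symptomatic, < 30 years": (2, 5),
}

yields = {name: detection_yield(c, n) for name, (c, n) in groups.items()}
for name, value in yields.items():
    print(f"{name}: {value:.1f}%")
```

The back-calculated group sizes add up to the 55 patients of the cohort, and the carrier counts add up to the 18 PPV carriers, which is why they were chosen.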
Overall, 55 unrelated Spanish patients clinically diagnosed with BrS were included in our study. Table 1 shows the demographics of this cohort, and Table 2 and S1 Table show the clinical and genetic characteristics of all the patients included in the study. The mean age at clinical diagnosis was 41.9±13.3 years. Although the majority of patients were males (74.5%), their age at diagnosis did not differ from that of females (41.8±12.1 years and 42.3±16.3 years, respectively; p = 0.92). A type 1 BrS ECG was present spontaneously in 37 patients (67.3%), and drug challenge revealed a type 1 BrS ECG for the remaining 18 patients (32.7%). Almost half of the patients had experienced symptoms, including 2 SCD and 4 aborted SCD. Patients who had not previously experienced any signs of arrhythmogenicity despite having a BrS ECG were considered asymptomatic. Comparison of symptomatic vs asymptomatic patients showed a similar percentage of males (73.1% and 75.9%, respectively). However, the mean age at diagnosis was different between the two groups of patients (37.7±14.3 and 45.7±11.4, respectively; p<0.05).
Discussion
To the best of our knowledge, this is the first comprehensive genetic evaluation of 14 BrS-susceptibility genes and MLPA of SCN5A in a Spanish cohort. Well-delimited BrS cohorts from Japan, China, Greece and even Spain have been genetically studied [24,30–32]. Additionally, an international compendium of BrS genetic variations identified in more than 2100 unrelated patients from different countries was published in 2010 [3]. However, all these studies screened SCN5A exclusively. In 2012, Crotti et al. reported the spectrum and prevalence of genetic variations in 12 BrS-susceptibility genes in a BrS cohort [5]. However, this study included patients of different ethnicity. Here, we report the analysis of 14 genes which has been conducted on a well-defined BrS cohort of the same ethnicity.
Our results confirm that SCN5A is still the most prevalent gene associated with BrS. Indeed, the proportion of SCN5A-mediated BrS in our cohort (30.9%) is higher than that described in other European reports [3,23], where a potentially causative variation is identified in only 20–25% of BrS patients. The reason for this discrepancy is unclear but could point towards a higher prevalence of SCN5A PPVs in the Spanish population or to selection bias. Additionally, we identified a genetic variation in SCN2B (c.632A>G, which results in p.D211G). We have previously published the comprehensive electrophysiological characterization of this variation, and showed that it could indeed be responsible for the phenotype of the patient, thus linking SCN2B with BrS for the first time [7]. Also, we identified a variation in RANGRF. This variation (c.181G>T leading to p.E61X) had been previously reported in a Danish atrial fibrillation cohort [33]. Surprisingly, the authors reported an incidence of 0.4% for this variation in the healthy Danish population, which brought into question its pathogenicity. Our finding of this variation in an asymptomatic patient displaying a type 2 BrS ECG also points toward considering it as a rare genetic variation with a potential modifier effect on the phenotype but not clearly responsible for the disease [29].
No PPVs were identified in the other genes tested. Certainly, it is well accepted that the contribution of these genes to the disease is minor, and thus should only be considered under special circumstances [13,34]. In addition, recent studies have questioned the causality of variations identified in some of these minority genes [35].
We also used the MLPA technique for the detection of large exon duplications and/or deletions in SCN5A in patients without PPVs, and no large rearrangements were identified. This is in accordance with previous reports, which revealed that such imbalances are uncommon [8–10].
Kapplinger et al. [3] reported a predominance of PPVs in transmembrane regions of Nav1.5. Indeed, it has been proposed that most rare genetic variations in interdomain linkers may be considered as non-pathogenic [36]. In contrast, PPVs identified in this study are mainly located in extracellular loops and cytosolic linker regions of Nav1.5 (Fig 4). Additionally, 2 of our previously unreported frameshifts are located in the DI-DII linker. These 2 genetic variations lead to truncated proteins, which would lack around 75% of the protein sequence, and thus are presumed to be pathogenic.
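The extent of truncation can be estimated directly from the protein-level variant names. In a name such as p.R569Pfs*151, the number after the reference residue is the position of the first altered amino acid, so every residue before it is still native sequence. The Python sketch below parses the three novel frameshifts and estimates the share of native Nav1.5 sequence lost. It assumes a canonical Nav1.5 length of 2016 amino acids, a value not given in the text, and the function names are illustrative.

```python
import re

NAV15_LENGTH = 2016  # assumed canonical Nav1.5 length in amino acids (not stated in the text)

# One-letter protein-level frameshift notation, e.g. p.R569Pfs*151
FS_PATTERN = re.compile(r"^p\.([A-Z])(\d+)([A-Z])fs\*(\d+)$")

def parse_frameshift(name: str):
    """Return (reference residue, position, new residue, stop offset)."""
    match = FS_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a frameshift description: {name!r}")
    ref, pos, alt, stop = match.groups()
    return ref, int(pos), alt, int(stop)

def native_fraction_lost(name: str, protein_length: int = NAV15_LENGTH) -> float:
    """Fraction of the native sequence missing from the truncated protein."""
    _, pos, _, _ = parse_frameshift(name)
    native_kept = pos - 1  # residues upstream of the first altered position
    return 1 - native_kept / protein_length

lost = {}
for variant in ("p.R569Pfs*151", "p.E625Rfs*95", "p.R1623Efs*7"):
    lost[variant] = native_fraction_lost(variant)
    print(f"{variant}: about {100 * lost[variant]:.0f}% of native sequence lost")
```

Under the assumed length, the two DI-DII linker frameshifts lose roughly 70% of the native sequence, of the same order as the figure of around 75% given above, whereas p.R1623Efs*7 truncates the channel much closer to its C-terminus.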
Fig 4. Nav1.5 channel scheme showing the relative position of the SCN5A PPVs identified in our cohort.
Open symbols indicate already described variations and closed symbols locate novel variations reported in this study. DI to DIV designate the 4 domains of the protein, and numbers 1–6 identify the different segments within each domain. Crosses mark the voltage sensor.
In our cohort, we have identified 40 synonymous or common genetic variations, 4 of which have not been previously reported. These variations are gradually becoming more and more important in the explanation of certain phenotypes of genetic diseases. Only a few common variations identified here are already published as phenotypic modifiers [37,38]. The effect of these and other common variants identified in our cohort on BrS phenotype should be further studied.
Unexpectedly, almost 40% (7/18) of the PPV carriers did not present signs of arrhythmogenicity. We also performed genotype-phenotype correlations of the PPVs identified in the families (S3 Table). These studies uncovered relatives, most of whom were young individuals, who carried a familial variation but had never exhibited any clinical manifestations of the disease. This is in agreement with Crotti et al. and Priori et al. [5,23], who postulated that a positive genetic testing result is not always associated with the presence of symptoms. Indeed, the existence of asymptomatic patients carrying genetic variations described to cause a severe Nav1.5 channel dysfunction has been reported [39]. The identification of silent carriers is of paramount importance since it allows the adoption of preventive measures before any lethal episode takes place. Unknown environmental factors, medication and modifier genes have been suggested to influence and/or predispose to arrhythmogenesis [11]. Hence, this group of patients has to be cautiously followed in order to avoid fatal events.
Our analysis of the relationship between patients’ phenotype and PPV detection yield identified the presence of symptoms as a factor associated with a higher variation discovery yield. Within the group of symptomatic individuals, a PPV was identified in a higher proportion of patients displaying a spontaneous type 1 BrS ECG than of patients showing a drug-induced ECG. Likewise, among the asymptomatic patients with a family history of BrS, those who presented a spontaneous type 1 BrS ECG carried a PPV more often than those with a drug-induced ECG (Fig 2). With regard to age, the vast majority (17/20, 85%) of the PPVs were identified in patients around their fourth decade of age (30–50 years). This is in accordance with the accepted mean age of disease manifestation. Moreover, in this age range, more than 50% of the patients who presented symptoms carried a variation that could be pathogenic (Fig 3). Importantly, 35.3% of asymptomatic patients of around 40 years of age also carried one such variation. These data highlight the importance of performing a genetic test even in the absence of clinical manifestations of the disease, particularly in patients in the 30–50 years age range, which is in accordance with consensus recommendations [13,34].
In conclusion, we have, for the first time, analysed 14 BrS-susceptibility genes and performed MLPA of SCN5A in a Spanish BrS cohort. Our cohort showed male prevalence with a mean age of disease manifestation around 40 years. BrS in this cohort was almost exclusively SCN5A-mediated. The mean PPV discovery yield in our Spanish BrS patients is higher than that described for other BrS cohorts (32.7% vs 20–25%, respectively), and is even higher for patients in the 30–50 years age range (up to 53% for symptomatic patients). All this evidence supports genetic testing, at least of SCN5A, in all clinically well-diagnosed BrS patients.
Study Limitations
First of all, drug challenge tests were not performed for all the relatives who were asymptomatic variation carriers. This hampered their clinical diagnosis and is an impediment to definitively assessing the link between PPVs and BrS. These patients are currently under follow-up.
New PPVs have been identified in our cohort. The clinical information available for the families suggests that these new variations could be pathogenic. Still, in vitro studies of these variations are required in order to evaluate their functional effects and verify their pathogenic role. Additionally, genotyping in an independent cohort would help reduce the likelihood of type I (false positive) error in genetic variant discovery.
We have to acknowledge that the study set is relatively small. Consequently, the classification of patients according to the different clinical categories rendered rather small sub-groups, which may lead to over-interpretation of the results. Future studies will be directed to the genetic screening of additional Spanish BrS patients, which will probably reinforce the significance of the tendencies observed here.
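The effect of small sub-groups on the precision of the reported proportions can be illustrated with a short calculation. The sketch below is not part of the original study: it takes two counts quoted in the discussion (7/18 and 17/20) and computes approximate 95% Wilson score intervals. The choice of the Wilson method and the helper name `wilson_interval` are assumptions made for illustration only.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Counts quoted in the discussion above; labels are paraphrased.
subgroups = {
    "PPV carriers without signs of arrhythmogenicity": (7, 18),
    "PPVs identified in patients aged 30-50 years": (17, 20),
}

for label, (k, n) in subgroups.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With group sizes of around 20, the intervals span several tens of percentage points, which is consistent with the caution about over-interpretation expressed above.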
Single Nucleotide Repair and Tunable DNA-directed Assembly of Nanomaterials, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 1: Next Generation Sequencing (NGS)
Larry H. Bernstein, MD, FCAP, Curator
LPBI
Expanding DNAzyme functionality through enzyme cascades with applications in single nucleotide repair and tunable DNA-directed assembly of nanomaterials
Many biological functions require two or more enzymes working together in cascades. While many examples of protein and RNA enzyme cascades are known, few enzyme cascades containing solely DNAzymes have been reported. Herein we demonstrate the combination of an 8–17 DNAzyme with RNA cleavage activity and an E47 DNAzyme with DNA ligation activity to achieve a new function of single ribonucleotide repair in DNA while maintaining the integrity of the original DNA sequence, which is difficult for a single DNAzyme to achieve. In addition, this method is applied to modify the sequences of DNA strands immobilized on the surface of nanoparticles to control the DNA-directed assembly selectively and sequentially. Such an approach can be applied to other DNAzymes with different activities to expand the functions of DNAzymes and the scope of their applications.
The discovery of deoxyribozymes (DNAzymes) with enzymatic activity in the 1990s1,2 has demonstrated that DNA molecules are not simply inert biopolymers for genetic storage; they can be active catalysts as well.3–8 Since then, many DNAzymes have been obtained with catalytic functions such as cleavage,2,9–13 ligation,14–16 phosphorylation,17 adenylation18 or depurination19 of nucleic acids, as well as other reactions including porphyrin metallation, C–C bond formation, nucleopeptide linkage formation, oxygen transfer and thymine dimer repair.20–26 Because DNAzymes are facile to synthesize and more stable than protein and RNA enzymes, they have been widely used in applications such as nanomaterial assembly,27,28 biosensing,29–31 logical computing,32 nanomachine engineering,33 antiviral or gene therapy,34 and in vitro RNA manipulation.35 Despite these successes, the application of DNAzymes is limited by the narrower range of catalytic functionality compared to protein enzymes. One possible approach to addressing this issue would be to combine enzymes with different reactivities to form a cascade of successive enzymatic reactions, which together create new functionality. Indeed, many such examples exist in biology, since nearly all important biological functions, such as the pathways involved in DNA repair and protein synthesis, require a cascade of multiple protein enzymes to carry out their full function. In contrast, little has been reported about the use of DNAzyme cascades to realize enhanced functionality. Such a strategy could expand the functionality of DNAzymes to a level more on par with protein and RNA enzymes, which should greatly increase the range of possible applications.
One such application is single nucleotide repair, i.e., excision of a misincorporated ribonucleotide in single-stranded DNA and subsequent insertion of the corresponding deoxyribonucleotide at the excision site. The misincorporation of ribonucleotides into DNA strands can occur from exposure to external oxidizing agents or ionizing radiation,36 or spontaneously during DNA replication.37 Misincorporation of ribonucleotide can distort the structure of DNA,38 reduce its stability,39 and interfere with the normal interaction between DNA and DNA polymerases.40 In fact, the overexpression of DNA polymerases that are prone to ribonucleotide misincorporation has been linked to many cancers, including ovarian, prostate, breast and colon cancers.41 In nature, protein enzymes such as RNase H and FEN-1 can efficiently excise misincorporated ribonucleotides in DNA by cleaving the DNA at the ribonucleotide site and then restoring the correct deoxyribonucleotides by DNA polymerases,42,43 which is an example of an enzyme cascade. It would be interesting to find out if a similar function could be achieved through DNAzyme cascades.
Another potential application is in tuning the properties of DNA-functionalized nanomaterials. For example, DNA-functionalized gold nanoparticles27 have emerged as an attractive platform for biosensing,32,44–50 nanomedicine,45 and as building blocks for controlled nanoassemblies.51–55 Although much research has been focused on the surface modification of gold nanoparticles with DNA for various applications, there are still limited methods to modify the sequences of DNA already immobilized on gold nanoparticles in order to make the properties of the DNA-modified nanomaterials tunable after fabrication. The use of DNAzymes is a promising approach for DNA modification on nanomaterials56 due to the excellent stability of DNAzymes and their smaller size compared to protein enzymes, thereby minimizing steric effects between the enzyme and the DNA in order to avoid reduction in reaction efficiency. However, it is still very challenging to modify a specific DNA sequence on multiple-DNA-functionalized nanomaterials to tune their functions in a selective and sequential fashion.
Herein, we demonstrate a cascade of two DNAzymes with RNA cleavage and DNA ligation activities, respectively, in order to carry out single nucleotide repair or selective sequence modification of DNA. In a one-pot reaction, a single misincorporated ribonucleotide in a DNA strand was converted to the corresponding deoxyribonucleotide while maintaining sequence integrity. Furthermore, the sequences of DNA strands immobilized on multiple functional nanoparticles were successfully modified in order to control and alter the DNA-directed assembly of nanoparticles in a stepwise and selective fashion.
Results and discussion
To demonstrate that single nucleotide repair in DNA can be achieved by the cascade of two DNAzymes, we used a 26-nt DNA strand (O1) containing a misincorporated cytidine (rC) ribonucleotide as an example. The goal was to convert the rC in O1 into a deoxycytidine (C), as seen in O4 (Fig. 1a), while maintaining the integrity of the DNA sequence. The DNAzymes 17Em1 (Fig. 1a, blue) with RNA cleavage activity2,57–59 and E47 (Fig. 1a, red) with DNA ligation activity14,60 were chosen as the cascade pair in this study. The 17Em1 DNAzyme catalyzes the hydrolysis of the 3′ phosphodiester linkage of the internal rC in the DNA strand when metal ion cofactors such as Pb2+ and Zn2+ are present (Fig. S1a in ESI†). On the other hand, the E47 DNAzyme can induce the catalytic ligation of the 5′-OH of the DNA substrate with another 3′-phosphorylated DNA strand (activated by imidazole)14 in the presence of Cu2+ or Zn2+ as the metal cofactor (Fig. S1b in ESI†). Therefore, by sequential cleavage and ligation reactions catalyzed by these two DNAzymes on O1 containing rC, O1 could first be cleaved at the 3′ phosphodiester of the rC by 17Em1 and then undergo ligation at the cleavage site with another 3′-phosphorylated DNA strand of an identical sequence (except with deoxyribonucleotide C in place of ribonucleotide rC) by E47. The product O4 has a sequence identical to the starting strand O1, with the rC replaced with C.
Fig. 1 (a) Conversion of a single ribonucleotide (rC) in a DNA strand O1 to a deoxyribonucleotide (C) by the cascade of DNAzymes 17Em1 and E47: O1 is cleaved by 17Em1 to afford products of O2 and Oc; O2 is then ligated with O3 (activated by imidazole) to form O4 by E47. (b) Sequence modification of a DNA strand O5 to O4 through a similar protocol by the cascade of DNAzymes 17Em2 and E47.
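The cleave-then-ligate logic of Fig. 1a can be sketched as a toy string model. This is an illustration only: the 26-nt sequence below is hypothetical (the published O1 sequence is not given in this text), the function names are invented for the example, and the chemistry (metal cofactors, imidazole activation, hybridization) is not modeled.

```python
# Toy model of the two-step cascade. Strands are lists of residue tokens
# written 5' to 3'; "rC" marks the ribonucleotide, all other tokens are DNA.
# The sequence is hypothetical, NOT the published O1 sequence.


def cleave_after_ribo(strand):
    """Model 17Em1: cut the 3' phosphodiester of the single ribonucleotide.

    Returns (five_prime_fragment, three_prime_fragment). The 5' fragment
    ends with the ribonucleotide (Oc); the 3' fragment corresponds to O2.
    """
    ribo_positions = [i for i, nt in enumerate(strand) if nt.startswith("r")]
    if len(ribo_positions) != 1:
        raise ValueError("expected exactly one ribonucleotide")
    cut = ribo_positions[0] + 1
    return strand[:cut], strand[cut:]


def to_deoxy(strand):
    """Replace ribonucleotide tokens (e.g. 'rC') with deoxy tokens ('C')."""
    return [nt[1:] if nt.startswith("r") else nt for nt in strand]


def ligate(donor, acceptor):
    """Model E47: join the 3' end of the donor to the 5'-OH of the acceptor."""
    return donor + acceptor


o1 = list("ACTCACTAT") + ["rC"] + list("GGAAGAGATGTCGCAT")  # 26 residues
oc, o2 = cleave_after_ribo(o1)   # step 1: cleavage
o3 = to_deoxy(oc)                # replacement strand: same sequence, C for rC
o4 = ligate(o3, o2)              # step 2: ligation

print("O1:", "".join(o1))
print("O4:", "".join(o4))
```

In this model O4 differs from O1 only at the repaired position, which mirrors the stated goal of keeping the rest of the sequence intact.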
Initially, 3′-fluorescein-labeled O1 was treated with 17Em1 to form DNA duplex O1-17Em1 via 18 matched base pairs (9 on each binding arm). In the presence of Pb2+, O1 was efficiently cleaved by 17Em1 into fragments O2 and Oc, resulting in the dehybridization of the duplex because the melting temperature of the duplex between 17Em1 and O2 or between 17Em1 and Oc is below room temperature (Fig. 1a). The fluorescence image after polyacrylamide gel electrophoresis (PAGE) suggested the complete cleavage of O1 and formation of O2 (Fig. 2a, lanes 1 and 2 for O1 and O2, respectively), while Oc was not visible on the gel due to the lack of a fluorescein label. The cleavage reaction product O2 was also confirmed by matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry (Table 1 and Fig. S2 in ESI†). Control experiments using a DNAzyme of a different sequence (17Em2) or without Pb2+ showed negligible cleavage of the substrate O1 (Fig. S3 in ESI†) due to the specificity of the DNAzyme and the essential role of the metal ion cofactor.2,57–59 Subsequently, without any purification of O2 from the mixture solution after the previous cleavage step, E47 and 3′-phosphorylated O3 (imidazole-activated) were added into the solution to generate another DNA complex O2–O3-E47, which gave O4 as the product after the E47-catalyzed ligation reaction in the presence of Cu2+ (Fig. 1a).14,60 The formation of O4 was confirmed by both fluorescent PAGE (Fig. 2a, the upper band of lane 3) and MALDI-TOF MS (Table 1 and Fig. S2 in ESI†), while some unreacted O2 was also observed on the gel (Fig. 2a, the lower band of lane 3). Here, O3 was invisible due to the lack of a fluorescein label. Considerably lower levels of ligation between O2 and O3 were observed if either E47 or Cu2+ was absent (Fig. S3 in ESI†).
Together these results indicate that the reactions catalyzed by the DNAzyme cascade were achieved through a one-pot reaction without isolation and purification of the intermediate O2.
Fig. 2 (a) Fluorescent PAGE (20% denaturing gel) images of the transformation from O1 to O4 by DNAzymes 17Em1 and E47. Lanes in (a): 1, O1; 2, 1 after cleavage by 17Em1 to yield O2 and Oc in the presence of Pb2+; 3, 2 after ligation to yield O4 by E47 in the presence of O3 and Cu2+; 4, 2 after ligation to yield O4 + 8A by E47 in the presence of O3 + 8A and Cu2+; 5, O4 in the presence of Pb2+ and 17Em1. (b) Fluorescent PAGE images of the transformation from O5 to O4 by 17Em2 and E47: Lanes in (b): 1, O5; 2, 1 after cleavage by 17Em2 to yield O2 in the presence of Pb2+; 3, 2 after ligation to yield O4 by E47 in the presence of O3 and Cu2+; 4, 2 after ligation to yield O4 + 8A by E47 in the presence of O3 + 8A and Cu2+; 5, O4 in the presence of Pb2+ and 17Em2
To provide further confirmation of the above successful conversion of rC in O1 to C in O4, while keeping other sequences identical, a longer O3 + 8A (O3 extended by A8 at the 5′ end) was used in place of O3 (Fig. 1a). Under the same conditions, a longer product O4 + 8A was obtained (Fig. 2a, the upper band of lane 4) with slower gel migration compared to O4 (Fig. 2a, the upper band of lane 3), suggesting that the ligation reaction occurred mostly between O2 and imidazole-activated O3, and not between O2 and un-activated Oc (Oc is the product from the previous cleavage reaction of O1 and 17Em1), in which case a band with the same migration as O4 would have been observed. The presence of C rather than rC in the product O4 was supported by the lower molecular weight of O4 in the MALDI-TOF mass spectrum as compared to that of O1 (Table 1), as well as the increased resistance to hydrolysis of O4 even in the presence of Pb2+ and 17Em1 (Fig. 2a, lane 5), which can catalyze the cleavage of a substrate containing an internal ribonucleotide linkage (O1),2,57 but not a substrate containing entirely deoxyribonucleotides (O4).
Table 1 Measured and calculated molecular weight (m/z) in MALDI-TOF mass spectra of 3′-fluorescein-labeled DNAs (O1, O2, O4 and O5). For full spectra, see Fig. S2 in ESI.†
DNA          O1       O2       O4       O5
Measured     8831.6   4824.1   8812.3   8819.1
Calculated   8831.9   4826.3   8815.9   8821.9
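The mass argument can be checked with simple arithmetic on the values in Table 1. A ribonucleotide differs from the corresponding deoxyribonucleotide by one oxygen atom (2′-OH versus 2′-H), so O4 should be lighter than O1 by roughly 16 mass units. The sketch below only restates the table values; the variable names are chosen for illustration.

```python
# Values copied from Table 1 (m/z of the 3'-fluorescein-labeled strands).
measured = {"O1": 8831.6, "O2": 4824.1, "O4": 8812.3, "O5": 8819.1}
calculated = {"O1": 8831.9, "O2": 4826.3, "O4": 8815.9, "O5": 8821.9}

# Standard atomic weight of oxygen; rC and C differ by one oxygen atom.
OXYGEN_MASS = 15.999

# Per-strand deviation between measurement and calculation.
for name in measured:
    error = measured[name] - calculated[name]
    print(f"{name}: measured - calculated = {error:+.1f}")

calc_shift = calculated["O1"] - calculated["O4"]
meas_shift = measured["O1"] - measured["O4"]
print(f"O1 - O4 (calculated): {calc_shift:.1f}")
print(f"O1 - O4 (measured):   {meas_shift:.1f}")
print(f"one oxygen atom:      {OXYGEN_MASS:.1f}")
```

The calculated shift matches the mass of one oxygen atom, and the measured shift has the same sign and a similar size once the per-strand deviations of a few mass units are taken into account.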
In addition to the single nucleotide repair functionality, it is also possible to use this methodology to edit the sequence of a DNA strand, which was used to convert the DNA strand O5 into O4 using the same cascade and conditions as before (Fig. 1b and S4 in ESI†). The product O4 was confirmed by PAGE (Fig. 2b) and MALDI-TOF MS (Table 1 and Fig. S2 in ESI†) and found to be identical to that obtained from the method in Fig. 1a.
Encouraged by the above results, we applied this method to modify the sequence of DNA immobilized on gold nanoparticles27 (AuNPs) to control the DNA-directed assembly of the AuNPs in a selective manner. DNA-functionalized AuNPs have been used in a variety of applications due to both their unique properties and the sequence-dependent hybridization of ssDNA immobilized on the AuNPs for controlled assembly.51–55 As shown in Fig. 3, when AuNPs are functionalized by complementary DNAs, the AuNPs can assemble into an “aggregated” state via DNA hybridization, which shows red-shifted and broadened absorption spectra compared to AuNPs functionalized by non-complementary DNAs.
Fig. 3 Assembly of two types of DNA-functionalized gold nanoparticles. If the sequences of the two DNAs are not complementary, the gold nanoparticles are in a “dispersed” state and exhibit a red color with a sharp absorption band peaked around 532 nm (left). In contrast, the assembly of the gold nanoparticles with complementary DNAs causes the formation of an “aggregated” state with a broad absorption band around 600 nm (right).
By modifying the sequences of DNA on the AuNPs, the assembly of the particles can be effectively controlled. Although methods for fabrication of DNA-functionalized AuNPs have been developed,27 there are still limited methods to modify the DNA sequences already immobilized on AuNPs in order to tune their functions. It is even more challenging to modify a specific DNA sequence on multiply-functionalized AuNPs with different DNA sequences on each nanoparticle. Selective modification can allow each different function of the AuNP to be controlled in a selective fashion for potential applications.
AuNPs of 13 nm diameter were functionalized with DNA molecules via 3′-end thiols and used for this study. The formation of the AuNP assembly was confirmed by TEM images (Fig. S5 in ESI†) and characterized by large changes in absorption spectra27 (A700/A532 changed from <0.15 to >0.50 as illustrated in Fig. 5 and Table S1 in ESI†). As shown in Fig. 4, O6-functionalized AuNPs (red) were found to be able to form aggregates with O9-functionalized AuNPs (blue) via DNA-directed assembly through 12 complementary base pairs (Fig. 5 and S5 and Table S1†), but not with O10-functionalized AuNPs (purple), because of the 4 mismatched base pairs in the middle of the binding arm (Fig. 5 and S5 and Table S1†). After being treated with 17Em2 and Pb2+, the O6 on the surface of AuNPs was cleaved and converted to O8, which could not hybridize with either O9 or O10 efficiently. Thus no DNA-directed assembly was observed between the resulting O8-functionalized AuNPs and either O9- or O10-functionalized AuNPs (Fig. 5 and S5 and Table S1†). However, after a subsequent ligation reaction catalyzed by E47 in the presence of imidazole-activated O3 and Cu2+, O8 on the surface of AuNPs could be extended to O7, making the AuNPs capable of assembling with O10-, but not O9-functionalized AuNPs (Fig. 5 and S5 and Table S1†).
Fig. 4 Controlling the assembly of DNA-functionalized gold nanoparticles via cascade-mediated modification of the DNA sequences. The solid and dashed lines indicate the successful and unsuccessful formation of aggregates, respectively. The inset shows the assembly of AuNPs modified with the complementary strands O6 and O9 (top) and the lack of assembly between AuNPs modified with noncomplementary strands O9 and O7 (obtained by treating O6-functionalized AuNPs with 17Em2 and E47) (bottom)
Fig. 5 Absorption spectra of O6-, O7- or O8-functionalized AuNPs in the presence of O9- and O10-functionalized AuNPs, respectively. The red-shift of the peak indicates the formation of AuNP aggregations due to the hybridization of complementary DNAs on the AuNPs.27 For the ratios of absorbance (A700/A532), see Table S1 in ESI.†
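The absorbance-ratio readout can be expressed as a small classification rule. The two thresholds below come from the text (A700/A532 below 0.15 before assembly and above 0.50 after assembly); the "indeterminate" label for intermediate ratios, the function name, and the sample absorbance values are assumptions added for illustration and are not measurements from the paper.

```python
DISPERSED_MAX = 0.15   # from the text: A700/A532 < 0.15 for dispersed AuNPs
AGGREGATED_MIN = 0.50  # from the text: A700/A532 > 0.50 for assembled AuNPs


def assembly_state(a700, a532):
    """Classify an AuNP sample from its absorbance at 700 nm and 532 nm."""
    ratio = a700 / a532
    if ratio < DISPERSED_MAX:
        return "dispersed"
    if ratio > AGGREGATED_MIN:
        return "aggregated"
    # Assumption: the text gives no rule for ratios between the two thresholds.
    return "indeterminate"


# Hypothetical absorbance pairs (A700, A532), for illustration only.
samples = {"sample 1": (0.05, 0.50), "sample 2": (0.30, 0.50), "sample 3": (0.15, 0.50)}
for name, (a700, a532) in samples.items():
    print(f"{name}: A700/A532 = {a700 / a532:.2f} -> {assembly_state(a700, a532)}")
```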
Interestingly, the product, O7-functionalized AuNPs, showed inverse characteristics in the formation of DNA-directed assembly with O10- and O9-functionalized AuNPs, compared to the original O6-functionalized AuNPs. TEM images of O6-functionalized AuNPs, either with or without treatment with 17Em2/E47, mixed with O10-functionalized AuNPs, are displayed in the inset of Fig. 4. These results clearly demonstrate that the ability to edit and replace DNA on AuNPs allows for exquisite programmable control over the assembly of nanoparticles.
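The switching behaviour described for singly functionalized AuNPs can be summarized as a lookup-table sketch. The pairings and conversion steps are taken from the text (O6 pairs with O9, O8 with neither, O7 with O10; 17Em2 converts O6 to O8 and E47 converts O8 to O7). Treating an enzyme as having no effect on any other strand is a simplifying assumption, and the names `PARTNERS`, `STEPS`, `apply_steps` and `assembles` are invented for this example.

```python
# Which partner-coated AuNPs each surface strand can assemble with (from the text).
PARTNERS = {"O6": {"O9"}, "O8": set(), "O7": {"O10"}}

# Conversions reported in the text: 17Em2 with Pb2+ cleaves O6 to O8;
# E47 with Cu2+ and activated O3 extends O8 to O7.
STEPS = {("O6", "17Em2"): "O8", ("O8", "E47"): "O7"}


def apply_steps(strand, enzymes):
    """Apply DNAzyme treatments in order; unlisted combinations leave the strand unchanged."""
    for enzyme in enzymes:
        strand = STEPS.get((strand, enzyme), strand)
    return strand


def assembles(strand, partner):
    """True if AuNPs carrying `strand` assemble with AuNPs carrying `partner`."""
    return partner in PARTNERS[strand]


for treatment in ([], ["17Em2"], ["17Em2", "E47"]):
    surface = apply_steps("O6", treatment)
    result = {p: assembles(surface, p) for p in ("O9", "O10")}
    print(f"treatment {treatment or 'none'}: surface strand {surface}, assembly {result}")
```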
Taking advantage of the specificity of DNAzymes for their nucleic acid substrates through complementary base pairing in the binding arms, selective modification of DNA sequences on the surface of multiply functionalized AuNPs was also achieved in this work. As depicted in Fig. 6, O6 (red) and O11 (blue) bi-functional AuNPs could be modified by 17Em1, 17Em2 and E47 selectively and sequentially. As shown in Fig. 6, AuNPs capable of forming DNA-directed assembly with both (A and E), either (B, C and F), or neither (D) of the O9- and O10-functionalized AuNPs could be obtained, as monitored by the significant increase of A700/A532 that indicates assembly formation (Fig. 7 and Table S2 in ESI†). Such a result, which is challenging to achieve by other techniques, can be used for the construction of tunable nanoassemblies for various applications.
Fig. 6 (a) Scheme showing stepwise modification of DNA sequences on multiply-functional gold nanoparticles by the collaboration of 17Em1 or 17Em2 and E47. (b) DNA sequences of O6–O8 and O11.
Fig. 7 Absorption spectra of DNA-functionalized gold nanoparticles (A–F) (Fig. 6a) in the presence of O9- and O10-functionalized AuNPs, respectively. The red-shift of the peak indicates the formation of AuNP aggregations due to the hybridization of complementary DNAs on the AuNPs.27 For the ratios of absorbance (A700/A532), see Table S2 in ESI.†
In summary, by putting together a cascade of two DNAzymes with cleaving and ligating activities, we have generated a new functionality for effective DNA modification. This function was applied in the conversion of a single misincorporated ribonucleotide into the corresponding deoxyribonucleotide in DNA and the modification of DNA sequences on the surface of gold nanoparticles to modify and control their self-assembly through DNA hybridization. The results suggest that combining DNAzymes with different catalytic activities may achieve more interesting functions and thus broaden the applications of DNAzymes.
Notes and references
1 D. L. Robertson and G. F. Joyce, Nature, 1990, 344, 467–468.
2 R. R. Breaker and G. F. Joyce, Chem. Biol., 1994, 1, 223–229.
3 D. Sen and C. R. Geyer, Curr. Opin. Chem. Biol., 1998, 2, 680–687.
4 Y. F. Li and R. R. Breaker, Curr. Opin. Struct. Biol., 1999, 9, 315–323.
5 Y. Lu, Chem.–Eur. J., 2002, 8, 4588–4596.
6 R. R. Breaker, Nature, 2004, 432, 838–845.
7 K. Schlosser and Y. F. Li, Chem. Biol., 2009, 16, 311–322.
8 S. K. Silverman, Acc. Chem. Res., 2009, 42, 1521–1531.
9 R. R. Breaker and G. F. Joyce, Chem. Biol., 1995, 2, 655–660.
10 N. Carmi, S. R. Balkhi and R. R. Breaker, Proc. Natl. Acad. Sci. U. S. A., 1998, 95, 2233–2237.
11 A. R. Feldman and D. Sen, J. Mol. Biol., 2001, 313, 283–294.
12 J. W. Liu, A. K. Brown, X. L. Meng, D. M. Cropek, J. D. Istok, D. B. Watson and Y. Lu, Proc. Natl. Acad. Sci. U. S. A., 2007, 104, 2056–2061.
13 M. Chandra, A. Sachdeva and S. K. Silverman, Nat. Chem. Biol., 2009, 5, 718–720.
14 B. Cuenoud and J. W. Szostak, Nature, 1995, 375, 611–614.
15 A. Sreedhara, Y. F. Li and R. R. Breaker, J. Am. Chem. Soc., 2004, 126, 3454–3460.
16 W. E. Purtha, R. L. Coppins, M. K. Smalley and S. K. Silverman, J. Am. Chem. Soc., 2005, 127, 13124–13125.
17 Y. F. Li and R. R. Breaker, Proc. Natl. Acad. Sci. U. S. A., 1999, 96, 2746–2751.
18 Y. F. Li, Y. Liu and R. R. Breaker, Biochemistry, 2000, 39, 3106–3114.
19 T. L. Sheppard, P. Ordoukhanian and G. F. Joyce, Proc. Natl. Acad. Sci. U. S. A., 2000, 97, 7802–7807.
20 Y. F. Li and D. Sen, Nat. Struct. Biol., 1996, 3, 743–747.
GEN Tech Focus: Rethinking Gene Expression Analysis, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 1: Next Generation Sequencing (NGS)
Larry H. Bernstein, MD, FCAP, Curator
LPBI
Quantitating gene expression is essential for researchers to answer important biological questions about basic cellular functions, as well as disease states. In the following articles you will discover the multitude of advances investigators have made to accurately measure and quantitate genetic transcripts within the cell.
The Gene-Expression Undergrowth Has Been Well Trodden, but RNA Paths Want Wear, Too
Hepatitis C virus depends on a functional interaction between its genome and miR-122 for viral stability and replication. Researchers recently used an antisense oligonucleotide that targets the liver-specific microRNA miR-122, blocking its function. [Bluebay2014/Fotolia]
A great deal of research on pathway analysis is currently focusing on RNA rather than proteins, and the complex RNA networks that regulate gene expression.
With the realization that more than 90% of the genome that is transcribed into RNA is not translated into protein, and the growing numbers of naturally occurring microRNAs (miRNAs) and long noncoding RNAs (lncRNAs) being identified and characterized, the important role these RNAs play in normal biological processes and across human diseases is becoming increasingly clear.
This knowledge—combined with the available technology and strategies to decipher RNA pathways and link alterations in the levels or activity of miRNAs or lncRNAs to gene expression, epigenetic mechanisms, and protein activity in normal and disease phenotypes—is driving the development and clinical testing of novel drug targets and therapeutics that target regulatory RNAs.
For example, a microRNA was targeted in a Phase II clinical study that assessed the effect of miravirsen, an antisense oligonucleotide, in patients with hepatitis C. The study, which was described in 2013 in the New England Journal of Medicine, indicated that miravirsen sequesters the liver-specific microRNA miR-122 in a highly stable heteroduplex, thereby inhibiting its function.
Hepatitis C virus (HCV) depends on a functional interaction between its genome and miR-122 for viral stability and replication. According to the study, inhibition of miR-122 in HCV-infected patients was associated with decreased levels of HCV RNA that continued beyond the treatment period, without evidence of viral resistance.
The therapeutic potential of regulatory RNAs is also being assessed in other conditions such as cancer. Specifically, miRNAs and other ncRNAs in cancer initiation, progression, and metastasis are being studied by George Calin, M.D., Ph.D., a professor of experimental therapeutics, MD Anderson Cancer Center, University of Texas. Dr. Calin’s group is scouring the “microRNAome” to identify miRNAs of about 21–22 nucleotides that can serve as reliable biomarkers for cancer diagnosis and to guide decision-making in patient management, including as predictors of survival and response to drug therapy.
miRNAs are involved in every aspect of tumorigenesis, cancer progression, and dissemination. Not only are they expressed in tumor cells, they are also stably expressed in exosomes and are present in various bodily fluids, where they can act like hormones and signaling molecules. Comparative profiling of these fluids for differences in miRNA levels between patients with and without cancer could identify relevant biomarkers.
Analyzing RNA Pathways
Using Qiagen’s Ingenuity Pathway Analysis, researchers can analyze relationships between molecules and diseases of interest by modeling how gene expression patterns affect functional outcomes or disease processes.
Dr. Calin and colleagues have described the significance of miRNA signatures obtained in recent studies involving miRNA profiling of human tumors. An overview appeared in 2014 in CA: A Cancer Journal for Clinicians (“MicroRNAome genome: a treasure for cancer diagnosis and therapy”). Also, last February, Dr. Calin gave an account of his group’s work at the Molecular Med Tri Conference in San Francisco.
Technology is not holding back advances in the field of RNA pathway analysis, according to Dr. Calin. The main bottleneck at present is in the design of prospective studies needed to confirm the predictive value of miRNA-based biomarkers.
Dr. Calin points to two other key challenges that scientists currently face in translating research findings into diagnostic, prognostic, and therapeutic tools. One is the difficulty in selecting an miRNA target, mainly because an individual miRNA could have a role in regulating tens, hundreds, or even thousands of protein-coding genes. For drug discovery, the aim is to identify miRNAs that affect a single pathway of interest to help limit off-target effects. The need for novel delivery systems for RNA-targeted drugs is another key challenge.
At the Molecular Med Tri Conference, Jean-Noel Billaud, Ph.D., principal scientist at Qiagen Bioinformatics, presented a case study demonstrating how the company’s Ingenuity Pathway Analysis technology can be used to conduct a systems biology analysis to identify the pathways, potential upstream regulators, and downstream outcomes involved in the host response to West Nile Virus (WNV) infection. Dr. Billaud also discussed how to interpret the results from a biological perspective.
In his presentation, Dr. Billaud described the first step in this analytical process as the acquisition of RNA sequence data using next-generation sequencing techniques for the purpose of characterizing and quantifying differential gene expression between an infected and uninfected cell. The CLC Cancer Research Workbench tool is used to process the sequence data, and the results are imported directly into the IPA system.
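As a rough illustration of what "quantifying differential gene expression" between two conditions means, the sketch below computes log2 fold changes from read counts. It is a generic textbook calculation, not the CLC or IPA workflow described by Dr. Billaud; the gene names and counts are made up, the pseudocount and the cutoff of 1 are arbitrary assumptions, and a real analysis would also need normalization, replicates and statistical testing.

```python
import math

# Hypothetical read counts (invented gene names and numbers, for illustration only).
counts = {
    "GENE_A": {"infected": 820, "uninfected": 100},
    "GENE_B": {"infected": 95, "uninfected": 410},
    "GENE_C": {"infected": 300, "uninfected": 310},
}

PSEUDOCOUNT = 1  # assumption: avoids division by zero for unexpressed genes
CUTOFF = 1.0     # assumption: |log2 fold change| >= 1 means at least a two-fold change


def log2_fold_change(infected, uninfected, pseudocount=PSEUDOCOUNT):
    """log2 ratio of infected to uninfected counts, with a pseudocount added to both."""
    return math.log2((infected + pseudocount) / (uninfected + pseudocount))


for gene, c in counts.items():
    lfc = log2_fold_change(c["infected"], c["uninfected"])
    if lfc >= CUTOFF:
        call = "up in infected cells"
    elif lfc <= -CUTOFF:
        call = "down in infected cells"
    else:
        call = "no clear change"
    print(f"{gene}: log2 fold change = {lfc:+.2f} ({call})")
```

A list of up- and down-regulated genes of this kind is the sort of input that pathway-level questions, such as those listed next, are asked of.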
Analysis of differential gene expression aims to answer a series of key questions, including the following: What metabolic and/or signaling pathway(s) is activated or inhibited? Is there an overlap of the genes or pathways that are activated or inhibited? What are the potential upstream, downstream, functional, and phenotypic implications of this pathway activation or inhibition?
Dr. Billaud described other questions researchers might attempt to answer through the use of IPA: What are the underlying transcriptional programs? Which biological processes are involved and in what way? Are there splice variants of interest? What type of regulation is involved?
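As a rough illustration of what "differential gene expression between an infected and uninfected cell" means in practice, the sketch below computes a log2 fold change per gene and labels each gene as up, down, or unchanged. It is not the CLC or IPA workflow: the gene names, read counts, pseudocount, and threshold are all assumptions chosen for the example, and a real analysis would also normalize the data and test replicates for statistical significance.

```python
import math
from typing import Dict, Tuple

def log2_fold_change(infected: float, uninfected: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of infected to uninfected counts; the pseudocount avoids division by zero."""
    return math.log2((infected + pseudocount) / (uninfected + pseudocount))

def classify(counts: Dict[str, Tuple[float, float]], threshold: float = 1.0) -> Dict[str, str]:
    """Label each gene by comparing its log2 fold change with a symmetric threshold."""
    calls = {}
    for gene, (infected, uninfected) in counts.items():
        lfc = log2_fold_change(infected, uninfected)
        if lfc >= threshold:
            calls[gene] = "up"
        elif lfc <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls

# Hypothetical read counts: (infected, uninfected)
counts = {
    "gene_a": (799, 99),   # 8-fold higher in infected cells
    "gene_b": (24, 99),    # 4-fold lower in infected cells
    "gene_c": (100, 100),  # no change
}
calls = classify(counts)
print(calls)
```

Lists of "up" and "down" genes like these are the kind of input that pathway tools then map onto known signaling and metabolic pathways.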
In the WNV case study, IPA predicted activation of the interferon signaling pathway and added statistically and functionally relevant biological processes to the WNV-related biochemical network the system developed. IPA is able to simulate the effects of interferon pathway activation on neighboring molecules and processes, which enables broader modeling of antiviral responses, prediction of the effects on viral replication, and identification of upstream transcriptional regulators of antiviral and related anti-inflammatory processes, for example.
These data and analytical capabilities may allow researchers to propose new hypotheses that connect molecules in regulatory networks to disease-related pathways in a predictive way, leading to the identification of a “master regulator” that could serve as a disease-specific drug target, according to Dr. Billaud.
In the WNV example, he described the use of the Molecule Activity Predictor (MAP) function in IPA to test the hypothesis that CLEC7A is a host susceptibility factor required by WNV to stimulate an immune response in the brains of infected patients, contributing to the development of life-threatening encephalitis. The MAP function simulates the inhibition or downregulation of CLEC7A, showing how it would likely reduce the risk of WNV-associated encephalitis. These types of hypotheses would then need to be tested and validated.
Pathways Driving B-Cell Differentiation
Robert C. Rickert, Ph.D., professor and director of the Tumor Microenvironment and Metastasis Program at Sanford-Burnham Medical Research Institute, is using conditional gene targeting to identify the genes and biochemical pathways that play a role at specific stages of B-cell differentiation. With this approach, it is possible to knock out targeted genes in a mouse at different stages of B-cell development, and to do so in an inducible fashion, allowing you “to look at how it affects different signal transduction pathways in a context-specific manner,” says Dr. Rickert.
When applied to a relevant mouse model of disease, such as a B-cell lymphoma, this inducible genetic system should yield effects similar to those that could be obtained with a drug capable of blocking the activity of the targeted gene product. Dr. Rickert and colleagues are exploring the similarity between the effects achieved with conditional gene targeting and those of drugs recently approved to treat chronic lymphocytic leukemia (CLL) and some forms of lymphoma. Two such drugs, idelalisib and ibrutinib, inhibit the B-cell receptor pathway by blocking PI3K and Bruton’s tyrosine kinase (BTK), respectively.
Dr. Rickert presented his group’s latest research at a Keystone Symposium Conference, PI 3-Kinase Signaling Pathways in Disease, which took place last January in Vancouver. In his talk, Dr. Rickert emphasized that the phosphatidyl inositol-3 kinase (PI3K) pathway is a major regulator of B lymphocyte differentiation and function.
Dr. Rickert has also applied conditional gene targeting to compare the roles of the NFκB and PI3K pathways in B-cell maturation. He has shown that while both pathways are essential at some stages of B-cell differentiation, only one pathway may be necessary for B-cell maintenance and survival.
“Ultimately we want to gain more insight at the biochemical level into single cells and the heterogeneity of the cell populations we’re interested in,” says Dr. Rickert. Tumors and cancer cell populations are quite heterogeneous, and better biochemical tools are needed to be able to sort through these populations of cells and “look at some of the more interesting, rogue cells, such as cancer stem cells,” he adds.
An Evolutionary Approach
In his laboratory at Hebrew University of Jerusalem, researcher Yuval Tabach, Ph.D., is using computational tools to analyze and compare the genomes and proteins of hundreds of species to identify evolutionary patterns of conservation and loss that point to connections between molecular pathways and disease.
“The main power of this phylogenetic profiling approach is that if you look at proteins across evolution, some are lost at certain points in certain species,” says Dr. Tabach. For example, proteins involved in the tricarboxylic acid (TCA) cycle have been highly conserved across some species, but have disappeared in others because those species have lost their mitochondria.
Dr. Tabach and colleagues have shown that sets of genes associated with particular diseases have similar phylogenetic profiles. They are also using this approach to identify genes associated with longevity, cancer resistance, and various extreme environmental conditions.
Phylogenetic profiling to connect patterns of conservation and loss across millions of years of evolution can be applied to entire proteins, protein domains, and RNA molecules such as microRNAs. The potential applicability of this approach to drug discovery and development is multifaceted.
For example, given a gene known to be related to a certain disease, the ability to identify other genes with a similar phylogenetic profile might reveal genetic factors that could explain incomplete penetrance or the variability of disease severity in different affected individuals. Alternatively, identification of a candidate gene in one patient could serve as the basis for identifying other key factors in other patients with the same disease using the phylogenetic profile.
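The core idea of phylogenetic profiling can be sketched in a few lines: represent each gene as a presence/absence vector across species, then rank other genes by how closely their vector matches that of a known disease gene. The example below is only a toy under stated assumptions: the gene names and profiles are invented, and Jaccard similarity is one simple choice of score, not necessarily the measure used in Dr. Tabach's work.

```python
from typing import Dict, List, Tuple

def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity between two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def rank_by_profile(query: str, profiles: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Rank every other gene by profile similarity to the query gene, best match first."""
    q = profiles[query]
    scores = [(gene, jaccard(q, p)) for gene, p in profiles.items() if gene != query]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical profiles: 1 = ortholog present in that species, 0 = lost or absent.
# Each position corresponds to one of six made-up species.
profiles = {
    "disease_gene": [1, 1, 0, 1, 0, 1],
    "candidate_a":  [1, 1, 0, 1, 0, 1],  # conserved and lost in the same species
    "candidate_b":  [1, 0, 1, 1, 1, 0],  # partially overlapping pattern
    "candidate_c":  [0, 0, 1, 0, 1, 0],  # opposite pattern
}
ranked = rank_by_profile("disease_gene", profiles)
for gene, score in ranked:
    print(f"{gene}: {score:.2f}")
```

Genes at the top of such a ranking share the query gene's pattern of conservation and loss, which is the signal the approach treats as a hint of shared function.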
Compared to strategies such as gene expression analysis or protein-protein interaction mapping for identifying disease-related genes, phylogenetic profiling “is much faster” and will become an increasingly powerful tool as the genome sequences of more species become available, explains Dr. Tabach.
The Israeli start-up company ReThink Pharmaceuticals is using the molecular networks generated through this phylogenetic profiling work for the purpose of drug repositioning. “If you know that a certain drug targets a gene, we can build a network to find other genes/proteins that interact with the drug target,” asserts Dr. Tabach, citing preliminary results that demonstrate the ability to predict additional effects of a drug candidate.
A critical component of RNA interference (RNAi) studies is the validation of gene expression inhibition. RNAi experiments have many sources of variation that make accurate quantitation of target mRNA difficult when qPCR is used. Variation in the potency and stability of short interfering RNA (siRNA), coupled with differences in transfection efficiency and protein turnover, results in varying gene knockdown efficiency.
Over the past 10 years, scientists say new methods, including deep sequencing and DNA tiling arrays, have enabled the identification and characterization of the human transcriptome. These techniques completely changed our understanding of genome organization and content and revealed that a much larger part of the human genome is transcribed into RNA than was previously assumed—about 70%.
Last year researchers, including Tim Mercer, Ph.D., at the Institute for Molecular Bioscience-University of Queensland, Roche Nimblegen, and John Rinn, Ph.D., and his team in the department of stem cell and regenerative biology at Harvard, reported that “transcriptomic analyses have revealed an ‘unexpected complexity’ to the human transcriptome, the depth and breadth of which exceeds current RNA sequencing capability.”
These scientists used these techniques to identify and characterize unannotated transcripts whose rare or transient expression is below the detection limits of conventional sequencing approaches. The data also show that intermittent sequenced reads observed in conventional RNA sequencing datasets, previously dismissed as noise, are indicative of unassembled rare transcripts. Collectively, they say these results reveal the range, depth, and complexity of a human transcriptome that is far from fully characterized.
Noncoding transcripts are RNA molecules that include classical “housekeeping” RNAs such as transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), small nuclear RNAs (snRNAs), and small nucleolar RNAs (snoRNAs), which are constitutively expressed and play critical roles in protein biosynthesis.
Among these noncoding RNAs are numerous long noncoding RNAs (lncRNAs), which are defined as endogenous cellular RNAs of more than 200 nucleotides in length that lack an open reading frame of significant length (less than 100 amino acids). The RNA molecules constitute a heterogeneous group, allowing them, scientists point out, to cover a broad spectrum of molecular and cellular functions by implementing different modes of action. lncRNAs are roughly classified based on their position relative to protein-coding genes as intergenic (between genes), intragenic/intronic (within genes), and antisense. Initial efforts to characterize these molecules demonstrated that they function in cis, regulating their immediate genomic neighbors.
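The working definition above (longer than 200 nucleotides, with no open reading frame reaching 100 amino acids) can be expressed as a simple filter. The sketch below is a deliberately naive version: it scans only the three forward reading frames for ATG-to-stop ORFs and ignores ORFs that run off the end of the sequence. The function names and test sequences are hypothetical, and real annotation efforts rely on far more evidence than this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_aa(seq: str) -> int:
    """Length in codons (start codon included) of the longest ATG-to-stop ORF
    found in the three forward reading frames. ORFs without a stop are ignored."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best

def looks_like_lncrna(seq: str, min_len: int = 200, max_orf_aa: int = 100) -> bool:
    """Apply the two thresholds quoted in the text: length > 200 nt, ORF < 100 aa."""
    return len(seq) > min_len and longest_orf_aa(seq) < max_orf_aa

# Hypothetical sequences for illustration only
coding_like = "ATG" + "GCT" * 120 + "TAA"              # 366 nt, 121-codon ORF
noncoding_like = "ATG" + "GCT" * 10 + "TAA" + "C" * 300  # 336 nt, 11-codon ORF
too_short = "ATGC" * 10                                 # 40 nt

print(looks_like_lncrna(coding_like))
print(looks_like_lncrna(noncoding_like))
print(looks_like_lncrna(too_short))
```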
Regulatory Levels
lncRNAs can regulate gene expression at epigenetic, transcriptional, and post-transcriptional levels and take part in various physiological and pathological processes, such as cell development, immunity, oncogenesis, clinical disease processes, and more. A classic lncRNA, HOTAIR, was originally identified through work done by Howard Chang, M.D., Ph.D., at Stanford, and Dr. Rinn. Their research eventually led to the discovery of this 2.2 kilobase spliced RNA transcript that interacts with Polycomb group proteins to modify chromatin and repress transcription of the human HOX genes, which regulate development. It remains unclear exactly how this is accomplished.
HOTAIR, it was found, originates from the HOXC locus and represses transcription across 40 kb of that locus by altering the chromatin trimethylation state. Hox genes, a highly conserved subgroup of the homeobox superfamily, regulate numerous processes including apoptosis, receptor signaling, differentiation, motility, and angiogenesis. Aberrations in Hox gene expression have been reported in abnormal development and malignancy.
HOTAIR works to repress Hox gene expression by directing the action of Polycomb chromatin remodeling complexes in trans to govern the cells’ epigenetic state and subsequent gene expression. HOTAIR expression is increased in primary breast tumors and metastases, and its expression level in primary tumors can predict eventual metastasis and death. The recent discovery that lncRNA HOTAIR can link chromatin changes to cancer metastasis furthers the relevance of lncRNAs to human disease.
Dr. Chang and his colleagues say that the finding that several lncRNAs can control transcriptional alteration implies that the difference in lncRNA profiling between normal and cancer cells is not merely the secondary effect of cancer transformation, and that lncRNAs are strongly associated with cancer progression. The researchers showed that lncRNAs in the HOX loci become systematically dysregulated during breast cancer progression.
They further demonstrated that enforced expression of HOTAIR in epithelial cancer cells induced genome-wide retargeting of polycomb repressive complex 2 (PRC2) to an occupancy pattern more resembling embryonic fibroblasts, leading to altered histone H3 lysine 27 methylation, gene expression, and increased cancer invasiveness and metastasis in a manner dependent on PRC2.
On the other hand, they noted, loss of HOTAIR can inhibit cancer invasiveness, particularly in cells that possess excessive PRC2 activity. These findings indicate that lncRNAs have active roles in modulating the cancer epigenome and may be important targets for cancer diagnosis and therapy. Thus, the investigators say, differential expression of lncRNAs may be profiled to aid in cancer diagnosis and prognosis and in the selection of potential therapeutics.
Two years ago the GENCODE consortium, within the framework of the ENCODE project, presented and analyzed the most complete human lncRNA annotation to date. The data comprise 9,277 manually annotated genes producing 14,880 transcripts. The identification and annotation of this wealth of lncRNAs leaves scientists with a lot of research to do to fully characterize the varied functions of these unusual RNAs. Their identification also challenges technology developers to produce the tools necessary for these analyses.
Drug-drug interactions (DDIs) are of particular concern for regulatory agencies and the pharmaceutical industry for drug safety. Induction of drug metabolizing enzymes by pharmaceuticals, nutraceuticals, and lifestyle influences is one type of DDI in which the influence of a perpetrator molecule increases the enzyme capacity that can metabolize a victim molecule, rendering it ineffective as a therapy. To evaluate this potential, screening assays have been developed, such as the use…
Imanova takes a structured approach to the development of imaging biomarkers, or i-biomarkers.
Biomarkers defining specific phenotypes are becoming increasingly important for developing new drugs for specific patient subpopulations. The value of a new biomarker is measured by its ability to reduce risk.
Ideally, the biomarker should be developed in parallel with the new drug, as nearly 50% of the projected development costs can be saved by shutting down a development program before it enters Phase II. A meaningful risk-benefit analysis of a biomarker requires estimates of its cost and accuracy, as well as the consequences of decisions that it will enable.
For the biomarker to be of value, the cost of its development has to be less than the projected costs of development from Phase II onwards, discounted to present time. While multiple competing business considerations affect a pharmaceutical company’s decision to proceed with a biomarker program, the skyrocketing market for biomarker discovery underscores the pharmaceutical industry’s hope that biomarkers will bolster the success rates of pipeline products.
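The break-even condition described here is a present-value comparison, which the following sketch makes explicit. Every number in it is hypothetical, as are the function names, the yearly cost schedule, and the 10% discount rate; it also ignores the biomarker's accuracy and the probability that a program actually fails, both of which a fuller risk-benefit analysis would include.

```python
from typing import List

def present_value(costs_by_year: List[float], discount_rate: float) -> float:
    """Discount a stream of future yearly costs (year 1, year 2, ...) to today."""
    return sum(
        cost / (1 + discount_rate) ** year
        for year, cost in enumerate(costs_by_year, start=1)
    )

def biomarker_is_worthwhile(
    biomarker_cost: float,
    phase2_onward_costs: List[float],
    discount_rate: float,
) -> bool:
    """True when developing the biomarker costs less than the discounted
    downstream spending it could help avoid."""
    return biomarker_cost < present_value(phase2_onward_costs, discount_rate)

# Hypothetical figures in millions of dollars
phase2_onward = [110.0, 121.0]  # projected spend in years 1 and 2 after the decision point
rate = 0.10

print(round(present_value(phase2_onward, rate), 2))
print(biomarker_is_worthwhile(50.0, phase2_onward, rate))
print(biomarker_is_worthwhile(250.0, phase2_onward, rate))
```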
“Imaging biomarkers have been largely underutilized in drug development,” says Kevin Cox, Ph.D., CEO of London-based Imanova. “But we believe that molecular imaging has the power to assist in successful translation of molecules by reducing the risk of several specific causes of failure in Phase II clinical studies. Imaging biomarkers, or i-biomarkers, are especially valuable in giving confidence of tissue delivery, determination of target engagement, and the evaluation of a drug’s pharmacodynamic effects.”
While imaging is routinely used in clinical diagnostics for cancer, its acceptance in drug development has been slow. “This is a highly specialized area of knowledge,” Dr. Cox observes. “Designing imaging experiments to answer the right questions is not trivial. Combined with the perceived high costs and dearth of well-equipped facilities, this has slowed down the adoption of imaging as an integral step in drug development.”
Imanova presents an innovative and highly integrated solution in reducing the barriers for use of molecular imaging. Located in the former GlaxoSmithKline imaging center, Imanova’s staff applies the knowledge needed for translational application of imaging science.
“Another historical barrier for use of molecular imaging has been the lack of versatile PET tracers for key therapeutic targets,” remarks Dr. Cox. Together with its pharmaceutical clients, Imanova develops proprietary tracers that can answer critical questions about target engagement directly after drug administration. A structured approach for i-biomarker development takes the novel tracer from the candidate pool to clinical validation.
Uniquely, Imanova utilizes in silico biomathematical modeling to predict a candidate with ideal physicochemical characteristics. “The i-biomarker development pipeline adheres to a strict quality system,” continues Dr. Cox. “We not only provide candidate selection and labeling, but also rigorous preclinical evaluation in several species, combined with blood chemistry or other physiological measurements.”
The resulting biomarker provides quantitative information to make informed go/no-go decisions. Imanova hopes to develop an open innovation approach to i-biomarker research, and to encourage pharmaceutical companies to collaborate on tracer development.
“By collaborating in this pre-competitive space, a pharma-academic consortium can de-risk i-biomarker development programs and generate new tools to eliminate costs associated with futile activities downstream,” concludes Dr. Cox. “Most tracers need to be utilized early in the drug development process. Used at the right time, imaging biomarkers are able to inform the design of Phase II studies, including dose ranging and possibly patient selection, saving many months in development and millions of dollars in costs.”
Answers from Big Data
“Clinical bioinformatics is the application of a data-driven, high-tech approach in a clinical setting,” says Jerome Wojcik, Ph.D., CEO of Quartz Bio, a clinical bioinformatics service provider located in Plan-Les-Ouates, Switzerland. “We use clinical bioinformatics to adapt treatment to patients, that is, to identify cohorts that respond to the drug in a predictable manner,” says Dr. Wojcik.
Pharmaceutical partners supply Quartz Bio with data collected in the course of clinical trials. The data (which may include information from protein and RNA expression, genotyping, molecular diagnostics, and flow cytometry studies) often exist in silos within a pharma company. To make sense of the data, Quartz Bio integrates heterogeneously formatted data, analyzes it for consistency, and identifies gaps and outliers.
Dr. Wojcik’s team dedicates over 40% of the overall analysis time to biomarker data management. This step is crucial for the quality of the overall analysis. According to Quartz Bio, all the data-management processes are documented, auditable, and reproducible.
Once the “Big Data” hoard is adequately cleaned up, the team applies adaptive statistical methods to generate multiple hypotheses linking the drug action with subpopulations of patients. “Our challenge is to generate reliable hypotheses on a fairly small statistical patient sample, for example, a thousand patients, but using millions of biomarker datapoints,” continues Dr. Wojcik. “We do not rely on statistics alone. Graphical visualization adapted to the objectives of the study is necessary for interpretation of results.”
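Testing many biomarkers against few patients invites false positives, so some form of multiple-testing control is usually part of the picture. The article does not say which methods Quartz Bio uses; the sketch below swaps in the Benjamini-Hochberg procedure, a standard way to control the false discovery rate, purely as an illustration. The p-values are made up.

```python
from typing import List

def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each hypothesis, whether it is rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p to largest

    # Find the largest rank k such that p_(k) <= (k / m) * alpha
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Reject every hypothesis whose rank is at or below the cutoff
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Hypothetical p-values, one per biomarker hypothesis
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

In this toy example five p-values fall below 0.05 on their own, but only the two smallest survive the correction, which shows how quickly apparent findings thin out once the number of tests is taken into account.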
In a recent project, Quartz Bio analyzed multiple oncology biomarkers, such as gene expression, circulating tumor cells, and immunohistochemistry, to identify patient cohorts that would most likely benefit from a novel treatment. Biomarker analysis revealed a subpopulation whose survival rate increased significantly over the population average, bringing a potential application of personalized medicine closer to reality.
FLG: You recently told the graduating class of 2015 at UC San Diego School of Medicine that pretty soon they’ll find that most of what they’ve learned is “just plain wrong”. What would you say is the first thing in our understanding of human medicine that is going to change significantly?
JCV: One of the areas that’s changing the fastest right now is cancer – as we drill down to the genome level we’re getting more information and understanding than has ever been possible before. Every single cancer is a genetic disease. Not necessarily inherited from your parents, but it’s genetic changes which cause cancer. So as we sequence the genomes of tumours and compare those to the sequence of patients, we’re getting down to the fundamental basis of each individual person’s cancer. And that’s truly my definition of precision medicine. For example, at Human Longevity (HLI), we sequence the whole genome of the patient; we sequence the genome of the tumour to a very high depth of coverage; we sequence the RNA in the tumour to understand which genes in the tumour are being expressed and modified; and we sequence the entire immune system. From that picture we understand the patient’s susceptibility in the first place for cancer, and why they probably got it, and whether their immune system responded to the cancer – and usually it doesn’t, which is why cancer shows up. From the modified proteins that show up from the genetic changes, we get a whole new view of which drugs will work, and will not work, on that tumour. Also, we’re taking that further, developing personalised cancer vaccines for that individual against their specific tumour. So, it’s getting very precise – very data and information driven, versus what standard practice is today: doing surgery and trying to diagnose things using a microscope. It’s a different level of resolution. It’s like trying to look through a telescope on Earth at Pluto versus the photos we just saw from that flyby.
FLG: Your new company, Human Longevity, is aiming to play a significant part in changing the human experience. What got you excited enough to buy all those hi-seq machines and set out to build the world’s largest genomic database?
JCV: Well, you might recall that 15 years ago I announced the first human genome that my team sequenced at Celera. The trouble is, that genome cost $100 million and took 9 months to do, with a large dedicated team. That seems extraordinary today, now that we can do thousands a month for little over $1,000 each, but 15 years ago there was a $3 billion 15 year government program to try and do the same thing. So we’ve changed from that 15 year $3 billion dollar effort, down to 9 months and $100 million, and down to thousands a month. It’s always been the dream, but technology didn’t allow it until recently.
FLG: You guys already have some great partnerships out there giving you access to samples to get you to that 1 million genomes mark. Are you still on course to hit your total by 2020? What are you looking for when you approach organisations whose samples you want to sequence and analyse?
JCV: The way it’s starting to look, we may greatly exceed the 1 million! The technology is still changing – we’re exceeding Moore’s Law still with technology change. We have more transistors per unit that change the compute capacity; we’re getting higher and higher throughput per machine; there’s new technologies coming – I’ve never had sequencing machines last more than 3 years in the last 20 years of my career, before they were replaced by a new, faster and better, technology. 5 years from now, this will still look like the end of the dark era.
FLG: From a technology standpoint, what are you hoping to see in the next 5 years that can help you better reach your goal?
JCV: We need a combination of the cost and the throughput of the Illumina sequencers, with the quality and long sequence reads on single molecules that we get with PacBio. Future technologies can still improve substantially on the quality of the data, the percentage of the genome that’s covered, and how well that’s done. In my talk at the Festival of Genomics, I talked about haplotype phasing, where on sequencing your genome we can separate your chromosomes into the parts you got from your mother and the parts you got from your father. We need much better technology to do that routinely, rapidly, and cheaply.
FLG: At a personal level, the idea of staying healthy for longer is very appealing. However, we already have some major social and economic factors to deal with as a result of longer life expectancies. Here in the UK, we just had our general election. One of the topics for discussion was how the government was going to address some of the challenges being faced by my generation of 20-30 something year olds. People are living longer, so the government has to pay more for pensions, which in turn are funded by those working today on comparatively lower salaries. People are working further into their lifetimes, so some of those big opportunities for vertical movement can be harder to come by. And then you have the general problems associated with an ever-increasing population. So, if you’re successful in increasing healthy lifespan for people, what kind of knock-on effect do you think it will have at the population level?
JCV: I’m glad you picked up on our emphasis on healthy lifespan versus just increasing human longevity. Even though that’s our name, our goal is totally focused on the healthy lifespan.
Healthcare is the biggest rising cost, certainly in the US, and in the UK I think as well. So we don’t bankrupt our entire economies, we need to switch to preventative medicine. One of the challenges with a government health system, like in the UK, with all of this data, is that you have a government making decisions on which treatments they’ll pay for and which ones they won’t. That’s a dangerous, dangerous, place to get into society. The UK health system is already there, insurance companies are already there – but countries where that isn’t an issue right now, are where there is good competition and different paying systems. So there’s a lot of reform that’s going to be needed across the board, there. But if we can prevent disease – it solves a lot of the social dilemmas about the government deciding you’re not worthy of getting a new kidney or getting a new treatment.
On the other hand, if we live longer healthier lives – in a few months I turn 69, I have relatives who are younger than me, who have retired already – it would be an incredible thought to me, to even consider stopping what I’m doing. I have a very exciting job and career. But we could solve a lot of these economic problems as well (the US has a bigger problem with this than the UK I think) if we just changed the retirement age to 75. With this notion that you work 20 years and then retire, it’s pretty stunning. My science career has already been close to 40 odd years, and I’m hoping for at least another 20. We need to have opportunities, not just for labourers to labour another decade, but having an education system that helps people move up the economic ladder. Knowing you’re going to be working a much longer period of time, you get incentivised to get retraining and take on something new, rather than assuming the government is going to take care of you at age 65.
FLG: One of the first things that brought your name to public attention was the congressional briefing back in 1991 where you mentioned that the NIH were planning on filing patent applications on thousands of genes based on expressed sequence tags. Amongst the numerous arguments against this plan was the notion that this would impede the open exchange of information and increase the price of obtaining the sequence of the human genome. Ultimately, the NIH didn’t go ahead with the plan, and you’ve been carrying the ‘egomaniac’ tag ever since. By having that patent and license protection in place so early on, what do you think would be different today if the plan had gone ahead?
JCV: Well, even though the US government abandoned their patents, I think it put the taxpayer at an economic disadvantage. It’s well documented history that as the UK and US public genome labs, with the $5 billion funding, dumped their data nightly – every single pharmaceutical company downloaded that data nightly, and patented it. So it just shifted it from US taxpayers owning it directly, to the worldwide pharmaceutical companies owning that data directly. It’s led to the development of a lot of drugs and tests that are currently available in the market. I’ve said so publicly, and am delighted by the recent Supreme Court ruling saying that these naturally occurring DNA sequences are not patentable – like Myriad have done with their breast cancer test.
What we’re doing with whole genome sequencing was going to make them obsolete anyway, because they’re multi-thousand-dollar tests, while we get the entire genome for a little over a thousand dollars. The patents wouldn’t have allowed them to block us looking at that data. So one way or another, they were going to become obsolete. I think it’s quite interesting now – some of the biggest critics from 20 years ago are using the economic models that they criticised me for. In fact, the Wellcome Trust is now charging subscriptions to get access to data. So the world has come around. All this stuff was in the heat of a competition that most academic scientists never expected – that somebody would just come along and take their 15-year project away from them and just do it!
That created a lot more heat than light at the time. Some of the arguments that came out then were the weapons of the rhetoric of the time that had nothing to do with reality. Point to drug after drug, and test after test – even Myriad’s test with breast cancer – that have helped hundreds of thousands of people understand their risk for cancer and have new drugs to treat them. So if it was such an oppressive system, it would have disappeared a long time ago. Academic scientists have never been limited in their access to any of this data, so all of these were political arguments for rhetoric.
One of the things I’ve said several times recently, with these anniversaries of our first genome announcement, is that if you look at all of the rhetoric of the time – Francis Collins calling what we were doing, generating the “Mad Magazine” version and that whole genome shotgunning wasn’t going to work. All you have to do is take a look around the world, and every genome that’s been sequenced by us and what every other group has done with the methods that we published 20 years ago. That’s the nice thing about the field of science – the test of time sorts out the truth. Sometimes it takes the test of time to get away from the emotion and the rhetoric, but the fact that we’re now sequencing 3,000 genomes a month with this technique, and globally millions of genomes of countless different species… Every one of them has been sequenced with the technology we first described with the first genome in 1995.
FLG: There’s a worry out there that today’s political and commercial interest in genomics is not always in the best interest of scientific pursuit?
JCV: You’re probably hearing that because you’re in the UK! We don’t hear that so much in the US. But there’s this constant left wing thinking that comes out of academia in the UK, that companies are inherently evil. It’s just bull****. The leading edge of the best science in the world is being driven by private money and investment money, because of the scarcity of government money to do this. It’s not only by far the best and most advanced science; we’re driving the equation at Human Longevity that everyone else is beginning to follow as well. I think that’s old-world thinking of academia versus industry versus government, and it just has nothing whatsoever to do with reality outside of perhaps a totalitarian communist regime!
FLG: We touched on it before, but, for better or for worse, you do seem to be seen as one of modern science’s greatest egomaniacs. Is there any factual basis to that allegation, or is it just part of being at the top?
JCV: Show me a highly successful person in any field that has gotten there having a weak ego. You have to believe in yourself, and you have to believe in what you’re doing. I think because of all that early rhetoric, and because my teams have been continuously successful at the very leading edge of this field for the last 20 years, it’s easy to label anyone at the front of things. I do have a healthy belief in my teams and the science that we’re doing, and that it’s going to change what’s going on. If I had a weak ego, and doubts about this, the first genome would not yet have been completed with US and UK government funding.
FLG: You’ve already had a pretty storied career in genomics, and it certainly seems far from over. When it does come to an end, is there any one thing in particular you hope people will remember you for? What is it, ultimately, that you’ve been trying to achieve?
JCV: I think you should ask me that in another 20 years! I think I’ve achieved some good things: doing the first genome in history – my team on that was phenomenal and all the things they pulled together; writing the first genome with a synthetic cell; my teams at the Venter Institute, Human Longevity, and before that Celera. These are all team sports. I’m the captain of the team, or the orchestra conductor, but the only reason I’ve been successful is because of having the most extraordinary scientists, mathematicians and engineers excited about working on some of the ideas I put forward. I’m hoping that these next 20 years will show that what we did 20 years ago in sequencing the first human genome was the beginning of the health revolution that will have more positive impact on people’s lives than any other health event in history.
FLG: In the build up to The Festival of Genomics, we asked people who they were most looking forward to seeing present. Perhaps a little unsurprisingly, your name was almost always mentioned. So we thought it would be a nice idea to have some of our previous interviewees and contributors to the magazine put some questions to you:
Richard Lumb, CEO, Front Line Genomics: One of your partners, Peter Diamandis, talks about the need of businesses to regularly “disrupt their own business model”. The stated purpose of Human Longevity is already differentiating and your approach already appears disruptive [an impressive combination of stem cell technology and genomics in a commercial enterprise]. Is this concept of ‘self-disruption’ something that you recognize in your past work, and how would you anticipate Human Longevity disrupting your own business model over the next few years?
JCV: That’s an excellent, thoughtful, question from somebody who’s obviously put some unique things together. If you ask anybody that works at Human Longevity, and on my other projects, I disrupt things daily. There’s no complacency. We modified our business model, relatively substantially, from 18 months ago when that was first put together. We’re adapting to the data in real-time, and that’s what happens in the best of science. All the things I’ve done are because the data we got has told us what the next direction was going to be, and what was possible, and the kinds of questions to ask. We have new data here on tens of thousands of human genomes. The machine learning team here, headed up by Franz Och whom I hired out of Google (you’re aware of his work if you use Google Translate), have already come up with some amazing associations out of the data. Now we’re trying to predict somebody’s voice from their genetic code, pictures of them, and their precise biological age. If you’d asked me about a year ago if these would be highly probable in the next year or two, I would have said “I’m doubtful!” We have great scientists making real nice breakthroughs, modifying how we think about the data going forward.
Some of the biggest companies of the past have disappeared because they stuck with their technology and have refused to evolve. Our genomes are evolving and changing every single day. I think that is somewhat of a surprise for me. I thought we’d just sequence the genome once and that would be sufficient for most things in people’s lifetimes. Now we’re seeing how changeable and adaptable it is, which is why we’re surviving and evolving as a species. If we don’t adapt and change constantly, then we will become one of the relics of evolution. So it’s not just a nice thing to do for survival, it’s essential in building for the next stage of success.
Jean-Claude Marshall, Director Clinical Pharmacology Laboratory, Pfizer: What are your thoughts around how the FDA could regulate both LDTs (laboratory developed tests) and NGS (next generation sequencing)? Additionally, what do you foresee as the next set of challenges in the field of both companion diagnostics based on genomic analysis of patients, and the challenge of direct to consumer genetic offerings?
JCV: That’s a sophisticated question, and an important one. We have a staff of several people whose job it is to help work out a good regulatory path. I’ve met personally with the FDA commissioner. This is an area that’s very key to us. We want to help educate the FDA on these changes. We’re working with companies, and Pfizer is one of the ones we’re in discussions with, to use our technology to change how they do clinical trials. We’re working with several pharmaceutical companies on sequencing the genomes of patients from failed phase III clinical trials, to rescue them. In fact, Pfizer is probably more familiar with this than any other company. They did a large clinical trial for one of their drugs to treat lung cancer. The trial failed pretty badly. But then they did retrospective analysis of lung cancer patients with a translocation in the ALK gene. It turns out it’s in around 4-6% of lung cancer patients. Over 60% of those individuals respond extremely well to the Pfizer drug. And now Pfizer have a blockbuster drug, totally because of that genetic segregation to rescue that failed trial.
As to the question on companion diagnostics; if you measure whether people have the ALK translocation, that’s a companion diagnostic for prescribing the Pfizer ALK targeted drug. To me, it will become the standard of care. Not an unusual abnormality. Pfizer’s path to this helped pave the way for others to see it.
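Curator's note: the arithmetic behind this kind of trial rescue can be sketched in a few lines. The 4-6% prevalence and the 60% response rate in the ALK-positive subgroup are the figures quoted in the interview; the response rate in patients without the translocation is not given anywhere in the text, so the value used below is a purely hypothetical placeholder chosen for illustration. The point is only that a strong effect in a small subgroup is heavily diluted when the whole trial population is averaged together.

```python
def overall_response_rate(subgroup_fraction, subgroup_rate, other_rate):
    """Response rate across all patients, as a weighted average of two groups."""
    return subgroup_fraction * subgroup_rate + (1 - subgroup_fraction) * other_rate


# Figures quoted in the interview: the ALK translocation is found in roughly
# 4-6% of lung cancer patients, and over 60% of those patients respond.
ALK_FRACTIONS = (0.04, 0.06)
ALK_RESPONSE_RATE = 0.60

# ASSUMPTION (not from the interview): a hypothetical response rate for
# patients without the translocation, used only to illustrate the dilution.
HYPOTHETICAL_OTHER_RATE = 0.05

for fraction in ALK_FRACTIONS:
    rate = overall_response_rate(fraction, ALK_RESPONSE_RATE, HYPOTHETICAL_OTHER_RATE)
    print(f"ALK-positive fraction {fraction:.0%}: overall response rate {rate:.1%}")
```

Under that assumed background rate, the trial-wide response stays below 10% even though the subgroup responds at 60%, which is why stratifying patients by the genetic marker changes the picture so sharply.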
Brian Dougherty, Translational Genomics Lead – Oncology, AstraZeneca: What’s different for you this time around? Sequencing and analysis is more sophisticated. The first human genome is done. Will similar business models work a decade later?
JCV: Well Brian was one of the key contributors back in the early days at TIGR and he participated in the very first human genome. He came in from Ham Smith’s lab, and saw first hand and contributed to the very first stages.
So what’s different today? Well the world has had my genome for 15 years, done with Sanger sequencing. Others have been added to it; Jim Watson’s was the second one, done with the 454 technology. One, or two, or even a few dozen genomes, have proven to give great targets for pharmaceutical analysis. But they don’t give you enough to answer fundamental questions about what’s unique to you, what’s unique to me, and how do we interpret that data? So we concluded that the only route to get to that data, rather than waiting for the academic community to do one genetic study at a time, was to build a very large database so we can comprehensively and globally understand the 3% differences amongst all of us. It’s already starting to pay out. Doing more of the same in a highly homogenous species doesn’t really make sense. When you sequence sperm cells, no two sperm are alike. No two eggs are alike. No two people’s genomes are alike. Even identical twins have some spontaneous mutations that make their genomes different. So we’re now able to get down to the resolution to start seeing those differences. I’d say that this is actually the most exciting era of genomics!
Anna Middleton, Principal Staff Scientist, Genetic Counsellor, Wellcome Trust Sanger Institute: What hooks do you use to start a conversation about genomics with people who know nothing about genomics, i.e. what, in a nutshell, do you think people want to connect to?
JCV: That’s a very interesting question. What we’re trying to design, with helping to introduce the data to people, is that we’re ultimately trying to describe them at the most comprehensive level. The interpretation of medicine today is ‘do your clinical values fall within a normal range?’ Everything in the globe right now is in the law of averages, which means absolutely nothing to individuals.
Larry Page told me that even if we cured all cancer, it would only change average human lifespan by a few years. But you can see what a meaningless statistic that is if you’re a 9-month-old child and you die from a neuroblastoma tumour. That doesn’t shift the averages, but it’s a huge individual effect. Genomics is about individuals. It’s about what’s specific to you, not your siblings, not your parents – each of us is totally unique. We will only see that uniqueness by drilling down to the genetic code. Like I said in my talk, we’re a genetically, DNA driven software species. Every parent knows that when they see their children on day zero. We all come out totally unique, and everyone comes out differently. We understand it at an intuitive level; we are now developing the scientific data to help all of us understand what’s unique and different about us, and how we can use that information to have better, healthier lives.
Alka Chaubey, Director Cytogenetics Laboratory, Greenwood Genetic Center: You played the most important role in not only the Human Genome but also getting your diploid genome sequenced and available to the public. With all the human genome information available and the ability to identify rare genetic (constitutional) disorders, what are your thoughts on approaches to reducing the burden and improving the quality of life of individuals with disorders persisting as lifelong disabilities (e.g. Autism, Intellectual disability, etc.)?
JCV: That’s a nice compliment and another important question. It’s going to be the challenges of medicine, and of this technology. Not every disease or disorder is going to be amenable to cure and treatment. Particularly for diseases that result in a dramatic reordering of brain structures and functions. For autism in particular, we’re doing a large cohort where we’re sequencing the entire genome of autistic individuals. It appears that no two are alike. But we classify it as one disease under that name. It doesn’t have a single cause. So if you call any disease the ‘unlucky disease’, you might call that one the unlucky one. It seems to be primarily driven by spontaneous mutations in that individual’s genome. The rate of those spontaneous mutations is accelerated by having older parents. Perhaps that’s why we’re seeing more of it?
Sequencing the genomes of individuals with autism, and trying to find which genes are affected – in some cases will lead to some pharmaceutical therapies that might help them. But it won’t be across the board. So, I am not one to promise that genomics is the saviour for all of medicine and all of humanity. That’s why prevention becomes more important than treatment. If we can prevent the miswiring of the brain, either by early screening, or selecting embryonic cells that don’t have mutations, we increase the chance of healthier outcomes for everybody. But there won’t be magic drug treatments for every disorder. But Alzheimer’s disease might be an important exception if it’s treated early enough. We can detect changes through a combination of the MRI imaging we’re doing here of the brain, and the genome, that indicate a high risk of Alzheimer’s disease 20 years before somebody would experience their first symptoms. If that’s what we target for preventing the development of the disease, it might yield, as some recent trials are beginning to show, a very different outcome compared to trying to treat late stage Alzheimer’s disease where a third of the functioning neurons have already been lost in the brain, and pathways are gone – you can’t just instantly restore those with a magic pill. So prevention is probably the single most important word to come out of the genome era.
Keith Bradnam, Associate Project Scientist, UC Davis: What do you see as the limits of synthetic biology? Could we assemble a functional eukaryotic genome, and what are the practical applications of such technology?
JCV: That’s a great question! The limitations will ultimately be more societal limitations, ethical limitations, and standards rather than technology. I think a synthetic single eukaryotic cell would be very straightforward to do today. Various groups of scientists have been trying to build the yeast genome. It’s kind of like rebuilding a house one brick at a time, but they’re making a synthetic version of yeast. That’s not quite the same as writing the genetic code and then booting it up as we did, but that’s just because of the limitations on writing the genetic code now.
I think understanding what makes a multicellular organism, and all the regulation associated with that, is so far away from design that we’re going to have to learn a whole lot more biology before we get to that stage of deliberate design. I think about 10% of the genes in our designed synthetic bacterial cell are of unknown function. All we know is that you can’t get life without them. That problem expands tremendously with eukaryotic cells. If you extrapolate to the challenge of interpreting the human genome, we only understand a tiny fraction of the human genome today.
Nick McCooke, former CEO of Solexa also asked to remind you that you still owe him for tea at Claridges, in London, back in 2003.
JCV: Ha ha ha! Well it’s interesting…My cofounder at HLI is Peter Diamandis, who is also the CEO of the XPRIZE organization. I started a prize out of the Venter Institute early on, which was a half million dollars to spur on technology development. Today, Solexa would clearly be the winner of that. But things progressed so fast. The economics changed so dramatically, that nobody cared about a half million dollar prize anymore. XPRIZE made it a $10 million prize, but that wasn’t big enough to influence anything that Illumina or Life Technologies was doing. So the economic scale of the field has changed in part due to the tremendous success of Solexa.
FLG: That’s it for the questions, so thank you very much for your time! Is there anything else you’d like to mention?
JCV: No, I think you’ve covered the waterfront pretty nicely! It was fun talking to you and an enjoyable conversation. I was impressed by the quality of questions you guys put together!
New CRISPR-non Cas9 proteins, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 2: CRISPR for Gene Editing and DNA Repair
New CRISPR-non Cas9 proteins
Larry H. Bernstein, MD, FCAP, Curator
LPBI
More CRISPR Proteins Discovered
Researchers identify three new proteins that may serve as alternatives to Cas9.
Crystal structure of a Cas9 in complex with an RNA guide and a stretch of target DNA [Wikimedia, H. Nishimasu et al.]
Scouring genomic databases for sequences with similarity to the components of the CRISPR/Cas9 system and the recently identified CRISPR/Cpf1 system, researchers from the National Center for Biotechnology Information (NCBI), MIT, and Rutgers University have discovered three novel CRISPR systems that could one day provide new gene-editing tools to supplement the currently used CRISPR/Cas9 system. The newly discovered CRISPR systems contain three new proteins, C2c1, C2c2, and C2c3 (named for “Class 2 candidate x”), one of which may cleave RNA.
“This work shows a path to discovery of novel CRISPR/Cas systems with diverse properties, which are demonstrated here in direct experiments,” coauthor Eugene Koonin of NCBI told GenomeWeb. “The most remarkable aspect of the story is how evolution has achieved a broad repertoire of biological activities, a feat we can take advantage of for new genome manipulation tools.” The group published its results yesterday (October 22) in Molecular Cell.
Using such sequence-based techniques, the researchers predict that there are even more CRISPR systems to be discovered, added study coauthor Konstantin Severinov of Rutgers. “There are multiple ways to modify the search algorithm. So more exciting and distinct CRISPR/Cas mechanisms should be expected soon.”
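Curator's note: as a rough illustration of what "scouring databases for sequences with similarity" can mean computationally, the sketch below ranks database entries by k-mer Jaccard similarity to a query. This is a deliberately simple stand-in and not the search method used in the study, which is not described in the text above. The function names, the similarity threshold, and the sequences are all hypothetical; the strings are made-up letters, not real protein sequences.

```python
def kmers(sequence, k=3):
    """Return the set of overlapping substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(query, database, k=3, threshold=0.2):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    query_kmers = kmers(query, k)
    scored = [(name, jaccard(query_kmers, kmers(seq, k)))
              for name, seq in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy, made-up strings for illustration only; not real protein sequences.
TOY_QUERY = "MKRNYILGLDIGITSVGYGII"
TOY_DATABASE = {
    "toy_close_match": "MKRNYILGLDIGTTSVGYGLI",
    "toy_unrelated": "AAPWQEFHCCTTPLLNNQQRS",
}

for name, score in rank_candidates(TOY_QUERY, TOY_DATABASE):
    print(f"{name}: {score:.2f}")
```

Changing `k` or `threshold` changes which entries count as hits, which is one simple way to picture the remark that there are "multiple ways to modify the search algorithm". Real homology searches typically rely on alignment or profile-based tools rather than raw k-mer overlap.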
Using CRISPR as a High-Throughput Cancer Screening and Modeling Tool
Using CRISPR/Cas9, scientists created a new high-throughput screening tool for studying the development and progression of liver cancer in mice. [Ernesto del Aguila III, NHGRI]
A contingent of researchers from the UK, Germany, and Spain have recently developed a novel CRISPR/Cas9 system that they believe can be utilized as a multiplexed screening approach to study and model cancer development in mice. In the current study, the investigators directly mutated genes within adult mouse livers to elucidate their role in cancer development and progression—simultaneously uncovering the gene combinations that coordinate to cause liver cancer.
“We reasoned that, by targeting mutations directly to adult liver cells using CRISPR/Cas9, we could better study and understand the biology of this important cancer,” explained co-author Mathias Friedrich, Ph.D., research scientist at the Wellcome Trust Sanger Institute. “Other approaches to engineer mutations in mice, such as stem cell manipulation, are limited by the laborious process, the long time frames and large numbers of animals needed. And, our method better mimics important aspects of human cancer biology than many ‘classic’ mouse models: as in most human cancers, the mutations occur in the adult and only affect a few cells.”
The findings from this study were published online recently in PNAS through an article entitled “CRISPR/Cas9 somatic multiplex-mutagenesis for high-throughput functional cancer genomics in mice.”
This new approach is rapid, scalable, and extremely efficient, allowing the researchers to examine an array of genes or large regions of the genome concurrently. Moreover, this methodology affords scientists the ability to distinguish between cancer driver mutations and passenger mutations—those that occur as side-effects of cancer development.
The research team developed a list of up to eighteen genes with known or unknown evidence for their importance in two forms of liver cancer. They then introduced the CRISPR/Cas9 molecules, targeting various combinations of these genes into mice, which subsequently developed liver or bile duct cancer within a few months.
“Our approach enables us to simultaneously target multiple putative genes in individual cells,” noted co-author Roland Rad, Ph.D., project leader at the Technical University of Munich and the German Cancer Research Center Heidelberg. “We can now rapidly and efficiently screen which genes are cancer-causing and which ones are not. And, we can study how genes work together to cause cancers—a crucial piece of the puzzle we must solve to understand and tackle the disease.”
The investigators were able to confirm that a set of DNA-binding proteins called ARID (AT-rich interactive domain) influence the organization of chromosomes and are important for liver cancer development. Furthermore, mutations in a second protein, TET2, were found to be causative in bile duct cancer: although TET2 has not been found to be mutated in human biliary cancers, the proteins that it interacts with have been, showing that the CRISPR/Cas9 method can identify human cancer genes that are not mutated, but whose function is disturbed by other events.
“The new tools of targeting genes in combination and inducing insertions or deletions in chromosomes change our ability to identify new cancer-causing genes and to understand their role in cancer,” stated senior group leader and co-author Allan Bradley, Ph.D., director emeritus from the Sanger Institute. “Our results show that this approach is feasible and productive in liver cancer; we will now continue to study our new findings and try to extend the approach to other cancer types.”
This CRISPR/Cas9 approach may also be favorable for an in-depth examination of genomic deserts, regions within the human genome that appear to be devoid of genes. Yet, recent data from the ENCODE Project suggests that deserts can be populated, if not by genes, then by DNA regulatory regions that influence the activity of genes.
“Liver cancer has many DNA alterations in regions lacking genes: we don’t know which of these might be important for the disease,” said Dr. Rad. “However, we could show that it is now possible to delete such regions to systematically determine their role in liver cancer development.”
Three newly discovered enzymes, provisionally named C2c1, C2c2, and C2c3, promise to expand the CRISPR genome-editing toolbox beyond the well-known Cas9. They had been hiding inside NIH genomic databases, but they were eventually found out, thanks to the application of computational approaches developed by two groups of researchers. These researchers also initiated experimental work to explore the function of the bioinformatically identified enzymes.
One of the groups was led by Eugene Koonin, Ph.D., senior investigator at the National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), part of the NIH. The other group was led by Feng Zhang, Ph.D., of the MIT-Harvard Broad Institute.
According to the researchers, the three newly discovered enzymes are all naturally occurring, and all share some features with Cas9. In addition, these three enzymes have unique properties that could be exploited for novel genome-editing applications.
“This work shows a path to discovery of novel CRISPR-Cas systems with diverse properties, which are demonstrated here in direct experiments,” said Dr. Koonin. “The most remarkable aspect of the story is how evolution has achieved a broad repertoire of biological activities, a feat we can take advantage of for new genome-manipulation tools.”
This comment highlights how the researchers’ work, which appeared October 22 in the journal Molecular Cell, included information about potential evolutionary pathways. The researchers also emphasized that their work might lead to additional enzyme discoveries.
“There are multiple ways to modify the search algorithm, so more exciting and distinct CRISPR-Cas mechanisms should be expected soon,” said Konstantin Severinov, Ph.D., one of the researchers. He is affiliated with Rutgers and the Skolkovo Institute of Science and Technology. “These new mechanisms will undoubtedly attract the attention of basic and applied scientists alike.”
The Koonin and Zhang groups also recently collaborated on a project that resulted in the characterization of Cpf1, a class II CRISPR endonuclease, like Cas9. This work was described last month in an article, published in Cell, suggesting that the newly found enzyme’s distinct features pointed to unique genome-editing possibilities.
In his comments about this earlier work, Dr. Zhang made a point that presaged the current work: “Our goal is to develop tools that can accelerate research and eventually lead to new therapeutic applications. We see much more to come, even beyond Cpf1 and Cas9, with other enzymes that may be repurposed for further genome-editing advances.”
Researchers have discovered hundreds of genes that harbor variations contributing to human illness, identified genetic variability in patients’ responses to dozens of treatments, and begun to target the molecular causes of some diseases. In addition, scientists are developing and using diagnostic tests based on genetics or other molecular mechanisms to better predict patients’ responses to targeted therapy.
The challenge is to deliver the benefits of this work to patients. As the leaders of the National Institutes of Health (NIH) and the Food and Drug Administration (FDA), we have a shared vision of personalized medicine and the scientific and regulatory structure needed to support its growth. Together, we have been focusing on the best ways to develop new therapies and optimize prescribing by steering patients to the right drug at the right dose at the right time.
We recognize that myriad obstacles must be overcome to achieve these goals. These include scientific challenges, such as determining which genetic markers have the most clinical significance, limiting the off-target effects of gene-based therapies, and conducting clinical studies to identify genetic variants that are correlated with a drug response. There are also policy challenges, such as finding a level of regulation for genetic tests that both protects patients and encourages innovation. To make progress, the NIH and the FDA will invest in advancing translational and regulatory science, better define regulatory pathways for coordinated approval of codeveloped diagnostics and therapeutics, develop risk-based approaches for appropriate review of diagnostics to more accurately assess their validity and clinical utility, and make information about tests readily available.
Moving from concept to clinical use requires basic, translational, and regulatory science. On the basic-science front, studies are identifying many genetic variations underlying the risks of both rare and common diseases. These newly discovered genes, proteins, and pathways can represent powerful new drug targets, but currently there is insufficient evidence of a downstream market to entice the private sector to explore most of them. To fill that void, the NIH and the FDA will develop a more integrated pathway that connects all the steps between the identification of a potential therapeutic target by academic researchers and the approval of a therapy for clinical use. This pathway will include NIH-supported centers where researchers can screen thousands of chemicals to find potential drug candidates, as well as public–private partnerships to help move candidate compounds into commercial development.
The NIH will implement this strategy through such efforts as the Therapeutics for Rare and Neglected Diseases (TRND) program. With an open environment, permitting the involvement of all the world’s top experts on a given disease, the TRND program will enable certain promising compounds to be taken through the preclinical development phase — a time-consuming, high-risk phase that pharmaceutical firms call “the valley of death.” Besides accelerating the development of drugs to treat rare and neglected diseases, the TRND program may also help to identify molecularly distinct subtypes of some common diseases, which may lead to new therapeutic possibilities, either through the development of targeted drugs or the salvaging of abandoned or failed drugs by identifying subgroups of patients likely to benefit from them.
Another important step will be expanding efforts to develop tissue banks containing specimens along with information linking them to clinical outcomes. Such a resource will allow for a much broader assessment of the clinical importance of genetic variation across a range of conditions. For example, the NIH is now supporting genome analysis in participants in the Framingham Heart Study, obtaining biologic specimens from babies enrolled in the National Children’s Study, and performing detailed genetic analysis of 20 types of tumors to improve our understanding of their molecular basis.
As for translational science, the NIH is harnessing the talents and strengths of its Clinical and Translational Sciences Award program, which currently funds 46 centers and has awardees in 26 states, and its Mark O. Hatfield Clinical Research Center (the country’s largest research hospital, in Bethesda, MD) to translate basic research findings into clinical applications. Just as the NIH served as an initial home for human gene therapy, the Hatfield Center can provide specialized diagnostic services for rare and neglected diseases, offer a state-of-the-art manufacturing facility for novel therapies, and pioneer clinical trials of other innovative biologic therapies, such as those using human embryonic stem cells or induced pluripotent stem cells.
Today, about 10% of labels for FDA-approved drugs contain pharmacogenomic information — a substantial increase since the 1990s but hardly the limit of the possibilities for this aspect of personalized medicine.1 There has been an explosion in the number of validated markers but relatively little independent analysis of the validity of the tests used to identify them in biologic specimens.
The success of personalized medicine depends on having accurate diagnostic tests that identify patients who can benefit from targeted therapies. For example, clinicians now commonly use diagnostics to determine which breast tumors overexpress the human epidermal growth factor receptor type 2 (HER2), which is associated with a worse prognosis but also predicts a better response to the medication trastuzumab. A test for HER2 was approved along with the drug (as a “companion diagnostic”) so that clinicians can better target patients’ treatment (see table).
Increasingly, however, the use of therapeutic innovations for a specific patient is contingent on or guided by the results from a diagnostic test that has not been independently reviewed for accuracy and reliability by the FDA. For example, in 2006, the FDA granted approval to rituximab (Rituxan) for use as part of first-line treatment in patients with certain cancers. Since then, a laboratory has marketed a test with the claim that it can identify the approximately 20% of patients who are more likely to have a response to the drug. The FDA has not reviewed the scientific justification for this claim, but health care providers may use the test results to guide therapy. This undermines the approval process that has been established to protect patients, fails to ensure that physicians have accurate information on which to make treatment decisions, and decreases the chances that physicians will adopt a new therapeutic–diagnostic approach. The FDA is coordinating and clarifying the process that manufacturers must follow regarding their claims, including defining the times when a companion diagnostic must be approved or cleared before or concurrently with approval of the therapy. The agency will ensure that claims that a test will improve the care of patients are based on solid evidence, and developers will get straightforward, consistent advice about the standards for review and the best way to demonstrate that the combination works as intended.
In February, the NIH and the FDA announced a new collaboration on regulatory and translational science to accelerate the translation of research into medical products and therapies; this effort includes a joint funding opportunity for regulatory science. Working with academic experts, companies, doctors, patients, and the public, we intend to help make personalized medicine a reality. A recent example of this collaboration is an effort to identify new investigational agents to which certain tumors, identified by their genetic signatures, are responsive. Real progress will come when clinically beneficial new products and approaches are incorporated into clinical practice. As the field advances, we expect to see more efficient clinical trials based on a more thorough understanding of the genetic basis of disease. We also anticipate that some previously failed medications will be recognized as safe and effective and will be approved for subgroups of patients with specific genetic markers.
Insights in Biological and Synthetic Medicinal Chemistry
Larry H. Bernstein, M.D., FCAP, Curator
Leaders in Pharmaceutical Intelligence
Series E. 2; 10
Selected Articles Linking the Biological and Synthetic Worlds
The worlds of biological and synthetic chemistry both offer incredible diversity. Biology provides complex architectures including proteins, nucleic acids, and polysaccharides. Synthetic chemistry, on the other hand, provides a tool for atom-by-atom control over molecular structure that can be used to obtain molecules and materials inaccessible through biology.
In this ACS Select Virtual Issue, we highlight some of the recent advances in bioconjugation chemistry. These publications describe new strategies for functionalization of biomacromolecules, as well as the use of synthetic molecules as building blocks for assembly using biological machinery. The resultant conjugate systems have new and exciting properties, as demonstrated in new therapeutic and imaging applications.
– Vincent Rotello, Editor-in-Chief, Bioconjugate Chemistry
– C. Dale Poulter, Editor-in-Chief, The Journal of Organic Chemistry
– Amos Smith, III, Editor-in-Chief, Organic Letters
10.1.8 Surface Functionalization of Exosomes Using Click Chemistry
Smyth, T.; Petrova, K.; Payton, N. M.; Persaud, I.; Redzic, J. S.; Gruner, M. W.; Smith-Jones, P.; Anchordoquy, T. J. Bioconjugate Chem., 2014, 25 (10), pp 1777-1784 DOI: 10.1021/bc500291r
10.3.3 A Photoinduced, Benzyne Click Reaction
Gann, A. W.; Amoroso, J. W.; Einck, V. J.; Rice, W. P.; Chambers, J. J.; Schnarr, N. A. Org. Lett., 2014, 16 (7), pp 2003-2005 DOI: 10.1021/ol500389t
This Special Issue on “Synthesis, Design and Molecular Function”, guest-edited by Paul Wender, is intended to explore the many exciting new advances and challenges associated with designing and making molecules in the 21st century. It features contributions from thought leaders in the field directed at new reactions, reagents and catalysts, process technologies and screening strategies.
Mass spectrometry has boomed over the last two decades and has become a major analytical tool in many disciplines. The technique relies on separating ions by their mass-to-charge ratio (m/z), and its success hinges on efficient ionization methods that are tailored to the task at hand. Depending on the application, ionization may need to be soft, hard, selective, or as efficient as possible. This virtual issue pulls together publications from Analytical Chemistry that showcase exemplary developments in ionization techniques.
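As a rough illustration of what separating ions by m/z means, the sketch below computes the m/z value of a protonated ion, [M + zH]^z+, as commonly produced by soft ionization in positive mode. The function name and the 1000 Da example mass are hypothetical and chosen only for illustration; nothing here is taken from the virtual issue itself.

```python
# Mass of a proton in daltons (rounded); each added proton contributes
# this much mass and one unit of positive charge.
PROTON_MASS_DA = 1.007276


def mz_protonated(neutral_mass_da: float, charge: int) -> float:
    """Return m/z for an [M + zH]^z+ ion (illustrative helper, positive mode)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


if __name__ == "__main__":
    # The same hypothetical 1000 Da molecule appears at different m/z
    # values depending on how many protons it carries.
    for z in (1, 2, 3):
        print(f"z = {z}: m/z = {mz_protonated(1000.0, z):.4f}")
```

The point of the example is that one molecule can appear at several positions in a spectrum depending on its charge state, which is one reason the choice of ionization method matters for the application.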