
Archive for the ‘Gene Regulation and Evolution’ Category

Cancer Companion Diagnostics

Curator: Larry H. Bernstein, MD, FCAP

 

Companion Diagnostics for Cancer: Will NGS Play a Role?

Patricia Fitzpatrick Dimond, Ph.D.

http://www.genengnews.com/insight-and-intelligence/companion-diagnostics-for-cancer/77900554/

Companion diagnostics (CDx), in vitro diagnostic devices or imaging tools that provide information essential to the safe and effective use of a corresponding therapeutic product, have become indispensable tools for oncologists. As a result, analysts expect the global CDx market to reach $8.73 billion by 2019, up from $3.14 billion in 2014.
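To put the market forecast in perspective, the two figures quoted above imply a compound annual growth rate. The short Python sketch below works that out; the function name `cagr` is only illustrative, and the calculation assumes smooth year-over-year compounding across the five years, which the analysts' forecast does not itself state.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate, returned as a fraction (0.10 means 10%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Figures quoted in the text, in billions of US dollars.
market_2014 = 3.14
market_2019 = 8.73

# Assumption: growth compounds evenly over the five years 2014 to 2019.
growth = cagr(market_2014, market_2019, 2019 - 2014)
print(f"Implied compound annual growth rate: {growth:.1%}")
```

Under that assumption the implied rate is a little under 23% per year.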

Use of CDx during a clinical trial to guide therapy can improve treatment responses and patient outcomes by identifying and predicting patient subpopulations most likely to respond to a given treatment.

These tests not only indicate the presence of a molecular target, but can also reveal the off-target effects of a therapeutic, predicting toxicities and adverse effects associated with a drug.

For pharma manufacturers, using CDx during drug development improves the success rate of drugs being tested in clinical trials. In a study estimating the risk of clinical trial failure during non-small cell lung cancer drug development between 1998 and 2012, investigators analyzed data from 676 clinical trials with 199 unique drug compounds.

The data showed that Phase III trial failure proved the biggest obstacle to drug approval, with an overall success rate of only 28%. But in biomarker-guided trials, the success rate reached 62%. The investigators concluded from their data analysis that the use of a CDx assay during Phase III drug development substantially improves a drug’s chances of clinical success.
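The size of the difference between the two reported success rates can be made concrete with a few lines of arithmetic. The Python sketch below uses only the 28% and 62% figures quoted above; the "expected trials per success" helper is an illustrative assumption (it treats each trial as an independent attempt with the same success probability), not something the study itself reports.

```python
# Phase III success rates quoted in the text.
overall_success = 0.28      # all trials
biomarker_success = 0.62    # biomarker-guided trials

def expected_trials_per_success(p):
    """Mean number of independent attempts needed for one success (assumed model)."""
    return 1.0 / p

relative_improvement = biomarker_success / overall_success
absolute_gain_points = (biomarker_success - overall_success) * 100

print(f"Relative improvement: {relative_improvement:.2f}x")
print(f"Absolute gain: {absolute_gain_points:.0f} percentage points")
print(f"Trials per success, overall: {expected_trials_per_success(overall_success):.1f}")
print(f"Trials per success, biomarker-guided: {expected_trials_per_success(biomarker_success):.1f}")
```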

The Regulatory Perspective

According to Patricia Keegan, M.D., supervisory medical officer in the FDA’s Division of Oncology Products II, the agency requires a companion diagnostic test if a new drug works on a specific genetic or biological target that is present in some, but not all, patients with a certain cancer or disease. The test identifies individuals who would benefit from the treatment, and may also identify patients who would not benefit and could be harmed by use of a certain drug for treatment of their disease. The agency classifies companion diagnostics as Class III devices, the class that requires the FDA’s most stringent review for medical devices, a Premarket Approval Application (PMA).

On August 6, 2014, the FDA finalized its long-awaited “Guidance for Industry and FDA Staff: In Vitro Companion Diagnostic Devices,” originally issued in July 2011. The final guidance stipulates that FDA generally will not approve any therapeutic product that requires an IVD companion diagnostic device for its safe and effective use before the IVD companion diagnostic device is approved or cleared for that indication.

Close collaboration between drug developers and diagnostics companies has been a key driver in recent simultaneous pharmaceutical-CDx FDA approvals, and partnerships between in vitro diagnostics (IVD) companies have proliferated as a result. Major test developers include Roche Diagnostics, Abbott Laboratories, Agilent Technologies, QIAGEN, Thermo Fisher Scientific, and Myriad Genetics.

But an NGS-based test has yet to make it to market as a CDx for cancer. All approved tests use PCR-based methods, immunohistochemistry, or in situ hybridization technology. And despite the very recent decision by the FDA to grant marketing authorization for Illumina’s MiSeqDx instrument platform for screening and diagnosis of cystic fibrosis, “There still seems to be a number of challenges that must be overcome before we see NGS for targeted cancer drugs,” said Jan Trøst Jørgensen, a consultant to DAKO, commenting on presentations at the European Symposium of Biopathology in June 2013.

Illumina received premarket clearance from the FDA for its MiSeqDx system, two cystic fibrosis assays, and a library prep kit that enables laboratories to develop their own diagnostic test. The designation marked the first time a next-generation sequencing system received FDA premarket clearance. The FDA reviewed the Illumina MiSeqDx instrument platform through its de novo classification process, a regulatory pathway for some novel low-to-moderate risk medical devices that are not substantially equivalent to an already legally marketed device.

Dr. Jørgensen further noted that “We are slowly moving away from the ‘one biomarker: one drug’ scenario, which has characterized the first decades of targeted cancer drug development, toward a more integrated approach with multiple biomarkers and drugs. This ‘new paradigm’ will likely pave the way for the introduction of multiplexing strategies in the clinic using gene expression arrays and next-generation sequencing.”

The future of CDxs therefore may be heading in the same direction as cancer therapy, aimed at staying ahead of the tumor drug resistance curve, and acknowledging the reality of the shifting genomic landscape of individual tumors. In some cases, NGS will be applied to diseases for which a non-sequencing CDx has already been approved.

Illumina believes that NGS presents an ideal solution to transforming the tumor profiling paradigm from a series of single gene tests to a multi-analyte approach to delivering precision oncology. Mya Thomae, Illumina’s vice president, regulatory affairs, said in a statement that Illumina has formed partnerships with several drug companies to develop a universal next-generation sequencing-based oncology test system. The collaborations with AstraZeneca, Janssen, Sanofi, and Merck-Serono, announced in 2014 and 2015 respectively, seek to “redefine companion diagnostics for oncology focused on developing a system for use in targeted therapy clinical trials with a goal of developing and commercializing a multigene panel for therapeutic selection.”

On January 16, 2014, Illumina and Amgen announced that they would collaborate on the development of a next-generation sequencing-based companion diagnostic for the colorectal cancer antibody Vectibix (panitumumab). Illumina will develop the companion test on its MiSeqDx instrument.

In 2012, the FDA approved Qiagen’s Therascreen KRAS RGQ PCR Kit to identify best responders to Erbitux (cetuximab), another antibody drug in the same class as Vectibix. The label for Vectibix, an EGFR-inhibiting monoclonal antibody, states that the drug is not indicated for metastatic colorectal cancer patients whose tumors harbor KRAS mutations or whose KRAS status is unknown.

The U.S. FDA, Illumina said, hasn’t yet approved a companion diagnostic that gauges KRAS mutation status specifically in those considering treatment with Vectibix.  Illumina plans to gain regulatory approval in the U.S. and in Europe for an NGS-based companion test that can identify patients’ RAS mutation status. Illumina and Amgen will validate the test platform and Illumina will commercialize the test.

Treatment Options

Foundation Medicine says its approach to cancer genomic characterization will help physicians reveal the alterations driving the growth of a patient’s cancer and identify targeted treatment options that may not have been otherwise considered.

FoundationOne, the first clinical product from Foundation Medicine, interrogates the entire coding sequence of 315 cancer-related genes plus select introns from 28 genes often rearranged or altered in solid tumor cancers.  Based on current scientific and clinical literature, these genes are known to be somatically altered in solid cancers.

These genes, the company says, are sequenced at great depth to identify the relevant, actionable somatic alterations, including single base pair changes, insertions, deletions, copy number alterations, and selected fusions. The resultant fully informative genomic profile complements traditional cancer treatment decision tools and often expands treatment options by matching each patient with targeted therapies and clinical trials relevant to the molecular changes in their tumors.

As Foundation Medicine’s NGS analyses are increasingly applied, recent clinical reports describe instances in which comprehensive genomic profiling with the FoundationOne NGS-based assay results in diagnostic reclassification that can lead to targeted drug therapy with a resulting dramatic clinical response. In several reported instances, NGS found, among the spectrum of aberrations that occur in tumors, changes unlikely to have been discovered by other means, and clearly outside the range of a conventional CDx that matches one drug to a specific genetic change.

TRK Fusion Cancer

In July 2015, the University of Colorado Cancer Center and Loxo Oncology published a research brief in the online edition of Cancer Discovery describing the first patient with a tropomyosin receptor kinase (TRK) fusion cancer enrolled in a LOXO-101 Phase I trial. LOXO-101 is an orally administered inhibitor of the TRK kinase and is highly selective for the TRK family of receptors.

While the authors say TRK fusions occur rarely, they occur in a diverse spectrum of tumor histologies. The research brief described a patient with advanced soft tissue sarcoma widely metastatic to the lungs. The patient’s physician submitted a tumor specimen to Foundation Medicine for comprehensive genomic profiling with FoundationOne Heme, where her cancer was demonstrated to harbor a TRK gene fusion.

Following multiple unsuccessful courses of treatment, the patient was enrolled in the Phase I trial of LOXO-101 in March 2015. After four months of treatment, CT scans demonstrated almost complete disappearance of the largest tumors.

The FDA’s Elizabeth Mansfield, Ph.D., director, personalized medicine staff, Office of In Vitro Diagnostics and Radiological Health, said in a recent article, “FDA Perspective on Companion Diagnostics: An Evolving Paradigm,” that “even as it seems that many questions about co-development have been resolved, the rapid accumulation of new knowledge about tumor biology and the rapid evolution of diagnostic technology are challenging FDA to continually redefine its thinking on companion diagnostics. It seems almost inevitable that a consolidation of diagnostic testing should take place, to enable a single test or a few tests to garner all the necessary information for therapeutic decision making.”

Whether this means CDx testing will begin to incorporate NGS remains to be seen.


Variability of Gene Expression and Drug Resistance, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 1: Next Generation Sequencing (NGS)

Variability of Gene Expression and Drug Resistance

Larry H. Bernstein, MD, FCAP, Curator

LPBI

 

New Data Suggest Extreme Genetic Diversity of Tumors May Impart Drug Resistance

NEW YORK (GenomeWeb) – Researchers from the University of Chicago and the Beijing Institute of Genomics have undertaken one of the most extensive analyses of the genome of a single tumor and found far greater genetic diversity than anticipated. Such variation, they said, may enable even small tumors to resist treatment.

“With 100 million mutations, each capable of altering a protein in some way, there is a high probability that a significant minority of tumor cells will survive, even after aggressive treatment,” Chung-I Wu, a University of Chicago researcher and senior author of the study, said in a statement. “In a setting with so much diversity, those cells could multiply to form new tumors, which would be resistant to standard treatments.”

 

Extremely high genetic diversity in a single tumor points to prevalence of non-Darwinian cell evolution

Shaoping Ling, Zheng Hu, Zuyu Yang, Fang Yang, Yawei Li, Pei Lin, Ke Chen, Lili Dong, Lihua Cao, Yong Tao, Lingtong Hao, Qingjian Chen, Qiang Gong, et al.

Shaoping Ling, et al. PNAS. http://dx.doi.org/10.1073/pnas.1519556112 http://www.pnas.org/content/early/2015/11/11/1519556112

A tumor comprising many cells can be compared to a natural population with many individuals. The amount of genetic diversity reflects how it has evolved and can influence its future evolution. We evaluated a single tumor by sequencing or genotyping nearly 300 regions from the tumor. When the data were analyzed by modern population genetic theory, we estimated more than 100 million coding region mutations in this unexceptional tumor. The extreme genetic diversity implies evolution under the non-Darwinian mode. In contrast, under the prevailing view of Darwinian selection, the genetic diversity would be orders of magnitude lower. Because genetic diversity accrues rapidly, a high probability of drug resistance should be heeded, even in the treatment of microscopic tumors.

The prevailing view that the evolution of cells in a tumor is driven by Darwinian selection has never been rigorously tested. Because selection greatly affects the level of intratumor genetic diversity, it is important to assess whether intratumor evolution follows the Darwinian or the non-Darwinian mode of evolution. To provide the statistical power, many regions in a single tumor need to be sampled and analyzed much more extensively than has been attempted in previous intratumor studies. Here, from a hepatocellular carcinoma (HCC) tumor, we evaluated multiregional samples from the tumor, using either whole-exome sequencing (WES) (n = 23 samples) or genotyping (n = 286) under both the infinite-site and infinite-allele models of population genetics. In addition to the many single-nucleotide variations (SNVs) present in all samples, there were 35 “polymorphic” SNVs among samples. High genetic diversity was evident as the 23 WES samples defined 20 unique cell clones. With all 286 samples genotyped, clonal diversity agreed well with the non-Darwinian model with no evidence of positive Darwinian selection. Under the non-Darwinian model, MALL (the number of coding region mutations in the entire tumor) was estimated to be greater than 100 million in this tumor. DNA sequences reveal local diversities in small patches of cells and validate the estimation. In contrast, the genetic diversity under a Darwinian model would generally be orders of magnitude smaller. Because the level of genetic diversity will have implications on therapeutic resistance, non-Darwinian evolution should be heeded in cancer treatments even for microscopic tumors.


The findings, which appeared in the Proceedings of the National Academy of Sciences this week, also call into question the widely held view that evolution at the cellular level is driven by Darwinian selection, revealing a level of rapid and extensive genetic diversity beyond what would be expected under this model.

In the study, the researchers focused on a single hepatocellular carcinoma tumor, roughly the size of a ping pong ball. They sampled 286 regions from a single slice of the tumor, studying each one with either whole-exome sequencing or genotyping under both the infinite-site and infinite-allele models of population genetics.

Based on their analyses, the team estimated more than 100 million coding region mutations in what they called an “unexceptional” tumor — more mutations than would ordinarily be expected by orders of magnitude, according to Wu.

This extreme genetic diversity, the study’s authors wrote, implies evolution under the non-Darwinian mode, which is driven by random mutations largely unaffected by natural selection. It also raises the question of why there is so little apparent Darwinian selection in the tumor.

The scientists speculated that in solid tumors, cells remain together and do not migrate, “so that when an advantageous mutation indeed emerges, cells carrying it are competing mostly with themselves. These mutations may confer advantages in fighting for space or extracting nutrients, but they are stifled by their own advantages,” they wrote.

Beneficial mutations may emerge on occasion, but in solid tumors the cell populations are “so structured that selection may often be blunted,” they stated. “The physiological effect has to be very strong to overcome those constraints.” Cancer drugs could remove those constraints, loosening up a cell population and allowing competition to occur, the investigators added.

Wu and his colleagues see the presence of so many mutations in a tumor as creating problems when it comes to treatment. “It almost guarantees that some cells will be resistant,” study co-author and University of Chicago oncologist Daniel Catenacci said in the statement. “But it also suggests that aggressive treatment could push tumor cells into a more Darwinian mode.”

Overall, the findings highlight the need to consider non-Darwinian evolution and the vast genetic diversity it can confer as factors when developing treatment strategies, even for small tumors, the researchers concluded.


Irreconcilable Dissonance in Physical Space and Cellular Metabolic Conception

Curator: Larry H. Bernstein, MD, FCAP

Pasteur Effect – Warburg Effect – What its history can teach us today. 

José Eduardo de Salles Roselino

The Warburg effect, in reality the “Pasteur effect,” was the first example of metabolic regulation described. It is a decrease in the carbon flux from the sugar molecule towards the end products of the catabolic pathway, ethanol and carbon dioxide, observed when yeast cells were transferred from an anaerobic environment to an aerobic one. In Pasteur’s studies, sugar metabolism was measured mainly by the decrease in sugar concentration in the yeast growth medium after a measured period of time. The sugar concentration in the medium falls at great speed in yeast grown in anaerobiosis (oxygen deficiency), and the rate was greatly reduced when the yeast culture was transferred to an aerobic condition. This finding was very important for the wine industry of France in Pasteur’s time, since most of the undesirable outcomes in the industrial use of yeast were seen when yeast cells took a very long time to create a rather selective anaerobic condition. This selective culture medium was characterized by the higher carbon dioxide levels produced by fast-growing yeast cells and by a higher alcohol content in the yeast culture medium.

However, in biochemical terms, this finding was required to understand Lavoisier’s results indicating that chemical and biological oxidation of sugars produced the same calorimetric (heat generation) results. This observation requires a control mechanism (metabolic regulation) to avoid burning living cells through the fast release of heat by the biological oxidation of sugar (metabolism). In addition, Lavoisier’s results were the first indication that both processes happen within similar thermodynamic limits. In very summarized form, these observations indicate the major reasons that led Warburg to test for failure of control mechanisms in cancer cells in comparison with those observed in normal cells.

[It might be added that the availability of O2 and CO2 and climatic conditions over 750 million years that included volcanic activity, tectonic movements of the earth crust, and glaciation, and more recently the use of carbon fuels and the extensive deforestation of our land masses have had a large role in determining the biological speciation over time, in sea and on land. O2 is generated by plants utilizing energy from the sun and conversion of CO2. Remove the plants and we tip the balance. A large source of CO2 is from beneath the earth’s surface.]

Placing biology inside classical thermodynamics poses some challenges to scientists. For instance, all classical thermodynamic quantities must be measured under reversible thermodynamic conditions. In an isolated system, an increase in P (pressure) leads to a decrease in V (volume), all this occurring in a condition in which infinitesimal changes in one affect the other in the same way, a continuum response. Not even a quantum of energy will stand beyond those parameters.

In a reversible system, a decrease in V, under the same conditions, will lead to an increase in P. In biochemistry, reversible usually indicates a reaction that easily goes either from A to B or from B to A. For instance, when it was required to search for an anti-ischemic effect of chlorpromazine in a liver with extrahepatic obstruction, it was necessary to use an adequate system that increased biliary system pressure in a reversible manner, in order to exclude a direct effect of this drug on the biological inducer of system pressure (bile secretion) (Braz. J. Med. Biol. Res. 1989; 22: 889-893). Frequently, these details are skipped over by those who read biology in ATGC letters.

Very important observations can be made in this regard when neutral mutations are taken into consideration, since, after several mutations (not affecting previous activity and function), a last mutation may provide a new RNA transcript for a protein and elicit a new function. For example, consider a Prion C from lamb becoming similar to bovine Prion C while preserving its normal role in the lamb, when its ability to change human Prion C is considered (Stanley Prusiner).

This observation is good enough to confirm one of the most important contributions of Erwin Schrödinger in his What Is Life?:

“This little book arose from a course of public lectures, delivered by a theoretical physicist to an audience of about four hundred which did not substantially dwindle, though warned at the outset that the subject matter was a difficult one and that the lectures could not be termed popular, even though the physicist’s most dreaded weapon, mathematical deduction, would hardly be utilized. The reason for this was not that the subject was simple enough to be explained without mathematics, but rather that it was much too involved to be fully accessible to mathematics.”

After Hans Krebs’ description of the cyclic nature of citrate metabolism, and after his followers described its requirement for aerobic catabolism, two major lines of research began the search for the mechanism of energy transfer that explains how ADP is converted into ATP. One followed the organic chemistry line of reasoning and therefore searched for a mechanism that could explain how the energy of the breakdown of a carbon-carbon bond could be transferred to ATP synthesis. One of the major leaders of this research line was Britton Chance. He took into account that relatively early in the series of Krebs cycle reactions, two carbon atoms of acetyl were released as carbon dioxide (in fact, not the actual acetyl carbons but those on the opposite side of the citrate molecule). In stoichiometric terms, it was not important whether the released carbons were or were not exactly those originating from glucose carbons. His research aimed to find a proteinaceous intermediary that could act as an energy reservoir. The intermediary could store the energy of carbon-carbon bond breakdown in a phosphorylated amino acid. This activated amino acid could transfer its phosphate group to ADP, producing ATP. A key intermediate involved in the transfer was identified by Kaplan and Lipmann at Johns Hopkins as acetyl coenzyme A, for which Fritz Lipmann received a Nobel Prize.

Alternatively, possibly under the influence of the excellent results of Hodgkin and Huxley, a second line of research appeared. The work of Hodgkin and Huxley demonstrated the storage of electrical potential energy in transmembrane ionic asymmetries and presented the explanation for the change from resting to action potential in excitable cells. This second line of research, under the leadership of Peter Mitchell, postulated a mechanism in which the oxidation-reduction power of the oxidation of organic molecules, transferred through electron transport, was the key to the energy transfer required for ATP synthesis. This diverted attention from the high-energy (~P) phosphate bond to the transfer of electrons. During most of the harsh period of confrontation between the two points of view, Paul Boyer and his followers attempted to act as a conciliatory third party, without good results, according to personal accounts (in L. A., or Latin America) heard from those few of our scientists who were able to follow the major scientific events held in the USA and who could report them to us later. Paul Boyer was able to show how the energy is transduced by a molecular machine that changes conformation in a series of three steps while rotating in one direction to produce ATP, and in the opposite direction to produce ADP plus Pi from ATP (reversibility).

However, earlier, Peter Mitchell had emerged victorious in the conceptual dispute with Britton Chance’s point of view, after he used E. coli mutants to show H+ gradients across the cell membrane and their use as an energy source, for which he received a Nobel Prize. This outcome was such a blow to Chance’s previous work that it seems to have cast a shadow over very important findings obtained earlier in his career, findings that should not be affected by one or another form of energy transfer mechanism. For instance, Britton Chance developed the simple and rapid polarographic assay of oxidative phosphorylation and the idea of control of energy metabolism that brings us back to Pasteur.

This alternative metabolic result seems to have been neglected in the recent years of the obesity epidemic, which led to a search for a single molecular mechanism to explain the accumulation of chemical reserves (adipose tissue) in our body. This does not mean that the role of the central nervous system is neglected here. In short, in respiring mitochondria the rate of electron transport, linked to the rate of ATP production, is determined primarily by the relative concentrations of ADP, ATP and phosphate in the external medium (cytosol) and not by the concentration of a respiratory substrate such as pyruvate. Therefore, when the yield of ATP is high, as it is in aerobiosis, and the cellular use of ATP is not changed, the oxidation of pyruvate, and therefore glycolysis, is quickly throttled down to the resting state (without a change in gene expression). The dependence of respiratory rate on ADP concentration is also seen in intact cells. A muscle at rest and using no ATP has a very low respiratory rate. [When skeletal muscle is stressed by high exertion, the lactic acid produced is released into the circulation and is metabolized aerobically by the heart at the end of the activity.]

This respiratory control of metabolism leads to preservation of the body’s carbon reserves and, in the case of a high caloric intake in the diet, to an increase in fat reserves, which was essential for the survival of our biological ancestors (and today contributes to our obesity epidemic). No matter how important this observation is, it is only one focal point of metabolic control. We cannot reduce the problem of obesity to the existence of metabolic control; there are numerous other factors. On the other hand, we cannot neglect or remove this vital process in order to correct obesity, and we cannot explain obesity while ignoring this metabolic control. This topic is so neglected in modern times that we cannot follow major research lines of the past that were interrupted by the emerging molecular biology techniques and by the vain belief that a dogmatic vision of biology could replace all previous knowledge with a new one based upon ATGC readings. To illustrate the bad consequences of ignoring these old scientific facts, we can consider how ion movements across membranes affect membrane protein conformation and therefore contradict the wrong central dogma of molecular biology. This change in protein conformation (with unchanged amino acid sequence), or the lack of it, is linked to the factors that affect vital processes such as the heartbeat. This modern ignorance could also explain some major pitfalls seen in clinical trials of new drugs and, on a smaller scale, in bad medical practices.

The work of Britton Chance and that of Peter Mitchell both have deep and sound scientific roots, were carried out with excellent scientific techniques, were supported by excellent scientific reasoning, and produced a large series of very important intermediate scientific results. Their sole difference was that they aimed at very different scientific explanations as their goals (they had different teleologies in mind, shaped by their previous experiences). When, with the use of mutants obtained in microorganisms, P. Mitchell’s goal was found to survive the experimental evidence and B. Chance’s to succumb to it, all those excellent findings of B. Chance and his followers were consigned to the dustbin of scientific history, an example of lack of scientific consideration. [On the one hand, the Mitchell model used a unicellular organism; on the other, Chance’s work was with eukaryotic cells, quite relevant to the discussion.]

We can summarize the challenge faced by these two great scientists in the following form: The first conceptual unification in bioenergetics, achieved in the 1940s, is inextricably bound up with the name of Fritz Lipmann. Its central feature was the recognition that adenosine triphosphate, ATP, serves as a universal energy “currency” much as money serves as economic currency. In a nutshell, the purpose of metabolism is to support the synthesis of ATP. In microorganisms, this is perfect! In humans, mammals, or vertebrates, for the same reason that we cannot consider gene expression equivalent to protein function (an acceptable error in the case of microorganisms), this oversimplifies the metabolic requirement with a huge error. However, if our concern is ATP chemistry only, metabolism produces ATP and the hydrolysis of ATP pays for the performance of almost all kinds of work. It is reasonable to presume that finding out how the flow of metabolism (carbon flow) leads to ATP production was a major focal point of research for the two contenders. Consequently, what could have been a minor fall of one of the contenders, if we take into account all that was found during their entire lives of research, the real failure of B. Chance’s final goal, was amplified far beyond what reason would justify!

Another aspect must be taken into account: both contenders had very sound roots in the scientific past. Metabolism may produce two forms of energy currency (I personally don’t like this expression*, and I use it here because it was used by both groups to express their findings; together with simplistic thermodynamics, this expression conveys wrong ideas). The second kind of energy currency is the current of ions passing from one side of a membrane to the other. The scientific root of P. Mitchell’s work undoubtedly has the work of Hodgkin & Huxley, Huxley & Huxley, and Huxley & Simmons (1940s to 1972) as a solid support. B. Chance had the enzymologists involved in clarifying how ATP could be produced directly from NADH + H+ oxidation-reduction metabolic reactions or from the hydrolysis of an enolpyruvate intermediary. Both competitors had their work supported by different but sound scientific roots and produced very important scientific results while trying to present their hypothetical points of view.

*ATP is produced under the guidance of cell needs and not according to its yield. When glucose yields only 2 ATP per molecule, it is oxidized at very high speed (anaerobiosis), as required to match cellular needs. On the other hand, when it may yield (in thermodynamic terms) 38 ATP, the same molecule is oxidized at low speed. It would be similar to an investor choosing the form of investment with the lowest monetary yield.
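The yield comparison in the footnote can be turned into simple arithmetic. The Python sketch below takes the two yields quoted in the text (2 and 38 ATP per glucose) and asks how fast glucose must be consumed to meet the same ATP demand in each condition. The demand value and the function name are hypothetical, chosen only for illustration, and the model assumes that ATP demand is fixed and fully met by glucose oxidation.

```python
# ATP yields per glucose molecule as quoted in the text.
ATP_PER_GLUCOSE_ANAEROBIC = 2
ATP_PER_GLUCOSE_AEROBIC = 38

def glucose_rate_needed(atp_demand, atp_per_glucose):
    """Glucose consumed per unit time to supply a fixed ATP demand."""
    return atp_demand / atp_per_glucose

# Hypothetical constant ATP demand, in arbitrary units per unit time.
atp_demand = 76.0

anaerobic_rate = glucose_rate_needed(atp_demand, ATP_PER_GLUCOSE_ANAEROBIC)
aerobic_rate = glucose_rate_needed(atp_demand, ATP_PER_GLUCOSE_AEROBIC)
fold_change = anaerobic_rate / aerobic_rate

print(f"Glucose use, anaerobic: {anaerobic_rate:.1f} units")
print(f"Glucose use, aerobic:   {aerobic_rate:.1f} units")
print(f"Anaerobic / aerobic:    {fold_change:.0f}-fold")
```

With these yields, the same demand requires a 19-fold higher glucose flux in anaerobiosis, whatever demand value is chosen, which is the direction of the effect Pasteur observed in the sugar consumption of yeast.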

Before the winning results of P. Mitchell were displayed, one line of defense used by B. Chance’s followers was to create a conflict between what would be expected from a restrictive role of proteins, through the specificity of their ionic interactions, and the general ability of ionic asymmetries that could be associated with mitochondrial ATP production. Protein-catalyzed chemical activities do not have perfect specificity, but an outstanding degree of selective interaction was presented by the lock and key model of enzyme interaction. A large group of outstanding “mitochondriologists” were able to show ATP synthesis associated with Na+, K+, Ca2+… asymmetries on mitochondrial membranes, and any time they did this, P. Mitchell had to display the existence of antiporters that exchange X for hydrogen as the final common source of chemiosmotic energy used by mitochondria for ATP synthesis.

This conceptual battle generated an enormous body of knowledge that was laid to rest, somehow discontinued as a form of scientific research, when the final E. coli mutant studies presented the convincing final evidence in favor of P. Mitchell’s point of view.

Not surprisingly, a “wise anonymous” later pointed out: “No matter what you are doing, you will always be better off in case you have a mutant”

(Principles of Medical Genetics, T. D. Gelehrter & F. S. Collins, chapter 7, 1990).

However, let’s take the example of a mechanical wristwatch. When the watch is working in an acceptable way, it clearly indicates that its normal functioning is not the result of any one of its isolated components, nor something that can be shown by a reductionist molecular view. Usually it will be considered to be working acceptably if its accuracy falls inside a normal functional range, for instance one or two standard deviations below or above the mean value for normal function, depending upon the rigor wisely adopted. Only when it has a faulty component (a genetic inborn error) can we indicate a single isolated piece as the cause of its failure (a reductionist molecular view).

In medicine, we need to teach first the major reasons why the watch works fine (without saying it is “automatic”). The functions may cross the regulatory limit from reversible to irreversible change faster than we can imagine. Later, when these ideas about the normal are held very clearly in the mindset of medical doctors (not medical technicians), we may address the inborn errors and what we may have learned from them. A modern medical technician may cause admiration when he uses an “innocent” virus to correct a faulty gene (a rather impressive technological advance). However, if the virus later shows signs that it was not so innocent, a real medical doctor will be called upon to put things in their correct place again.

Among the missing parts of normal evolution in biochemistry, a lot about ion fluxes can be found. Even those oscillatory changes in Ca2+ that were shown to affect gene expression (C. De Duve) were laid to rest, since they clearly indicate a source of biological information that, despite not changing the order of nucleotides in DNA, shows a flux of biological information opposing the dogma (DNA to RNA to proteins). Another line has shown a hierarchy in the use of the mitochondrial membrane potential: first the potential is used for Ca2+ uptake, and only afterwards is it used for the conversion of ADP into ATP (A. L. Lehninger). In fact, the real idea of A. L. Lehninger was far more complex, since according to him mitochondria work like a buffer for intracellular calcium, releasing it in case of a deep decrease in cytosolic levels or capturing it from the cytosol when facing a transient increase in Ca2+ load. As some of the Krebs cycle dehydrogenases were activated by Ca2+, this finding was used to propose a new control factor in addition to that of ADP (B. Chance). All this was discontinued with the wrong use of calculus (today we could indicate bioinformatics in a similar role) in biochemistry, which assigned less importance to a mitochondrial role after comparative kinetics that today are seen as faulty.

It is important to combat dogmatic reasoning and restore sound scientific foundations in basic medical courses, which must urgently reverse the faulty trend that tries to impose a view going from the detail towards generalization, instead of the correct form that goes from the well-understood general finding towards its molecular details. The view that led to curious subjects such as bioinformatics in medical courses, as training in sequence-finding activities, can only be explained by its commercial value. The usual form of scientific thinking respects the limits of our ability to grasp new knowledge and relies on the reproducibility of scientific results as a way to overcome the lack of a mathematical equation that defines the relationship of variables and the determination of their functional domains. It also uses old scientific roots as its sound support and never replaces existing knowledge with dogmatic and/or wishful thinking. When DNA sequencing emerged as a way to find the amino acid sequence of proteins, it was just a technical advance. This technical advance by no means could be considered a scientific result indicating that DNA sequences alone have replaced the need to study protein chemistry and its responses to microenvironmental changes in order to understand its multiple conformations and changes in activity and function. As E. Schrödinger correctly describes, the chemical structure responsible for the coded, stored form of genetic information must have minimal interaction with its microenvironment in order to endure for hundreds and hundreds of years, as seen in the Hapsburg lip. Only magical reasoning assumes that it is possible to find in non-reactive chemical structures the properties of the reactive ones.

For instance, knowledge of the reactions of the Krebs cycle clearly indicates a role for the solvent, which can no longer be considered an inert bath for the catalytic activity of the enzymes when the transfer of energy includes a role for hydrogen transport. The great increase in understanding of this change in chemical reactions arrived from conformational energy.

Again, even a rather simplistic view of this atomic property (conformational energy) is enough to confirm once more one of the most important contributions of E. Schrödinger in his What Is Life?:

“This little book arose from a course of public lectures, delivered by a theoretical physicist to an audience of about four hundred which did not substantially dwindle, though warned at the outset that the subject matter was a difficult one and that the lectures could not be termed popular, even though the physicist’s most dreaded weapon, mathematical deduction, would hardly be utilized. The reason for this was not that the subject was simple enough to be explained without mathematics, but rather that it was much too involved to be fully accessible to mathematics.”

In a very simplistic view, while energy manifests itself by the ability to perform work, conformational energy, as a property derived from our atomic structure, can be neutral, positive or negative (no effect, increased or decreased reactivity upon any chemical reactivity measured as work).

Also:

“I mean the fact that we, whose total being is entirely based on a marvelous interplay of this very kind, yet possess the power of acquiring considerable knowledge about it. I think it possible that this knowledge may advance to little short of a complete understanding of the first marvel. The second may well be beyond human understanding.”

In fact, scientific knowledge allows us to understand how biological evolution may or may not have occurred, and yet it does not present a proof of how it actually occurred. It will always be an indication of the possible against the highly unlikely, and never a scientifically proven fact about the real form of its occurrence.

As was the case with B. Chance and his bioenergetics findings, we may obtain very important findings that point in wrong directions, whether toward the future, as in his case, or toward our past.

The Skeleton of Physical Time – Quantum Energies in Relative Space of S-labs

By Radoslav S. Bozov, Independent Researcher

WSEAS, Biology and BioSystems of Biomedicine

Space does not equate to distance, displacement of an object by classically defined forces – electromagnetic, gravity or inertia. In perceiving quantum open systems, a quantum, a package of energy, displaces properties of wave interference and statistical outcomes of sums of paths of particles detected by a design of S-labs.

The notion of S-labs, space labs, deals with inherent problems of an operational module, R(i+1), where an imaginary number ‘struggles’ to work under roots of a negative sign, a reflection of an observable set of sums reaching beyond the limits of a human organ, an eye or other foundational signal processing system.

While heavenly bodies, planets, star systems, and other exotic forms of light reflecting and/or emitting objects observable via the naked eye have been deduced to operate under numerical systems that calculate a periodic displacement of one relative to another, atomic clocks of nanospace open our eyes to ever expanding energy spaces, where matrices of interactive variables point to the problem of infinity of variations in scalar spaces, however defining properties of minute universes as a mirror image of an astronomical system. The first and foremost problem is essentially the same as those mathematical methodologies deduced by Isaac Newton and Albert Einstein for processing a surface. I will introduce you to a surface interference method by describing undetermined objective space in terms of determined subjective time.

Therefore, the moment will be an outcome of statistical sums of a numerical system extending from near zero to near one. Three strings hold down a dual system entangled via interference of two waves, where a single wave is a product of the momentum of three particles (today named according to either weak or strong interactions).

The above described system emerges from duality into trinity, the objective space value of physical realities. The triangle of physical observables – charge, gravity and electromagnetism – is an outcome of interference of particles, strings and waves, where particles are not particles, nor are strings strings, nor are waves waves of an infinite character in an open system which we attempt to define to predict outcomes of tomorrow’s parameters, either dependent or independent as well as both subjective to time simulations.

We now know that aging of a biological organism cannot be defined within singularity. Thereafter, clocks are subjective to apparatuses measuring oscillation of defined parameters which enable us to calculate both amplitude and a period, which we know to be dependent on phase transitions.

The problem of phase was solved by the applicability of carbon relative systems. A piece of diamond does not get wet, yet it holds water’s light entangled property. Water is the dark force of light. To formulate such a statement, we have been searching for truth by examining cooling objects where the Maxwell demon is translated into information, a data complex system.

Modern perspectives in computing quantum based matrices, 0+1=1 and/or 0+0=1 and/or 1+1=0, will be reduced by applying a conceptual frame of Aladdin’s flying anti-gravity carpet, unwrapping both past and future by sending a photon to both, placing the present always near zero. Thus, in each parallel quantum computation of a natural system approaching the limit of a vibration of a string, 0 does not equal 0, and 1 does not equal 1. In any case, in our method 1+1=1, yet 1 is not 1 at time i+1. This will set the fundamentals of an operational module, called the labris operator or, for simplicity, S-labs. Note that 1 as a result is an event predictable to the future, while the interacting parameters of the addition 1+1 may be both 1 as an observable past and 1 as an imaginary system, or 1+1 displaced interactive parameters of past observable events. This is the foundation of Future Quantum Relative Systems Interference (QRSI), taking analytical technologies of the future as a result of a data matrices compressing principle relative to carbon as a reference matter rational to water based properties.

Gödel’s concept of loops therefore exists only upon discrete relative space uniting to parallel absolute continuity of time ‘lags’. (Gödel, Escher, Bach: An Eternal Golden Braid. A Metaphorical Fugue on Minds and Machines in the Spirit of Lewis Carroll. D. Hofstadter. Chapter XX: Strange Loops, Or Tangled Hierarchies. A grand windup of many of the ideas about hierarchical systems and self-reference. It is concerned with the snarls which arise when systems turn back on themselves, for example, science probing science, government investigating governmental wrongdoing, art violating the rules of art, and finally, humans thinking about their own brains and minds. Does Gödel’s Theorem have anything to say about this last “snarl”? Are free will and the sensation of consciousness connected to Gödel’s Theorem? The Chapter ends by tying Gödel, Escher, and Bach together once again.) The struggle in between time creates dark spaces within which strings manage to obey light properties – entangled bosons of information carrying future outcomes of a system’s processing consciousness. Therefore, Albert Einstein was correct in his quantum time realities by rejecting a dissolving cube of sugar within a cup of tea. (Henri Bergson, 19th century philosopher. Bergson’s concept of multiplicity attempts to unify in a consistent way two contradictory features: heterogeneity and continuity. Many philosophers today think that this concept of multiplicity, despite its difficulty, is revolutionary.) However, the unity of time and space could not be achieved by deducing time to charge, gravity and electromagnetic properties of energy and mass.

Charge is further deduced to interference of particles/strings/waves, contrary to the Hawking idea of irreducibility of chemical energy carrying ‘units’, and gravity is accounted for by intrinsic properties of anti-gravity carbon systems processing light, an electromagnetic force, that I have deduced towards ever expanding discrete energy space-energies rational to compressing mass/time. The role of loops seems to operate to control formalities where boundaries of space fluctuate as a result of what we called above dark time-spaces.

Indeed, the concept of horizon is a constant due to ever expanding observables. Thus, it fails to acquire a rational approach towards space-time issues.

Richard Feynman has touched on issues of touching of space, sums of paths of particle traveling through time. In a way he has resolved an important paradigm, storing information and possibly studying it by opening a black box. Schroedinger’s cat is alive again, but incapable of climbing a tree when chased by a dog. Every time a cat climbs a garden tree, a fruit falls on hedgehogs carried away parallel to living wormholes whose purpose of generating information lies upon carbon units resolving light.

In order to deal with such a paradigm, we will introduce i+1 under the square root in relativity, therefore taking negative one (-1 = sqrt(i+1)), an operational module R dealing with Wheeler’s foam squeezed by light, releasing water – dark spaces. Thousand words down!

What is a number? Is that a name or some kind of language or both? Is the issue of number theory possibly accountable to the value of the concept of entropic timing? Light penetrating a pyramid holding bean seeds on a piece of paper and a piece of slice of bread, a triple set, where a church mouse has taken a drop of tear, but a blood drop. What an amazing physics! The magic of biology lies above egoism, above pride, and below Saints.

We will set up the twelve parameters seen through 3+1 in classic realities:

– discrete absolute energies/forces – no contradiction for now between Newtonian and Albert Einstein mechanics

– mass absolute continuity – conservational law of physics in accordance to weak and strong forces

– quantum relative spaces – issuing a paradox of Albert Einstein’s space-time resolved by the uncertainty principle

– parallel continuity of multiple time/universes – resolving uncertainty of united space and energy through evolving statistical concepts of scalar relative space expansion and vector quantum energies by compressing relative continuity of matter in it, ever compressing flat surfaces – finding the inverse link between deterministic mechanics of displacement and imaginary space, where spheres fit within surface of triangles as time unwraps past by pulling strings from future.

To us, common human beings, with an extra curiosity overloaded by real dreams, value happens to play in the intricate foundation of life – the garden of love, its carbon management in mind, collecting pieces of squeezed cooling time.

The infinite interference of each operational module with another, composing ever emerging time constraints unified by the Solar system, objective to humanity, perhaps answers that a drop of blood and a drop of tear are united by a droplet of a substance separating negative entropy to time courses of physical realities as defined by an open algorithm where chasing power subdued to space becomes an issue of time.

Jose Eduardo de Salles Roselino

Some small errors: for instance, an increase in P leads to a decrease in V (not an increase in V).

 

Radoslav S. Bozov, Independent Researcher

If we were to use preventative measures of medical science, instruments of medical science must predict future outcomes based on observable parameters of history. There are several key issues arising: 1. Despite pinning a difference on a genomic scale, say pieces of information, we do not know how to change that, that is, how to shift the methylome occupying genome surfaces in a precise manner. 2. The operational status quo of living systems DOES NOT work by the vector gravity physics of ‘building blocks’. That is projecting a delusional concept of a masonry trick, which has not worked by corner stones and ever shifting momenta. Assuming genomic assembling worked, that is, dealing with inferences through data mining and annotation, we are not in a position to read the future in real time, and we never will be, because of the self-restriction of rtPCR technology in data-time processing. We know of existing post-translational modalities. 3. We don’t know what we don’t know, and that is foundational to future medicine, that is, dealing with biological clocks, behavior, and various daily life inputs ranging from radiation to water systems, food quality, drugs…


Size Matters

Larry H. Bernstein, MD, FCAP, Curator

LPBI

 

MinION Sequencing Untangles RNA Transcripts in a Difficult Gene

By Aaron Krol

http://www.bio-itworld.com/2015/11/3/minion-sequencing-untangles-rna-transcripts-difficult-gene.html

 

RNA isoforms are distinct versions of the same gene. Through a process called alternative splicing, the different subunits, or “exons,” that make up a gene can be reshuffled in new combinations. Many genes have two or more mutually exclusive exons, and which ones are actually expressed as RNA and protein can have big effects on cellular behavior, in effect expanding the protein arsenal of the genome.

 

November 3, 2015 | Brenton Graveley received his first MinION shipment in April 2014, at his lab at the University of Connecticut’s Institute of Systems Genomics. His lab was among the first to unwrap one of the candy bar-sized DNA sequencers made by Oxford Nanopore Technologies, and although its accuracy was shaky and its throughput low, right away Graveley and his colleagues could see it was producing real DNA data.

“I’m still amazed to this day that it works at all,” Graveley says. “It’s like Star Trek.”

A lot of buzz around the MinION has focused on its tiny size: early adopters have plotted to take MinIONs into outbreak zones and species-hunting tromps through the rainforest, working with bare-bones labs and laptop computers. But for Graveley, the size of the DNA strands the MinION reads is just as exciting as the size of the sequencer itself. That’s because most other sequencers rely on picking up chemical reactions that become more error-prone over time, meaning DNA can only be read in short fragments. The MinION, which reads genetic material by observing single molecules of DNA as they pass through extremely narrow “nanopores,” keeps producing data for as long as DNA is moving through the pore.

“You get the read length of whatever fragment you put into the MinION,” he says. “We’ve gotten reads that are over 100 kilobases,” hundreds or even thousands of times longer than researchers can expect with most other technologies.

Now, in a paper published in Genome Biology, Graveley and two of his lab members, post-doc Mohan Bolisetty and PhD student Gopinath Rajadinakaran, have shown how these read lengths can help explain the cellular behavior of Dscam1, one of the most difficult-to-study genes known to science. Related to a gene in humans that has been linked to Down syndrome (the name stands for “Down Syndrome Cell Adhesion Molecule”), Dscam1 plays a fundamental role in forming the architecture of insect brains. This single gene can produce thousands of subtly different proteins, an ability that makes it both a fascinating subject of research and almost impossible to understand using standard sequencing technology.

 

Determining exon connectivity in complex mRNAs by nanopore sequencing

Mohan T. Bolisetty, Gopinath Rajadinakaran and Brenton R. Graveley
Genome Biology 2015, 16:204 http://dx.doi.org:/10.1186/s13059-015-0777-z http://genomebiology.com/2015/16/1/204

Short-read high-throughput RNA sequencing, though powerful, is limited in its ability to directly measure exon connectivity in mRNAs that contain multiple alternative exons located farther apart than the maximum read length. Here, we use the Oxford Nanopore MinION sequencer to identify 7,899 ‘full-length’ isoforms expressed from four Drosophila genes, Dscam1, MRP, Mhc, and Rdl. These results demonstrate that nanopore sequencing can be used to deconvolute individual isoforms and that it has the potential to be a powerful method for comprehensive transcriptome characterization.

High throughput RNA sequencing has revolutionized genomics and our understanding of the transcriptomes of many organisms. Most eukaryotic genes encode pre-mRNAs that are alternatively spliced [1]. In many genes, alternative splicing occurs at multiple places in the transcribed pre-mRNAs that are often located farther apart than the read lengths of most current high throughput sequencing platforms. As a result, several transcript assembly and quantitation software tools have been developed to address this [2], [3]. While these computational approaches do well with many transcripts, they generally have difficulty assembling transcripts of genes that express many isoforms. In fact, we have been unable to successfully assemble transcripts of complex alternatively spliced genes such as Dscam1 or Mhc using any transcript assembly software (data not shown). These software tools also have difficulty quantitating transcripts that have many isoforms, and for genes with distantly located alternatively spliced regions, they can only infer, and not directly measure, which isoforms may have been present in the original RNA sample [4]. For example, consider a gene containing two alternatively spliced exons located 2 kbp away from one another in the mRNA. If each exon is observed to be included at a frequency of 50 % from short read sequence data, it is impossible to determine whether there are two equally abundant isoforms that each contain or lack both exons, or four equally abundant isoforms that contain both, neither, or only one or the other exon.
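The two-exon example above can be made concrete with a small sketch. The Python below is illustrative only: the function name `marginal_inclusion` and the two toy abundance tables are assumptions introduced here, not part of the paper's analysis. It shows that both scenarios produce the same per-exon inclusion frequencies, which is all that short reads can measure when the exons lie farther apart than one read.

```python
from itertools import product

def marginal_inclusion(isoform_freqs):
    """Return per-exon inclusion frequencies from isoform abundances.

    isoform_freqs maps a tuple such as (True, False), meaning exon A is
    included and exon B is skipped, to that isoform's relative abundance.
    """
    n_exons = len(next(iter(isoform_freqs)))
    return tuple(
        sum(freq for iso, freq in isoform_freqs.items() if iso[i])
        for i in range(n_exons)
    )

# Scenario 1 (toy data): two equally abundant isoforms,
# one containing both exons and one lacking both.
two_isoforms = {(True, True): 0.5, (False, False): 0.5}

# Scenario 2 (toy data): four equally abundant isoforms,
# one for every combination of the two exons.
four_isoforms = {combo: 0.25 for combo in product([True, False], repeat=2)}

print(marginal_inclusion(two_isoforms))   # (0.5, 0.5)
print(marginal_inclusion(four_isoforms))  # (0.5, 0.5)
```

Because the marginals are identical, only a read that spans both exons can tell the two scenarios apart.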

Pacific Bioscience sequencing can generate read lengths sufficient to sequence full length cDNA isoforms and several groups have recently reported the use of this approach to characterize the transcriptome [5]. However, the large capital expense of this platform can be a prohibitive barrier for some users. Thus, it remains difficult to accurately and directly determine the connectivity of exons within the same transcript. The MinION nanopore sequencer from Oxford Nanopore requires a small initial financial investment, can generate extremely long reads, and has the potential to revolutionize transcriptome characterization, as well as other areas of genomics.

Several eukaryotic genes can encode hundreds to thousands of isoforms. For example, in Drosophila, 47 genes encode over 1,000 isoforms each [6]. Of these, Dscam1 is the most extensively alternatively spliced gene known and contains 115 exons, 95 of which are alternatively spliced and organized into four clusters [7]. The exon 4, 6, 9, and 17 clusters contain 12, 48, 33, and 2 exons, respectively. The exons within each cluster are spliced in a mutually exclusive manner and Dscam1 therefore has the potential to generate 38,016 different mRNA and protein isoforms. The variable exon clusters are also located far from one another in the mRNA and the exons within each cluster are up to 80 % identical to one another at the nucleotide level. Together, these characteristics present numerous challenges to characterize exon connectivity within full-length Dscam1 transcripts for any sequencing platform. Furthermore, though no other gene is as complex as Dscam1, many other genes have similar issues that confound the determination of exon connectivity.

We are interested in developing methods to perform simple and robust long-read sequencing of individual isoforms of Dscam1 and other complex alternatively spliced genes. Here, we use the Oxford Nanopore MinION to sequence ‘full-length’ cDNAs from four Drosophila genes – Rdl, MRP, Mhc, and Dscam1 – and identify a total of 7,899 distinct isoforms expressed by these four genes.

 

Similarity between alternative exons

We were interested in determining the feasibility of using the MinION nanopore sequencer to characterize the connectivity of distantly located exons in the mRNAs expressed from genes with complex splicing patterns. For the purposes of these experiments, we have focused on four Drosophila genes with increasingly complex patterns of alternative splicing (Fig. 1). Resistant to dieldrin (Rdl) contains two clusters, each containing two mutually exclusive exons and therefore has the potential to generate four different isoforms (Fig. 1a). Multidrug-Resistance like Protein 1 (MRP) contains two mutually exclusive exons in cluster 1 and eight mutually exclusive exons in cluster 2, and can generate 16 possible isoforms (Fig. 1b). Myosin heavy chain (Mhc) can potentially generate 180 isoforms due to five clusters of mutually exclusive exons – clusters 1 and 5 contain two exons, clusters 2 and 3 each contain three exons, and cluster 4 contains five exons. Finally, Dscam1 contains 12 exon 4 variants, 48 exon 6 variants, 33 exon 9 variants (Fig. 1d), and two exon 17 variants (not shown) and can potentially express 38,016 isoforms. For this study, however, we have focused only on the exon 3 through exon 10 region of Dscam1, which encompasses the 93 exon 4, 6, and 9 variants, and 19,008 potential isoforms (Fig. 1d).
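The isoform counts quoted above follow from multiplying the number of mutually exclusive exons in each cluster, since exactly one exon is chosen per cluster. A minimal Python sketch of that arithmetic follows; the names `CLUSTER_SIZES` and `potential_isoforms` are illustrative assumptions, while the cluster sizes are the ones stated in the text.

```python
from math import prod

# Number of mutually exclusive exons per cluster, as stated in the text.
CLUSTER_SIZES = {
    "Rdl": [2, 2],
    "MRP": [2, 8],
    "Mhc": [2, 3, 3, 5, 2],                   # clusters 1 to 5
    "Dscam1 (exons 4, 6, 9, 17)": [12, 48, 33, 2],
    "Dscam1 (exon 3-10 region)": [12, 48, 33],
}

def potential_isoforms(cluster_sizes):
    """One exon is picked per cluster, so the counts multiply."""
    return prod(cluster_sizes)

for gene, sizes in CLUSTER_SIZES.items():
    print(f"{gene}: {potential_isoforms(sizes):,} potential isoforms")
```

Dropping the two-exon cluster 17 halves the Dscam1 total, which is why the exon 3 through exon 10 region has 19,008 rather than 38,016 potential isoforms.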


Fig. 1. Schematic of the exon-intron structures of the genes examined in this study. a The Rdl gene contains two clusters (cluster one and two) which each contain two mutually exclusive exons. b The MRP gene contains two and eight mutually exclusive exons in clusters 1 and 2, respectively. c Mhc contains two mutually exclusive exons in clusters 1 and 5, three mutually exclusive exons in clusters 2 and 3, and five mutually exclusive exons in cluster 4. d The Dscam1 gene contains 12, 48, and 33 mutually exclusive exons in the exon 4, 6, and 9 clusters, respectively. For each gene, the constitutive exons are colored blue, while the variable exons are colored yellow, red, orange, green, or light blue

Because our nanopore sequence analysis pipeline uses LAST to perform alignments [8], we aligned all of the Rdl, MRP, Mhc, and Dscam1 exons within each cluster to one another using LAST to determine the extent of discrimination needed to accurately assign nanopore reads to a specific exon variant. For Rdl, each variable exon was only aligned to itself, and not to the other exon in the same cluster (data not shown). For MRP, the two exons within cluster 1 only align to themselves, and though the eight variable exons in cluster 2 do align to other exons, there is sufficient specificity to accurately assign nanopore reads to individual exons (Fig. 2a). For Mhc, the variable exons in cluster 1 and cluster 5 do not align to other exons, and the variable exons in cluster 2, cluster 3, and cluster 4 again align with sufficient discrimination to identify the precise exon present in the nanopore reads (Fig. 2b). Finally, for Dscam1, the difference in the LAST alignment scores between the best alignment (each exon to itself) and the second, third, and fourth best alignments are sufficient to identify the Dscam1 exon variant (Fig. 2c). This analysis indicates that for each gene in this study, LAST alignment scores are sufficiently distinct to identify the variable exons present in each nanopore read.
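The assignment logic described above, choosing the exon variant with the best alignment score and relying on its separation from the next-best score, can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: it does not call LAST, the function name `assign_variant` and the `min_margin` threshold are assumptions, and the scores in the usage lines are invented.

```python
def assign_variant(scores, min_margin=1):
    """Pick the best-scoring exon variant for one read within one cluster.

    scores: dict mapping a variant name (e.g. "6.32") to its alignment score.
    min_margin: hypothetical minimum gap required between the best and the
        second-best score; the read is left unassigned (None) if the gap is
        smaller, because the variants cannot be told apart.
    """
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    if len(ranked) > 1 and best_score - ranked[1][1] < min_margin:
        return None
    return best_name

# Invented scores for a single read against three exon 6 variants.
print(assign_variant({"6.32": 120, "6.33": 85, "6.8": 60}))  # 6.32
# A tie between two exon 9 variants is treated as ambiguous.
print(assign_variant({"9.31": 90, "9.30": 90}))              # None
```

A stricter `min_margin` trades fewer assigned reads for fewer misassignments between highly similar exons.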


Fig. 2. Similarity distance between the variable alternative exons of MRP, Mhc, and Dscam1. a Violin plots of the LAST alignment scores of each variable exon within MRP cluster 1 and MRP cluster 2 to themselves and the second (2nd) best alignments. b Violin plots of the LAST alignment scores of each variable exon within each Mhc cluster to themselves and the second (2nd) best alignments. c Violin plots of the LAST alignment scores of each variable exon within each Dscam1 cluster to themselves (1st), and to the exons with the second (2nd), third (3rd) and fourth (4th) best alignments

Optimizing template switching in Dscam1 cDNA libraries

Template switching can occur frequently when libraries are prepared by PCR and can confound the interpretation of results [9], [10]. For example, CAM-Seq [11] and a similar method we independently developed called Triple-Read sequencing [12] to characterize Dscam1 isoforms, were found to have excessive template switching due to amplification during the library prep protocols. To assess template switching in our current study, we generated a spike-in mixture of in vitro transcribed RNAs representing six unique Dscam1 isoforms – Dscam1 4.2,6.32,9.31; Dscam1 4.1,6.46,9.30; Dscam1 4.3,6.33,9.9; Dscam1 4.12,6.44,9.32; Dscam1 4.7,6.8,9.15; and Dscam1 4.5,6.4,9.4. We used 10 pg of this control spike-in mixture and prepared libraries for MinION sequencing by amplifying the exon 3 through exon 10 region for 20, 25, or 30 cycles of RT-PCR. We then end-repaired and dA-tailed the fragments, ligated adapters, and sequenced the samples on a MinION (7.3) for 12 h each. We obtained 33,736, 8,961, and 7,511 base-called reads from the 20, 25, and 30 cycle libraries, respectively. Consistent with the size of the exon 3 to 10 cDNA fragment being 1,806–1,860 bp in length, depending on the precise combination of exons it contains, most reads we observed were in this size range (Fig. 3a). We used Poretools [13] to convert the raw output files into fasta format and then used LAST to align the reads to a LAST database containing each variable exon. From these alignments, we identified reads that mapped to all three exon clusters, as well as the exon with the best alignment score within each cluster. When examining the alignments to each cluster independently, we found that for these spike-in libraries, all reads mapped uniquely to the exons present in the input isoforms. Therefore, any observed isoforms that were not present in the input pool were a result of template switching during the RT-PCR and library prep protocol and not due to false alignments or sequencing errors.

Fig. 3. Optimized RT-PCR minimizes template-switching for MinION sequencing. a Histogram of read lengths from MinION sequencing of Dscam1 spike-ins from the library generated using 25 cycles of PCR. b Bar plot indicating the extent of template switching in Dscam1 spike-ins at different PCR cycles (left). The blue portions indicate the fraction of reads corresponding to input isoforms while the red portions correspond to the fraction of reads corresponding to template-switched isoforms. On the right, plots of the rank order versus number of reads (log10) for the 20, 25, and 30 cycle libraries. The blue dots indicate input isoforms while the red dots correspond to template-switched isoforms

When comparing the combinations of exons within each read to the input isoforms, we observed that 32 % of the reads from the 30 cycle library corresponded to isoforms generated by template switching (Fig. 3b). The template-switched isoforms observed by the greatest number of reads in the 30 cycle library were due to template switching between the two most frequently sequenced input isoforms. In most cases, template switching occurred somewhere within exon 7 or 8 and resulted in a change in exon 9. However, the extent of template switching was reduced to only 1 % in the libraries prepared using 25 cycles, and to 0.2 % in the libraries prepared using 20 cycles of PCR (Fig. 3b). Again, for these two libraries the most frequently sequenced template-switched isoforms involved the input isoforms that were also the most frequently sequenced. These experiments demonstrate that the MinION nanopore sequencer can be used to sequence ‘full length’ Dscam1 cDNAs with sufficient accuracy to identify isoforms and that the cDNA libraries can be prepared in a manner that results in a very small amount of template switching.
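The spike-in analysis described above can be sketched in a few lines. This is only an illustration, not the authors' pipeline: it assumes the per-cluster LAST alignment scores have already been parsed into dictionaries, and the names `call_isoform`, `template_switch_fraction`, and the `exon4`/`exon6`/`exon9` keys are hypothetical. Each read is assigned the best-scoring variant in each cluster, and any resulting combination that is not one of the six input isoforms is counted as template-switched.

```python
# Spike-in isoforms from the text, written as (exon 4, exon 6, exon 9) variants.
INPUT_ISOFORMS = {
    (2, 32, 31), (1, 46, 30), (3, 33, 9),
    (12, 44, 32), (7, 8, 15), (5, 4, 4),
}

CLUSTERS = ("exon4", "exon6", "exon9")  # hypothetical key names


def call_isoform(read_scores):
    """Pick the best-scoring variant in each cluster for one read.

    read_scores maps a cluster name to {variant number: alignment score}.
    Returns None when the read lacks an alignment in any cluster.
    """
    try:
        return tuple(
            max(read_scores[c], key=read_scores[c].get) for c in CLUSTERS
        )
    except (KeyError, ValueError):  # missing cluster or empty score table
        return None


def template_switch_fraction(isoform_calls):
    """Fraction of full-length calls that are not one of the input isoforms."""
    calls = [c for c in isoform_calls if c is not None]
    if not calls:
        return 0.0
    switched = sum(1 for c in calls if c not in INPUT_ISOFORMS)
    return switched / len(calls)
```

The approach relies on the observation reported above that, cluster by cluster, spike-in reads map only to exons present in the input; otherwise a misassigned exon would be indistinguishable from a template switch.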

Dscam1 isoforms observed in adult heads

To explore the diversity of Dscam1 isoforms expressed in a biological sample, we prepared a Dscam1 library from RNA isolated from the heads of mixed male and female D. melanogaster adults using 25 cycles of PCR. We sequenced it for 12 h on the MinION nanopore sequencer, obtaining a total of 159,948 reads, of which 78,097 were template reads, 48,474 were complement reads, and 33,377 were 2D reads (Fig. 4a). We aligned the reads individually to the exon 4, 6, and 9 variants using LAST. A total of 28,971 reads could be uniquely or preferentially aligned to a single variant in all three clusters. For further analysis, we used all 16,419 2D read alignments and 31 1D reads for which both template and complement aligned to the same variant exons (not all reads with both a template and complement yield a 2D read). The remaining 12,521 aligned reads were 1D reads for which there was either only a template or only a complement read, or for which the template and complement reads disagreed with one another; these were not used further. We observed 92 of the 93 potential exon 4, 6, or 9 variants; only exon 6.11 was not observed in any read (Fig. 4f). To assess the accuracy of the results, we performed RT-PCR using primers in the flanking constitutive exons that contained Illumina sequencing primers to separately amplify the Dscam1 exon 4, 6, and 9 clusters from the same RNA used to prepare the MinION libraries, and sequenced the amplicons on an Illumina MiSeq. The frequency of variable exon use in each cluster was extremely consistent between the two methods (R² = 0.95, Fig. 5a).
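As a rough sketch of how the agreement between the two platforms could be quantified, the snippet below converts per-read exon calls into usage frequencies and computes R² as the squared Pearson correlation. The text does not say how R² was calculated, so that choice is an assumption, and the function names are illustrative only.

```python
from collections import Counter


def exon_frequencies(calls, variants):
    """Fraction of calls assigned to each variant, in the order of `variants`."""
    counts = Counter(calls)
    total = sum(counts[v] for v in variants)
    return [counts[v] / total if total else 0.0 for v in variants]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length frequency lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return (sxy * sxy) / (sxx * syy)
```

In use, one frequency list would come from the nanopore exon calls and the other from the MiSeq amplicon calls for the same cluster, with `variants` listing every annotated exon so that unobserved variants contribute a frequency of zero.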

Fig. 4. MinION sequencing of Dscam1 identified 7,874 isoforms. a Histogram of read length distribution for Drosophila head samples. b The total number of Dscam1 isoforms identified from MinION sequencing. c Cumulative distribution of Dscam1 isoforms with respect to expression. d Violin plot of the number of isoforms identified using 100 random pools of the indicated number of reads. e Plot of the estimated number of total isoforms present in the library using the capture-recapture method with two random pools of the indicated number of reads. The shaded blue area indicates the 95 % confidence interval. f Deconvoluted expression of Dscam1 exon cluster variants (top) and the isoform connectivity of two highly expressed Dscam1 isoforms (bottom)

Fig. 5. Accuracy of Dscam1 sequencing results. a Comparison of the frequency of variable exon inclusion for the Dscam1 exon 4 (yellow), 6 (red), and 9 (orange) clusters as determined by nanopore sequencing or by amplicon sequencing using an Illumina MiSeq. b Percent identities (left) or LAST alignment scores (right) of full-length template, complement, and two directions (sequencing both template and complements) nanopore read alignments

Over their entire lengths, the 2D reads that map specifically to one exon 4, 6, and 9 variant map with an average 90.37 % identity and an average LAST score of approximately 1,200 (Fig. 5b). The 16,450 full length reads correspond to 7,874 unique isoforms, or 42 % of the 18,612 possible isoforms given the exon 4, 6, and 9 variants observed. We note, however, that while 4,385 isoforms were represented by more than one read, 3,516 isoforms were represented by only one read, indicating that the depth of sequencing has not reached saturation (Fig. 4b and c). This was further confirmed by performing a bootstrapped subsampling analysis (Fig. 4d) and by using the capture-recapture method to attempt to assess the complexity of isoforms present in the library (Fig. 4e), which suggests that over 11,000 isoforms are likely to be present, though even this analysis has not yet reached saturation. The most frequently observed isoforms were Dscam1 4.1,6.12,9.30 and Dscam1 4.1,6.1,9.30, which were observed with 30 and 25 reads, respectively (Fig. 4e). In conclusion, these results demonstrate the practical application of using the MinION nanopore sequencer to identify thousands of distinct Dscam1 isoforms in a single biological sample.
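The arithmetic behind the 18,612 figure, and the idea behind a capture-recapture estimate, can be sketched as follows. Two assumptions are worth flagging: the per-cluster variant counts (12 exon 4, 48 exon 6, and 33 exon 9 variants, with exon 6.11 unobserved) are not given in this section, although they are consistent with the 93 variants and 18,612 combinations quoted; and the Lincoln-Petersen estimator is used here as one common capture-recapture formula, since the text does not name the estimator actually applied.

```python
def possible_isoforms(variants_per_cluster):
    """Number of exon combinations when one variant is chosen per cluster."""
    total = 1
    for n in variants_per_cluster:
        total *= n
    return total


def lincoln_petersen(pool_a, pool_b):
    """Estimate total distinct isoforms from two independent random read pools.

    Each pool is an iterable of isoform identifiers; isoforms seen in both
    pools play the role of 'recaptured' individuals.
    """
    seen_a, seen_b = set(pool_a), set(pool_b)
    shared = len(seen_a & seen_b)
    if shared == 0:
        raise ValueError("no shared isoforms; the estimate is undefined")
    return len(seen_a) * len(seen_b) / shared


# Observed variants: 12 (exon 4) x 47 (exon 6, without 6.11) x 33 (exon 9).
observed_space = possible_isoforms([12, 47, 33])
observed_fraction = 7874 / observed_space
```

With these inputs the observed combination space works out to 18,612 and the observed fraction to roughly 42 %, matching the values in the text.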

Nanopore sequencing of ‘full-length’ Rdl, MRP, and Mhc isoforms

To extend this approach to other genes with complex splicing patterns, we focused on Rdl, MRP, and Mhc, which have the potential to generate four, 16, and 180 isoforms, respectively. We prepared libraries for each of these genes by RT-PCR using primers in the constitutive exons flanking the most distal alternative exons using 25 cycles of PCR, pooled the three libraries, and sequenced them together on the MinION nanopore sequencer for 12 h, obtaining a total of 22,962 reads. The input libraries for Rdl, MRP, and Mhc were 567 bp, 1,769–1,772 bp, and 3,824 bp, respectively. The raw reads were aligned independently to LAST indexes of each cluster of variable exons. The alignment results were then used to assign reads to their respective libraries, to identify reads that mapped to all variable exon clusters for each gene, and to identify the exon with the best alignment score within each cluster. In total, we obtained 301, 337, and 112 full length reads for Rdl (Fig. 6), MRP (Fig. 7), and Mhc (Fig. 8), respectively. For Rdl, both variable exons in each cluster were observed, and accordingly all four possible isoforms were observed, though in each case the first exon was observed at a much higher frequency than the second exon (Fig. 6d). Interestingly, the ratio of isoforms containing the first versus second exon in the second cluster is similar for isoforms containing either the first exon or the second exon in the first cluster, indicating that the splicing of these two clusters may be independent. For MRP, both exons in the first cluster were observed and all but one of the exons in the second cluster (exon B) were observed, though the frequency at which the exons in both clusters were used varied dramatically (Fig. 7d). For example, within the first cluster, exon B was observed 333 times while exon A was observed only four times.
Similarly, in the second cluster, exon A was observed 157 times whereas exons B, E, F, and G were observed 0 times, thrice, once, and twice, respectively, and exons D, E, and H were observed between 40 and 76 times. As a result, we observed only nine MRP isoforms. For Mhc, we again observed strong biases in the exons observed in each of the five clusters (Fig. 8d). In the first cluster, exon B was observed more frequently than exon A. In the second cluster, 109 of the reads corresponded to exon A, while exons B and C were observed by only two and one read, respectively. In the third cluster, exon A was not observed at all while exons B and C were observed in roughly 80 % and 20 % of reads, respectively. In the fourth cluster, exon A was observed only once, exons B and C were not observed at all, exon E was observed 13 times, and exon D was present in all of the remaining reads. Finally, in the fifth cluster, only exon B was observed. As with MRP, these strong biases and near or complete absences of exons in some of the clusters severely reduce the number of possible isoforms that can be observed. In fact, of the 180 potential isoforms encoded by Mhc, we observed only 12 isoforms. Various Mhc isoforms are known to be expressed in striking spatially and temporally restricted patterns [14], and thus it is likely that other Mhc isoforms that we did not observe could be observed by sequencing other tissue samples.
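The independence argument made for Rdl above (similar second-cluster ratios regardless of the first-cluster exon) can be expressed as a comparison of conditional ratios, or equivalently as an odds ratio close to 1. The sketch below is illustrative only: the per-isoform read counts are not given in the text, so any counts passed to these hypothetical helper functions would have to come from the underlying data.

```python
def conditional_ratios(counts):
    """Ratio of exon 1 to exon 2 use in cluster 2, split by the cluster 1 exon.

    counts maps (cluster 1 exon, cluster 2 exon) -> number of reads,
    with exons coded as 1 (first) or 2 (second).
    """
    ratios = {}
    for first in (1, 2):
        with_first = counts.get((first, 1), 0)
        with_second = counts.get((first, 2), 0)
        ratios[first] = with_first / with_second if with_second else float("inf")
    return ratios


def odds_ratio(counts):
    """Odds ratio of the 2x2 isoform table; about 1 suggests independent choices."""
    a = counts.get((1, 1), 0)
    b = counts.get((1, 2), 0)
    c = counts.get((2, 1), 0)
    d = counts.get((2, 2), 0)
    if b * c == 0:
        raise ValueError("odds ratio is undefined when an off-diagonal cell is zero")
    return (a * d) / (b * c)
```

With only a few hundred reads and rare second exons, the smallest cell of such a table will hold very few reads, so a formal test (for example Fisher's exact test) would be a more cautious way to judge independence than the raw ratio.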

Fig. 6. MinION sequencing of Rdl identified four isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)

Fig. 7. MinION sequencing of MRP identified nine isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)

Fig. 8. MinION sequencing of Mhc identified 12 isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)

Conclusions

Here we have demonstrated that nanopore sequencing with the Oxford Nanopore MinION can be used to easily determine the connectivity of exons in a single transcript, including Dscam1, the most complicated alternatively spliced gene known in nature. This is an important advance for several reasons. First, because short-read sequence data cannot be used to conclusively determine which exons are present in the same RNA molecule, especially for complex alternatively spliced genes, long-read sequence data are necessary to fully characterize the transcript structure and exon connectivity of eukaryotic transcriptomes. Second, although the Pacific Bioscience platform can perform long-read sequencing, there are several differences between it and the Oxford Nanopore MinION that could cause users to choose one platform over the other. In general, the quality of the sequence generated by the Pacific Bioscience is higher than that currently generated by the Oxford Nanopore MinION. This is largely due to the fact that each molecule is sequenced multiple times on the Pacific Bioscience platform yielding a high quality consensus sequence whereas on the Oxford Nanopore MinION, each molecule is sequenced at most twice (in the template and complement). We have previously used the Pacific Bioscience platform to characterize Dscam1 isoforms and found that it works well, though due to the large amount of cDNA needed to generate the libraries, many cycles of PCR are necessary and we observed an extensive amount of template switching, making it impractical to use for these experiments (BRG, unpublished data). However, over the past year that we have been involved in the MAP, the quality of sequence has steadily increased. As this trend is likely to continue, the difference in sequence quality between these two platforms is almost certain to shrink. 
Nonetheless, as we demonstrate, the current quality of the data is more than sufficient to allow us to accurately distinguish between highly similar alternatively spliced isoforms of the most complex gene in nature. Third, the ability to accurately characterize alternatively spliced transcripts with the Oxford Nanopore MinION makes this technology accessible to a much broader range of researchers than was previously possible. This is in part due to the fact that, in contrast to all other sequencing platforms, very little capital expense is needed to acquire the sequencer. Moreover, the MinION is truly a portable sequencer that could literally be used in the field (provided one has access to an Internet connection), and due to its size, almost no laboratory space is required for its use.

Although nanopore sequencing has many exciting and potentially disruptive advantages, there are several areas in which improvement is needed. First, although we were able to accurately identify over 7,000 Dscam1 isoforms with an average identity of full-length alignments >90 %, there are several situations in which this level of accuracy will be insufficient to determine transcript structure. For instance, there are many micro-exons in the human genome [15], and these exons would be difficult to identify if they overlapped a portion of a read that contained errors. Additionally, small unannotated exons could be difficult to identify for similar reasons. Second, the current number of usable reads is lower than that which will be required to perform whole transcriptome analysis. One issue that plagues transcriptome studies is that the majority of the sequence generated comes from the most abundant transcripts. Thus, with the current throughput, numerous runs would be needed to generate a sufficient number of reads necessary to sample transcripts expressed at a low level. In fact, this is one reason that we chose in this study, to begin by targeting specific genes rather than attempting to sequence the entire transcriptome. We do note, however, that over the past year of our participation in the MAP, the throughput of the Oxford Nanopore MinION has increased, and it is reasonable to expect additional improvements in throughput that should make it possible to generate a sufficient number of long reads to deeply interrogate even the most complex transcriptome.

In conclusion, we anticipate that nanopore sequencing of whole transcriptomes, rather than targeted genes as we have performed here, will be a rapid and powerful approach for characterizing isoforms, especially with improvements in the throughput and accuracy of the technology, and the simplification and/or elimination of the time-consuming library preparations.

 

The Tangled Transcriptome

Graveley’s lab studies the transcriptome, the mass of RNA molecules in living cells whose job is to translate DNA into proteins. The transcriptome is a sort of snapshot of which parts of the genome are active at a given time and place. Which genes are transcribed into RNA, and in what quantities, changes from organ to organ and even cell to cell, and can vary over an organism’s lifetime or in response to environmental changes.

Of particular interest to Graveley are those RNA molecules that can take different shapes, or “isoforms,” depending on random chance or what the cell needs at a particular time. RNA isoforms are distinct versions of the same gene. Through a process called alternative splicing, the different subunits, or “exons,” that make up a gene can be reshuffled in new combinations. Many genes have two or more mutually exclusive exons, and which ones are actually expressed as RNA and protein can have big effects on cellular behavior, in effect expanding the protein arsenal of the genome.

“For the entire field of transcriptomics and gene function, knowing what isoforms are expressed is critical,” says Graveley. “Most genes are complicated, especially in humans, and have alternative splicing that occurs at multiple places.”

That brings us to the challenge of Dscam1, the world record holder for alternative splicing. In fruit flies, a particularly well-studied model organism, Dscam1 is made up of 115 exons, only 20 of which are always transcribed into RNA. The other 95 exist in four “clusters” of mutually exclusive exons, and as a result, over 38,000 possible isoforms of Dscam1 have been predicted.

“This is by far, an order of magnitude, more than any other gene,” Graveley explains. This flexibility makes sense in light of Dscam1’s function. The protein it makes helps to “identify” single neurons in the insect brain, making them distinct enough from their neighbors for these cells to assemble a neural circuit on principles of like avoiding like. In experiments where Dscam1 has been altered to make fewer RNA isoforms, the neural wiring breaks down during development, sometimes severely enough to kill the flies.

Dscam1 also plays a role in the insect immune system, another reason for it to produce a huge variety of isoforms. Each of these molecules might be more or less effective at fighting certain pathogens.

It’s frustratingly hard, however, to figure out exactly which isoforms are in a specific sample. Graveley has been working on Dscam1 in fruit flies for more than a decade, but very basic questions remain unanswered: are some isoforms more common, or more important, than others? Are all the theoretical isoforms expressed? Do the isoforms have different behaviors, or are they just arbitrary ways of tagging neurons?

Size Matters

The trouble is the current state of the art in sequencing technology, which reads just a couple of hundred DNA bases at a time. That works great for identifying which exons are present in the transcriptome, but it’s no good for saying which mix of exons any specific strand of RNA is carrying. Different exons can lie thousands of bases apart on the RNA molecule, and there’s no way to bridge the gap between reads.
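A toy example makes the limitation concrete. In the sketch below, two made-up isoform mixtures pair the same exons differently, yet their per-exon counts, which are all that unlinked short reads can report, are identical. The exon labels and counts are invented for illustration.

```python
from collections import Counter


def exon_marginals(isoform_counts):
    """Per-exon read counts implied by a mixture of full-length isoforms."""
    marginals = Counter()
    for isoform, n_reads in isoform_counts.items():
        for exon in isoform:
            marginals[exon] += n_reads
    return marginals


# Two hypothetical mixtures with different exon pairings.
mixture_a = {("4.1", "9.30"): 50, ("4.2", "9.31"): 50}
mixture_b = {("4.1", "9.31"): 50, ("4.2", "9.30"): 50}

# Short reads covering one exon at a time see the same picture for both.
same_marginals = exon_marginals(mixture_a) == exon_marginals(mixture_b)
```

Only a read that spans both exons of the same molecule can tell the two mixtures apart, which is the case for long reads.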

Graveley has tried a lot of solutions. He’s used the outdated Sanger sequencing method, which is much slower and more labor-intensive than modern sequencers, but does span longer reads. His lab also worked out a roundabout way of reconstructing RNA transcripts with contemporary Illumina sequencers, through a combination of chemistry and computational approaches.

“It worked,” he says, “but it was complicated by a lot of library preparation artifacts, and you basically had to jury-rig a genome analyzer to do something it was not supposed to do.”

Graveley’s preferred method is to use a sequencer produced by Pacific Biosciences, which, like the MinION, is built on long-read, single-molecule technology. PacBio sequencing is much better established than nanopores, and its results are known to be reliable; it also has the high throughput typical of modern instruments. For researchers working on alternative splicing, it’s clearly the technology to beat.

Unfortunately, it’s also very expensive. So Graveley’s team set out to learn whether the MinION, a low-throughput but extremely cheap alternative, could be an adequate substitute.

For the Genome Biology paper, the team focused on a 1.8-kilobase region of Dscam1 RNA that covers 93 of the gene’s 95 alternatively spliced exons. To get their samples, they crushed fruit fly heads, isolated Dscam1 RNA from the sample using a polymerase, and reverse-transcribed it into cDNA for sequencing. They also sequenced transcripts of three other alternatively spliced genes, Rdl, MRP, and Mhc.

The biggest concern for new applications of the MinION is its shaky accuracy. While most sequencers can achieve comfortably over 99% consensus with reference sequences, Graveley’s group has seen only about 90% identity with the MinION. That’s actually a little better than most MinION users have managed, although the device’s accuracy has been steadily improving. Users have had to pick their projects carefully to account for this: the device is pretty reliable in resequencing studies that map DNA reads to known references, but it’s still a dubious choice for sequencing unknown genetic material from scratch (although it’s been tried).

To accurately pin down the exact isoforms in the transcriptome, the MinION didn’t have to read every RNA molecule perfectly, but it did have to come close enough to decisively tell one exon from another, and in Dscam1, those exons could be as much as 80% identical.

In fact, Graveley and his co-authors found that the MinION was very capable of this. Out of around 33,000 high-quality Dscam1 reads pulled off the sequencer, almost 29,000 were a strong match for one and only one combination of exons. To further check their accuracy, the team also sequenced the same sample on Illumina technology. While the Illumina sequencer could not give whole isoforms, it did show the same proportions of different exons, suggesting that the MinION gave a complete and unbiased picture of the sample.

“Alternative splicing, it turns out, is probably one of the ideal applications for this platform,” Graveley says. “Even with a gene as complicated as this one, we’re able to accurately distinguish the isoforms from one another. Unless you have very, very small exons, or two exons that are almost identical to each other, the accuracy is good enough.”

Make Way for PromethION

The results are good news for researchers studying the transcriptome, but the MinION probably won’t push out other methods for dealing with alternative splicing just yet. Its low throughput means that at best it can cover a very small portion of the transcriptome with each run ― and that means isolating targeted RNA transcripts, a process that can introduce new biases into the data.

“You need a lot of reads to get the whole transcriptome, and what happens is you end up sequencing boring genes like actin and tubulin, the really abundantly expressed things,” Graveley explains. Still, his data from this experiment was good enough to replicate a few earlier findings: for instance, that Dscam1 does appear to make every predicted isoform. In this experiment, his lab observed almost half the possible isoforms, containing 92 of 93 possible exons.

Meanwhile, Oxford Nanopore Technologies is working on a new instrument, the PromethION, which will contain 48 MinION-style flow cells in a battery. Graveley has already signed on to be one of the first recipients, in an access program that is likely to start in the winter.

Judging by studies like this one, the PromethION stands a good chance of becoming the instrument of choice for large-scale RNA sequencing. With Dscam1, Graveley hopes to reach high enough throughput to do functional studies, seeking to learn whether different combinations of isoforms give rise to physical or behavioral differences. He also wants to look at human genes with high levels of alternative splicing, and to test whether the MinION can accurately count total numbers of RNA isoforms.

“The fact that you can use this technology to characterize whole isoforms is very exciting,” Graveley says. “It’s going to help us start characterizing the transcriptome in ways that have been very difficult.”

 

 

 

Investigating Functional Compensation by Human Paralogous Proteins

Larry H. Bernstein, MD, FCAP, Curator

LPBI

 

 

Using Disease-Associated Coding Sequence Variation to Investigate Functional Compensation by Human Paralogous Proteins

Evolutionary Bioinformatics 2015:11 245-251    http://dx.doi.org/10.4137/EBO.S30594

 

In this article, we examined the functional compensation among paralogs as a general phenomenon through an analysis of disease-associated genetic variation in humans.23–26 In contrast to expectations under the functional compensation hypothesis, we found that multigene families have a greater tendency to harbor dSNVs than singleton proteins. We proposed that differences in functional constraints (evolutionary constraint hypothesis) explain the observed pattern to a large degree.

 

Gene duplication enables the functional diversification in species. It is thought that duplicated genes may be able to compensate if the function of one of the gene copies is disrupted. This possibility is extensively debated with some studies reporting proteome-wide compensation, whereas others suggest functional compensation among only recent gene duplicates or no compensation at all. We report results from a systematic molecular evolutionary analysis to test the predictions of the functional compensation hypothesis. We contrasted the density of Mendelian disease-associated single nucleotide variants (dSNVs) in proteins with no discernable paralogs (singletons) with the dSNV density in proteins found in multigene families. Under the functional compensation hypothesis, we expected to find greater numbers of dSNVs in singletons due to the lack of any compensating partners. Our analyses produced an opposite pattern; paralogs have over 35% higher dSNV density than singletons. We found that these patterns are concordant with similar differences in the rates of amino acid evolution (ie, functional constraints), as the proteins with paralogs have evolved 33% slower than singletons. Our evolutionary constraint explanation is robust to differences in family sizes, ages (young vs. old duplicates), and degrees of amino acid sequence similarities among paralogs. Therefore, disease-associated human variation does not exhibit significant signals of functional compensation among paralogous proteins, but rather an evolutionary constraint hypothesis provides a better explanation for the observed patterns of disease-associated and neutral polymorphisms in the human genome.

 

 

Gene duplication is an important mechanism for the origin of novelty in evolution.1–3 When a gene is duplicated, one of the duplicate copies usually decays within a few million years due to an accumulation of deleterious mutations.4 However, duplicates may be retained if they become functionally important to the organism.5–7 It has been suggested that duplicate genes may be able to carry out the original gene function, which means that paralogs may compensate for each other.8,9 Gene knockout/knockdown experiments have been conducted in multiple species to examine the degree of functional redundancy in gene families. The results suggest that the loss of function in genes with paralogs is associated with higher organismal survival than the loss of function in genes without any known paralogs (singletons), supporting the functional compensation hypothesis.10–16 However, Liao and Zhang17 reported that duplicates rarely compensate for each other in mice, which has been debated.18–22 Overall, experimental data have not yet provided definitive evidence about whether paralogous genes do compensate for each other in most instances.

The predictions of functional compensation can be tested computationally by analyzing the disease-associated genetic variation in humans. These variants are currently experiencing negative selection in human populations, which means that they constitute data on functional impact in nature. If functional compensation among gene family members is substantial, it is expected that fewer significant statistical associations between variants and disease phenotypes will be detected for proteins in multigene families than for singletons. Using this idea, Dickerson and Robertson23 tested the predictions of functional compensation and found no difference between the proportion of singletons and paralogs implicated in diseases (2% difference), supporting the conclusions of Liao and Zhang.17 However, they and others have suggested that recently diverged paralogs are less likely to be disease-associated than singletons and proteins with distantly related paralogs.23–26 These results suggest functional redundancy among young gene duplicates.

However, the abovementioned computational studies have not accounted for many potentially confounding factors. First, disease-associated single nucleotide variants (dSNVs) are found preferentially at slowly evolving amino acid positions27; thus, we expect to observe a higher frequency of dSNVs in more conserved proteins. This could distort comparisons between singletons and multigene family proteins if the distributions of amino acid evolutionary rates are not the same for these two classes. Second, the numbers of dSNVs found in different proteins are not expected to be the same because the numbers of amino acids in proteins vary by an order of magnitude. This means that commonly used metrics, such as the relative fractions of disease and nondisease proteins in different protein classes, are too coarse. Metrics that take into account the number of amino acids in proteins (sequence length) are necessary for more robust hypothesis testing.

In the following section, we tested the hypothesis of functional compensation by considering the abovementioned factors to better understand the genome-wide pattern of functional evolution in gene families, which is vital for understanding genome evolution and predicting disruptive effects of the mutations of proteins that have paralogs.

We obtained a set of 15,485 human proteins and their homologs from 46 diverse species from the UCSC genome browser (see Material and Methods). For each protein, we also obtained a list of paralogs from the HOVERGEN database.28 Our set of proteins is representative of the whole human gene set because about half (52%) of these proteins have at least one paralog, a fraction that is similar to the overall fraction of proteins with paralogs in the human genome (49% in HOVERGEN database28). For each human protein, we computed the average rate of amino acid substitution (number of substitutions per site per billion years) using the interspecific amino acid sequence alignments (see Material and Methods). Figure 1 shows the distributions of evolutionary rates in singleton and multigene family proteins. Overall, singletons are less conserved than multigene family proteins, with a ∼20% mean and ∼30% median difference (P < 0.01 by two-sample Kolmogorov–Smirnov test; Fig. 1A). Similar patterns are observed when considering paralogs belonging to small (2–5) and large (>5) multigene families (P < 0.01; Fig. 1B).
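For readers unfamiliar with the test, the two-sample Kolmogorov–Smirnov statistic is the largest vertical gap between the empirical cumulative distributions of the two samples. The sketch below computes only that statistic in plain Python; it does not compute a P value, and it is not the authors' code. In practice a statistics library routine would normally be used.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Both step functions change only at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

Here `sample_a` and `sample_b` would be the per-protein evolutionary rates for singletons and for multigene family proteins; a statistic of 0 means identical empirical distributions and 1 means no overlap at all.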

 

Figure 1. Distributions of evolutionary rates of singleton (broken line) and multigene family proteins (solid or dotted line). (A) Evolutionary rates are in the units of the number of amino acid substitutions per amino acid site per billion years. The mean and median of these distributions are 1.05 and 0.89, respectively, for singletons, and 0.80 and 0.61, respectively, for proteins in multigene families. These distributions are significantly different (two-sample Kolmogorov–Smirnov test; P < 0.01). (B) Multigene family proteins were separated into those with two to five paralogs (small family; solid line) and greater than five paralogs (large family; dotted line). The mean and median of these distributions are 0.75 and 0.60, respectively, for the proteins from the small multigene families (two to five paralogs) and 0.87 and 0.63, respectively, for the proteins from the large multigene families (greater than five paralogs). These distributions are significantly different from the distribution for singletons (P < 0.01).

 

dSNVs in singletons and multigene families. We analyzed all available SNVs associated with Mendelian diseases in singleton and multigene family proteins. There were a total of 47,382 dSNVs in 2,589 proteins. In these data, the proportion of proteins with at least one dSNV was slightly lower (2.2%) for singletons than that of proteins with paralogs, which is consistent with the recent reports.23,29 However, the number of dSNVs in proteins varied extensively and was found to be positively correlated with the protein length (P < 0.05 for multigene family and singletons; Fig. 2). This is reasonable because longer proteins should have a greater chance of accumulating random mutations and are, therefore, more likely to be classified as disease genes. Thus, we normalized the number of dSNVs by protein length to avoid any bias due to length differences in subsequent analyses.

 

Figure 2. Distributions of the number of dSNVs. (A) A frequency diagram showing the number of proteins with at least one dSNV. (B) The average number of dSNVs per protein for proteins at different length thresholds at 100 amino acid intervals. The average number of dSNVs per protein is positively correlated with the average protein length for both multigene family (correlation = 0.005; P < 0.01) and singleton proteins (correlation = 0.002; P = 0.04).

 

We compared the number of dSNVs per 100 amino acid positions (dSNV density) between multigene family and singleton proteins. Multigene family proteins have a 1.6 times higher density of dSNVs than singleton proteins (0.66 and 0.42, respectively). We can statistically reject the null hypothesis of equal dSNV densities in singletons and multigene family proteins (P < 0.01). However, the direction of effect is opposite to the predictions of functional compensation from paralogous genes in multigene families, as the multigene family proteins contained significantly more dSNVs than singletons. We tested the influence of outliers on this result by excluding all proteins with >0.5 dSNVs per amino acid. This reduced the number of proteins slightly (131 proteins were excluded), but the ratio of multigene family and singleton protein dSNV densities remained unchanged (1.6; P < 0.01). We, nevertheless, excluded all proteins in which the number of dSNVs per position was >0.5 in all subsequent analyses to remove the influence of proteins with unusually high dSNV density when comparing the patterns between different classes of proteins. We also tested if the observed patterns reflect the mutations of specific amino acids (eg, arginine) that comprise a major fraction of the dataset of dSNVs (16%). Arginine codons contain a CpG dinucleotide in the first two positions and are, thus, more prone to transitional mutations, leading to amino acid variation.30 We computed the dSNV densities using only the arginine positions in proteins and found the dSNV density in multigene family proteins to be 1.5 times greater than observed in singletons (0.09 and 0.06, respectively; P < 0.01). A similar pattern was observed for glycine (replacement of glycine residues occurs for 12% of dSNVs in this dataset). The dSNV density in multigene family proteins was twice that observed in singletons (0.08 and 0.04, respectively; P < 0.01).
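The density calculation and outlier filter described above can be sketched as follows. This is an illustration under stated assumptions: the text does not say whether densities were pooled across proteins or averaged per protein, so pooling is assumed here, and the function name and input layout are hypothetical.

```python
def dsnv_density(proteins, max_per_site=0.5):
    """dSNVs per 100 amino acids, pooled over a class of proteins.

    proteins: list of (n_dsnvs, length_aa) pairs.
    Proteins with more than max_per_site dSNVs per amino acid are
    dropped, mirroring the >0.5 outlier filter described in the text.
    Pooling (rather than per-protein averaging) is an assumption.
    """
    kept = [(n, length) for n, length in proteins if n / length <= max_per_site]
    total_variants = sum(n for n, _ in kept)
    total_length = sum(length for _, length in kept)
    return 100.0 * total_variants / total_length

# Hypothetical proteins: the third has 0.6 dSNVs per site and is excluded.
example = [(2, 400), (1, 600), (300, 500)]
print(dsnv_density(example))

# Ratio of the densities reported in the text (multigene vs. singleton).
print(round(0.66 / 0.42, 1))
```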

Finally, we looked for the signatures of functional compensation using dSNVs that are expected to be the most severe, with the rationale that functional compensation may be easier to detect, as ameliorating severe phenotypic effects will have greater relative effect on individual fitness. We designated a dSNV to be severe if the predicted functional impact score for the variant was in the top 5% of all dSNVs (see Material and Methods). For these data, the multigene family proteins have a dSNV density 2.3 times higher than that observed for singletons (0.034 and 0.015, respectively; P < 0.01), which does not support the functional compensation hypothesis. Therefore, the patterns of greater abundance of dSNVs in multigene families are robust to the predicted effect sizes of dSNVs analyzed and the amino acid composition bias of the variation dataset.

Relationship of evolutionary conservation and dSNVs.

We examined if the difference in protein conservation between singletons and multigene family proteins can explain the above-mentioned pattern because it is now well established that highly conserved proteins are significantly more likely to contain dSNVs.27,31 Because the protein evolutionary rate distributions are neither normal nor symmetrical (Fig. 1), we compared medians (0.61 and 0.89, respectively) and found a ratio of 0.69 between the multigene family and singleton proteins. The inverse of this ratio (1.5) is only slightly different from the ratio of dSNV densities (1.6). This similarity suggests that the higher rate of dSNVs in multigene family proteins is mostly explained by the degree of functional constraint on proteins in multigene families versus singleton proteins. Based on this observation, we propose the evolutionary constraint hypothesis, which posits that the differences in dSNV densities among different classes of proteins (eg, singleton vs. multigene) are primarily a result of the differences in the degree of natural selection acting upon them. If true, this would be consistent with the neutral theory of molecular evolution.32 The evolutionary constraint hypothesis does not preclude the existence of functional compensation (among other factors) in some proteins or positions, but it does claim that differences in the intensity of purifying selection will be the primary cause of observed differences in the preponderance of SNVs in different groups of proteins.
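The ratio comparison in this paragraph is simple arithmetic and can be checked directly. The sketch below uses only the rounded medians and densities quoted in the text; the variable names are illustrative.

```python
# Median evolutionary rates quoted in the text (Fig. 1A).
median_multigene = 0.61
median_singleton = 0.89

# Ratio of medians and its inverse.
rate_ratio = median_multigene / median_singleton
inverse_rate_ratio = 1 / rate_ratio

# Ratio of the dSNV densities quoted in the text (multigene / singleton).
density_ratio = 0.66 / 0.42

print(round(rate_ratio, 2), round(inverse_rate_ratio, 1), round(density_ratio, 1))
```

The inverse rate ratio and the density ratio round to 1.5 and 1.6, which is the similarity the argument relies on.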

We tested the prediction of the evolutionary constraint hypothesis in an analysis of 12,952 common neutral SNVs (nSNVs) obtained from the 1000 Genomes Project.33 These common nSNVs are complementary in nature to dSNVs, as common nSNVs persist in the human population and have risen to moderate frequencies (>5%) because their impact on fitness is effectively neutral (opposite of dSNVs that cause disease). Therefore, if functional constraints and, thus, the conservation level of human protein sequence explain the observed differences in dSNV density, we should also observe fewer nSNVs in multigene family proteins, as these proteins evolve more slowly and are expected to be subject to more severe purifying selection.34 Indeed, the nSNV density (number of nSNVs per 100 amino acids) in multigene family proteins was lower than that of singletons (ratio = 0.82; 0.13 and 0.16, respectively; P < 0.01). This ratio (0.82) is again similar to the ratio of the evolutionary rates (0.69) for these two classes of proteins. These results suggest that the occurrence of dSNVs and nSNVs in proteins is largely concordant with the degree of functional constraint on proteins, which is captured in their evolutionary rates.

Disease SNV prevalence in proteins with young and old paralogs.

Next, we tested the hypothesis that functional compensation is more common in proteins with younger paralogs.23,24 If functional compensation generally occurs only for a brief period after the gene duplication event, then the most recently diverged paralogs will provide the most powerful signal to detect functional compensation. We first identified the closest paralog for each protein within a given gene family by selecting the paralog with the smallest nucleotide divergence in their codons (third positions only). To estimate the relative antiquity of the duplication event, we used the protein-specific human–mouse divergence at third positions in codons to normalize each closest paralog divergence across gene families (see Material and Methods). This normalized value yields an approximate gene duplication time when it is scaled using the human–mouse divergence time (92.3 million years ago35). This approximation is reasonable, as third positions in codons evolve relatively neutrally and because we use divergence times primarily for identifying and sorting young paralogs for hypothesis testing.
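The dating procedure in this paragraph can be sketched as below. The exact formula is in the paper's methods, which are not reproduced here, so the simple proportional scaling (paralog divergence divided by human–mouse divergence, multiplied by 92.3 million years) is an assumption, as are the function names and the example divergences.

```python
HUMAN_MOUSE_DIVERGENCE_MY = 92.3  # million years, as quoted in the text

def closest_paralog(third_position_divergence):
    """Pick the paralog with the smallest third-codon-position divergence.

    third_position_divergence: dict mapping paralog name -> divergence.
    """
    return min(third_position_divergence, key=third_position_divergence.get)

def duplication_age_my(paralog_divergence, human_mouse_divergence):
    """Approximate duplication age in million years.

    Assumes divergence accumulates at a constant, protein-specific rate,
    so age scales linearly with the normalized divergence.
    """
    return paralog_divergence / human_mouse_divergence * HUMAN_MOUSE_DIVERGENCE_MY

# Hypothetical divergences for one protein with two paralogs.
paralogs = {"paralog_1": 0.9, "paralog_2": 0.3}
nearest = closest_paralog(paralogs)
print(nearest, duplication_age_my(paralogs[nearest], human_mouse_divergence=0.6))
```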

Density of dSNV for duplicates that have diverged from their paralogs in the last 200 million years shows a tendency to increase with estimated duplicate age (Fig. 3A). The same pattern is observed for the positions of arginine and glycine and those with predicted severe functional impacts (Fig. 3B–D). Also, the dSNV densities for the youngest duplicates are lower than those for singletons (triangle in Fig. 3). We found that the evolutionary rate of proteins is negatively correlated with time since duplication, and the youngest duplicates have higher evolutionary rates than singletons (Fig. 4A). These patterns do not support the functional compensation hypothesis23 and are consistent with our evolutionary constraint hypothesis. These trends are confirmed in the analysis of nSNV densities that showed expected complementary patterns (Fig. 4B).
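The cumulative curves discussed here plot, for each age threshold, the dSNV density of all proteins whose duplication age is at or below that threshold. A minimal sketch follows, assuming pooled densities and a hypothetical input layout; the example proteins are invented for illustration.

```python
def cumulative_density(proteins, thresholds):
    """dSNV density (per 100 amino acids) of proteins with age <= threshold.

    proteins: list of (duplication_age_my, n_dsnvs, length_aa) tuples.
    Returns a list of (threshold, density) points; thresholds with no
    qualifying proteins are skipped.
    """
    points = []
    for threshold in thresholds:
        subset = [(n, length) for age, n, length in proteins if age <= threshold]
        if not subset:
            continue
        total_variants = sum(n for n, _ in subset)
        total_length = sum(length for _, length in subset)
        points.append((threshold, 100.0 * total_variants / total_length))
    return points

# Hypothetical proteins; thresholds every 10 million years up to 200.
example = [(5, 1, 500), (15, 3, 500), (150, 8, 1000)]
curve = cumulative_density(example, range(10, 201, 10))
print(curve[0], curve[1], curve[-1])
```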

 

Figure 3. The dSNV density in duplicates over time. Each point shows the dSNV density of all proteins with duplication age less than or equal to a threshold time (x-axis; 10 million year intervals). The dSNV density of singletons is shown with a triangle. Panels show patterns obtained for all dSNVs (A), arginine dSNVs (B), and glycine dSNVs (C). Panel D shows patterns for dSNVs with severe impact predicted by EvoD.46

 

Figure 4. The average evolutionary rates (A) and nSNV densities (B) of all proteins with duplication age less than or equal to a threshold time (x-axis; 10 million year intervals). The decreasing trend for evolutionary rate (A) is opposite to that observed for dSNVs, but it is similar to that observed for nSNVs (B). In each panel, the triangle shows the value from singletons.

 

Disease SNV prevalence in proteins with very similar paralogs.

We also tested the functional compensation hypothesis in proteins that show high amino acid sequence similarities with their paralogs, as studied by Hsiao and Vitkup.24 We found that paralogs with the highest amino acid sequence similarities (>95%) actually have higher dSNV densities than other paralogs (0.98 vs. 0.57; P < 0.01). This is inconsistent with the functional compensation hypothesis but agrees with our evolutionary constraint hypothesis because the evolutionary rates were lower in paralogs with >95% similarity (0.59 and 0.78 substitutions/site/billion years; P < 0.01). Therefore, differences in the degree of functional constraint (measured using evolutionary rates) account for the observed patterns of dSNV densities.

Next, we compared nSNV densities in paralogs with >95% sequence similarity to those with ≤95% similarity. For this comparison, we needed to be cognizant of the fact that variant calls are difficult when the paralogs have very similar DNA sequences.36–39 This is the case for paralogs with >95% amino acid sequence similarity because most of these proteins also showed small divergences at the third positions in codons between paralogs (≤0.2 substitutions per site). To accommodate the variant call errors, we used proteins with ≤0.2 distance (third positions) for comparison between paralogs for two groups of proteins (225 and 69 proteins). The nSNV density was 0.30 and 0.52 for proteins that have paralogs with >95% and ≤95% sequence similarity, respectively (P < 0.01). The former proteins are more conserved (rate = 0.89) than the latter (rate = 1.97; P < 0.01), and so the result is consistent with the evolutionary constraint hypothesis.

 

In this article, we examined the functional compensation among paralogs as a general phenomenon through an analysis of disease-associated genetic variation in humans.23–26 In contrast to expectations under the functional compensation hypothesis, we found that multigene families have a greater tendency to harbor dSNVs than singleton proteins. We proposed that differences in functional constraints (evolutionary constraint hypothesis) explain the observed pattern to a large degree. We confirmed that singleton proteins show lower functional constraint than proteins with identifiable duplicates in the genome, which explains the increased detection of disease-associated variation observed in multigene families.

Some recent theoretical and empirical studies suggest that functional compensation can lead to enhanced purifying selection and, therefore, may actually be associated with slower evolutionary rates.14,40 Other studies indicate that the youngest duplicates are evolving under relaxed selection pressures, which would cause an increase in evolutionary rates for a few million years.4 Such short-term and localized rate changes (faster or slower) will not have significant impact on the estimates of very long-term evolutionary rates that we have used to quantify the functional constraint. We have calculated the evolutionary rates using sequence differences in proteins that have accumulated changes for hundreds of millions of years across major groups of vertebrates. There is no evidence that pervasive functional compensation exists across the phylogenetic breadth and genomic scale reflected in our analyses. We expect our major conclusions to hold true in general, while acknowledging that functional compensation may occur in some multigene families and some amino acid positions. In summary, we suggest that there is a need to fully consider differences in the evolutionary conservation of proteins when studying the patterns of sequence variation and variant–phenotype associations.

 

 


Cancer Drug-Resistance Mechanism

Curator: Larry H. Bernstein, MD, FCAP

 

Drug-Resistance Mechanism in Tumor Cells Unravelled

Targeting the RNA-binding protein that promotes resistance could lead to better cancer therapies.

About half of all tumors are missing a gene called p53, which helps healthy cells prevent genetic mutations. Many of these tumors develop resistance to chemotherapy drugs that kill cells by damaging their DNA.

MIT cancer biologists have now discovered how this happens: A backup system that takes over when p53 is disabled encourages cancer cells to continue dividing even when they have suffered extensive DNA damage. The researchers also discovered that an RNA-binding protein called hnRNPA0 is a key player in this pathway.

“I would argue that this particular RNA-binding protein is really what makes tumor cells resistant to being killed by chemotherapy when p53 is not around,” says Michael Yaffe, the David H. Koch Professor in Science, a member of the Koch Institute for Integrative Cancer Research, and the senior author of the study.

The findings suggest that shutting off this backup system could make p53-deficient tumors much more susceptible to chemotherapy. It may also be possible to predict which patients are most likely to benefit from chemotherapy and which will not, by measuring how active this system is in patients’ tumors.

Rewired for resistance

In healthy cells, p53 oversees the cell division process, halting division if necessary to repair damaged DNA. If the damage is too great, p53 induces the cell to undergo programmed cell death.

In many cancer cells, if p53 is lost, cells undergo a rewiring process in which a backup system, known as the MK2 pathway, takes over part of p53’s function. The MK2 pathway allows cells to repair DNA damage and continue dividing, but does not force cells to undergo cell suicide if the damage is too great. This allows cancer cells to continue growing unchecked after chemotherapy treatment.

“It only rescues the bad parts of p53’s function, but it doesn’t rescue the part of p53’s function that you would want, which is killing the tumor cells,” says Yaffe, who first discovered this backup system in 2013.

In the new study, the researchers delved further into the pathway and found that the MK2 protein exerts control by activating the hnRNPA0 RNA-binding protein.

RNA-binding proteins are proteins that bind to RNA and help control many aspects of gene expression. For example, some RNA-binding proteins bind to messenger RNA (mRNA), which carries genetic information copied from DNA. This binding stabilizes the mRNA and helps it stick around longer so the protein it codes for will be produced in larger quantities.

“RNA-binding proteins, as a class, are becoming more appreciated as something that’s important for response to cancer therapy. But the mechanistic details of how those function at the molecular level are not known at all, apart from this one,” says Ian Cannell, a research scientist at the Koch Institute and the lead author of the Cancer Cell paper.

In this paper, Cannell found that hnRNPA0 takes charge at two different checkpoints in the cell division process. In healthy cells, these checkpoints allow the cell to pause to repair genetic abnormalities that may have been introduced during the copying of chromosomes.

One of these checkpoints, known as G2/M, is controlled by a protein called Gadd45, which is normally activated by p53. In lung cancer cells without p53, hnRNPA0 stabilizes mRNA coding for Gadd45. At another checkpoint called G1/S, p53 normally turns on a protein called p21. When p53 is missing, hnRNPA0 stabilizes mRNA for a protein called p27, a backup to p21. Together, Gadd45 and p27 help cancer cells to pause the cell cycle and repair DNA so they can continue dividing.

Personalized medicine

The researchers also found that measuring the levels of mRNA for Gadd45 and p27 could help predict patients’ response to chemotherapy. In a clinical trial of patients with stage 2 lung tumors, they found that patients who responded best had low levels of both of those mRNAs. Those with high levels did not benefit from chemotherapy.

“You could measure the RNAs that this pathway controls, in patient samples, and use that as a surrogate for the presence or absence of this pathway,” Yaffe says. “In this trial, it was very good at predicting which patients responded to chemotherapy and which patients didn’t.”

“The most exciting thing about this study is that it not only fills in gaps in our understanding of how p53-deficient lung cancer cells become resistant to chemotherapy, it also identifies actionable events to target and could help us to identify which patients will respond best to cisplatin, which is a very toxic and harsh drug,” says Daniel Durocher, a senior investigator at the Samuel Lunenfeld Research Institute of Mount Sinai Hospital in Toronto, who was not part of the research team.

The MK2 pathway could also be a good target for new drugs that could make tumors more susceptible to DNA-damaging chemotherapy drugs. Yaffe’s lab is now testing potential drugs in mice, including nanoparticle-based sponges that would soak up all of the RNA binding protein so it could no longer promote cell survival.


Human Genetics and Childhood Diseases, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 1: Next Generation Sequencing (NGS)

Human Genetics and Childhood Diseases

Curator: Larry H. Bernstein, MD, FCAP

 

 

 

Publication Roundup: HGMD

HGMD®, the Human Gene Mutation Database is used by scientists around the world to find information on reported genetic mutations. The papers below use the database to advance our understanding of disease, DNA dynamics, and more.

https://www.qiagenbioinformatics.com/blog/translational/publication-roundup-hgmd

Local DNA dynamics shape mutational patterns of mononucleotide repeats in human genomes
First author: Albino Bacolla

Scientists in the US and UK published results in Nucleic Acids Research of a detailed analysis of single-base substitutions and indels in the human genome. Their findings show that certain base positions are more susceptible to mutagenesis than others. They used HGMD Professional to find mutations in specific genomic regions for analysis; the paper includes charts showing mutation patterns, germline SNPs, and more from HGMD data.

High prevalence of CDH23 mutations in patients with congenital high-frequency sporadic or recessively inherited hearing loss
First author: Kunio Mizutari

This Orphanet Journal of Rare Diseases paper from scientists in Japan sequenced 72 patients with unexplained hearing loss, finding several CDH23 mutations, some of which were novel. Mutations in the gene have been linked to Usher syndrome and other forms of hereditary hearing loss. The scientists used HGMD to find all known CDH23 mutations within nearly 70 coding regions.

Mutation analyses and prenatal diagnosis in families of X-linked severe combined immunodeficiency caused by IL2Rγ gene novel mutation
First author: Q.L. Bai

In Genetics and Molecular Research, scientists report the utility of mutation analysis of the interleukin-2 receptor gamma gene to assess carrier status and perform prenatal diagnosis for X-linked severe combined immunodeficiency. They studied two high-risk families, along with 100 controls, to evaluate the approach. Sequence variation was determined using HGMD Professional and an X-SCID database, and a new mutation was discovered in the project.

Impact of glucocerebrosidase mutations on motor and nonmotor complications in Parkinson’s disease
First author: Tomoko Oeda

Researchers from three hospitals in Japan published this Neurobiology of Aging report that may help stratify Parkinson’s disease patients by prognosis. They sequenced mutations in the GBA gene in 215 patients, finding that those who had mutations associated with Gaucher disease suffered dementia and psychosis much earlier than those who didn’t. The team found previously reported GBA mutations using HGMD Professional.

Comprehensive Genetic Characterization of a Spanish Brugada Syndrome Cohort
First author: Elisabet Selga

In this PLoS One publication, scientists from a number of institutions in Spain examined genetic variation among patients with Brugada syndrome, a rare genetic cardiac arrhythmia. They sequenced 14 genes in 55 patients, identifying 61 variants and finding the subset that appear pathogenic. Variants were filtered against a number of databases, including HGMD.

 

 

Local DNA dynamics shape mutational patterns of mononucleotide repeats in human genomes

Albino Bacolla, Xiao Zhu, Hanning Chen, Katy Howells, David N. Cooper and Karen M. Vasquez

Nucl. Acids Res. (26 May 2015) 43(10): 5065-5080.   http://dx.doi.org:/10.1093/nar/gkv364

Single base substitutions (SBSs) and insertions/deletions are critical for generating population diversity and can lead both to inherited disease and cancer. Whereas on a genome-wide scale SBSs are influenced by cellular factors, on a fine scale SBSs are influenced by the local DNA sequence-context, although the role of flanking sequence is often unclear. Herein, we used bioinformatics, molecular dynamics and hybrid quantum mechanics/molecular mechanics to analyze sequence context-dependent mutagenesis at mononucleotide repeats (A-tracts and G-tracts) in human population variation and in cancer genomes. SBSs and insertions/deletions occur predominantly at the first and last base-pairs of A-tracts, whereas they are concentrated at the second and third base-pairs in G-tracts. These positions correspond to the most flexible sites along A-tracts, and to sites where a ‘hole’, generated by the loss of an electron through oxidation, is most likely to be localized in G-tracts. For A-tracts, most SBSs occur in the direction of the base-pair flanking the tracts. We conclude that intrinsic features of local DNA structure, i.e. base-pair flexibility and charge transfer, render specific nucleotides along mononucleotide runs susceptible to base modification, which then yields mutations. Thus, local DNA dynamics contributes to phenotypic variation and disease in the human population.

INTRODUCTION

Changes in human genomic DNA in the form of base substitutions and insertions/deletions (indels) are essential to ensure population diversity, adaptation to the environment, defense from pathogens and self-recognition; they are also a critical source of human inherited disease and cancer. On a genome-wide scale, base substitutions result from the combined action of several factors, including replication fidelity, lagging versus leading strand DNA synthesis, repair, recombination, replication timing, transcription, nucleosome occupancy, etc., both in the germline and in cancer (1–4). On a much finer scale [over a few base pairs (bp)], rates of base substitutions may be strongly influenced by interrelationships between base–protein and base–base interactions. For example, the mutator role of activation-induced deaminase (AID) in B-cells during class-switch recombination and somatic hypermutation (5) targets preferentially cytosines within WRC (W: A|T; R: A|G) sequences (6), whereas apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC) overexpression displays a preference for base substitutions at cytosines in TCW contexts (7). Other examples, such as the induction of C→T transitions at CG:CG dinucleotides by cytosine-5-methylation and the role of UV light in promoting base substitutions at pyrimidine dimers have been well documented (reviewed in (4,8)). More recently, complex patterns of base substitution at guanosines in cancer genomes have been found to correlate with changes in guanosine ionization potentials as a result of electronic interactions with flanking bases (9), suggesting a role for electron transfer and oxidation reactions in sequence-dependent mutagenesis. However, despite these advances, the increasing number of sequence-dependent patterns of mutation noted in genome-wide sequencing studies has met with a lack of understanding of most of the underlying mechanisms (10). Thus, a picture is emerging in which mutations are often heavily dependent on sequence-context, but for which our comprehension is limited.

Mononucleotide repeats comprise blocks of identical base pairs (A|T or C|G; hereafter referred to as A-tracts and G-tracts) and display distinct features: they are abundant in vertebrate genomes; mutations within the tracts occur more frequently than the genome-wide average; mutations generally increase with increasing tract length; length instability is a hallmark of mismatch repair-deficiency in cancers; and sequence polymorphism within the general population has been linked to phenotypic diversity (11–15). Thus, mononucleotide repeats appear ideal for addressing the question of sequence-dependent mutagenesis since base pairs within the tracts are flanked by identical neighbors. Both historic and recent investigations concur with the conclusion that a major source of mononucleotide repeat polymorphism is the occurrence of slippage (i.e. repeat misalignment) during semiconservative DNA replication, which gives rise to the addition or deletion of repeat units (11,12). An additional and equally important source of mutation has recently been suggested to arise from errors in DNA replication by translesion synthesis DNA polymerases, such as pol η and pol κ (13), also on slipped intermediates, leading to single base substitutions.

A key question that remains unanswered in these studies and which is relevant to the issue of sequence context-dependent mutagenesis is whether all base pairs within mononucleotide repeats display identical susceptibility to single base changes and whether indels (which are consequent to DNA breakage) occur randomly within the tracts.

Herein, we combine bioinformatics analyses on mononucleotide repeat variants from the 1000 Genomes Project and cancer genomes with molecular dynamics simulations and hybrid quantum mechanics/molecular mechanics calculations to address the question of sequence-dependent mutagenesis within these tracts. We show that mutations along both A-tracts and G-tracts are highly non-uniform. Specifically, both base substitutions and indels occur preferentially at the first and last bp of A-tracts, whereas they are concentrated between the second and third G:C base pairs in G-tracts. These positions coincide with the most flexible base pairs for A-tracts and with the preferential localization of a ‘hole’ that results when one electron is lost due to an oxidation reaction anywhere along G-tracts. Thus, despite the uniformity of sequence composition, mutations occur in a sequence-dependent context at homopolymeric runs according to a hierarchy that is imposed by both local DNA structural features and long-range base–base interactions. We also show that the repair processes leading to base substitution must differ between A- and G-tracts, since in the former, but not in the latter, base substitutions occur predominantly in the direction of the base immediately flanking the tracts. Additional sequence-dependent patterns of mutation are likely to arise from studies of more heterogeneous sequence combinations, possibly involving other aspects intrinsic to the structure of DNA.

 

RESULTS

Mononucleotide repeat variation is defined by tract length and flanking base composition

We define mononucleotide repeats in the GRCh37/hg19 (hg19) human genome assembly as uninterrupted runs of A:T and G:C base pairs (hereafter referred to as A-tracts and G-tracts, respectively) from 4 to 13 base pairs in length (Figure 1A). We retrieved a total of 48,767,945 A-tracts and 13,633,781 G-tracts, both of which displayed a biphasic distribution with an inflection point between tract lengths of 8 and 9 bp and with the number of runs declining with length more dramatically for G-tracts than for A-tracts (Figure 1B), as noted previously (29). Both the number of short tracts and the extent of decline varied with flanking base composition, TA[n]T runs being two- to three-fold more abundant than CA[n]Cs (Supplementary Figure S1A) and AG[n]As declining the most rapidly (Supplementary Figure S1B). Thus, mononucleotide runs exist as a collection of separate pools of sequences in extant human genomes, each maintained at distinctive rates of sequence stability, as determined by factors such as bp composition (A:T versus G:C), tract length and flanking sequence composition.
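The tract definition above (uninterrupted single-base runs of 4 to 13 bp, reported with their nearest neighbors) can be sketched with a regular expression. This is not the authors' search algorithm; it is a simplified illustration that reads one strand only, reports flanking bases as they appear on that strand, and does not re-orient T- or C-runs to the purine-rich strand as the paper's numbering does. The function name and output layout are hypothetical.

```python
import re

def find_tracts(seq, min_len=4, max_len=13):
    """Find maximal single-base runs and their flanking bases.

    Returns tuples (kind, start, length, five_prime, three_prime), where
    kind is "A-tract" for runs of A or T and "G-tract" for runs of G or C.
    Flanks are None when the run touches the end of the sequence.
    """
    seq = seq.upper()
    tracts = []
    # Each alternative is greedy, so every match is a maximal run.
    for match in re.finditer(r"A+|C+|G+|T+", seq):
        length = match.end() - match.start()
        if not (min_len <= length <= max_len):
            continue
        kind = "A-tract" if match.group()[0] in "AT" else "G-tract"
        five_prime = seq[match.start() - 1] if match.start() > 0 else None
        three_prime = seq[match.end()] if match.end() < len(seq) else None
        tracts.append((kind, match.start(), length, five_prime, three_prime))
    return tracts

print(find_tracts("CAAAATGGGGGA"))
```

Runs longer than 13 bp are skipped entirely rather than truncated, which matches a definition based on uninterrupted runs of a given length.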

Figure 1.

Mononucleotide repeat variation, evolutionary conservation and association with transcription. (A) The search algorithm was designed to retrieve runs of As or Ts (A-tracts) and Gs or Cs (G-tracts) of length n (n = 4 to 13), along with their 5′ (n = 0) and 3′ (n = n + 1) nearest neighbors from hg19. Tract bases were numbered 5′ to 3′ with respect to the purine-rich sequence. The panel exemplifies the nomenclature for A- and G-tracts of length 4. (B) Logarithmic plot of the number of A-tracts (closed circles) and G-tracts (open circles) in hg19 as a function of length. (C) Normalized fractions of polymorphic tracts (F SNV) (number of SNVs divided by both hg19 number of tracts and n) from the 1KGP for A-tracts (closed circles) and G-tracts (open circles). (D) Radial plot of SNVs in the 1KGP at the 5′ and 3′ nearest neighbors of A-tracts. Periphery, tract length; horizontal axis, scale for the fraction of SNVs (F SNV). (E) Radial plot of SNVs in the 1KGP at the 5′ and 3′ nearest neighbors of G-tracts. (F) Percent difference in the numbers of A-tracts (closed circles) and G-tracts (open circles) between syntenic regions of hg19 and HN genomes. (G) The exponents of Benjamini-corrected P-values for A-tract-containing genes enriched in transcription-factor binding sites plotted as a function of A-tract length (triangles); each value represents the median of the top 11 UCSC_TFBS terms. The percent A-tracts (closed circles) and G-tracts (open circles) intersecting genomic regions pulled-down by chromatin immunoprecipitation using antibodies against transcription factors are plotted as a function of tract length. (H) List of gene enrichment terms with a Benjamini-corrected P-value of <0.05 in common between genes containing A- and G-tracts of lengths 4–13, excluding the UCSC_TFBS terms.

 We examined the extent of sequence variation in the human population by mapping 38,878,546 single nucleotide variants (SNVs) from 1092 haplotype-resolved genomes (the 1000 Genomes Project, 1KGP) (30) to the hg19 A- and G-tracts. The normalized fractions of polymorphic tracts (F SNV) were greater for G-tracts than A-tracts and both displayed Gaussian-type distributions, with maxima of 0.067 for G-tracts of length 8 and 0.017 for A-tracts of length 9 (Figure 1C). CA[n]C and AG[n]A runs displayed the highest F SNV values for A- and G-tracts, respectively (Supplementary Figure S1C and D), with F SNV values for AG[n]As attaining ∼0.10 at length 8. We conclude that flanking base composition influences the rates of SNV within mononucleotide runs and, as a consequence, their representation in the reference human genome.
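The normalization behind F SNV, as given in the Figure 1C caption (number of SNVs divided by both the number of tracts in hg19 and the tract length n), amounts to a per-position variant fraction. A minimal sketch with hypothetical counts:

```python
def f_snv(n_snvs, n_tracts, tract_length):
    """Normalized fraction of polymorphic positions for tracts of one length.

    Divides the SNV count by the number of tracts and by the tract length,
    i.e. by the total number of tract positions of that length.
    """
    return n_snvs / (n_tracts * tract_length)

# Hypothetical counts: 1,600 SNVs mapped to 10,000 tracts of length 8.
print(f_snv(1600, 10000, 8))
```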

F SNV values at the flanking 5′ and 3′ bp were similar between A- and G-tracts, except for minor differences for the least represented (i.e. longest) tracts and did not exceed 0.02 (Supplementary Figure S1E). These fractions are expected to be greater than at more distant positions from the tracts, based on previous data (29). SNVs at G-tracts, but not at A-tracts, were more frequent than at flanking base pairs. F SNVs for base pairs flanking short (≤8 bp) tracts were at least twice as high as those flanking long tracts; F SNVs also displayed distinct sequence preference with most (∼0.1) variants occurring at Ts 3′ of G-tracts (Figure 1D and E). In summary, SNVs at mononucleotide runs do not increase monotonically with length but peak at 8–9 bp. This behavior mirrors the genomic distributions, both with respect to the total number of tracts (Figure 1B) and the subsets flanked by specific-sequence combinations (Supplementary Figure S1A–D). Variation at flanking base pairs also displayed a biphasic pattern centered at a length of 8–9 bp, with a greater chance of variation adjacent to G- than A-tracts and with characteristic sequence preferences.

Long tracts are evolutionarily conserved and associated with high transcription

To assess whether more variable monosatellite runs (Figure 1C) might have undergone a greater reduction in number in extant humans relative to extinct hominids, we compared the number of A- and G-tracts between syntenic regions of five individuals comprising hg19 and three Neanderthal (HN) specimens (31). The difference between hg19 and HN was very small (<±2%) for the short tracts, but it displayed more negative values in hg19 with increasing tract length, which reached a maximum of −11.8 and −32.7% for A- and G-tracts, respectively, of length 9. Beyond this threshold, the numbers of tracts converged for A-tracts, whereas they were more abundant in hg19 for G-tracts >11 bp (Figure 1F). In summary, the largest difference in the number of mononucleotide runs between hg19 and HN sequences was centered at 9 bp for both A- and G-tracts, suggesting that the length distributions (Figure 1A and Supplementary Figure S1A and B) reflect distinct rates of evolutionary gains and losses due to differential sequence mutability (Figure 1C) as a function of length and flanking sequence composition (12).

The fact that long (>9 bp) mononucleotide runs display low variability in the human population (Figure 1C) and sequence conservation during evolutionary divergence (Figure 1F) raises the possibility that they might serve functional roles. Through gene enrichment analyses, we found that genes containing A- and G-tracts were enriched for genes associated with the term ‘UCSC_TFBS’, which pertains to transcripts harboring frequent transcription factor binding sites (32,33). For A-tract-containing genes, the median P-values for the top 11 UCSC_TFBS terms decreased from 2.95E-26 for tracts of length 4 to 5.22E-241 for tracts of length 13 (Figure 1G). The percent of A-tracts intersecting genomic fragments amplified from chromatin immunoprecipitation using transcription-factor binding antibodies (32,33) also increased from 8.7 to 9.9 from length 6 to 13, whereas it was constant (mean ± SD, 22.4 ± 1.1) for G-tracts (Figure 1G). For gene classes excluding ‘UCSC_TFBS’, a search for categories enriched at P < 0.05 and common to all A- and G-tract-containing genes returned a set of 25 terms, 22 of which were associated with high levels of tissue-specific gene expression (Figure 1H). In summary, these analyses extend prior work (14) supporting a role for mononucleotide tracts in enhancing gene expression, a function that for A-tracts appears to increase with increasing tract length.

Repeat variability is highly skewed

Next we addressed whether bp along A- and G-tracts display equal probability and type of variation. In the 1KGP dataset, the number of SNVs at each position along both A- and G-tracts of length 4 was within a two-fold difference (144,000–240,000); for both types of sequence, transitions (i.e. A→G and G→A) were the predominant (51–78%) type of base substitution (Supplementary Figure S2A and B). However, with increasing length, the number of SNVs decreased up to 30-fold more drastically for G-tracts than for A-tracts, with increasing numbers of transversions (A→T and G→C|T) being predominant. Normalizing the data for the number of tracts genome-wide revealed that the extent of SNV varied by up to 10-fold, depending upon tract length and bp position. Specifically, the highest degree of variation was observed at the first and last A within the A-tracts (i.e. A1 and An), which underwent up to 61% A→T and 43% A→C transversions, respectively, at length 9 (Figure 2A). Likewise, for G-tracts, the most polymorphic sites were G3, followed by G2, for mid-size tracts of 8–10 bp, with 44% G→C transversions at G3 for tracts of length 8 (Figure 2B). Thus, the extent of SNV at mononucleotide runs is grossly skewed in human genomes, both along the sequence itself and across tract length, which must account for the bell-shape behavior in F SNV for the tracts as a whole (Figure 1C).
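
The distinction between transitions and transversions used above can be made explicit with a short Python sketch. The function names and the toy list of variants are hypothetical; the classification rule itself is the standard one (purine to purine or pyrimidine to pyrimidine is a transition, anything else is a transversion).

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single base change."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    # Both bases in the same chemical class means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def substitution_spectrum(variants):
    """Tally (position, ref>alt) counts from (position, ref, alt) tuples."""
    return Counter((pos, f"{ref}>{alt}") for pos, ref, alt in variants)


# Toy variants at positions 1 and 9 of an A-tract (hypothetical data).
toy_variants = [(1, "A", "T"), (1, "A", "T"), (9, "A", "C"), (1, "A", "G")]
spectrum = substitution_spectrum(toy_variants)
```

Keeping position and substitution type together in the tally mirrors how the spectra are reported in the text, one value per tract position and per base change.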

Figure 2.

Population variation spectra. (A) Variation spectra of A-tracts. Percent (number of SNVs at each position divided by the number of tracts in hg19 × 100) of A→T (black), A→C (red) and A→G (green) SNVs in the 1KGP dataset (left). Percent SNVs at A1 as a function of tract length (right). (B) Variation spectra of G-tracts. As in panel A with G→T (black), G→C (red) and G→A (cyan) (left). Percent SNVs at G3 as a function of tract length (right). (C) Percent A→T, A→C and A→G substitutions at each position along A-tracts (stars) preceded and followed by a T (TA[n]T, left), C (CA[n]C, center) and G (GA[n]G, right) as a function of tract length. (D) Percent G→T, G→C and G→A substitutions at each position along G-tracts (stars) preceded and followed by a T (TG[n]T, left), C (CG[n]C, center) and A (AG[n]A, right) as a function of tract length. (E) Percent substitutions at base pairs (stars) preceding or following A-tracts (left) and G-tracts (right) as a function of tract length (n). *, mutated position.

We assessed whether SNV hypervariability was associated with specific combinations of nearest neighbors. For A-tracts flanked 5′ by a T, C or G, the highest percentage of SNVs was observed at A1 when preceded by a T, which reached 7.9% for TA[n] tracts of length 9 (Supplementary Figure S2C). By contrast, for 3′ T, C or G, the greatest effect was elicited by a C, with the highest percentage (7.1%) of SNVs at An for A[n]C tracts of length 9 (Supplementary Figure S2D). Therefore, flanking base pairs play a critical role both in the spectra and frequencies of SNVs at A-tracts. More detailed plots along A-tracts either preceded (Supplementary Figure S2E), followed (Supplementary Figure S2F) or preceded and followed (Figure 2C) by a T, C or G revealed the dramatic and long-range (up to 9–10 bp for the longest tracts, higher than the value of 4 bp predicted by mathematical models of slippage (11)) influence of flanking base pairs on variation spectra, in which up to 95% of the changes were in the direction of the base flanking the tract. Because the number of A-tracts preceded or followed by a specific base varies by up to three-fold (Supplementary Figure S2G), we conclude that for A-tracts, the overall mutation fractions and spectra are the result of at least three variables; length, position along the tract, and base composition of the 5′ and 3′ nearest-neighbors.

For G-tracts flanked 5′ by a T, C or A, high percentages (10–12%) of SNVs were observed at G1 for tracts preceded by a C, an effect that decreased with increasing tract length (Supplementary Figure S3A). This result, together with an exceedingly low number of G→A transitions at G1 for tracts not preceded by a C (Supplementary Figure S3C) relative to all tracts (Supplementary Figure S2B), is consistent with the known high mutability of CG:CG dinucleotides as a result of cytosine-5 methylation (9). The hypermutability at G2 was observed preferentially for tracts preceded by an A, and to a lesser extent T, whereas that at G3 was insensitive to flanking sequence composition. Likewise, G-tracts flanked 3′ by a T, C or A did not display marked sequence-dependent effects (Supplementary Figure S3B). Detailed plots of the SNV spectra along G-tracts either preceded (Supplementary Figure S3D), followed (Supplementary Figure S3E), or preceded and followed (Figure 2D) by a T, C or A revealed a noticeable effect only for 5′ T in association with G→T substitutions at G1 for tracts of length ≥8. Thus, despite a consistent over-representation of G-tracts flanked 5′ by a T (Supplementary Figures S3F and S1B), which must account for the high absolute number of SNVs at G1 for TG[n] relative to AG[n] and CG[n] (Supplementary Figure S3G), nearest-neighbor base composition seems to play a lesser role in SNV spectra at G-tracts than at A-tracts.

With respect to SNVs at the flanking 5′ and 3′ nearest positions, no B→A or H→G substitutions (Figure 1A) were found above a length threshold of 9 for A-tracts and 8 for G-tracts (Figure 2E, gray shading) out of 5969 SNVs, implying that tract expansion by recruiting flanking base pairs is disfavored at these lengths. In summary, base substitution along mononucleotide repeats is strongly skewed towards the edges of A-tracts and within the 5′ half of G-tracts, with frequencies that peak at midsize lengths (8–9 bp). For A-tracts ≥7 bp, base substitution occurred almost exclusively in the direction of the flanking nearest-neighbors. Finally, base substitution at flanking bases did not contribute to tract expansion for mononucleotide runs longer than 8–9 bp.

Insertions and deletions display length and positional preference

In addition to SNVs, mononucleotide runs are polymorphic in length as a result of indels. Herein, we consider separately two types of indels: one in which tract length changes by ±1 and flanking bp composition is not altered (slippage); the other comprising all other cases involving the addition or removal of 1–200 bp (indels). Slippage is a widely accepted mutational mechanism (11,12,34), whereby DNA replication errors at reiterated DNA motifs cause changes in the number of motifs (most often +/−1). The normalized fractions of slippage in the 1KGP dataset peaked at lengths of 8 bp for A-tracts and 9 bp for G-tracts (Figure 3A), generating bell-shaped curves similar to those observed for SNVs (Figure 1C) and with no differences in the highest fraction of ‘slipped’ tracts, which peaked at ∼0.02. By contrast, +1 slippage occurred more frequently than −1 slippage at A-tracts (Figure 3B). These results support recent studies on microsatellite repeats (12) and contrast with previous conclusions that slippage increases monotonically with tract length, and that the extent of slippage differs between A- and G-tracts (35,36).
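
The working definitions of slippage and indels given above can be written as a small Python classifier. This is a sketch under stated assumptions: the function name, the window representation (the tract plus one flanking base on each side) and the returned labels are all hypothetical and chosen only to make the two definitions concrete.

```python
def classify_length_change(ref_window, alt_window, base, max_indel=200):
    """Classify a variant at a mononucleotide tract.

    `ref_window` and `alt_window` each hold the 5' flanking base, the tract
    and the 3' flanking base. Slippage is a +1 or -1 change in tract length
    with both flanking bases unchanged; any other length change of
    1..max_indel bp is called an indel.
    """
    flank5, flank3 = ref_window[0], ref_window[-1]
    n = len(ref_window) - 2
    if ref_window[1:-1] != base * n:
        raise ValueError("reference window is not a pure tract with two flanks")

    for delta in (1, -1):
        if alt_window == flank5 + base * (n + delta) + flank3:
            return f"slippage{delta:+d}"

    size_change = abs(len(alt_window) - len(ref_window))
    if 1 <= size_change <= max_indel:
        return "indel"
    return "other"


# Toy alleles around a CA[4]T tract (hypothetical data).
plus_one = classify_length_change("CAAAAT", "CAAAAAT", "A")
minus_one = classify_length_change("CAAAAT", "CAAAT", "A")
interrupted = classify_length_change("CAAAAT", "CAAGAAT", "A")
```

An insertion that interrupts the run (the third toy allele) changes the window length by one base but is not slippage, because the result is no longer a pure tract between the original flanks.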

Figure 3.

Population insertions and deletions. (A) Normalized fractions of A-tracts (closed circles) and G-tracts (open circles) displaying +/−1 bp slippage in the 1KGP dataset as a function of tract length. Data were obtained by dividing the number of events by both the number of hg19 tracts and tract length (n). (B) Ratio of the number of +1 to −1 slippage for A-tracts (closed circles) and G-tracts (open circles). (C) Indels at A-tracts. For positions along the tracts (‘Tract’), ‘F Indel’ is the ratio between the number of indels and the number of tracts in hg19 multiplied by tract length. For the positions immediately flanking the tracts genomic coordinates (‘Before tract’ and ‘After tract’), ‘F Indel’ is the ratio between the number of indels and the number of tracts in hg19. (D) Indels at G-tracts, calculated as described in panel C. (E) Heatmap representation of insertions along A-tracts. The percent insertions (i.e. the number of insertions at each position divided by the number of tracts in hg19) (y-axis) plotted as a function of location (x-axis) from position 0 (insertion between the bp 5′ to the tract and the first bp of the tract) to position n + 1 (insertion between the bp 3′ to the last bp of the tract and the following bp) (see Figure 1A) and as a function of tract length (z-axis). (F) Heatmap representation of insertions along G-tracts.

With respect to indels, the normalized fractions were low (<1 × 10−3) along short (4–6 bp) A- and G-tracts, but rose to a plateau for longer tracts as reported earlier (11); this plateau was 10-fold higher for G-tracts (∼0.03) than for A-tracts (∼0.003) (Figure 3C and D). Indels also occurred more frequently (up to six-fold for A-tracts of length 11) at nearest-neighboring base pairs (‘Before tract’ and ‘After tract’ in Figure 3C and D) than along the tracts. Thus, contrary to SNVs and slippage, indels increased to a plateau with mononucleotide tract length.

We analyzed in detail the locations of insertions along the tracts and the flanking positions with respect to the 5′ to 3′ orientation of the tracts (Figure 1A). The normalized fractions demonstrated that insertions peaked at the 3′, and to a lesser extent 5′, ends of the longest A-tracts (Figure 3E), but remained low. For G-tracts, insertions occurred most efficiently at two locations (G2–3 and G5) (Figure 3F), they increased with tract length (up to ∼0.04), and attained ∼10-fold higher values than for A-tracts. In conclusion, insertion sites at A- and G-tracts followed the patterns observed for SNVs (Figure 2A and B), suggesting that factors associated with local DNA dynamics sensitize specific bases along the tracts to genetic alteration, inducing both SBS and indels.

Base pair flexibility and charge localization map to sites of sequence changes

To elucidate elements of intrinsic DNA dynamics that may be responsible for the biases in SNV and insertion sites, we performed molecular dynamics (MD) and hybrid quantum mechanics/molecular mechanics (QM/MM) simulations on model A[6], A[9], G[6] and G[9] duplex DNA fragments. We focused on water bridge coordination (Figure 4A), bp step flexibility, and for G[6] and G[9], charge localization, as these properties are known to impact the susceptibility of DNA to base damage, repair and mutation. The fractions of one water coordination increased along the A[9] and A[6] structures in a 5′ to 3′ direction, irrespective of flanking sequence composition, in concert with a decrease in minor groove width (Figure 4B and Supplementary Figure S4A) as predicted (37). Vstep, a measure of bp structural fluctuation, displayed a prominent peak of ∼40 Å³deg³ at the 5′-TA-3′ step for both structures (Figure 4C and Supplementary Figure S4B), which together with low water occupancy points to 5′-TA-3′ being a preferred location for base modification and mutation. In the G[9] and G[6] structures water coordination involved mostly two-water bridges due to wide (∼14 Å) minor grooves (Figure 4D and Supplementary Figure S4C), whereas flexibility was modest (∼20–22 Å³deg³, Figure 4E and Supplementary Figure S4D). Thus, bp dynamics are likely to impact mutations at A-tracts to a greater extent than at G-tracts. Guanine has the lowest ionization potential (IP) of all four bases and IP further decreases at guanine runs, rendering them targets for electron loss, charge localization, oxidation and eventually mutation (4,38). Because after electron loss the ensuing charge (hole) can migrate along the DNA double-helix and relocalize at specific guanines, we addressed whether the preferred sites of mutation along G-tracts, i.e. G2–3 and G5, would also be preferred sites for charge localization.
The QM/MM determinations indicated that whereas for the short G[6] fragment the difference in the density-derived atomic partial charges (DDAPC) (i.e. the hole) localized most often (∼50%) to the first position (Figure 4F), for the long G[9] fragment charge localization shifted downstream (mostly to the second, but also to positions 6–7, Figure 4G). Importantly, the charge was found exclusively around the guanine rings (Figure 4H). Thus, the two main sites of sequence change along G-tracts, i.e. G2–3 and G5, coincide with positions where charge localization and hence one-electron oxidation reactions is predicted to occur most frequently. In summary, bp flexibility at A-tracts and charge transfer at G-tracts likely represent intrinsic DNA features underlying the bias in SNV and insertions at mononucleotide runs in human genomes.

Figure 4.

MD and QM/MM simulations. (A) Molecular modeling of one (left) and two (right) minor groove water bridge coordination. (B) Fraction of one-water bridge occupancy (left axis) at A[9] DNA sequences flanked 5′ and 3′ by a T (black circles), C (red circles) or G (green circles). Minor groove widths (right axis), as determined from intrastrand phosphate-to-phosphate distances. (C) Vstep for A[9] DNA sequences, determined as the product of the square root of the eigenvalues (λi) described by the six bp step parameters shift, slide, rise, tilt, roll and twist; i.e. $V_{\mathrm{step}} = \prod_{i=1}^{6} \sqrt{\lambda_i}$. (D) Fraction of one- (black circles) and two-water (red circles) bridge occupancy (left axis) at G[9] DNA sequences. Minor groove widths (right axis), as assessed from intrastrand phosphate-to-phosphate distances. (E) Vstep for G[9] DNA sequences. (F) Average charge redistribution (open circles and right axis) for G[6] DNA structures upon vertical ionization, examined by calculating the difference on the density-derived atomic partial charges (DDAPC) for the neutral and negatively charged states. Histogram of the number of instances (left axis) in which the largest charge redistribution occurred at a specific position along the G[6] structures. (G) DDAPC for G[9] DNA structures (open circles and right axis) and histogram of the number of instances (left axis) in which the largest charge redistribution occurred at a specific position. (H) VMD rendering of a G[9] DNA structure displaying hole localization at G2. Capped base pairs were removed for clarity.
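
The Vstep definition in panel C (the product of the square roots of the eigenvalues) can be illustrated with a short Python sketch. It assumes, as is common for this kind of flexibility measure but not stated explicitly in the caption, that the eigenvalues are those of the covariance matrix of the step parameters sampled over a trajectory. For a covariance matrix, the product of the square roots of the eigenvalues equals the square root of the determinant, so no eigenvalue solver is needed. All function names and the toy data are hypothetical.

```python
import math


def covariance(samples):
    """Sample covariance matrix of a list of equal-length parameter tuples."""
    n, dim = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(dim)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    dim = len(a)
    det = 1.0
    for col in range(dim):
        pivot = max(range(col, dim), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for row in range(col + 1, dim):
            factor = a[row][col] / a[col][col]
            for k in range(col, dim):
                a[row][k] -= factor * a[col][k]
    return det


def v_step(samples):
    """Product of sqrt(eigenvalues) of the covariance, i.e. sqrt(det(cov))."""
    return math.sqrt(max(determinant(covariance(samples)), 0.0))


# Toy 2-parameter "trajectory" (hypothetical); real input would have six
# columns: shift, slide, rise, tilt, roll and twist.
toy_samples = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
toy_v = v_step(toy_samples)
```

With three translational parameters in Å and three rotational parameters in degrees, the square root of the determinant carries units of Å³deg³, which matches the units quoted for Vstep in the text.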

Position and orientation along nucleosome core particles modulate sequence variation

DNA wrapped around histones in nucleosomes is subject to local deformation (39), which may impact mutation. Thus, we analyzed the 1KGP SNVs at A- and G-tracts predicted to overlap with well-positioned nucleosome core particles (NCPs) (16). In hg19, the percentage of tracts that overlap with NCPs decreased moderately from ∼90% at length of 4 to 81% and 71% for A- and G-tracts of length 13, respectively (Figure 5A), suggesting that mononucleotide runs are not depleted in NCPs in human genomes as previously proposed (40). A-tracts of lengths 4–8 base pairs displayed distinctive peaks along the NCP surface in phase with the helical repeat of DNA (10.5 bp) and with minor grooves facing toward the inner protein core (lengths 4–5) (16) (Figure 5B and Supplementary Figure S5A). A-tracts of length of 9–13 bp exhibited only half (six) the peaks evident for the shorter tracts. For the G-tracts, only small peaks with no clear minor groove-inward-facing regions were detected (Supplementary Figure S5B).

Figure 5.

Positioning along nucleosome core particles. (A) Percent of A-tract (open circles) and G-tract (closed circles) base pairs in hg19 overlapping with well-positioned NCP genomic coordinates as a function of tract length. (B) Counts of base pairs in hg19 A-tracts of length 5 overlapping with NCPs genomic regions as a function of distance from the histone octamer dyad axis. Minor groove-inward-facing regions (gray) were derived from the X-ray crystal structure of NCP147 (41). (C) Percent SNVs in the 1KGP dataset (left axis) at every bp along A-tracts of length 5 for tracts centered at maxima (black) and minima (gray) along NCPs (Figure 5B). Percent increase (right axis) of SNVs at minima relative to maxima (green). P-values for paired t-tests: 0.013 (*), 0.002 (**) and 4.7 × 10−6 (***). (D) Whisker plots of %SNVs (left axis) at A1 for A-tracts of length 5 centered at maxima and minima (black) along NCPs (Figure 5B). Percent difference (right axis) in the number of A-tracts of length 5 in hg19 preceded by C, T or G (red) between those centered at minima and those centered at maxima (Figure 5B). (E) C-containing/G-containing ratios (see text) for G-tracts of length 5 in hg19 as a function of distance from the NCP dyad axis (black) and location of core histones (maroon and green). Peaks correspond to negative iSAT (i.e. tilt parameters multiplied by the corresponding sin θ) values (gray) (39). Ratios of %SNV at G1 (upshifted by 0.5 for clarity) between C-containing (5′-CCCCCG-3′ sequences on the hg19 forward strand) and G-containing (5′-CGGGGG-3′ sequences on the hg19 forward strand) (Figure 1A) CG[5] tracts mapping NCP ChIP-seq genomic intervals (red) fitted by a non-parametric local regression (loess; sampling proportion, 0.100; polynomial degree, 3). (F) VMD rendering (top) of TATTT residues 34–38 (yellow) and the complementary AAATA residues 672–753 (pink) from the 1EQZ pdb nucleosomal crystal structure, corresponding to peak area from −40 to −36 in Figure 5E. The switch in G-tract (lengths of 5 and 7) orientation along NCPs (bottom) serves to position the C-containing strand on the outside (yellow) and, correspondingly, the G-containing strand on the inside (pink).

To assess if tract-positioning along NCPs influences SNVs, we selected A-tracts of lengths 5, 7 and 9 bp and G-tracts of lengths 5 and 7 bp whose central positions coincided with either the maxima or minima (41) (Figure 5B and Supplementary Figure S5A and B) and conducted pair-wise t-tests (330 total) between permutations of ‘categories’, including ‘tracts centered at maxima versus minima’, ‘position along the tracts’, ‘flanking sequence composition’, ‘specific NCP locations’ and ‘tract orientation’. For A-tracts, 79/207 (38%) significant pairs were found, 68 (86%) of which were related to differences between tracts centered at maxima versus minima, with a preponderance (63%) of tests displaying increased %SNVs at minima (Supplementary Figure S5C and E). For example, %SNVs at length 5 bp were greater at minima than at maxima at each position along the A-tracts (Figure 5C). A→C substitutions at A1 were more abundant at maxima than at minima (mean ± SD, 18.7 ± 0.7% at max and 17.6 ± 0.8% at min; P-value 0.001), whereas A→T substitutions at the same position displayed the opposite trend (mean ± SD, 18.4 ± 0.5% at max and 19.8 ± 1.1% at min; P-value 0.0005) (Figure 5D). A-tracts of length 7 also exhibited a similar pattern at A7 (Supplementary Figure S5H). The percentages of CA[5] and A[7]C tracts in hg19 centered at maxima were greater than at minima and the reverse was observed for the TA[5] and A[7]T tracts (Figure 5D and Supplementary Figure S5H). Thus, we conclude that positioning along the NCP surface of both the double-helical grooves and junctions with flanking base pairs influences SNVs along A-tracts. However, this influence is complex and, for the most part, difficult to predict.

For G-tracts, most pairwise comparisons (18/34, 53%) indicated SNV variation according to sequence orientation (Supplementary Figure S5F and G). In hg19, the ratio of the numbers of G-tracts of lengths 5 and 7 for which the C-containing strand coincided with the forward sequence (downstream example sequence in Figure 1A) to the numbers of G-tracts for which the G-containing strand coincided with the forward sequence (upstream example sequence in Figure 1A) (C-containing/G-containing ratios) displayed a prominent 10.5-bp oscillation in phase with iSAT (Figure 5E), a measure of ‘inside’ and ‘outside’ bases, according to the bp step tilt parameter (39). Analysis of the helical path of a 146-bp DNA fragment wrapped around histones showed that the oscillation in the C-containing/G-containing ratios corresponds to a preference for guanine bases to face the protein core (Figure 5F). We analyzed the subset of G-tracts preceded by a 5′ C (i.e. CG[5]) to assess whether SNVs at G1, the position known to be mutable due to CpG methylation, also oscillated with the C-containing/G-containing ratios. Oscillation in SNV-C-containing/SNV-G-containing values was evident, with peaks aligning to the hg19 troughs (Figure 5E), implying that the cytosines facing the protein surface harbor more variants than those facing away. We conclude that A- and G-tracts display preferential positioning (the former) and orientation (the latter) along NCPs, which in turn modulate the rate of sequence variation.

Mutations associated with human disease

Knowing that the first and last As of long A-tracts and G2–3 in G-tracts are the major sites of SNV in the human population, we addressed whether these features are also discernible in mutated mononucleotide tracts associated with human genetic disease. We collected 9,450,456 unique SBSs (both SBSs and SNVs refer to single base changes) from sequenced cancer genomes and normalized the percent mutations along A- and G-tracts to enable a direct comparison with the 1KGP dataset. For A-tracts (Figure 6A and Supplementary Figure S6A), SBSs displayed the same trend as the 1KGP data (Figure 2A) with respect to the bell-shaped increase in mutations at A1 and An and the mutation spectra, although the susceptibility to mutation as a function of tract length attained greater values (6.36% for length 11 in cancer versus 4.15% for length 9 in the 1KGP datasets at A1). The first and last 3 bp also harbored more SBSs than in the 1KGP dataset for tracts >7 bp, a feature that we found to be due exclusively to a large cancer dataset (42) containing high-level microsatellite instability (MSI) samples (Supplementary Figure S6B and C), which are known to result from mismatch-repair deficiency (15). Thus, A-tracts display similar patterns of base substitution between the germline and somatic cancer tissues. For G-tracts, mutation spectra were characterized by G→T transversions at tract lengths >7, particularly at G1, the most frequently mutated position for tract lengths up to 11 bp (Figure 6B and Supplementary Figure S6D). This trend persisted even when the high rates of methylation-mediated deamination mutations at the CG dinucleotide were removed (Supplementary Figure S6E). Thus, mutation patterns in cancer genomes contrast with those observed in the germline, both with respect to the most mutable position (G1 versus G2–3) and the types of base substitution (G→T in cancer genomes versus G→T and G→C in the germline).

Figure 6.

Mutation patterns in cancer genomes. (A) Mutation spectra for SBSs at A-tracts. Percent values were obtained by dividing the total number of SBSs at each position by the number of tracts in hg19 and then multiplying by 3.2516 to equalize the percentage of A-tracts of length 4 between the cancer genomes and the 1KGP datasets. (B) Mutation spectra for SBSs at G-tracts in cancer genomes. Percent values were obtained as in (A) using a multiplication factor of 3.7419. (C) Normalized fractions of A-tracts (closed circles) and G-tracts (open circles) displaying +/−1 bp slippage, obtained by dividing the number of events by both the number of tracts in hg19 and tract length. (D) Indels at A-tracts, calculated as described in Figure 3C. (E) Indels at G-tracts, calculated as described in Figure 3C. (F) Heatmap representation of insertions along G-tracts, as described in Figure 3E.

 With respect to slippage, the fractions for A-tracts elicited an excess at lengths 9 and 10 bp relative to the 1KGP dataset, which was also due to the MSI-containing dataset. For G-tracts, the fractions peaked at length 8, as for the 1KGP dataset (Figures 3A and 6C), implying that the propensity to undergo slippage is indistinguishable between the germline and soma. Indels were also more abundant at flanking base pairs than along the tracts (Figure 6D and E), particularly for G-tracts of length >7, similar to the 1KGP dataset (Figure 3C and D). Detailed analyses of insertions revealed that both G1 and the preceding position were the most significant sites of mutation (F-values up to 0.08 at G1 for tracts of length 8) (Figure 6F). Thus, the 5′ end of long G-tracts is the most susceptible site for both SBSs and insertions in cancer genomes, in contrast to the germline where these occur within the runs, typically at G2–3.

We also extracted the mutated A- and G-tracts from the Human Gene Mutation Database (HGMD), a collection of >150,000 germline gene mutations associated with human inherited disease. A total of 1519 genes were mutated at A- or G-tracts out of a total of 3972 (38%); 3480 SBSs and 2866 slippage events were noted within these tracts, 85 and 46% of which were predicted to be disease-causing, respectively (Figure 7A and Supplementary Table S1). Ranking genes by the number of literature reports indicated that among the top 10 entries three were associated with cancer (BRCA1, BRCA2 and APC), two with hemophilia (F8 and F9), four with debilitating lesions of the skin (COL7A1), muscle (DMD), lung (CFTR) and kidney (PKD1), with one causing hypercholesterolemia (LDLR) (Figure 7B). Thus, mutations within A- and G-tracts carry a high social burden by contributing to some of the most common human pathological conditions.

Figure 7.

Mutation patterns in HGMD and model for sequence context-dependent changes. (A) Number of germline SBSs and slippage events (Slip.) at A- and G-tracts in HGMD. Gene alterations were classified as disease-causing mutation (DM), likely disease-causing mutation (DM?), disease-associated and putatively functional polymorphism (DFP), disease-associated polymorphism with additional supporting functional evidence (DP) and in vitro/laboratory or in vivo functional polymorphism (FP). Codon changes (SIFT predictor) were classified as damaging (d), null (n), tolerated (t) and low-confidence prediction (l). (B) The 10 most commonly reported genes in HGMD with mutations at A- and G-tracts. Various mutated tracts were generally reported for the same gene in different reports. (C) Mutation spectra for SBSs at A- (left) and G-tracts (right) in HGMD. Percent values were obtained by dividing the total number of SBSs at each position by the number of tracts in hg19 exons. A|G→T (black), A|G→C (red), A→G (green), G→A (cyan). (D) Normalized fractions of A-tracts (closed circles) and G-tracts (open circles) displaying +/−1 bp slippage, obtained by dividing the total number of events by the number of tracts in hg19 exons and by tract length. (E) Model for sequence context-dependent changes at A-tracts (left) and G-tracts (right). *, site of base modification.

 For both A- and G-tracts, SBSs occurred mostly at tract lengths of 4–7, with patterns more similar to those in the 1KGP than in the cancer datasets, both with respect to the location of the most mutable positions (first and last As and first/second Gs) and the types of base substitution (A→T and G→H) (Figure 7C and Supplementary Figure S6F). Likewise, slippage events peaked at tract lengths of 7–9 as observed in the 1KGP dataset (Figure 7D). In summary, the patterns of both SBSs and slippage in the HGMD dataset followed the trend observed in the 1KGP dataset, suggesting that germline variants at mononucleotide repeats leading to either population variation or human inherited disease may have arisen through similar mechanisms.

DISCUSSION

Why are specific A:T and G:C base pairs within A- and G-tracts more susceptible to sequence changes than their identical neighbors? For A-tracts, bp flexibility may play a role. Chemical damage to DNA, such as by hydroxyl radicals has been shown to be proportional to the geometrical solvent-accessible surface of the atomic groups, which increases with DNA flexibility (43). Along A-tracts flexibility is restricted, but it is high at both the 5′ and 3′ junctions. Thus, the fact that the highest rates of mutation coincide with the highest degree of flexibility at the 5′-TA-3′ bp step is consistent with the view that this position may be susceptible to DNA damage as a result of flexibility. Other sources of DNA dynamics are also likely to be relevant, such as sugar flexibility at the junctions, which increases with tract length (44). Chemical modification at these junctions may then lead to base substitution and indels, the latter as a result of strand breaks.

With respect to SNV mutation spectra, these were found mostly in the direction of flanking base composition above a length of 7–8 bp. We interpret this behavior in terms of DNA slippage along A-tracts when attempts are made during translesion synthesis (TLS) to bypass a damaged site (Figure 7Ei). Two scenarios may be considered to account for A→T transversions at A1. In the first, the last tract-template base would loop out into the polymerase active site permitting base-pairing and strand elongation (Figure 7Eii) using the tract-flanking base as a template (34,45,46). In the second (Figure 7Eiii), slippage would occur behind the polymerase, prompting extension past the newly created A*:T mispair generated by primer/template misalignment. Either pathway would yield a common intermediate (Figure 7Eiv) that contains the base complementary to the junction across from the damaged site upon slippage resolution (34). Following DNA synthesis (S) and/or repair (R) (Figure 7Ev and vi), this mispair will generate a base change that is always identical to the tract-flanking base.

For G-tracts, the high rates of G→T transversions at G1 in cancer genomes are also consistent with preferred chemical attack at this site due to high flexibility (Figure 7F top). Direct chemical attack at a guanine is known to result in stable products, such as 8-oxo-G and Fapy-G, both of which are known to yield G→T transversions (47–50). Thus, G1 may be the most susceptible site for such reactions for G-tracts of lengths ≥7 (Figure 7F right), which in cancer genomes would become a mutation hotspot. In the germline, SNVs peaked inside G-tract base pairs, while mutational spectra were insensitive to flanking base composition; these events are inconsistent with a role for template misalignment and slippage as noted for A-tracts. Rather, the correspondence between hotspot mutations at G2–3 and G5 and the QM/MM simulations suggests a role for charge transfer. A large body of work during the past 20 years using computational, theoretical chemistry and biophysical techniques on short oligonucleotides has shown that guanine is the most easily oxidizable base in DNA and that indeed a guanine radical cation can be generated through long-range hole transfer from an oxidant via one-electron oxidation mechanisms (51–55). GGG triplets were found to act as the most effective traps in hole transfer by both experimental and theoretical work (56–59), demonstrating that the resulting guanine radical cation (or its neutral deprotonated form) became rather delocalized, but it preferentially centered at the first and second G. These well-established patterns of chemical reactivity are consistent with our experimental observation of high mutation frequencies at G1 for short G-tracts and the results from QM/MM simulations on G6. For longer tracts, the downstream shift in mutation hotspots, i.e., G2–3 and G5, also correlates well with the charge localization predicted from QM/MM simulations, which explicitly included solvent effects and structural fluctuations.
Thus, in conjunction with the constrained density functional theory (60), both the neutral and oxidized forms of a guanine nucleobase can be reliably constructed to infer the accurate determination of mutational patterns of mononucleotide repeats in human genomic DNA.

The compact organization of the sperm genome (61), and presumably low levels of oxidative stress in the germline, may enable guanine oxidation through one-electron oxidation reactions rather than by direct chemical attack, thereby favoring the formation of radical cations. A charge injected at G1 by electron loss would then migrate to neighboring guanines and localize at sites of low IP, such as G2 (Figure 7F left). Guanine radical cations are known to readily undergo further chemical modification leading to products such as 8-oxo-G, oxazolone, imidazolone, guanidinohydantoin, and spiroiminodihydantoin (62) (M in Figure 7F), to yield G→T, G→C and G→A substitutions (4,63). Our model is in line with recent observations in which mutations at guanines within short G-runs (1–4 bp) correlate with sequence-dependent IPs at the target guanine in cancer genomes (9). Interestingly, these correlations were not observed in the germline (9). We interpret these composite observations as follows. The IP values for G-runs have been shown to decrease asymptotically with tract length, although the absolute values vary according to the methods and assumptions used (we obtained a value of 5.43 eV for both G[6] and G[9]) (64,65). We suggest that short G-runs with high IPs undergo one-electron oxidation reactions in the oxidative environment of cancer cells but would be refractory to such a mechanism in the germline (Figure 7F right yellow and left white sectors). As length increases and IP values fall, G-runs would be attacked directly by oxidants abundant in tumor cells (Figure 7F orange sector), whereas oxidation will be limited to electron loss in the germline environment (Figure 7F left yellow sector).

These models (template misalignment for A-tracts and charge transfer for G-tracts) suggest a more complex scenario for mechanisms underlying mononucleotide repeat polymorphism in the human population than recently proposed (13), in which nucleotide misincorporation by error-prone polymerases is proposed as a primary source of mutations at both A- and G-tracts. As already stated, the directionality of SNVs toward tract-flanking bases in A-tracts and the hotspot mutations at G2–3 support multiple and distinct mechanisms of base substitution at mononucleotide repeats.

Our analyses highlight additional information, including the lack of mutations in the direction of tract-base composition for base pairs flanking long tracts, the association with gene expression and the preference of guanines for the inner NCP surface, and extend prior observations (12) such as the bell-shape character of base substitution and slippage, whose mechanisms remain to be fully clarified. Finally, we document the contribution of mononucleotide mutagenesis to key aspects of human pathology beyond the well-established MSI instability in cancer (15), including hemophilia and tissue degeneration. Our collective work supports the conclusion that as the human genome undergoes evolutionary diversification and along the way suffers disease-associated mutations, oxidation reactions including charge transfer may play a prominent role.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

 

 

Mutation analyses and prenatal diagnosis in families of X-linked severe combined immunodeficiency caused by IL2Rγ gene novel mutation


Genet. Mol. Res. 14 (2): 6164 – 6172   DOI: 10.4238/2015.June.9.2
Severe combined immunodeficiency diseases (SCIDs) are a group of primary immunodeficiency diseases characterized by a severe lack of T cells (or T cell dysfunction) caused by various gene abnormalities and accompanied by B cell dysfunction (WHO, 1992; Buckley et al., 1997). The incidence rate in infants is 1/75,000-1/100,000 (WHO, 1992), but no morbidity statistics are available in China. The 2 genetic modes of SCID are X-linked recessive and autosomal recessive inheritance. X-linked severe combined immunodeficiency (X-SCID) is the most common form, accounting for 50-60% of SCID cases (Noguchi et al., 1993). Immune system abnormalities in patients with X-SCID include the T-B+NK- phenotype, in which T cells (CD3+) and natural killer (NK) cells (CD16+/CD56+) are absent or significantly reduced, and the number of B cells (CD19+) is normal or increased, causing reduced immunoglobulin production and class switching disorder (Buckley, 2004; Fischer et al., 2005). The IL-2Rg gene mutation has been confirmed to be a major cause of X-SCID (Noguchi et al., 1993). In recent years, great progress has been made in understanding the pathogenesis of primary immunodeficiency disease and its application in clinical treatment, particularly regarding the development of critical care medicine and immune reconstruction technology. With timely control of infection and early bone marrow or stem cell transplantation, X-SCID patients can be treated, prolonging survival time. Therefore, early diagnosis of X-SCID is very important for patient treatment. Gene diagnosis has become a better method for early diagnosis and differential diagnosis. In addition, familial X-SCID places a great psychological burden on the relatives of patients. Ordinary chromosome analysis and immunological evaluation cannot be used for female carrier identification and fetal diagnosis, and gene diagnosis is the most effective method of carrier detection and prenatal diagnosis.
In this study, we detected mutations in 2 families with X-SCID and identified 2 novel mutations, confirming the X-SCID pedigrees. Based on gene diagnosis, prenatal diagnosis was performed for the fetus of the pregnant mother of one of the probands. Female individuals in this family were subjected to carrier detection.
IL2Rg gene mutation test

PCR amplification and direct sequencing of exons 1-8 and the flanking regions of the IL2Rg gene in family 1 showed that the 3rd exon of the proband contained the c.361-363delGAG heterozygous deletion mutation, which led to deletion of the 121st amino acid, glutamate (p.E121del), in its coding product. There were no sequence variations in other coding regions or in the splice regions. The proband’s mother carried the same heterozygous mutation, while his father did not carry the mutation (Figure 2a, b, c). This mutation was not observed in any member of the control group, and this family was identified as an X-SCID family. The c.510-511insGAACT heterozygous insertion mutation was present in the 4th exon of the proband’s mother in family 2. This mutation was a 5-base repeat of GAACT, resulting in a change in amino acid 173 from tryptophan into a stop codon (p.W173X). There were no sequence variations in other coding regions or in the splice regions, and the patient’s father did not carry the mutation (see Figure 2d, e). We did not find this mutation in the healthy control group. We presumed that the 4th exon of the deceased child in family 2 contained the c.510-511insGAACT insertion mutation, leading to X-SCID symptoms, and thus we speculated that this family was an X-SCID pedigree.

Prenatal diagnosis

We verified the chorionic villus status of the fetus in family 1 using the PowerPlex 16 HS System kit. The results showed that the fetal tissue contained no maternal contamination, that the fetus was female, and that the female fetus of family 1 did not carry the c.361-363delGAG (p.E121del) mutation.

Figure 2. Sequencing graph of the IL2Rg gene in 2 pedigrees with X-linked severe combined immunodeficiency. a.-c. Family 1. a. Normal control (rectangle indicates the 3 bases deleted in the patient). b. Proband carrying the c.361-363delGAG (p.E121del) mutation (arrow indicates the junction created by the deletion). c. The proband’s mother carried the c.361-363delGAG (p.E121del) heterozygous mutation (arrow). d.-e. Family 2. d. The proband’s mother carried the c.510-511insGAACT (p.W173X) heterozygous mutation (arrow indicates that the reverse sequencing graph was positive). e. Normal control (rectangular box indicates 2 normal copies of GAACT; the mutant fragment had 3 copies).

Carrier detection results

For the c.361-363delGAG (p.E121del) site, the gene analysis results of the female individuals in family 1 showed that I2 (proband’s grandmother) was a heterozygous carrier and that II3 (proband’s aunt) was a non-carrier and had no mutations.
IL-2 binds the IL-2 receptor (IL-2R) on the immune cell membrane. IL-2R is composed of 3 subunits: the IL-2Ra chain (CD25), the IL-2Rb chain (CD122), and the IL-2Rg chain (CD132). IL-2Rg is shared as a functional unit with the receptors for IL-4, IL-7, IL-9, IL-15, IL-21, and other cytokines, and is therefore referred to as the common chain (Li et al., 2000). The IL-2Rg chain maintains the integrity of the IL-2R complex and is required for the internalization of the IL-2/IL-2R complex; it also links the cytokine-binding region on the cell membrane surface with downstream signal transduction molecules. Therefore, the integrity of the IL-2Rg chain is vital for the immune function of an organism (Malka et al., 2008; Shi et al., 2009).
Mutations in the IL2Rg gene, which encodes IL-2Rg, were identified as a major cause of X-SCID in 1993 (Noguchi et al., 1993). The IL2Rg gene is located on chromosome X q21.3-22, is 37.5 kb in length, and contains 8 exons, which encode the 369 amino acids of IL-2Rg. The IL-2Rg chain comprises several structural regions: the signal peptide [amino acids (AA) 1-22], extracellular domain (AA 23-262), transmembrane region (AA 263-283), and intracellular region (AA 284-369). The WSXWS motif is located in the extracellular region (AA 237-241), while Box 1 is located in the intracellular region (AA 286-294).
By the end of 2013, the Human Gene Mutation Database contained a total of 200 mutations in the IL2Rg gene (HGMD Professional 2013.4). The most common mutation types in the IL2Rg gene were missense or nonsense mutations, which result from single base changes. A total of 100 missense or nonsense mutations have been identified, followed by 50 insertion or deletion mutations. The 3rd most common type comprises approximately 30 splice-site mutations. Mutations were found in all 8 exons and were most frequent in the 3rd and 4th exons, which together accounted for 43% (86/200) of all mutations. According to the X-SCID gene database (IL2RGbase) (http://research.nhgri.nih.gov/scid/), the gene mutations in IL2Rg mainly occurred in the extracellular region of the IL-2Rg chain (Fugmann et al., 1998). Zhang et al. (2013) reported that the IL2Rg gene mutations in 10 patients with X-SCID in China were located in the extracellular region. The 2 mutations reported in our study were also located in the extracellular region. The mutation of the IL2Rg gene in family 1 was a 3-base (single-codon) deletion in the 3rd exon. The c.361-363delGAG (p.E121del) mutation was located in the extracellular region of the IL-2Rg subunit, and we inferred that the deletion of glutamate 121 caused by the mutation would lead to changes in the structure of the peptide chain, affecting signal transmission and resulting in serious symptoms. The mutation of family 2 was a GAACT repeat in the IL2Rg gene; this 5-base repeat changed codon 173 from tryptophan into a stop codon. The resulting mutant peptide chain lacked 196 amino acids compared to the normal chain, including the intracellular, transmembrane, and some extracellular regions, directly affecting the structure and function of the receptor and causing disease. No studies have been reported regarding these 2 mutations.
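The mapping from a coding-DNA position to a codon, and from a residue number to a protein region, can be illustrated with a short sketch. This is only an illustration of the arithmetic, not the authors' analysis: the function names `codon_of` and `domain_of` are hypothetical, and the region boundaries are simply the amino-acid ranges quoted above.

```python
# Illustrative sketch: region boundaries are the AA ranges quoted in the text;
# the function names are hypothetical helpers, not part of any published tool.
DOMAINS = [
    ("signal peptide", 1, 22),
    ("extracellular domain", 23, 262),
    ("transmembrane region", 263, 283),
    ("intracellular region", 284, 369),
]


def codon_of(cdna_pos: int) -> int:
    """Return the 1-based codon number that contains a 1-based coding-DNA position."""
    if cdna_pos < 1:
        raise ValueError("coding-DNA positions are 1-based")
    return (cdna_pos - 1) // 3 + 1


def domain_of(residue: int) -> str:
    """Return the name of the region containing a 1-based residue number."""
    for name, start, end in DOMAINS:
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} is outside AA 1-369")


if __name__ == "__main__":
    # c.361-363delGAG: all three deleted bases lie in the same codon.
    print({codon_of(p) for p in (361, 362, 363)})
    # Both reported changes (residues 121 and 173) fall in the same region.
    print(domain_of(121), "/", domain_of(173))
```

Under these assumptions, positions 361-363 all map to codon 121, in agreement with p.E121del, and residues 121 and 173 both fall within the extracellular domain.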
Combining the mutation characteristics with the clinical manifestations, we diagnosed family 1 as an X-SCID pedigree. Although the patients in family 2 were deceased, it can be speculated that the 2 deceased patients in family 2 had X-SCID caused by c.510-511insGAACT (W173X).
Prenatal diagnosis can accurately identify the status of the fetus and be used to avoid birth defects, which can also ease the anxiety of the pregnant mother. Gene diagnosis for pedigrees of patients based on DNA samples has advanced recently, particularly with the application of high-throughput sequencing technology (Alsina et al., 2013). We can now perform gene analysis for varied clinical infectious diseases for differential diagnosis. However, the effectiveness of prenatal diagnosis for pedigrees in which the proband is dead remains unclear. Because the gene mutations in the proband are unknown in these cases, the patient’s situation can only be inferred from his mother’s genotype. However, we considered that if the mother of a deceased proband can be shown to carry a pathogenic mutation, then even if the proband did not have X-SCID, the woman is still at risk of having children with X-SCID, and the pedigree may show X-linked recessive inheritance. Prenatal diagnosis may provide a choice for preventing the birth of patients in these families on the premise of informed consent.
Gene diagnosis of IL2Rg can also be used for carrier detection of suspected females in the family.
In the present study, we performed carrier detection of the patient’s grandmother and aunt in family 1 and determined that the patient’s pathogenic mutations were from his grandmother. His aunt did not inherit the pathogenic gene, and thus she was a non-carrier and her fertility will not be affected. In this study, we used direct sequencing of PCR products and identified IL2Rg gene mutations in 2 pedigrees with X-SCID. We found 2 unreported mutations in the IL2Rg gene, and prenatal diagnosis and carrier detection were conducted in 1 X-SCID family. Because the incidence rate of X-SCID is extremely low, it is difficult to promote the widespread use and application of genetic diagnosis. However, this study may provide some implications for the diagnosis of infants with immunodeficiency, and gene diagnosis techniques such as conventional or high-throughput sequencing should be used as soon as possible during pregnancy, which can be used to guide treatment. This method can also provide reliable prenatal diagnosis and carrier detection service for these families.

MEF2A gene mutations and susceptibility to coronary artery disease in the Chinese population

J. Li, H.-X. Chen, J.-G. Yang, W. Li, R. Du and L. Tian       DOI http://dx.doi.org/10.4238/2014.October.20.15
Coronary artery disease (CAD) has high morbidity and mortality rates worldwide. Thus, the pathogenesis of CAD has long been the focus of medical studies. Myocyte enhancer factor 2A (MEF2A) was first discovered as a CAD-related gene by Wang (2005) and Wang et al. (2003, 2005). Three mutation points in exon 7 of MEF2A were subsequently identified by Bhagavatula et al. (2004); however, Altshuler and Hirschhorn (2005) and Weng et al. (2005) predicted that the MEF2A gene lacked mutations. Zhou et al. (2006a,b) analyzed the mutations and polymorphisms in exons 7 and 11 of the MEF2A gene in the Han population in Beijing, and various rare mutations were found in exon 11 rather than in exon 7. The clinical significance of specific 21-bp deletions in MEF2A was also explored, and previous studies have shown mixed results. In this study, polymerase chain reaction-single-strand conformation polymorphism (PCR-SSCP) and DNA sequencing were used to analyze exon 11 of the MEF2A gene in samples collected from 210 CAD patients and 190 healthy controls and to investigate the role of the MEF2A gene in CAD pathogenesis.
CAD, a common disease in China, is induced by multiple factors, such as genetics, the environment, and lifestyle. Thus, a multi-faceted approach is necessary in the study of CAD pathogenesis, particularly in molecular biology research, which is important for developing comprehensive treatment of CAD based on gene therapy. The MEF2A gene was first identified as a CAD-related gene through linkage analysis of a large family with CAD (9 of 13 patients developed MI) in 2003.
In this study, we found the following mutations: 1) codon 451G/T (147191) heterozygous or homozygous mutation; 2) loss of 1 (Q), 2 (QQ), 3 (QQP), 6 (425QQQQQQ430), and 7 (424QQQQQQQ430) amino acids (147108-147131); and 3) codon 435G/A (147143) heterozygous mutation. Among these mutations, the synonymous mutation at locus 147191 was confirmed by reference to the National Center for Biotechnology Information (NCBI) database to be a single nucleotide polymorphism, which was also demonstrated in our study by the extensive presence of this polymorphism in healthy controls. However, the heterozygous mutation at locus 147143 was only found in the genomes of CAD patients, and was therefore identified as a mutation.
Whether MEF2A is a CAD-related gene remains controversial, with studies from several countries reporting conflicting results. Weng et al. (2005) screened gene mutations in exon 11 of the MEF2A gene from 300 CAD patients and 1500 healthy controls. They hypothesized that the changes in 5-12 CAG repeats are genetic polymorphisms and that the 21-base deletion in exon 11 of the MEF2A gene did not induce autosomal dominant genetic CAD. Gonzalez et al. (2006) suggested that the CAG repeat polymorphism was independent of MI susceptibility in Spanish patients. Kajimoto et al. (2005) reported that the CAG repeat sequence was not correlated with MI susceptibility in Japanese patients. Horan et al. (2006) also found that the CAG repeat sequence was not associated with the susceptibility to early-onset familial CAD in an Irish population. Hsu et al. (2010) identified no correlation between the CAG repeat sequence and CAD susceptibility in the Taiwanese population. Dai et al. (2010) found that the structural change in exon 11 was not related to CAD in the Chinese Han population. Lieb et al. (2008) and Guella et al. (2009) hypothesized that MEF2A was independent of CAD. However, Yuan et al. (2006) and Han et al. (2007) suggested that the CAG repeat sequence was correlated with CAD because 9 CAG repeats was an independent predictor of CAD. Elhawari et al. (2010) and Maiolino et al. (2011) suggested that MEF2A is a susceptibility gene for CAD. Dai et al. (2013) showed that mutations in exon 12 are associated with the early onset of CAD in the Chinese population. Liu et al. (2012) failed to demonstrate a correlation between the CAG repeat sequence and CAD through case-control analysis, systematic review, and meta-analysis, but found that the 21-base deletion in exon 11 was strongly associated with CAD, and that genetic variations in MEF2A may be a relatively rare, but specific, cause of CAD/MI. Kajimoto et al. (2005) reported 4-15 CAG repeats.
However, only 4-11 CAG repeats were observed in our study, possibly because of genetic differences in patients in this study. Eleven CAG repeats were observed in most samples from the control group, and the proportion of 10, 9, and 8 repeats exceeded 1%. The heterozygous mutation at 147143, as well as the 4 and 5 CAG repeats, was only observed in CAD patients. Thus, we speculated that the CAG repeat sequence is correlated with CAD susceptibility, and the presence of 4 or 5 repeats may be a risk factor for CAD, which was inconsistent with the results obtained by Han et al. (2007). The inconsistency in these results may be explained by the differences in subjects and sample sizes among studies.
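As a rough illustration of how a CAG repeat number can be read from a sequenced fragment, the sketch below counts the units in the longest uninterrupted (CAG)n run. It is not the PCR-SSCP procedure used in the study: the function name `longest_cag_run` and the example sequence are hypothetical, and the sketch ignores reading frame and interrupted repeats.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Return the number of CAG units in the longest uninterrupted (CAG)n run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


if __name__ == "__main__":
    # Hypothetical fragment: 11 CAG units, an interrupting CCG, then 1 more CAG.
    fragment = "TT" + "CAG" * 11 + "CCG" + "CAG"
    print(longest_cag_run(fragment))
```

A sequence with no CAG triplet returns 0, so the function can be applied to every read without special-casing.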

Impact of glucocerebrosidase mutations on motor and nonmotor complications in Parkinson’s disease

Homozygous and compound heterozygous mutations in GBA encoding glucocerebrosidase lead to Gaucher disease (GD). A link between heterozygous GBA mutations and Parkinson’s disease (PD) has been suggested (Bembi et al., 2003, Goker-Alpan et al., 2004, Halperin et al., 2006, Machaczka et al., 1999, Neudorfer et al., 1996, Tayebi et al., 2001 and Tayebi et al., 2003). In 2009, a 16-center worldwide analysis of GBA revealed that heterozygous GBA mutation carriers have a strong risk of PD (Sidransky et al., 2009).

In addition, heterozygous GBA mutations may not only confer a risk of developing PD but also influence the progression of its clinical course. In cross-sectional analyses of GBA mutations in PD patients, earlier disease onset, increased cognitive impairment, a greater family history of PD, and more frequent pain were reported in patients with mutations, compared with patients without mutations (Chahine et al., 2013, Clark et al., 2007, Gan-Or et al., 2008, Kresojevic et al., 2015, Lwin et al., 2004, Malec-Litwinowicz et al., 2014, Mitsui et al., 2009, Neumann et al., 2009, Nichols et al., 2009, Seto-Salvia et al., 2012, Sidransky et al., 2009, Swan and Saunders-Pullman, 2013 and Wang et al., 2012). Recently, a few prospective studies have investigated clinical features of PD with GBA mutations and showed a more rapid progression of motor impairment and cognitive decline in GBA mutation cases than in PD controls (Beavan et al., 2015, Brockmann et al., 2015 and Winder-Rhodes et al., 2013). However, no longitudinal studies of PD with GBA mutations have addressed motor complications such as wearing-off and dyskinesia.

Here, we conducted a multicenter retrospective cohort analysis, and the data were investigated by survival time analysis to show the impact of GBA mutations on PD clinical course. We also investigated regional cerebral blood flow (rCBF) and cardiac sympathetic nerve degeneration of subjects with GBA mutations, compared with matched PD controls.

3.1. Subjects

Among the 224 eligible PD patients (the subjects were not related to each other), 9 subjects were excluded from the analysis (4 due to multiple system atrophy findings on subsequent brain MRI and 5 because of insufficient clinical information). Therefore, 215 PD patients [female, 52.1%; age, 66.7 ± 10.8 (mean ± standard deviation)] were analyzed. For non-PD healthy controls, 126 patients’ spouses (female, 58.7%; age, 67.3 ± 10.3) without a family history of PD or GD were enrolled.

3.2. GBA mutations and risk ratios for PD

In the PD subjects, we identified 10 nonsynonymous and 2 synonymous GBA variants. Of the nonsynonymous variants, 7 were previously reported in GD as GD-associated mutations [R120W, L444P-A456P-V460 (RecNciI), L444P, D409H, A384D, D380N, and 444L(1447-1466 del 20, insTG)]. Three nonsynonymous mutations have never been reported in GD patients [I(-20)V, I489V, and one novel mutation, Y11H].

GD-associated GBA mutations were found in 19 of the 215 (8.8%) PD patients but in none of the healthy controls. The risk of PD development relative to these GD-associated mutations was estimated as an OR of 25.1 [95% confidence interval (CI), 1.50–420; p = 0.0001] with 0-cell correction. The nonsynonymous mutations that were not reported in GD patients had no association with PD development (p = 0.506; OR, 1.3; 95% CI, 0.7–2.6) (Table 1). Four subjects had double mutations. For subsequent analyses, 2 subjects with double mutations of I(-20)V and K466K were assigned to the group of mutations unreported in GD, and 2 subjects with double mutations of R120W and I(-20)V, and of R120W and L336L, were assigned to the group of GD-associated mutations.

Table 1. Frequency of glucocerebrosidase gene allele in Parkinson’s disease patients and controls

| Allele name | PD (n = 215) | Controls (n = 126) | p | Odds ratio (95% CI) |
|---|---|---|---|---|
| GD-associated mutations | | | | |
| R120W | 7^a | 0 | 0.050 | 9.1 (0.5–160.8) |
| RecNciI (L444P-A456P-V460) | 4 | 0 | | |
| L444P | 4 | 0 | | |
| D409H | 1 | 0 | | |
| A384D | 1 | 0 | | |
| D380N | 1 | 0 | | |
| 444L(1447-1466 del 20, insTG) | 1 | 0 | | |
| Subtotal, n (%) | 19 (8.8%) | 0 (0%) | <0.001 | 25.1 (1.5–419.8)^b |
| Nonsynonymous mutations not reported in GD | | | | |
| I(-20)V | 27^a | 13 | 0.603 | 1.3 (0.6–2.5) |
| I489V | 3 | 0 | | |
| Y11H^c | 0 | 1 | | |
| Subtotal, n (%) | 30 (14.0%) | 14 (11.1%) | 0.506 | 1.3 (0.7–2.6) |
| Synonymous, n | | | | |
| K466K | 2^a | 1 | | |
| L336L | 1^a | 0 | | |

Allele names refer to the processed protein (excluding the 39-residue signal peptide).

Key: CI, confidence interval; GD, Gaucher disease; PD, Parkinson’s disease.

a Four subjects had double mutations; 2 of I(-20)V and K466K, 1 of I(-20)V and R120W, and 1 of R120W and L336L.
b Odds ratio was calculated by adding 0.5 to each value.
c Novel mutation.
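
The 0-cell correction described in footnote b can be illustrated with a short calculation. The sketch below adds 0.5 to each cell of the 2 × 2 table (19 carriers among 215 PD patients, 0 among 126 controls) and computes the odds ratio; the Wald interval on the log scale is an assumption made here, since the text does not state how the confidence interval was derived, and the function name `odds_ratio_corrected` is hypothetical.

```python
import math


def odds_ratio_corrected(a, b, c, d, z=1.96):
    """Odds ratio with 0.5 added to each cell, plus an assumed Wald CI on the log scale.

    a: cases with mutation      b: cases without mutation
    c: controls with mutation   d: controls without mutation
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # 19 of 215 PD patients vs 0 of 126 controls carried GD-associated mutations.
    or_, lo, hi = odds_ratio_corrected(19, 215 - 19, 0, 126 - 0)
    print(f"OR = {or_:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
```

Under these assumptions the result is close to the subtotal row of Table 1; small differences in the upper bound may reflect rounding or a different interval method in the original analysis.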
3.3. Clinical features of PD patients by GBA mutation groups

The clinical features of PD patients with GD-associated mutations, those with mutations unreported in GD, and those without mutations are shown in Table 2. In the GD-associated mutation group, females, those with a family history and those with dementia (DSM-IV) were significantly more frequent than in the no-mutation group (p = 0.047, 0.012, and 0.020, respectively). The age of PD onset was lower in patients with GD-associated mutations (55.2 ± 9.9 years, mean ± standard deviation), compared with those without mutations (59.3 ± 11.5), although the difference was not statistically significant. There were no differences in clinical manifestations between subjects with mutations unreported in GD and those without mutations, except for dopamine agonist dosage (p = 0.026) (Table 2).

Table 2. Epidemiological and clinical features of PD patients with Gaucher disease–associated GBA mutations, those with mutations previously unreported in GD and those without mutations

| Variables (total n = 215) | | Mutation (-) (n = 167) | GD-associated mutations (n = 19^a) | p^b | Mutations unreported in GD (n = 29^c) | p^d |
|---|---|---|---|---|---|---|
| Sex | Female, n (%) | 83 (49.7) | 14 (73.7) | 0.047 | 15 (51.7) | ns |
| Age | Mean (SD) | 67.0 (10.8) | 62.2 (10.7) | 0.063^e | 67.5 (11.2) | ns^f |
| Disease duration (y) | Mean (SD) | 7.7 (5.5) | 6.9 (4.6) | ns^f | 7.2 (4.9) | ns^f |
| Onset age | Mean (SD) | 59.3 (11.5) | 55.2 (9.9) | ns | 60.3 (11.8) | ns |
| Family history | Yes, n (%) | 17 (11.0)^g | 6 (31.6) | 0.012 | 0 (0.0) | ns |
| Dementia (DSM-IV) | Yes, n (%) | 29 (17.4) | 9 (47.4) | 0.020 | 5 (17.2) | ns |
| MMSE | Mean (SD) | 25.8 (5.4)^h | 23.3 (7.7) | ns^f | 27.0 (3.4)^i | ns^f |
| Onset symptom (tremor vs. others) | Tremor, n (%) | 78 (46.8) | 9 (47.4) | ns | 15 (51.7) | ns |
| Modified H-Y on (<3 vs. ≥3) | ≥3, n (%) | 82 (49.1) | 14 (73.7) | 0.042 | 16 (55.2) | ns |
| UPDRS part 3 | Mean (SD) | 23.6 (12.2)^j | 28.5 (13.8) | ns^f | 21.9 (8.7) | ns^f |
| Wearing off | Yes, n (%) | 70 (41.9) | 9 (47.4) | ns | 13 (44.8) | ns |
| Dyskinesia | Yes, n (%) | 49 (29.3) | 8 (42.1) | ns | 8 (27.6) | ns |
| Mood disorder | Yes, n (%) | 43 (25.7) | 8 (42.1) | ns | 7 (24.1) | ns |
| Orthostatic hypotension symptom | Yes, n (%) | 21 (12.6) | 5 (26.3) | ns | 7 (24.1) | ns |
| Psychosis history | Yes, n (%) | 59 (35.3) | 10 (52.6) | ns | 7 (24.1) | ns |
| ICD history | Yes, n (%) | 8 (4.8) | 1 (5.3) | ns | 1 (3.4) | ns |
| Stereotactic brain surgery for PD | Yes, n (%) | 4 (2.4) | 0 (0.0) | ns | 0 (0.0) | ns |
| Agonist LED mg/d | Mean (SD) | 92.8 (114.2) | 72.1 (137.7) | ns^e | 163.7 (155.6) | 0.026^e |
| Levodopa LED mg/d | Mean (SD) | 400.7 (184.2) | 456.7 (206.9) | ns^f | 369.2 (230.3) | ns^e |
| Total LED mg/d | Mean (SD) | 496.4 (233.7) | 537.9 (258.9) | ns^f | 525.7 (287.4) | ns^f |

Categorical data were examined by Fisher’s exact test.

Key: DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; GBA, glucocerebrosidase gene; GD, Gaucher disease; H-Y, Hoehn and Yahr; ICD, impulse control disorder; LED, levodopa equivalent dose; ns, not significant; MMSE, Mini-Mental State Examination; PD, Parkinson’s disease; SD, standard deviation; UPDRS, Unified Parkinson’s Disease Rating Scale.

a Including a double-mutation subject (with a mutation unreported in GD).
b GD-associated mutations versus mutation (-).
c Two subjects with double mutation, including GD-associated mutations, were assigned to GD-associated mutation group.
d Other mutations versus mutation (-).
e Examined by Student t test after Levene’s test for equality of variances.
f Examined by Mann-Whitney U-test because of non-Gaussian distribution.
g n = 155 due to 10 missing data.
h n = 164 due to 3 missing data.
i n = 28 due to 1 missing datum.
j n = 165 due to 2 missing data.

3.4. Survival time analyses to develop dementia, psychosis, dyskinesia, and wearing-off

Time to develop clinical outcomes (dementia, psychosis, dyskinesia, and wearing-off) was compared in 19 subjects with GD-associated mutations, 29 with mutations unreported in GD, and 167 without mutation. The median observation time was 6.0 years. The subjects with GD-associated mutations showed a significantly earlier development of dementia and psychosis, compared with subjects without mutation (p < 0.001 and p = 0.017) (Supplementary Table e-1, Fig. 1A and B). We re-reviewed the clinical record of the subject who showed early dementia (defined by DSM-IV) (Fig. 1A) and confirmed that it did not satisfy the criteria for DLB (McKeith et al., 2005).


Fig. 1.

Kaplan–Meier curves of dementia and psychosis in Parkinson’s disease (PD) patients with Gaucher disease (GD)-associated glucocerebrosidase gene (GBA) mutations and those without mutations. PD patients with GD-associated GBA mutations and those without GBA mutations were compared to investigate the time taken to develop dementia (A) and psychosis (B). Because of insufficient information in several patients, the numbers in each analysis were different. The patients with and without mutations were 17 and 165 (A), 18 and 165 (B) against a total of 19 and 167. DSM IV, Diagnostic and Statistical Manual of Mental Disorders, revised fourth edition. p-Values were calculated by log-rank tests.
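
For readers unfamiliar with the Kaplan–Meier method behind Fig. 1, the sketch below shows the product-limit estimate in plain Python. It is a minimal illustration only: the function name `kaplan_meier` is hypothetical, the follow-up times in the example are invented for demonstration and are not taken from the study, and the log-rank comparison between groups is not included.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time where an outcome occurred.

    times:  follow-up time for each subject (e.g., years from PD onset)
    events: True if the outcome (e.g., dementia) was observed at that time,
            False if the subject was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Invented example data, not from the study.
    follow_up = [2, 3, 3, 5, 8]
    outcome = [True, True, False, True, False]
    for t, s in kaplan_meier(follow_up, outcome):
        print(t, round(s, 3))
```

Subjects censored at the same time as an event are counted as still at risk at that time, which is the usual convention for the product-limit estimator.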

The associations of GBA mutations and these symptoms were estimated as HRs, adjusting for sex and age at PD onset. HRs were 8.3 for dementia (95% CI, 3.3–20.9; p < 0.001) and 3.1 for psychosis (95% CI, 1.5–6.4; p = 0.002). The time until development of wearing-off and dyskinesia complications did not differ significantly between groups, with HRs of 1.5 (95% CI, 0.8–3.1; p = 0.219) and 1.9 (95% CI, 0.9–4.1; p = 0.086) (Table 3).

Table 3. Hazard ratios of GBA pathogenic mutations for clinical symptoms

| Model | Clinical feature | Hazard ratio | 95% CI | p |
|---|---|---|---|---|
| 1 | Dementia (DSM-IV) | 8.3 | 3.3–20.9 | <0.001 |
| 2 | Psychosis | 3.1 | 1.5–6.4 | 0.002 |
| 3 | Wearing-off | 1.5 | 0.8–3.1 | 0.219 |
| 4 | Dyskinesia | 1.9 | 0.9–4.1 | 0.086 |

Each model was adjusted for sex and age at onset.

Key: CI, confidence interval; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, fourth edition; GBA, glucocerebrosidase.
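
As a rough consistency check on Table 3, the standard error of log(HR) can be recovered from a reported 95% CI and turned into an approximate Wald p-value. The sketch below assumes the intervals are symmetric on the log scale (the usual convention for Cox models); the function name is illustrative, and because the published figures are rounded, the recovered p-values should only be expected to land near the reported ones.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Approximate SE of log(HR), Wald z, and two-sided p from a 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Values copied from Table 3: (hazard ratio, CI lower bound, CI upper bound).
TABLE_3 = {
    "Dementia (DSM-IV)": (8.3, 3.3, 20.9),
    "Psychosis": (3.1, 1.5, 6.4),
    "Wearing-off": (1.5, 0.8, 3.1),
    "Dyskinesia": (1.9, 0.9, 4.1),
}

for feature, (hr, lower, upper) in TABLE_3.items():
    se, z, p = wald_from_ci(hr, lower, upper)
    print(f"{feature:18s} SE={se:.3f}  z={z:.2f}  p={p:.4f}")
```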

Subjects with mutations unreported in GD did not show significant differences in time to develop all 4 outcomes, compared with no mutation subjects. Therefore, subjects with GD-unreported mutations were regarded as subjects without GBA mutations in further analyses.

3.5. rCBF on SPECT in patients with GD-associated GBA mutations

We conducted pixel-by-pixel comparisons of rCBF on SPECT between PD subjects with mutations (cases) and sex-, age-, and disease duration-matched PD subjects without any mutations in GBA (controls). Four controls were adopted for each case (except for a 34-year-old female case who was matched to a control), and in total 12 cases (female 50%, age at SPECT mean ± standard error (SE); 58.9 ± 3.3 years, disease duration at SPECT 7.3 ± 1.5 years) and 45 controls (female 64.4%, age at SPECT mean ± SE; 61.0 ± 1.3 years, disease duration at SPECT 7.1 ± 0.7 years) were analyzed. As a result, a significantly lower rCBF was seen in the cases compared to the controls in the bilateral parietal cortex, including the precuneus ( Fig. 2).

Fig. 2.

Regional cerebral blood flow in the group with GD-associated mutations compared with the matched Parkinson’s disease group without mutations. Regions with lower regional cerebral blood flow in the group with GD-associated mutations displayed on an anatomic reference map. Abbreviation: GD, Gaucher disease.

3.6. H/M ratios on MIBG scintigraphy in patients with GD-associated GBA mutations

Cardiac MIBG scintigraphy visualizes in vivo the catecholaminergic terminals, which are reduced in PD patients along with brain dopaminergic neurons. We also compared MIBG scintigraphy between 16 cases (female 68.8%, age at examination mean ± SE; 60.2 ± 2.6 years, disease duration at examination 6.2 ± 1.2 years) and 61 sex-, age-, and disease duration-matched controls [(female 63.8%, age 62.0 ± 1.1 years, disease duration 5.5 ± 0.6 years) (1:4 except for 1 young 34-year-old female case who was matched to a single control)]. Both early and late H/M ratios were reduced in both groups and did not differ significantly between them (p = 0.309 and 0.244) ( Supplementary Table e-2).

4. Discussion

4.1. Contributions of GD-associated GBA mutations to the development of PD

In the analysis of 215 PD patients and 126 non-PD controls, we identified 10 nonsynonymous heterozygous GBA mutations, including 1 novel mutation. Among these mutations, 7 were GD-associated, and the patients carrying these mutations represented 8.8% of the PD cohort. No significant association was found between the GD-unreported mutations and PD development, which suggests that only the GD-associated mutations are a genetic risk for PD. According to a worldwide multicenter analysis of 1883 fully sequenced PD patients, GD-associated mutations are found in 7% of non-Ashkenazi Jewish PD patients ( Sidransky et al., 2009). Although the mutation frequency in the present study was similar to previous results, the OR of GD-associated heterozygous mutations (25.1) was significantly greater than the OR (5.43) of other ethnic cohorts (Sidransky et al., 2009) and was consistent with an OR of 28.0 from a previous Japanese report ( Mitsui et al., 2009). These results, taken together, suggest the possibility that GBA mutations confer a distinct risk for PD in the Japanese population. However, a larger Japanese cohort study is required to confirm this.

4.2. Cross-sectional clinical figures of PD with GBA mutations

Before the survival time analyses, we investigated clinical features at enrollment between mutation groups. The lower onset age, more frequent family history and dementia, and worse disease severity of PD in patients with GBA mutations, compared with those without mutations, were consistent with previous cross-sectional case-control reports ( Anheim et al., 2012, Brockmann et al., 2011, Chahine et al., 2013, Lesage et al., 2011, Li et al., 2013, Mitsui et al., 2009, Neumann et al., 2009, Seto-Salvia et al., 2012 and Sidransky et al., 2009). In contrast, the female predominance (73.7%, p = 0.047) among patients with mutations observed in the present study is inconsistent with previous reports ( Neumann et al., 2009 and Seto-Salvia et al., 2012).

4.3. Impact of GBA mutations on the clinical course of PD

To investigate the impact of GBA mutations on the clinical course of PD, a prospectively designed study over a long period is preferred. Although there have been few longitudinally designed studies to date, follow-up clinical data for a median of 6 years of 121 PD cases from a community-based incident cohort were recently reanalyzed; the results demonstrate that progression to dementia defined by DSM IV (HR 5.7) and to Hoehn and Yahr stage 3 (HR 3.2) is significantly earlier in 4 GBA mutation-carrier patients compared with 117 patients with wild-type GBA ( Winder-Rhodes et al., 2013). A 2-year follow-up clinical report of 28 heterozygous GBA carriers who were recruited from relatives of GD patients shows slight but significant deterioration of cognition and olfaction, compared with healthy controls ( Beavan et al., 2015). Brockmann et al. (2015) assessed motor and nonmotor symptoms including cognitive and mood disturbances for 3 years in 20 PD patients with GBA mutations and showed a more rapid progression of motor impairment and cognitive decline in GBA mutation cases compared with sporadic PD controls. The current long-term retrospective cohort study of up to 12 years reinforced these results. It revealed that dementia and psychosis developed significantly earlier in subjects with GD-associated mutations compared with those without mutation, and the HRs of GBA mutations were estimated at 8.3 for dementia and 3.1 for psychosis, with adjustments for sex and PD onset age. In contrast, the results showed no significant difference in developing wearing-off and dyskinesia.

In this study, we also investigated whether GD-unreported mutations affected the clinical course of PD. In both cross-sectional and survival time analyses, the mutations unreported in GD carried no increased burden on clinical symptoms such as dementia, psychosis, wearing-off, and dyskinesia.

4.4. Reduced rCBF in PD with GBA mutations compared with matched PD controls

We found a significantly decreased rCBF, reflecting decreased synaptic activity, in the bilateral parietal cortex including the precuneus, in subjects with GD-associated mutations compared with matched subjects without mutations. The pattern of reduced rCBF was very similar to the pattern on H₂¹⁵O positron-emission tomography that Goker-Alpan et al. (2012) reported, showing decreased resting rCBF in the lateral parietal association cortex and the precuneus bilaterally in GD subjects with parkinsonism (7 subjects with homozygous or compound heterozygous GBA mutations), compared with 11 PD subjects without GBA mutations. These results suggest that PD patients with heterozygous GBA mutations and GD patients presenting with parkinsonism share a common pattern of reduced rCBF. Interestingly, in their study, rCBF in the precuneus, but not in the lateral parietal cortex, correlated with IQ, suggesting that the involvement of the precuneus is critical for defining GBA-associated patterns.

4.5. Reduced cardiac MIBG H/M ratios as well as matched PD controls

We also showed that cardiac MIBG H/M ratios in subjects with GD-associated mutations were lower than the cutoff point for PD discrimination (Sawada et al., 2009), suggesting that postganglionic sympathetic nerve terminals to the epicardium were denervated, as well as in PD without mutations.

4.6. Mechanisms of impact on PD clinical course by GD-associated GBA mutations

Experimental studies suggesting a bidirectional pathogenic loop between α-synuclein and glucocerebrosidase have accumulated (Fishbein et al., 2014, Gegg et al., 2012, Mazzulli et al., 2011, Noelker et al., 2015, Schondorf et al., 2014 and Uemura et al., 2015). Loss of glucocerebrosidase function compromises α-synuclein degradation in the lysosome, whereas aggregated α-synuclein inhibits the normal lysosomal function of glucocerebrosidase. The pathogenic loop may facilitate neurodegeneration in the GD-associated PD brain, resulting in early development of dementia or psychosis as shown in the present study. Several recent studies propose that a mechanism similar to that in PD with GBA mutations exists even in the idiopathic PD brain ( Alcalay et al., 2015, Chiasserini et al., 2015, Gegg et al., 2012 and Murphy et al., 2014). On the other hand, the impact of GD-associated GBA mutations on the development of motor complications such as wearing-off and dyskinesia was not statistically significant, suggesting that other pathophysiological mechanisms in the striatal circuit emerge after long-term therapy, especially with l-dopa.

4.7. Limitations

Our study has several limitations. In the design of the study, we assumed that the sample size was 215 (PD patients) for survival time analyses and investigated 224 PD patients. We assumed that the mutation prevalence would be 9.4%, and in fact, we found 19 patients with mutations (8.5%) of the 224 patients. Based on these figures, we estimated the risk ratios of heterozygous GBA mutations for the risk of PD development and PD clinical symptoms as ORs in the cross-sectional multivariate analyses, although the 95% CIs were broad. Larger numbers of subjects will be needed to determine robust risk ratios.
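
To illustrate why a cohort of this size leaves broad intervals, here is a minimal sketch that computes a 95% confidence interval for the observed carrier proportion (19 of 224). The Wilson score interval is chosen here purely for illustration; the authors do not say which interval method, if any, they applied to this proportion.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts quoted in the limitations paragraph: 19 carriers among 224 patients.
low, high = wilson_interval(19, 224)
print(f"carrier prevalence {19 / 224:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With only 19 carriers, any estimate conditioned on carrier status rests on a small group, which is consistent with the broad CIs the authors report.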

Comprehensive Genetic Characterization of a Spanish Brugada Syndrome Cohort

PLOS   Published: July 14, 2015   DOI: http://dx.doi.org:/10.1371/journal.pone.0132888

Brugada syndrome (BrS) was identified as a new clinical entity in 1992 [1]. Six years later, the first genetic basis for the disease was identified, with the discovery of genetic variations in SCN5A [2]. Nowadays, more than 300 pathogenic variations in this first gene are known to be associated with BrS [3]. SCN5A encodes the α subunit of the cardiac voltage-dependent sodium channel (Nav1.5), which is responsible for inward sodium current (INa), and thus plays an essential role in phase 0 of the cardiac action potential (AP). Genetic variations in this gene can explain around 20–25% of BrS cases [3].

Since BrS was classified as a genetic disease, several other genes have been described to confer BrS-susceptibility [4–7]. Pathogenic variations have been mainly described in: 1) genes encoding proteins that modulate Nav1.5 function, and 2) other calcium and potassium channels and their regulatory subunits. All these proteins participate, either directly or indirectly, in the development of the cardiac AP. Although the incidence of pathogenic variations in these BrS-associated genes is low [6], it is considered that, among all of them, they could provide a genetic diagnosis for up to an extra 5–10% of BrS cases. Hence, altogether, a genetic diagnosis can be achieved in approximately 35% of clinically diagnosed BrS patients.
Other types of genetic abnormalities have been suggested to explain the remaining percentage of undiagnosed patients. Indeed, multiplex ligation-dependent probe amplification (MLPA) has allowed the detection of large-scale gene rearrangements involving one or several exons of SCN5A in BrS cases. However, the low proportion of BrS patients carrying large genetic imbalances identified to date suggests that this type of rearrangement will provide a genetic diagnosis for a modest percentage of BrS cases [8–10].
BrS has been associated with an increased risk of sudden cardiac death (SCD), despite the reported variability in disease penetrance and expressivity [11]. The prevalence of BrS is estimated at about 1.34 cases per 100 000 individuals per year, with a higher incidence in Asia than in the United States and Europe [12]. However, the dynamic nature of the typical electrocardiogram (ECG) and the fact that it is often concealed, hinder the diagnosis of BrS. Therefore, an exhaustive genetic testing and subsequent family screening may prove to be crucial in identifying silent carriers. A large percentage of these pathogenic variation carriers are clinically asymptomatic, and may be at risk of SCD, which is, sometimes, the first manifestation of the disease [13].
In the present work, we aimed to determine the spectrum and prevalence of genetic variations in BrS-susceptibility genes in a Spanish cohort diagnosed with BrS, and to identify variation carriers among relatives, which would enable the adoption of preventive measures to avoid SCD in their families.

Results  
Study population 

Table 1. Demographics of the 55 Spanish BrS patients included in the study.

The table shows the demographic characteristics of all the patients included in the study. Numbers in parentheses represent the relative percentages for each condition. T1 ECG refers to Type 1 BrS diagnostic electrocardiogram (ECG), obtained either spontaneously, or after drug challenge. The information regarding both the electrophysiological studies (EPS) and the treatment was not available for all the patients. Two of the patients that didn’t receive any treatment died, and were not taken into account for the calculations of percentages (+2 dead). ICD, intracardiac cardioverter defibrillator.

http://dx.doi.org:/10.1371/journal.pone.0132888.t001

Table 2. Characteristics of the Spanish BrS patients carrying rare genetic variations.

The table shows the clinical characteristics of the probands who carried rare genetic variations in SCN5A, SCN2B, or RANGRF. All of them are potentially pathogenic except that found in RANGRF, which is of unknown significance (see discussion). All the potentially pathogenic variations (PPVs) that had been previously reported, except p.P1725L and p.R1898C, had been identified in BrS patients. p.P1725L had been associated with Long QT Syndrome and p.R1898C was found in Exome Variant Server with a MAF of 0.0079%. No rare variations were identified in the control population. Patient’s age is expressed in years. Bold identifies the patients carrying variations that had not been described previously. M, male; F, female; S, syncope; ICD, intracardiac cardioverter defibrillator; UK, unknown; EPS, electrophysiological studies (+, positive response;-, negative response; N/P, not performed). The two patients who carried two PPVs each are identified by a and b, respectively.

http://dx.doi.org:/10.1371/journal.pone.0132888.t002

Sequencing of genes associated with BrS

We performed a genetic screening of 14 genes (SCN5A, CACNA1C, CACNB2, GPD1L, SCN1B, SCN2B, SCN3B, SCN4B, KCNE3, RANGRF, HCN4, KCNJ8, KCND3, and KCNE1L), which allowed the identification of 61 genetic variations in our cohort. Of these, 20 were classified as potentially pathogenic variations (PPVs), one as a variation of unknown significance, and 40 as common or synonymous variants considered benign.

The 20 PPVs were found in 18 of the 55 patients (32.7% of the patients, 83.3% males; Table 2). Sixteen patients (88.9%) carried one PPV, and two patients (11.1%) carried two different PPVs each. Nineteen out of the 20 PPVs identified were localized in SCN5A and one in SCN2B.

The vast majority of the PPVs identified were missense (70%). We also detected 2 nonsense variations (10%), 3 insertions or deletions causing frameshifts (15%), and one splicing variation (5%). The three frameshifts (p.R569Pfs*151, p.E625Rfs*95 and p.R1623Efs*7) were identified in SCN5A. These were not found in any of the databases consulted (see Methods), and were thus considered potentially pathogenic (see below). The other 16 rare variations identified in SCN5A had been previously described, and hence were also considered potentially pathogenic. Fourteen of them had been identified in BrS patients. Of these, 6 had also been identified in individuals diagnosed with other cardiac electric diseases (i.e. Sick Sinus Syndrome, Long QT Syndrome, Sudden Unexplained Nocturnal Death Syndrome or Idiopathic Ventricular Fibrillation [2,15,16,20,21,25]). The other 2, p.P1725L and p.R1898C, had only been associated with Long QT Syndrome or found in Exome Variant Server with a MAF of 0.0079%, respectively. Furthermore, we identified a variation in SCN2B (c.632A>G in exon 4 of the gene, resulting in p.D211G) which was considered pathogenic. This patient was included within our cohort, but the functional characterization of channels expressing SCN2B p.D211G was the object of a previous study from our group [7]. We also identified a nonsense variation in RANGRF which has previously been reported as a rare genetic variation of unknown significance [29].

Additionally, we screened the relatives of those probands carrying a PPV. We analysed a total of 129 relatives, 69 of whom (53.5%) were variation carriers. Genotype-phenotype correlations evidenced that 8 of the families displayed complete penetrance (S3 Table). No relatives were available for one of the probands carrying a PPV, thus hampering genotype-phenotype correlation assessment. The other 9 families showed incomplete penetrance.

 

MLPA analysis

The 37 patients with negative results after the genetic screening of the 14 BrS-associated genes underwent MLPA analyses of SCN5A. This technique did not reveal any large exon deletion or duplication in this gene for any of the patients.

 

SCN5A p.R569Pfs*151 (c.1705dupC), a novel PPV

A 41-year-old asymptomatic male presented a type 3 BrS ECG which was suggestive of BrS. Flecainide challenge unmasked a type 1 BrS ECG (Fig 1A, left), which was also occasionally observed spontaneously during medical follow-up. Sequencing of SCN5A revealed a duplication of a cytosine at position 1705 (c.1705dupC; Fig 1A, right), which caused a frameshift that leads to a truncated Nav1.5 channel (p.R569Pfs*151). The proband’s sister also carried this duplication, but had never presented signs of arrhythmogenesis. The proband’s twin daughters were also variation carriers, displayed normal ECGs and, to date, are asymptomatic (Fig 1A, middle). Thus, p.R569Pfs*151 represents a novel genetic alteration in the Nav1.5 channel that could potentially lead to BrS, but with incomplete penetrance.

Fig 1. Characteristics of the probands carrying non-reported potentially pathogenic variations (PPVs) in SCN5A and their families.

Left: Electrocardiograms of the probands: (A) patient carrying the p.R569Pfs*151 variation, showing the ST elevation characteristic of BrS in V1 at the time of the flecainide test; (B) patient carrying the p.E625Rfs*95 variation, showing the spontaneous ST elevation characteristic of BrS in V1 and V2; and (C) patient carrying the p.R1623Efs*7 variation, showing the spontaneous ST elevation characteristic of BrS in V1 and V2. Middle: Family pedigrees. Open symbols designate clinically normal subjects, filled symbols mark clinically affected individuals and question marks identify subjects without an available clinical diagnosis. Plus signs indicate the carriers of the PPVs and minus signs, non-carriers. The crosses mark deceased individuals and arrows identify the proband. Right: Detail of the electropherograms obtained after SCN5A sequence analysis of a control subject (left panels) and of the probands (right panels).

http://dx.doi.org:/10.1371/journal.pone.0132888.g001

SCN5A p.E625Rfs*95 (c.1872dupA), a novel PPV

A 51-year-old asymptomatic male was diagnosed with BrS since he presented a spontaneous ST segment elevation in leads V1 and V2 characteristic of type 1 BrS ECG (Fig 1B, left). The sequencing of SCN5A evidenced an adenine duplication at position 1872 (c.1872dupA, Fig 1B, right). This genetic variation results in a truncated Nav1.5 channel (p.E625Rfs*95). The genetic analysis of the proband’s relatives proved that only the proband’s mother carried the variation (Fig 1B, middle). She was asymptomatic, but a BrS ECG was unmasked upon ajmaline challenge. The proband’s sister was found dead in her crib at 6 months of age, a death that might be compatible with BrS. Therefore, the p.E625Rfs*95 variation in the Nav1.5 channel represents a novel genetic alteration potentially causing BrS.

SCN5A p.R1623Efs*7 (c.4867delC), a novel PPV

The proband, a 31-year-old male, was admitted to hospital after suffering a syncope. His baseline 12-lead ECG showed an ST segment elevation in leads V1 and V2 that strongly suggested BrS type 1 (Fig 1C, left). A deletion of the cytosine at position 4867 (c.4867delC) was observed upon SCN5A sequencing (Fig 1C, right). This base deletion leads to a frameshift that produces a truncated Nav1.5 channel (p.R1623Efs*7). Genetic screening of his parents and sisters evidenced that none of them carried this novel variation (Fig 1C, middle). None of them had presented any signs of arrhythmogenicity or had a BrS ECG. Nevertheless, in utero genetic analysis of one of his daughters proved that she had inherited the variation. She died at 1 year of age of non-arrhythmogenic causes. Hence, the p.R1623Efs*7 variation in the Nav1.5 channel is a novel genetic alteration that arose de novo in the proband and could potentially lead to BrS.

Synonymous and common genetic variations portrayal

In our cohort, we identified 40 single nucleotide variations which were common genetic variants and/or synonymous variants (S2 Table). Twenty-nine had a minor allele frequency (MAF) over 1%, and were thus considered common genetic variants.

We also identified 11 variants with MAF less than 1%. Of them, 9 were synonymous variants, which led us to assume that they were not disease-causing. Four of these synonymous variants were not found in any of the databases consulted, and thus their MAF was considered to be less than 1%. Each of these synonymous variations was identified in 1 patient of the cohort. A similar proportion of individuals carrying these novel variations was detected upon sequencing of 300 healthy Spanish individuals (600 alleles). The remaining 2 variants were missense, and although they had either a MAF of less than 1% or an unknown MAF according to the Exome Variant Server and dbSNP websites, they were common in our cohort (29.2 and 50%, respectively; S2 Table), and a similar MAF was detected in a Spanish cohort of healthy individuals (26.7% and 48.8%, respectively).
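
The triage logic described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' pipeline: the 1% MAF cutoff comes from the text, while the class, field names, category labels and example variants are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

COMMON_MAF_THRESHOLD = 0.01  # 1%, the cutoff quoted in the text


@dataclass
class Variant:
    name: str                        # hypothetical label, for illustration only
    consequence: str                 # e.g. "missense", "synonymous", "frameshift"
    database_maf: Optional[float]    # None when absent from the consulted databases
    cohort_frequency: float = 0.0    # frequency seen in the local cohort or controls


def triage(variant: Variant) -> str:
    """Assign a variant to one of the broad categories used in the text."""
    if variant.database_maf is not None and variant.database_maf > COMMON_MAF_THRESHOLD:
        return "common"
    if variant.consequence == "synonymous":
        return "synonymous_assumed_benign"
    if variant.cohort_frequency > COMMON_MAF_THRESHOLD:
        # Rare or unknown in public databases but frequent locally.
        return "common_in_local_population"
    return "rare_candidate_ppv"


# Hypothetical examples mirroring the situations described above.
examples = [
    Variant("example_common", "missense", 0.12),
    Variant("example_novel_synonymous", "synonymous", None),
    Variant("example_locally_common", "missense", 0.005, cohort_frequency=0.292),
    Variant("example_novel_frameshift", "frameshift", None),
]
for v in examples:
    print(f"{v.name}: {triage(v)}")
```

The third branch reflects the point made about the two missense variants: a low or unknown database MAF does not settle rarity when population-matched controls show the variant is frequent.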

Influence of phenotype and age on PPV discovery

To assess whether a connection existed between the probands’ phenotype and the PPV detection yield, we classified the patients in our cohort according to their ECG (spontaneous or induced type 1), the presence of BrS cases within their families, and the presence/absence of symptoms. Although the overall PPV detection yield was 32.7%, it was higher for symptomatic patients (Fig 2). Indeed, in this group of patients, having a family history of BrS was identified as a factor for increased PPV discovery yield. In the absence of BrS in the family, the variation discovery yield was almost double for patients with a spontaneous type 1 BrS ECG compared with patients with a drug-induced type 1 ECG (45.5% vs 25%, respectively). In addition, we identified a PPV in 44.4% of the asymptomatic patients who presented a family history of BrS and a spontaneous type 1 BrS ECG. When the patient presented a drug-induced type 1 ECG or had no family history of BrS, the PPV discovery yield was around 15%.

Fig 2. Influence of the phenotype on PPV discovery yield.

Bar graph comparing the PPV detection yield in 8 different clinical categories (stated below the graph). Each bar shows the total number of patients for each clinical category divided in those with a PPV (black) and those without an identified PPV (white). The number of patients (in brackets) and percentages are given. Pos, positive; Neg, negative; Spont, spontaneous type 1 BrS ECG; Drug, drug-induced type 1 BrS ECG; n, number of patients.

http://dx.doi.org/:10.1371/journal.pone.0132888.g002

We also investigated the role of age in PPV occurrence. No significant age differences were observed between variation carriers and non-carriers (38.6±10.3 and 43.5±14.4, respectively, p = 0.16). However, the PPV discovery yield was higher for patients aged between 30 and 50 years: of all patients carrying a PPV, 83.3% were in this age range, while 11.1% were younger and 5.6% were older (Fig 3A, upper panel). The PPV discovery yield was higher for symptomatic than for asymptomatic patients (42.3% vs 24.1%, respectively; Fig 3A, lower panels).
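
The yields by symptom status can be reproduced from patient counts, and a Fisher exact test gives a sense of how much weight the difference can bear in a cohort of 55. The counts below (11 of 26 symptomatic and 7 of 29 asymptomatic patients carrying a PPV) are inferred from the percentages quoted in this article rather than taken from a published table, and the test itself is an illustrative addition, not an analysis the authors report.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables at least as unlikely as the observed one.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Inferred counts: (carriers, non-carriers) per group.
symptomatic = (11, 15)
asymptomatic = (7, 22)

for label, (carriers, others) in (("symptomatic", symptomatic),
                                  ("asymptomatic", asymptomatic)):
    print(f"{label}: yield {carriers / (carriers + others):.1%}")

p_value = fisher_exact_two_sided(*symptomatic, *asymptomatic)
print(f"two-sided Fisher exact p = {p_value:.3f}")
```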

Fig 3. Influence of the age on PPVs discovery yield.

(A) Pie charts showing the distribution of patients in the overall population as well as in the categories of symptomatic and asymptomatic patients regarding PPV discovery. The percentage and the number of patients (in brackets) are given for each group. The small pie charts correspond to the age distribution of patients with an identified PPV. (B) Bar graphs of the PPV detection yields obtained for each of the age groups (< 30 years, 30–50 years and > 50 years). Numbers inside each bar correspond to the number of patients carrying a PPV for each category and the percentages represent the variation detection yield.

http://dx.doi.org:/10.1371/journal.pone.0132888.g003

Notably, in the 30–50 age range, 52.9% (9/17) of the symptomatic patients and 35.3% (6/17) of asymptomatic patients carried one PPV (Fig 3B, middle). Additionally, 40% (2/5) of the symptomatic young patients (< 30 years) were variation carriers, while no PPVs were identified in asymptomatic patients within this age range.

Overall, 55 unrelated Spanish patients clinically diagnosed with BrS were included in our study. Table 1 shows the demographics of this cohort, and Table 2 and S1 Table show the clinical and genetic characteristics of all the patients included in the study. The mean age at clinical diagnosis was 41.9±13.3 years. Although the majority of patients were males (74.5%), their age at diagnosis was not different from that of females (41.8±12.1 years and 42.3±16.3 years, respectively; p = 0.92). A type 1 BrS ECG was present spontaneously in 37 patients (67.3%), and drug challenge revealed a type 1 BrS ECG for the remaining 18 patients (32.7%). Almost half of the patients had experienced symptoms, including 2 SCD and 4 aborted SCD. Patients who had not previously experienced any signs of arrhythmogenicity despite having a BrS ECG were considered asymptomatic. Comparison of symptomatic vs asymptomatic patients evidenced a similar percentage of males (73.1% and 75.9%, respectively). However, the mean age at diagnosis was different between the two groups of patients (37.7±14.3 and 45.7±11.4, respectively; p<0.05).

Discussion

To the best of our knowledge, this is the first comprehensive genetic evaluation of 14 BrS-susceptibility genes and MLPA of SCN5A in a Spanish cohort. Well delimited BrS cohorts from Japan, China, Greece and even Spain have been genetically studied [24,30–32]. Additionally, an international compendium of BrS genetic variations identified in more than 2100 unrelated patients from different countries was published in 2010 [3]. However, all these studies screened SCN5A exclusively. In 2012, Crotti et al. reported the spectrum and prevalence of genetic variations in 12 BrS-susceptibility genes in a BrS cohort [5]. However, this study included patients of different ethnicity. Here, we report the analysis of 14 genes which has been conducted on a well-defined BrS cohort of the same ethnicity.

Our results confirm that SCN5A is still the most prevalent gene associated with BrS. Indeed, the proportion of SCN5A-mediated BrS in our cohort (30.9%) is higher than that described in other European reports [3,23], where a potentially causative variation is identified in only 20–25% of BrS patients. The reason for this discrepancy is unclear but could point towards a higher prevalence of SCN5A PPVs in the Spanish population or to selection bias. Additionally, we identified a genetic variation in SCN2B (c.632A>G, which results in p.D211G). We have formerly published the comprehensive electrophysiological characterization of this variation, and showed that indeed this variation could be responsible for the phenotype of the patient, thus linking SCN2B with BrS for the first time [7]. Also, we identified a variation in RANGRF. This variation (c.181G>T leading to p.E61X) had been previously reported in a Danish atrial fibrillation cohort [33]. Surprisingly, the authors reported an incidence of 0.4% for this variation in the healthy Danish population, which brought into question its pathogenicity. Our finding of this variation in an asymptomatic patient displaying a type 2 BrS ECG also points toward considering it as a rare genetic variation with a potential modifier effect on the phenotype but not clearly responsible for the disease [29].

No PPVs were identified in the other genes tested. Certainly, it is well accepted that the contribution of these genes to the disease is minor, and thus should only be considered under special circumstances [13,34]. In addition, recent studies have questioned the causality of variations identified in some of these minority genes [35].

We also used the MLPA technique for the detection of large exon duplications and/or deletions in SCN5A in patients without PPVs, and no large rearrangements were identified. This is in accordance with previous reports, which revealed that such imbalances are uncommon [8–10].

Kapplinger et al. [3] reported a predominance of PPVs in transmembrane regions of Nav1.5. Indeed, it has been proposed that most rare genetic variations in interdomain linkers may be considered as non-pathogenic [36]. In contrast, PPVs identified in this study are mainly located in extracellular loops and cytosolic linker regions of Nav1.5 (Fig 4). Additionally, 2 of our previously unreported frameshifts are located in the DI-DII linker. These 2 genetic variations lead to truncated proteins, which would lack around 75% of the protein sequence, and thus are presumed to be pathogenic.
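
The extent of truncation implied by a frameshift can be estimated directly from its protein-level name. The sketch below reads the notation used in this article (for example p.R569Pfs*151) according to the HGVS convention, in which the stop position is counted from the first changed residue. The full-length value of 2016 residues for Nav1.5 is an assumption based on the canonical isoform and is not stated in the article; the function and field names are illustrative.

```python
import re

NAV1_5_LENGTH = 2016  # assumed length of the canonical Nav1.5 isoform

# Matches one-letter frameshift names such as "p.R569Pfs*151".
FRAMESHIFT = re.compile(r"^p\.([A-Z])(\d+)([A-Z])fs\*(\d+)$")


def truncation_summary(hgvs, full_length=NAV1_5_LENGTH):
    """Summarise how much native sequence a frameshift variant retains."""
    match = FRAMESHIFT.match(hgvs)
    if match is None:
        raise ValueError(f"not a one-letter frameshift name: {hgvs!r}")
    first_changed = int(match.group(2))  # first residue altered by the shift
    stop_offset = int(match.group(4))    # stop position, counting that residue as 1
    native_residues = first_changed - 1
    truncated_length = first_changed + stop_offset - 2  # residues before the stop
    return {
        "native_residues": native_residues,
        "truncated_length": truncated_length,
        "fraction_native_lost": 1 - native_residues / full_length,
    }


for name in ("p.R569Pfs*151", "p.E625Rfs*95", "p.R1623Efs*7"):
    info = truncation_summary(name)
    print(f"{name}: {info['native_residues']} native residues retained, "
          f"{info['fraction_native_lost']:.0%} of native sequence lost")
```

Under this assumption, the two DI-DII linker frameshifts retain only the first few hundred native residues, in line with the authors' statement that most of the protein sequence is missing.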

Fig 4. Nav1.5 channel scheme showing the relative position of the SCN5A PPVs identified in our cohort.

Open symbols indicate already described variations and closed symbols locate novel variations reported in this study. DI to DIV designate the 4 domains of the protein, and numbers 1–6 identify the different segments within each domain. Crosses mark the voltage sensor.

http://dx.doi.org:/10.1371/journal.pone.0132888.g004

In our cohort, we have identified 40 synonymous or common genetic variations, 4 of which have not been previously reported. These variations are gradually becoming more and more important in the explanation of certain phenotypes of genetic diseases. Only a few common variations identified here are already published as phenotypic modifiers [37,38]. The effect of these and other common variants identified in our cohort on BrS phenotype should be further studied.

Unexpectedly, almost 40% (7/18) of the PPV carriers did not present signs of arrhythmogenicity. We also performed genotype-phenotype correlations of the PPVs identified in the families (S3 Table). These studies uncovered relatives, most of whom were young individuals, who carried a familial variation but had never exhibited any clinical manifestations of the disease. This is in agreement with Crotti et al. and Priori et al. [5,23], who postulated that a positive genetic testing result is not always associated with the presence of symptoms. Indeed, the existence of asymptomatic patients carrying genetic variations described to cause a severe Nav1.5 channel dysfunction has been reported [39]. The identification of silent carriers is of paramount importance since it allows the adoption of preventive measures before any lethal episode takes place. Unknown environmental factors, medication and modifier genes have been suggested to influence and/or predispose to arrhythmogenesis [11]. Hence, this group of patients has to be cautiously followed in order to avoid fatal events.

Our studies on the connection between patients’ phenotype and the PPV detection yield highlighted the presence of symptoms as a factor for an increased variation discovery yield. Within the group of symptomatic individuals, a PPV was identified in a higher proportion of patients displaying a spontaneous type 1 BrS ECG than for patients showing a drug-induced ECG. Likewise, within the asymptomatic patients with family history of BrS, those who presented spontaneous type 1 BrS ECG carried a PPV more often than those with a drug-induced ECG (Fig 2). Referring to age, the vast majority (17/20, 85%) of the PPVs were identified in patients around their fourth decade of age (30–50 years). This is in accordance with the accepted mean age of disease manifestation. Moreover, in this age range, more than 50% of the patients who presented symptoms carried a variation that could be pathogenic (Fig 3). Importantly, 35.3% of asymptomatic patients of around 40 years of age also carried one of such variations. These data highlight the importance of performing a genetic test even in the absence of clinical manifestations of the disease, and particularly when in the 30–50 years range, which is in accordance with consensus recommendations [13,34].

In conclusion, we have analysed for the first time 14 BrS-susceptibility genes and performed MLPA of SCN5A in a Spanish BrS cohort. Our cohort showed male prevalence with a mean age of disease manifestation around 40 years. BrS in this cohort was almost exclusively SCN5A-mediated. The mean PPV discovery yield in our Spanish BrS patients is higher than that described for other BrS cohorts (32.7% vs 20–25%, respectively), and is even higher for patients in the 30–50 years age range (up to 53% for symptomatic patients). All this evidence supports genetic testing, at least of SCN5A, in all clinically well diagnosed BrS patients.

 

Study Limitations

First of all, drug challenge tests were not performed for all the relatives who were asymptomatic variation carriers. This fact hampered their clinical diagnosis and represents an impediment to definitely assess the link between PPVs and BrS. These patients are nowadays under follow-up.

New PPVs have been identified in our cohort. The clinical information available for the families suggests that these new variations could be pathogenic. Still, in vitro studies of these variations are required in order to evaluate their functional effects and verify their pathogenic role. Additionally, genotyping in an independent cohort would help reduce the likelihood of type I (false positive) error in genetic variant discovery.

We have to acknowledge that the study set is relatively small. Consequently, the classification of patients according to the different clinical categories rendered rather small sub-groups, which may lead to over-interpretation of the results. Future studies will be directed to the genetic screening of additional Spanish BrS patients, which will probably reinforce the significance of the tendencies observed here.
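The caution about small sub-groups can be made concrete with a confidence interval. The sketch below is an illustration added here, not part of the original study: it applies a Wilson score interval (chosen only because it behaves reasonably for small samples) to two counts quoted in the discussion, 7/18 and 17/20. The function name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

# Counts quoted in the discussion above; the intervals are illustrative only.
for label, k, n in [
    ("PPV carriers without signs of arrhythmogenicity", 7, 18),
    ("PPVs identified in patients aged 30-50 years", 17, 20),
]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%}, interval {low:.1%} to {high:.1%}")
```

With only 18 carriers, the interval around 7/18 spans several tens of percentage points, which is the kind of uncertainty the authors warn about.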


GEN Tech Focus: Rethinking Gene Expression Analysis, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 1: Next Generation Sequencing (NGS)

GEN Tech Focus: Rethinking Gene Expression Analysis

Larry H. Bernstein, MD, FCAP, Curator

LPBI

 

Quantitating gene expression is essential for researchers to answer important biological questions about basic cellular functions, as well as disease states. In the following articles you will discover the multitude of advances investigators have made to accurately measure and quantitate genetic transcripts within the cell.

Diverse Pathways to Drug Targets


 

The Gene-Expression Undergrowth Has Been Well Trodden, but RNA Paths Want Wear, Too

  • Hepatitis C virus depends on a functional interaction between its genome and miR-122 for viral stability and replication. Researchers recently used an antisense oligonucleotide that targets the liver-specific microRNA miR-122, blocking its function. [Bluebay2014/Fotolia]

    A great deal of research on pathway analysis is currently focusing on RNA rather than proteins, and the complex RNA networks that regulate gene expression.

    With the realization that more than 90% of the genome that is transcribed into RNA is not translated into protein, and the growing numbers of naturally occurring microRNAs (miRNAs) and long noncoding RNAs (lncRNAs) being identified and characterized, the important role these RNAs play in normal biological processes and across human diseases is becoming increasingly clear.

    This knowledge—combined with the available technology and strategies to decipher RNA pathways and link alterations in the levels or activity of miRNAs or lncRNAs to gene expression, epigenetic mechanisms, and protein activity in normal and disease phenotypes—is driving the development and clinical testing of novel drug targets and therapeutics that target regulatory RNAs.

    For example, a microRNA was targeted in a Phase II clinical study that assessed the effect of miravirsen, an antisense oligonucleotide, in patients with hepatitis C. The study, which was described in 2013 in the New England Journal of Medicine, indicated that miravirsen sequesters the liver-specific microRNA miR-122 in a highly stable heteroduplex, thereby inhibiting its function.

    Hepatitis C virus (HCV) depends on a functional interaction between its genome and miR-122 for viral stability and replication. According to the study, inhibition of miR-122 in HCV-infected patients was associated with decreased levels of HCV RNA that continued beyond the treatment period, without evidence of viral resistance.

    The therapeutic potential of regulatory RNAs is also being assessed in other conditions such as cancer. Specifically, miRNAs and other ncRNAs in cancer initiation, progression, and metastasis are being studied by George Calin, M.D., Ph.D., a professor of experimental therapeutics, MD Anderson Cancer Center, University of Texas. Dr. Calin’s group is scouring the “microRNAome” to identify miRNAs of about 21–22 nucleotides that can serve as reliable biomarkers for cancer diagnosis and to guide decision-making in patient management, including as predictors of survival and response to drug therapy.

    miRNAs are involved in every aspect of tumorigenesis, cancer progression, and dissemination. Not only are they expressed in tumor cells, they are also stably expressed in exosomes and are present in various bodily fluids, where they can act like hormones and signaling molecules. Comparative profiling of these fluids for differences in miRNA levels between patients with and without cancer could identify relevant biomarkers.

     

    Analyzing RNA Pathways

  • Using Qiagen’s Ingenuity Pathway Analysis, researchers can analyze relationships between molecules and diseases of interest by modeling how gene expression patterns affect functional outcomes or disease processes.

    Dr. Calin and colleagues have described the significance of miRNA signatures obtained in recent studies involving miRNA profiling of human tumors. An overview appeared in 2014 in CA: A Cancer Journal for Clinicians (“MicroRNAome genome: a treasure for cancer diagnosis and therapy”). Also, last February, Dr. Calin gave an account of his group’s work at the Molecular Med Tri Conference in San Francisco.

    Technology is not holding back advances in the field of RNA pathway analysis, according to Dr. Calin. The main bottleneck at present is in the design of prospective studies needed to confirm the predictive value of miRNA-based biomarkers.

    Dr. Calin points to two other key challenges that scientists currently face in translating research findings into diagnostic, prognostic, and therapeutic tools. One is the difficulty in selecting an miRNA target, mainly because an individual miRNA could have a role in regulating tens, hundreds, or even thousands of protein-coding genes. For drug discovery, the aim is to identify miRNAs that affect a single pathway of interest to help limit off-target effects. The need for novel delivery systems for RNA-targeted drugs is another key challenge.

    At the Molecular Med Tri Conference, Jean-Noel Billaud, Ph.D., principal scientist at Qiagen Bioinformatics, presented a case study demonstrating how the company’s Ingenuity Pathway Analysis technology can be used to conduct a systems biology analysis to identify the pathways, potential upstream regulators, and downstream outcomes involved in the host response to West Nile Virus (WNV) infection. Dr. Billaud also discussed how to interpret the results from a biological perspective.

    In his presentation, Dr. Billaud described the first step in this analytical process as the acquisition of RNA sequence data using next-generation sequencing techniques for the purpose of characterizing and quantifying differential gene expression between an infected and uninfected cell. The CLC Cancer Research Workbench tool is used to process the sequence data, and the results are imported directly into the IPA system.

    Analysis of differential gene expression aims to answer a series of key questions, including the following: What metabolic and/or signaling pathway(s) is activated or inhibited? Is there an overlap of the genes or pathways that are activated or inhibited? What are the potential upstream, downstream, functional, and phenotypic implications of this pathway activation or inhibition?
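The idea of calling a gene activated or inhibited between an infected and an uninfected cell can be sketched with a log2 fold change. This is a minimal illustration under stated assumptions: the gene names and read counts are hypothetical, the counts-per-million scaling and the fold-change threshold are arbitrary choices, and none of it reflects how the CLC workbench or IPA process data. A real analysis would use replicates and a statistical test.

```python
import math

# Hypothetical read counts per gene (illustrative, not from the WNV case study).
uninfected = {"GENE_A": 120, "GENE_B": 300, "GENE_C": 45, "GENE_D": 800}
infected = {"GENE_A": 960, "GENE_B": 310, "GENE_C": 5, "GENE_D": 790}

def counts_per_million(counts):
    """Scale raw counts so that samples of different depth are comparable."""
    total = sum(counts.values())
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}

def log2_fold_changes(control, treated, pseudocount=1.0):
    """log2(treated / control) per gene, with a pseudocount to avoid log of zero."""
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2((cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount))
        for gene in control
    }

def classify(lfc, threshold=1.0):
    """Label a gene by its fold change; the threshold of 1.0 (2-fold) is an assumption."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"

lfc = log2_fold_changes(uninfected, infected)
calls = {gene: classify(value) for gene, value in lfc.items()}
for gene in sorted(calls):
    print(f"{gene}: log2FC = {lfc[gene]:+.2f} ({calls[gene]})")
```

Lists of up- and down-regulated genes of this kind are the usual starting point for asking which pathways are activated or inhibited.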

    Dr. Billaud described other questions researchers might attempt to answer through the use of IPA: What are the underlying transcriptional programs? Which biological processes are involved and in what way? Are there splice variants of interest? What type of regulation is involved?

    In the WNV case study, IPA predicted activation of the interferon signaling pathway and added statistically and functionally relevant biological processes to the WNV-related biochemical network the system developed. IPA is able to simulate the effects of interferon pathway activation on neighboring molecules and processes, which enables broader modeling of antiviral responses, prediction of the effects on viral replication, and identification of upstream transcriptional regulators of antiviral and related anti-inflammatory processes, for example.

    These data and analytical capabilities may allow researchers to propose new hypotheses that connect molecules in regulatory networks to disease-related pathways in a predictive way, leading to the identification of a “master regulator” that could serve as a disease-specific drug target, according to Dr. Billaud.

    In the WNV example, he described the use of the Molecule Activity Predictor (MAP) function in IPA to test the hypothesis that CLEC7A is a host susceptibility factor required by WNV to stimulate an immune response in the brains of infected patients, contributing to the development of life-threatening encephalitis. The MAP function simulates the inhibition or downregulation of CLEC7A, showing how it would likely reduce the risk of WNV-associated encephalitis. These types of hypotheses would then need to be tested and validated.

    Pathways Driving B-Cell Differentiation

    • Robert C. Rickert, Ph.D., professor and director of the Tumor Microenvironment and Metastasis Program at Sanford-Burnham Medical Research Institute, is using conditional gene targeting to identify the genes and biochemical pathways that play a role at specific stages of B-cell differentiation. With this approach, it is possible to knock out targeted genes in a mouse at different stages of B-cell development, and to do so in an inducible fashion, allowing you “to look at how it affects different signal transduction pathways in a context-specific manner,” says Dr. Rickert.

      When applied to a relevant mouse model of disease—such as a B-cell lymphoma—this inducible genetic system should yield effects similar to those that could be obtained with a drug capable of blocking the activity of the targeted gene product. Dr. Rickert and colleagues are exploring the similarity between the effects achieved with conditional gene targeting and those of drugs recently approved to treat chronic lymphocytic leukemia (CLL) and some forms of lymphoma, such as idelalisib and ibrutinib. Both drugs inhibit the B-cell receptor pathway, by blocking PI3K or Bruton’s tyrosine kinase (BTK), respectively.

      Dr. Rickert presented his group’s latest research at a Keystone Symposium Conference, PI 3-Kinase Signaling Pathways in Disease, which took place last January in Vancouver. In his talk, Dr. Rickert emphasized that the phosphatidylinositol-3 kinase (PI3K) pathway is a major regulator of B lymphocyte differentiation and function.

      Dr. Rickert has also applied conditional gene targeting to compare the roles of the NFκB and PI3K pathways in B-cell maturation. He has shown that while both pathways are essential at some stages of B-cell differentiation, only one pathway may be necessary for B-cell maintenance and survival.

      “Ultimately we want to gain more insight at the biochemical level into single cells and the heterogeneity of the cell populations we’re interested in,” says Dr. Rickert. Tumors and cancer cell populations are quite heterogeneous, and better biochemical tools are needed to be able to sort through these populations of cells and “look at some of the more interesting, rogue cells, such as cancer stem cells,” he adds.

    An Evolutionary Approach

    In his laboratory at Hebrew University of Jerusalem, researcher Yuval Tabach, Ph.D., is using computational tools to analyze and compare the genomes and proteins of hundreds of species to identify evolutionary patterns of conservation and loss that point to connections between molecular pathways and disease.

    “The main power of this phylogenetic profiling approach is that if you look at proteins across evolution, some are lost at certain points in certain species,” says Dr. Tabach. For example, proteins involved in the tricarboxylic acid (TCA) cycle have been highly conserved across some species, but have disappeared in others because those species have lost their mitochondria.

    Dr. Tabach and colleagues have shown that sets of genes associated with particular diseases have similar phylogenetic profiles. They are also using this approach to identify genes associated with longevity, cancer resistance, and various extreme environmental conditions.

    Phylogenetic profiling to connect patterns of conservation and loss across millions of years of evolution can be applied to entire proteins, protein domains, and RNA molecules such as microRNAs. The potential applicability of this approach to drug discovery and development is multifaceted.

    For example, given a gene known to be related to a certain disease, the ability to identify other genes with a similar phylogenetic profile might reveal genetic factors that could explain incomplete penetrance or the variability of disease severity in different affected individuals. Alternatively, identification of a candidate gene in one patient could serve as the basis for identifying other key factors in other patients with the same disease using the phylogenetic profile.
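The core of phylogenetic profiling, finding genes whose pattern of presence and absence across species resembles that of a known disease gene, can be sketched as follows. This is a toy illustration, not Dr. Tabach’s method: the gene names and profiles are hypothetical, and the simple agreement score stands in for the more refined, evolution-aware scoring a real pipeline would use.

```python
# Hypothetical presence (1) / absence (0) of each gene's ortholog across six species.
profiles = {
    "query_gene": [1, 1, 0, 1, 0, 1],
    "gene_x": [1, 1, 0, 1, 0, 1],  # lost in the same species as the query
    "gene_y": [1, 1, 1, 1, 1, 1],  # present everywhere
    "gene_z": [0, 0, 1, 0, 1, 0],  # opposite pattern
}

def agreement(a, b):
    """Fraction of species in which two genes agree on presence or absence."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def rank_by_profile(query, all_profiles):
    """Rank every other gene by similarity of its profile to the query gene's."""
    reference = all_profiles[query]
    scores = [
        (name, agreement(reference, profile))
        for name, profile in all_profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)

ranked = rank_by_profile("query_gene", profiles)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Note that a gene present in every species still scores moderately well here even though it carries little information, which is one reason a plain agreement score is only a starting point.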

    Compared to strategies such as gene expression analysis or protein-protein interaction mapping for identifying disease-related genes, phylogenetic profiling “is much faster” and will become an increasingly powerful tool as the genome sequences of more species become available, explains Dr. Tabach.

    The Israeli start-up company ReThink Pharmaceuticals is using the molecular networks generated through this phylogenetic profiling work for the purpose of drug repositioning. “If you know that a certain drug targets a gene, we can build a network to find other genes/proteins that interact with the drug target,” asserts Dr. Tabach, citing preliminary results that demonstrate the ability to predict additional effects of a drug candidate.

     

 

Measuring siRNA-mediated Knockdown of the IL-8 gene Using the QuantiGene Singleplex Assay

A critical component of RNA interference (RNAi) studies is the validation of gene expression inhibition. RNAi experiments have many sources of variation that make accurate quantitation of target mRNA difficult when qPCR is used. Variation in the potency and stability of short interfering RNA (siRNA), coupled with differences in transfection efficiency and protein turnover, results in varying gene knockdown efficiency.

 

The RNA World Expands


Long noncoding RNAs mean more than HOTAIR.


Long noncoding RNAs (lncRNAs) can regulate gene expression at epigenetic, transcriptional, and post-transcriptional levels. [© Alila Medicinal Media – Fotolia.com]

  • Over the past 10 years, scientists say new methods, including deep sequencing and DNA tiling arrays, have enabled the identification and characterization of the human transcriptome. These techniques completely changed our understanding of genome organization and content and revealed that a much larger part of the human genome is transcribed into RNA than was previously assumed—about 70%.

    Last year researchers, including Tim Mercer, Ph.D., at the Institute for Molecular Bioscience-University of Queensland, Roche Nimblegen, and John Rinn, Ph.D., and his team in the department of stem cell and regenerative biology at Harvard, reported that “transcriptomic analyses have revealed an ‘unexpected complexity’ to the human transcriptome, the depth and breadth of which exceeds current RNA sequencing capability.”

    These scientists used these techniques to identify and characterize unannotated transcripts whose rare or transient expression is below the detection limits of conventional sequencing approaches. The data also show that intermittent sequenced reads observed in conventional RNA sequencing datasets, previously dismissed as noise, are indicative of unassembled rare transcripts. Collectively, they say these results reveal the range, depth, and complexity of a human transcriptome that is far from fully characterized.

    Noncoding transcripts are RNA molecules that include classical “housekeeping” RNAs such as transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), small nuclear RNAs (snRNAs), and small nucleolar RNAs (snoRNAs), which are constitutively expressed and play critical roles in protein biosynthesis.

    Among these noncoding RNAs are numerous long noncoding RNAs (lncRNAs), which are defined as endogenous cellular RNAs of more than 200 nucleotides in length that lack an open reading frame of significant length (less than 100 amino acids). The RNA molecules constitute a heterogeneous group, allowing them, scientists point out, to cover a broad spectrum of molecular and cellular functions by implementing different modes of action. lncRNAs are roughly classified based on their position relative to protein-coding genes as intergenic (between genes), intragenic/intronic (within genes), and antisense. Initial efforts to characterize these molecules demonstrated that they function in cis, regulating their immediate genomic neighbors.
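The working definition above (more than 200 nucleotides, no open reading frame reaching 100 amino acids) can be written as a simple filter. The sketch below is only a literal reading of those two thresholds: the function names are hypothetical, only the three forward reading frames are scanned, and real annotation efforts rely on far more evidence than ORF length.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_codons(seq: str) -> int:
    """Longest ATG-to-stop ORF, in codons excluding the stop, over the three forward frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best

def looks_like_lncrna(seq: str, min_length: int = 200, max_orf_aa: int = 100) -> bool:
    """Apply the two thresholds from the definition in the text."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_aa

# Synthetic sequences for illustration only.
no_orf = "AC" * 150                            # 300 nt, contains no start codon
coding_like = "ATG" + "GCT" * 120 + "TAA"      # 366 nt with a 121-codon ORF
print(looks_like_lncrna(no_orf), looks_like_lncrna(coding_like))
```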

    Regulatory Levels

  • lncRNAs can regulate gene expression at epigenetic, transcriptional, and post-transcriptional levels and take part in various physiological and pathological processes, such as cell development, immunity, oncogenesis, clinical disease processes, and more. A classic lncRNA, HOTAIR, was originally identified through work done by Howard Chang, M.D., Ph.D., at Stanford, and Dr. Rinn. Their research eventually led to the discovery of this 2.2 kilobase spliced RNA transcript that interacts with Polycomb group proteins to modify chromatin and repress transcription of the human HOX genes, which regulate development. It remains unclear exactly how this is accomplished.

    HOTAIR, it was found, originates from the HOXC locus and represses transcription across 40 kb of that locus by altering the chromatin trimethylation state. Hox genes, a highly conserved subgroup of the homeobox superfamily, regulate numerous processes including apoptosis, receptor signaling, differentiation, motility, and angiogenesis. Aberrations in Hox gene expression have been reported in abnormal development and malignancy.

    HOTAIR works to repress Hox gene expression by directing the action of Polycomb chromatin remodeling complexes in trans to govern the cells’ epigenetic state and subsequent gene expression. HOTAIR expression is increased in primary breast tumors and metastases and its expression level in primary tumors can predict eventual metastasis and death. The recent discovery that lncRNA HOTAIR can link chromatin changes to cancer metastasis furthers the relevance of lncRNAs to human disease.

    Dr. Chang and his colleagues say that the finding that several lncRNAs can control transcriptional alteration implies that the difference in lncRNA profiling between normal and cancer cells is not merely the secondary effect of cancer transformation, and that lncRNAs are strongly associated with cancer progression. The researchers showed that lncRNAs in the HOX loci become systematically dysregulated during breast cancer progression.

    They further demonstrated that enforced expression of HOTAIR in epithelial cancer cells induced genome-wide retargeting of polycomb repressive complex 2 (PRC2) to an occupancy pattern more resembling embryonic fibroblasts, leading to altered histone H3 lysine 27 methylation, gene expression, and increased cancer invasiveness and metastasis in a manner dependent on PRC2.

    On the other hand, they noted, loss of HOTAIR can inhibit cancer invasiveness, particularly in cells that possess excessive PRC2 activity. These findings indicate that lncRNAs have active roles in modulating the cancer epigenome and may be important targets for cancer diagnosis and therapy. Thus, the investigators say, differential expression of lncRNAs may be profiled to aid in cancer diagnosis and prognosis and in the selection of potential therapeutics.

    Two years ago the GENCODE consortium, within the framework of the ENCODE project, presented and analyzed the most complete human lncRNA annotation to date. The data comprise 9,277 manually annotated genes producing 14,880 transcripts. The identification and annotation of this wealth of lncRNAs leaves scientists with a lot of research to do to fully characterize the varied functions of these unusual RNAs. Their identification also challenges technology developers to produce the tools necessary for these analyses.

     

Transcript Regulation of 18 ADME Genes by Prototypical Inducers in Human Hepatocytes

Drug-drug interactions (DDIs) are of particular concern for regulatory agencies and the pharmaceutical industry for drug safety. Induction of drug metabolizing enzymes by pharmaceuticals, nutraceuticals, and lifestyle influences is one type of DDI in which the influence of a perpetrator molecule increases the enzyme capacity that can metabolize a victim molecule, rendering it ineffective as a therapy. To evaluate this potential, screening assays have been developed, such as the use…

 

Biomarkers Reshape Drug Development


  • Imanova takes a structured approach to the development of imaging biomarkers, or i-biomarkers.

    Biomarkers defining specific phenotypes are becoming increasingly important for developing new drugs for specific patient subpopulations. The value of a new biomarker is measured by its ability to reduce risk.

    Ideally, the biomarker should be developed in parallel with the new drug, as nearly 50% of the projected development costs can be saved by shutting down a development program before it enters Phase II. A meaningful risk-benefit analysis of a biomarker requires estimates of its cost and accuracy, as well as the consequences of decisions that it will enable.

    For the biomarker to be of value, the cost of its development has to be less than the projected costs of development from Phase II onwards, discounted to present time. While multiple competing business considerations affect a pharmaceutical company’s decision to proceed with a biomarker program, the skyrocketing market for biomarker discovery underscores the pharmaceutical industry’s hope that biomarkers will bolster the success rates of pipeline products.
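The comparison described here, biomarker cost against later-phase costs discounted to the present, is a present-value calculation. The sketch below shows the arithmetic under stated assumptions: the yearly costs, the 10% discount rate, and the function names are all hypothetical, and the calculation ignores probabilities of success and the other business considerations the text mentions.

```python
def present_value(yearly_costs, rate):
    """Discount yearly amounts (incurred at the end of year 1, 2, ...) to today."""
    return sum(cost / (1 + rate) ** year for year, cost in enumerate(yearly_costs, start=1))

def biomarker_is_worthwhile(biomarker_cost, later_phase_costs, rate):
    """True when the biomarker costs less than the discounted later-phase spend."""
    return biomarker_cost < present_value(later_phase_costs, rate)

# Hypothetical figures in millions of dollars.
later_costs = [40.0, 60.0, 80.0]   # projected spend from Phase II onwards
rate = 0.10                        # assumed annual discount rate

pv = present_value(later_costs, rate)
print(f"Discounted later-phase cost: {pv:.1f}")
print("Biomarker at 25 is worthwhile:", biomarker_is_worthwhile(25.0, later_costs, rate))
```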

    “Imaging biomarkers have been largely underutilized in drug development,” says Kevin Cox, Ph.D., CEO of London-based Imanova. “But we believe that molecular imaging has the power to assist in successful translation of molecules by reducing the risk of several specific causes of failure in Phase II clinical studies. Imaging biomarkers, or i-biomarkers, are especially valuable in giving confidence of tissue delivery, determination of target engagement, and the evaluation of a drug’s pharmacodynamic effects.”

    While imaging is routinely used in clinical diagnostics for cancer, its acceptance in drug development has been slow. “This is a highly specialized area of knowledge,” Dr. Cox observes. “Designing imaging experiments to answer the right questions is not trivial. Combined with the perceived high costs and dearth of well-equipped facilities, this has slowed down the adoption of imaging as an integral step in drug development.”

    Imanova presents an innovative and highly integrated solution in reducing the barriers for use of molecular imaging. Located in the former GlaxoSmithKline imaging center, Imanova’s staff applies the knowledge needed for translational application of imaging science.

    “Another historical barrier for use of molecular imaging has been the lack of versatile PET tracers for key therapeutic targets,” remarks Dr. Cox. Together with its pharmaceutical clients, Imanova develops proprietary tracers that can answer critical questions about target engagement directly after drug administration. A structured approach for i-biomarker development takes the novel tracer from the candidate pool to clinical validation.

    Uniquely, Imanova utilizes in silico biomathematical modeling to predict a candidate with ideal physicochemical characteristics. “The i-biomarker development pipeline adheres to a strict quality system,” continues Dr. Cox. “We not only provide candidate selection and labeling, but also rigorous preclinical evaluation in several species, combined with blood chemistry or other physiological measurements.”

    The resulting biomarker provides quantitative information to make informed go/no-go decisions. Imanova hopes to develop an open innovation approach to i-biomarker research, and to encourage pharmaceutical companies to collaborate on tracer development.

    “By collaborating in this pre-competitive space, a pharma-academic consortium can de-risk i-biomarker development programs and generate new tools to eliminate costs associated with futile activities downstream,” concludes Dr. Cox. “Most tracers need to be utilized early in the drug development process. Used at the right time, imaging biomarkers are able to inform the design of Phase II studies, including dose ranging and possibly patient selection, saving many months in development and millions of dollars in costs.”

    Answers from Big Data

  • “Clinical bioinformatics is the application of a data-driven, high-tech approach in a clinical setting,” says Jerome Wojcik, Ph.D., CEO of Quartz Bio, a clinical bioinformatics service provider located in Plan-Les-Ouates, Switzerland. “We use clinical bioinformatics to adapt treatment to patients, that is, to identify cohorts that respond to the drug in a predictable manner,” says Dr. Wojcik.

    Pharmaceutical partners supply Quartz Bio with data collected in the course of clinical trials. The data (which may include information from protein and RNA expression, genotyping, molecular diagnostics, and flow cytometry studies) often exists in silos within a pharma company. To make sense of the data, Quartz Bio integrates heterogeneously formatted data, analyzes it for consistency, and identifies gaps and outliers.

    Dr. Wojcik’s team dedicates over 40% of the overall analysis time to biomarker data management, a step that is crucial for the quality of the overall analysis. According to Quartz Bio, all the data-management processes are documented, auditable, and reproducible.

    Once the “Big Data” horde is adequately cleaned up, the team applies adaptive statistical methods to generate multiple hypotheses linking the drug action with subpopulations of patients. “Our challenge is to generate reliable hypotheses on a fairly small statistical patient sample, for example, a thousand patients, but using millions of biomarker datapoints,” continues Dr. Wojcik. “We do not rely on statistics alone. Graphical visualization adapted to the objectives of the study is necessary for interpretation of results.”

    In a recent project, Quartz Bio analyzed multiple oncology biomarkers, such as gene expression, circulating tumor cells, and immunohistochemistry, to identify patient cohorts that would most likely benefit from a novel treatment. Biomarker analysis revealed a subpopulation whose survival rate increased significantly over the population average, bringing a potential application of personalized medicine closer to reality.

     

 


Anatomy of a $105M Deal for Joint R&D in Genomics: CRISPR Therapeutics & Vertex Pharmaceuticals, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 2: CRISPR for Gene Editing and DNA Repair


Reporter: Aviva Lev-Ari, PhD, RN

Under the terms of the agreement:

  • The partners will evaluate an enumerated but undisclosed number of targets.
  • Vertex will provide CRISPR Therapeutics with a $75 million up-front cash payment as well as a $30 million equity investment.
  • CRISPR Therapeutics is also eligible to receive development, regulatory, and sales milestone payments of up to $420 million as well as royalty payments on future sales.
  • CRISPR Therapeutics will conduct the research, with Vertex covering all related expenses.
  • Vertex holds the option to exclusively license up to six of any gene-based therapies that emerge from the collaboration.

 

CRISPR Therapeutics, Vertex Pharmaceuticals Ink $105M Collaboration Deal

NEW YORK (GenomeWeb) – CRISPR Therapeutics and Vertex Pharmaceuticals have linked up for a four-year strategic research collaboration to develop new treatments of genetic diseases using CRISPR/Cas9 genome editing.

The partners will evaluate the use of CRISPR/Cas9 in multiple diseases where gene targets have already been established by genetics research, including cystic fibrosis and sickle cell disease. Under the terms of the agreement, the partners will also evaluate an enumerated but undisclosed number of targets. Vertex will provide CRISPR Therapeutics with a $75 million up-front cash payment as well as a $30 million equity investment. CRISPR Therapeutics is also eligible to receive development, regulatory, and sales milestone payments of up to $420 million as well as royalty payments on future sales.

CRISPR Therapeutics will conduct the research, with Vertex covering all related expenses. Vertex holds the option to exclusively license up to six of any gene-based therapies that emerge from the collaboration.
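The $105M headline figure is the sum of the two committed components, while the milestone payments are contingent. A minimal sketch of that arithmetic follows, using only the figures quoted above. The function and constant names are illustrative, and royalties are left out because the article does not quantify them.

```python
# All amounts in millions of US dollars, as quoted in the article.
UPFRONT_CASH = 75
EQUITY_INVESTMENT = 30
MAX_MILESTONES = 420  # an "up to" ceiling, contingent on development, regulatory, and sales events


def headline_value(upfront, equity):
    """Committed value of the deal: cash paid up front plus equity purchased."""
    return upfront + equity


def max_disclosed_value(upfront, equity, milestones):
    """Ceiling on disclosed payments if every milestone is reached (royalties excluded)."""
    return upfront + equity + milestones


print(headline_value(UPFRONT_CASH, EQUITY_INVESTMENT))
print(max_disclosed_value(UPFRONT_CASH, EQUITY_INVESTMENT, MAX_MILESTONES))
```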

The firms said in a statement that the initial focus of the collaboration will be to try to correct mutations in the cystic fibrosis transmembrane conductance regulator gene.

The collaboration will also focus on treatments for hemoglobinopathies, such as sickle cell disease. For treatments of these diseases, the partners will share all R&D costs equally and CRISPR Therapeutics will lead commercialization activities in the US.

The collaboration also provides Vertex with an observer seat on the CRISPR Therapeutics board of directors.

Basel, Switzerland-based CRISPR Therapeutics was co-founded by CRISPR/Cas9 pioneer Emmanuelle Charpentier and Nobel laureate Craig Mello, among others. The firm has R&D operations based in Cambridge, Massachusetts and corporate offices in London.

In May 2014, CRISPR Therapeutics raised $25 million in Series A funding.

SOURCE

https://www.genomeweb.com/business-news/crispr-therapeutics-vertex-pharmaceuticals-ink-105m-collaboration-deal


Malaria Vaccine Efficacy

Curators: Larry H. Bernstein, MD, FCAP, and Aviva Lev-Ari, PhD, RN

LPBI

Malaria Vaccine Efficacy Could Rely on Parasite’s Genotype

NEW YORK (GenomeWeb) – A malaria vaccine may be more effective against parasites whose genotype matches that of the vaccine itself, according to researchers from Harvard University and the Fred Hutchinson Cancer Research Center.

Reporting this week in the New England Journal of Medicine, researchers evaluated malarial genotypes of individuals enrolled in a phase III trial of GlaxoSmithKline’s vaccine, RTS,S/AS01.

https://www.genomeweb.com/sequencing-technology/malaria-vaccine-efficacy-could-rely-parasites-genotype

The vaccine was previously evaluated in a large phase III trial in Africa in more than 15,000 children and was found to confer “moderate protective efficacy against clinical disease and severe malaria that wanes over time,” according to the study authors.

The mechanism by which the vaccine confers protection is incompletely understood, although it is known to target a specific protein produced by the Plasmodium falciparum malaria parasite called circumsporozoite protein. However, the circumsporozoite protein contains regions where polymorphisms can occur, including a conserved tandem repeat with a length polymorphism of between 37 and 44 repeat units, and numerous polymorphisms within the C-terminal region of the protein.

Researchers hypothesized the vaccine might be less effective against malaria parasites with polymorphisms in those regions.

To test this theory, they used PCR and next-generation sequencing on both Illumina’s MiSeq instrument and Pacific Biosciences’ RS II. Using samples from children enrolled in the clinical trial who had become infected with malaria, the researchers targeted and sequenced the circumsporozoite protein C-terminal region as well as a control region with the MiSeq. They used the PacBio system to sequence the longer repeat region.

Over 4,000 samples were sequenced on the MiSeq and over 3,000 on the PacBio. Samples included patients at multiple time points after they received the vaccine.

Parasite genetic data were evaluated from 1,181 children between the ages of five and 17 months who received the RTS,S vaccine and 909 who received a control vaccine, all of whom had developed clinically confirmed malaria.

Over two-thirds of patients had “complex infections,” defined as being founded by two or more distinct parasite lineages, the authors reported. Patients who received the RTS,S vaccine were more likely to have complex infections: 71 percent had them, compared with 61 percent of patients who received the control vaccine.

Looking at the relationship between polymorphisms in the C-terminal region and vaccine efficacy, the researchers found that one year post vaccination, the C-terminal region in the malaria parasite matched that of the vaccine in 139 individuals but was a mismatch in 1,951 individuals. Cumulative vaccine efficacy against malaria with a perfect genotype match at the C-terminal site was 50.3 percent. For those without a perfect match, efficacy was 33.4 percent.

In addition, efficacy was higher immediately after receiving the vaccine. Through six months post vaccination, efficacy was 70.2 percent in individuals with a matched genotype and 56.3 percent in those with mismatched genotypes.
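To put the reported figures in proportion, the short sketch below computes the share of infections with a matched genotype and the gap in efficacy between matched and mismatched parasites. It also converts each efficacy into an implied relative risk using the common textbook definition, efficacy = (1 − relative risk) × 100. That definition is an assumption here, since the article does not say exactly how the study computed efficacy, and the function names are illustrative.

```python
# Counts and efficacies as quoted in the article.
MATCHED = 139
MISMATCHED = 1951
REPORTED_EFFICACY = {  # period -> (matched %, mismatched %)
    "one year": (50.3, 33.4),
    "six months": (70.2, 56.3),
}


def matched_share(matched, mismatched):
    """Fraction of genotyped infections whose C-terminal region matched the vaccine."""
    return matched / (matched + mismatched)


def implied_relative_risk(efficacy_percent):
    """Invert efficacy = (1 - RR) * 100; this textbook definition is an assumption."""
    return 1 - efficacy_percent / 100


print(f"matched share: {matched_share(MATCHED, MISMATCHED):.1%}")
for period, (matched_ve, mismatched_ve) in REPORTED_EFFICACY.items():
    gap = matched_ve - mismatched_ve
    print(
        f"{period}: gap {gap:.1f} points, implied RR "
        f"{implied_relative_risk(matched_ve):.3f} (matched) vs "
        f"{implied_relative_risk(mismatched_ve):.3f} (mismatched)"
    )
```

Because only a small minority of the genotyped infections matched the vaccine strain, the efficacy seen in a given region would be expected to sit closer to the mismatched figure, which is in line with the authors' point that overall efficacy depends on the local parasite population.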

Looking at the relationship between the number of repeats and vaccine efficacy, the researchers found that the effect of increasing repeat number on vaccine efficacy was not significant.

The results suggest that among children between the ages of five and 17 months the RTS,S vaccine “has greater activity against malaria parasites with matched circumsporozoite protein allele than against mismatched malaria,” the authors concluded, and overall vaccine efficacy will depend on the genotype of the local parasite population.

In addition, the authors noted, “Genetic surveillance of circumsporozoite protein sequences in parasite populations could inform the development of future vaccine candidates targeting polymorphic malaria proteins.”

