Posts Tagged ‘Genome-wide association study’

Genomic Promise for Neurodegenerative Diseases, Dementias, Autism Spectrum, Schizophrenia, and Serious Depression

Reporter and writer: Larry H Bernstein, MD, FCAP

There has been considerable success in expanding our knowledge of genomics and therapeutic targets in cancer (although clinical remission targets and relapse remain a concern), cardiovascular disease, and infectious disease.  Our knowledge of prenatal and perinatal events is still at an early stage.  The neurology front is by no means unattended.  Here there are two prominent drivers of progress:

  • genomic control of cellular apoptosis by ubiquitin pathways, and
  • epigenetic investigations,

among a complex sea of sequence changes.  I indicate some of the current status here.  However, as much as we know, there is a formidable barrier to formulating working models because:

  1. ligand binding between DNA short-sequences is not predictable over time
  2. binding between proteins and DNA is still largely unknown
  3. specific regulatory roles between nucleotide sequences and histone proteins are still unclear
  4. the relationship between intracellular and extracellular cations, and the equilibria between cations and anions in the interstitial fluid that bathes the cell and between organelles, is virgin territory

Consequently, it is quite an accomplishment to have come as far as we have, and yet, even with the huge computational power at our disposal, there is insufficient data to unravel the complexity.  This may be especially true on the path to understanding neurological and behavioral disorders.

Broad Map of Brain

John Markoff reports on the front page of the Feb 18 New York Times (Project would construct a broad map of the brain) that the Obama administration envisions a decade-long effort to examine the workings of the human brain and construct a map of its activity, comparable to what the Human Genome Project did for genetics.  It will be a collaboration between universities, the federal government, private foundations, and teams of scientists (neuroscientists, nanoscientists and others).  The goal is to break through the barrier to understanding the brain’s billions of neurons and gain greater insight into

  • perception
  • actions
  • and consciousness.

Essentially, it holds great promise for understanding Alzheimer’s disease and Parkinson’s, as well as for finding therapies for a variety of mental illnesses.  An open question is whether it will also advance artificial intelligence research.  It is termed the Brain Activity Map project.
http://NYTimes/broad-map-of-brain/

Schizophrenia Genomics

Scientists Reveal Genomic Explanation for Schizophrenia

July 11, 2011 

http://GenWeb.com/Exome Sequences Reveal Role for De Novo Mutations in Schizophrenia/
http://NatureGenetics.com/Exome Sequences Reveal Role for De Novo Mutations in Schizophrenia/
http://SchizophreniaResearch.com/INFS integrates diverse neurological signals that control the development of embryonic stem cell and neural progenitor cells/

Buffalo, NY (Scicast) (GenomeWeb News) –

Two new studies, published in Schizophrenia Research and in Nature Genetics, offer genomic hypotheses for schizophrenia; one of them presents a new mouse model that demonstrates how gestational brain changes cause behavioural problems later in life.

The first study implicates a fibroblast growth factor receptor protein (FGFR1) that targets diverse genes implicated in schizophrenia.  The research demonstrates how defects in an important neurological pathway in early development

  • may be responsible for the onset of schizophrenia later in life.

Individuals with sporadic schizophrenia tend to carry more deleterious genetic changes than are found in the general population, according to an exome sequencing study that appeared online in Nature Genetics yesterday.  “The occurrence of de novo mutations may in part explain the high worldwide incidence of schizophrenia,” according to co-senior author Guy Rouleau, CHU Sainte-Justine Research Center of University of Montreal.
Researchers from Canada and France did exome sequencing on individuals from 14 parent-child trios, each consisting of an individual with schizophrenia and his or her unaffected parents. In the process, they found

  • 15 de novo mutations in coding sequences from eight individuals with the psychiatric condition, including
  • four nonsense mutations predicted to abbreviate protein sequences.

They surmise that de novo mutations may account for some of the heritability reported for schizophrenia.  Recent exome sequencing studies involving parent-child trios have implicated de novo mutations in other brain-related conditions, including

  • autism spectrum disorder and
  • mental retardation.

To detect de novo genetic changes specific to schizophrenia, the team compared coding sequences from each affected individual with

  • the human reference genome,
  • both of his or her parents, and
  • 26 unrelated control individuals.
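
The comparison described above can be pictured as a set difference. What follows is a minimal sketch, not the study's actual pipeline: it assumes each sample has already been reduced to a set of variants relative to the reference genome, and the variant tuples, coordinates and the function name `candidate_de_novo` are all hypothetical illustrations.

```python
# Hypothetical variant representation: (chromosome, position, ref_allele, alt_allele).
# Each sample is the set of variants where it differs from the reference genome,
# so the comparison against the reference is implicit in how the sets are built.

def candidate_de_novo(child, mother, father, controls):
    """Return child variants absent from both parents and from every control."""
    inherited = mother | father                # anything either parent carries
    seen_in_controls = set().union(*controls)  # anything any control carries
    return child - inherited - seen_in_controls

# Toy data with made-up coordinates, for illustration only.
child = {("chr3", 1000, "C", "T"), ("chr7", 2500, "G", "A"), ("chr12", 4200, "A", "G")}
mother = {("chr7", 2500, "G", "A")}
father = set()
controls = [{("chr12", 4200, "A", "G")}, set()]

print(candidate_de_novo(child, mother, father, controls))
```

Variants that survive such a filter are only candidates; in the study they were then verified by a second sequencing method.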

Of the 15 de novo mutations verified by Sanger sequencing,

  • 11 were missense mutations predicted to alter the amino acid sequence of the resulting protein and
  • four were nonsense mutations predicted to truncate it.
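
The distinction between missense and nonsense changes can be illustrated with the standard genetic code. This is a small sketch for single-codon substitutions only; the function name `classify` and the example codons are assumptions chosen for illustration and are not taken from the study.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the encoded protein."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid, protein unchanged
    if alt_aa == "*":
        return "nonsense"    # premature stop codon, truncated protein
    if ref_aa == "*":
        return "stop-lost"   # original stop codon replaced by an amino acid
    return "missense"        # a different amino acid is incorporated

print(classify("CAG", "TAG"))  # glutamine codon becomes a stop codon
print(classify("GAA", "GTA"))  # glutamate codon becomes a valine codon
print(classify("CTT", "CTC"))  # both codons encode leucine
```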

Among the genes containing nonsense mutations were the zinc finger protein-coding gene ZNF480, the karyopherin alpha 1 gene KPNA1, the low-density lipoprotein receptor-related gene LRP1, and the ALS-like protein-coding gene ALS2CL.

The 15 mutations were found in coding sequences from eight of the individuals with schizophrenia,

  • hinting at a higher de novo mutation rate in individuals with sporadic schizophrenia than is predicted in the population overall.

This difference seems to be specific to exomes, and the researchers noted that

  • de novo mutation rates across the entire genome are likely comparable in those with or without schizophrenia.

They conclude that the enrichment of de novo mutations within the coding sequence of individuals with schizophrenia may underlie the pathogenesis in many of these individuals.  Most of the genes identified in this study have not been previously linked to schizophrenia, thereby providing new potential therapeutic targets.

The second study

  • identifies the Integrative Nuclear FGFR 1 Signaling (INFS) as a central intersection point for multiple pathways of
  • as many as 160 different genes believed to be involved in the disorder.

The lead author, Dr. Michal Stachowiak (UB School of Medicine and Biomedical Sciences), suggests this is the first model that explains schizophrenia

  1. from genes
  2. to development
  3. to brain structure and
  4. finally to behaviour.

A key challenge has been that patients with schizophrenia exhibit mutations in different genes. It is possible to have 100 patients with schizophrenia, each with a different genetic mutation that causes the disorder. A possible explanation is that INFS integrates diverse neurological signals that control the development of embryonic stem cells and neural progenitor cells, and

  • links pathways involving schizophrenia-linked genes.

“INFS functions like the conductor of an orchestra,” explains Stachowiak. “It doesn’t matter which musician is playing the wrong note,

  • it brings down the conductor and the whole orchestra.

With INFS, we propose that

  • when there is an alteration or mutation in a single schizophrenia-linked gene,
  • the INFS system that controls development of the whole brain becomes untuned.”

Using embryonic stem cells, Stachowiak and colleagues at UB and other institutions found that

  • some of the genes implicated in schizophrenia bind the FGFR1 (fibroblast growth factor receptor) protein,
  • which in turn, has a cascading effect on the entire INFS.

“We believe that FGFR1 is the conductor that physically interacts with all genes that affect schizophrenia,” he says. “We think that schizophrenia occurs

  • when there is a malfunction in the transition from stem cell to neuron, particularly with dopamine neurons.”

The researchers tested their hypothesis by creating an FGFR1 mutation in mice, which produced the hallmarks of the human disease:

  • altered brain anatomy,
  • behavioural impacts and
  • overloaded sensory processes.

The researchers would like to devise ways to arrest development of the disease before it presents fully in adolescence or adulthood. The UB work adds to existing evidence that nicotinic agonists might help improve cognitive function in people with schizophrenia by acting on the INFS.

Parkinson’s Disease

http:// CMEcorner.com/file:///G:/neurodegenerative_disease/Parkinson’s_disease.htm

PINK1 and Parkin and Parkinson’s Disease

Studies of the familial Parkinson disease-related proteins PINK1 and Parkin have demonstrated that these factors promote the fragmentation and turnover of mitochondria following treatment of cultured cells with mitochondrial depolarizing agents. Whether PINK1 or Parkin influence mitochondrial quality control under normal physiological conditions in dopaminergic neurons, a principal cell type that degenerates in Parkinson disease, remains unclear. To address this matter, we developed a method to purify and characterize neural subtypes of interest from the adult Drosophila brain.

Using this method, we find that dopaminergic neurons from Drosophila parkin mutants accumulate enlarged, depolarized mitochondria, and that genetic perturbations that promote mitochondrial fragmentation and turnover rescue the mitochondrial depolarization and neurodegenerative phenotypes of parkin mutants. In contrast, cholinergic neurons from parkin mutants accumulate enlarged depolarized mitochondria to a lesser extent than dopaminergic neurons, suggesting that a higher rate of mitochondrial damage, or a deficiency in alternative mechanisms to repair or eliminate damaged mitochondria explains the selective vulnerability of dopaminergic neurons in Parkinson disease.

Our study validates key tenets of the model that PINK1 and Parkin promote the fragmentation and turnover of depolarized mitochondria in dopaminergic neurons. Moreover, our neural purification method provides a foundation to further explore the pathogenesis of Parkinson disease, and to address other neurobiological questions requiring the analysis of defined neural cell types.

Burman JL, Yu S, Poole AC, Decal RB, Pallanck L. Analysis of neural subtypes reveals selective mitochondrial dysfunction in dopaminergic neurons from parkin mutants.

Autophagy in Parkinson’s Disease.

Parkinson’s disease is a common neurodegenerative disease in the elderly. To explore the specific role of autophagy and the ubiquitin-proteasome pathway in apoptosis,

  • a specific proteasome inhibitor and a macroautophagy inhibitor and stimulator were applied to pheochromocytoma (PC12) cell lines transfected with human mutant (A30P) and wild-type (WT) α-synuclein.
  • The apoptosis ratio was assessed by flow cytometry.
  • LC3, heat shock protein 70 (hsp70) and caspase-3 expression in cell culture were determined by Western blot.
  • The hallmarks of apoptosis and autophagy were assessed with transmission electron microscopy.

Compared to the control group or the rapamycin (autophagy stimulator) group, the apoptosis ratio in A30P and WT cells was significantly higher after treatment with inhibitors of the proteasome and macroautophagy.

  1. The results of Western blots for caspase-3 expression were similar to those of flow cytometry;
  2. hsp70 protein was significantly higher in the proteasome inhibitor group than in control, but
  3. in the autophagy inhibitor and stimulator groups, hsp70 was similar to control.

These findings show that

  1. inhibition of the proteasome and autophagy promotes apoptosis,
  2. the macroautophagy stimulator rapamycin reduces the apoptosis ratio, and
  3. inhibiting or stimulating autophagy has less impact on hsp70 than does inhibiting the proteasome pathway.

In conclusion,

  • either stimulation or inhibition of macroautophagy has less impact on hsp70 than does the proteasome pathway.
  • Rapamycin decreased apoptosis in A30P cells independent of caspase-3 activity.

Although several lines of evidence recently demonstrated crosstalk between autophagy and caspase-independent apoptosis, we could not confirm that

  • autophagy activation protects cells from caspase-independent cell death.

Undoubtedly, there are multiple connections between the apoptotic and autophagic processes. Inhibition of autophagy may

  • subvert the capacity of cells to remove
  • damaged organelles or to remove misfolded proteins, which
  • would favor apoptosis.

However, proteasome inhibition activated macroautophagy and accelerated apoptosis. A likely explanation is that inhibition of the proteasome favors oxidative reactions that trigger apoptosis, presumably through

  • a direct effect on mitochondria, and
  • the absence of NADPH2 and ATP which may
  • deinhibit the activation of caspase-2 or MOMP.

Another possibility is that aggregated proteins induced by proteasome inhibition increase apoptosis.

Yang F, Yang YP, Mao CJ, Cao BY, et al. Role of autophagy and proteasome degradation pathways in apoptosis of PC12 cells overexpressing human α-synuclein. Neuroscience Letters 2009; 454:203–208. doi:10.1016/j.neulet.2009.03.027. www.elsevier.com/locate/neulet   http://neurosciletters.com/ Role_of_autophagy_and_proteasome_degradation_pathways_in_apoptosis_of_PC12_cells_overexpressing_human –synuclein/

Parkin-dependent Ubiquitination of Endogenous Bax

Autosomal recessive loss-of-function mutations within the PARK2 gene functionally inactivate the E3 ubiquitin ligase parkin, resulting

  • in neurodegeneration of catecholaminergic neurons and a familial form of Parkinson disease.

Current evidence suggests both

  • a mitochondrial function for parkin and
  • a neuroprotective role, which may in fact be interrelated.

The antiapoptotic effects of Parkin have been widely reported, and may involve fundamental changes in the threshold for apoptotic cytochrome c release, but the substrate(s) involved in Parkin-dependent protection had not been identified. This study demonstrates

  • the Parkin-dependent ubiquitination of endogenous Bax
  • comparing primary cultured neurons from WT and Parkin KO mice and
  • using multiple Parkin-overexpressing cell culture systems.

The direct ubiquitination of purified Bax was also observed in vitro following incubation with recombinant parkin.

  1. Parkin prevented basal and apoptotic stress-induced translocation of Bax to the mitochondria.
  2. An engineered ubiquitination-resistant form of Bax retained its apoptotic function,
  3. but Bax KO cells complemented with lysine-mutant Bax did not manifest the antiapoptotic effects of Parkin that were observed in cells expressing WT Bax.

The conclusion is that Bax is the primary substrate responsible for the antiapoptotic effects of Parkin, a finding that provides mechanistic insight into at least a subset of the mitochondrial effects of Parkin.

Johnson BN, Berger AK, Cortese GP, and LaVoie MJ. The ubiquitin E3 ligase Parkin regulates the proapoptotic function of Bax. PNAS 2012, pp 6. www.pnas.org/cgi/doi/10.1073/pnas.1113248109
http://PNAS.org/ The_ubiquitin_E3_ligase_Parkin_regulates_the_proapoptotic_function_of_Bax

Ubiquitin is a small, compact protein characterized by a β-grasp fold.

Parkin Promotes Mitochondrial Loss in Autophagy

Parkin, an E3 ubiquitin ligase implicated in Parkinson’s disease,

  • promotes degradation of dysfunctional mitochondria by autophagy.

Upon translocation to mitochondria, Parkin activates the ubiquitin–proteasome system (UPS) for

  • widespread degradation of outer membrane proteins.

We observe

  1. an increase in K48-linked polyubiquitin on mitochondria,
  2. recruitment of the 26S proteasome and
  3. rapid degradation of multiple outer membrane proteins.

The degradation of proteins by the UPS occurs independently of the autophagy pathway, and

  • inhibition of the 26S proteasome completely abrogates Parkin-mediated mitophagy in HeLa, SH-SY5Y and mouse cells.

Although the mitofusins Mfn1 and Mfn2 are rapid degradation targets of Parkin, degradation of additional targets is essential for mitophagy.

These results indicate that remodeling of the mitochondrial outer membrane proteome is important for mitophagy, and reveal

  • a causal link between the UPS and autophagy, the major pathways for degradation of intracellular substrates.

Chan NC, Salazar AM, Pham AH, Sweredoski MJ, et al. Broad activation of the ubiquitin–proteasome system by Parkin is critical for mitophagy. Human Molecular Genetics 2011; 20(9): 1726–1737. doi:10.1093/hmg/ddr048.  http://HumMolecGenetics.com/ Broad_activation_of_the_ubiquitin–proteasome_system_by_Parkin_is_critical_for_mitophagy/

Autophagy impairment: a crossroad

Nassif M and Hetz C.  Autophagy impairment: a crossroad between neurodegeneration and tauopathies.  BMC Biology 2012; 10:78. http://www.biomedcentral.com/1741-7007/10/78

http://BMC.com/Biology/Autophagy impairment: a crossroad between neurodegeneration and tauopathies/
http://Molecular Neurodegeneration/Nassif M and Hetz C/

Impairment of protein degradation pathways such as autophagy is emerging as

  • a consistent and transversal pathological phenomenon in neurodegenerative diseases, including Alzheimer’s, Huntington’s, and Parkinson’s disease.

Genetic inactivation of autophagy in mice has demonstrated a key role of the pathway in maintaining protein homeostasis in the brain,

  • triggering massive neuronal loss and
  • the accumulation of abnormal protein inclusions.

This paper in Molecular Neurodegeneration from Abeliovich’s group now suggests a role for

  • phosphorylation of Tau and
  • the activation of glycogen synthase kinase 3β (GSK3β)
  • in driving neurodegeneration in autophagy-deficient neurons.

This study illuminates the factors driving neurofibrillary tangle formation in Alzheimer’s disease and tauopathies.

Alzheimer’s Disease

Alzheimer’s Linked To Rare Gene Mutation That Affects Immune System

Article Date: 15 Nov 2012 –
Two international studies published this week point to a link between Alzheimer’s disease and a rare gene mutation that affects the immune system’s inflammation response. The discovery supports an emerging theory about the role of the immune system in the development of Alzheimer’s disease.  Both studies were published online this week in the New England Journal of Medicine, one led by John Hardy of University College London, and the other led by the Iceland-based global company deCode Genetics.
Alzheimer’s is a distressing brain-wasting disease that gradually robs people of their memories and their ability to lead independent lives. Its main characteristic is the build-up of
  • protein tangles and
  • plaques inside and between brain cells, which eventually
  • disrupts their ability to communicate with each other.
Both teams conclude that a rare mutation in a gene called TREM2, which helps trigger immune system responses, raises the risk for developing Alzheimer’s disease. One study suggests it raises it three-fold, the other, four-fold.  The UCL-led study included researchers from 44 institutions around the world and data on a total of 25,000 people.
After homing in on the TREM2 gene using new sequencing techniques, they carried out further sequencing that identified a set of
  • rare mutations that occurred more often in 1,092 Alzheimer’s disease patients than in a group of 1,107 healthy controls.
They evaluated the most common mutation, R47H, and confirmed that this variant of TREM2 substantially increases the risk for Alzheimer’s disease.  The R47H mutation was present in 1.9 percent of the Alzheimer’s patients and in only 0.37 percent of the controls.  The researchers on the study led by deCode Genetics indicate that this strong effect is on a par with that of the well-established gene variant known as APOE4. “Not all people who have the R47H variant will develop Alzheimer’s, and in those who do, other genes and environmental factors will also play a role, but like APOE4 it does substantially increase risk,” Carrasquillo explains.
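
To see how carrier frequencies translate into a risk estimate, here is a minimal sketch that computes a crude, unadjusted odds ratio from the two percentages quoted above. The function name `odds_ratio` is an assumption, and the result should not be read as either study's published estimate: those come from adjusted analyses and combined cohorts and may differ from this back-of-envelope figure.

```python
def odds_ratio(freq_cases, freq_controls):
    """Crude odds ratio from carrier frequencies in cases and controls."""
    odds_cases = freq_cases / (1 - freq_cases)           # carriers : non-carriers among patients
    odds_controls = freq_controls / (1 - freq_controls)  # carriers : non-carriers among controls
    return odds_cases / odds_controls

# Frequencies quoted in the text: 1.9 percent of patients, 0.37 percent of controls.
crude_or = odds_ratio(0.019, 0.0037)
print(f"crude odds ratio: {crude_or:.1f}")
```

Because the variant is rare in both groups, the odds ratio is close to the simple ratio of the two frequencies.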
The study led by deCode Genetics, which involved collaborators from Iceland, Holland, Germany and the US, not only found a strong link between the R47H variant and Alzheimer’s disease, but also found that the variant

  • predicts poorer cognitive function in older people without Alzheimer’s.
 In a statement, lead author Kari Stefánsson, CEO and co-founder of deCODE Genetics says:
The discovery of variant TREM2 is important because
  • it confers high risk for Alzheimer’s and
  • because the gene’s normal biological function has been shown to reduce immune response
He surmises that the combined factors make TREM2 an attractive target for drug development.
Using deCode’s genome sequencing and genotyping technology, Stefánsson and colleagues identified
  • approximately 41 million markers, including 191,777 functional variants, from
  • 2,261 Icelandic samples.
They further analyzed these variants against the genomes of
  • 3,550 people with Alzheimer’s disease and
  • a control group of over-85s who did not have a diagnosis of Alzheimer’s.
This led to them finding the TREM2 variant, and to make sure this was not just a feature of Icelandic people,
  • they replicated the findings against other control populations in the United States, Germany, the Netherlands and Norway.
Stefánsson says that the results were enabled by having
  • sophisticated research tools,
  • access to expanded and high quality genomic data sets, and
  • investigators with profound analytic skills.
Research into the genetic causes of disease can thereby be carried out using an approach that combines sequence data and biological knowledge to find new drug targets.

R47H Variant of TREM2 and Immune Response

 Preclinical studies have found that
  • TREM2 is important for clearing away cell debris and amyloid protein, the protein that is associated with the brain plaques
  • that are characteristic of Alzheimer’s disease.
 The gene helps control the
  • inflammation response associated with Alzheimer’s and cognitive decline.
Rosa Rademakers, a co-author in the UCL-led study, runs a lab at the Mayo Clinic in Florida that helped to pinpoint the R47H variant of TREM2.  Other studies also link the immune system to Alzheimer’s disease, but
  • further studies are needed to establish that R47H acts by altering immune function.

EPIGENETICS, HISTONE PROTEINS, AND ALZHEIMER’S DISEASE

12/10/12 · Emily Humphreys
Epigenetic effects were first described by Conrad Waddington in 1942 as phenotypic changes resulting from an organism interacting with its environment.1 Today, epigenetics refers to
  • heritable effects in gene expression that are
  • not based on the genetic sequence.
One known epigenetic mechanism includes posttranslational modifications of histones that are
  • found in the nuclei of nearly all eukaryotes and
  • function to package DNA into nucleosomes.
Histone proteins can be heavily decorated with posttranslational modifications (PTMs), such as
  • acetyl-,
  • methyl-, and
  • phosphoryl- groups at distinct amino acid residues.
These modifications are mainly
  • located in the N-terminal tails of the histone and
  • protrude from the core nucleosome structure.
Gene regulation, and the downstream epigenetic effects, can also
  • depend on the cis or trans orientation of the PTMs.2
One PTM, acetylation, is an important determinant of cell replication, differentiation, and death.3  Zhang, et al. investigated the acetylation of histone proteins in Alzheimer’s disease (AD) pathology found in postmortem human brain tissue compared to neurological controls. To study histone acetylation,
  • histones were isolated from frozen temporal lobe samples of patients with advanced AD.
Histones were quantified using selected-reaction-monitoring (SRM)-based targeted proteomics, an LC-MS/MS-based technique demonstrated by the Zhang lab.4  Histones were also analyzed using western blot analysis and LC-MS/MS-TMT (tandem-mass-tagging) quantitative proteomics. The results of these three experimental strategies agreed, further validating the specificity and sensitivity of the targeted proteomics methods. Histone acetylation was reduced throughout the AD temporal lobe compared to matched controls; in particular,
  • histone H3 K18/K23 acetylation was significantly reduced.
Alzheimer’s disease and aging have also been associated with loss of histone acetylation in mouse model studies.5 In addition, Francis et al. found
  • cognitively impaired APP/PS1 mice had 50% lower H4 acetylation than wild-type littermates.6
Histone deacetylase inhibitors have restored histone acetylation and improved memory in mice with age-related impairments and in mouse models of other neurodegenerative diseases.7
Further studies of histone acetylation in AD could lead to targeted therapies for neurodegenerative diseases, and
  • increase our understanding of how epigenetic mechanisms, such as histone acetylation, alter gene regulation.
References
1. Waddington, C.H. (1942) ‘The epigenotype’, Endeavour, 1942 (1), (pp. 18-20)
2. Sidoli, S., Cheng, L., and Jensen, O.N. (2012) ‘Proteomics in chromatin biology and epigenetics: Elucidation of post-translational modifications of histone proteins by mass spectrometry’, Journal of Proteomics, 75 (12), (pp. 3419-3433)
3. Zhang, K., et al. (2012) ‘Targeted proteomics for quantification of histone acetylation in Alzheimer’s disease’, Proteomics, 12 (8), (pp. 1261-1268)
4. Darwanto, A., et al. (2010) ‘A modified “cross-talk” between histone H2B Lys-120 ubiquitination and H3 Lys-79 methylation’, The Journal of Biological Chemistry, 285 (28), (pp. 21868-21876)
5. Govindarajan, N., et al. (2011) ‘Sodium butyrate improves memory function in an Alzheimer’s disease model when administered at an advanced stage of disease progression’, Journal of Alzheimer’s Disease, 26 (1), (pp. 187-197)
6. Francis, Y.I., et al. (2009) ‘Dysregulation of histone acetylation in the APP/PS1 mouse model of Alzheimer’s disease’, Journal of Alzheimer’s Disease, 18 (1), (pp. 131-139)
7. Kilgore, M., et al. (2010) ‘Inhibitors of class 1 histone deacetylases reverse contextual memory deficits in a mouse model of Alzheimer’s disease’, Neuropsychopharmacology, 35 (4), (pp. 870-880)

Tau amyloid

An Outcast Among Peers Gains Traction on Alzheimer’s Cure

By JEANNE WHALEN, The Wall Street Journal, November 10, 2012, page A1 of the U.S. edition
After years of effort, researcher Dr. Claude Wischik is awaiting the results of new clinical trials that will test his theory on the cause of Alzheimer’s.
In the 1980s, Dr. Wischik, an Australian then in his early 30s, was attempting to answer a riddle: What causes Alzheimer’s disease? He needed to examine brain tissue from Alzheimer’s patients soon after death, which required getting family approvals and enlisting mortuary technicians to extract the brains. He collected more than 300 over about a dozen years.
The 63-year-old researcher believes that a protein called tau
  • forms twisted fibers known as tangles inside the brain cells of Alzheimer’s patients and is largely responsible for driving the disease.
For 20 years, billions of dollars of pharmaceutical investment has placed chief blame on a different protein, beta amyloid, which
  • forms sticky plaques in the brains of sufferers.
A string of experimental drugs designed to attack beta amyloid have failed recently in clinical trials.

Wherefore Tau thy go?

Dr. Wischik, who now lives in Scotland, sees this as tau’s big moment. The company he co-founded 10 years ago, TauRx Pharmaceuticals Ltd., has developed an experimental Alzheimer’s drug that it will begin testing in the coming weeks in two large clinical trials. Other companies are also investing in tau research. Roche Holding bought the rights to a type of experimental tau drug from Switzerland’s closely held AC Immune SA.

Wischik is one of many scientists who have struggled against a prevailing orthodoxy. In 1854, British doctor John Snow traced a cholera outbreak in London to a contaminated water supply, but his discovery was rejected. A famous example is the discovery of the cause of childbed fever by Ignaz Semmelweis in Rokitansky’s University of Vienna. In 1982, two Australian scientists declared that bacteria (H. pylori) caused peptic ulcers; they were later awarded the 2005 Nobel Prize in medicine for their discovery.
Dr. Wischik says he and other tau-focused scientists have been shouted down over the years by what he calls the “amyloid orthodoxy.”  But Dr. Wischik has also been hampered by inconclusive research: a small clinical trial of TauRx’s drug in 2008 produced mixed results. Influential scientists still think that beta amyloid plays a central role. Although Roche is investing in tau, Richard Scheller, head of drug research at Roche’s biotech unit, Genentech, says the company still has a strong interest in beta amyloid, hedging its bet.  He thinks amyloid drugs may have better results if they are tested on Alzheimer’s patients much earlier in the disease; Roche recently announced plans to conduct such a trial.  Simply put: “Drugs tied to conventional theories on Alzheimer’s causes haven’t so far been effective.” The scientists whom Dr. Wischik accuses of wrongly fixating on beta amyloid argue that the evidence for pursuing amyloid is strong. One view expressed is that drugs to attack both beta amyloid and tau will be necessary.
Alzheimer’s disease is the leading cause of dementia in the elderly, and according to the World Health Organization, the cost of caring for dementia sufferers totals about $600 billion each year world-wide. The disease was first identified in 1906 by German physician Alois Alzheimer, who found plaques and tangles riddling the brain tissue of a deceased woman who had suffered from dementia. In the 1960s, Dr. Martin Roth and colleagues showed that
  • the degree of clinical dementia was worse for patients with more tangles in the brain.
In the 1980s, Dr. Wischik joined Dr. Roth’s research group at Cambridge University as a Ph.D student, and was quickly assigned the task of
  • determining what tangles were made of, which launched his brain-collecting mission, and years of examining tissue.
Finally, in 1988, he and colleagues at Cambridge published a paper demonstrating for the first time that
  • the tangles first observed by Alzheimer were made at least in part of the protein tau, which was supported by later research.
Like all of the body’s proteins, tau has a normal, helpful function—working inside neurons to help
  • stabilize the fibers that connect nerve cells.
When it misfires, tau clumps together to form harmful tangles that kill brain cells.
Dr. Wischik’s discovery was important news in the Alzheimer’s field:
  • identifying the makeup of tangles made it possible to start developing ways to stop their formation.
But by the early 1990s, tau was overtaken by another protein: beta amyloid.

Signs of Decline

Several pieces of evidence convinced an influential group of scientists that beta amyloid was the primary cause of Alzheimer’s.
  •  the discovery of several genetic mutations that all but guaranteed a person would develop a hereditary type of the disease.
  • these appeared to increase the production or accumulation of beta amyloid in the brain,
  • which led scientists to believe that amyloid deposits were the main cause of the disease.
Athena Neurosciences, a biotech company whose founders included Harvard’s Dr. Selkoe, focused in earnest on developing drugs to attack amyloid. Meanwhile, tau researchers say they found it hard to get research funding or to publish papers in medical journals. It became difficult to have a good publication on tau, because the amyloid cascade was like a dogma. It became the case that if you were not working in the amyloid field you were not working on Alzheimer’s disease. Dr. Wischik says he and his colleagues fought to keep funding from the UK’s Medical Research Council for the repository of brain tissue they maintained at Cambridge. The brain bank became an important tool. In the early 1990s, Dr. Wischik and his colleagues compared the postmortem brains of Alzheimer’s sufferers against those of people who had died without dementia, to see how their levels of amyloid and tau differed. They found that both healthy brains and Alzheimer’s brains could be filled with amyloid plaque, but only Alzheimer’s brains contained aggregated tau. Moreover,
  • as the levels of aggregated tau in a brain increased, so did the severity of dementia.
In the mid-1990s, Dr. Wischik discovered that
  • a drug sometimes used to treat psychosis dissolved tangles.
Nevertheless, American and British venture capitalists wanted to invest in amyloid projects, not tau.
By 2002, Dr. Wischik scraped together about $5 million from Asian investors with the help of a Singaporean physician who was the father of a classmate of Dr. Wischik’s son in Cambridge. TauRx is based in Singapore but conducts most of its research in Aberdeen, Scotland. As his tau effort launched, early tests of drugs designed to attack amyloid plaques were disappointing. To better understand these results, a team of British scientists largely unaffiliated with Athena or the failed clinical trial decided to examine the brains of patients who had participated in the study. They waited for the patients to die, and then, after probing the brains, concluded that
  • the vaccine had indeed cleared amyloid plaque but hadn’t prevented further neurodegeneration.

Peter Davies, an Alzheimer’s researcher at the Feinstein Institute for Medical Research in Manhasset, NY, recalls hearing a researcher at a conference in the early 2000s concede that his amyloid research results “don’t fit the hypothesis, but we’ll continue until they do!” “I just sat there with my mouth open,” he recalls.

In 2004, TauRx began a clinical trial of its drug, called methylene blue, in 332 Alzheimer’s patients. Around the same time, a drug maker called Elan Corp., which had bought Athena Neurosciences, began a trial of an amyloid-targeted drug called bapineuzumab in 234 patients. A key moment came in 2008, when Dr. Wischik and Elan presented results of their studies at an Alzheimer’s conference in Chicago. The Elan drug
  • failed to improve cognition any better than a placebo pill, causing Elan shares to plummet by more than 60% over the next few days.
The TauRx results Dr. Wischik presented were more positive, though not unequivocal. The study showed that,
  • after 50 weeks of treatment, Alzheimer’s patients taking a placebo had fallen 7.8 points on a test of cognitive function,
  • while people taking 60 mg of TauRx’s drug three times a day had fallen one point—
  • translating into an 87% reduction in the rate of decline for people taking the TauRx drug.
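The 87% figure follows from the two point drops quoted above. A minimal sketch of the arithmetic, using only the reported values (the variable names are illustrative, not from the study):

```python
# Point declines on the cognitive test after 50 weeks, as quoted above.
placebo_decline = 7.8   # patients taking a placebo
treated_decline = 1.0   # patients taking 60 mg three times a day

# Relative reduction in decline = (placebo - treated) / placebo
relative_reduction = (placebo_decline - treated_decline) / placebo_decline

print(f"{relative_reduction:.0%}")  # prints "87%"
```

This is a relative measure: it says nothing about trial size or statistical uncertainty, which is one reason the unpublished full data set mattered to other researchers.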
But TauRx didn’t publish a full set of data from the trial, causing some skepticism among researchers. (Dr. Wischik says the company withheld the data to protect its commercial interests.) What’s more,
  • a higher, 100-mg dose of the drug didn’t produce the same positive effects in patients;
Dr. Wischik blames this on the way the 100-mg dose was formulated, and says the company is testing a tweaked version of the drug in its new clinical trials, which will begin enrolling patients late this year.
This summer, a trio of companies that now own the rights to bapineuzumab—Elan, Pfizer and Johnson & Johnson—
  • scrapped development of the drug after it failed to work in two large clinical trials.
Then in August, Eli Lilly & Co. said its experimental medicine targeting beta amyloid,
  • solanezumab, failed to slow the loss of memory or basic skills like bathing and dressing in two trials
  • involving 2,050 patients with mild or moderate Alzheimer’s.
Lilly has disclosed that in one of the trials, when patients with moderate disease were excluded from the analysis,
  • the drug slowed cognitive decline only in patients with mild forms of the disease.
Still, fervent believers assert that beta amyloid needs to be attacked very early in the disease cycle,
  • perhaps before symptoms begin.
This spring, the U.S. government said it would help fund a $100 million trial of Roche’s amyloid-targeted drug, crenezumab, in 300 people
  • who are genetically predisposed to develop early-onset Alzheimer’s but who don’t yet have symptoms.
This trial should help provide a “definitive” answer about the theory.
Scientists and investors are giving more attention to tau. Roche this year said it would pay Switzerland’s AC Immune an undisclosed upfront fee for the rights to a new type of tau-targeted drug, and up to CHF400 million in additional payments if any drugs make it to market.
Dr. Buee, the longtime tau researcher in France, says Johnson & Johnson asked him to provide advice on tau last year, and that he’s currently discussing a tau research contract with a big pharmaceutical company. (A Johnson & Johnson spokeswoman says the company invited Dr. Buee and other scientists to a meeting to discuss a range of approaches to fighting Alzheimer’s.)
With its new clinical trial program under way, TauRx is the first company to test a tau-targeted drug against Alzheimer’s in a large human study, known in the industry as a phase 3 trial. As Dr. Wischik puts it:

  • In the end…it’s down to the phase 3 trial.

Protein Degradation in Neurodegenerative Diseases

Cebollero E, Reggiori F and Kraft C. Ribophagy: Regulated Degradation of Protein Production Factories. Int J Cell Biol. 2012; 2012: 182834. doi: 10.1155/2012/182834 (online).

During autophagy, cytosol, protein aggregates, and organelles

  • are sequestered into double-membrane vesicles called autophagosomes and delivered to the lysosome/vacuole for breakdown and recycling of their basic components.

In all eukaryotes this pathway is important for

  • adaptation to stress conditions such as nutrient deprivation, as well as
  • to regulate intracellular homeostasis by adjusting organelle number and clearing damaged structures.

Starvation-induced autophagy has been viewed as a nonselective transport pathway; but recent studies have revealed that

  • autophagy is able to selectively engulf specific structures, ranging from proteins to entire organelles.

In this paper, we discuss recent findings on the mechanisms and physiological implications of two selective types of autophagy:

  • ribophagy, the specific degradation of ribosomes, and
  • reticulophagy, the selective elimination of portions of the ER.

Lee JH, Yu WH,…, Nixon RA.  Lysosomal Proteolysis and Autophagy Require Presenilin 1 and Are Disrupted by Alzheimer-Related PS1 Mutations. Cell 2010; 141, 1146–1158. DOI 10.1016/j.cell.2010.05.008.

Macroautophagy is a lysosomal degradative pathway essential for neuron survival. Here, we show

  • that macroautophagy requires the Alzheimer’s disease (AD)-related protein presenilin-1 (PS1).

In PS1 null blastocysts, neurons from mice hypomorphic for PS1 or conditionally depleted of PS1,

  • substrate proteolysis and autophagosome clearance during macroautophagy are prevented
  • as a result of a selective impairment of autolysosome acidification and cathepsin activation.

These deficits are caused by failed PS1-dependent targeting of the v-ATPase V0a1 subunit to lysosomes. N-glycosylation of the V0a1 subunit,

  • essential for its efficient ER-to-lysosome delivery,
  • requires the selective binding of PS1 holoprotein to the unglycosylated subunit and the sec61alpha/oligosaccharyltransferase complex.

PS1 mutations causing early-onset AD produce a similar lysosomal/autophagy phenotype in fibroblasts from AD patients. PS1 is therefore essential for v-ATPase targeting to lysosomes, lysosome acidification, and proteolysis during autophagy. Defective lysosomal proteolysis represents a basis for pathogenic protein accumulations and neuronal cell death in AD and suggests previously unidentified therapeutic targets.

Hanai JI, Cao P, Tanksale P, Imamura S, et al. The muscle-specific ubiquitin ligase atrogin-1/MAFbx mediates statin-induced muscle toxicity. The Journal of Clinical Investigation 2007; 117(12):3930-3951. http://www.jci.org

Gene Wars Span Eons

Transposons have been barging into genomes and crossing species boundaries throughout evolution. Rapidly evolving bacterial species often use them to transmit antibiotic resistance to one another.  Nearly half of the DNA in the human genome consists of transposons, and the percentage can potentially creep upward with every generation. That’s because nearly 20 percent of transposons are capable of replicating in a way that is unconstrained by the normal rules of DNA replication during cell division ― although through generations over time, most have become inactivated and no longer pose a threat.

While humans are riddled with transposons, compared to some organisms, they’ve gotten off easy, according to Madhani, a professor of biochemistry and biophysics at UCSF. The water lily’s genome is 99 percent derived from transposons. The lowly salamander has about the same number of genes as humans, but in some species the genome is nearly 40 times bigger, due to all the inserted, replicating transposons.

The scientists’ discovery of SCANR and how it targets transposons in the yeast Cryptococcus neoformans builds upon the Nobel-Prize-winning discovery of jumping genes by maize geneticist Barbara McClintock, and the Nobel-prize-winning discovery by molecular biologists Richard Roberts and Phillip Sharp that parts of a single gene may be separated along chromosomes by intervening bits of DNA, called introns. Introns are transcribed into RNA from DNA but then are spliced out of the instructions for building proteins.

In the current study, the researchers discovered that the cell’s splicing machinery stalls when it gets to transposon introns. SCANR recognizes this glitch and

  • prevents transposon replication by
  • triggering the production of “small interfering RNA” molecules, which
  • neutralize the transposon RNA.

The earlier discovery by biologists Andrew Fire and Craig Mello of the phenomenon of RNA interference, a feature of this newly identified transposon targeting, also led to a Nobel Prize. “Scientists might find that many of the peculiar ways in which genes are expressed differently in higher organisms are, like

  • intron splicing in the case of SCANR, useful
  • in distinguishing and defending ‘self’ genes from ‘non-self’ genes,” Madhani said.

Researchers at UCSF (Phillip Dumesic, an MD/PhD student and first author of the study; graduate students Prashanthi Natarajan and Benjamin Schiller; and postdoctoral fellow Changbin Chen, PhD) and collaborators at the Whitehead Institute for Biomedical Research in Cambridge, Mass., and the Scripps Research Institute in La Jolla, Calif., contributed to the research.

Researchers Discover Gene Invaders Are Stymied by a Cell’s Genome Defense

If unrestrained, transposons replicate and insert themselves randomly throughout the genome.

San Francisco, CA  (Scicasts) – Gene wars rage inside our cells, with invading DNA regularly threatening to subvert our human blueprint. Now, building on Nobel-Prize-winning findings, UC San Francisco researchers have discovered a molecular machine that helps protect a cell’s genes against these DNA interlopers.

The machine, named SCANR, recognizes and targets foreign DNA. The UCSF team identified it in yeast, but comparable mechanisms might also be found in humans. The targets of SCANR are

  • small stretches of DNA called transposons, a name that conjures images of alien scourges.

But transposons are real, and to some newborns, life threatening. Found inside the genomes

  • of organisms as simple as bacteria and
  • as complex as humans,

they are in a way alien ― at some point,

  • each was imported into its host’s genome from another species.

Unlike an organism’s native genes, which are reproduced a single time during cell division, transposons ― also called jumping genes ― replicate multiple times, and

  • insert themselves at random places within the DNA of the host cell.

When transposons insert themselves in the middle of an important gene, they may cause malfunction, disease or birth defects.

But just as the immune system has ways of distinguishing what is part of the body from what is foreign and does not belong, researchers led by UCSF’s Dr. Hiten Madhani discovered in

  • SCANR a novel way through which the genetic machinery within a cell’s nucleus recognizes and targets transposons.

“We’ve known that only a fraction of human-inherited diseases are caused by these mobile genetic elements,” Madhani said. “Now we’ve found that cells use a step in gene expression to distinguish ‘self’ from ‘non-self’ and to halt the spread of transposons.” The study was published online Feb. 13 in the journal Cell (http://www.cell.com/abstract/S0092-8674%2813%2900138-4).

Epigenetics of brain and brawn

Study Shows Epigenetics Shapes Fate of Brain vs. Brawn Castes in Carpenter Ants

Philadelphia, PA (Scicasts) – The recently published genome sequences of seven well-studied ant species are opening up new vistas for biology and medicine.  A detailed look at molecular mechanisms that underlie the complex behavioural differences in two worker castes in the Florida carpenter ant, Camponotus floridanus, has revealed a link to epigenetics. This is the study of how the expression or suppression of particular genes by chemical modifications affects an organism’s

  • physical characteristics,
  • development, and
  • behaviour.

Epigenetic processes not only play a significant role in many diseases, but are also involved in longevity and aging. Interdisciplinary research teams led by Dr. Shelley Berger, from the Perelman School of Medicine at the University of Pennsylvania, in collaboration with teams led by Danny Reinberg from New York University and Juergen Liebig from Arizona State University, describe their work in Genome Research. The group found that epigenetic regulation is key to

  • distinguishing one caste, the “majors”, as brawny Amazons of the carpenter ant colony,
  • compared to the “minors”, their smaller, brainier sisters.

These two castes have the same genes, but strikingly distinct behaviours and shape.

Ants, as well as termites and some bees and wasps, are eusocial species that organize themselves into rigid caste-based societies, or colonies, in which only one queen and a small contingent of male ants are usually fertile and reproduce. The rest of a colony is composed of functionally sterile females that are divided into worker castes that perform specialized roles such as

  • foragers,
  • soldiers, and
  • caretakers.

In Camponotus floridanus, there are two worker castes that are physically and behaviourally different, yet genetically very similar.  “For all intents and purposes, those two castes are identical when it comes to their gene sequences,” notes senior author Berger, professor of Cell and Developmental Biology. “The two castes are a perfect situation to understand

  • how epigenetics,
  • how regulation ‘above’ genes,

plays a role in establishing these dramatic differences in a whole organism.”

To understand how caste differences arise, the team examined the role of modifications of histones throughout the genome. They produced the first genome-wide epigenetic maps of genome structure in a social insect. Histones can be altered by the addition of small chemical groups, which affect the expression of genes. Therefore, specific histone modifications can create dramatic differences between genetically similar individuals, such as the physical and behavioural differences between ant castes. “These chemical modifications of histones alter how compact the genome is in a certain region,” Simola explains. “Certain modifications allow DNA to open up more, and some of them to close DNA more. This, in turn, affects how genes get expressed, or turned on, to make proteins.”

In examining several different histone modifications, the team found a number of distinct differences between the major and minor castes. Simola states that the most notable modification

  • discriminates the two castes from each other and
  • correlates well with the expression levels of different genes between the castes.

“And if you look at which genes are being expressed between these two castes, these genes correspond very nicely to the brainy versus brawny idea. In the majors we find that genes that are involved in muscle development are expressed at a higher level, whereas in the minors, many genes involved in brain development and neurotransmission are expressed at a higher level.”

These changes in histone modifications between ant castes are likely caused by a regulator gene, called CBP, that has “already been implicated in aspects of learning and behaviour by genetic studies in mice and in certain human diseases,” Berger says. “The idea is that the same CBP regulator and histone modification are involved in a learned behaviour in ants – foraging – mainly in the brainy minor caste, to establish a pattern of gene regulation that leads to neuronal patterning for figuring out where food is and being able to bring the food back to the nest.”  Simola notes that “we know from mouse studies that if you inactivate or delete the CBP regulator, it actually leads to significant learning deficits in addition to craniofacial muscular malformations.  So from mammalian studies, it’s clear this is an important protein involved in learning and memory.”

The research team is looking ahead to expand the work by manipulating the expression of the CBP regulator in ants to observe effects on caste development and behaviour. Berger observes that all of the genes known to be major epigenetic regulators in mammals are conserved in ants, which makes them a good model for studying behaviour and longevity.

Research Reveals Mechanism of Epigenetic Reprogramming

Cambridge, UK (Scicasts) – New research reveals a potential mechanism by which parents’ experiences could be passed to their offspring’s genes.

Epigenetics is a system that turns our genes on and off. The process works by chemical tags, known as epigenetic marks, attaching to DNA and telling a cell to either use or ignore a particular gene. The most common epigenetic mark is a methyl group.

  • When these groups fasten to DNA through a process called methylation,
  • they block the attachment of proteins which normally turn the genes on.

As a result, the gene is turned off.

Scientists have witnessed epigenetic inheritance, the observation that offspring may inherit altered traits due to their parents’ past experiences. For example, historical incidences of famine have resulted in health effects on the children and grandchildren of individuals who had restricted diets,

  • possibly because of inheritance of altered epigenetic marks caused by a restricted diet.

However, it is thought that between each generation

  • the epigenetic marks are erased in cells called primordial germ cells (PGCs), the precursors to sperm and eggs.

This ‘reprogramming’ allows all genes to be read afresh for each new person – leaving scientists to question how epigenetic inheritance could occur.

The new Cambridge study initially discovered how the DNA methylation marks are erased in PGCs. The methylation marks are converted to hydroxymethylation, which is then

  • progressively diluted out as the cells divide.

This process turns out to be remarkably efficient and seems to reset the genes for each new generation.
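As a rough illustration of why dilution by cell division is so efficient, the sketch below assumes the simplest possible model, which is not taken from the study: the converted marks are never copied onto newly synthesized DNA, so each division halves the marked fraction. The function name is hypothetical.

```python
def fraction_remaining(divisions: int) -> float:
    """Fraction of marked DNA left after a number of cell divisions.

    Assumption (not from the study): marks are not maintained on the
    newly made strand, so each round of replication halves them.
    """
    return 0.5 ** divisions


for n in range(6):
    print(n, fraction_remaining(n))
```

Under this assumption, fewer than 4% of the original marks remain after five divisions, which is consistent with the qualitative picture of near-complete erasure described above.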

The researchers also found that some rare methylation can ‘escape’ the reprogramming process and can thus be passed on to offspring, revealing how epigenetic inheritance could occur. This is important because aberrant methylation could accumulate at genes during a lifetime in response to environmental factors, such as chemical exposure or nutrition, and can cause abnormal use of genes, leading to disease. If these marks are then inherited by offspring, their genes could also be affected. The research demonstrates how genes could retain some memory of their past experiences, indicating that the idea that epigenetic information is erased between generations should be reassessed. The precursors to sperm and eggs are very effective in erasing most methylation marks, but they are fallible and at a low frequency may allow some epigenetic information to be transmitted to subsequent generations.

Professor Azim Surani from the University of Cambridge, principal investigator of the research, said: “The new study has the potential to be exploited in two distinct ways.” It could

  1. show how to erase aberrant epigenetic marks that may underlie some diseases in adults, and
  2. address whether germ cells can acquire new epigenetic marks through environmental or dietary influences on parents that may evade erasure and be transmitted to subsequent generations.

The research was published 25 January in the journal Science. Story adapted from the University of Cambridge.

Study Suggests Expanding the Genetic Alphabet May Be Easier than Previously Thought

Monday, June 4, 2012

A new study led by scientists at The Scripps Research Institute suggests that the replication process for DNA—the genetic instructions for living organisms that is composed of four bases (C, G, A and T)—is more open to unnatural letters than had previously been thought. An expanded “DNA alphabet” could carry more information than natural DNA, potentially coding for a much wider range of molecules and enabling a variety of powerful applications, from precise molecular probes and nanomachines to useful new life forms.
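To make the claim about information capacity concrete, here is a small sketch comparing a four-letter alphabet with a six-letter one (the four natural bases plus one extra pair). It assumes every letter is equally likely, and the triplet-codon count is shown only for comparison; nothing in the study says an expanded alphabet would be read in triplets.

```python
import math


def bits_per_base(alphabet_size: int) -> float:
    """Maximum information per position, assuming equally likely letters."""
    return math.log2(alphabet_size)


natural = bits_per_base(4)    # A, C, G, T
expanded = bits_per_base(6)   # four natural letters plus one extra pair

print(natural)                # 2.0
print(round(expanded, 3))     # 2.585

# Number of distinct three-letter "words" under each alphabet.
print(4 ** 3, 6 ** 3)         # 64 216
```

One additional base pair raises the per-position capacity by about 29% and more than triples the number of possible three-letter words.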

The new study, which appears in the June 3, 2012 issue of Nature Chemical Biology, solves the mystery of how a previously identified pair of artificial DNA bases can go through the DNA replication process almost as efficiently as the four natural bases.

“We now know that the efficient replication of our unnatural base pair isn’t a fluke, and also that the replication process is more flexible than had been assumed,” said Floyd E. Romesberg, associate professor at Scripps Research, principal developer of the new DNA bases, and a senior author of the new study. The Romesberg laboratory collaborated on the new study with the laboratory of co-senior author Andreas Marx at the University of Konstanz in Germany, and the laboratory of Tammy J. Dwyer at the University of San Diego.

Adding to the DNA Alphabet

Romesberg and his lab have been trying to find a way to extend the DNA alphabet since the late 1990s. In 2008, they developed the efficiently replicating bases NaM and 5SICS, which come together as a complementary base pair within the DNA helix, much as, in normal DNA, the base adenine (A) pairs with thymine (T), and cytosine (C) pairs with guanine (G).

The following year, Romesberg and colleagues showed that NaM and 5SICS could be efficiently transcribed into RNA in the lab dish. But these bases’ success in mimicking the functionality of natural bases was a bit mysterious. They had been found simply by screening thousands of synthetic nucleotide-like molecules for the ones that were replicated most efficiently. And it had been clear immediately that their chemical structures lack the ability to form the hydrogen bonds that join natural base pairs in DNA. Such bonds had been thought to be an absolute requirement for successful DNA replication, a process in which a large enzyme, DNA polymerase, moves along a single, unwrapped DNA strand and stitches together the opposing strand, one complementary base at a time.

An early structural study of a very similar base pair in double-helix DNA added to Romesberg’s concerns. The data strongly suggested that NaM and 5SICS do not even approximate the edge-to-edge geometry of natural base pairs—termed the Watson-Crick geometry, after the co-discoverers of the DNA double-helix. Instead, they join in a looser, overlapping, “intercalated” fashion. “Their pairing resembles a ‘mispair,’ such as two identical bases together, which normally wouldn’t be recognized as a valid base pair by the DNA polymerase,” said Denis Malyshev, a graduate student in Romesberg’s lab who was lead author along with Karin Betz of Marx’s lab.

Yet in test after test, the NaM-5SICS pair was efficiently replicable. “We wondered whether we were somehow tricking the DNA polymerase into recognizing it,” said Romesberg. “I didn’t want to pursue the development of applications until we had a clearer picture of what was going on during replication.”

Edge to Edge

To get that clearer picture, Romesberg and his lab turned to Dwyer’s and Marx’s laboratories, which have expertise in finding the atomic structures of DNA in complex with DNA polymerase. Their structural data showed plainly that the NaM-5SICS pair maintain an abnormal, intercalated structure within double-helix DNA—but remarkably adopt the normal, edge-to-edge, “Watson-Crick” positioning when gripped by the polymerase during the crucial moments of DNA replication.

“The DNA polymerase apparently induces this unnatural base pair to form a structure that’s virtually indistinguishable from that of a natural base pair,” said Malyshev.

NaM and 5SICS, lacking hydrogen bonds, are held together in the DNA double-helix by “hydrophobic” forces, which cause certain molecular structures (like those found in oil) to be repelled by water molecules, and thus to cling together in a watery medium. “It’s very possible that these hydrophobic forces have characteristics that enable the flexibility and thus the replicability of the NaM-5SICS base pair,” said Romesberg. “Certainly if their aberrant structure in the double helix were held together by more rigid covalent bonds, they wouldn’t have been able to pop into the correct structure during DNA replication.”

An Arbitrary Choice?

The finding suggests that NaM-5SICS and potentially other hydrophobically bound base pairs could some day be used to extend the DNA alphabet. It also hints that evolution’s choice of the existing four-letter DNA alphabet (on this planet) may have been somewhat arbitrary. “It seems that life could have been based on many other genetic systems,” said Romesberg.

He and his laboratory colleagues are now trying to optimize the basic functionality of NaM and 5SICS, and to show that these new bases can work alongside natural bases in the DNA of a living cell.

“If we can get this new base pair to replicate with high efficiency and fidelity in vivo, we’ll have a semi-synthetic organism,” Romesberg said. “The things that one could do with that are pretty mind blowing.”

The other contributors to the paper, “KlenTaq polymerase replicates unnatural base pairs by inducing a Watson-Crick geometry,” are Thomas Lavergne of the Romesberg lab, Wolfram Welte and Kay Diederichs of the Marx lab, and Phillip Ordoukhanian of the Center for Protein and Nucleic Acid Research at The Scripps Research Institute.

Source: The Scripps Research Institute

 


Genomics and Evolution

Author: Marcus W. Feldman, PhD

 

Insofar as the genetic evolution of modern humans is concerned, large scale SNP studies of worldwide populations have provided a consistent picture of a migration out of Africa that gave rise to the human populations of the other continents. This migration probably began 60–80 kya, was probably not continuous, and could have resulted in a division during the passage through the Levant en route from east Africa. One division may have moved in a more southerly direction towards south and east Asia, possibly to Australia, and eventually, 15–30 kya into the Americas. The other division may have “turned left” and moved towards Europe.

In this process, which we call the “serial founder” model of human expansion (refs. 1, 2), migration and demography probably had effects that constrained the subsequent action of natural selection on human genes.

  • Variation in skin pigmentation genes today provides some of the strongest signals of natural selection during this human expansion.
  • It is also likely that immune response genes, e.g., MHC genes, achieved their high levels of polymorphism in response to new pathogens encountered in the great expansion.

Many of the strongest signals of natural selection indicate the importance of the innovations of farming and pastoralism. The gene sequences involved in lactose tolerance and starch metabolism, for example, are strikingly different in groups that adopted dairying or farming, respectively, from hunter-gatherers, who did not.

From the analysis of SNPs, I take home two messages.

  • The first is that although some parts of the genome show clear signals of selection, most of our DNA perceived via SNPs does not.
  • The second is that population growth and migration have been major forces in determining the patterns of variation. Indeed, recent analyses of exome sequences confirm that the spectrum of rare allele frequencies is compatible only with recent and rapid population growth (ref. 3). In addition, recent analyses of the 1000 genomes data, that is, data from whole genome sequencing of one thousand human genomes representing Africa (Yoruba), Europe (from Utah), and East Asia (China and Japan), identified only 35 non-synonymous SNPs from 33 genes as having been subject to recent adaptive selection (ref. 4).

The next phase of genomic analysis of humans, complete exome sequencing of large cohorts, or whole genome sequencing of samples from many representative populations, will focus more on two themes.

  • The first will be the role of rare alleles in human phenotypes, especially diseases. The previous phase, GWAS (genome-wide association studies), has been disappointing in revealing genetic “causes” of complex traits.
  • The second theme, the molecular genetics of gene regulation and the interaction of this regulation with the environment, is in my view likely to have bigger payoffs, not only for determination of phenotypes, but also in showing where in the genome the strongest signals of selection lie. As more methylation profiles, small RNA patterns of interference, and other gene-regulatory analyses of whole genomes are completed, both the medical and evolutionary significance of DNA variation will become clearer.

Pemberton, T. J., D. Absher, M. W. Feldman, R. M. Myers, N. A. Rosenberg, and J. Z. Li. 2012. Genomic patterns of homozygosity in worldwide human populations. Am. J. Hum. Genet. 91: 275–292.

Genome-wide patterns of homozygosity runs and their variation across individuals provide a valuable and often untapped resource for studying human genetic diversity and evolutionary history. Using genotype data at 577,489 autosomal SNPs, we employed a likelihood-based approach to identify runs of homozygosity (ROH) in 1,839 individuals representing 64 worldwide populations, classifying them by length into three classes—short, intermediate, and long—with a model-based clustering algorithm. For each class, the number and total length of ROH per individual show considerable variation across individuals and populations. The total lengths of short and intermediate ROH per individual increase with the distance of a population from East Africa, in agreement with similar patterns previously observed for locus-wise homozygosity and linkage disequilibrium. By contrast, total lengths of long ROH show large inter-individual variations that probably reflect recent inbreeding patterns, with higher values occurring more often in populations with known high frequencies of consanguineous unions. Across the genome, distributions of ROH are not uniform, and they have distinctive continental patterns. ROH frequencies across the genome are correlated with local genomic variables such as recombination rate, as well as with signals of recent positive selection. In addition, long ROH are more frequent in genomic regions harboring genes associated with autosomal-dominant diseases than in regions not implicated in Mendelian diseases. These results provide insight into the way in which homozygosity patterns are produced, and they generate baseline homozygosity patterns that can be used to aid homozygosity mapping of genes associated with recessive diseases.
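For readers unfamiliar with the idea of a run of homozygosity, the sketch below shows a deliberately simplified detector. It is not the likelihood-based method used in the paper: it just scans one individual's genotypes along a chromosome and reports stretches with no heterozygous SNP. The genotype coding (0, 1 or 2 copies of the alternate allele), the function name `find_roh` and the `min_snps` threshold are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_roh(genotypes: List[int], min_snps: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of homozygous runs.

    genotypes: copies of the alternate allele at each SNP (0, 1 or 2);
               a value of 1 marks a heterozygous site and ends a run.
    min_snps:  shortest run, in SNPs, that is reported.
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that reaches the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs


example = [0, 2, 2, 0, 1, 0, 0, 1, 2, 2, 2, 2, 0]
print(find_roh(example, min_snps=4))  # [(0, 3), (8, 12)]
```

A real analysis would measure run length in base pairs or genetic distance and tolerate genotyping errors, which is part of what a likelihood-based approach provides.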

Pepperell, C. S., J. M. Granka, D. C. Alexander, M. A. Behr, L. Chui, J. Gordon, J. L. Guthrie, F. B. Jamieson, D. Langlois-Klassen, R. Long, D. Nguyen, W. Wobeser, and M. W. Feldman. 2011. Dispersal of Mycobacterium tuberculosis via the Canadian fur trade. Proc. Natl. Acad. Sci. USA 108: 6526–6531.

Patterns of gene flow can have marked effects on the evolution of populations. To better understand the migration dynamics of Mycobacterium tuberculosis, we studied genetic data from European M. tuberculosis lineages currently circulating in Aboriginal and French Canadian communities. A single M. tuberculosis lineage, characterized by the DS6Quebec genomic deletion, is at highest frequency among Aboriginal populations in Ontario, Saskatchewan, and Alberta; this bacterial lineage is also dominant among tuberculosis (TB) cases in French Canadians resident in Quebec. Substantial contact between these human populations is limited to a specific historical era (1710–1870), during which individuals from these populations met to barter furs. Statistical analyses of extant M. tuberculosis minisatellite data are consistent with Quebec as a source population for M. tuberculosis gene flow into Aboriginal populations during the fur trade era. Historical and genetic analyses suggest that tiny M. tuberculosis populations persisted for ∼100 y among indigenous populations and subsequently expanded in the late 19th century after environmental changes favoring the pathogen. Our study suggests that spread of TB can occur by two asynchronous processes: (i) dispersal of M. tuberculosis by minimal numbers of human migrants, during which small pathogen populations are sustained by ongoing migration and slow disease dynamics, and (ii) expansion of the M. tuberculosis population facilitated by shifts in host ecology. If generalizable, these migration dynamics can help explain the low DNA sequence diversity observed among isolates of M. tuberculosis and the difficulties in global elimination of tuberculosis, as small, widely dispersed pathogen populations are difficult both to detect and to eradicate.

Henn, B. M., C. R. Gignoux, M. Jobin, J. M. Granka, J. M. Macpherson, J. M. Kidd, L. Rodríguez-Botigué, S. Ramachandran, L. Hon, A. Brisbin, A. A. Lin, P. A. Underhill, D. Comas, K. K. Kidd, P. J. Norman, P. Parham, C. D. Bustamante, J. L. Mountain, and M. W. Feldman. 2011. Hunter-gatherer genomic diversity suggests a southern African origin for modern humans. Proc. Natl. Acad. Sci. USA 108: 5154–5162.

Africa is inferred to be the continent of origin for all modern human populations, but the details of human prehistory and evolution in Africa remain largely obscure owing to the complex histories of hundreds of distinct populations. We present data for more than 580,000 SNPs for several hunter-gatherer populations: the Hadza and Sandawe of Tanzania, and the !Khomani Bushmen of South Africa, including speakers of the nearly extinct N|u language. We find that African hunter-gatherer populations today remain highly differentiated, encompassing major components of variation that are not found in other African populations. Hunter-gatherer populations also tend to have the lowest levels of genome-wide linkage disequilibrium among 27 African populations. We analyzed geographic patterns of linkage disequilibrium and population differentiation, as measured by FST, in Africa. The observed patterns are consistent with an origin of modern humans in southern Africa rather than eastern Africa, as is generally assumed. Additionally, genetic variation in African hunter-gatherer populations has been significantly affected by interaction with farmers and herders over the past 5,000 y, through both severe population bottlenecks and sex-biased migration. However, African hunter-gatherer populations continue to maintain the highest levels of genetic diversity in the world.
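Population differentiation as measured by FST can be illustrated with a small worked example. The sketch below uses the textbook heterozygosity-based definition, FST = (H_T - H_S) / H_T, for one biallelic SNP in two equally weighted populations. The abstract does not say which estimator the authors used, so the formula choice and the function name `fst_two_populations` are assumptions.

```python
def fst_two_populations(p1, p2):
    """FST for one biallelic SNP from allele frequencies in two populations.

    Uses FST = (H_T - H_S) / H_T with equal population weights, where
    H_S is the mean within-population expected heterozygosity and H_T
    is the expected heterozygosity of the pooled population.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")

    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0

    if h_total == 0.0:
        # The SNP is monomorphic in both populations: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


fst_moderate = fst_two_populations(0.2, 0.8)
fst_none = fst_two_populations(0.5, 0.5)
fst_fixed = fst_two_populations(0.0, 1.0)
```

Identical frequencies give 0 and a fixed difference gives 1. Genome-wide studies typically combine many SNPs and correct for sample size, which this sketch does not attempt.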

Casto, A. M., and M. W. Feldman. 2011. Genome-wide association study SNPs in the human genome diversity project populations: does selection affect unlinked SNPs with shared trait associations? PLoS Genet. 7(1): e1001266.

Genome-wide association studies (GWAS) have identified more than 2,000 trait-SNP associations, and the number continues to increase. GWAS have focused on traits with potential consequences for human fitness, including many immunological, metabolic, cardiovascular, and behavioral phenotypes. Given the polygenic nature of complex traits, selection may exert its influence on them by altering allele frequencies at many associated loci, a possibility which has yet to be explored empirically. Here we use 38 different measures of allele frequency variation and 8 iHS scores to characterize over 1,300 GWAS SNPs in 53 globally distributed human populations. We apply these same techniques to evaluate SNPs grouped by trait association. We find that groups of SNPs associated with pigmentation, blood pressure, infectious disease, and autoimmune disease traits exhibit unusual allele frequency patterns and elevated iHS scores in certain geographical locations. We also find that GWAS SNPs have generally elevated scores for measures of allele frequency variation and for iHS in Eurasia and East Asia. Overall, we believe that our results provide evidence for selection on several complex traits that has caused changes in allele frequencies and/or elevated iHS scores at a number of associated loci. Since GWAS SNPs collectively exhibit elevated allele frequency measures and iHS scores, selection on complex traits may be quite widespread. Our findings are most consistent with this selection being either positive or negative, although the relative contributions of the two are difficult to discern. Our results also suggest that trait-SNP associations identified in Eurasian samples may not be present in Africa, Oceania, and the Americas, possibly due to differences in linkage disequilibrium patterns. This observation suggests that non-Eurasian and non-East Asian sample populations should be included in future GWAS.

Casto, A. M., J. Z. Li, D. Absher, R. Myers, S. Ramachandran, and M. W. Feldman. 2010. Characterization of X-linked SNP genotypic variation in globally distributed human populations. Genome Biol. 11:R10.

Background: The transmission pattern of the human X chromosome reduces its population size relative to the autosomes, subjects it to disproportionate influence by female demography, and leaves X-linked mutations exposed to selection in males. As a result, the analysis of X-linked genomic variation can provide insights into the influence of demography and selection on the human genome. Here we characterize the genomic variation represented by 16,297 X-linked SNPs genotyped in the CEPH human genome diversity project samples.
Results: We found that X chromosomes tend to be more differentiated between human populations than autosomes, with several notable exceptions. Comparisons between genetically distant populations also showed an excess of X-linked SNPs with large allele frequency differences. Combining information about these SNPs with results from tests designed to detect selective sweeps, we identified two regions that were clear outliers from the rest of the X chromosome for haplotype structure and allele frequency distribution. We were also able to more precisely define the geographical extent of some previously described X-linked selective sweeps.
Conclusions: The relationship between male and female demographic histories is likely to be complex as evidence supporting different conclusions can be found in the same dataset. Although demography may have contributed to the excess of SNPs with large allele frequency differences observed on the X chromosome, we believe that selection is at least partially responsible. Finally, our results reveal the geographical complexities of selective sweeps on the X chromosome and argue for the use of diverse populations in studies of selection.

REFERENCES

1.  Cavalli-Sforza, L.L., and M.W. Feldman. 2003. The application of molecular genetic approaches to the study of human evolution. Nat. Genet. Supp. 33: 266–275.

2.  Henn, B. M., L. L. Cavalli-Sforza, and M. W. Feldman. 2012. The great human expansion. Proc. Natl. Acad. Sci. USA 109: 17758–17764.

3.  Keinan, A., and A. G. Clark. 2012. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336: 740–743.

4.  Grossman, S. R., K. G. Andersen, I. Shlyakhter, S. Tabrizi, S. Winnicki, A. Yen, D. J. Park, D. Griesemer, E. K. Karlsson, S. H. Wong, M. Cabili, R. A. Adegbola, R. N. K. Bamezai, A. V. S. Hill, F. O. Vannberg, J. L. Rinn, 1000 Genomes Project, E. S. Lander, S. F. Schaffner, and P. C. Sabeti. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152: 703–713.


Reporter: Aviva Lev-Ari, PhD, RN

International Consortium Finds 15 Novel Risk Loci for Coronary Artery Disease

“lipid metabolism and inflammation as key biological pathways involved in the genetic pathogenesis of CAD”

Themistocles Assimes from Stanford University Medical Center said in a statement that these findings begin to clear up the role of inflammation in coronary artery disease. “Our network analysis of the top approximately 240 genetic signals in this study seems to provide evidence that genetic defects in some pathways related to inflammation are a cause,” he said.

On this Open Access Online Scientific Journal, lipid metabolism and inflammation were examined in the entries listed below.

However, only the 15 novel risk loci for coronary artery disease published on 12/3/2012 provide the genomic loci and the genetic explanation for the empirical results obtained in that recent research on cardiovascular diseases. The report on these loci is presented in the second half of this post, below.

Special Considerations in Blood Lipoproteins, Viscosity, Assessment and Treatment

http://pharmaceuticalintelligence.com/2012/11/28/special-considerations-in-blood-lipoproteins-viscosity-assessment-and-treatment/

What is the role of plasma viscosity in hemostasis and vascular disease risk?

http://pharmaceuticalintelligence.com/2012/11/28/what-is-the-role-of-plasma-viscosity-in-hemostasis-and-vascular-disease-risk/

PIK3CA mutation in Colorectal Cancer may serve as a Predictive Molecular Biomarker for adjuvant Aspirin therapy

http://pharmaceuticalintelligence.com/2012/11/28/pik3ca-mutation-in-colorectal-cancer-may-serve-as-a-predictive-molecular-biomarker-for-adjuvant-aspirin-therapy/

Peroxisome proliferator-activated receptor (PPAR-gamma) Receptors Activation: PPARγ transrepression for Angiogenesis in Cardiovascular Disease and PPARγ transactivation for Treatment of Diabetes

http://pharmaceuticalintelligence.com/2012/11/13/peroxisome-proliferator-activated-receptor-ppar-gamma-receptors-activation-pparγ-transrepression-for-angiogenesis-in-cardiovascular-disease-and-pparγ-transactivation-for-treatment-of-dia/

Positioning a Therapeutic Concept for Endogenous Augmentation of cEPCs — Therapeutic Indications for Macrovascular Disease: Coronary, Cerebrovascular and Peripheral

http://pharmaceuticalintelligence.com/2012/08/29/positioning-a-therapeutic-concept-for-endogenous-augmentation-of-cepcs-therapeutic-indications-for-macrovascular-disease-coronary-cerebrovascular-and-peripheral/

Cardiovascular Risk Inflammatory Marker: Risk Assessment for Coronary Heart Disease and Ischemic Stroke – Atherosclerosis.

http://pharmaceuticalintelligence.com/2012/10/30/cardiovascular-risk-inflammatory-marker-risk-assessment-for-coronary-heart-disease-and-ischemic-stroke-atherosclerosis/

The Essential Role of Nitric Oxide and Therapeutic NO Donor Targets in Renal Pharmacotherapy

http://pharmaceuticalintelligence.com/2012/11/26/the-essential-role-of-nitric-oxide-and-therapeutic-no-donor-targets-in-renal-pharmacotherapy/

Nitric Oxide Function in Coagulation

http://pharmaceuticalintelligence.com/2012/11/26/nitric-oxide-function-in-coagulation/

15 Novel Risk Loci for Coronary Artery Disease

December 03, 2012

NEW YORK (GenomeWeb News) – A large-scale association analysis of coronary artery disease has detected 15 new loci associated with risk of the disease, bringing the total number of known risk alleles to 46. As the international CARDIoGRAMplusC4D Consortium reported in Nature Genetics yesterday, the study also found that lipid metabolism and inflammation pathways may play a part in coronary artery disease pathogenesis.

“The number of genetic variations that contribute to heart disease continues to grow with the publication of each new study,” Peter Weissberg from the British Heart Foundation, a co-sponsor of the study, said in a statement. “This latest research further confirms that blood lipids and inflammation are at the heart of the development of atherosclerosis, the process that leads to heart attacks and strokes.”

For its study, the consortium, which comprised more than 180 researchers, performed a meta-analysis of data from the 22,233 cases and 64,762 controls of the CARDIoGRAM genome-wide association study and of the 41,513 cases and 65,919 controls from 34 additional studies of people of European and South Asian descent. Using the custom Metabochip array from Illumina, the team tested SNPs for disease association in those populations. The SNPs that reached significance in that stage of the study were then replicated using data from a further four studies.

From this, the team identified 15 new loci with genome-wide significance for risk of coronary artery disease, in addition to known risk loci.

The consortium also reported an additional 104 SNPs that appeared to be associated with coronary artery disease but did not meet the cut-off for genome-wide significance.
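For readers unfamiliar with how per-study results are pooled and then compared with a genome-wide cut-off, the following is a minimal sketch of an inverse-variance weighted, fixed-effect meta-analysis for a single SNP. The article does not describe the consortium's statistical model, so this is a generic textbook procedure rather than their method. The function name `fixed_effect_meta` and the 5e-8 threshold (a widely used convention) are assumptions.

```python
import math

# Conventional genome-wide significance threshold; assumed here,
# not quoted from the article.
GENOME_WIDE_ALPHA = 5e-8


def fixed_effect_meta(betas, standard_errors):
    """Pool per-study effect sizes (e.g. log odds ratios) for one SNP.

    Each study is weighted by the inverse of its variance. Returns the
    pooled effect, its standard error, the z-score, and a two-sided
    p-value from the normal distribution.
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")

    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Two hypothetical studies reporting the same effect for one SNP.
beta, se, z, p = fixed_effect_meta([0.1, 0.1], [0.02, 0.02])
is_genome_wide_significant = p < GENOME_WIDE_ALPHA
```

Pooling narrows the standard error, which is why a SNP that misses the cut-off in each individual study can still reach it in a meta-analysis.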

Then looking to other known risk factors for coronary artery disease, like blood pressure and diabetes, the researchers assessed whether any of those risk factors were associated with the risk loci. Of the 45 known risk loci, 12 were associated with blood lipid content and five with blood pressure. And while people with type 2 diabetes have a higher risk of developing coronary artery disease, none of the known risk loci were linked to diabetic traits.

An analysis of the pathways that SNPs linked to coronary artery disease fall in revealed that many of them are involved in lipid metabolism and inflammation pathways — 10 risk loci were found to be involved in lipid metabolism. “Our network analysis identified lipid metabolism and inflammation as key biological pathways involved in the genetic pathogenesis of CAD,” the researchers wrote in the paper. “Indeed, there was significant crosstalk between the lipid metabolism and inflammation pathways identified.”

The role of inflammation in coronary artery disease has been up for debate — a debate centering on whether it is a cause or a consequence of the disease — and study author Themistocles Assimes from Stanford University Medical Center said in a statement that these findings begin to clear up its role. “Our network analysis of the top approximately 240 genetic signals in this study seems to provide evidence that genetic defects in some pathways related to inflammation are a cause,” he said.


SOURCE:

http://www.genomeweb.com//node/1159041?hq_e=el&hq_m=1424172&hq_l=3&hq_v=09187c3305

 

GWAS, Meta-Analyses Uncover New Coronary Artery Disease Risk Loci

March 07, 2011

By a GenomeWeb staff reporter

NEW YORK (GenomeWeb News) – Three new studies — including the largest meta-analysis yet of coronary artery disease — have identified dozens of coronary artery disease risk loci in European, South Asian, and Han Chinese populations. All three papers appeared online yesterday in Nature Genetics.

For the first meta-analysis, members of a large international consortium known as the Coronary Artery Disease Genome-wide Replication and Meta-Analysis study, or CARDIoGRAM, sifted through data on more than 135,000 individuals from the UK, US, Europe, Iceland, and Canada. In so doing, they tracked down nearly two-dozen new and previously reported coronary artery disease risk loci.

Because only a few of these loci have been linked to other heart disease-related risk factors such as high blood pressure, those involved say the work points to yet unexplored heart disease pathways.

“[W]e have discovered several new genes not previously known to be involved in the development of coronary heart disease, which is the main cause of heart attacks,” co-corresponding author Nilesh Samani, a cardiology researcher affiliated with the University of Leicester and Glenfield Hospital, said in a statement. “Understanding how these genes work, which is the next step, will vastly improve our knowledge of how the disease develops, and could ultimately help to develop new treatments.”

Samani and his co-workers identified the loci by bringing together data on 22,233 individuals with coronary artery disease and 64,762 unaffected controls. The participants, all of European descent, had been sampled through 14 previous genome-wide association studies and genotyped at an average of about 2.5 million SNPs each. The team then assessed the top candidate SNPs found in this initial analysis in another 56,582 individuals (roughly half of whom had coronary artery disease).

The search not only confirmed associations between coronary artery disease and 10 known loci, but also uncovered associations with 13 other loci. All but three of these were distinct from loci previously implicated in other heart disease risk factors such as hypertension or cholesterol levels, researchers noted.

Consequently, those involved in the study say that exploring the biological functions of the newly detected genes could offer biological clues about how heart disease develops — along with strategies for preventing and treating it.

The genetic complexity of coronary artery disease being revealed by such studies has diagnostic implications as well, according to some.

“Each new gene identified brings us a small step closer to understanding the biological mechanisms of cardiovascular disease development and potential new treatments,” British Heart Foundation Medical Director Peter Weissberg, who was not directly involved in the new studies, said in a statement. “However, as the number of genes grows, it takes us further away from the likelihood that a simple genetic test will identify those most at risk of suffering a heart attack or a stroke.”

Meanwhile, researchers involved with Coronary Artery Disease Genetics Consortium did their own meta-analysis using data collected from four GWAS to find five coronary artery-associated loci in European and South Asian populations.

The group initially looked at 15,420 individuals with coronary artery disease — including 6,996 individuals from South Asia and 8,424 from Europe — and 15,062 unaffected controls. Participants were genotyped at nearly 575,000 SNPs using Illumina BeadChips. Most South Asian individuals tested came from India and Pakistan, researchers noted, while European samples came from the UK, Italy, Sweden, and Germany.

For the validation phase of the study, the team focused on 59 SNPs at 50 loci from the discovery group that seemed most likely to yield authentic new disease associations. These variants were assessed in 10 replication groups comprising 21,408 individuals with coronary artery disease and 19,185 individuals without coronary artery disease.

All told, researchers found five loci that seem to influence coronary artery disease risk in the European and South Asian populations: one locus each on chromosomes 7, 11, and 15, along with a pair of loci on chromosome 10.

The team didn’t see significant differences in the frequency or effect sizes of these newly identified variants between the European and South Asian populations, though they emphasized that their approach may have missed some potential risk variants, particularly in those of South Asian descent.

“[C]urrent genome-wide arrays may not capture all important variants in South Asians,” they explained, “Nevertheless, all of the known and new variants were significantly associated with [coronary artery disease] risk in both the European and South Asian populations in the current study, indicating the importance of genes associated with [coronary artery disease] beyond the European ancestry groups in which they were first defined.”

Finally, using a three-stage discovery, validation, and replication GWAS approach, Chinese researchers identified a single coronary artery disease risk variant in the Han Chinese population.

In the first phase of that study, researchers tested samples from 230 cases and 230 controls from populations in Beijing and in China’s Hubei province that were genotyped at Genentech and CapitalBio using Affymetrix Human SNP5.0 arrays.

From the nearly three-dozen SNPs identified in the first stage of the study, they narrowed in on nine suspect variants. After finding linkage disequilibrium between two of the variants, they did validation testing on eight of these in 572 individuals with coronary artery disease and 436 unaffected controls, all from Hubei province.
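Linkage disequilibrium between two variants, the reason one of the nine candidates was dropped, is often summarized by the r² statistic. The sketch below computes r² from phased haplotype counts for two biallelic SNPs. The article does not say which LD measure or threshold the researchers applied, so the function name `ld_r_squared` and the input layout are assumptions.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Squared correlation (r^2) between two biallelic SNPs.

    Arguments are counts of the four haplotypes, where A/a are the
    alleles at the first SNP and B/b the alleles at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total

    # D measures the departure of the observed haplotype frequency
    # from what independent alleles would give.
    d = p_AB - p_A * p_B
    denominator = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denominator == 0.0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denominator


r2_perfect = ld_r_squared(50, 0, 0, 50)
r2_none = ld_r_squared(25, 25, 25, 25)
r2_partial = ld_r_squared(40, 10, 10, 40)
```

A value near 1 means the two variants carry almost the same information, so validating both adds little.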

That analysis implicated a single chromosome 6 SNP called rs6903956 in coronary artery disease — a finding the team ultimately replicated in another group of 2,668 coronary artery disease cases and 3,917 controls from three independent populations in Hubei, Shandong province, and northern China.

The team’s subsequent experiments suggest that the newly detected polymorphism, which falls within a putative gene called C6orf105 on chromosome 6, curbs the expression of this gene. The functional consequences of this shift in expression, if any, are yet to be determined.

Because C6orf105 shares some identity and homology with an androgen hormone inducible gene known as AIG1, those involved in the study argue that it may be worthwhile to investigate possible ties between C6orf105 expression, androgen signaling, and coronary artery disease.

“Androgen has previously been reported to be associated with the pathogenesis of atherosclerosis,” they wrote. “Future studies are needed to explore whether C6orf105 expression can be induced by androgen and to further determine the potential mechanism of [coronary artery disease] associated with decreased C6orf105 expression.”



Reporter: Aviva Lev-Ari, PhD, RN 

Mining the Unknown: A Systems Approach to Metabolite Identification Combining Genetic and Metabolic Information

Jan Krumsiek1, Karsten Suhre1,2, Anne M. Evans3, Matthew W. Mitchell3, Robert P. Mohney3, Michael V. Milburn3, Brigitte Wägele1,4, Werner Römisch-Margl1, Thomas Illig5,6, Jerzy Adamski7,8, Christian Gieger9, Fabian J. Theis1,10, Gabi Kastenmüller1*

 

1 Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany, 2 Department of Physiology and Biophysics, Weill Cornell Medical College in Qatar, Education City, Qatar Foundation, Doha, Qatar, 3 Metabolon, Research Triangle Park, North Carolina, United States of America, 4 Department of Genome-Oriented Bioinformatics, Life and Food Science Center Weihenstephan, Technische Universität München, Freising, Germany, 5 Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, Neuherberg, Germany, 6 Biobank of the Hanover Medical School, Hanover Medical School, Hanover, Germany, 7 Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Zentrum München, Neuherberg, Germany, 8 Lehrstuhl für Experimentelle Genetik, Technische Universität München, Freising-Weihenstephan, Germany, 9 Institute of Epidemiology, Helmholtz Zentrum München, Neuherberg, Germany, 10 Department of Mathematics, Technische Universität München, Garching, Germany

Abstract 

Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable amount of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these unknown metabolites is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype–metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.

Introduction 

Recently, genome-wide association studies (GWAS) on metabolic quantitative traits have proven valuable tools to uncover the genetically determined metabolic individuality in the general population [1]–[5]. Interestingly, a great portion of the genetic loci that were found to significantly associate with levels of specific metabolites are within or in close proximity to metabolic enzymes or transporters with known disease or pharmaceutical relevance. Moreover, compared to GWAS with clinical endpoints the effect sizes of the genotypes are exceptionally high.

The number and type of the metabolic features that went into these GWAS was mainly defined by the metabolomics techniques used: Gieger et al. [1] and Illig et al. [2] used a targeted mass spectrometry (MS)-based approach giving access to the concentrations of 363 and 163 metabolites, respectively. Suhre et al. [3] and Nicholson et al. [4] applied untargeted nuclear magnetic resonance (NMR) based metabolomics techniques, yielding 59 metabolites that had been identified in the spectra prior to the GWAS and 579 manually selected peaks from the spectra, respectively. In Suhre et al. [5], 276 metabolites from an untargeted MS-based approach were analyzed.

While these previous GWAS focused on metabolic features with known identity, untargeted metabolomics approaches additionally provide quantifications of so-called “unknown metabolites”. An unknown metabolite is a small molecule that can reproducibly be detected and quantified in a metabolomics experiment, but whose chemical identity has not been elucidated yet. In an experiment using liquid chromatography (LC) coupled to MS, such an unknown would be defined by a specific retention time, one or multiple masses (e.g. from adducts), and a characteristic fragmentation pattern of the primary ion(s). An unknown observed by NMR spectroscopy would correspond to a pattern in the chemical shifts. Unknowns may constitute previously undocumented small molecules, such as rare xenobiotics or secondary products of metabolism, or they may represent molecules from established pathways which could not be assigned using current libraries of MS fragmentation patterns [6], [7] or NMR reference spectra [8].

The impact of unknown metabolites for biomedical research has been shown in recent metabolomics-based discovery studies of novel biomarkers for diseases and various disease-causing conditions. This includes studies investigating altered metabolite levels in blood for insulin resistance [9], type 2 diabetes [10], and heart disorders [11]. A considerable number of high-ranking hits reported in these biomarker studies represent unknown metabolites. As long as their chemical identities are not clarified the usability of unknown metabolites as functional biomarkers for further investigations and clinical applications is rather limited.

In mass-spectrometry-based metabolomics approaches, the assignment of chemical identity usually involves the interpretation and comparison of experiment-specific parameters, such as accurate masses, isotope distributions, fragmentation patterns, and chromatography retention times [12]–[14]. Various computer-based methods have been developed to automate this process. For example, Rasche and colleagues [15] elucidated structural information of unknown metabolites in a mass-spectrometry setup using a graph-theoretical approach. Their approach attempts to reconstruct the underlying fragmentation tree based on mass-spectra at varying collision energies. Other authors excluded false candidates for a given unknown by comparing observed and predicted chromatography retention times [16], [17], or by the automatic determination of sum formulas from isotope distributions [18]. Furthermore, Gipson et al. [19] and Weber et al. [20] integrated public metabolic pathway information with correlating peak pairs in order to facilitate metabolite identification. However, these methods might not be applicable for high-throughput metabolomics datasets that have been produced in a fee-for-service manner, since the mass spectra as such might not be readily available.

Approaching the problem from a conceptually different perspective, we here present a novel functional metabolomics method to predict the identities of unknown metabolites using a systems biological framework. By combining high-throughput genotyping data, metabolomics data, and literature-derived metabolic pathway information, we generate testable hypotheses on the metabolite identities based solely on the obtained metabolite quantifications (Figure 1). No further experiment-specific data such as retention times, isotope patterns and fragmentation patterns are required for this analysis.

 

Figure 1. Data integration workflow for the systematic classification of unknown metabolites.

We combine high-throughput metabolomics and genotyping data in Gaussian graphical models (GGMs) [21] and in genome-wide association studies (GWAS) [5] in order to produce testable predictions of the unknown metabolites’ identities. These hypotheses are then subject to experimental verification by mass-spectrometry. Six such cases have been fully worked through and are presented in Table 3. doi:10.1371/journal.pgen.1003005.g001

 http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003005?imageURI=info:doi/10.1371/journal.pgen.1003005.g001#pgen-1003005-g001
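The Gaussian graphical models named in the workflow rest on partial correlations: two metabolites are linked only if they remain correlated after conditioning on all the others. As a rough illustration, and not the authors' implementation, the sketch below derives full-order partial correlations from a correlation matrix by inverting it. The helper names `invert_matrix` and `partial_correlations` and the toy three-metabolite matrix are assumptions.

```python
import math


def invert_matrix(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of each pair given all remaining variables.

    With P the inverse of the correlation matrix, the partial
    correlation of i and j is -P[i][j] / sqrt(P[i][i] * P[j][j]).
    """
    precision = invert_matrix(corr)
    n = len(corr)
    return [
        [
            1.0 if i == j
            else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy chain A - B - C: A and C correlate (0.25) only through B.
toy_corr = [
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
]
toy_pcor = partial_correlations(toy_corr)
```

In the toy matrix the A–C partial correlation vanishes while A–B and B–C remain, so a network built from partial correlations would keep only the two direct links. Real metabolomics data would also need significance testing of each edge, which is omitted here.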


Discussion 

We developed and validated a novel integrative approach for the biochemical characterization of “unknown metabolites” from high-throughput metabolomics and genotyping datasets. Our method allows for the functional annotation of previously unidentified metabolites and, as a consequence, enhances the interpretability of metabolomics data in genome-wide association studies and biomarker discovery. For the first time, we systematically evaluated genetic associations of unknown metabolites, thereby discovering seven new loci of metabolic individuality. By classifying a series of unknown metabolites, we gained new insights into the functional interplay between genetic variation and the metabolome both for previously reported and new loci. Furthermore, several of the unknown compounds that we identified as well as their newly associated loci were independently reported in disease-related studies. In the following, we discuss three genetic loci and their associated phenotypes.

COMT and hepatic detoxification

The first example is a recent biomarker study, where Milburn et al. [34] reported an association of X-11593 with hepatic detoxification. In our GWAS, we find a strong association of X-11593 with the COMT locus, which encodes the catechol-O-methyltransferase enzyme. COMT is responsible for the inactivation of catecholamines such as L-dopa and various neuroactive drugs by O-methylation [35]. Following our identification approach, we experimentally confirmed the identity of X-11593 as O-methylascorbate. Notably, O-methylascorbate is a known product of ascorbate (vitamin C) O-methylation by COMT [36], [37]. Thus, our observations establish a link between O-methylascorbate blood levels, common genetic variation in the COMT locus and COMT-mediated liver detoxification processes.

ACE and hypertension

The second example relates to the ACE gene locus, which is a known risk locus for cardiovascular disease, hypertension and kidney failure. The protein encoded by the ACE locus, angiotensin-converting enzyme, is an exopeptidase which cleaves dipeptides from vasoactive oligopeptides, and plays a central role in the blood pressure-controlling renin-angiotensin system [38]. Moreover, the ACE protein is a target for various pharmaceuticals (ACE inhibitors), especially in the treatment of hypertension [39]. In our study, we identified three unknowns as dipeptides (X-14205, X-14208 and X-14478), two of which also associated with the ACE locus. These dipeptides could thus represent novel, interesting biomarkers for the activity of ACE. Moreover, Steffens et al. [11] reported a connection between heart failure and X-11805, which is in close proximity to angiotensin-related peptides in the GGM. This connection might be revisited after a successful identification of X-11805 in a future study.

UGT1A/ACADM and insulin resistance

The third example is an explorative study to detect biomarkers for insulin sensitivity. Gall et al. [9] reported several known metabolites (most prominently α-hydroxybutyrate) as biomarkers for insulin resistance. They also reported a series of unknown metabolites among their top hits. In the present study, we investigated three of these unknowns: X-11793 associates with UGT1A (UDP glucuronosyltransferase 1) and represents a bilirubin-related substance. Moreover, we experimentally validated X-11421 and X-13431, which display a strong association with ACADM (acyl-Coenzyme A dehydrogenase, C-4 to C-12 straight chain), as acylcarnitines containing 10 and 9 carbon atoms, respectively. The identification of these latter two unknown metabolites as medium-chain length acylcarnitines is coherent with reports by Adams et al. [40]. The authors found elevated blood plasma acylcarnitine levels in women with type 2 diabetes. Functionally, they attributed this finding to incomplete β-oxidation. Thus, our identification of X-11421 and X-13431 now suggests incomplete β-oxidation as an explanation for the associations found by Gall et al. and implies that acylcarnitines containing 10 and 9 carbon atoms are potential biomarkers for insulin resistance.

Conclusion

In summary, we integrated high-throughput metabolomics and genotyping data from a large population cohort to elucidate the biochemical identities of unknown metabolites. To this end, we applied metabolomics genome-wide association studies and Gaussian graphical modeling in order to link these unknown metabolites with known metabolic classes and biological processes. For six specific scenarios, we went from systematic hypothesis generation through detailed investigation and identity prediction to direct experimental confirmation. Similar validations may now be undertaken for the remaining predictions that we report in Table S1. Finally, we demonstrated the benefit of our method by discussing several of these newly identified metabolites in the context of existing biomarker discovery studies on liver detoxification, hypertension and insulin resistance.
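The GWAS half of this approach can be illustrated with a small sketch: under an additive model, the level of a metabolite is regressed on the number of minor alleles (0, 1 or 2) carried at a SNP. Everything below is an assumption for illustration only (the simulated genotypes, the effect size of 0.5, the allele frequency and the function name) and is not taken from the study; a real analysis would also adjust for covariates and for multiple testing.

```python
import math
import random

def additive_association(dosages, levels):
    """Least-squares fit of level = intercept + beta * dosage.
    Returns (beta, t_statistic) for the SNP effect."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(levels) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, levels))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, levels))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return beta, beta / se

def simulate_dosages(n, maf):
    """Draw minor-allele counts for n individuals (two independent alleles each)."""
    return [int(random.random() < maf) + int(random.random() < maf) for _ in range(n)]

random.seed(1)
n_people = 1000
causal_snp = simulate_dosages(n_people, maf=0.3)   # hypothetical SNP with an effect
null_snp = simulate_dosages(n_people, maf=0.3)     # hypothetical SNP without an effect
metabolite = [0.5 * g + random.gauss(0, 1) for g in causal_snp]

beta_causal, t_causal = additive_association(causal_snp, metabolite)
beta_null, t_null = additive_association(null_snp, metabolite)
print("causal SNP: beta =", round(beta_causal, 3), " t =", round(t_causal, 2))
print("null SNP:   beta =", round(beta_null, 3), " t =", round(t_null, 2))
```

Repeating such a test for every SNP and every metabolite, known or unknown, yields the locus associations that are then combined with the GGM neighbourhood of each unknown.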

It should be noted that our method does not specifically require genotyping data. Even metabolomics measurements alone, analyzed through the GGMs, may provide sufficient information for the classification and even precise identity prediction. The unknowns with GGM evidence but without GWAS hits in Figure 4 as well as the HETE scenario represent examples of this approach.

One limitation of our approach is the requirement for associations with functionally described loci or known metabolites. Certain metabolite groups might thus systematically not be identifiable. For instance, if the identity of a whole class of biochemically related molecules is unknown (which might be due to experimental reasons), then the GGM associations between those compounds will not aid in identity elucidation. The 118 unknown compounds for which we could not derive any classification might represent such cases. Thus, our functionally oriented method should be regarded as a complementary extension to the existing identity determination methods.

Accordingly, our approach can be extended in several directions. It can be combined with method-specific, automated techniques that further exclude sets of metabolites. Previously mentioned methods relying on mass-spectra [15] or chromatographic properties [17] are suitable candidates here. Moreover, the method can be directly transferred to other types of metabolomics datasets not specifically originating from MS experiments, such as NMR-based metabolomics.

Beyond the application to metabolite identification, our study demonstrates the general potential of functional metabolomics in the context of genome-wide association studies. The comprehensive metabolic picture provided by GGMs in combination with GWAS allows for the detailed analysis of metabolic functions, chemical classes, enzyme-metabolite relationships and metabolic pathways.

Author Contributions 

Conceived and designed the experiments: JK KS FJT GK. Performed the experiments: AME MWM RPM MVM. Analyzed the data: JK GK. Contributed reagents/materials/analysis tools: BW WR-M TI JA CG. Wrote the paper: JK KS FJT GK.

References 

1. Gieger C, Geistlinger L, Altmaier E, de MH, Kronenberg F, et al. (2008) Genetics meets metabolomics: a genome-wide association study of metabolite profiles in human serum. PLoS Genet 4: e1000282 doi:10.1371/journal.pgen.1000282.

2. Illig T, Gieger C, Zhai G, Römisch-Margl W, Wang-Sattler R, et al. (2010) A genome-wide perspective of genetic variation in human metabolism. Nat Genet 42: 137–141.

3. Suhre K, Wallaschofski H, Raffler J, Friedrich N, Haring R, et al. (2011) A genome-wide association study of metabolic traits in human urine. Nat Genet 43: 565–569.

4. Nicholson G, Rantalainen M, Li JV, Maher AD, Malmodin D, et al. (2011) A genome-wide metabolic QTL analysis in Europeans implicates two loci shaped by recent positive selection. PLoS Genet 7: e1002270 doi:10.1371/journal.pgen.1002270.

5. Suhre K, Shin S-Y, Petersen A-K, Mohney RP, Meredith D, et al. (2011) Human metabolic individuality in biomedical and pharmaceutical research. Nature 477: 54–60.

6. Horai H, Arita M, Kanaya S, Nihei Y, Ikeda T, et al. (2010) MassBank: a public repository for sharing mass spectral data for life sciences. J Mass Spectrom 45: 703–714.

7. Afeefy HY, Liebman JF, Stein SE (2011) NIST Chemistry WebBook, NIST Standard Reference Database Number 69. In: Linstrom PJ, Mallard WG, editors.

8. Wishart DS, Knox C, Guo AC, Eisner R, Young N, et al. (2009) HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res 37: D603–D610.

9. Gall WE, Beebe K, Lawton KA, Adam K-P, Mitchell MW, et al. (2010) alpha-hydroxybutyrate is an early biomarker of insulin resistance and glucose intolerance in a nondiabetic population. PLoS ONE 5: e10883 doi:10.1371/journal.pone.0010883.

10. Fiehn O, Garvey WT, Newman JW, Lok KH, Hoppel CL, et al. (2010) Plasma metabolomic profiles reflective of glucose homeostasis in non-diabetic and type 2 diabetic obese African-American women. PLoS ONE 5: e15234 doi:10.1371/journal.pone.0015234.

11. Steffens DC, Jiang W, R KR, Karoly ED, Mitchell MW, et al. (2010) Metabolomic differences in heart failure patients with and without major depression. J Geriatr Psychiatry Neurol 23: 138–146.

12. Kind T, Fiehn O (2007) Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinformatics 8: 105.

13. Bowen BP, Northen TR (2010) Dealing with the unknown: metabolomics and metabolite atlases. J Am Soc Mass Spectrom 21: 1471–1476.

14. Wishart DS (2011) Advances in metabolite identification. Bioanalysis 3: 1769–1782.

15. Rasche F, Svatoš A, Maddula RK, Böttcher C, Böcker S (2011) Computing Fragmentation Trees from Tandem Mass Spectrometry Data. Analytical Chemistry 83: 1243–1251.

16. Mihaleva VV, Verhoeven HA, de Vos RCH, Hall RD, van Ham RCHJ (2009) Automated procedure for candidate compound selection in GC-MS metabolomics based on prediction of Kovats retention index. Bioinformatics 25: 787–794.

17. Creek DJ, Jankevics A, Breitling R, Watson DG, Barrett MP, et al. (2011) Towards Global Metabolomics Analysis with Liquid Chromatography-Mass Spectrometry: Improved Metabolite Identification by Retention Time Prediction. Anal Chem.

18. Böcker S, Letzel MC, Lipták Z, Pervukhin A (2009) SIRIUS: decomposing isotope patterns for metabolite identification. Bioinformatics 25: 218–224.

19. Gipson G, Tatsuoka K, Sokhansanj B, Ball R, Connor S (2008) Assignment of MS-based metabolomic datasets via compound interaction pair mapping. Metabolomics 4: 94–103.

20. Weber RJM, Viant MR (2010) MI-Pack: Increased confidence of metabolite identification in mass spectra by integrating accurate masses and metabolic pathways. Chemometrics and Intelligent Laboratory Systems 104: 75–82.

21. Krumsiek J, Suhre K, Illig T, Adamski J, Theis FJ (2011) Gaussian graphical modeling reconstructs pathway reactions from high-throughput metabolomics data. BMC Syst Biol 5: 21. doi:10.1186/1752-0509-5-21.

22. Mittelstrass K, Ried JS, Yu Z, Krumsiek J, Gieger C, et al. (2011) Discovery of Sexual Dimorphisms in Metabolic and Genetic Biomarkers. PLoS Genet 7: e1002215 doi:10.1371/journal.pgen.1002215.

23. Nayak RR, Kearns M, Spielman RS, Cheung VG (2009) Coexpression network based on natural variation in human gene expression reveals gene interactions and functions. Genome Res 19: 1953–1962.

24. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, et al. (2011) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res 39: D561–D568.

25. Holle R, Happich M, Löwel H, Wichmann HE (2005) Group MKS (2005) KORA–a research platform for population based health research. Gesundheitswesen 67 Suppl 1: S19–S25.

26. Hindorff L, MacArthur J, Wise A, Junkins H, Hall P, et al. A Catalog of Published Genome-Wide Association Studies.

27. Takeuchi F, McGinnis R, Bourgeois S, Barnes C, Eriksson N, et al. (2009) A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose. PLoS Genet 5: e1000433 doi:10.1371/journal.pgen.1000433.

28. Chung CM, Wang RY, Chen JW, Fann CS, Leu HB, et al. (2010) A genome-wide association study identifies new loci for ACE activity: potential implications for response to ACE inhibitor. Pharmacogenomics J 10: 537–544.

29. GTEx (Genotype-Tissue Expression) eQTL Browser.

30. Otterness DM, Wieben ED, Wood TC, Watson WG, Madden BJ, et al. (1992) Human liver dehydroepiandrosterone sulfotransferase: molecular cloning and expression of cDNA. Mol Pharmacol 41: 865–872.

31. Berg JM, Tymoczko JL, Stryer L (2006) Biochemistry: W. H. Freeman.

32. Tate SS, Meister A (1985) gamma-Glutamyl transpeptidase from kidney. Methods Enzymol 113: 400–419.

33. Kováts E (1958) Gas-chromatographische Charakterisierung organischer Verbindungen. Teil 1: Retentionsindices aliphatischer Halogenide, Alkohole, Aldehyde und Ketone. Helvetica Chimica Acta 41: 1915–1932.

34. Milburn M, Guo L, Wulff JE, Lawton KA (2010) Determination of the liver toxicity of an agent.

35. Männistö PT, Kaakkola S (1999) Catechol-O-methyltransferase (COMT): biochemistry, molecular biology, pharmacology, and clinical efficacy of the new selective COMT inhibitors. Pharmacol Rev 51: 593–628.

36. Bowers-Komro DM, McCormick DB, King GA, Sweeny JG, Iacobucci GA (1982) Confirmation of 2-O-methyl ascorbic acid as the product from the enzymatic methylation of L-ascorbic acid by catechol-O-methyltransferase. Int J Vitam Nutr Res 52: 186–193.

37. Butterworth M, Lau SS, Monks TJ (1996) 17 beta-Estradiol metabolism by hamster hepatic microsomes. Implications for the catechol-O-methyl transferase-mediated detoxication of catechol estrogens. Drug Metab Dispos 24: 588–594.

38. Imig JD (2004) ACE Inhibition and Bradykinin-Mediated Renal Vascular Responses: EDHF Involvement. Hypertension 43: 533–535.

39. Acharya KR, Sturrock ED, Riordan JF, W MR (2003) Ace revisited: a new target for structure-based drug design. Nat Rev Drug Discov 2: 891–902.

40. Adams SH, Hoppel CL, Lok KH, Zhao L, Wong SW, et al. (2009) Plasma acylcarnitine profiles suggest incomplete long-chain fatty acid beta-oxidation and altered tricarboxylic acid cycle activity in type 2 diabetic African-American women. J Nutr 139: 1073–1081.

41. Purcell S, Neale B, Todd-Brown K, Thomas L, R MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575.

42. The International HapMap 3 Consortium (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467: 52–58.

43. Buuren Sv, Groothuis-Oudshoorn K (2010) MICE: Multivariate Imputation by Chained Equations in R. Journal of statistical software in press 1–68.

44. Fox J (1997) Applied Regression Analysis, Linear Models, and Related Methods: Sage Publications.

45. Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, et al. (2007) Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Natl Acad Sci U S A 104: 1777–1782.

46. Ma H, Sorokin A, Mazein A, Selkov A, Selkov E, et al. (2007) The Edinburgh human metabolic network reconstruction and its functional analysis. Mol Syst Biol 3: 135.

47. Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27–30.

48. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29.

49. Johnson AD, Kavousi M, Smith AV, Chen MH, Dehghan A, et al. (2009) Genome-wide association meta-analysis for total serum bilirubin levels. Hum Mol Genet 18: 2700–2710.

50. Bielinski SJ, Chai HS, Pathak J, Talwalkar JA, Limburg PJ, et al. (2011) Mayo Genome Consortia: a genotype-phenotype resource for genome-wide association studies with an application to the analysis of circulating bilirubin levels. Mayo Clin Proc 86: 606–614.

51. Link E, Parish S, Armitage J, Bowman L, Heath S, et al. (2008) SLCO1B1 variants and statin-induced myopathy–a genomewide study. N Engl J Med 359: 789–799.

52. Chambers JC, Zhang W, Lord GM, van der Harst P, Lawlor DA, et al. (2010) Genetic loci influencing kidney function and chronic kidney disease. Nat Genet 42: 373–375.

53. Kottgen A, Pattaro C, Boger CA, Fuchsberger C, Olden M, et al. (2010) New loci associated with kidney function and chronic kidney disease. Nat Genet 42: 376–384.

54. Zhai G, Teumer A, Stolk L, B JR, Vandenput L, et al. (2011) Eight common genetic variants associated with serum DHEAS levels suggest a key role in ageing mechanisms. PLoS Genet 7: e1002025 doi:10.1371/journal.pgen.1002025.

55. Sanna S, Busonero F, Maschio A, McArdle PF, Usala G, et al. (2009) Common variants in the SLCO1B3 locus are associated with bilirubin levels and unconjugated hyperbilirubinemia. Hum Mol Genet 18: 2711–2718.

56. Chen G, Ramos E, Adeyemo A, Shriner D, Zhou J, et al. (2012) UGT1A1 is a major locus influencing bilirubin levels in African Americans. Eur J Hum Genet 20: 463–468.

57. Jylhava J, Lyytikainen LP, Kahonen M, Hutri-Kahonen N, Kettunen J, et al. (2012) A genome-wide association study identifies UGT1A1 as a regulator of serum cell-free DNA in young adults: The Cardiovascular Risk in Young Finns Study. PLoS ONE 7: e35426 doi:10.1371/journal.pone.0035426.

Source:

http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1003005


Author: Margaret Baker, PhD, Registered Patent Agent

The Encyclopedia of DNA Elements (ENCODE) Project was launched in September 2003. In 2007 the project was expanded to study the entire human genome and to relate its findings to genome-wide association studies (GWAS). This month the consortium published a Nature paper entitled “An integrated encyclopedia of DNA elements in the human genome,” and all data are available at http://genome.ucsc.edu/ENCODE/.  Novel functional roles have been discovered for both transcribed and non-transcribed portions of DNA.  See several articles and commentary in Science 7 September 2012: Vol. 337 no. 6099, including Maurano et al. pp. 1190-1195, DOI: 10.1126/science.1222794

For the first time, the 3-dimensional connections that cross the genome have been mapped as long-range looping interactions between functional elements and the genes controlled. These regions of the genome, formerly referred to as “junk DNA”, have the potential to be involved in disease initiation, pathophysiology, and complications. Further, epigenetic factors may be seen to play a more direct role in the expression or silencing of protein coding genes as DNase I hot spots, nucleosomal anchor points, and DNA methylation sites are added to the map.

Non-coding transcribed DNA includes a large percentage of sequences coding for RNA. In fact, the number of RNA-encoding genes is nearly equal to the number of protein-encoding genes (18,400 vs. 20,687), and previously unknown non-coding RNAs (ncRNA) have also been characterized.

Some of the known elements that were cataloged include:

  • cis elements – promoters, transcription factor binding sites;
  • gene contiguous non-coding stretches such as introns, polyA, and UTR, splice variants;
  • pseudogenes (11,224);
  • long range gene associated elements – enhancers, insulators, suppressors, and predicted promoter flanking regions;
  • ribosomal RNA genes; and
  • sequences for 7,052 small RNAs, of which 85% are small nuclear (sn)RNA, small nucleolar (sno)RNA, transfer (t)RNA, and micro (mi)RNA.

What has been found is that distinct non-coding regions, including ncRNA, can be associated with distinct disease traits. miRNAs are among the non-protein-coding sequences in the genome that have already been shown to play a major post-transcriptional role in the expression of multiple genes.

Most miRNA genes are intergenic or oriented antisense to neighboring genes and are therefore assumed to be controlled by independent promoter units. However, in some cases a microRNA gene is transcribed together with its target gene, implying coupled regulation of the miRNA and the protein-coding gene. About one third of miRNA genes reside in polycistronic clusters. miRNA genes can occupy the introns of protein-coding genes, of non-protein-coding genes, or of other non-protein-coding transcripts. Their promoters have been shown to share some motifs with the promoters of other genes transcribed by RNA polymerase II, such as protein-coding genes. The ENCODE project also noted that miRNA promoters were in chromatin regions of high promiscuity. There may be up to 1000 miRNA genes in the human genome. In addition, human miRNAs show RNA editing of sequences to yield products different from those encoded by their DNA.  miRNAs are implicated in cellular roles as diverse as developmental timing in worms, cell death and fat metabolism in flies, haematopoiesis in mammals, and leaf development and floral patterning in plants.

The final miRNA gene product is a ∼22 nt functional RNA molecule. The mature miRNA (designated miR-#) is processed from a characteristic stem–loop sequence (called a pre-mir), which in turn may be excised from a longer primary transcript (or pri-mir). It is processed by the same enzyme (Dicer) that processes short hairpin RNA to form interfering RNA, which provides an additional level of control.

MiRNA controls gene expression by binding to complementary regions in the 3’ untranslated region of messenger transcripts to repress their translation or regulate their degradation. What makes the mechanism more powerful (or complicated) is that the imperfect but specific binding motif can associate with a large number of mRNAs that carry the complementary motif in their 3’ untranslated regions.  Conversely, each mRNA can potentially associate with a number of miRNAs. Mature, processed cytosolic miRNA can act in a manner akin to small interfering (si)RNA and form the RNA-induced silencing complex (RISC) to block translation. Computational methods have been used to identify potential gene targets based on complementarity between the miRNA and mRNA sequences.
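As a rough illustration of complementarity-based target search, the sketch below scans a 3’ UTR for sites that pair perfectly with the "seed" of a miRNA, taken here as nucleotides 2 to 8 counted from the 5’ end. The seed window, the function names and both sequences are assumptions chosen for the example (the sequences are made up and do not correspond to any annotated miRNA or gene); published prediction tools weigh many further features, such as site context and conservation.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna, seed_start=2, seed_end=8):
    """Return the mRNA motif (written 5'->3') that base-pairs with miRNA
    nucleotides seed_start..seed_end (1-based, inclusive)."""
    seed = mirna.upper().replace("T", "U")[seed_start - 1:seed_end]
    # The two strands pair antiparallel, so reverse the seed before complementing.
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def find_seed_matches(mirna, utr):
    """Return 0-based start positions of every perfect seed match in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")  # accept DNA-style input as well
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping sites
    return positions

# Toy sequences, invented for this example.
toy_mirna = "UACGGAUCAGCUUAGCUAGCUA"
toy_utr = "AAAGAUCCGUAAACCCGAUCCGUUU"

print("seed-complementary motif:", seed_site(toy_mirna))
print("match positions in UTR:  ", find_seed_matches(toy_mirna, toy_utr))
```

Because the motif is only seven nucleotides long, the same miRNA will match many transcripts and a single UTR can carry sites for several miRNAs, which is the many-to-many relationship described above.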

Gerstein et al. explored the “Architecture of the human regulatory network derived from ENCODE data” Nature 489:91-100 (06 Sep 2012) focusing on the regulation of transcription factors (TF) and association between TF and miRNAs, miRNA and miRNA, protein-protein interactions, and protein phosphorylation. Not surprisingly, not all TF are the upstream factor in each network.

These new and remarkably detailed examinations of the different elements within and transcribed from the human genome may do most to explain why we have stumbled in attempts to eradicate diseases by focusing initially on a single gene or constellation of coding regions. The miRNA wikipedia is also being rewritten on a daily basis, with new disease associations being made*.  As an example of a pathological state that may be linked to miRNA-controlled elements, in vitro studies as well as small population studies have examined miRNA species in diabetogenic conditions and in patients with diabetes (Type I and Type II).

Diabetes and miRNA

In adult β-cell islets, miR-375 is low when glucose is freely available and low miR-375 induces insulin secretion. Interestingly, miR-375 is found only in brain and β-cells which share a secretion pathway.

Diabetic Complications

Organ specific miRNA have been identified in liver, skeletal muscle, kidney, vascular, and adipose tissue which are responsive to transient or sustained hyperglycemia.

miR-17-5p and miR-132 were reported to show significant differences between obese and non-obese omental fat and were also abnormal in the blood of obese subjects.  Altered expression of miR-17-5p and miR-132 was found to correlate significantly with BMI, fasting blood glucose and glycosylated hemoglobin (Kloting et al. PLoS ONE 4(3), e4699 (2009)).

Clinical application of miRNA in diabetes may be possible, as one group has identified eight miRNAs (miR-144, miR-146a, miR-150, miR-182, miR-192, miR-29a, miR-30d and miR-320) as potential ‘signature miRNAs’ that could distinguish prediabetic patients from those with overt T2D (Karolina DS, Armugam A, Tavintharan S et al. MicroRNA 144 impairs insulin signaling by inhibiting the expression of insulin receptor substrate 1 in Type 2 diabetes mellitus. PLoS ONE 6(8), e22839 (2011)).

Due to the autoimmune component of T1D, the constellation of miRNA would be expected to be different: upregulation of miR-510 and underexpression of miR-191 and miR-342 were observed in the Tregs (regulatory T-cells) of T1D patients (Hezova R, Slaby O, Faltejskova P et al. microRNA-342, microRNA-191 and microRNA-510 are differentially expressed in T regulatory cells of Type 1 diabetic patients. Cell. Immunol. 260(2), 70–74 (2010)).

Taken together with the “physical” mapping of miRNA genes in the context of the 3-dimensional genome provided by the ENCODE studies and new understanding of potential concerted regulatory mechanisms, the miRNA data for tissues and specific cell types involved in disease pathology form a new approach to either detecting or possibly correcting gene (coding or non-coding) dysregulation.  miRNA mimics and anti-miRNA agents are being developed as new therapeutic modalities.

References

Bartel, DP et al. “MicroRNAs: Genomics, Biogenesis, Mechanism, and Function” Cell 2004, 116:281-297.

Fernandez-Valverde, SL et al. MicroRNAs in beta-cell Biology, insulin resistance, diabetes and its complications. Diabetes July 2011 60 (7):1825-31.

Kantharidis, et al.  Diabetes Complications: The MicroRNA Perspective http://diabetes.diabetesjournals.org/content/60/7/1832.short

MEDSCAPE Review article: “miRNAs and Diabetes Mellitus: miRNAs in Diabetic Complications”  http://www.medscape.org/viewarticle/763729_6

*Based on initial studies in the worm C. elegans showing the temporal appearance of 21- and 22-nt RNAs during development, a family of highly conserved micro RNA sequences (miRNA) existing in invertebrates and vertebrates was cataloged by Tuschl et al. at the Max-Planck-Institute and others (see Eddy, SR  Non-coding RNA genes and the modern RNA world Nature Reviews Genetics, 2:920-929, 2001). The sequence-specific post-transcriptional regulatory mechanisms mediated by these miRNAs have been associated with certain disease states such as cancer (miR-21) and, more specifically, lung cancer (miR-124) or breast cancer (miR-7, miR-21), and new species and functions continue to be found (see http://www.mirbase.org/ ).


Reporter: Aviva Lev-Ari, PhD, RN

Set of Papers Outline ENCODE Findings as Consortium Looks Ahead to Future Studies

NEW YORK (GenomeWeb News) – An international collaboration involving more than 400 researchers working to characterize gene regulatory networks in the human genome is publishing dozens of new studies this week.

In papers appearing in Nature, Science, Genome Research, Genome Biology, Journal of Biological Chemistry, and elsewhere, members of the Encyclopedia of DNA Elements, or ENCODE, consortium describe approaches used to define some four million regulatory regions in the genome, among other things. All told, the team explained, ENCODE efforts have made it possible to assign biological functions to around 80 percent of genome sequences, filling in large gaps left by studies that focused on protein-coding sequences alone.

“We found that a much bigger part of the genome — a surprising amount, in fact — is involved in controlling when and where proteins are produced, than in simply manufacturing the building blocks,” ENCODE’s lead analysis coordinator Ewan Birney, associate director of the European Molecular Biology Laboratory European Bioinformatics Institute, said in a statement.

“This concept of ‘junk DNA,’ which has been sort of perpetuated for the past 20 years or so is really not accurate,” ENCODE researcher Rick Myers, director of the HudsonAlpha Institute for Biotechnology, said during a telephone briefing with reporters today. “Most of the genome — more than 80 percent of the base pairs in the genome — has some biological activity, some biological function.”

Researchers participating in a complementary effort within the larger ENCODE project, known as GENCODE, more completely characterize the coding portions of the genome. “As part of the ENCODE project, we both tidied up the protein-coding genes and we also found many non-coding RNA genes as well,” Birney said during today’s telebriefing.

Based on the success of ENCODE so far, the project is expected to be extended by another four years or so. The amount of new funding from the National Human Genome Research Institute for that follow-up work is expected to be as high as $123 million.

“Later this month, NHGRI will be announcing a new round of funding that will take the ENCODE project into its next phase,” NHGRI Director Eric Green said during the call.

Studies done in the decade or so since the human genome was deciphered have highlighted how little of the genome is actually composed of gene sequences. With the realization that only around 2 percent of the genome is dedicated to protein-coding functions came a spate of speculation about the role of the other 98 percent of the genome.

While this portion of the genome was suspected of harboring regulatory sequences, the extent of that regulation and its impact on coding sequences in human tissues over time was not known.

“When the Human Genome Project ended in 2003, we quickly realized that we understood the meaning of only a very small percent of the human genome’s letters,” Green explained. “We did know the genetic code for determining the order of amino acids and proteins, but we understood precious little about the signals that turned genes on or off — or that controlled the amount of proteins produced in different tissues.”

To begin studying such control networks systematically, the international ENCODE consortium kicked off the main phase of its analyses in 2007, following an earlier pilot study.

NHGRI has provided $123 million for the project over the past five years. Another $30 million went to support the development of ENCODE-related technologies since the ENCODE pilot started in 2003, while $40.6 million from NHGRI went towards the pilot itself.

During the study’s main phase, investigators from nearly three-dozen labs around the world took multi-pronged approaches to assess transcription factor binding patterns, histone modification patterns, chromatin structure signatures and other features of the genome that interact with one another to control gene expression over time and across different tissues in the body.

To accomplish the roughly 1,600 experiments done to test some 180 cell types for ENCODE, teams turned to methods such as chromatin immunoprecipitation coupled with sequencing to define the genome-wide binding patterns for more than 100 different transcription factors, for example, while other strategies were used to profile DNA methylation patterns, chromatin features, and so forth.

“It’s really a detailed hierarchy, where proteins bind and epigenetic marks — like DNA methylation and other marks — precisely cooperate and regulate how the genes are going to get turned on [or off] and the amount of this,” Myers said. “These complex networks are one of the big components of the contributions of the 30 papers that are being published today.”

For example, a University of Washington-led team reporting in Science online today defined millions of regulatory regions, including some that are operational during normal development, by taking advantage of an enzyme known as DNase I, which cuts DNA specifically at open chromatin sites in the genome. That group found that more than three-quarters of disease-associated variants identified in genome-wide association studies fall in parts of the genome that overlap with regulatory sites.
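The kind of overlap calculation behind such a figure can be sketched as follows: given a set of regulatory intervals and a list of variant positions, count how many variants fall inside any interval. This is only a minimal illustration under stated assumptions. The coordinates are invented, the intervals are treated as half-open (start included, end excluded), and the function names are hypothetical; it does not reproduce the team's actual analysis.

```python
from bisect import bisect_right
from collections import defaultdict

def merge_regions(regions):
    """Group (chrom, start, end) half-open intervals by chromosome and merge
    overlapping ones, so each position needs to be checked against one interval."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index

def in_region(index, chrom, pos):
    """True if pos lies inside a merged interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]

def fraction_overlapping(variants, regions):
    """Fraction of (chrom, pos) variants that fall in any regulatory interval."""
    index = merge_regions(regions)
    hits = sum(in_region(index, chrom, pos) for chrom, pos in variants)
    return hits / len(variants)

# Invented coordinates, for illustration only.
toy_regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
toy_variants = [("chr1", 120), ("chr1", 299), ("chr1", 300), ("chr2", 55), ("chr3", 10)]

overlap = fraction_overlapping(toy_variants, toy_regions)
print("fraction of variants in regulatory regions:", overlap)
```

Merging the intervals first and then using a binary search keeps each lookup fast, which matters when millions of regulatory regions are compared against thousands of GWAS variants.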

“We now know that the majority of these changes that are associated with common diseases and traits that don’t fall within genes actually occur within the gene-controlling switches,” University of Washington genome sciences researcher John Stamatoyannopoulos, senior author on that study, said during today’s telebriefing. “This phenomenon is not confined to a particular type of disease. It seems to be present across the board for a very wide variety of different diseases and traits.”

Results from such analyses also hint that some outwardly unrelated conditions might be traced back to similar regulatory processes. And, researchers say, by bringing together information on active regulatory regions with disease-risk variants, it may be possible to define new functionally important tissues for certain conditions.

“By creating these extensive blueprints of the control circuitry, we’re now exposing previously hidden connections between different kinds of diseases that may explain common clinical features,” Stamatoyannopoulos said.

“This has also allowed us to see that the GWAS studies that have been performed contain far more information than was previously believed,” he added, “because hundreds of additional DNA changes that were not thought to be important also appear to affect these gene-controlling switches.”

The new data are also expected to help in understanding genetic disease and interpreting information from personal genomes, according to Michael Snyder, an ENCODE investigator and director of Stanford University’s Center of Genomics and Personalized Medicine.

“We believe the ENCODE project will have a profound impact on personal genomes and, ultimately on personalized medicine,” Snyder told reporters. “We can now better see what personal variants do, in terms of causing phenotypic differences, drug responses, and disease risk.”

Many of the studies stemming from ENCODE can be viewed through a Nature, Genome Research, and Genome Biology-conceived website that links ENCODE papers that share themes or “threads” that are related to one another.

Along with the newly published papers, the ENCODE team is making data available to other members of the research community through the project’s website. Data from studies can also be accessed through an ENCODE browser housed at the University of California at Santa Cruz or via NCBI or EBI sites.

“For basic researchers, the ENCODE data represents a powerful resource for understanding fundamental questions about how life is encoded in our genome,” NHGRI’s Green said. “For more clinically-oriented researchers, the ENCODE data provide key information about which genome sequences are functionally important.”

    Source:

    NEWS & VIEWS

    52 | NATURE | VOL 489 | 6 SEPTEMBER 2012

    FORUM: Genomics

    ENCODE explained

    The Encyclopedia of DNA Elements (ENCODE) project dishes up a hearty banquet of data that illuminate the roles of the functional elements of the human genome. Here, five scientists describe the project and discuss how the data are influencing research directions across many fields. See Articles p.57, p.75, p.83, p.91, p.101 & Letter p.109

    Serving up a genome feast

    JOSEPH R. ECKER

    Starting with a list of simple ingredients and blending them in the precise amounts needed to prepare a gourmet meal is a challenging task. In many respects, this task is analogous to the goal of the ENCODE project1, the recent progress of which is described in this issue2–7. The project aims to fully describe the list of common ingredients (functional elements) that make up the human genome (Fig. 1). When mixed in the right proportions, these ingredients constitute the information needed to build all the types of cells, body organs and, ultimately, an entire person from a single genome.

    The ENCODE pilot project8 focused on just 1% of the genome — a mere appetizer — and its results hinted that the list of human genes was incomplete. Although there was scepticism about the feasibility of scaling up the project to the entire genome and to many hundreds of cell types, recent advances in low-cost, rapid DNA-sequencing technology radically changed that view9. Now the ENCODE consortium presents a menu of 1,640 genome-wide data sets prepared from 147 cell types, providing a six-course serving of papers in Nature, along with many companion publications in other journals.

    One of the more remarkable findings described in the consortium’s ‘entrée’ paper (page 57)2 is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly ‘junk DNA’. The authors report that the space between genes is filled with enhancers (regulatory DNA elements), promoters (the sites at which DNA’s transcription into RNA is initiated) and numerous previously overlooked regions that encode RNA transcripts that are not translated into proteins but might have regulatory roles. Of note, these results show that many DNA variants previously correlated with certain diseases lie within or very near non-coding functional DNA elements, providing new leads for linking genetic variation and disease.

    The five companion articles3–7 dish up diverse sets of genome-wide data regarding the mapping of transcribed regions, DNA binding of regulatory proteins (transcription factors) and the structure and modifications of chromatin (the association of DNA and proteins that makes up chromosomes), among other delicacies.

    Djebali and colleagues3 (page 101) describe ultra-deep sequencing of RNAs prepared from many different cell lines and from specific compartments within the cells. They conclude that about 75% of the genome is transcribed at some point in some cells, and that genes are highly interlaced with overlapping transcripts that are synthesized from both DNA strands. These findings force a rethink of the definition of a gene and of the minimum unit of heredity.

    Moving on to the second and third courses, Thurman et al.4 and Neph et al.5 (pages 75 and 83) have prepared two tasty chromatin-related treats. Both studies are based on the DNase I hypersensitivity assay, which detects genomic regions at which enzyme access to, and subsequent cleavage of, DNA is unobstructed by chromatin proteins. The authors identified cell-specific patterns of DNase I hypersensitive sites that show remarkable concordance with experimentally determined and computationally predicted binding sites of transcription factors. Moreover, they have doubled the number of known recognition sequences for DNA-binding proteins in the human genome, and have revealed a 50-base-pair ‘footprint’ that is present in thousands of promoters5.

    The next course, provided by Gerstein and colleagues6 (page 91) examines the principles behind the wiring of transcription-factor networks. In addition to assigning relatively simple functions to genome elements (such as ‘protein X binds to DNA element Y’), this study attempts to clarify the hierarchies of transcription factors and how the intertwined networks arise.

    Beyond the linear organization of genes and transcripts on chromosomes lies a more complex (and still poorly understood) network of chromosome loops and twists through which promoters and more distal elements, such as enhancers, can communicate their regulatory information to each other. In the final course of the ENCODE genome feast, Sanyal and colleagues7 (page 109) map more than 1,000 of these long-range signals in each cell type. Their findings begin to overturn the long-held (and probably oversimplified) prediction that the regulation of a gene is dominated by its proximity to the closest regulatory elements.

    One of the major future challenges for ENCODE (and similarly ambitious projects) will be to capture the dynamic aspects of gene regulation. Most assays provide a single snapshot of cellular regulatory events, whereas a time series capturing how such processes change is preferable. Additionally, the examination of large batches of cells — as required for the current assays — may present too simplified a view of the underlying regulatory complexity, because individual cells in a batch (despite being genetically identical) can sometimes behave in different ways. The development of new technologies aimed at the simultaneous capture of multiple data types, along with their regulatory dynamics in single cells, would help to tackle these issues.

    A further challenge is identifying how the genomic ingredients are combined to assemble the gene networks and biochemical pathways that carry out complex functions, such as cell-to-cell communication, which enable organs and tissues to develop. An even greater challenge will be to use the rapidly growing body of data from genome-sequencing projects to understand the range of human phenotypes (traits), from normal developmental processes, such as ageing, to disorders such as Alzheimer’s disease10.

    © 2012 Macmillan Publishers Limited. All rights reserved

    Achieving these ambitious goals may require a parallel investment of functional studies using simpler organisms — for example, of the type that might be found scampering around the floor, snatching up crumbs in the chefs’ kitchen. All in all, however, the ENCODE project has served up an all-you-can-eat feast of genomic data that we will be digesting for some time. Bon appétit!

    Joseph R. Ecker is at the Howard Hughes Medical Institute and the Salk Institute for Biological Studies, La Jolla, California 92037, USA.

    e-mail: ecker@salk.edu

    [Figure 1 labels: nucleosome; histone; chromatin modifications; long-range chromatin interactions; functional genomic elements; DNase I hypersensitive sites; DNA methylation; chromosome; DNA; long-range regulatory elements; protein-coding and non-coding transcripts; promoter architecture; transcription factor; transcription machinery; transcription-factor binding sites; transcribed region]

    Figure 1 | Beyond the sequence. The ENCODE project2–7 provides information on the human genome far beyond that contained within the DNA sequence — it describes the functional genomic elements that orchestrate the development and function of a human. The project contains data about the degree of DNA methylation and chemical modifications to histones that can influence the rate of transcription of DNA into RNA molecules (histones are the proteins around which DNA is wound to form chromatin). ENCODE also examines long-range chromatin interactions, such as looping, that alter the relative proximities of different chromosomal regions in three dimensions and also affect transcription. Furthermore, the project describes the binding activity of transcription-factor proteins and the architecture (location and sequence) of gene-regulatory DNA elements, which include the promoter region upstream of the point at which transcription of an RNA molecule begins, and more distant (long-range) regulatory elements. Another section of the project was devoted to testing the accessibility of the genome to the DNA-cleavage protein DNase I. These accessible regions, called DNase I hypersensitive sites, are thought to indicate specific sequences at which the binding of transcription factors and transcription-machinery proteins has caused nucleosome displacement. In addition, ENCODE catalogues the sequences and quantities of RNA transcripts, from both non-coding and protein-coding regions.

    Expression control

    WENDY A. BICKMORE

    Once the human genome had been sequenced, it became apparent that an encyclopaedic knowledge of chromatin organization would be needed if we were to understand how gene expression is regulated. The ENCODE project goes a long way to achieving this goal and highlights the pivotal role of transcription factors in sculpting the chromatin landscape.

    Although some of the analyses largely confirm conclusions from previous smaller-scale studies, this treasure trove of genome-wide data provides fresh insight into regulatory pathways and identifies prodigious numbers of regulatory elements. This is particularly so for Thurman and colleagues’ data4 regarding DNase I hypersensitive sites (DHSs) and for Gerstein and colleagues’ results6 concerning DNA binding of transcription factors. DHSs are genomic regions that are accessible to enzymatic cleavage as a result of the displacement of nucleosomes (the basic units of chromatin) by DNA-binding proteins (Fig. 1). They are the hallmark of cell-type-specific enhancers, which are often located far away from promoters.

    The ENCODE papers expose the profusion of DHSs (more than 200,000 per cell type, far outstripping the number of promoters) and their variability between cell types. Through the simultaneous presence in the same cell type of a DHS and a nearby active promoter, the researchers paired half a million enhancers with their probable target genes. But this leaves more than 2 million putative enhancers without known targets, revealing the enormous expanse of the regulatory genome landscape that is yet to be explored. Chromosome-conformation-capture methods that detect long-range physical associations between distant DNA regions are attempting to bridge this gap. Indeed, Sanyal and colleagues7 applied these techniques to survey such associations across 1% of the genome.
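The pairing idea described above (a DHS and a nearby promoter that are active in the same cell types) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the consortium's method: the Jaccard score, the 500 kb window, the 0.6 threshold, and all site and gene names are hypothetical choices made for the example.

```python
def jaccard(a, b):
    """Share of cell types where both are active, among those where either is."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def pair_dhs_with_promoters(dhs_sites, promoters,
                            max_distance=500_000, min_score=0.6):
    """Assign each DHS its best co-active promoter within max_distance.

    Both inputs map a name to (position, activity vector), where the
    activity vector holds one 0/1 entry per cell type. A DHS with no
    promoter scoring at least min_score is left unpaired (None).
    """
    pairs = {}
    for dhs_name, (dhs_pos, dhs_activity) in dhs_sites.items():
        best = None
        for gene, (prom_pos, prom_activity) in promoters.items():
            if abs(dhs_pos - prom_pos) > max_distance:
                continue  # too far away to count as "nearby"
            score = jaccard(dhs_activity, prom_activity)
            if score >= min_score and (best is None or score > best[1]):
                best = (gene, score)
        pairs[dhs_name] = best
    return pairs


# Hypothetical toy data: activity across five cell types.
promoters = {
    "geneA": (100_000, [1, 1, 0, 0, 1]),
    "geneB": (400_000, [0, 0, 1, 1, 0]),
    "geneC": (5_000_000, [1, 1, 0, 0, 1]),
}
dhs_sites = {
    "dhs1": (150_000, [1, 1, 0, 0, 1]),
    "dhs2": (300_000, [0, 0, 1, 1, 1]),
    "dhs3": (4_900_000, [0, 1, 0, 1, 0]),
}

pairs = pair_dhs_with_promoters(dhs_sites, promoters)
for name, match in pairs.items():
    print(name, "->", match)
```

In the toy data one site stays unpaired, mirroring the point that co-activity with a nearby promoter leaves many putative enhancers without an assigned target.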

    The ENCODE data start to paint a picture of the logic and architecture of transcriptional networks, in which DNA binding of a few high-affinity transcription factors displaces nucleosomes and creates a DHS, which in turn facilitates the binding of further, lower-affinity factors. The results also support the idea that transcription-factor binding can block DNA methylation (a chemical modification of DNA that affects gene expression), rather than the other way around — which is highly relevant to the interpretation of disease-associated sites of altered DNA methylation11.

    The exquisite cell-type specificity of regulatory elements revealed by the ENCODE studies emphasizes the importance of having appropriate biological material on which to test hypotheses. The researchers have focused their efforts on a set of well-established cell lines, with selected assays extended to some freshly isolated cells. Challenges for the future include following the dynamic changes in the regulatory landscape during specific developmental pathways, and understanding chromatin structure in tissues containing heterogeneous cell populations.

    Wendy A. Bickmore is in the Medical Research Council Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK.

    e-mail: wendy.bickmore@igmm.ed.ac.uk 

    11 Years Ago

    The draft human genome

    OUR GENOME UNVEILED

    Unless the human genome contains a lot of genes that are opaque to our computers, it is clear that we do not gain our undoubted complexity over worms and plants by using many more genes. Understanding what does give us our complexity — our enormous behavioural repertoire, ability to produce conscious action, remarkable physical coordination (shared with other vertebrates), precisely tuned alterations in response to external variations of the environment, learning, memory … need I go on? — remains a challenge for the future.

    David Baltimore

    From Nature 15 February 2001

    GENOME SPEAK

    With the draft in hand, researchers have a new tool for studying the regulatory regions and networks of genes. Comparisons with other genomes should reveal common regulatory elements, and the environments of genes shared with other species may offer insight into function and regulation beyond the level of individual genes. The draft is also a starting point for studies of the three-dimensional packing of the genome into a cell’s nucleus. Such packing is likely to influence gene regulation … The human genome lies before us, ready for interpretation.

    Peer Bork and Richard Copley

    From Nature 15 February 2001

    Non-coding but functional

    INÊS BARROSO

    The vast majority of the human genome does not code for proteins and, until now, did not seem to contain defined gene-regulatory elements. Why evolution would maintain large amounts of ‘useless’ DNA had remained a mystery, and seemed wasteful. It turns out, however, that there are good reasons to keep this DNA. Results from the ENCODE project2–8 show that most of these stretches of DNA harbour regions that bind proteins and RNA molecules, bringing these into positions from which they cooperate with each other to regulate the function and level of expression of protein-coding genes. In addition, it seems that widespread transcription from non-coding DNA potentially acts as a reservoir for the creation of new functional molecules, such as regulatory RNAs.

    What are the implications of these results for genetic studies of complex human traits and disease? Genome-wide association studies (GWAS), which link variations in DNA sequence with specific traits and diseases, have in recent years become the workhorse of the field, and have identified thousands of DNA variants associated with hundreds of complex traits (such as height) and diseases (such as diabetes). But association is not causality, and identifying those variants that are causally linked to a given disease or trait, and understanding how they exert such influence, has been difficult. Furthermore, most of these associated variants lie in non-coding regions, so their functional effects have remained undefined.

    The ENCODE project provides a detailed map of additional functional non-coding units in the human genome, including some that have cell-type-specific activity. In fact, the catalogue contains many more functional non-coding regions than genes. These data show that results of GWAS are typically enriched for variants that lie within such non-coding functional units, sometimes in a cell-type-specific manner that is consistent with certain traits, suggesting that many of these regions could be causally linked to disease. Thus, the project demonstrates that non-coding regions must be considered when interpreting GWAS results, and it provides a strong motivation for reinterpreting previous GWAS findings. Furthermore, these results imply that sequencing studies focusing on protein-coding sequences (the ‘exome’) risk missing crucial parts of the genome and the ability to identify true causal variants.
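What "enriched" means here can be illustrated with a short Python sketch that compares the observed share of associated variants inside functional regions with the share expected if variants were scattered uniformly. The uniform null model is a deliberately naive assumption (real analyses must account for linkage disequilibrium and genotyping-array design), and the function names and numbers are hypothetical.

```python
from math import comb


def fold_enrichment(hits, total_variants, covered_bases, genome_bases):
    """Observed fraction of variants in regions / fraction of genome covered."""
    return (hits * genome_bases) / (total_variants * covered_bases)


def binomial_tail(hits, total_variants, p):
    """P(X >= hits) for X ~ Binomial(total_variants, p), the naive null."""
    return sum(
        comb(total_variants, k) * p**k * (1 - p) ** (total_variants - k)
        for k in range(hits, total_variants + 1)
    )


# Hypothetical toy numbers: 30 of 100 variants land in regions that
# cover 10% of a 1,000-base "genome".
hits, total_variants = 30, 100
covered_bases, genome_bases = 100, 1_000

enrichment = fold_enrichment(hits, total_variants, covered_bases, genome_bases)
p_value = binomial_tail(hits, total_variants, covered_bases / genome_bases)
print(f"fold enrichment: {enrichment:.2f}, naive p-value: {p_value:.2e}")
```

A fold enrichment above 1 says only that variants cluster in the annotated regions; as the text notes, association is not causality, and the sketch does not identify which variants are causal.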

    However, although the ENCODE catalogues represent a remarkable tour de force, they contain only an initial exploration of the depths of our genome, because many more cell types must yet be investigated. Some of the remaining challenges for scientists searching for causal disease variants lie in: accessing data derived from cell types and tissues relevant to the disease under study; understanding how these functional units affect genes that may be distantly located7; and the ability to generalize such results to the entire organism.

    Inês Barroso is at the Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK, and at the University of Cambridge Metabolic Research Laboratories and NIHR Cambridge Biomedical Research Centre, Cambridge, UK.

    e-mail: ib1@sanger.ac.uk

    Evolution and the code

    JONATHAN K. PRITCHARD & YOAV GILAD

    One of the great challenges in evolutionary biology is to understand how differences in DNA sequence between species determine differences in their phenotypes. Evolutionary change may occur both through changes in protein-coding sequences and through sequence changes that alter gene regulation.

    There is growing recognition of the importance of this regulatory evolution, on the basis of numerous specific examples as well as on theoretical grounds. It has been argued that potentially adaptive changes to protein-coding sequences may often be prevented by natural selection because, even if they are beneficial in one cell type or tissue, they may be detrimental elsewhere in the organism. By contrast, because gene-regulatory sequences are frequently associated with temporally and spatially specific gene-expression patterns, changes in these regions may modify the function of only certain cell types at specific times, making it more likely that they will confer an evolutionary advantage12.

    However, until now there has been little information about which genomic regions have regulatory activity. The ENCODE project has provided a first draft of a ‘parts list’ of these regulatory elements, in a wide range of cell types, and moves us considerably closer to one of the key goals of genomics: understanding the functional roles (if any) of every position in the human genome.

    Nonetheless, it will take a great deal of work to identify the critical sequence changes in the newly identified regulatory elements that drive functional differences between humans and other species. There are some precedents for identifying key regulatory differences (see, for example, ref. 13), but ENCODE’s improved identification of regulatory elements should greatly accelerate progress in this area. The data may also allow researchers to begin to identify sequence alterations occurring simultaneously in multiple genomic regions, which, when added together, drive phenotypic change — a process called polygenic adaptation14.

    However, despite the progress brought by the ENCODE consortium and other research groups, it remains difficult to discern with confidence which variants in putative regulatory regions will drive functional changes, and what these changes will be. We also still have an incomplete understanding of how regulatory sequences are linked to target genes. Furthermore, the ENCODE project focused mainly on the control of transcription, but many aspects of post-transcriptional regulation, which may also drive evolutionary changes, are yet to be fully explored.

    Nonetheless, these are exciting times for studies of the evolution of gene regulation. With such new resources in hand, we can expect to see many more descriptions of adaptive regulatory evolution, and how this has contributed to human evolution.

    Jonathan K. Pritchard and Yoav Gilad are in the Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA. J.K.P. is also at the Howard Hughes Medical Institute, University of Chicago.

    e-mails: pritch@uchicago.edu; gilad@uchicago.edu 

    From catalogue to function

    ERAN SEGAL

    Projects that produce unprecedented amounts of data, such as the human genome project15 or the ENCODE project, present new computational and data-analysis challenges and have been a major force driving the development of computational methods in genomics. The human genome project produced one bit of information per DNA base pair, and led to advances in algorithms for sequence matching and alignment. By contrast, in its 1,640 genome-wide data sets, ENCODE provides a profile of the accessibility, methylation, transcriptional status, chromatin structure and bound molecules for every base pair. Processing the project’s raw data to obtain this functional information has been an immense effort.

    For each of the molecular-profiling methods used, the ENCODE researchers devised novel processing algorithms designed to remove outliers and protocol-specific biases, and to ensure the reliability of the derived functional information. These processing pipelines and quality-control measures have been adapted by the research community as the standard for the analysis of such data. The high quality of the functional information they produce is evident from the exquisite detail and accuracy achieved, such as the ability to observe the crystallographic topography of protein–DNA interfaces in DNase I footprints5, and the observation of more than one-million-fold variation in dynamic range in the concentrations of different RNA transcripts3.

    But beyond these individual methods for data processing, the profound biological insights of ENCODE undoubtedly come from computational approaches that integrated multiple data types. For example, by combining data on DNA methylation, DNA accessibility and transcription-factor expression, Thurman et al.4 provide fascinating insight into the causal role of DNA methylation in gene silencing. They find that transcription-factor binding sites are, on average, less frequently methylated in cell types that express those transcription factors, suggesting that binding-site methylation often results from a passive mechanism that methylates sites not bound by transcription factors.
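The kind of integration described here can be sketched as a correlation, across cell types, between a transcription factor's expression and the average methylation of its binding sites. The sketch below is a toy illustration rather than the analysis of Thurman et al.: the `pearson` helper, the variable names, and the five made-up cell-type values are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values, one entry per cell type:
# expression level of one transcription factor, and the mean
# methylation fraction across that factor's binding sites.
tf_expression = [0.1, 0.5, 2.0, 8.0, 9.5]
site_methylation = [0.85, 0.80, 0.40, 0.10, 0.05]

r = pearson(tf_expression, site_methylation)
print(f"expression vs. binding-site methylation: r = {r:.2f}")
```

A strongly negative coefficient matches the reported pattern, but a correlation alone cannot say whether binding prevents methylation or methylation prevents binding; that direction comes from the additional reasoning in the text.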

    Despite the extensive functional information provided by ENCODE, we are still far from the ultimate goal of understanding the function of the genome in every cell of every person, and across time within the same person. Even if the throughput rate of the ENCODE profiling methods increases dramatically, it is clear that brute-force measurement of this vast space is not feasible. Rather, we must move on from descriptive and correlative computational analyses, and work towards deriving quantitative models that integrate the relevant protein, RNA and chromatin components. We must then describe how these components interact with each other, how they bind the genome and how these binding events regulate transcription.

    If successful, such models will be able to predict the genome’s function at times and in settings that have not been directly measured. By allowing us to determine which assumptions regarding the physical interactions of the system lead to models that better explain measured patterns, the ENCODE data provide an invaluable opportunity to address this next immense computational challenge.

    Eran Segal is in the Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.

    e-mail: eran.segal@weizmann.ac.il

    1. The ENCODE Project Consortium Science 306, 636–640 (2004).

    2. The ENCODE Project Consortium Nature 489, 57–74 (2012).

    3. Djebali, S. et al. Nature 489, 101–108 (2012).

    4. Thurman, R. E. et al. Nature 489, 75–82 (2012).

    5. Neph, S. et al. Nature 489, 83–90 (2012).

    6. Gerstein, M. B. et al. Nature 489, 91–100 (2012).

    7. Sanyal, A., Lajoie, B., Jain, G. & Dekker, J. Nature 489, 109–113 (2012).

    8. Birney, E. et al. Nature 447, 799–816 (2007).

    9. Mardis, E. R. Nature 470, 198–203 (2011).

    10. Gonzaga-Jauregui, C., Lupski, J. R. & Gibbs, R. A. Annu. Rev. Med. 63, 35–61 (2012).

    11. Sproul, D. et al. Proc. Natl Acad. Sci. USA 108, 4364–4369 (2011).

    12. Carroll, S. B. Cell 134, 25–36 (2008).

    13. Prabhakar, S. et al. Science 321, 1346–1350 (2008).

    14. Pritchard, J. K., Pickrell, J. K. & Coop, G. Curr. Biol. 20, R208–R215 (2010).

    15. Lander, E. S. et al. Nature 409, 860–921 (2001).


    http://www.sciencemag.org SCIENCE VOL 337 7 SEPTEMBER 2012 1159

    NEWS & ANALYSIS

    When researchers first sequenced the human genome, they were astonished by how few traditional genes encoding proteins were scattered along those 3 billion DNA bases. Instead of the expected 100,000 or more genes, the initial analyses found about 35,000 and that number has since been whittled down to about 21,000. In between were megabases of “junk,” or so it seemed.

    This week, 30 research papers, including six in Nature and additional papers published by Science, sound the death knell for the idea that our DNA is mostly littered with useless bases. A decadelong project, the Encyclopedia of DNA Elements (ENCODE), has found that 80% of the human genome serves some purpose, biochemically speaking. “I don’t think anyone would have anticipated even close to the amount of sequence that ENCODE has uncovered that looks like it has functional importance,” says John A. Stamatoyannopoulos, an ENCODE researcher at the University of Washington, Seattle.

    Beyond defining proteins, the DNA bases highlighted by ENCODE specify landing spots for proteins that influence gene activity, strands of RNA with myriad roles, or simply places where chemical modifications serve to silence stretches of our chromosomes. These results are going “to change the way a lot of [genomics] concepts are written about and presented in textbooks,” Stamatoyannopoulos predicts.

    The insights provided by ENCODE into how our DNA works are already clarifying genetic risk factors for a variety of diseases and offering a better understanding of gene regulation and function. “It’s a treasure trove of information,” says Manolis Kellis, a computational biologist at Massachusetts Institute of Technology (MIT) in Cambridge who analyzed data from the project.

    The ENCODE effort has revealed that a gene’s regulation is far more complex than previously thought, being influenced by multiple stretches of regulatory DNA located both near and far from the gene itself and by strands of RNA not translated into proteins, so-called noncoding RNA. “What we found is how beautifully complex the biology really is,” says Jason Lieb, an ENCODE researcher at the University of North Carolina, Chapel Hill.

    Throughout the 1990s, various researchers called the idea of junk DNA into question. With the human genome in hand, the National Human Genome Research Institute (NHGRI) in Bethesda, Maryland, decided it wanted to find out once and for all how much of the genome was a wasteland with no functional purpose. In 2003, it funded a pilot ENCODE, in which 35 research teams analyzed 44 regions of the genome (30 million bases in all, about 1% of the total genome). In 2007, the pilot project’s results revealed that much of this DNA sequence was active in some way. The work called into serious question our gene-centric view of the genome, finding extensive RNA-generating activity beyond traditional gene boundaries (Science, 15 June 2007, p. 1556). But the question remained whether the rest of the genome was like this 1%. “We want to know what all the bases are doing,” says Yale University bioinformatician Mark Gerstein.

    Teams at 32 institutions worldwide have now carried out scores of tests, generating 1640 data sets. While the pilot phase tests depended on computer chip–like devices called microarrays to analyze DNA samples, the expanded phase benefited from the arrival of new sequencing technology, which made it cost-effective to directly read the DNA bases. Taken together, the tests present “a greater idea of what the landscape of the genome looks like,” says NHGRI’s Elise Feingold.

    Because the parts of the genome used could differ among various kinds of cells, ENCODE needed to look at DNA function in multiple types of cells and tissues. At first the goal was to study intensively three types of cells. They included GM12878, the immature white blood cell line used in the 1000 Genomes Project, a large-scale effort to catalog genetic variation across humans; a leukemia cell line called K562; and an approved human embryonic stem cell line, H1-hESC.

    As ENCODE was ramping up, new sequencing technology brought the cost of sequencing down enough to make it feasible to test extensively even more cell types. ENCODE added a liver cancer cell line, HepG2; the laboratory workhorse cancer cell line, HeLa S3; and human umbilical cord tissue to the mix. Another 140 cell types were studied to a much lesser degree.

    In these cells, ENCODE researchers closely examined which DNA bases are transcribed into RNA and then whether those strands of RNA are subsequently translated into proteins, verifying predicted protein-coding genes and more precisely locating each gene’s beginning, end, and coding regions. The latest protein-coding gene count is 20,687, with hints of about 50 more, the consortium reports in Nature. Those genes account for about 3% of the human genome, less if one counts only their coding regions. Another 11,224 DNA stretches are classified as pseudogenes, “dead” genes now known to be active in some cell types or individuals.

    GENOMICS: ENCODE Project Writes Eulogy For Junk DNA

    [Figure: Zooming in. A diagram of DNA in ever-greater detail shows how ENCODE’s various tests (gray boxes) translate DNA’s features into functional elements along a chromosome. Feature labels: hypersensitive sites; epigenetic modifications (CH3CO, CH3); long-range regulatory elements (enhancers, repressors/silencers, insulators); cis-regulatory elements (promoters, transcription factor binding sites); RNA polymerase; gene; transcript. Test labels: ChIP-seq, computational predictions and RT-PCR, RNA-seq, DNase-seq, FAIRE-seq, 5C. Credit: adapted from the ENCODE Project Consortium, PLoS Biology 9, 4 (April 2011).]

    Science, vol. 337, 7 September 2012, p. 1161 (News & Analysis). Published by AAAS. Downloaded from http://www.sciencemag.org on September 10, 2012.

    ENCODE drives home, however, that there are many “genes” out there in which DNA codes for RNA, not a protein, as the end product. The big surprise of the pilot project was that 93% of the bases studied were transcribed into RNA; in the full genome, 76% is transcribed. ENCODE defined 8800 small RNA molecules and 9600 long noncoding RNA molecules, each of which is at least 200 bases long. Thomas Gingeras of Cold Spring Harbor Laboratory in New York has found that various ones home in on different cell compartments, as if they have fixed addresses where they operate. Some go to the nucleus, some to the nucleolus, and some to the cytoplasm, for example. “So there’s quite a lot of sophistication in how RNA works,” says Ewan Birney of the European Bioinformatics Institute in Hinxton, U.K., one of the key leaders of ENCODE (see p. 1162).

    As a result of ENCODE, Gingeras and others argue that the fundamental unit of the genome and the basic unit of heredity should be the transcript (the piece of RNA decoded from DNA) and not the gene. “The project has played an important role in changing our concept of the gene,” Stamatoyannopoulos says.

    Another way to test for functionality of DNA is to evaluate whether specific base sequences are conserved between species, or among individuals in a species. Previous studies have shown that 5% of the human genome is conserved across mammals, even though ENCODE studies implied that much more of the genome is functional. So MIT’s Lucas Ward and Kellis compared functional regions newly identified by ENCODE among multiple humans, sampling from the 1000 Genomes Project. Some DNA sequences not conserved between humans and other mammals were nonetheless very much preserved across multiple people, indicating that an additional 4% of the genome is newly under selection in the human lineage, they report in a paper published online by Science (http://scim.ag/WardKellis). Two such regions were near genes for nerve growth and the development of cone cells in the eye, which underlie distinguishing traits in humans. On the flip side, they also found that some supposedly conserved regions of the human genome, as highlighted by the comparison with 29 mammals, actually varied among humans, suggesting these regions were no longer functional.

    Beyond transcription, DNA’s bases function in gene regulation through their interactions with transcription factors and other proteins. ENCODE carried out several tests to map where those proteins bind along the genome (Science, 25 May 2007, p. 1120). Two, DNase-seq and FAIRE-seq, gave an overview of the genome, identifying where the protein-DNA complex chromatin unwinds and a protein can hook up with the DNA, and were applied to multiple cell types. ENCODE’s DNase-seq found 2.89 million such sites in 125 cell types. Stamatoyannopoulos and his colleagues describe their more extensive DNase-seq studies in Science (p. 1190): His team examined 349 types of cells, including 233 60- to 160-day-old fetal tissue samples. Each type of cell had about 200,000 accessible locations, and there seemed to be at least 3.9 million regions where transcription factors can bind in the genome. Across all cell types, about 42% of the genome can be accessible, he and his colleagues report. In many cases, the assays were able to pinpoint the specific bases involved in binding.

    Last year, Stamatoyannopoulos showed that these newly discovered functional regions sometimes overlap with specific DNA bases linked to higher or lower risks of various diseases, suggesting that the regulation of genes might be at the heart of these risk variations (Science, 27 May 2011, p. 1031). The work demonstrated how researchers could use ENCODE data to come up with new hypotheses about the link between genetics and a particular disorder. (The ENCODE analysis found that 12% of these bases, or SNPs, colocate with transcription factor binding sites and 34% are in open chromatin defined by the DNase-seq tests.) Now, in their new work published in Science, Stamatoyannopoulos’s lab has linked those regulatory regions to their specific target genes, homing in on the risk-enhancing ones. In addition, the group finds it can predict the cell type involved in a given disease. For example, the analysis fingered two types of T cells as pathogenic in Crohn’s disease, both of which are involved in this inflammatory bowel disorder. “We are informing disease studies in a way that would be very hard to do otherwise,” Birney says.
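The colocation figures quoted above (the share of disease-linked SNPs that fall inside transcription factor binding sites or open chromatin) come down to an interval-overlap count. The Python sketch below is only an illustration of that idea, not the consortium's pipeline: the function name `fraction_in_intervals` and all coordinates are made up, and it assumes a single chromosome with intervals that are half-open, sorted by start, and non-overlapping.

```python
from bisect import bisect_right


def fraction_in_intervals(snp_positions, intervals):
    """Return the fraction of SNP positions that fall inside any interval.

    Assumptions (hypothetical, for illustration only):
    - all positions and intervals are on the same chromosome;
    - intervals are (start, end) pairs, half-open [start, end),
      sorted by start and non-overlapping.
    """
    if not snp_positions:
        return 0.0
    starts = [start for start, _ in intervals]
    hits = 0
    for pos in snp_positions:
        # Index of the last interval whose start is <= pos (or -1 if none).
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits / len(snp_positions)


# Toy, made-up coordinates standing in for DNase-seq open-chromatin regions.
open_chromatin = [(100, 200), (500, 650), (900, 1000)]
# Toy, made-up SNP positions.
snps = [50, 150, 199, 200, 640, 700, 950, 1200]

print(fraction_in_intervals(snps, open_chromatin))
```

With real data the same count would be run per chromosome, typically with a dedicated interval library rather than hand-rolled code; the binary search here just keeps each lookup logarithmic in the number of regions.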

    Another test, called ChIP-seq, uses an antibody to home in on a particular DNA-binding protein and helps pinpoint the locations along the genome where that protein works. To date, ENCODE has examined about 100 of the 1500 or so transcription factors and about 20 other DNA-binding proteins, including those involved in modifying the chromatin-associated proteins called histones. The binding sites found through ChIP-seq coincided with the sites mapped through FAIRE-seq and DNase-seq. Overall, 8% of the genome falls within a transcription factor binding site, a percentage that is expected to double once more transcription factors have been tested.

    Yale’s Gerstein used these results to figure out all the interactions among the transcription factors studied and came up with a network view of how these regulatory proteins work. These transcription factors formed a three-layer hierarchy, with the ones at the top having the broadest effects and the ones in the middle working together to coregulate a common target gene, he and his colleagues report in Nature.

    Using a technique called 5C, other researchers looked for places where DNA from distant regions of a chromosome, or even different chromosomes, interacted. It found that an average of 3.9 distal stretches of DNA linked up with the beginning of each gene. “Regulation is a 3D puzzle that has to be put together,” Gingeras says. “That’s what ENCODE is putting out on the table.”

    To date, NHGRI has put $288 million toward ENCODE, including the pilot project, technology development, and ENCODE efforts for the mouse, nematode, and fruit fly. All together, more than 400 papers have been published by ENCODE researchers. Another 110 or more studies have used ENCODE data, says NHGRI molecular biologist Michael Pazin. Molecular biologist Mathieu Lupien of the University of Toronto in Canada authored one of those papers, a study looking at epigenetics and cancer. “ENCODE data were fundamental” to the work, he says. “The cost is definitely worth every single dollar.”

    –ELIZABETH PENNISI

    ENCODE By the Numbers

    147 cell types studied

    80% functional portion of human genome

    20,687 protein-coding genes

    18,400 RNA genes

    1640 data sets

    30 papers published this week

    442 researchers

    $288 million funding for pilot, technology, model organism, and current project



    http://www.nature.com/encode/
