Human Variome Project: encyclopedic catalog of sequence variants indexed to the human genome sequence

Reporter: Aviva Lev-Ari, PhD, RN

Article ID #4: Human Variome Project: encyclopedic catalog of sequence variants indexed to the human genome sequence. Published on 11/24/2012

WordCloud Image Produced by Adam Tubman

 

What is the Human Variome Project?

Abstract

The successor to the Human Genome Project intends to establish, by international cooperation, an encyclopedic catalog of sequence variants indexed to the human genome sequence.

Introduction

Genomics is not just for rich countries any more. Anyone can contribute to the Human Variome Project (HVP; see Commentary, page 433). Indeed, the project might just be ambitious enough that everyone really will need to contribute. By stating that all human genetics and genomics contributes to a single aim, the HVP essentially reduces duplication of effort while increasing credit for participation.

However, it will have to find ways to coordinate the disparate activities of clinicians, researchers, database curators and bioinformaticians by providing the means and incentives to lodge the variants they have found in public databases. Variome aims to get all to use compatible nomenclature and phenotype reporting systems and to index variant and phenotype data to gene models in the coordinate system generated by the Human Genome Project. Automation and expert curation, and open comment and expert review, will all have a place in this endeavor. How will we do this without creating more than a necessary minimum of new databases, procedures and bureaucracy?

A very important point, but a tough one to get across, is that much of the necessary work is currently happening across the globe—but is just insufficiently coordinated. The individuals already hard at work aren’t getting the credit they deserve. In a sense, the rest of the world’s geneticists deserve the kind of service that US researchers receive from the excellent coordinating work of the National Human Genome Research Institute and the repositories of the National Center for Biotechnology Information (NCBI), together with the kind of attention afforded by international journals. If only these kinds of coordination, recording and attention could be brought to bear, however briefly, on publication units as small as single instances of a variant gene! Thus, Variome aims to add value to databases such as OMIM, GenBank, dbSNP, dbGAP and the HapMap and organizations including NCBI and the European Bioinformatics Institute (EBI) by working with them all. It will start gene by gene, evaluating variants already found and curated for mendelian diseases, and will add rare and common variants in common diseases as they are reported. As it does so, HVP participants will develop mechanisms to expedite and automate reporting of variants and their occurrence.

In the consensus-building exercise of the first Human Variome meeting (page 433), delegates constructed a wish list of recommendations that numerically exceeded the number of participants at the meeting. We think that two points emerge as particularly important to the success of the project: publication and credit.

To be successful in persuading clinical and diagnostic laboratories to contribute variations and persuading researchers to evaluate the pathogenic potential of each variant, the HVP will need to introduce publishing innovations at both ends of the citation spectrum. It will need to track the citation of each variant’s accession code in papers, database entries and across the web. This closing of the online publication loop might be termed microattribution. Perhaps existing journals could be persuaded to take responsibility for monitoring and highlighting the citation of database entries in their papers, so that the HVP can readily aggregate this information. A journal devoted to the human variome could commission peer-reviewed, gene-based synopses of mendelian mutations based on information in locus-specific databases (see pages 425 and 427), meta-analyses of association studies and resequencing data such as those reported by Jonathan Cohen and colleagues in this issue (page 513, with News and Views on page 439). Phenotypic and diagnostic information might be linked to these synopses from existing databases such as the dysmorphology databases, PharmGKB (page 426) and GeneTests (http://www.genetests.org). Genome browsers including Ensembl and UCSC might then be persuaded to display a Variome track. We envisage such synopses to be a gene-based extension of the disease-based annual synopses for association studies we proposed last year (Nat. Genet. 38, 1; 2006). The first of these, on Alzheimer disease, was published by Lars Bertram and colleagues (Nat. Genet. 39, 17–23; 2007) using their newly created AlzGene database.
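As a rough illustration of the microattribution idea described above, the following Python sketch counts how many documents cite each variant accession code and credits those citations to whoever submitted the variant. It assumes, purely for illustration, that variants are cited by dbSNP-style rs identifiers ("rs" followed by digits). The sample texts, the submitter registry and the function names are hypothetical and are not part of any HVP system.

```python
import re
from collections import Counter

# dbSNP reference SNP identifiers are written as "rs" followed by digits.
RS_ID = re.compile(r"\brs\d+\b")


def count_variant_citations(documents):
    """Count, for each accession code, the number of documents citing it."""
    counts = Counter()
    for text in documents:
        # A set ensures repeated mentions within one document count once.
        counts.update(set(RS_ID.findall(text)))
    return counts


def credit_submitters(citation_counts, submitter_of):
    """Aggregate citation counts per submitter (the 'microattribution')."""
    credit = Counter()
    for accession, n in citation_counts.items():
        submitter = submitter_of.get(accession)
        if submitter is not None:
            credit[submitter] += n
    return credit


# Hypothetical inputs: two short texts and an invented submitter registry.
documents = [
    "We genotyped rs123 and rs456 in the cohort; rs123 was associated.",
    "Replication of rs123 failed in this population.",
]
submitter_of = {"rs123": "Lab A", "rs456": "Lab B"}

citations = count_variant_citations(documents)
credit = credit_submitters(citations, submitter_of)
print(dict(citations))
print(dict(credit))
```

A real system would presumably need to recognise several accession formats and resolve them against the source databases; the sketch only shows the aggregation step that turns individual citations into per-contributor credit.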

Which genes should the HVP annotate first to demonstrate the utility and impact of its coordinating activities? Perhaps we can learn from one of the most impressive recent exercises in evidence-based medicine: namely, the American College of Medical Genetics’ systematic prioritization of genes for newborn screening (http://mchb.hrsa.gov/screening/). Variome synopses would take into account the prevalence, seriousness and treatability of the clinical condition(s), the value added by combining all three types of genetic study listed above and the availability of all three kinds of evidence in existing laboratories, databases and publications.

There are, inevitably, limits to what can be achieved by a gene-based view of human variation. Gene models are revised and re-annotated, and structural genomic variation plays havoc with reference genome builds and the context within which point variants and haplotypes are found. Physicians and the general public will want a disease-based view—and the associated diagnostic genetic tests, rather than genome annotation. Delaying the appearance of such alternative views, there is often a many-to-many correspondence between genes and disease phenotypes. On the brighter side, this complexity should provide good business for database designers and review journals.

As the participants of the Variome meeting note in their Commentary, the effort to index and evaluate all of human variation will provide many new opportunities in genomics for researchers whose home countries did not participate in the initial human genome sequencing project. They are right that this is both the project and the time to achieve the globalization of genomics.

SOURCE:

Nature Genetics 39, 423 (2007)
doi:10.1038/ng0407-423

Our Vision for the Future

Imagine you are sick. For many, this is not a difficult task. Now imagine you are sick and none of your doctors know why. Your symptoms suggest that you have a rare genetic disease, and you’ve been tested for a mutation in the gene responsible, but the results are inconclusive. The laboratory found a change in your genetic sequence, but is unable to definitively state that it’s what’s causing your symptoms. And with no definitive result from the test, your doctor, and your insurance company, are unwilling to prescribe the expensive course of drugs needed to control your symptoms.

While many people might be willing to dismiss the chances of this happening to them, when you start to look at the facts, things start to get a little frightening. There are over 6,000 diseases that can be caused by a mutation in a single gene and it is estimated that 1 child in every 200 born will suffer from one of these diseases. Add to that the number of cancers that have an inherited genetic component, and the chances of you, or someone you know, being in this position are quite high.

Now imagine that the information the laboratory and your doctor needed to make an accurate diagnosis was out there, but it wasn’t accessible to them: it was hidden away in an obscure academic paper, or in some researcher’s forgotten notes.

Unfortunately, this is the situation that is currently facing thousands of people across the globe who are suffering the devastating effects of genetic illnesses.

The role that our genes play in our health and well-being is well known. The genetic makeup of an individual can cause a host of genetic disorders that can manifest from early childhood (cystic fibrosis, Prader-Willi Syndrome, Fragile X Syndrome) to adulthood (Alzheimer’s disease, polycystic kidney disease, Huntington’s disease) as well as significantly increase the risk of contracting more common diseases such as schizophrenia, diabetes, depression and cancer.

The world is rapidly moving towards an era where it is both economically and scientifically feasible to sequence the genome of every patient presenting with a chronic condition; already in the past decade the cost of a whole-genome sequence has dropped from several billion dollars to a few thousand.

But being able to sequence the genome of a patient cheaply and easily will be useless if we are unable to determine if the variations present in a sequence have an effect on human health. We are suffering from a critical lack of information about the consequences of the vast majority of the mutations possible within the human genome. And, even more concerning, is the fact that even when that information exists, it is not being shared and captured by the global medical research community in a manner that guarantees widespread dissemination and long-term preservation.

The Human Variome Project is trying to change this. We strongly believe in the free and open sharing of information on genetic variation and its consequences and are dedicated to developing and maintaining the standards, systems and infrastructure that will embed information sharing into routine clinical practice. We envision a world where the availability of, and access to, genetic variation information is not an impediment to diagnosis and treatment; where the burden of genetic disease on the human population is significantly decreased; where never again will a doctor have to look at a genetic sequence and ask, “What does this change mean for my patient?”

The Human Variome Project is motivated by the knowledge that by working together, we will be able to significantly reduce the needless physical, psychological, emotional and economic suffering of millions of people.

SOURCE:

http://www.humanvariomeproject.org/index.php/about/our-vision-for-the-future

Human Variome Project International Limited is a not-for-profit Australian public company limited by guarantee that was founded in 2010 to provide central coordination for the global Human Variome Project and to run the International Coordinating Office. The company has no shareholders and is endorsed by the Australian Taxation Office as a deductible gift recipient as a Health Project Charity.

Human Variome Project International Limited, as a company limited by guarantee, is a public unlisted company. It must file accounts annually with the Australian Securities and Investments Commission, it must be audited and, as a public company, the directors and officers of the company must comply with all the duties and responsibilities set out in the Australian Corporations Act. UNESCO also stipulates strict conditions for compliance with its functions and operation as a non-government and non-profit making organisation.

Human Variome Project International’s objects and powers include:

  • to promote the prevention or the control of diseases in human beings
  • to develop and provide educational programs, training and courses in public administration, public sector management, public policy, public affairs and any other related fields
  • to alleviate human suffering by collecting, organising and sharing data on genetic variation
  • to further the Human Variome Project
  • to act as the co-ordinating office for the Human Variome Project
  • to attract and employ academics, researchers, practitioners and other staff as required to provide and support the services to further the objects of the Company
  • to provide facilities for research, study and education related to the Human Variome Project
  • to carry out and conduct the business of provider of administrative and consulting services
  • to seek, encourage and accept gifts, grants, donations or endorsements
  • to affiliate with and enter into co-operative agreements with research educational institutions, government, local governments, practitioner bodies, non-government organisations, commercial, cultural and any other institutions or bodies

Company Members

  • Mr David Abraham
  • Professor Richard Cotton
  • Sir John Burn
  • Dr David Rimoin
  • Dr Eric Haan
  • Professor Jean-Jacques Cassiman
  • (representative of) National Institute of Gene Science and Technology Development (China)

SOURCE:
http://www.humanvariomeproject.org/index.php?option=com_content&view=article&id=164&Itemid=152

Scientific Advisory Committee

The Board of Directors is advised by the Scientific Advisory Committee in matters of strategic scientific direction for current and future projects. The Scientific Advisory Committee has a variety of roles and responsibilities, as well as the delegated authority of the Board of Directors on the publication of all HVP Standards and Guidelines, and the arbitration of any dispute resolution processes in the generation of HVP Standards and Guidelines.

The Scientific Advisory Committee consists of twelve members including one Chair. The Scientific Advisory Committee members are elected by the two Advisory Councils every two years, with half the positions on the Committee becoming vacant every two years. The Chair of the Scientific Advisory Committee is appointed by the Coordinating Office from among the members of the Scientific Advisory Committee. Membership of the Committee, in an ex-officio capacity, is also extended to:

  • the Scientific Director of the Human Variome Project Coordinating Office;
  • the President of the Human Genome Variation Society;
  • the President of the International Federation of Human Genetics Societies; and
  • a representative from the central genetic databases, chosen from amongst themselves.

Any Individual Member of the Human Variome Project Consortium is eligible to stand for election to the Scientific Advisory Committee. Candidates must be nominated and seconded by a member of either of the Advisory Councils.

The Scientific Advisory Committee meets on a face-to-face basis once per year, usually in conjunction with the HVP Fora series. The Scientific Advisory Committee also regularly meets via telephone/video-conference.

Current Committee

  • Arleen Auerbach (The Rockefeller University, USA)
  • Mireille Claustres (IURC, Institut Universitaire Clinical Research, France)
  • Richard Cotton (Human Variome Project, Australia)
  • Garry Cutting (Johns Hopkins School of Medicine, USA)
  • Johan T. den Dunnen (Leiden University Medical Center, The Netherlands)
  • Mona El Ruby (National Research Centre, Egypt)
  • Aida Falcón de Vargas (Venezuelan Central University, Venezuela)
  • Marc Greenblatt (University of Vermont, USA)
  • Stephen Lam (Hong Kong Department of Health, Hong Kong)
  • Finlay Macrae (The Royal Melbourne Hospital, Australia)
  • Yoichi Matsubara (Tohoku University School of Medicine, Japan)
  • Gert-Jan B. van Ommen (Leiden University Medical Center, The Netherlands)
  • Mauno Vihinen (Lund University, Sweden)

Non-Voting Members

  • Professor Sir John Burn (National Institute of Health Research, UK)
  • Ming Qi (Zhejiang University Medical School and James Watson Institute of Genome Sciences, China)
  • Richard Gibbs (Baylor College of Medicine, USA)

Document Repository

Documents (minutes, etc.) relating to the International Scientific Advisory Committee can be found here.

SOURCE:

http://www.humanvariomeproject.org/index.php/about/scientific-advisory-committee

Nature Genetics Journal

Table of contents

November 2012, Volume 44, No. 11, pp. 1171–1285

  • Credit for clinical trial data, p. 1171

News and Views

Tracking the evolution of cancer methylomes, pp. 1173–1174

Arnaud R Krebs & Dirk Schübeler

doi:10.1038/ng.2451

Cellular transformation in cancer has long been associated with aberrant DNA methylation, most notably, hypermethylation of promoter sequences. A new study uses a clever approach of selective high-resolution profiling to follow DNA methylation over a time course of cellular transformation and challenges the notion that hypermethylation in cancer arises in an orchestrated fashion.

See also: Article by Landan et al.

Older males beget more mutations, pp. 1174–1176

Matthew Hurles

doi:10.1038/ng.2448

Three papers characterizing human germline mutation rates bolster evidence for a relatively low rate of base substitution in modern humans and highlight a central role for paternal age in determining rates of mutation. These studies represent the advent of a transformation in our understanding of mutation rates and processes, which may ultimately have public health implications.

See also: Letter by Campbell et al.

FOXA1 and breast cancer risk, pp. 1176–1177

Kerstin B Meyer & Jason S Carroll

doi:10.1038/ng.2449

Many SNPs associated with human disease are located in non-coding regions of the genome. A new study shows that SNPs associated with breast cancer risk are located in enhancer regions and alter binding affinity for the pioneer factor FOXA1.

See also: Article by Cowper-Sal·lari et al.

Recurrent somatic TET2 mutations in normal elderly individuals with clonal hematopoiesis, pp. 1179–1181

Lambert Busque, Jay P Patel, Maria E Figueroa, Aparna Vasanthakumar, Sylvie Provost, Zineb Hamilou, Luigina Mollica, Juan Li, Agnes Viale, Adriana Heguy, Maryam Hassimi, Nicholas Socci, Parva K Bhatt, Mithat Gonen, Christopher E Mason, Ari Melnick, Lucy A Godley, Cameron W Brennan, Omar Abdel-Wahab & Ross L Levine

doi:10.1038/ng.2413

Ross Levine, Lambert Busque and colleagues report the identification of recurrent somatic mutations in TET2 in elderly female individuals with clonal hematopoiesis. The mutations were identified in individuals without clinically apparent hematological malignancies.

Genome-wide association study identifies a common variant in RAD51B associated with male breast cancer risk, pp. 1182–1184

Nick Orr, Alina Lemnrau, Rosie Cooke, Olivia Fletcher, Katarzyna Tomczyk, Michael Jones, Nichola Johnson, Christopher J Lord, Costas Mitsopoulos, Marketa Zvelebil, Simon S McDade, Gemma Buck, Christine Blancher, KConFab Consortium, Alison H Trainer, Paul A James, Stig E Bojesen, Susanne Bokmand, Heli Nevanlinna, Johanna Mattson, Eitan Friedman, Yael Laitman, Domenico Palli, Giovanna Masala, Ines Zanna, Laura Ottini, Giuseppe Giannini, Antoinette Hollestelle, Ans M W van den Ouweland, Srdjan Novaković, Mateja Krajc, Manuela Gago-Dominguez, Jose Esteban Castelao, Håkan Olsson, Ingrid Hedenfalk, Douglas F Easton, Paul D P Pharoah, Alison M Dunning, D Timothy Bishop, Susan L Neuhausen, Linda Steele, Richard S Houlston, Montserrat Garcia-Closas, Alan Ashworth & Anthony J Swerdlow

doi:10.1038/ng.2417

Nick Orr and colleagues report a genome-wide association study for male breast cancer. They identify a new susceptibility locus at RAD51B and examine association evidence for known female breast cancer loci in these cohorts.

A common single-nucleotide variant in T is strongly associated with chordoma, pp. 1185–1187

Nischalan Pillay, Vincent Plagnol, Patrick S Tarpey, Samira B Lobo, Nadège Presneau, Karoly Szuhai, Dina Halai, Fitim Berisha, Stephen R Cannon, Simon Mead, Dalia Kasperaviciute, Jutta Palmen, Philippa J Talmud, Lars-Gunnar Kindblom, M Fernanda Amary, Roberto Tirabosco & Adrienne M Flanagan

doi:10.1038/ng.2419

Adrienne Flanagan and colleagues identify a common variant in the T gene associated with strong risk of chordoma, a rare malignant bone tumor. The risk variant alters an amino acid in the DNA-binding domain of the T transcription factor and is associated with differential expression of T and its downstream targets.

Missense mutations in the sodium-gated potassium channel gene KCNT1 cause severe autosomal dominant nocturnal frontal lobe epilepsy, pp. 1188–1190

Sarah E Heron, Katherine R Smith, Melanie Bahlo, Lino Nobili, Esther Kahana, Laura Licchetta, Karen L Oliver, Aziz Mazarib, Zaid Afawi, Amos Korczyn, Giuseppe Plazzi, Steven Petrou, Samuel F Berkovic, Ingrid E Scheffer & Leanne M Dibbens

doi:10.1038/ng.2440

Samuel Berkovic and colleagues report the identification of missense mutations in KCNT1, which encodes a sodium-gated potassium channel, that cause severe autosomal dominant nocturnal frontal lobe epilepsy.

Articles

Breast cancer risk–associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression, pp. 1191–1198

Richard Cowper-Sal·lari, Xiaoyang Zhang, Jason B Wright, Swneke D Bailey, Michael D Cole, Jerome Eeckhoute, Jason H Moore & Mathieu Lupien

doi:10.1038/ng.2416

Mathieu Lupien, Jason Moore and colleagues show that breast cancer risk–associated SNPs commonly disrupt the binding of FOXA1 to chromatin, thereby directly affecting gene expression.

See also: News and Views by Meyer & Carroll

LIN28B induces neuroblastoma and enhances MYCN levels via let-7 suppression, pp. 1199–1206

Jan J Molenaar, Raquel Domingo-Fernández, Marli E Ebus, Sven Lindner, Jan Koster, Ksenija Drabek, Pieter Mestdagh, Peter van Sluis, Linda J Valentijn, Johan van Nes, Marloes Broekmans, Franciska Haneveld, Richard Volckmann, Isabella Bray, Lukas Heukamp, Annika Sprüssel, Theresa Thor, Kristina Kieckbusch, Ludger Klein-Hitpass, Matthias Fischer, Jo Vandesompele, Alexander Schramm, Max M van Noesel, Luigi Varesio, Frank Speleman, Angelika Eggert, Raymond L Stallings, Huib N Caron, Rogier Versteeg & Johannes H Schulte

doi:10.1038/ng.2436

Jan Molenaar and colleagues show that LIN28B is overexpressed and amplified in human neuroblastomas and that LIN28B regulates let-7 family miRNAs and MYCN. They create a transgenic mouse model of LIN28B overexpression and show that these mice develop neuroblastoma tumors.

Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues, pp. 1207–1214

Gilad Landan, Netta Mendelson Cohen, Zohar Mukamel, Amir Bar, Alina Molchadsky, Ran Brosh, Shirley Horn-Saban, Daniela Amann Zalcenstein, Naomi Goldfinger, Adi Zundelevich, Einav Nili Gal-Yam, Varda Rotter & Amos Tanay

doi:10.1038/ng.2442

Amos Tanay and colleagues characterize DNA methylation polymorphism within cell populations and track immortalized fibroblasts in culture for over 300 generations to show that formation of differentially methylated regions occurs through a stochastic process and nearly deterministic epigenetic remodeling.

See also: News and Views by Krebs & Schübeler

Intracontinental spread of human invasive Salmonella Typhimurium pathovariants in sub-Saharan Africa, pp. 1215–1221

Chinyere K Okoro, Robert A Kingsley, Thomas R Connor, Simon R Harris, Christopher M Parry, Manar N Al-Mashhadani, Samuel Kariuki, Chisomo L Msefula, Melita A Gordon, Elizabeth de Pinna, John Wain, Robert S Heyderman, Stephen Obaro, Pedro L Alonso, Inacio Mandomando, Calman A MacLennan, Milagritos D Tapia, Myron M Levine, Sharon M Tennant, Julian Parkhill & Gordon Dougan

doi:10.1038/ng.2423

Gordon Dougan and colleagues report whole-genome sequencing of a global collection of 179 Salmonella Typhimurium isolates, including 129 diverse sub-Saharan African isolates associated with invasive disease. They determine the phylogenetic structure of invasive Salmonella Typhimurium in sub-Saharan Africa and find that the majority are from two closely related highly conserved lineages, which emerged in the last 60 years in close temporal association with the current HIV epidemic.

Letters

Genome-wide association study identifies eight new susceptibility loci for atopic dermatitis in the Japanese population, pp. 1222–1226

Tomomitsu Hirota, Atsushi Takahashi, Michiaki Kubo, Tatsuhiko Tsunoda, Kaori Tomita, Masafumi Sakashita, Takechiyo Yamada, Shigeharu Fujieda, Shota Tanaka, Satoru Doi, Akihiko Miyatake, Tadao Enomoto, Chiharu Nishiyama, Nobuhiro Nakano, Keiko Maeda, Ko Okumura, Hideoki Ogawa, Shigaku Ikeda, Emiko Noguchi, Tohru Sakamoto, Nobuyuki Hizawa, Koji Ebe, Hidehisa Saeki, Takashi Sasaki, Tamotsu Ebihara, Masayuki Amagai, Satoshi Takeuchi, Masutaka Furue, Yusuke Nakamura & Mayumi Tamari

doi:10.1038/ng.2438

Mayumi Tamari and colleagues report a genome-wide association study for atopic dermatitis, a chronic inflammatory skin disease, in a Japanese population. They identify eight new susceptibility loci for atopic dermatitis and compare their results to those of previous studies in European and Chinese populations.

CSK regulatory polymorphism is associated with systemic lupus erythematosus and influences B-cell signaling and activation, pp. 1227–1230

Nataly Manjarrez-Orduño, Emiliano Marasco, Sharon A Chung, Matthew S Katz, Jenna F Kiridly, Kim R Simpfendorfer, Jan Freudenberg, David H Ballard, Emil Nashi, Thomas J Hopkins, Deborah S Cunninghame Graham, Annette T Lee, Marieke J H Coenen, Barbara Franke, Dorine W Swinkels, Robert R Graham, Robert P Kimberly, Patrick M Gaffney, Timothy J Vyse, Timothy W Behrens, Lindsey A Criswell, Betty Diamond & Peter K Gregersen

doi:10.1038/ng.2439

Peter Gregersen and colleagues identify a regulatory variant in CSK, coding for an intracellular kinase that physically interacts with Lyp (PTPN22), associated with systemic lupus erythematosus (SLE). Their work suggests that the Lyp-Csk complex influences susceptibility to SLE through regulation of B-cell signaling, maturation and activation.

Genome-wide association study in Chinese men identifies two new prostate cancer risk loci at 9q31.2 and 19q13.4, pp. 1231–1235

Jianfeng Xu, Zengnan Mo, Dingwei Ye, Meilin Wang, Fang Liu, Guangfu Jin, Chuanliang Xu, Xiang Wang, Qiang Shao, Zhiwen Chen, Zhihua Tao, Jun Qi, Fangjian Zhou, Zhong Wang, Yaowen Fu, Dalin He, Qiang Wei, Jianming Guo, Denglong Wu, Xin Gao, Jianlin Yuan, Gongxian Wang, Yong Xu, Guozeng Wang, Haijun Yao, Pei Dong, Yang Jiao, Mo Shen, Jin Yang, Jun Ou-Yang, Haowen Jiang, Yao Zhu, Shancheng Ren, Zhengdong Zhang, Changjun Yin, Xu Gao, Bo Dai, Zhibin Hu, Yajun Yang, Qijun Wu, Hongyan Chen, Peng Peng, Ying Zheng, Xiaodong Zheng, Yongbing Xiang, Jirong Long, Jian Gong, Rong Na, Xiaoling Lin, Hongjie Yu, Zhong Wang, Sha Tao, Junjie Feng, Jishan Sun, Wennuan Liu, Ann Hsing, Jianyu Rao, Qiang Ding, Fredirik Wiklund, Henrik Gronberg, Xiao-Ou Shu, Wei Zheng, Hongbing Shen, Li Jin, Rong Shi, Daru Lu, Xuejun Zhang, Jielin Sun, S Lilly Zheng & Yinghao Sun

doi:10.1038/ng.2424

Yinghao Sun and colleagues report a genome-wide association study for prostate cancer in Han Chinese men. They identify two new risk-associated loci at chromosomes 9q31 and 19q13.

Epigenomic analysis detects widespread gene-body DNA hypomethylation in chronic lymphocytic leukemia, pp. 1236–1242

Marta Kulis, Simon Heath, Marina Bibikova, Ana C Queirós, Alba Navarro, Guillem Clot, Alejandra Martínez-Trillos, Giancarlo Castellano, Isabelle Brun-Heath, Magda Pinyol, Sergio Barberán-Soler, Panagiotis Papasaikas, Pedro Jares, Sílvia Beà, Daniel Rico, Simone Ecker, Miriam Rubio, Romina Royo, Vincent Ho, Brandy Klotzle, Lluis Hernández, Laura Conde, Mónica López-Guerra, Dolors Colomer, Neus Villamor, Marta Aymerich, María Rozman, Mónica Bayes, Marta Gut, Josep L Gelpí, Modesto Orozco, Jian-Bing Fan, Víctor Quesada, Xose S Puente, David G Pisano, Alfonso Valencia, Armando López-Guillermo, Ivo Gut, Carlos López-Otín, Elías Campo & José I Martín-Subero

doi:10.1038/ng.2443

José Martin-Subero and colleagues report whole-genome bisulfite sequencing and methylome analysis of two CLLs and three B-cell subpopulations using high-density microarrays on 139 CLLs. They identify widespread hypomethylation in the gene body that is largely associated with intragenic enhancer elements.

Mutations in ADAR1 cause Aicardi-Goutières syndrome associated with a type I interferon signature –pp1243 – 1248

Gillian I Rice, Paul R Kasher, Gabriella M A Forte, Niamh M Mannion, Sam M Greenwood, Marcin Szynkiewicz, Jonathan E Dickerson, Sanjeev S Bhaskar, Massimiliano Zampini, Tracy A Briggs, Emma M Jenkinson, Carlos A Bacino, Roberta Battini, Enrico Bertini, Paul A Brogan, Louise A Brueton, Marialuisa Carpanelli, Corinne De Laet, Pascale de Lonlay, Mireia del Toro, Isabelle Desguerre, Elisa Fazzi, Àngels Garcia-Cazorla, Arvid Heiberg, Masakazu Kawaguchi, Ram Kumar, Jean-Pierre S-M Lin, Charles M Lourenco, Alison M Male, Wilson Marques Jr, Cyril Mignot, Ivana Olivieri, Simona Orcesi, Prab Prabhakar, Magnhild Rasmussen, Robert A Robinson, Flore Rozenberg, Johanna L Schmidt, Katharina Steindl, Tiong Y Tan, William G van der Merwe, Adeline Vanderver, Grace Vassallo, Emma L Wakeling, Evangeline Wassmer, Elizabeth Whittaker, John H Livingston, Pierre Lebon, Tamio Suzuki, Paul J McLaughlin, Liam P Keegan, Mary A O’Connell, Simon C Lovell & Yanick J Crow

doi:10.1038/ng.2414

Yanick Crow and colleagues show that mutations in ADAR1 cause the autoimmune disorder Aicardi-Goutières syndrome, accompanied by upregulation of interferon-stimulated genes. ADAR1 encodes an enzyme that catalyzes the deamination of adenosine to inosine in double-stranded RNA, and the findings suggest a possible role for RNA editing in limiting the accumulation of repeat-derived RNA species.

Mutations in the TGF-β repressor SKI cause Shprintzen-Goldberg syndrome with aortic aneurysm –pp1249 – 1254

Alexander J Doyle, Jefferson J Doyle, Seneca L Bessling, Samantha Maragh, Mark E Lindsay, Dorien Schepers, Elisabeth Gillis, Geert Mortier, Tessa Homfray, Kimberly Sauls, Russell A Norris, Nicholas D Huso, Dan Leahy, David W Mohr, Mark J Caulfield, Alan F Scott, Anne Destrée, Raoul C Hennekam, Pamela H Arn, Cynthia J Curry, Lut Van Laer, Andrew S McCallion, Bart L Loeys & Harry C Dietz

doi:10.1038/ng.2421

Harry Dietz and colleagues report the identification of mutations in SKI in Shprintzen-Goldberg syndrome, which shares features with Marfan syndrome and Loeys-Dietz syndrome. SKI encodes a known repressor of TGF-β activity, and this work provides evidence for paradoxical increased TGF-β signaling as the mechanism underlying these related syndromes.

De novo gain-of-function KCNT1 channel mutations cause malignant migrating partial seizures of infancy –pp1255 – 1259

Giulia Barcia, Matthew R Fleming, Aline Deligniere, Valeswara-Rao Gazula, Maile R Brown, Maeva Langouet, Haijun Chen, Jack Kronengold, Avinash Abhyankar, Roberta Cilio, Patrick Nitschke, Anna Kaminska, Nathalie Boddaert, Jean-Laurent Casanova, Isabelle Desguerre, Arnold Munnich, Olivier Dulac, Leonard K Kaczmarek, Laurence Colleaux & Rima Nabbout

doi:10.1038/ng.2441

Rima Nabbout and colleagues report the identification of de novo mutations in the KCNT1 potassium channel gene in individuals with malignant migrating partial seizures of infancy, a rare epileptic encephalopathy with pharmacoresistant seizures and developmental delay. The authors show that the mutations have a gain-of-function effect on KCNT1 channel activity.

CHMP1A encodes an essential regulator of BMI1-INK4A in cerebellar development –pp1260 – 1264

Ganeshwaran H Mochida, Vijay S Ganesh, Maria I de Michelena, Hugo Dias, Kutay D Atabay, Katie L Kathrein, Hsuan-Ting Huang, R Sean Hill, Jillian M Felie, Daniel Rakiec, Danielle Gleason, Anthony D Hill, Athar N Malik, Brenda J Barry, Jennifer N Partlow, Wen-Hann Tan, Laurie J Glader, A James Barkovich, William B Dobyns, Leonard I Zon & Christopher A Walsh

doi:10.1038/ng.2425

Christopher Walsh and colleagues identify mutations in CHMP1A in human cerebellar hypoplasia and microcephaly. Cells lacking CHMP1A show decreased cell proliferation and decreased expression of BMI1, a negative regulator of stem cell proliferation.

Alterations of the CIB2 calcium- and integrin-binding protein cause Usher syndrome type 1J and nonsyndromic deafness DFNB48 –pp1265 – 1271

Saima Riazuddin, Inna A Belyantseva, Arnaud P J Giese, Kwanghyuk Lee, Artur A Indzhykulian, Sri Pratima Nandamuri, Rizwan Yousaf, Ghanshyam P Sinha, Sue Lee, David Terrell, Rashmi S Hegde, Rana A Ali, Saima Anwar, Paula B Andrade-Elizondo, Asli Sirmaci, Leslie V Parise, Sulman Basit, Abdul Wali, Muhammad Ayub, Muhammad Ansar, Wasim Ahmad, Shaheen N Khan, Javed Akram, Mustafa Tekin, Sheikh Riazuddin, Tiffany Cook, Elke K Buschbeck, Gregory I Frolenkov, Suzanne M Leal, Thomas B Friedman & Zubair M Ahmed

doi:10.1038/ng.2426

Zubair Ahmed and colleagues identify homozygous mutations in CIB2, a gene that encodes a calcium- and integrin-binding protein, that cause Usher syndrome type 1J and nonsyndromic deafness DFNB48. CIB2 is required for hair cell development and retinal photoreceptor cells in zebrafish and Drosophila melanogaster.

Haploinsufficiency for AAGAB causes clinically heterogeneous forms of punctate palmoplantar keratoderma –pp1272 – 1276

Elizabeth Pohler, Ons Mamai, Jennifer Hirst, Mozheh Zamiri, Helen Horn, Toshifumi Nomura, Alan D Irvine, Benvon Moran, Neil J Wilson, Frances J D Smith, Christabelle S M Goh, Aileen Sandilands, Christian Cole, Geoffrey J Barton, Alan T Evans, Hiroshi Shimizu, Masashi Akiyama, Mitsuhiro Suehiro, Izumi Konohana, Mohammad Shboul, Sebastien Teissier, Lobna Boussofara, Mohamed Denguezli, Ali Saad, Moez Gribaa, Patricia J Dopping-Hepenstal, John A McGrath, Sara J Brown, David R Goudie, Bruno Reversade, Colin S Munro & W H Irwin McLean

doi:10.1038/ng.2444

Irwin McLean and colleagues report that heterozygous loss-of-function mutations in AAGAB, which encodes a cytosolic protein implicated in vesicular trafficking, cause punctate palmoplantar keratoderma. They further show that knockdown of AAGAB in keratinocytes leads to increased cell proliferation accompanied by highly elevated levels of epidermal growth factor receptor.

Estimating the human mutation rate using autozygosity in a founder population –pp1277 – 1281

Catarina D Campbell, Jessica X Chong, Maika Malig, Arthur Ko, Beth L Dumont, Lide Han, Laura Vives, Brian J O’Roak, Peter H Sudmant, Jay Shendure, Mark Abney, Carole Ober & Evan E Eichler

doi:10.1038/ng.2418

Evan Eichler and colleagues report an estimate of the mutation rate in humans that is based on the whole-genome sequences of five parent-offspring trios from a Hutterite population and genotyping data from an extended pedigree. They use a new approach for estimating the mutation rate over multiple generations that takes into account the extensive autozygosity in this founder population.

See also: News and Views by Hurles

Variation in germline mtDNA heteroplasmy is determined prenatally but modified during subsequent transmission –pp1282 – 1285

Christoph Freyer, Lynsey M Cree, Arnaud Mourier, James B Stewart, Camilla Koolmeister, Dusanka Milenkovic, Timothy Wai, Vasileios I Floros, Erik Hagström, Emmanouella E Chatzidaki, Rudolf J Wiesner, David C Samuels, Nils-Göran Larsson & Patrick F Chinnery

doi:10.1038/ng.2427

Patrick Chinnery, Nils-Göran Larsson and colleagues show that mitochondrial heteroplasmy levels are principally determined prenatally within the developing female germline in mice transmitting a heteroplasmic single base-pair deletion in the mitochondrial tRNAMet gene.

SOURCE:

http://www.nature.com/ng/journal/v44/n11/index.html 

Recurrent somatic mutations in chromatin-remodeling and ubiquitin ligase complex genes in serous endometrial tumors

Reporter and Curator: Dr. Sudipta Saha, Ph.D.

Endometrial cancer is the sixth most commonly diagnosed cancer in women worldwide, causing ~74,000 deaths annually1. Serous endometrial cancers are a clinically aggressive subtype with a poorly defined genetic etiology2–4.

Whole-exome sequencing was used to comprehensively search for somatic mutations within ~22,000 protein-encoding genes in 13 primary serous endometrial tumors. Subsequently, 18 genes that were mutated in more than 1 tumor and/or were components of an enriched functional grouping were resequenced in 40 additional serous tumors. High frequencies of somatic mutations in CHD4 (17%), EP300 (8%), ARID1A (6%), TSPYL2 (6%), FBXW7 (29%), SPOP (8%), MAP3K4 (6%) and ABCC9 (6%) were identified. Overall, 36.5% of serous tumors had a mutated chromatin-remodeling gene, and 35% had a mutated ubiquitin ligase complex gene, implicating frequent mutational disruption of these processes in the molecular pathogenesis of one of the deadliest forms of endometrial cancer.

The study provides new insights into the somatic mutations present in serous endometrial cancer exomes. However, it is important to acknowledge that this discovery screen is underpowered to detect all somatically mutated genes that drive serous tumors. For example, PIK3R1, which was previously found to be somatically mutated in 8% of serous endometrial tumors58, was not somatically mutated in the tumors that formed this discovery screen.

It was estimated that, for genes that are mutated in 8% of all serous endometrial cancers, a discovery screen of 12 tumors had 25% power to detect 2 mutated tumors and 63% power to detect 1 mutated tumor; for genes that are mutated in 20% of all serous endometrial cancers, the discovery screen had an estimated 72.5% power to detect 2 mutated tumors and 93% power to detect 1 mutated tumor.
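The power figures above are consistent with a simple binomial model, in which each of the 12 tumors in the discovery screen independently carries a mutation in a given gene with probability equal to that gene's mutation frequency among all serous tumors. The Python sketch below reproduces the quoted numbers under that assumption. The function name `detection_power` is illustrative, and the binomial model is inferred from the numbers rather than taken from a method described in this post.

```python
from math import comb

def detection_power(n_tumors: int, freq: float, min_mutated: int) -> float:
    """Probability that at least `min_mutated` of `n_tumors` carry a mutation,
    assuming each tumor is mutated independently with probability `freq`."""
    # Probability of seeing fewer than `min_mutated` mutated tumors.
    p_fewer = sum(
        comb(n_tumors, k) * freq**k * (1 - freq) ** (n_tumors - k)
        for k in range(min_mutated)
    )
    return 1 - p_fewer

# Assumed inputs: 12 tumors in the screen, genes mutated in 8% or 20% of tumors.
for freq in (0.08, 0.20):
    for min_mutated in (2, 1):
        power = detection_power(12, freq, min_mutated)
        print(f"freq={freq:.0%}, at least {min_mutated} mutated: {power:.1%}")
```

Under this model the four values round to roughly 25%, 63%, 72.5% and 93%, in line with the estimates quoted above.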

Massively parallel sequencing of additional cases will undoubtedly yield deeper insights into the mutational landscape of serous endometrial cancer. This study reports one of the first exome sequencing analyses of serous endometrial cancers, which are clinically aggressive tumors that have been poorly characterized genomically.

The findings implicate the disruption of chromatin-remodeling and ubiquitin ligase complex genes in

  • 50% of serous endometrial tumors and
  • 35% of clear-cell endometrial tumors.

The high frequency and specific distributions of mutations in CHD4, FBXW7 and SPOP strongly suggest that these are likely to be driver events in serous endometrial cancer.

Source References:

http://www.ncbi.nlm.nih.gov/pubmed?term=Exome%20sequencing%20of%20serous%20endometrial%20tumors%20identifies%20recurrent%20somatic%20mutations%20in%20chromatin-remodeling%20and%20ubiquitin%20ligase%20complex%20genes

 

ENCODE data reveals important information from Genome Wide Association Studies relevant to understanding complex genetic diseases

Author: Ritu Saxena, Ph.D.

 

Introduction

“The depth, quality, and diversity of the ENCODE data are unprecedented,” said John Stamatoyannopoulos, professor of genome sciences at the University of Washington and one of the many principal investigators of the ENCODE project. ENCODE (Encyclopedia of DNA Elements) is an ambitious project that was launched as a pilot in 2003 and expanded in 2007 to analyze the whole genome and identify all the functional elements of the human genome. The findings were striking because they challenged the definition of a “gene” and the central dogma of genetics (gene to mRNA to protein). In fact, about 80% of the genome, much of it non-coding sequence previously dismissed as “junk DNA”, was found to contain elements linked to biochemical function, including elements crucial for gene regulation. These elements include, in large part, RNA transcripts that are not translated into proteins but might have a regulatory role. For detailed reading, refer to the findings published in Nature: The ENCODE Project Consortium, “An integrated encyclopedia of DNA elements in the human genome,” Nature 489, 57–74 (2012).

Key features of the data, as explained on the National Human Genome Research Institute website (National Human Genome Research Institute News feature), include comprehensive mapping of:

  • Protein-coding genes — Proteins are molecules made of amino acids linked together in a specific sequence; the amino acid sequence is encoded by the sequence of DNA subunits called nucleotides that make up genes.
  • Non-coding genes — Stretches of DNA that are read by the cell as if they were genes but do not encode proteins. These appear to help regulate the activity of the genome.
  • Chromatin structure features — Complex physical structures made from a combination of DNA and binding proteins that make up the contents of the nucleus and affect genome function.
  • Histone modifications — Histones are the proteins that make up the chromatin structures that help shape and control the genome. In addition, histone proteins can be physically modified by adding chemical groups, such as methyl groups, that further regulate genomic activity.
  • DNA methylation — Just like histones, methyl groups can be added to DNA itself in a process called DNA methylation. Chemically attaching methyl groups to DNA physically changes the ability of enzymes to reach the DNA and thus alters the gene expression pattern in cells. Methylation helps cells “remember what they are doing” or alter levels of gene expression, and it is a crucial part of normal development and cellular differentiation in higher organisms.
  • Transcription factor binding sites — Transcription factors are proteins that bind to specific DNA sequences, controlling the flow (or transcription) of genetic information from DNA to mRNA. Mapping the binding sites can help researchers understand how genomic activity is controlled.

How could ENCODE be helpful in the study of complex human diseases?

Complex diseases and Genome wide association studies (GWAS)

Coronary artery disease, type 2 diabetes and many forms of cancer are complex human diseases that have a significant genetic component. Unlike mendelian disorders, which map to defined loci, the genetic component of complex disorders takes the form of genetic variations across the genome that make an individual susceptible to these diseases.

Researchers have performed Genome-wide association studies (GWAS) of the human genome, leading to the identification of thousands of DNA variants that could be linked with complex traits and diseases. However, identifying the variants, referred to as SNPs (Single Nucleotide Polymorphisms), that actually contribute to the disease, and understanding how they exert influence on a disease has been more of a mystery.

How would ENCODE solve the puzzle?

The puzzle lies in interpreting how the SNPs found in the genome affect a person’s susceptibility to a particular trait or disease, and what mechanism lies behind that effect. As identified in the GWAS, most variants that are associated with the phenotype of the trait or disease lie in the non-coding region of the genome. In fact, in more than 400 studies compiled in the GWAS catalog, only a small minority of the trait/disease-associated SNPs occur in protein-coding regions; the large majority (89%) are in noncoding regions. These variants fall in gene deserts that lie far from protein-coding regions, similar to those where cis-regulatory modules (CRMs) are found. CRMs such as promoters and enhancers are groups of binding sites for transcription factors, and the presence of transcription factors bound to these sites is a good indicator of potential regulatory regions.

The integrative analysis of ENCODE data has given important insights into the results of GWAS. Investigators have employed ENCODE data as an initial guide to discover regulatory regions in which genetic variation affects a complex trait. Additionally, when the ENCODE study examined the GWAS SNPs that were associated with the phenotype of a trait, it found that these SNPs are enriched in DNase-sensitive regions, i.e., they lie in function-associated DNA that can be bound by transcription factors affecting the regulation of gene expression. Thus, the project demonstrates that non-coding regions must be considered when interpreting GWAS results, and it provides a strong motivation for reinterpreting previous GWAS findings.
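As a rough illustration of the kind of overlap test described above, the Python sketch below counts how many SNP positions fall inside a set of DNase-sensitive intervals. All coordinates are made-up toy values and the function names are hypothetical. A real analysis would start from ENCODE peak files and GWAS catalog positions, handle every chromosome, and compare the observed overlap against a matched background set before calling it an enrichment.

```python
from bisect import bisect_right

def merge_intervals(intervals):
    """Sort half-open [start, end) intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def fraction_in_regions(snp_positions, regions):
    """Fraction of SNP positions inside any region (single chromosome only)."""
    if not snp_positions:
        return 0.0
    merged = merge_intervals(regions)
    starts = [start for start, _ in merged]
    hits = 0
    for pos in snp_positions:
        i = bisect_right(starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return hits / len(snp_positions)

# Toy data only: hypothetical DNase-sensitive regions and SNP positions.
dnase_regions = [(100, 200), (150, 260), (500, 650), (900, 950)]
snps = [120, 255, 300, 640, 700, 910]
print(fraction_in_regions(snps, dnase_regions))
```

Merging the intervals first lets each SNP be checked with a single binary search, which matters once the region list grows to the millions of sites that ENCODE reports.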

Using ENCODE Data to Interpret GWAS Results

ENCODE and predisposition to CANCER:

c-Myc, a proto-oncogene, codes for a transcription factor that, when expressed constitutively, leads to uninhibited cell proliferation resulting in cancer. Common variants within a ~1 Mb region upstream of the c-Myc gene have been associated with cancers of the colon, prostate, and breast. Several SNPs have been reported in this region that, although they affect the phenotype, lie in the distal cis-region of the MYC gene. Alignment of the ENCODE data in this region with the significant variants from the GWAS also reveals that key variants are found in the transcription factor-occupied DNA segments mapped by the consortium. One variant, rs6983267, lies within a DNase hypersensitive site that is bound by several transcription factors and the enhancer-associated protein p300, and that carries histone modifications characteristic of enhancers (high H3K4me1, low H3K4me3). ENCODE data thus indicate that non-coding regions in the human chromosome 8q24 locus are associated with cancer, and, as in the case of the c-Myc gene, similar studies on other cancer-related genes could help explain predisposition to cancer.

ENCODE and fetal hemoglobin expression:

Another example of the use of ENCODE data is the regulation of fetal hemoglobin. Several regions were predicted from ENCODE data to be involved in the regulation of fetal hemoglobin, and these predicted regions lie close to SNPs in the BCL11A gene that are associated with persistent expression of fetal hemoglobin.

Future perspective

As evident from the above examples, the ENCODE data show that genetic variants do affect the regulated expression of a target gene. Recently, several research groups in the UK performed a large-scale GWAS to determine the genetic predisposition to fracture risk. The collaborative effort, published in a recent issue of PLoS Genetics, sought to identify genetic variants associated with cortical bone thickness (CBT) and bone mineral density (BMD) using data from more than 10,000 subjects. http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002745 The study generated a wealth of data, including the finding that SNPs in WNT16 and its adjacent gene, FAM3C, are relevant to CBT and BMD. In this case, ENCODE data could help in extracting more detailed information, such as identifying additional SNPs and the regulatory context of the genes involved. Thus, ENCODE data could be immensely useful in interpreting associations between disease and DNA sequences that vary from person to person.

Sources:

Research articles

An integrated encyclopedia of DNA elements in the human genome

A User’s Guide to the Encyclopedia of DNA Elements (ENCODE)

What does our genome encode?

Genome-wide Epigenetic Data Facilitate Understanding of Disease Susceptibility Association Studies

Genomics: ENCODE explained

ENCODE Project Writes Eulogy For Junk DNA

WNT16 Influences Bone Mineral Density, Cortical Bone Thickness, Bone Strength, and Osteoporotic Fracture Risk

 News articles

ENCODE project: In massive genome analysis new data suggests ‘gene’ redefinition

National Human Genome Research Institute News feature

Related posts

Expanding the Genetic Alphabet and linking the genome to the metabolome

Junk DNA codes for valuable miRNAs: non-coding DNA controls Diabetes

ENCODE Findings as Consortium

Reporter: Aviva Lev-Ari, PhD, RN

Set of Papers Outline ENCODE Findings as Consortium Looks Ahead to Future Studies

NEW YORK (GenomeWeb News) – An international collaboration involving more than 400 researchers working to characterize gene regulatory networks in the human genome is publishing dozens of new studies this week.

In papers appearing in Nature, Science, Genome Research, Genome Biology, Journal of Biological Chemistry, and elsewhere, members of the Encyclopedia of DNA Elements, or ENCODE, consortium describe approaches used to define some four million regulatory regions in the genome, among other things. All told, the team explained, ENCODE efforts have made it possible to assign biological functions to around 80 percent of genome sequences, filling in large gaps left by studies that focused on protein-coding sequences alone.

“We found that a much bigger part of the genome — a surprising amount, in fact — is involved in controlling when and where proteins are produced, than in simply manufacturing the building blocks,” ENCODE’s lead analysis coordinator Ewan Birney, associate director of the European Molecular Biology Laboratory European Bioinformatics Institute, said in a statement.

“This concept of ‘junk DNA,’ which has been sort of perpetuated for the past 20 years or so is really not accurate,” ENCODE researcher Rick Myers, director of the HudsonAlpha Institute for Biotechnology, said during a telephone briefing with reporters today. “Most of the genome — more than 80 percent of the base pairs in the genome — has some biological activity, some biological function.”

Researchers participating in a complementary effort within the larger ENCODE project, known as GENCODE, more completely characterize the coding portions of the genome. “As part of the ENCODE project, we both tidied up the protein-coding genes and we also found many non-coding RNA genes as well,” Birney said during today’s telebriefing.

Based on the success of ENCODE so far, the project is expected to be extended by another four years or so. The amount of new funding from the National Human Genome Research Institute for that follow-up work is expected to be as high as $123 million.

“Later this month, NHGRI will be announcing a new round of funding that will take the ENCODE project into its next phase,” NHGRI Director Eric Green said during the call.

Studies done in the decade or so since the human genome was deciphered have highlighted how little of the genome is actually comprised of gene sequences. With the realization that only around 2 percent of the genome is dedicated to protein-coding functions came a spate of speculation about the role of the other 98 percent of the genome.

While this portion of the genome was suspected of harboring regulatory sequences, the extent of that regulation and its impact on coding sequences in human tissues over time was not known.

“When the Human Genome Project ended in 2003, we quickly realized that we understood the meaning of only a very small percent of the human genome’s letters,” Green explained. “We did know the genetic code for determining the order of amino acids and proteins, but we understood precious little about the signals that turned genes on or off — or that controlled the amount of proteins produced in different tissues.”

To begin studying such control networks systematically, the international ENCODE consortium kicked off the main phase of its analyses in 2007, following an earlier pilot study.

NHGRI has provided $123 million for the project over the past five years. Another $30 million went to support the development of ENCODE-related technologies since the ENCODE pilot started in 2003, while $40.6 million from NHGRI went towards the pilot itself.

During the study’s main phase, investigators from nearly three-dozen labs around the world took multi-pronged approaches to assess transcription factor binding patterns, histone modification patterns, chromatin structure signatures and other features of the genome that interact with one another to control gene expression over time and across different tissues in the body.

To accomplish the roughly 1,600 experiments done to test some 180 cell types for ENCODE, teams turned to methods such as chromatin immunoprecipitation coupled with sequencing to define the genome-wide binding patterns for more than 100 different transcription factors, for example, while other strategies were used to profile DNA methylation patterns, chromatin features, and so forth.

“It’s really a detailed hierarchy, where proteins bind and epigenetic marks — like DNA methylation and other marks — precisely cooperate and regulate how the genes are going to get turned on [or off] and the amount of this,” Myers said. “These complex networks are one of the big components of the contributions of the 30 papers that are being published today.”

For example, a University of Washington-led team reporting in Science online today defined millions of regulatory regions, including some that are operational during normal development, by taking advantage of an enzyme known as DNase I, which chops off DNA specifically at open chromatin sites in the genome. That group found that more than three-quarters of disease-associated variants identified in genome-wide association studies fall in parts of the genome that overlap with regulatory sites.

“We now know that the majority of these changes that are associated with common diseases and traits that don’t fall within genes actually occur within the gene-controlling switches,” University of Washington genome sciences researcher John Stamatoyannopoulos, senior author on that study, said during today’s telebriefing. “This phenomenon is not confined to a particular type of disease. It seems to be present across the board for a very wide variety of different diseases and traits.”

Results from such analyses also hint that some outwardly unrelated conditions might be traced back to similar regulatory processes. And, researchers say, by bringing together information on active regulatory regions with disease-risk variants, it may be possible to define new functionally important tissues for certain conditions.

“By creating these extensive blueprints of the control circuitry, we’re now exposing previously hidden connections between different kinds of diseases that may explain common clinical features,” Stamatoyannopoulos said.

“This has also allowed us to see that the GWAS studies that have been performed contain far more information than was previously believed,” he added, “because hundreds of additional DNA changes that were not thought to be important also appear to affect these gene-controlling switches.”

The new data are also expected to help in understanding genetic disease and interpreting information from personal genomes, according to Michael Snyder, an ENCODE investigator and director of Stanford University’s Center of Genomics and Personalized Medicine.

“We believe the ENCODE project will have a profound impact on personal genomes and, ultimately on personalized medicine,” Snyder told reporters. “We can now better see what personal variants do, in terms of causing phenotypic differences, drug responses, and disease risk.”

Many of the studies stemming from ENCODE can be viewed through a Nature, Genome Research, and Genome Biology-conceived website that links ENCODE papers that share themes or “threads” that are related to one another.

Along with the newly published papers, the ENCODE team is making data available to other members of the research community through the project’s website. Data from studies can also be accessed through an ENCODE browser housed at the University of California at Santa Cruz or via NCBI or EBI sites.

“For basic researchers, the ENCODE data represents a powerful resource for understanding fundamental questions about how life is encoded in our genome,” NHGRI’s Green said. “For more clinically-oriented researchers, the ENCODE data provide key information about which genome sequences are functionally important.”

    Source:

    NEWS & VIEWS

    52 | NATURE | VOL 489 | 6 SEPTEMBER 2012

    FORUM: Genomics

    ENCODE explained

    The Encyclopedia of DNA Elements (ENCODE) project dishes up a hearty banquet of data that illuminate the roles of the functional elements of the human genome. Here, five scientists describe the project and discuss how the data are influencing research directions across many fields. See Articles p.57, p.75, p.83, p.91, p.101 & Letter p.109

    Serving up a genome feast

    JOSEPH R. ECKER

    Starting with a list of simple ingredients and blending them in the precise amounts needed to prepare a gourmet meal is a challenging task. In many respects, this task is analogous to the goal of the ENCODE project1, the recent progress of which is described in this issue2–7. The project aims to fully describe the list of common ingredients (functional elements) that make up the human genome (Fig. 1). When mixed in the right proportions, these ingredients constitute the information needed to build all the types of cells, body organs and, ultimately, an entire person from a single genome.

    The ENCODE pilot project8 focused on just 1% of the genome — a mere appetizer — and its results hinted that the list of human genes was incomplete. Although there was scepticism about the feasibility of scaling up the project to the entire genome and to many hundreds of cell types, recent advances in low-cost, rapid DNA-sequencing technology radically changed that view9. Now the ENCODE consortium presents a menu of 1,640 genome-wide data sets prepared from 147 cell types, providing a six-course serving of papers in Nature, along with many companion publications in other journals.

    One of the more remarkable findings described in the consortium’s ‘entrée’ paper (page 57)2 is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly ‘junk DNA’. The authors report that the space between genes is filled with enhancers (regulatory DNA elements), promoters (the sites at which DNA’s transcription into RNA is initiated) and numerous previously overlooked regions that encode RNA transcripts that are not translated into proteins but might have regulatory roles. Of note, these results show that many DNA variants previously correlated with certain diseases lie within or very near non-coding functional DNA elements, providing new leads for linking genetic variation and disease.

    The five companion articles3–7 dish up diverse sets of genome-wide data regarding the mapping of transcribed regions, DNA binding of regulatory proteins (transcription factors) and the structure and modifications of chromatin (the association of DNA and proteins that makes up chromosomes), among other delicacies.

    Djebali and colleagues3 (page 101) describe ultra-deep sequencing of RNAs prepared from many different cell lines and from specific compartments within the cells. They conclude that about 75% of the genome is transcribed at some point in some cells, and that genes are highly interlaced with overlapping transcripts that are synthesized from both DNA strands. These findings force a rethink of the definition of a gene and of the minimum unit of heredity.

    Moving on to the second and third courses, Thurman et al.4 and Neph et al.5 (pages 75 and 83) have prepared two tasty chromatin-related treats. Both studies are based on the DNase I hypersensitivity assay, which detects genomic regions at which enzyme access to, and subsequent cleavage of, DNA is unobstructed by chromatin proteins. The authors identified cell-specific patterns of DNase I hypersensitive sites that show remarkable concordance with experimentally determined and computationally predicted binding sites of transcription factors. Moreover, they have doubled the number of known recognition sequences for DNA-binding proteins in the human genome, and have revealed a 50-base-pair ‘footprint’ that is present in thousands of promoters5.

    The next course, provided by Gerstein and colleagues6 (page 91) examines the principles behind the wiring of transcription-factor networks. In addition to assigning relatively simple functions to genome elements (such as ‘protein X binds to DNA element Y’), this study attempts to clarify the hierarchies of transcription factors and how the intertwined networks arise.

    Beyond the linear organization of genes and transcripts on chromosomes lies a more complex (and still poorly understood) network of chromosome loops and twists through which promoters and more distal elements, such as enhancers, can communicate their regulatory information to each other. In the final course of the ENCODE genome feast, Sanyal and colleagues7 (page 109) map more than 1,000 of these long-range signals in each cell type. Their findings begin to overturn the long-held (and probably oversimplified) prediction that the regulation of a gene is dominated by its proximity to the closest regulatory elements.

    One of the major future challenges for ENCODE (and similarly ambitious projects) will be to capture the dynamic aspects of gene regulation. Most assays provide a single snapshot of cellular regulatory events, whereas a time series capturing how such processes change is preferable. Additionally, the examination of large batches of cells — as required for the current assays — may present too simplified a view of the underlying regulatory complexity, because individual cells in a batch (despite being genetically identical) can sometimes behave in different ways. The development of new technologies aimed at the simultaneous capture of multiple data types, along with their regulatory dynamics in single cells, would help to tackle these issues.

    A further challenge is identifying how the genomic ingredients are combined to assemble the gene networks and biochemical pathways that carry out complex functions, such as cell-to-cell communication, which enable organs and tissues to develop. An even greater challenge will be to use the rapidly growing body of data from genome-sequencing projects to understand the range of human phenotypes (traits), from normal developmental processes, such as ageing, to disorders such as Alzheimer’s disease10.

    Achieving these ambitious goals may require a parallel investment of functional studies using simpler organisms — for example, of the type that might be found scampering around the floor, snatching up crumbs in the chefs’ kitchen. All in all, however, the ENCODE project has served up an all-you-can-eat feast of genomic data that we will be digesting for some time. Bon appétit!

    Joseph R. Ecker is at the Howard Hughes Medical Institute and the Salk Institute for Biological Studies, La Jolla, California 92037, USA.

    e-mail: ecker@salk.edu

    Figure 1 | Beyond the sequence. The ENCODE project2–7 provides information on the human genome far beyond that contained within the DNA sequence — it describes the functional genomic elements that orchestrate the development and function of a human. The project contains data about the degree of DNA methylation and chemical modifications to histones that can influence the rate of transcription of DNA into RNA molecules (histones are the proteins around which DNA is wound to form chromatin). ENCODE also examines long-range chromatin interactions, such as looping, that alter the relative proximities of different chromosomal regions in three dimensions and also affect transcription. Furthermore, the project describes the binding activity of transcription-factor proteins and the architecture (location and sequence) of gene-regulatory DNA elements, which include the promoter region upstream of the point at which transcription of an RNA molecule begins, and more distant (long-range) regulatory elements. Another section of the project was devoted to testing the accessibility of the genome to the DNA-cleavage protein DNase I. These accessible regions, called DNase I hypersensitive sites, are thought to indicate specific sequences at which the binding of transcription factors and transcription-machinery proteins has caused nucleosome displacement. In addition, ENCODE catalogues the sequences and quantities of RNA transcripts, from both non-coding and protein-coding regions.

    Expression control

    WENDY A. BICKMORE

    Once the human genome had been sequenced, it became apparent that an encyclopaedic knowledge of chromatin organization would be needed if we were to understand how gene expression is regulated. The ENCODE project goes a long way to achieving this goal and highlights the pivotal role of transcription factors in sculpting the chromatin landscape.

    Although some of the analyses largely confirm conclusions from previous smaller-scale studies, this treasure trove of genome-wide data provides fresh insight into regulatory pathways and identifies prodigious numbers of regulatory elements. This is particularly so for Thurman and colleagues’ data4 regarding DNase I hypersensitive sites (DHSs) and for Gerstein and colleagues’ results6 concerning DNA binding of transcription factors. DHSs are genomic regions that are accessible to enzymatic cleavage as a result of the displacement of nucleosomes (the basic units of chromatin) by DNA-binding proteins (Fig. 1). They are the hallmark of cell-type-specific enhancers, which are often located far away from promoters.

    The ENCODE papers expose the profusion of DHSs (more than 200,000 per cell type, far outstripping the number of promoters) and their variability between cell types. Through the simultaneous presence in the same cell type of a DHS and a nearby active promoter, the researchers paired half a million enhancers with their probable target genes. But this leaves more than 2 million putative enhancers without known targets, revealing the enormous expanse of the regulatory genome landscape that is yet to be explored. Chromosome-conformation-capture methods that detect long-range physical associations between distant DNA regions are attempting to bridge this gap. Indeed, Sanyal and colleagues7 applied these techniques to survey such associations across 1% of the genome.

    The ENCODE data start to paint a picture of the logic and architecture of transcriptional networks, in which DNA binding of a few high-affinity transcription factors displaces nucleosomes and creates a DHS, which in turn facilitates the binding of further, lower-affinity factors. The results also support the idea that transcription-factor binding can block DNA methylation (a chemical modification of DNA that affects gene expression), rather than the other way around — which is highly relevant to the interpretation of disease-associated sites of altered DNA methylation11.

    The exquisite cell-type specificity of regulatory elements revealed by the ENCODE studies emphasizes the importance of having appropriate biological material on which to test hypotheses. The researchers have focused their efforts on a set of well-established cell lines, with selected assays extended to some freshly isolated cells. Challenges for the future include following the dynamic changes in the regulatory landscape during specific developmental pathways, and understanding chromatin structure in tissues containing heterogeneous cell populations.

    Wendy A. Bickmore is in the Medical Research Council Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK.

    e-mail: wendy.bickmore@igmm.ed.ac.uk 


    11 Years Ago

    The draft human genome

    OUR GENOME UNVEILED

    Unless the human genome contains a lot of genes that are opaque to our computers, it is clear that we do not gain our undoubted complexity over worms and plants by using many more genes. Understanding what does give us our complexity — our enormous behavioural repertoire, ability to produce conscious action, remarkable physical coordination (shared with other vertebrates), precisely tuned alterations in response to external variations of the environment, learning, memory … need I go on? — remains a challenge for the future.

    David Baltimore

    From Nature 15 February 2001

    GENOME SPEAK

    With the draft in hand, researchers have a new tool for studying the regulatory regions and networks of genes. Comparisons with other genomes should reveal common regulatory elements, and the environments of genes shared with other species may offer insight into function and regulation beyond the level of individual genes. The draft is also a starting point for studies of the three-dimensional packing of the genome into a cell’s nucleus. Such packing is likely to influence gene regulation … The human genome lies before us, ready for interpretation.

    Peer Bork and Richard Copley

    From Nature 15 February 2001

    Non-coding but functional

    INÊS BARROSO

    The vast majority of the human genome does not code for proteins and, until now, did not seem to contain defined gene-regulatory elements. Why evolution would maintain large amounts of ‘useless’ DNA had remained a mystery, and seemed wasteful. It turns out, however, that there are good reasons to keep this DNA. Results from the ENCODE project2–8 show that most of these stretches of DNA harbour regions that bind proteins and RNA molecules, bringing these into positions from which they cooperate with each other to regulate the function and level of expression of protein-coding genes. In addition, it seems that widespread transcription from non-coding DNA potentially acts as a reservoir for the creation of new functional molecules, such as regulatory RNAs.

    What are the implications of these results for genetic studies of complex human traits and disease? Genome-wide association studies (GWAS), which link variations in DNA sequence with specific traits and diseases, have in recent years become the workhorse of the field, and have identified thousands of DNA variants associated with hundreds of complex traits (such as height) and diseases (such as diabetes). But association is not causality, and identifying those variants that are causally linked to a given disease or trait, and understanding how they exert such influence, has been difficult. Furthermore, most of these associated variants lie in non-coding regions, so their functional effects have remained undefined.

    The ENCODE project provides a detailed map of additional functional non-coding units in the human genome, including some that have cell-type-specific activity. In fact, the catalogue contains many more functional non-coding regions than genes. These data show that results of GWAS are typically enriched for variants that lie within such non-coding functional units, sometimes in a cell-type-specific manner that is consistent with certain traits, suggesting that many of these regions could be causally linked to disease. Thus, the project demonstrates that non-coding regions must be considered when interpreting GWAS results, and it provides a strong motivation for reinterpreting previous GWAS findings. Furthermore, these results imply that sequencing studies focusing on protein-coding sequences (the ‘exome’) risk missing crucial parts of the genome and the ability to identify true causal variants.

    However, although the ENCODE catalogues represent a remarkable tour de force, they contain only an initial exploration of the depths of our genome, because many more cell types must yet be investigated. Some of the remaining challenges for scientists searching for causal disease variants lie in: accessing data derived from cell types and tissues relevant to the disease under study; understanding how these functional units affect genes that may be distantly located7; and the ability to generalize such results to the entire organism.

    Inês Barroso is at the Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK, and at the University of Cambridge Metabolic Research Laboratories and NIHR Cambridge Biomedical Research Centre, Cambridge, UK.

    e-mail: ib1@sanger.ac.uk

    Evolution and the code

    JONATHAN K. PRITCHARD & YOAV GILAD

    One of the great challenges in evolutionary biology is to understand how differences in DNA sequence between species determine differences in their phenotypes. Evolutionary change may occur both through changes in protein-coding sequences and through sequence changes that alter gene regulation.

    There is growing recognition of the importance of this regulatory evolution, on the basis of numerous specific examples as well as on theoretical grounds. It has been argued that potentially adaptive changes to protein-coding sequences may often be prevented by natural selection because, even if they are beneficial in one cell type or tissue, they may be detrimental elsewhere in the organism. By contrast, because gene-regulatory sequences are frequently associated with temporally and spatially specific gene-expression patterns, changes in these regions may modify the function of only certain cell types at specific times, making it more likely that they will confer an evolutionary advantage12.

    However, until now there has been little information about which genomic regions have regulatory activity. The ENCODE project has provided a first draft of a ‘parts list’ of these regulatory elements, in a wide range of cell types, and moves us considerably closer to one of the key goals of genomics: understanding the functional roles (if any) of every position in the human genome.

    Nonetheless, it will take a great deal of work to identify the critical sequence changes in the newly identified regulatory elements that drive functional differences between humans and other species. There are some precedents for identifying key regulatory differences (see, for example, ref. 13), but ENCODE’s improved identification of regulatory elements should greatly accelerate progress in this area. The data may also allow researchers to begin to identify sequence alterations occurring simultaneously in multiple genomic regions, which, when added together, drive phenotypic change — a process called polygenic adaptation14.

    However, despite the progress brought by the ENCODE consortium and other research groups, it remains difficult to discern with confidence which variants in putative regulatory regions will drive functional changes, and what these changes will be. We also still have an incomplete understanding of how regulatory sequences are linked to target genes. Furthermore, the ENCODE project focused mainly on the control of transcription, but many aspects of post-transcriptional regulation, which may also drive evolutionary changes, are yet to be fully explored.

    Nonetheless, these are exciting times for studies of the evolution of gene regulation. With such new resources in hand, we can expect to see many more descriptions of adaptive regulatory evolution, and how this has contributed to human evolution.

    Jonathan K. Pritchard and Yoav Gilad are in the Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA. J.K.P. is also at the Howard Hughes Medical Institute, University of Chicago.

    e-mails: pritch@uchicago.edu; gilad@uchicago.edu 

    From catalogue to function

    ERAN SEGAL

    Projects that produce unprecedented amounts of data, such as the human genome project15 or the ENCODE project, present new computational and data-analysis challenges and have been a major force driving the development of computational methods in genomics. The human genome project produced one bit of information per DNA base pair, and led to advances in algorithms for sequence matching and alignment. By contrast, in its 1,640 genome-wide data sets, ENCODE provides a profile of the accessibility, methylation, transcriptional status, chromatin structure and bound molecules for every base pair. Processing the project’s raw data to obtain this functional information has been an immense effort.

    For each of the molecular-profiling methods used, the ENCODE researchers devised novel processing algorithms designed to remove outliers and protocol-specific biases, and to ensure the reliability of the derived functional information. These processing pipelines and quality-control measures have been adopted by the research community as the standard for the analysis of such data. The high quality of the functional information they produce is evident from the exquisite detail and accuracy achieved, such as the ability to observe the crystallographic topography of protein–DNA interfaces in DNase I footprints5, and the observation of more than one-million-fold variation in dynamic range in the concentrations of different RNA transcripts3.

    But beyond these individual methods for data processing, the profound biological insights of ENCODE undoubtedly come from computational approaches that integrated multiple data types. For example, by combining data on DNA methylation, DNA accessibility and transcription-factor expression, Thurman et al.4 provide fascinating insight into the causal role of DNA methylation in gene silencing. They find that transcription-factor binding sites are, on average, less frequently methylated in cell types that express those transcription factors, suggesting that binding-site methylation often results from a passive mechanism that methylates sites not bound by transcription factors.

    Despite the extensive functional information provided by ENCODE, we are still far from the ultimate goal of understanding the function of the genome in every cell of every person, and across time within the same person. Even if the throughput rate of the ENCODE profiling methods increases dramatically, it is clear that brute-force measurement of this vast space is not feasible. Rather, we must move on from descriptive and correlative computational analyses, and work towards deriving quantitative models that integrate the relevant protein, RNA and chromatin components. We must then describe how these components interact with each other, how they bind the genome and how these binding events regulate transcription.

    If successful, such models will be able to predict the genome’s function at times and in settings that have not been directly measured. By allowing us to determine which assumptions regarding the physical interactions of the system lead to models that better explain measured patterns, the ENCODE data provide an invaluable opportunity to address this next immense computational challenge. ■

    Eran Segal is in the Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.

    e-mail: eran.segal@weizmann.ac.il

    1. The ENCODE Project Consortium Science 306, 636–640 (2004).

    2. The ENCODE Project Consortium Nature 489, 57–74 (2012).

    3. Djebali, S. et al. Nature 489, 101–108 (2012).

    4. Thurman, R. E. et al. Nature 489, 75–82 (2012).

    5. Neph, S. et al. Nature 489, 83–90 (2012).

    6. Gerstein, M. B. et al. Nature 489, 91–100 (2012).

    7. Sanyal, A., Lajoie, B., Jain, G. & Dekker, J. Nature 489, 109–113 (2012).

    8. Birney, E. et al. Nature 447, 799–816 (2007).

    9. Mardis, E. R. Nature 470, 198–203 (2011).

    10. Gonzaga-Jauregui, C., Lupski, J. R. & Gibbs, R. A. Annu. Rev. Med. 63, 35–61 (2012).

    11. Sproul, D. et al. Proc. Natl Acad. Sci. USA 108, 4364–4369 (2011).

    12. Carroll, S. B. Cell 134, 25–36 (2008).

    13. Prabhakar, S. et al. Science 321, 1346–1350 (2008).

    14. Pritchard, J. K., Pickrell, J. K. & Coop, G. Curr. Biol. 20, R208–R215 (2010).

    15. Lander, E. S. et al. Nature 409, 860–921 (2001).

    © 2012 Macmillan Publishers Limited. All rights reserved

    http://www.sciencemag.org SCIENCE VOL 337 7 SEPTEMBER 2012 1159

    NEWS & ANALYSIS

    When researchers first sequenced the human genome, they were astonished by how few traditional genes encoding proteins were scattered along those 3 billion DNA bases. Instead of the expected 100,000 or more genes, the initial analyses found about 35,000, and that number has since been whittled down to about 21,000. In between were megabases of “junk,” or so it seemed.

    This week, 30 research papers, including six in Nature and additional papers published by Science, sound the death knell for the idea that our DNA is mostly littered with useless bases. A decadelong project, the Encyclopedia of DNA Elements (ENCODE), has found that 80% of the human genome serves some purpose, biochemically speaking. “I don’t think anyone would have anticipated even close to the amount of sequence that ENCODE has uncovered that looks like it has functional importance,” says John A. Stamatoyannopoulos, an ENCODE researcher at the University of Washington, Seattle.

    Beyond defining proteins, the DNA bases highlighted by ENCODE specify landing spots for proteins that influence gene activity, strands of RNA with myriad roles, or simply places where chemical modifications serve to silence stretches of our chromosomes. These results are going “to change the way a lot of [genomics] concepts are written about and presented in textbooks,” Stamatoyannopoulos predicts.

    The insights provided by ENCODE into how our DNA works are already clarifying genetic risk factors for a variety of diseases and offering a better understanding of gene regulation and function. “It’s a treasure trove of information,” says Manolis Kellis, a computational biologist at Massachusetts Institute of Technology (MIT) in Cambridge who analyzed data from the project.

    The ENCODE effort has revealed that a gene’s regulation is far more complex than previously thought, being influenced by multiple stretches of regulatory DNA located both near and far from the gene itself and by strands of RNA not translated into proteins, so-called noncoding RNA. “What we found is how beautifully complex the biology really is,” says Jason Lieb, an ENCODE researcher at the University of North Carolina, Chapel Hill.

    Throughout the 1990s, various researchers called the idea of junk DNA into question. With the human genome in hand, the National Human Genome Research Institute (NHGRI) in Bethesda, Maryland, decided it wanted to find out once and for all how much of the genome was a wasteland with no functional purpose. In 2003, it funded a pilot ENCODE, in which 35 research teams analyzed 44 regions of the genome: 30 million bases in all, about 1% of the total genome. In 2007, the pilot project’s results revealed that much of this DNA sequence was active in some way. The work called into serious question our gene-centric view of the genome, finding extensive RNA-generating activity beyond traditional gene boundaries (Science, 15 June 2007, p. 1556). But the question remained whether the rest of the genome was like this 1%. “We want to know what all the bases are doing,” says Yale University bioinformatician Mark Gerstein.

    Teams at 32 institutions worldwide have now carried out scores of tests, generating 1640 data sets. While the pilot phase tests depended on computer chip–like devices called microarrays to analyze DNA samples, the expanded phase benefited from the arrival of new sequencing technology, which made it cost-effective to directly read the DNA bases. Taken together, the tests present “a greater idea of what the landscape of the genome looks like,” says NHGRI’s Elise Feingold.

    Because the parts of the genome used could differ among various kinds of cells, ENCODE needed to look at DNA function in multiple types of cells and tissues. At first the goal was to study intensively three types of cells. They included GM12878, the immature white blood cell line used in the 1000 Genomes Project, a large-scale effort to catalog genetic variation across humans; a leukemia cell line called K562; and an approved human embryonic stem cell line, H1-hESC.

    As ENCODE was ramping up, new sequencing technology brought the cost of sequencing down enough to make it feasible to test extensively even more cell types. ENCODE added a liver cancer cell line, HepG2; the laboratory workhorse cancer cell line, HeLa S3; and human umbilical cord tissue to the mix. Another 140 cell types were studied to a much lesser degree.

    In these cells, ENCODE researchers closely examined which DNA bases are transcribed into RNA and then whether those strands of RNA are subsequently translated into proteins, verifying predicted protein-coding genes and more precisely locating each gene’s beginning, end, and coding regions. The latest protein-coding gene count is 20,687, with hints of about 50 more, the consortium reports in Nature. Those genes account for about 3% of the human genome, less if one counts only their coding regions. Another 11,224 DNA stretches are classified as pseudogenes, “dead” genes now known to be active in some cell types or individuals.

    ENCODE Project Writes Eulogy For Junk DNA

    GENOMICS

    Figure labels: hypersensitive sites; epigenetic modifications (CH3CO, CH3); long-range regulatory elements (enhancers, repressors/silencers, insulators); cis-regulatory elements (promoters, transcription factor binding sites); gene; transcript; RNA polymerase. Assays shown: ChIP-seq, computational predictions and RT-PCR, RNA-seq, DNase-seq, FAIRE-seq, 5C.

    Zooming in. A diagram of DNA in ever-greater detail shows how ENCODE’s various tests (gray boxes) translate DNA’s features into functional elements along a chromosome.

    CREDIT: ADAPTED FROM THE ENCODE PROJECT CONSORTIUM, PLOS BIOLOGY 9, 4 (APRIL 2011)

    Published by AAAS


    ENCODE drives home, however, that there are many “genes” out there in which DNA codes for RNA, not a protein, as the end product. The big surprise of the pilot project was that 93% of the bases studied were transcribed into RNA; in the full genome, 76% is transcribed. ENCODE defined 8800 small RNA molecules and 9600 long noncoding RNA molecules, each of which is at least 200 bases long. Thomas Gingeras of Cold Spring Harbor Laboratory in New York has found that various ones home in on different cell compartments, as if they have fixed addresses where they operate. Some go to the nucleus, some to the nucleolus, and some to the cytoplasm, for example. “So there’s quite a lot of sophistication in how RNA works,” says Ewan Birney of the European Bioinformatics Institute in Hinxton, U.K., one of the key leaders of ENCODE (see p. 1162).

    As a result of ENCODE, Gingeras and others argue that the fundamental unit of the genome and the basic unit of heredity should be the transcript (the piece of RNA decoded from DNA) and not the gene. “The project has played an important role in changing our concept of the gene,” Stamatoyannopoulos says.

    Another way to test for functionality of DNA is to evaluate whether specific base sequences are conserved between species, or among individuals in a species. Previous studies have shown that 5% of the human genome is conserved across mammals, even though ENCODE studies implied that much more of the genome is functional. So MIT’s Lucas Ward and Kellis compared functional regions newly identified by ENCODE among multiple humans, sampling from the 1000 Genomes Project. Some DNA sequences not conserved between humans and other mammals were nonetheless very much preserved across multiple people, indicating that an additional 4% of the genome is newly under selection in the human lineage, they report in a paper published online by Science (http://scim.ag/WardKellis). Two such regions were near genes for nerve growth and the development of cone cells in the eye, which underlie distinguishing traits in humans. On the flip side, they also found that some supposedly conserved regions of the human genome, as highlighted by the comparison with 29 mammals, actually varied among humans, suggesting these regions were no longer functional.

    Beyond transcription, DNA’s bases function in gene regulation through their interactions with transcription factors and other proteins. ENCODE carried out several tests to map where those proteins bind along the genome (Science, 25 May 2007, p. 1120). Two, DNase-seq and FAIRE-seq, gave an overview of the genome, identifying where the protein-DNA complex chromatin unwinds and a protein can hook up with the DNA, and were applied to multiple cell types. ENCODE’s DNase-seq found 2.89 million such sites in 125 cell types. Stamatoyannopoulos and his colleagues describe their more extensive DNase-seq studies in Science (p. 1190): His team examined 349 types of cells, including 233 60- to 160-day-old fetal tissue samples. Each type of cell had about 200,000 accessible locations, and there seemed to be at least 3.9 million regions where transcription factors can bind in the genome. Across all cell types, about 42% of the genome can be accessible, he and his colleagues report. In many cases, the assays were able to pinpoint the specific bases involved in binding.

    Last year, Stamatoyannopoulos showed that these newly discovered functional regions sometimes overlap with specific DNA bases linked to higher or lower risks of various diseases, suggesting that the regulation of genes might be at the heart of these risk variations (Science, 27 May 2011, p. 1031). The work demonstrated how researchers could use ENCODE data to come up with new hypotheses about the link between genetics and a particular disorder. (The ENCODE analysis found that 12% of these bases, or SNPs, colocate with transcription factor binding sites and 34% are in open chromatin defined by the DNase-seq tests.) Now, in their new work published in Science, Stamatoyannopoulos’s lab has linked those regulatory regions to their specific target genes, homing in on the risk-enhancing ones. In addition, the group finds it can predict the cell type involved in a given disease. For example, the analysis fingered two types of T cells as pathogenic in Crohn’s disease, both of which are involved in this inflammatory bowel disorder. “We are informing disease studies in a way that would be very hard to do otherwise,” Birney says.

    Another test, called ChIP-seq, uses an antibody to home in on a particular DNA-binding protein and helps pinpoint the locations along the genome where that protein works. To date, ENCODE has examined about 100 of the 1500 or so transcription factors and about 20 other DNA binding proteins, including those involved in modifying the chromatin-associated proteins called histones. The binding sites found through ChIP-seq coincided with the sites mapped through FAIRE-seq and DNase-seq. Overall, 8% of the genome falls within a transcription factor binding site, a percentage that is expected to double once more transcription factors have been tested.

    Yale’s Gerstein used these results to figure out all the interactions among the transcription factors studied and came up with a network view of how these regulatory proteins work. These transcription factors formed a three-layer hierarchy, with the ones at the top having the broadest effects and the ones in the middle working together to coregulate a common target gene, he and his colleagues report in Nature.

    Using a technique called 5C, other researchers looked for places where DNA from distant regions of a chromosome, or even different chromosomes, interacted. They found that an average of 3.9 distal stretches of DNA linked up with the beginning of each gene. “Regulation is a 3D puzzle that has to be put together,” Gingeras says. “That’s what ENCODE is putting out on the table.”

    To date, NHGRI has put $288 million toward ENCODE, including the pilot project, technology development, and ENCODE efforts for the mouse, nematode, and fruit fly. All together, more than 400 papers have been published by ENCODE researchers. Another 110 or more studies have used ENCODE data, says NHGRI molecular biologist Michael Pazin. Molecular biologist Mathieu Lupien of the University of Toronto in Canada authored one of those papers, a study looking at epigenetics and cancer. “ENCODE data were fundamental” to the work, he says. “The cost is definitely worth every single dollar.”

    –ELIZABETH PENNISI

    ENCODE By the Numbers

    147 cell types studied

    80% functional portion of human genome

    20,687 protein-coding genes

    18,400 RNA genes

    1640 data sets

    30 papers published this week

    442 researchers

    $288 million funding for pilot, technology, model organism, and current project



    http://www.nature.com/encode/


Reporter: Aviva Lev-Ari, PhD, RN

Genomics and the State of Science Clarity

Projects supported by the US National Institutes of Health will have produced 68,000 total human genomes, around 18,000 of them whole human genomes, through the end of this year, National Human Genome Research Institute estimates indicate. And in his book, The Creative Destruction of Medicine, the Scripps Research Institute’s Eric Topol projects that 1 million human genomes will have been sequenced by 2013 and 5 million by 2014.

“There’s a lot of inventory out there, and these things are being generated at a fiendish rate,” says Daniel MacArthur, a group leader in Massachusetts General Hospital’s Analytic and Translational Genetics Unit. “From a capacity perspective … millions of genomes are not that far off. If you look at the rate that we’re scaling, we can certainly achieve that.”

The prospect of so many genomes has brought clinical interpretation into focus — and for good reason. Save for regulatory hurdles, it seems to be the single greatest barrier to the broad implementation of genomic medicine.

But there is an important distinction to be made between the interpretation of an apparently healthy person’s genome and that of an individual who is already affected by a disease, whether known or unknown.

In an April Science Translational Medicine paper, Johns Hopkins University School of Medicine’s Nicholas Roberts and his colleagues reported that personal genome sequences for healthy monozygotic twin pairs are not predictive of significant risk for 24 different diseases in those individuals. The researchers then concluded that whole-genome sequencing was not likely to be clinically useful for that purpose. (See sidebar, story end.)

“The Roberts paper was really about the value of omniscient interpretation of whole-genome sequences in asymptomatic individuals and what were the likely theoretical limits,” says Isaac Kohane, chair of the informatics program at Children’s Hospital Boston. “That was certainly an important study, and it was important to establish what those limits of knowledge are in asymptomatic populations. But, in fact, the major and most important use cases [for whole-genome sequencing] may be in cases of disease.”

Still, targeted clinical interpretations are not cut and dried. “Even in cases of disease, it’s not clear that we know now how to look across multiple genes and figure out which are relevant, which are not,” Kohane adds.

While substantial progress has been made — in particular, for genetic diseases, including certain cancers — ambiguities have clouded even the most targeted interpretation efforts to date. Technological challenges, meager sample sizes, and a need for increased, fail-safe automation all have hampered researchers’ attempts to reliably interpret the clinical significance of genomic variation. But perhaps the greatest problem, experts say, is a lack of community-wide standards for the task.

Genes to genomes

When scientists analyzed James Watson’s genome (his was the first personal sequence, completed in 2007 and published in Nature in 2008), they were surprised to find that he harbored two putative homozygous SNPs matching Human Gene Mutation Database entries that, were they truly homozygous, would have produced severe clinical phenotypes.

But Watson was not sick.

As researchers search more and more genomes, such inconsistencies are increasingly common.

“My take on what has happened is that the people who were doing the interpretation of the raw sequence largely were coming from a SNPs world, where they were thinking about sequence variants that have been observed before, or that have an appreciable frequency, and weren’t thinking very much about the singleton sequence variants,” says Sean Tavtigian, associate professor of oncology at the University of Utah.

“There is a qualitative difference between looking at whole-genome sequences and looking at single genes or, even more typically, small numbers of variants that have been previously implicated in a disease,” Boston’s Kohane adds.

“Previously, because of the cost and time limitations around sequencing and genotyping, we only looked at variants in genes for which we had a clinical indication. Now, since we can essentially see that in the near future we will be able to do a full genome sequence for essentially the same cost as just a focused set-of-variants test, all of the sudden we have to ask ourselves: What is the meaning of variants that fall outside where we would have ordinarily looked for a given disease or, in fact, if there is no disease at all?”

Mass General’s MacArthur says it has been difficult to pinpoint causal variants because they are enriched for both sequencing and annotation errors. “In the genome era, we can generate those false positives at an amazing rate, and we need to work hard to filter them back out,” he says.

“Clinical geneticists have been working on rare diseases for a long time, and have identified many genes, and are used to working in a world where there is sequence data available only from, say, one gene with a strong biological hypothesis. Suddenly, they’re in this world where they have data from patients on all 20,000 genes,” MacArthur adds. “There’s a fundamental mind-shift there, in shifting from one gene through to every gene. My impression is that the community as a whole hasn’t really internalized that shift; people still have a sense in their head that if you see a strongly damaging variant that segregates with the disease, and maybe there’s some sort of biological plausibility around it as well, that that’s probably the causal variant.”

Studies have shown that that’s not necessarily so. Because of this, “I do worry that in the next year or so we’ll see increasing numbers of mutations published that later prove to just be benign polymorphisms,” MacArthur adds.

“The meaning of whole-genome sequence I think is very much front-and-center of where genomics is going to go. What is the true, clinical meaning? What is the interpretation? And, there’s really a double-edged sword,” Kohane says. On one hand, “if you only focus on the genes that you believe are relevant to the condition you’re studying, then you might miss some important findings,” he says. Conversely, “if you look at everything, the likelihood of a false positive becomes very, very high. Because, if you look at enough things, invariably you will find something abnormal,” he adds.
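
Kohane’s point about looking at everything can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each variant is assessed independently and has the same small chance of being wrongly flagged as abnormal. Neither the per-variant rate nor the independence assumption comes from the article.

```python
def prob_any_false_positive(n_variants: int, per_variant_rate: float) -> float:
    """Chance that at least one of n independent checks is a false positive."""
    return 1.0 - (1.0 - per_variant_rate) ** n_variants


# Hypothetical per-variant false-positive rate of 0.1 percent (an assumption).
ILLUSTRATIVE_RATE = 0.001

for n in (1, 100, 1_000, 20_000):
    chance = prob_any_false_positive(n, ILLUSTRATIVE_RATE)
    print(f"{n:>6} variants checked -> chance of at least one false positive: {chance:.3f}")
```

Under these assumptions, the chance of at least one spurious finding climbs toward certainty as the number of variants examined grows, which is the double-edged sword Kohane describes.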

False positives are but one of several challenges facing scientists who analyze genomes in a clinical context.

Technical difficulties

That advances in sequencing technologies are far outstripping researchers’ abilities to analyze the data they produce has become a truism of the field. But current sequencing platforms are still far from perfect, making most analyses complicated and nuanced. Among other things, improvements in both read length and quality are needed to enable accurate and reproducible interpretations.

“The most promising thing is the rate at which the cost-per-base-pair of massively parallel sequencing has dropped,” Utah’s Tavtigian says. Still, the cost of clinical sequencing is not inconsequential. “The $1,000, $2,000, $3,000 whole-genome sequences that you can do right now do not come anywhere close to 99 percent probability to identify a singleton sequence variant, especially a biologically severe singleton sequence variant,” he says. “Right now, the real price of just the laboratory sequencing to reach that quality is at least $5,000, if not $10,000.”

However, Tavtigian adds, “techniques for multiplexing many samples into a channel for sequencing have come along. They’re not perfect yet, but they’re going to improve over the next year or so.”

Using next-generation sequencing platforms, researchers have uncovered a variety of SNPs, copy-number variants, and small indels. But to MacArthur’s mind, current read lengths are not up to par when it comes to clinical-grade sequencing, and they have made supernumerary quality-control measures necessary.

“There’s no question that we’re already seeing huge improvements. … And as we add in to that changes in technology — for instance much, much longer sequencing reads, more accurate reads, possibly combining different platforms — I think these sorts of [quality-control] issues will begin to go away over the next couple of years,” MacArthur says. “But at this stage, there is still a substantial quality-control component in any sort of interpretation process. We don’t have perfect genomes.”

In a 2011 Nature Biotechnology paper, Stanford University’s Michael Snyder and his colleagues sought to examine the accuracy and completeness of single-nucleotide variant and indel calls from both the Illumina and Complete Genomics platforms by sequencing the genome of one individual using both technologies. Though the researchers found that more than 88 percent of the unique single-nucleotide variants they detected were concordant between the two platforms, only around one-quarter of the indel calls they generated matched up. Overall, the authors reported having found tens of thousands of platform-specific variant calls, around 60 percent of which they later validated by genotyping array.
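
A cross-platform comparison of this kind comes down to set arithmetic on variant calls. The sketch below is only an illustration: the variant records are made up, the function name is hypothetical, and the article does not say exactly how the Stanford team defined concordance, so shared calls divided by all unique calls is an assumption.

```python
# A variant is identified here by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def concordance_fraction(calls_a: set[Variant], calls_b: set[Variant]) -> float:
    """Fraction of all unique calls that both platforms agree on."""
    union = calls_a | calls_b
    if not union:
        return 0.0
    return len(calls_a & calls_b) / len(union)


# Toy call sets for one individual sequenced on two hypothetical platforms.
platform_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
              ("chr2", 50, "G", "A"), ("chr2", 75, "T", "TA")}
platform_b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
              ("chr2", 50, "G", "A"), ("chr2", 76, "T", "TA")}

print("concordant fraction:", concordance_fraction(platform_a, platform_b))
print("platform A only:", sorted(platform_a - platform_b))
print("platform B only:", sorted(platform_b - platform_a))
```

In this toy data the two indel calls differ by a single base in position and so count as platform-specific, a reminder that real comparisons also have to decide how to normalize indel representations before matching.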

For clinical sequencing to ever become widespread, “we’re going to have to be able to show the same reproducibility and test characteristic modification as we have for, let’s say, an LDL cholesterol level,” Boston’s Kohane says. “And if you measure it in one place, it should not be too different from another place. … Even before we can get to the clinical meaning of the genomes, we’re going to have to get some industry-wide standards around quality of sequencing.”

Scripps’ Topol adds that when it comes to detecting rare variants, “there still needs to be a big upgrade in accuracy.”

Analytical issues

Beyond sequencing, technological advances must also be made on the analysis end. “The next thing, of course, is once you have better accuracy … being able to do all of the analytical work,” Topol says. “We’re getting better at the exome, but everything outside of protein-coding elements, there’s still a tremendous challenge.”

Indeed, that challenge has inspired another — a friendly competition among bioinformaticians working to analyze pediatric genomes in a pedigree study.

With enrollment closed and all sequencing completed, participants in the Children’s Hospital Boston-sponsored CLARITY Challenge have rolled up their shirtsleeves and begun to dig into the data — de-identified clinical summaries and exome or whole-genome sequences generated by Complete Genomics and Life Technologies for three children affected by rare diseases of unknown genetic basis, and their parents. According to its organizers, the competition aims to help set standards for genomic analysis and interpretation in a clinical setting, and for returning actionable results to clinicians and patients.

“A bunch of teams have signed up to provide clinical-grade reports that will be checked by a blue-ribbon panel of judges later this year to compare and contrast the different forms of clinical reporting at the genome-wide level,” Kohane says. The winning team will be announced this fall and will receive a $25,000 prize, he adds.

While the competition covers all aspects of clinical sequencing — from readout to reporting — it is important to recognize that, more generally, there may not be one right answer and that the challenges are far-reaching, affecting even the most basic aspects of analysis.

“There is a lot of algorithm investment still to be made in order to get very good at identifying the very rare or singleton sequence variants from the massively parallel sequencing reads efficiently, accurately, [and with] sensitivity,” Utah’s Tavtigian says.

Picking up a variant that has been seen before is one thing, but detecting a potentially causal, though as-yet-unclassified variant is a beast of another nature.

“Novel mutations usually need extensive knowledge but also validation. That’s one of the challenges,” says Zhongming Zhao, associate professor of biomedical informatics at Vanderbilt University. “Validation in terms of a disease study is most challenging right now, because it is very time-consuming, and usually you need to find a good number of samples with similar disease to show this is not by chance.”

Search for significance

Just as sequencing a human genome was more laborious in the early to mid-2000s than it is now, genome interpretation has become increasingly automated.

Beyond standard quality-control checks, the process of moving from raw data to calling variants is now semiautomatic. “There’s essentially no manual intervention required there, apart from running our eyes over [the calls], making sure nothing has gone horribly wrong,” says Mass General’s MacArthur. “The step that requires manual intervention now is all about taking that list of variants that comes out of that and looking at all the available biological data that exists on the Web, [coming] up with a short-list of genes, and then all of us basically have a look at all sorts of online resources to see if any of them have some kind of intuitive biological profile that fits with the disease we’re thinking about.”

Of course, intuitive leads are not foolproof, nor are current mutation databases. (See sidebar, story end.) And so, MacArthur says, “we need to start replacing the sort of intuitive biological approach with a much more data-informed approach.”

Developing such an approach hinges in part on having more genomes. “If we get thousands — tens of thousands — of people sequenced with various different phenotypes that have been crisply identified, that’s going to be so important because it’s the coupling of the processing of the data with having rare variants, structural variants, all the other genomic variations to understand the relationship of whole-genome sequence of any particular phenotype and a sequence variant,” Scripps’ Topol says.

Vanderbilt’s Zhao says that sample size is still an issue. “Right now, the number of samples in each whole-genome sequencing-based publication is still very limited,” he says. At the same time, he adds, “when I read peers’ grant applications, they are proposing more and more whole-genome sequencing.”

When it comes to disease studies, sequencing a whole swath of apparently healthy people is not likely to ever be worthwhile. “The place where it is cost-effective is when you test cases and then, if something is found in the case, go on and test all of the first-degree relatives of the case: reflex testing for the first-degree relatives,” Utah’s Tavtigian says. “If there is something that’s pathogenic for heart disease or colon cancer or whatever is found in an index case, then there is a roughly 50 percent chance that the first-degree relatives are going to carry the same thing, whereas if you go and apply that same test to someone in the general population, the probability that they carry something of interest is a lot lower.”

But more genomes, even familial ones, are not the only missing elements. To fill in the functional blanks, researchers require multiple data types.

“We’ve been pretty much sequence-centric in our thinking for many years now because that was where all the attention [was],” Topol says. “But that leaves the other ‘omes out there.”

From the transcriptome to the proteome, the metabolome, the microbiome, and beyond — Topol says that because all the ‘omes contribute to human health, they all merit review.

“The ability to integrate information about the other ‘omics will probably be a critical direction to understand the underpinnings of disease,” he says. “I call it the ‘panoromic’ view — that is really going to become a critical future direction once we can do those other ‘omics readily. We’re quite a ways off from that right now.”

Mass General’s MacArthur envisages “rolling in data from protein-protein interaction networks and tissue expression data, pulling all of these together into a model that predicts, given the phenotype, given the systems that appear to be disrupted by this variant, what are the most likely set of genes to be involved.” From there, whittling that set down to putative causal variants would be simpler.

“And at the end of that, I think we’ll end up with a relatively small number of variants, each of which has a probability score associated with it, along with a whole host of additional information that a clinician can just drill down into in an intuitive way in making a diagnosis in that individual,” he adds.

According to MacArthur, “we’re already moving in this direction — in five years I think we will have made substantial progress toward that.” He adds, “I certainly think within five years we will be diagnosing the majority of severe genetic disease patients; the vast majority of those we’ll be able to assign a likely causal variant using this type of approach.”

Tavtigian, however, highlights a potential pitfall. While he says that “integration of those [multivariate] data helps a lot with assessing unclassified variants,” it is not enough to help clinicians ascertain causality. Functional assays, which can be both inconclusive and costly, will be needed for some unclassified variant hits, particularly those that are thought to be clinically meaningful.

“I don’t see how you’re going to do a functional assay for less than like $1,000,” he says. “That means that unless the cost of the sequencing test also includes a whole bunch of money for assessing the unclassified variants, a sequencing test is going to create more of a mess than it cleans up.”

Rare, common

Despite the challenges, there have been plenty of clinical sequencing success stories. Already, Scripps’ Topol says, there have been “two big fronts in 2012: One is the unknown diseases [and] the other one, of course, is cancer.” But scientists say whole-genome sequencing might also become clinically useful for asymptomatic individuals in the future.

Down the line, scientists have their sights set on sequencing asymptomatic individuals to predict disease risk. “The long-term goal is to have any person walk off the street, be able to take a look at their genome and, without even looking at them clinically, say: ‘This is a person who will almost certainly have phenotype X,’” MacArthur says. “That is a long way away. And, of course, there are many phenotypes that can’t be predicted from genetic data alone.”

Nearer term, Boston’s Kohane imagines that newborns might have their genomes screened for a number of neonatal or pediatric conditions.

Overall, he says, it’s tough to say exactly where all of the chips might fall. “It’s going to be an interesting few years where the sequencing companies will be aligning themselves with laboratory testing companies and with genome interpretation companies,” Kohane says.

Even if clinical sequencing does not show utility for cases other than genetic diseases, it could still become common practice.

“Worldwide, there are certainly millions of people with severe diseases that would benefit from whole-genome sequencing, so the demand is certainly there,” MacArthur says. “It’s just a question of whether we can develop the infrastructure that is required to turn the research-grade genomes that we’re generating at the moment into clinical-grade genomes. Given the demand and the practical benefit of having this information … I don’t think there is any question that we will continue to drive, pretty aggressively, towards large-scale genome sequencing.”

Kohane adds that “although rare diseases are rare, in aggregate they’re actually not — 5 percent of the population, or 1 in 20, is beginning to look common.”

Despite conflicting reports as to its clinical value, given the rapid declines in cost, Kohane says it’s possible that a whole-genome sequence could be less expensive than a CT scan in the next five years. Confident that many of the interpretation issues will be worked out by then, he adds, “this soon-to-be-very-inexpensive test will actually have a lot of clinical value in a variety of situations. I think it will become part of the decision procedure of most doctors.”


[Sidebar] ‘Predictive Capacity’ Challenged

In Science Translational Medicine in April, Johns Hopkins University School of Medicine’s Nicholas Roberts and his colleagues showed that personal genome sequences for healthy monozygotic twin pairs are not predictive of significant risk for 24 different diseases in those individuals and concluded that whole-genome sequencing was unlikely to be useful for that purpose.

According to the Scripps Research Institute’s Eric Topol, the fact that Roberts and his colleagues examined the predictive capacity of personal genome sequencing “without any genome sequences” was but one flaw of their interpretation.

In a comment appearing in the same journal in May, Topol elaborated on this criticism, and noted that the Roberts et al. study essentially showed nothing new. “We cannot know the predictive capacity of whole-genome sequencing until we have sequenced a large number of individuals with like conditions,” Topol wrote.

Elsewhere in the journal, Tel Aviv University’s David Golan and Saharon Rosset noted that slightly tweaking the gene-environment parameters of the mathematical model used by Roberts et al. showed that the “predictive capacity of genomes may be higher than their maximal estimates.”

Colin Begg and Malcolm Pike from Memorial Sloan-Kettering Cancer Center also commented on the study in Science Translational Medicine, reporting their alternative calculation of the predictive capacity of personal sequencing and their analysis of cancer occurrence in the second breast of breast cancer patients, both of which, they wrote, “offer a more optimistic view of the predictive value of genetic data.”

In response to those comments, Bert Vogelstein — who co-authored the Roberts et al. study — and his colleagues wrote in Science Translational Medicine that their “group was the first to show that unbiased genome-wide sequencing could illuminate the basis for a hereditary disease,” adding that they are “acutely aware of its immense power to elucidate disease pathogenesis.” However, Vogelstein and his colleagues also said that recognizing the potential limitations of personal genome sequencing is important to “minimize false expectations and foster the most fruitful investigations.”


[Sidebar] ‘The Single Biggest Problem’

That there is currently no comprehensive, accurate, and openly accessible database of human disease-causing mutations “is the single greatest failure of modern human genetics,” Massachusetts General Hospital’s Daniel MacArthur says.

“We’ve invested so much effort and so much money in researching these Mendelian diseases, and yet we have never managed as a community to centralize all of those mutations in a single resource that’s actually useful,” MacArthur says. While he notes that several groups have produced enormously helpful resources and that others are developing more, currently “none covers anywhere close to the whole of the literature with the degree of detail that is required to make an accurate interpretation.”

Because of this, he adds, researchers are pouring time and resources into rehashing one another’s efforts and chasing down false leads.

“As anyone at the moment who is sequencing genomes can tell you, when you look at a person’s genome and you compare it to any of these databases, you find things that just shouldn’t be there — homozygous mutations that are predicted to be severe, recessive, disease-causing variants and dominant mutations all over the place, maybe a dozen or more, that they’ve seen in every genome,” MacArthur says. “Those things are clearly not what they claim to be, in the sense that a person isn’t sick.” Most often, he adds, the researchers who reported that variant as disease-causing were mistaken. Less commonly, the database moderators are at fault.

“The single biggest problem is that the literature contains a lot of noise. There are things that have been reported to be mutations that just aren’t. And, of course, a lot of the databases are missing a lot of mutations as well,” MacArthur adds. “Until we have a complete database of severe disease mutations that we can trust, genome interpretation will always be far more complicated than it should be.”

Tracy Vence is a senior editor of Genome Technology.

Source: 

http://www.genomeweb.com/node/1098636/

NIST Consortium Embarks on Developing ‘Meter Stick of the Genome’ for Clinical Sequencing

September 05, 2012

The National Institute of Standards and Technology has founded a consortium, called “Genome in a Bottle,” to develop reference materials and performance metrics for clinical human genome sequencing.

Following an initial workshop in April, consortium members – which include stakeholders from industry, academia, and the government – met at NIST last month to discuss details and timelines for the project.

The current aim is to have the first reference genome — consisting of genomic DNA for a specific human sample and whole-genome sequencing data with variant calls for that sample — available by the end of next year, and another, more complete version by mid-2014.

“At present, there are no widely accepted genomics standards or quantitative performance metrics for confidence in variant calling,” the consortium wrote in its work plan, which was discussed at the meeting. Its main motivation is “to develop widely accepted reference materials and accompanying performance metrics to provide a strong scientific foundation for the development of regulations and professional standards for clinical sequencing.”

“This is like the meter stick of the genome,” said Marc Salit, leader of the Multiplexed Biomolecular Science group in NIST’s Materials Measurement Laboratory and one of the consortium’s organizers. He and his colleagues were approached by several vendors of next-generation sequencing instrumentation about the possibility of generating standards for assessing the performance of next-gen sequencing in clinical laboratories. The project, he said, will focus on whole-genome sequencing but will also include targeted sequencing applications.

The consortium, which receives funding from NIST and the Food and Drug Administration, is open for anyone to participate. About 100 people, representing 40 to 50 organizations, attended last month’s meeting, among them representatives from Illumina, Life Technologies, Pacific Biosciences, Complete Genomics, the FDA, the Centers for Disease Control and Prevention, commercial and academic clinical laboratories, and a number of large-scale sequencing centers.

Four working groups will be responsible for different aspects of the project: a group led by Andrew Grupe at Celera will select and design the reference materials; a group headed by Elliott Margulies at Illumina will characterize the reference materials experimentally, using multiple sequencing platforms; Steve Sherry at the National Center for Biotechnology Information is heading a bioinformatics, data integration, and data representation group to analyze and represent the experimental data; and Justin Johnson from EdgeBio is in charge of a performance metrics and “figures of merit” group to help laboratories use the reference materials to characterize their own performance.

The reference materials will include both human genomic DNA and synthetic DNA that can be used as spike-in controls. Eventually, NIST plans to release the references as Standard Reference Materials that will be “internationally recognized as certified reference materials of higher order.”

According to Salit, there was some discussion at the meeting about what sample to select for a national reference genome. The initial plan was to use a HapMap sample – NA12878, a female from the CEPH pedigree from Utah – but it turned out that HapMap samples are consented for research use only and not for commercial use, for example in an in vitro diagnostic or for potential re-identification from sequence data.

The genome of NA12878 has already been extensively characterized, and the CDC is developing it as a reference for clinical laboratories doing targeted sequencing. “We were going to build on that momentum and make our first reference material the same genome,” Salit said. But because of the consent issues, NIST’s institutional review board and legal experts are currently evaluating whether the sample can be used.

In the meantime, consortium members have been “quite enthusiastic” about using samples from Harvard University’s Personal Genome Project, which are broadly consented, Salit said.

The reference material working group issued a recommendation to develop a set of genomes from eight ethnically diverse parent-child trios as references, he said. For cancer applications, the references may also potentially include a tumor-normal pair.

The consortium will characterize all reference materials by several sequencing platforms. Several instrument vendors, as well as a couple of academic labs, have offered to contribute to data production. According to Justin Zook, a biomedical engineer at NIST and another organizer of the consortium, the current plan is to use sequencing technology from Illumina, Life Technologies, Complete Genomics, and – at least for the first genome – PacBio. Some of the sequencing will be done internally at NIST, which has Life Tech’s 5500 and Ion Torrent PGM available. In addition, the consortium might consider fosmid sequencing, which would provide phasing information and lower the error rate, as well as optical mapping to gain structural information, Zook said.

He and his colleagues have developed new methods for calling consensus variants from different data sets already available for the NA12878 sample, which they are planning to submit for publication in the near future. A fraction of the genotype calls will be validated using other methods, such as microarrays and Sanger sequencing. Consensus genotypes with associated confidence levels will eventually be released publicly as NIST Reference Data.

An important part of NIST’s work on the data analysis will be to develop probabilistic confidence estimates for the variant calls. It will also be important to distinguish between homozygous reference genotypes and areas in the genome “where you’re not sure what the genotype is,” Zook said, adding that this will require new data formats.
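The distinction Zook draws between a confident homozygous-reference call and a site where the genotype is simply unknown can be illustrated with a minimal Python sketch. This is not the consortium's method: the platform names, the thresholds, and the use of simple vote agreement as a stand-in for a probabilistic confidence estimate are all assumptions made for illustration.

```python
from collections import Counter

NO_CALL = "./."  # VCF-style missing genotype: "we do not know"
HOM_REF = "0/0"  # confident homozygous-reference genotype


def consensus_genotype(calls, min_calls=2, min_agreement=0.75):
    """Combine per-platform genotype calls at one site.

    calls maps a (hypothetical) platform name to a genotype string,
    or to None when that platform produced no usable data at the site.
    Returns (genotype, agreement), where agreement is the fraction of
    usable calls that support the reported genotype. It is a crude
    placeholder for a real probabilistic confidence estimate.
    """
    usable = [g for g in calls.values() if g is not None]
    if len(usable) < min_calls:
        # Too little data: report a no-call rather than assuming 0/0.
        return NO_CALL, 0.0
    genotype, votes = Counter(usable).most_common(1)[0]
    agreement = votes / len(usable)
    if agreement < min_agreement:
        # Platforms disagree: the genotype is uncertain.
        return NO_CALL, agreement
    return genotype, agreement


if __name__ == "__main__":
    site = {"platform_a": "0/0", "platform_b": "0/0", "platform_c": None}
    print(consensus_genotype(site))
```

The point of the sketch is that a site with missing or conflicting data is reported as a no-call rather than silently treated as matching the reference.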

Coming up with confidence estimates for the different types of variants will be challenging, Zook said, particularly for indels and structural variants. Also, representing complex variants has not been standardized yet.

Several meeting participants called for “reproducible research and transparency in the analysis,” Salit said, and there were discussions about how to implement that at the technical level, including data archives so anyone can re-analyze the reference data.

One of the challenges will be to establish the infrastructure for hosting the reference data, which will require help from the NCBI, Salit said. Also, analyzing the data collaboratively is “not a solved problem,” and the consortium is looking into cloud computing services for that.

The consortium will also develop methods that describe how to use the reference materials to assess the performance of a particular sequencing method, including both experimental protocols and open source software for comparing genotypes. “We could throw this over the fence and tell someone, ‘Here is the genome and here is the variant table,'” Salit said, but, he noted, the consortium would like to help clinical labs use those tools to understand their own performance.
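As a rough illustration of what comparing a laboratory's genotypes against a reference variant table might involve, the following Python sketch counts matching and mismatching calls and derives precision and recall. The data layout (a dictionary keyed by chromosome, position, reference allele, and alternate allele) and the function name are hypothetical and do not describe the consortium's software.

```python
def compare_to_reference(reference, observed):
    """Compare a lab's variant calls with a reference variant table.

    Both arguments map (chrom, pos, ref, alt) to a genotype string
    such as "0/1". A call counts as a true positive only if the site
    and the genotype both match the reference. A genotype mismatch at
    a shared site counts as both a false positive and a false negative.
    """
    tp = sum(1 for key, gt in observed.items() if reference.get(key) == gt)
    fp = len(observed) - tp
    fn = sum(1 for key, gt in reference.items() if observed.get(key) != gt)
    precision = tp / (tp + fp) if observed else 0.0
    recall = tp / (tp + fn) if reference else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    reference = {
        ("1", 100, "A", "G"): "0/1",
        ("1", 200, "C", "T"): "1/1",
        ("2", 50, "G", "A"): "0/1",
    }
    observed = {
        ("1", 100, "A", "G"): "0/1",  # matches the reference
        ("1", 200, "C", "T"): "0/1",  # right site, wrong genotype
        ("3", 10, "T", "C"): "0/1",   # not in the reference
    }
    print(compare_to_reference(reference, observed))
```

A real comparison would also need to restrict the analysis to regions where the reference calls are considered reliable and to reconcile different representations of the same indel or complex variant, neither of which this sketch attempts.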

EdgeBio’s Johnson, who is chairing the working group in charge of this effort, is also involved in developing bioinformatic tools to judge the quality of genomes for the Archon Genomics X Prize (CSN 11/2/2011). Salit said that NIST is “leveraging some excellent work coming out of the X Prize” and is collaborating with a member of the X Prize team on the consensus genotype calling project.

By the end of 2013, the consortium wants to have its first “genome in a bottle” and reference data with SNV and maybe indel calls available, which will not yet include all confidence estimates. Another version, to be released in mid-2014, will include further analysis of error rates and uncertainties, as well as additional types of variants, such as structural variation.

Julia Karow tracks trends in next-generation sequencing for research and clinical applications for GenomeWeb’s In Sequence and Clinical Sequencing News. Follow her GenomeWeb Twitter accounts at @InSequence and @ClinSeqNews.

At AACC, NHGRI’s Green Lays out Vision for Genomic Medicine

July 16, 2012

LOS ANGELES – The age of genomic medicine is within “striking distance,” Eric Green, director of the National Human Genome Research Institute, told attendees of the American Association for Clinical Chemistry’s annual meeting here on Sunday.

Speaking at the conference’s opening plenary session, Green discussed NHGRI’s roadmap for moving genomic findings into clinical practice. While this so-called “helix to healthcare” vision may take many years to fully materialize, “I predict absolutely that it’s coming,” he said.

Green noted that rapid advances in DNA sequencing have put genomics on a similar development path as clinical chemistry, which is also a technology-driven field. “If you look over the history of clinical chemistry, whenever there were technology advances, it became incredibly powerful and new opportunities sprouted up left and right,” he said.

Green likened next-gen sequencing to the autoanalyzers that “changed the face of clinical chemistry” by providing a generic platform that enabled a range of applications. In a similar fashion, low-cost sequencing is becoming a “general purpose technology” that can not only read out DNA sequence but can also provide information about RNA, epigenetic modifications, and other associated biology, he said.

The “low-hanging fruit” for genomic medicine is cancer, where molecular profiling is already being used alongside traditional histopathology to provide information on prognosis and to help guide treatment, he said.

Another area where Green said that genomic medicine is already bearing fruit is pharmacogenomics, where genomic data is proving useful in determining which patients will respond to specific drugs.

Nevertheless, while it’s clear that “sequencing is already altering the clinical landscape,” Green urged caution. “We have to manage expectations and realize it’s going to be many years from going from the most basic information about our genome sequence to actually changing medical care in any serious way,” he said.

In particular, he noted that the clinical interpretation of genomic data is still a challenge. Not only are the data volumes formidable, but the functional role of most variants is still unknown, he noted.

This knowledge gap should be addressed over the next several years as NHGRI and other organizations worldwide sequence “hundreds of thousands” of human genomes as part of large-scale research studies.

“We’re increasingly thinking about how to use that data to actually do clinical care, but I want to emphasize that the great majority of this data being generated will and should be part of research studies and not part of primary clinical care quite yet,” Green said.

Source:

http://www.genomeweb.com/sequencing/aacc-nhgris-green-lays-out-vision-genomic-medicine

Startup Aims to Translate Hopkins Team’s Cancer Genomics Expertise into Patient Care

May 16, 2012

Researchers at Johns Hopkins University who helped pioneer cancer genome sequencing have launched a commercial effort intended to translate their experience into clinical care.

Personal Genome Diagnostics, founded in 2010 by Victor Velculescu and Luis Diaz, aims to commercialize a number of cancer genome analysis methods that have been developed at Hopkins over the past several decades. Velculescu, chief scientific officer of PGDx, is director of cancer genetics at the Ludwig Center for Cancer Genetics and Therapeutics at Hopkins; while Diaz, chief medical officer of the company, is director of translational medicine at the Ludwig Center.

Other founders include Ludwig Center Director Bert Vogelstein as well as Hopkins researchers Ken Kinzler, Nick Papadopoulos, and Shibin Zhou. The team has led a number of seminal cancer sequencing projects, including the first effort to apply large-scale sequencing to cancer genomes, one of the first cancer exome sequencing studies, and the discovery of a number of cancer-related genes, including TP53, PIK3CA, APC, IDH1 and IDH2.

Velculescu told Clinical Sequencing News that the 10-person company, headquartered in the Science and Technology Park at Johns Hopkins in Baltimore, is a natural extension of the Hopkins group’s research activities.

Several years ago, “we began receiving requests from other researchers, other physicians, collaborators, and then actually patients, family members, and friends, wanting us to do these whole-exome analyses on cancer samples,” he said. “We realized that doing this in the laboratory wasn’t really the best place to do it, so for that reason we founded Personal Genome Diagnostics.”

The goal of the company, he said, “is to translate this history of our group’s experience of cancer genetics and our understanding of cancer biology, together with the technology that has now become available, and to ultimately perform these analyses for individual patients.”

The fledgling company has reached two commercial milestones in the last several weeks. First, it gained CLIA certification for cancer exome sequencing using the HiSeq 2000. In addition, it secured exclusive licensing rights from Hopkins for a technology called digital karyotyping, developed by Velculescu and colleagues to analyze copy number changes in cancer genomes.

PGDx offers a comprehensive cancer genome analysis service that combines exome sequencing with digital karyotyping, which isolates short sequence tags from specific genomic loci in order to identify chromosomal changes as well as amplifications and deletions.

The company sequences tumor-normal pairs and promises a turnaround time of six to 10 weeks, though Velculescu said that ongoing improvements in sequencing technology and the team’s analysis methods promise to reduce that time “significantly.” It is currently seeing turnaround times of under a month.

To date, the company has focused solely on the research market. Customers have included pharmaceutical and biotech companies, individual clinicians and researchers, and contract research organizations, while the scale of these projects has ranged from individual patients to thousands of exomes for clinical trials.

While the company performs its own sequencing for smaller projects, it relies on third-party service providers for larger studies.

PGDx specializes in all aspects of cancer genome analyses, but has a particular focus on the front and back end of the workflow, Velculescu said, including “library construction, pathologic review of the samples, dissection of tumor samples to enrich tumor purity, next generation sequencing, identification of tumor-specific alterations, and linking of these data to clinical and biologic information about human cancer.”

The sequencing step in the middle, however, “is really almost becoming a commodity,” he noted. “Although we’ve done it in house, we typically do outsource it and that allows us to scale with the size of these projects.”

He said that PGDx typically works with “a number of very high-quality sequence partners to do that part of it,” but he declined to disclose these partners.

On the front end, PGDx has developed “a variety of techniques that we’ve licensed and optimized from Hopkins that have allowed us to improve extraction of DNA from both frozen tissue and [formalin-fixed, paraffin-embedded] tissue, even at very small quantities,” Diaz said. The team has also developed methods “to maximize our ability to construct libraries, capture, and then perform exomic sequencing with digital karyotyping.”

Once the sequence data is in hand, “we have a pipeline that takes that information and deciphers the changes that are most likely to be related to the cancer and its genetic make-up,” he said. “That’s not trivial. It requires inspection by an experienced cancer geneticist.”

While the firm is working on automating the analysis, “it’s not something that is entirely automatable at this time and therefore cannot be commoditized,” Diaz said.

The firm issues a report for its customers that “provides information not only on the actual sequence changes which are of high quality, but what these changes are likely to do,” Velculescu said, including “information about diagnosis, prognosis, therapeutic targeting [information] or predictive information about the therapy, and clinical trials.”

So far, the company has relied primarily on word of mouth to raise awareness of its offerings. “We’ve literally been swamped with requests from people who just know us,” Velculescu said. “I think one of the major reasons people have been coming to us for either these small or very large contracts is that people are getting this type of NGS data and they don’t know what to do with it — whether it’s a researcher who doesn’t have a lot of experience in cancer or a clinician who hasn’t seen this type of data before.”

While there’s currently “a wealth in the ability to get data, there’s an inadequacy in being able to understand and interpret the data,” he said.

Pricing for the company’s services is on a case-by-case basis, but Diaz estimated that retail costs are currently between $5,000 and $10,000 per tumor-normal pair for research purposes. Clinical cases cost more because they require deeper coverage, additional analyses, and a physician’s interpretation.

A Cautious Approach

While the company’s ultimate goal is to help oncologists use genomic information to inform treatment for their patients, PGDx is “proceeding cautiously” in that direction, Diaz said.

The firm has so far sequenced around 50 tumor-normal pairs for individual patients, but these have been for “informational purposes,” he said, stressing that the company believes the field of cancer genomics is still in the “discovery” phase.

“I think we’re really at the beginning of the genomic revolution in cancer,” Diaz said. “We are partnering with pharma, with researchers, and with certain clinicians to start bringing this forward — not only as a discovery tool but eventually as a clinical application.”

“We do think that rushing into this right now is too soon, but we are building the infrastructure — for example our recent CLIA approval for cancer genome analyses — to do that,” he added.

This cautious approach sets the firm apart from some competitors, including Foundation Medicine, which is about to launch a targeted sequencing test that it is marketing as a diagnostic aid to help physicians tailor therapy for their patients. Diagnostic firm Asuragen is also offering cancer sequencing services based on a targeted approach (CSN 1/12/12), as are a number of academic labs.

Diaz said that PGDx’s comprehensive approach also sets it apart from these groups. “We think there’s a lot of clinically actionable information in the genome … and we don’t want to limit ourselves by just looking at a set of genes and saying that these may or may not have importance.”

While the genes in targeted panels “may have some data surrounding them with regard to prognosis, or in relation to a therapy, that’s really only a small part of the story when it comes to the patient’s cancer,” Diaz said.

“That’s why we would like to remain the company that looks at the entire cancer genome in a comprehensive fashion, because we don’t know enough yet to break it down to a few genes,” he said.

The company’s proprietary use of digital karyotyping to find copy number alterations is another differentiator, Velculescu said, because many cancer-associated genes — such as p16, EGFR, MYC, and HER2/neu — are only affected by copy number changes, not point mutations.

Ultimately, “we want to develop something that has value for the clinician,” Diaz said. “A clinician currently sees 20 to 30 patients a day and may have only a few minutes to look at a report. If [information from sequencing] doesn’t have immediate high-impact value, it’s going to be very hard to justify its use down the road.”

He added that the company is “thinking very hard about what we can squeeze out of the cancer genome to provide that high-impact clinical value — something that isn’t just going to improve the outcome of patients by a few months or weeks, but actually change the outlook of that patient substantially.”

Source:

http://www.genomeweb.com/sequencing/startup-aims-translate-hopkins-teams-cancer-genomics-expertise-patient-care

 
Bernadette Toner is editorial director for GenomeWeb’s premium content. Follow her GenomeWeb Twitter account at @GenomeWeb.

In Educational Symposium, Illumina to Sequence, Interpret Genomes of 50 Participants for $5K Each

June 27, 2012

This story was originally published June 25.

As part of a company-sponsored symposium this fall to “explore best practices for deploying next-generation sequencing in a clinical setting,” Illumina plans to sequence and analyze the genomes of around 50 participants for $5,000 each, Clinical Sequencing News has learned.

According to Matt Posard, senior vice president and general manager of Illumina’s translational and consumer genomics business, the event is part of a “multi-step process to engage experts in the field around whole-genome sequencing, and to support the conversation.”

The “Understand your Genome” symposium will take place Oct. 22-23 at Illumina’s headquarters in San Diego.

The company sent out invitations to the event over the last few months, targeting individuals with a professional interest in whole-genome sequencing, including medical geneticists, pathologists, academics, and industry or business leaders, Posard told CSN this week. To provide potential participants with more information about the symposium, Illumina also hosted a webinar this month that included a Q&A session.

Registration closed June 14 and has exceeded capacity — initially 50 spots, a number that may increase slightly, Posard said. Everyone else is currently waitlisted, and Illumina plans to host additional symposia next year.

“There has been quite a bit of unanticipated enthusiasm around this from people who are speaking at the event or planning to attend the event,” including postings on blogs and listservs, Posard said.

As part of their $5,000 registration fee, which does not include travel and lodging, participants will have their whole genome sequenced in Illumina’s CLIA-certified and CAP-accredited lab prior to the event. It is also possible to participate without having one’s genome sequenced, but only as a companion to a full registrant, according to Illumina’s website. The company prefers that participants submit their own sample, but as an alternative, they may submit a patient sample instead.

The general procedure is very similar to Illumina’s Individual Genome Sequencing, or IGS, service in that it requires a prescription from a physician, who also receives the results to review them with the participant. However, participants pay less than they would through IGS, where a single human genome currently costs $9,500.

Participants will also have a one-on-one session with an Illumina geneticist prior to being sequenced, and they can choose to not receive certain medical information as part of the genome interpretation.

Doctors will receive the results and review them with the participants sometime before the event. “There will be no surprises for these participants when they come to the symposium,” Posard said.

Results will include not only a list of variants but also a clinical interpretation of the data by Illumina geneticists. This is currently not part of IGS, which requires an interpretation of the data by a third party, but Illumina plans to start offering interpretation services for IGS before the symposium, Posard said.

“Our stated intent has always been that we want to fill in all of the pieces that the physicians require, so we are building a human resource, as well as an informatics team, to provide that clinical interpretation, and we are using that apparatus for the ‘Understand your Genome’ event,” Posard said.

The interpretation will include “a specified subset of genes relating to Mendelian conditions, drug response, and complex disease risks,” according to the website, which notes that “as with any clinical test, the patient and physician must discuss any medically significant results.”

The first day of the symposium will feature presentations on clinical, laboratory, ethical, legal, and social issues around whole-genome sequencing by experts in the field. Speakers include Eric Topol from the Scripps Translational Science Institute, Matthew Ferber from the Mayo Clinic, Robert Green from Brigham and Women’s Hospital and Harvard Medical School, Heidi Rehm from the Harvard Partners Center for Genetics and Genomics, Gregory Tsongalis from the Dartmouth Hitchcock Medical Center, Robert Best from the University of South Carolina School of Medicine, Kenneth Chahine from Ancestry.com, as well as Illumina’s CEO Jay Flatley and chief scientist David Bentley.

On the second day, participants will receive their genome data on an iPad and learn how to analyze their results using the iPad MyGenome application that Illumina launched in April.

The planned symposium stirred some controversy at the European Society of Human Genetics annual meeting in Nuremberg, Germany, this week. During a presentation in a session on the diagnostic use of next-generation sequencing, Gert Matthijs, head of the Laboratory for Molecular Diagnostics at the Center for Human Genetics in Leuven, Belgium, said he was upset because the invitation to Illumina’s event apparently reached not only selected individuals but also patient organizations.

“To me, personally, [the event] tells that some people are really exploring the limits of business, and business models, to get us to genome sequencing,” he said.

“We have to be very careful when we put next-generation sequencing direct to the consumer, or to patient testing, but it’s a free world,” he added later.

Posard said that Illumina welcomes questions about and criticism of the symposium. “This is another example of us being extremely responsible and transparent in how we’re handling this novel application that everybody acknowledges is the wave of the future,” he said. “We want to responsibly introduce that wave, and I believe we’re doing so, through such things as the ‘Understand your Genome’ event, but not limited to this event.”

Julia Karow tracks trends in next-generation sequencing for research and clinical applications for GenomeWeb’s In Sequence and Clinical Sequencing News. Follow her GenomeWeb Twitter accounts at @InSequence and @ClinSeqNews.

Federal Court Rules Helicos Patent Invalid; Company Reaches Payment Agreement with Lenders

August 30, 2012

NEW YORK (GenomeWeb News) – A federal court has ruled in Illumina’s favor in a lawsuit filed by Helicos BioSciences that had alleged patent infringement.

In a decision dated Aug. 28, District Judge Sue Robinson of the US District Court for the District of Delaware granted Illumina’s motion for summary judgment declaring US Patent No 7,593,109 held by Helicos invalid for “lack of written description.”

Titled “Apparatus and methods for analyzing samples,” the patent relates to an apparatus, systems, and methods for biological sample analysis.

The ‘109 patent was the last of three patents that Helicos accused Illumina of infringing; Helicos voluntarily dismissed its claims on the other two, with prejudice, earlier this year. In October 2010 Helicos included Illumina and Life Technologies in a lawsuit that originally accused Pacific Biosciences of patent infringement.

Helicos dropped its lawsuit against Life Tech and settled with PacBio earlier this year, leaving Illumina as the sole defendant.

In seeking a motion for summary judgment, Illumina argued that the ‘109 patent does not disclose “a focusing light source operating with any one of the analytical light sources to focus said optical instrument on the sample.” Illumina’s expert witness further said that the patent “does not describe how focusing light source works” nor does it provide an illustration of such a system, according to court documents.

In handing down her decision, Robinson said, “In sum, and based on the record created by the parties, the court concludes that Illumina has demonstrated, by clear and convincing evidence, that the written description requirement has not been met.”

In a statement, Illumina President and CEO Jay Flatley said he was pleased with the court’s decision.

“The court’s ruling on the ‘109 patent, and Helicos’ voluntary dismissal of the other patents in the suit, vindicates our position that we do not infringe any valid Helicos patent,” he said. “While we respect valid and enforceable intellectual property rights of others, Illumina will continue to vigorously defend against unfounded claims of infringement.”

After the close of the market Wednesday, Helicos also disclosed that it had reached an agreement with lenders to waive defaults arising from Helicos’ failure to pay certain risk premium payments in connection with prior liquidity transactions. The transactions are part of a risk premium payment agreement Helicos entered into with funds affiliated with Atlas Venture and Flagship Ventures in November 2010.

The lenders have agreed to defer the risk premium payments “until [10] business days after receipt of a written notice from the lenders demanding the payment of such risk premium payments,” Helicos said in a document filed with the US Securities and Exchange Commission.

The Cambridge, Mass.-based firm also disclosed that Noubar Afeyan and Peter Barrett have resigned from its board.

Helicos said two weeks ago that its second-quarter revenues dipped 29 percent year over year to $577,000. In an SEC document, it also warned that existing funds were not sufficient to support its operations and related litigation expenses through the planned September trial date for its dispute with Illumina.

In Thursday trade on the OTC market, shares of Helicos closed down 20 percent at $0.04.

Source:

http://www.genomeweb.com/sequencing/federal-court-rules-helicos-patent-invalid-company-reaches-payment-agreement-len

State of the Science: Genomics and Cancer Research

April 2012

Basic research allows for a better understanding of cancer and, eventually, improved patient outcomes. Zhu Chen, China’s minister of health, and Shanghai Jiao Tong University’s Zhen-Yi Wang received the seventh annual Szent-Györgyi prize from the National Foundation for Cancer Research for their work on a treatment for acute promyelocytic leukemia. Genome Technology’s Ciara Curtin spoke to Chen, Wang, and past prize winners about the state of cancer research.

Genome Technology: Doctors Wang and Chen, can you tell me a bit about the work you did that led to you receiving the Szent-Györgyi prize?

Zhen-Yi Wang: I am a physician. I am working in the clinic, so I have to serve the patients. … I know the genes very superficially, not very deeply, but the question raised to me is: There are so many genes, but how are [we] to judge what is the most important?

Zhu Chen: The work that is recognized by this year’s Szent-Györgyi Prize concerns … acute promyelocytic leukemia. Over the past few decades, we have been involved in developing new treatment strategies against this disease.

You have two [therapies — all-trans retinoic acid and arsenic trioxide] — that target the same protein but with slightly different mechanisms, so we call this synergistic targeting. When the two drugs combine together for the induction therapy, then we see very nice response in terms of the complete remission rate. But more importantly, we see that this synergistic targeting, together with the effect of the chemotherapy, can achieve a very high five-year disease-free survival — as high as 90 percent.

But we were more interested in the functional aspects of the genome, to understand what each gene does and also to particularly understand the network behavior of the genes.

GT: There are a number of consortiums looking at the genome sequences of many cancer types. What do you hope to see from such studies?

Webster Cavenee: This is a way that tumors are being sequenced in a rational kind of way. It would have been done anyway by labs individually, which would have taken a lot more money and taken a lot longer, too. The human genome sequence, everybody said, ‘Why are you going to do that?’ … But that now turns out to be a tremendous resource. … From the point of view of The Cancer Genome Atlas, having the catalog of all of the kinds of mutations which are present in tumors can be very useful because you can see patterns. For example, in the glioblastoma cancer genome project, they found an unexpected association of some mutations and combinations of mutations with drug sensitivity. Nobody would have thought that.

The problem, of course, is that when you are sequencing all these tumors, it’s a very static thing. You get one point in time and you sequence whatever comes out of this big lump of tissue. That big lump is made up of a lot of different kinds of pieces, so when you see a mutation, you can’t know where it came from and you don’t know whether it actually does anything. That then leads into what’s going to be the functionalizing of the genome. Because in the absence of knowing that it has a function, it’s not going to be of very much use to develop drugs or anything like that. And that’s a much bigger exercise because that involves a lot of experiments, not just stuffing stuff into a sequencer.

Peter Vogt: [The genome] has to be used primarily to determine function. Without function, there’s not much you can do with these mutations, because the distinction between a driver mutation and a passenger mutation can’t be made just on the basis of sequence.

Carlo Croce: After that, you have to be able to validate all of the genetic operations in model systems where you can reproduce the same changes and see whether there are the same consequences. Otherwise, without validation, to develop therapy doesn’t make much sense because maybe those so-called driver mutations will turn out to be something else.

GT: Will sequencing of patient’s tumors come to the clinic?

CC: It is inevitable. Naturally, there are a lot of bottlenecks. To do the sequencing is the, quote, trivial part and it is going to cost less and less. But then interpreting the data might be a little bit more cumbersome.

Sujuan Ba: Dr. Chen, there is an e-health card in China right now. Do you think some day gene sequencing will be stored in that card?

ZC: We are developing a digital healthcare in China. We started with electronic health records and now by providing the e-health card to the people, that will facilitate the individualized health management and also the supervision of our healthcare system. In terms of the use of genetic information for clinical purposes, as Professor Croce said, it’s going to happen.

GT: What do you think are the major questions in cancer research that still need to be addressed?

PV: There are increasingly two schools of thought on cancer. One is that it is all an engineering problem: We have all the information we need, we just need to engineer the right drugs. The other school says it’s still a basic knowledge problem. I think more and more people think it’s just an engineering problem — give us the money and we’ll do it all. A lot of things can be done, but we still don’t have complete knowledge.

Roundtable Participants
Sujuan Ba, National Foundation for Cancer Research
Webster Cavenee, University of California, San Diego
Zhu Chen, Ministry of Health, China
Carlo Croce, Ohio State University
Peter Vogt, Scripps Research Institute
Zhen-Yi Wang, Shanghai Jiao Tong University

Source:


Harvard Group Using Bio-Rad Digital PCR System as Part of NHGRI-Funded Study of Multi-Allelic CNV

 

Reporter: Aviva Lev-Ari, PhD, RN

August 23, 2012

Researchers in the Department of Genetics at Harvard Medical School have been awarded $500,000 by the National Institutes of Health for the first year of a four-year project to study multi-allelic copy number variation in the human genome.

As part of the research, the Harvard team is using a Bio-Rad QX100 Droplet Digital PCR system as one of two methods to analyze multi-allelic CNVs in human cohorts. The researchers are also using a computational method that compares available whole-genome sequencing data.

Steven McCarroll, a professor of genetics at Harvard Med and director of genetics at the Stanley Center for Psychiatric Research at the Broad Institute, is principal investigator on the grant, which is being administered by NIH’s National Human Genome Research Institute.

According to a recently published grant abstract, McCarroll and colleagues seek to analyze multi-allelic CNVs, which involve genes and other functional elements for which three or more segregating alleles give rise to a wide range of copy numbers — between two and 10 — per diploid human genome.

These multi-allelic CNVs have been “refractory to widely used analysis methods and are not assessed in the genome-scale molecular or statistical approaches used to study genetically complex phenotypes in humans,” the researchers wrote.

The project builds on the group’s earlier characterization of multi-allelic duplication CNVs of a megabase-long inversion polymorphism at the chromosome 17 locus 17q21.31, which contains markers previously associated with female fertility, female meiotic recombination, and neurological disease.

As part of that research, published in the August 2012 issue of Nature Genetics, the group analyzed read depth in the locus by applying an algorithm called Genome Structure in Populations, or Genome STRiP, to whole-genome sequencing data from 946 unrelated individuals sampled as part of the 1000 Genomes Project; and used droplet-based digital PCR to analyze 120 parent-offspring trios from HapMap.

http://www.nature.com/ng/journal/v44/n8/full/ng.2334.html

They found that their measurements of integer copy number varied from two to eight, and were 99.1 percent concordant across 234 genotypes in overlapping samples, thus validating both the computational and digital PCR methods.

More specifically, for the digital PCR assay, the group designed a pair of PCR primers and a dual-labeled fluorescence-FRET oligonucleotide probe to both the CNV locus and a two-copy control locus. Then they used a droplet generator from QuantaLife to compartmentalize the PCR reaction into uniform 1-nanoliter emulsion-based droplets containing zero, one, or very few template molecules for each locus; and a droplet reader from QuantaLife to count the number of positive and negative droplets, comparing the droplet counts of the CNV locus to the control locus to determine absolute copy number.
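The droplet-counting step described above can be illustrated with a short calculation. This is a minimal sketch, not the group's or the vendor's actual software: it assumes the standard Poisson correction used in digital PCR (the mean number of template copies per droplet is estimated from the fraction of negative droplets), and the function names and droplet counts are hypothetical, chosen only for illustration.

```python
import math


def mean_copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean number of template copies per droplet.

    Assumes templates are distributed randomly across droplets, so the
    fraction of negative droplets equals exp(-mean).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    negative_fraction = (total - positive) / total
    return -math.log(negative_fraction)


def copy_number(target_positive: int, control_positive: int,
                total: int, control_copies: int = 2) -> float:
    """Estimate copies of the CNV locus per diploid genome.

    The control locus is assumed to be present at `control_copies`
    (two, as in the article) per diploid genome, and both loci are
    assumed to be read from the same `total` droplets.
    """
    target_mean = mean_copies_per_droplet(target_positive, total)
    control_mean = mean_copies_per_droplet(control_positive, total)
    if control_mean == 0:
        raise ValueError("control locus has no positive droplets")
    return control_copies * target_mean / control_mean


# Hypothetical counts: 10,000 droplets, 1,900 positive for the CNV locus
# and 1,000 positive for the two-copy control locus.
estimate = copy_number(1900, 1000, 10000)
print(round(estimate))  # prints 4
```

The raw ratio of positive droplets (1,900 to 1,000) would suggest 3.8 copies; the correction accounts for droplets that received more than one template molecule, which is why the estimate comes out at the integer value of four.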

QuantaLife originally developed the droplet-based digital PCR system, but was acquired in October 2011 by Bio-Rad, which rebranded the platform as the QX100 Droplet Digital PCR system (PCR Insider, 10/6/2011).

Annette Tumolo, director of the digital biology center at Bio-Rad, told PCR Insider this week that McCarroll has access to two such platforms, one of which is in use at Harvard and was obtained from QuantaLife, and one of which Bio-Rad sold to the Broad Institute.

Tumolo said that Bio-Rad maintains “an active and positive relationship” with the McCarroll lab. “They’ve gotten great results [with the QX100], and were able to rapidly publish the Nature Genetics paper,” Tumolo said.

Under the new NHGRI grant, McCarroll and colleagues plan to “accurately analyze mCNVs in reference populations” using both the computational and digital PCR approaches, the researchers wrote in their grant abstract.

“By analyzing these data in a statistical framework that incorporates information about genotypes, allele frequencies, inheritance, and haplotypes, we will place mCNV alleles onto the haplotype maps created by HapMap and 1000 Genomes, and render mCNVs accessible to genotype imputation to the fullest extent possible,” the grant abstract states.

In addition, McCarroll’s group hopes to “deeply characterize mCNVs at 10 biomedically important loci, to understand these polymorphisms at the levels of population genetics, mutational rates and histories, and relationships to clinical phenotypes. Finally, we will pilot inexpensive in silico genome-wide association studies for mCNVs based on statistical imputation into existing GWAS data sets.”

The end goal of the project is to discover relationships between disease risk and gene dosage, which will help reveal the molecular etiology of human disease, the researchers wrote.


Ben Butkus is senior editor of GenomeWeb’s premium content and the editor of PCR Insider. He covers technologies and trends in PCR, qPCR, nucleic acid amplification, and sample prep. Follow his GenomeWeb Twitter account at @PCRInsider.

 
