
Archive for the ‘Personalized and Precision Medicine & Genomic Research’ Category

 

Molecular pathogen identification comes to the bedside

Reporter:  Larry H Bernstein, MD, FCAP

Developments in molecular diagnostics have been proceeding at a rapid pace, so it is not surprising that they reached clinical microbiology early. Microbiology and virology have many methods for validating the type of pathogen, but identification of new pathogens can be delayed by reliance on a state laboratory. This will be less of an issue with the consolidation of regional facilities and associated laboratories.

I present an example of point-of-care technology from the University of California, Davis, developed by Gerald Kost and colleagues with the UC Lawrence Livermore National Point-of-Care Technologies Center.

Tran NK, Wisner DH, Albertson TE, Cohen S, et al.  Multiplex polymerase chain reaction pathogen detection in patients with suspected septicemia after trauma, emergency, and burn surgery. Surgery 2012 Mar;151(3):456-63. Epub 2011 Oct 5.  nktran@ucdavis.edu

The goal of the study:  to determine the clinical value of multiplex polymerase chain reaction (PCR) study for enhancing pathogen detection in patients with suspected septicemia after trauma, emergency, and burn surgery.

Finding: PCR-based pathogen detection quickly reveals occult bloodstream infections in these high-risk patients and may accelerate the initiation of targeted antimicrobial therapy.

Type study: a prospective observational study

Population:  30 trauma and emergency surgery patients compared to 20 burn patients.

Method:  Whole-blood samples and routine blood cultures (BCs) were tested using a new multiplex, PCR-based pathogen detection system. PCR results were compared to culture data.

Arbitrated Case Review

Arbitrated case review was performed by a medical intensivist, 3 trauma surgeons, 3 burn surgeons, 1 microbiologist, and an infectious disease physician to determine antimicrobial adequacy based on paired PCR/BC results. The arbitrated case review process was adapted from a previous study. Physicians were first presented cases with only BC results. Cases were then re-presented with PCR results included.

Results:

  • PCR detected more pathogens, and detected them more rapidly, than culture methods.
  • Acute Physiology and Chronic Health Evaluation II (APACHE II), Sequential Organ Failure Assessment (SOFA), and Multiple Organ Dysfunction (MODS) scores were greater in PCR-positive versus PCR-negative trauma and emergency surgery patients (P ≤ .033).
  • Negative PCR results (odds ratio, 0.194; 95% confidence interval, 0.045-0.840; P = .028) acted as an independent predictor of survival for the combined surgical patient population.
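An odds ratio below 1 with a confidence interval that excludes 1, as reported above, indicates that a negative PCR result is protective. As a rough illustration of how such a figure is derived, here is a hedged Python sketch using a hypothetical 2×2 table — the counts are invented for illustration and are not the study's data:

```python
import math

# Hypothetical 2x2 table (counts invented for illustration; NOT the study's data)
#                 died    survived
# PCR-negative    b=5      a=20
# PCR-positive    d=15     c=10
a, b, c, d = 20, 5, 10, 15

# Odds ratio for death given a negative PCR result; OR < 1 means a negative
# PCR result predicts survival, matching the direction reported in the study.
or_ = (b / a) / (d / c)

# Wald 95% confidence interval on the log-odds-ratio scale
se = math.sqrt(1/a + 1/b + 1/c + 1/d)
ci_lo = math.exp(math.log(or_) - 1.96 * se)
ci_hi = math.exp(math.log(or_) + 1.96 * se)
print(f"OR = {or_:.3f}, 95% CI ({ci_lo:.3f}, {ci_hi:.3f})")
```

In the study itself the odds ratio came from a multivariable model (hence "independent predictor"), not a raw 2×2 table; the sketch shows only the basic arithmetic behind an odds ratio and its Wald interval.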

CONCLUSION:

  • PCR results were reported faster than blood culture results.
  • Severity scores were significantly greater in PCR-positive trauma and emergency surgery patients.
  • The lack of pathogen DNA as determined by PCR served as a significant predictor of survival in the combined patient population.
  • PCR testing independent of traditional prompts for culturing may have clinical value in burn patients.

Tran NK, et al.  Multiplex polymerase chain reaction pathogen detection in patients with suspected septicemia after trauma, emergency, and burn surgery.  Surgery 2012 Mar;151(3):456–463. Online 2011 Oct 5.
doi: 10.1016/j.surg.2011.07.030
PMID: 21975287 [PubMed – indexed for MEDLINE]; PMCID: PMC3304499; NIHMSID: NIHMS288960

Polymerase chain reaction, PCR (Photo credit: Wikipedia)

 

Read Full Post »

 

Reporter: Aviva Lev-Ari, PhD, RN

Ten Biotech Powerhouses Such as Abbott Laboratories (ABT) and AstraZeneca PLC (AZN) Unite to Form TransCelerate BioPharma Inc. to Accelerate the Development of New Meds

TransCelerate – New Non-Profit Organization to Speed Pharmaceutical R&D, Headquartered in Philadelphia



PHILADELPHIA, Sept. 19, 2012 /PRNewswire/ — Ten leading biopharmaceutical companies announced today that they have formed a non-profit organization to accelerate the development of new medicines. Abbott, AstraZeneca, Boehringer Ingelheim, Bristol-Myers Squibb, Eli Lilly and Company, GlaxoSmithKline, Johnson & Johnson, Pfizer, Genentech, a member of the Roche Group, and Sanofi launched TransCelerate BioPharma Inc. (“TransCelerate”), the largest ever initiative of its kind, to identify and solve common drug development challenges with the end goals of improving the quality of clinical studies and bringing new medicines to patients faster.

 

Through participation in TransCelerate, each of the ten founding companies will combine financial and other resources, including personnel, to solve industry-wide challenges in a collaborative environment. Together, member companies have agreed to specific outcome-oriented objectives and established guidelines for sharing meaningful information and expertise to advance collaboration.

“There is widespread alignment among the heads of R&D at major pharmaceutical companies that there is a critical need to substantially increase the number of innovative new medicines, while eliminating inefficiencies that drive up R&D costs,” said newly appointed acting CEO of TransCelerate BioPharma, Garry Neil, MD, Partner at Apple Tree Partners and formerly Corporate Vice President, Science & Technology, Johnson & Johnson. “Our mission at TransCelerate BioPharma is to work together across the global research and development community and share research and solutions that will simplify and accelerate the delivery of exciting new medicines for patients.”

Members of TransCelerate have identified clinical study execution as the initiative’s initial area of focus. Five projects have been selected by the group for funding and development, including: development of a shared user interface for investigator site portals, mutual recognition of study site qualification and training, development of risk-based site monitoring approach and standards, development of clinical data standards, and establishment of a comparator drug supply model.

As shared solutions in clinical research and other areas are developed, TransCelerate will involve industry alliances including Clinical Data Interchange Standards Consortium (CDISC), Critical-Path Institute (C-Path), Clinical Trials Transformation Initiative (CTTI), Innovative Medicines Initiative (IMI), regulatory bodies including the US Food and Drug Administration (FDA) and European Medicines Agency (EMA), and Contract Research Organizations (CROs).

Janet Woodcock, MD, director of FDA’s Center for Drug Evaluation and Research, said, “We applaud the companies in TransCelerate BioPharma for joining forces to address a series of longstanding challenges in new drug development. This collaborative approach in the pre-competitive arena, utilizing the collective experience and resources of 10 leading drug companies and others to follow, has the promise to lead to new paradigms and cost savings in drug development, all of which would strengthen the industry and its ability to develop innovative and much-needed therapies for patients.”

“These leading pharmaceutical companies are in a position to significantly influence changes in the way that clinical trials are done, so that better answers about the benefits and risks of drugs and other therapies are provided in a more efficient manner,” said Robert Califf, MD, Co-Chair of CTTI and Director of the Duke Translational Medicine Institute. “This initiative is complementary to efforts of CTTI, and we look forward to working with TransCelerate BioPharma to improve the conduct of clinical trials.”

TransCelerate BioPharma evolved from relationships fostered via the Hever Group, a forum for executive R&D leadership to discuss relevant issues facing the industry and solutions for addressing common challenges. TransCelerate was incorporated in early August 2012 and will file for non-profit status this fall. The Board of Directors includes the R&D heads of the ten member companies. Membership in TransCelerate is open to all pharmaceutical and biotechnology companies that can contribute to and benefit from these shared solutions. TransCelerate’s headquarters will be located in Philadelphia, PA.

http://news.bms.com/press-release/rd-news/ten-pharmaceutical-companies-unite-accelerate-development-new-medicines-0&t=634836499683795253

 

Read Full Post »

Breast Cancer, drug resistance, and biopharmaceutical targets
Reporter: Larry H Bernstein, MD
There has been continuing improvement in breast cancer treatment and extended survival, with rapid gains in disease-free survival and decreased toxicity. This is a snapshot of recently published work.
1. Breast Cancer Drug Resistance Linked to Gene Family
Two related proteins have been implicated in the mechanisms that allow breast cancer, and potentially other tumor types, to resist therapy with tyrosine kinase inhibitors (TKIs).
Case Western Reserve University
FAM83A and FAM83B have been identified by separate research groups reporting in parallel in the Journal of Clinical Investigation. The proteins may represent promising new therapeutic targets.
Researchers at Case Western Reserve University used a validation-based insertional mutagenesis (VBIM) strategy to generate libraries of HME1 immortalized human mammary epithelial cells (HMECs) that carried unique, single genetic alterations, in order to identify genes that promoted anchorage-independent growth (AIG) and tumorigenicity. They then carried out further cell-based assays to see which of the identified genes could promote tumor growth independently of RAS.

  • Expression of FAM83B was found to promote AIG and tumorigenesis in naïve HME1 cells, and FAM83B-expressing cells were also shown to be capable of forming tumors in immunodeficient mice, confirming its function as a transforming oncogene.
  • FAM83B expression levels were associated with specific cancer subtypes, with increased tumor grade, and with decreased overall survival. For example, increased expression of FAM83B was significantly associated with estrogen receptor– (ER-) and progesterone receptor–negative (PR-negative) breast tumors, with higher grade and poor outcome.
  • FAM83B also binds with a downstream RAS effector, CRAF, and this binding increased MAPK and mTOR signaling, and reduced sensitivity to EGFR-TKIs.

The Case Western Reserve group concludes that knocking out FAM83B inhibited the proliferation and malignant phenotype of tumor-derived cells or RAS-transformed HMECs.
The authors say targeting FAM83B therapeutically may increase the sensitivity of breast cancer to EGFR-TKI therapy. Given the requirement for FAM83B as an activator of CRAF/MAPK in EGFR and RAS signaling, the levels of FAM83B and FAM83A may be important to consider when determining which patients receive TKI treatment.
MW Jackson, R Cipriano, et al. “FAM83B mediates EGFR- and RAS-driven oncogenic transformation.” J Clin Invest 2012.

Lawrence Berkeley National Laboratory-led team
The Berkeley group also identified a potential target for addressing drug resistance in breast cancer and potentially other tumor types. The team reported in the same issue of JCI that a gene known as FAM83A has oncogenic properties; when overexpressed in cancer cells it confers resistance to EGFR-tyrosine kinase inhibitor (EGFR-TKI) drugs and also promotes tumor proliferation and invasion.

  • FAM83A had coincidentally previously been identified as highly expressed in lung cancer. The researchers found that, while normal tissue did not express FAM83A, it was highly expressed in malignant tissue.
  • They hypothesized that resistance to EGFR-TKIs may occur in part as a result of a molecular mechanism that triggers phosphorylation signaling downstream of EGFRs.
  • They developed a novel three-dimensional cell culture assay based on the phenotypic reversion of malignant cells into phenotypically nonmalignant cells, to screen for genes involved in EGFR-TKI resistance both in normal and cancerous human cell lines.
  • FAM83A is expressed in every breast cancer cell line they looked at and was particularly elevated in those that were more resistant to EGFR-TKI treatment.
  • FAM83A interacts with and triggers phosphorylation of signaling proteins downstream of EGFR that act to block the therapeutic effects of EGFR-TKIs.
  • When breast cancer cells were treated with an shRNA that blocked FAM83A expression, the cells became less proliferative and more sensitive to EGFR-TKI treatment. Conversely, FAM83A overexpression led to elevated invasiveness.

Mechanistically, FAM83A was shown to interact with and cause phosphorylation of CRAF and PI3K, upstream of MAPK and downstream of EGFR. This finding correlates well with the mechanism reported for FAM83B by the Case Western team.
The published data highlight the importance of this family of proteins as potential drug targets, and help explain existing data demonstrating a clinical correlation between high FAM83A expression and poor cancer prognosis. Moreover, Dr. Bissell states the finding also reveals a whole new family of potential oncogenes that could serve as targets in many types of cancer, including breast cancer.
Bissell et al. “FAM83A confers EGFR-TKI resistance in breast cancer cells and in mice.” Journal of Clinical Investigation 2012
GEN News Highlights : Sep 12, 2012

2. Targets Identified to Prevent Breast Cancer Spread
Hypoxia-inducible factor 1 (HIF-1) and platelet-derived growth factor B (PDGF-B) may represent promising therapeutic targets for preventing breast cancer from spreading to the lymph nodes and metastasizing to other organs. The proteins’ roles in lymphatic dissemination of cancer have not been well understood.

Researchers at the Johns Hopkins University School of Medicine and partners in Italy have found that HIF-1 promotes lymphatic metastasis of breast cancer directly by activating the gene encoding PDGF-B, which triggers the growth of new lymphatic vessels.

  • Previous work by Gregg L. Semenza, M.D., and colleagues had shown that knocking out HIF-1α or HIF-2α in mice implanted with human breast cancer cells (BCCs) slowed tumor growth and lung metastasis.
  • Treating animals with the HIF-1 inhibitor digoxin similarly impaired primary tumor growth and lung metastasis.
  • Mice injected with HIF-1 knockdown human BCC cells exhibited 76% fewer cancer cells in their lymph nodes after 24 days than animals injected with unengineered BCC cells, supporting a role for HIF-1 in the spread of breast cancer to lymph nodes.
  • HIF-1 binds directly to the gene for PDGF-B, which is overexpressed under the hypoxic conditions found in tumors, and triggers the growth of lymphatic vessels.
  • Moreover, coexpression of HIF-1α and PDGF-B was also found in invasive breast carcinomas, and this coexpression correlated with survival and response to chemotherapy, the researchers stress.
  • PDGF-B produced as a result of HIF-1 binding is released from the tumor cells and binds to its cognate receptor PDGFRβ, which is upregulated on lymphatic endothelial cells (LEC) under hypoxic conditions.
  • This PDGFβ signaling triggers LEC proliferation and migration, and the growth of lymphatic vessels.
  • When the researchers turned off PDGFRβ signaling by blocking HIF-1 or PDGF-B using either RNA interference or chemical inhibitors (digoxin or the tyrosine kinase inhibitor imatinib), both lymphatic vessel density and lymph node metastasis were significantly reduced.

Reporting their findings in PNAS, the investigators suggest that, in addition to highlighting HIF-1 and PDGF-B as potential therapeutic targets for breast cancer, their results indicate that coexpression of the two proteins may help identify lymph node-negative patients who are at high risk for developing lymph node metastasis.
GL Semenza, et al. “Hypoxia-inducible factor 1-dependent expression of platelet-derived growth factor B promotes lymphatic metastasis of hypoxic breast cancer cells.” PNAS 2012.
GEN News Highlights : Sep 12, 2012

3. How Tamoxifen-Resistant Breast Cancer Cells Grow and Proliferate
A study by researchers at the Ohio State University Comprehensive Cancer Center (OSUCCC – James) has discovered how tamoxifen-resistant breast-cancer cells grow and proliferate.
It suggests that an experimental agent might offer a novel targeted therapy for tamoxifen-resistant breast cancer.

  • Like a second door that opens after the first door closes, a signaling pathway called hedgehog (Hhg) can promote the growth of breast-cancer cells after tamoxifen shuts down the pathway activated by the hormone estrogen.
  • A second signaling pathway, called PI3K/AKT, is also involved. Activation of the Hhg pathway renders tamoxifen treatment ineffective and enables the tumor to resume its growth and progression.
  • The researchers found that the tumors with an activated Hhg pathway had a worse prognosis.
  • An experimental drug called vismodegib, which blocks the Hhg pathway, inhibits the growth of tamoxifen-resistant human breast tumors in an animal model. The drug is in clinical trials for other types of cancer.

This study has identified targeted therapies that could be an alternative to chemotherapy for these resistant tumors. The study is published in the journal Cancer Research  2012.
Possible therapy for tamoxifen-resistant breast cancer identified cphi-online.com

EGF Signalling (Photo credit: TheJCB)

Read Full Post »

Reporter: Aviva Lev-Ari, PhD, RN

A New Approach Uses Compression to Speed Up Genome Analysis

Public-Domain Computing Resources

Structural Bioinformatics

The BetaWrap program detects the right-handed parallel beta-helix super-secondary structural motif in primary amino acid sequences by using beta-strand interactions learned from non-beta-helix structures.
The Wrap-and-pack program detects beta-trefoils in protein sequences by using both pairwise beta-strand interactions and 3-D energetic packing information.
The BetaWrapPro program predicts right-handed beta-helices and beta-trefoils by using both sequence profiles and pairwise beta-strand interactions, and returns coordinates for the structure.
The MSARi program identifies conserved RNA secondary structure in non-coding RNA genes and mRNAs by searching multiple sequence alignments of a large set of candidate catalogs for correlated arrangements of reverse-complementary regions.
The Paircoil2 program predicts coiled-coil domains in protein sequences by using pairwise residue correlations obtained from a coiled-coil database. The original Paircoil program is still available for use.
The MultiCoil program predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric. An updated version, Multicoil2, will soon be available.
The LearnCoil Histidase Kinase program uses an iterative learning algorithm to detect possible coiled-coil domains in histidase kinase receptors.
The LearnCoil-VMF program uses an iterative learning algorithm to detect coiled-coil-like regions in viral membrane-fusion proteins.
The Trilogy program discovers novel sequence-structure patterns in proteins by exhaustively searching through three-residue motifs using both sequence and structure information.
The ChainTweak program efficiently samples from the neighborhood of a given base configuration by iteratively modifying a conformation using a dihedral angle representation.
The TreePack program uses a tree-decomposition based algorithm to solve the side-chain packing problem more efficiently. This algorithm is more efficient than SCWRL 3.0 while maintaining the same level of accuracy.
PartiFold: Ensemble prediction of transmembrane protein structures. Using statistical mechanics principles, partiFold computes residue contact probabilities and samples super-secondary structures from sequence only.
tFolder: Prediction of beta sheet folding pathways. Predict a coarse grained representation of the folding pathway of beta sheet proteins in a couple of minutes.
RNAmutants: Algorithms for exploring the RNA mutational landscape. Predict the effect of mutations on structures and, reciprocally, the influence of structures on mutations. A tool for molecular evolution studies and RNA design.
AmyloidMutants is a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability.

Genomics

GLASS aligns large orthologous genomic regions using an iterative global alignment system. Rosetta identifies genes based on conservation of exonic features in sequences aligned by GLASS.
RNAiCut – Automated Detection of Significant Genes from Functional Genomic Screens.
MinoTar – Predict microRNA Targets in Coding Sequence.

Systems Biology

The Struct2Net program predicts protein-protein interactions (PPI) by integrating structure-based information with other functional annotations (e.g., GO, co-expression, and co-localization). The structure-based protein interaction prediction is conducted using the protein threading server RAPTOR plus logistic regression.
IsoRank is an algorithm for global alignment of multiple protein-protein interaction (PPI) networks. The intuition is that a protein in one PPI network is a good match for a protein in another network if the former’s neighbors are good matches for the latter’s neighbors.
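That neighbor-matching intuition can be sketched in a few lines of Python. The following is a simplified, damped neighbor-similarity iteration on two toy three-node networks — not the actual IsoRank implementation; the networks, the parameter values, and the function name are invented for illustration:

```python
def isorank_like(adj1, adj2, alpha=0.85, iters=60):
    """R[(u, v)]: similarity of node u (network 1) to node v (network 2)."""
    pairs = [(u, v) for u in adj1 for v in adj2]
    prior = 1.0 / len(pairs)
    R = {p: prior for p in pairs}
    for _ in range(iters):
        new = {}
        for u, v in pairs:
            # a pair scores well if its neighbor pairs score well
            prop = sum(R[(a, b)] / (len(adj1[a]) * len(adj2[b]))
                       for a in adj1[u] for b in adj2[v])
            # damping toward a uniform prior keeps the iteration stable
            new[(u, v)] = alpha * prop + (1 - alpha) * prior
        total = sum(new.values())
        R = {p: s / total for p, s in new.items()}  # normalize
    return R

# Two toy "PPI networks": paths a-b-c and x-y-z.
net1 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
net2 = {"x": ["y"], "y": ["x", "z"], "z": ["y"]}
R = isorank_like(net1, net2)
print(max(R, key=R.get))  # the two hub nodes pair up: ('b', 'y')
```

As expected from the intuition quoted above, the two hub nodes (each with two matching neighbors) end up as the best-scoring pair.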

Other

t-sample is an online algorithm for time-series experiments that allows an experimenter to determine which biological samples should be hybridized to arrays to recover expression profiles within a given error bound.

http://people.csail.mit.edu/bab/computing_new.html#systems

Compressive genomics

http://www.nature.com/nbt/journal/v30/n7/abs/nbt.2241.html

Nature Biotechnology 30, 627–630 (2012) doi:10.1038/nbt.2241

Published online 10 July 2012
Algorithms that compute directly on compressed genomic data allow analyses to keep pace with data generation.

Introduction

In the past two decades, genomic sequencing capabilities have increased exponentially [1–3], outstripping advances in computing power [4–8]. Extracting new insights from the data sets currently being generated will require not only faster computers, but also smarter algorithms. However, most genomes currently sequenced are highly similar to ones already collected [9]; thus, the amount of new sequence information is growing much more slowly.

Here we show that this redundancy can be exploited by compressing sequence data in such a way as to allow direct computation on the compressed data using methods we term ‘compressive’ algorithms. This approach reduces the task of computing on many similar genomes to only slightly more than that of operating on just one. Moreover, its relative advantage over existing algorithms will grow with the accumulation of genomic data. We demonstrate this approach by implementing compressive versions of both the Basic Local Alignment Search Tool (BLAST) [10] and the BLAST-Like Alignment Tool (BLAT) [11], and we emphasize how compressive genomics will enable biologists to keep pace with current data.

Conclusions

Compressive algorithms for genomics have the great advantage of becoming proportionately faster with the size of the available data. Although the compression schemes for BLAST and BLAT that we presented yield an increase in computational speed and, more importantly, in scaling, they are only a first step. Many enhancements of our proof-of-concept implementations are possible; for example, hierarchical compression structures, which respect the phylogeny underlying a set of sequences, may yield additional long-term performance gains. Moreover, analyses of such compressive structures will lead to insights as well. As sequencing technologies continue to improve, the compressive genomic paradigm will become critical to fully realizing the potential of large-scale genomics. Software is available at http://cast.csail.mit.edu/.
References
  1. Lander, E.S. et al. Nature 409, 860–921 (2001).
  2. Venter, J.C. et al. Science 291, 1304–1351 (2001).
  3. Kircher, M. & Kelso, J. Bioessays 32, 524–536 (2010).
  4. Kahn, S.D. Science 331, 728–729 (2011).
  5. Gross, M. Curr. Biol. 21, R204–R206 (2011).
  6. Huttenhower, C. & Hofmann, O. PLoS Comput. Biol. 6, e1000779 (2010).
  7. Schatz, M., Langmead, B. & Salzberg, S. Nat. Biotechnol. 28, 691–693 (2010).
  8. 1000 Genomes Project data available on Amazon Cloud. NIH press release, 29 March 2012.
  9. Stratton, M. Nat. Biotechnol. 26, 65–66 (2008).
  10. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. J. Mol. Biol. 215, 403–410 (1990).
  11. Kent, W.J. Genome Res. 12, 656–664 (2002).
  12. Grumbach, S. & Tahi, F. J. Inf. Process. Manag. 30, 875–886 (1994).
  13. Chen, X., Li, M., Ma, B. & Tromp, J. Bioinformatics 18, 1696–1698 (2002).
  14. Christley, S., Lu, Y., Li, C. & Xie, X. Bioinformatics 25, 274–275 (2009).
  15. Brandon, M.C., Wallace, D.C. & Baldi, P. Bioinformatics 25, 1731–1738 (2009).
  16. Mäkinen, V., Navarro, G., Sirén, J. & Välimäki, N. in Research in Computational Molecular Biology, vol. 5541 of Lecture Notes in Computer Science (Batzoglou, S., ed.) 121–137 (Springer Berlin/Heidelberg, 2009).
  17. Kozanitis, C., Saunders, C., Kruglyak, S., Bafna, V. & Varghese, G. in Research in Computational Molecular Biology, vol. 6044 of Lecture Notes in Computer Science (Berger, B., ed.) 310–324 (Springer Berlin/Heidelberg, 2010).
  18. Hsi-Yang Fritz, M., Leinonen, R., Cochrane, G. & Birney, E. Genome Res. 21, 734–740 (2011).
  19. Mäkinen, V., Navarro, G., Sirén, J. & Välimäki, N. J. Comput. Biol. 17, 281–308 (2010).
  20. Deorowicz, S. & Grabowski, S. Bioinformatics 27, 2979–2986 (2011).
  21. Li, H., Ruan, J. & Durbin, R. Genome Res. 18, 1851–1858 (2008).
  22. Li, H. & Durbin, R. Bioinformatics 25, 1754–1760 (2009).
  23. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. Genome Biol. 10, R25 (2009).
  24. Carter, D.M. Saccharomyces genome resequencing project. Wellcome Trust Sanger Institute http://www.sanger.ac.uk/Teams/Team118/sgrp/ (2005).
  25. Tweedie, S. et al. Nucleic Acids Res. 37, D555–D559 (2009).

Primary authors

  • Po-Ru Loh
  • Michael Baym

P.-R.L. and M.B. contributed equally to this work.

Affiliations

  1. Po-Ru Loh, Michael Baym and Bonnie Berger are in the Department of Mathematics and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
  2. Michael Baym is also in the Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA.

Competing financial interests

The authors declare no competing financial interests.

September 2012

Compressing a dataset with specialized algorithms is typically done in the context of data storage, where compression tools can shrink data to save space on a hard drive. But a group of researchers at MIT has developed tools that compute directly on compressed genomic datasets by exploiting the fact that most sequenced genomes are very similar to previously sequenced genomes.

Led by MIT professor Bonnie Berger, the group has recently released tools called CaBlast and CaBlat, compressive versions of the widely used Blast and Blat alignment tools, respectively.

In a Nature Biotechnology paper published in July, Berger and her colleagues describe how the algorithms deliver alignment and analysis results up to four times faster than Blast and Blat when searching for a particular sequence in 36 yeast genomes.

“What we demonstrate is that the more highly similar genomes there are in a database, the greater the relative speed of CaBlast and CaBlat compared to the original non-compressive versions,” Berger says. “As we increase the number of genomes, the amount of work required for compressive algorithms scales only linearly in the amount of non-redundant data. The idea is that we’ve already done most of the work on the first genome.”
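The scaling behavior Berger describes — work growing only with the non-redundant data — can be illustrated with a toy sketch. The Python below is a sketch of the concept only, not CaBlast's actual link structure (which uses local alignments rather than shared prefixes); all names and sequences are invented for illustration. Each genome is stored as a pointer into a shared reference plus its novel tail, and exact-match search scans the reference once plus only each genome's small non-redundant portion:

```python
# Minimal sketch of the compressive-genomics idea (illustrative only).

def compress(reference, genome):
    """Toy link scheme: longest shared prefix with the reference + novel tail."""
    i = 0
    while i < min(len(reference), len(genome)) and reference[i] == genome[i]:
        i += 1
    return (i, genome[i:])  # genome == reference[:i] + novel tail

def search(query, reference, db):
    """Return names of genomes in `db` containing `query` (exact match)."""
    k = len(query)
    # Scan the shared reference exactly once.
    ref_hits, pos = [], reference.find(query)
    while pos != -1:
        ref_hits.append(pos)
        pos = reference.find(query, pos + 1)
    names = []
    for name, (shared, novel) in db.items():
        # match lies entirely within this genome's shared prefix...
        in_shared = any(p + k <= shared for p in ref_hits)
        # ...or straddles the shared/novel boundary or lies in the novel tail
        boundary = reference[max(0, shared - k + 1):shared] + novel
        if in_shared or query in boundary:
            names.append(name)
    return names

reference = "ACGTACGTTTGACCA"
db = {name: compress(reference, g) for name, g in {
    "g1": "ACGTACGTTTGACCA",   # identical to the reference
    "g2": "ACGTACGTTTGGGGG",   # diverges near the end
    "g3": "ACGTACGTAACCA",     # diverges earlier
}.items()}
print(search("TTGA", reference, db))  # ['g1']
```

The per-query work here grows with the reference plus the novel tails, not with the total length of all genomes — a toy version of the linear-in-non-redundant-data scaling quoted above.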

These two algorithms are still in the beta phase, and the MIT team has several refinements planned for future release to optimize performance. To that end, Berger has made the code for both algorithms available with the hope that developers will help them build “industrial-strength” software that can be used by the research community.

“To achieve optimal performance in real-use cases, we expect the code will need to be tuned for the engineering trade-offs specific to the application at hand,” she says. “The algorithm used to find and compress similar sequences in the database may need to be tweaked to take this issue into account, and the coarse- and fine-search steps should be aware of these constraints as well.”

While computing resources are becoming increasingly powerful, Berger contends that better algorithms and the use of compression technology will play a crucial role in helping researchers to keep up with the production of next-generation sequencing data.

Matthew Dublin is a senior writer at Genome Technology.

Read Full Post »

 

Reporter: Aviva Lev-Ari, PhD, RN

 

A research team from Massachusetts and Maryland used array-based transcriptome profiling to explore the genetic basis of a progressive neuromuscular condition called facioscapulohumeral muscular dystrophy, or FSHD. By testing bicep and deltoid muscle biopsy samples from dozens of individuals with FSHD and almost as many unaffected relatives of those subjects, the team tracked down hundreds of genes showing expression shifts in those with FSHD. Of those, 29 genes were differentially expressed in both bicep and deltoid muscle samples, the researchers report. And, they found expression levels at 15 genes could distinguish between bicep samples from those with or without the disease around 90 percent of the time in follow-up experiments. The accuracy was closer to 80 percent when classifying deltoid tissue based on expression of these genes. Those involved in the study say such a ‘molecular signature’ of FSHD could help in understanding the disease and in testing new treatments for it.

http://www.genomeweb.com//node/1126816?hq_e=el&hq_m=1349154&hq_l=4&hq_v=09187c3305

Transcriptional profiling in facioscapulohumeral muscular dystrophy to identify candidate biomarkers

Fedik Rahimov (a,b,1), Oliver D. King (b,c,1), Doris G. Leung (d,e), Genila M. Bibat (d), Charles P. Emerson, Jr. (b,c), Louis M. Kunkel (a,b,f,2), and Kathryn R. Wagner (d,e,g,2)

Author Affiliations

  a. Program in Genomics, Division of Genetics, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115
  b. The Senator Paul D. Wellstone Muscular Dystrophy Cooperative Research Center
  c. Boston Biomedical Research Institute, Watertown, MA 02472
  d. Hugo W. Moser Research Institute at Kennedy Krieger Institute, Baltimore, MD 21205
  e. Department of Neurology, The Johns Hopkins School of Medicine, Baltimore, MD 21205
  f. The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Boston, MA 02115
  g. Department of Neuroscience, The Johns Hopkins School of Medicine, Baltimore, MD 21205

Contributed by Louis M. Kunkel, June 4, 2012 (sent for review May 24, 2012)

Abstract

Facioscapulohumeral muscular dystrophy (FSHD) is a progressive neuromuscular disorder caused by contractions of repetitive elements within the macrosatellite D4Z4 on chromosome 4q35. The pathophysiology of FSHD is unknown and, as a result, there is currently no effective treatment available for this disease. To better understand the pathophysiology of FSHD and develop mRNA-based biomarkers of affected muscles, we compared global analysis of gene expression in two distinct muscles obtained from a large number of FSHD subjects and their unaffected first-degree relatives. Gene expression in two muscle types was analyzed using GeneChip Gene 1.0 ST arrays: biceps, which typically shows an early and severe disease involvement; and deltoid, which is relatively uninvolved. For both muscle types, the expression differences were mild: using relaxed cutoffs for differential expression (fold change ≥1.2; nominal P value <0.01), we identified 191 and 110 genes differentially expressed between affected and control samples of biceps and deltoid muscle tissues, respectively, with 29 genes in common. Controlling for a false-discovery rate of <0.25 reduced the number of differentially expressed genes in biceps to 188 and in deltoid to 7. Expression levels of 15 genes altered in this study were used as a “molecular signature” in a validation study of an additional 26 subjects and predicted them as FSHD or control with 90% accuracy based on biceps and 80% accuracy based on deltoids.
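The two-stage gene filtering described in the abstract — a fold-change and nominal-P cutoff, then false-discovery-rate control — can be sketched as follows. This is a hedged illustration: the gene names and per-gene statistics are invented, and the paper's actual analysis pipeline is not reproduced here, only the cutoff logic paired with a standard Benjamini–Hochberg FDR adjustment:

```python
def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    q, prev = [0.0] * n, 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank, i in reversed(list(enumerate(order, start=1))):
        prev = min(prev, pvals[i] * n / rank)
        q[i] = prev
    return q

def select_genes(stats, fc_cut=1.2, p_cut=0.01, fdr_cut=0.25):
    """stats: {gene: (fold_change, p_value)} -> genes passing both stages."""
    # stage 1: relaxed cutoffs, as in the abstract (|FC| >= 1.2, nominal P < 0.01)
    candidates = {g: p for g, (fc, p) in stats.items()
                  if abs(fc) >= fc_cut and p < p_cut}
    genes = list(candidates)
    # stage 2: control the false-discovery rate at < 0.25
    qvals = bh_fdr([candidates[g] for g in genes])
    return [g for g, q in zip(genes, qvals) if q < fdr_cut]

# Hypothetical per-gene statistics: gene -> (signed fold change, nominal P)
stats = {"GENE1": (1.5, 0.001), "GENE2": (1.3, 0.004),
         "GENE3": (2.0, 0.20),  "GENE4": (1.1, 0.0005),
         "GENE5": (-1.4, 0.008)}
print(select_genes(stats))  # ['GENE1', 'GENE2', 'GENE5']
```

GENE3 fails the nominal-P cutoff and GENE4 the fold-change cutoff, mirroring how the paper's 191/110 candidate lists shrank once FDR control was applied.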


Read Full Post »

Reporter: Aviva Lev-Ari, PhD, RN

During Investor Day, Roche Highlights Personalized Medicine as Key Area for Future Growth

September 12, 2012

As regulators and payors around the world are demanding more evidence that healthcare products improve patient outcomes and save money, Roche this week attempted to reassure investors that its strategy to develop innovative products — with a strong focus on molecularly guided personalized medicines — will place it ahead of competitors.

Through several presentations during an investor day in London, Roche officials highlighted a number of drugs for cancer, neuropsychiatric conditions, and autoimmune diseases for which the company is investigating biomarkers that can help target treatment to specific groups of patients. The company said that more than 60 percent of the compounds in its drug pipeline are currently paired with a companion diagnostic and that it has more than 200 companion diagnostic projects underway across its pharma and diagnostic business groups.

Personalized medicines are not only a major part of Roche’s plan for future growth, but they also represent a way for the company to differentiate its products from competitors. By setting its drugs apart from other me-too treatments in the marketplace, the company is hoping that its products won’t be as heavily affected by the pricing pressures currently plaguing the pharma and biotech sectors.

“Yes, regulators are very stringent. But if I look back at our most recent launches, particularly in the US, if you have true medical innovation, then regulators are very willing to bring those medicines and novel diagnostics to the market,” Roche CEO Severin Schwan said during the investor conference. He highlighted that the US Food and Drug Administration reviewed and approved the BRAF inhibitor Zelboraf for metastatic melanoma and its companion diagnostic in record time and that the recent approval of the HER2-targeted breast cancer drug Perjeta also occurred ahead of schedule (PGx Reporter 8/17/2011 and 6/13/2012).

“Likewise, if you look at the payors, there is cost pressure,” Schwan reflected, but he noted that the “innovative nature” of its portfolio helps it to “negotiate better prices with payors.”

Despite this optimistic forecast, Roche has experienced some pushback from cost-conscious national payors in Europe. For example, in June the UK’s National Institute for Health and Clinical Excellence deemed Zelboraf, which costs more than $82,000 for a seven-month treatment, too pricey. Zelboraf, which Roche launched in the US market last year and in European countries earlier this year, netted the company around $97 million in revenue for the six months ended June 30.

In an effort to battle pushback from national payors, Roche is in discussions with European governments about value-based pricing schemes for several of its products. In this regard, high priced personalized medicine drugs are well suited to these types of arrangements. David Loew, chief marketing officer at Roche, told investors that governments are increasingly developing registries to track how individual patients are doing on various treatments. This information will help governments move from a volume-based pricing model for drugs to paying for them based on the drug’s indication.

He noted that in Germany, for example, Roche has developed a payment scheme for colorectal cancer in which patients pay for up to 10 grams of the oncology drug Avastin, receive any further doses free for up to 12 months, and then the scheme repeats. For personalized medicines, such as Herceptin, Perjeta, T-DM1, and Zelboraf, “we will have to think about different ways of pricing those new combinations,” Loew said.
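The arithmetic of such a capped scheme is simple to sketch. The code below is purely illustrative: the per-gram price is an invented placeholder, and the function only mirrors the cap-then-free structure described above, not Roche’s actual contract terms.

```python
# Hypothetical sketch of a capped-pricing cycle: within each 12-month
# cycle, only the first CAP_GRAMS dispensed are billable; further doses
# in that cycle are free. Prices here are invented placeholders.

CAP_GRAMS = 10.0

def cycle_cost(doses_grams, price_per_gram):
    """Cost of one 12-month cycle given the doses dispensed in it."""
    billable = min(sum(doses_grams), CAP_GRAMS)
    return billable * price_per_gram
```

Under this structure, a high-use patient (say, 12 grams per cycle) costs the payer the same as one who uses exactly the cap.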

Schwan highlighted that one of the major advantages for Roche in this difficult environment is that it has both drug and diagnostic capabilities in house. This, according to Schwan, enables Roche to have significant internal capabilities in early-phase research, and makes the company attractive for partnerships, as well. Roche currently has more than 70 new molecular entities in clinical development and since 2011 there have been 25 late-stage clinical trials that have yielded positive results. The firm plans to bring three more products into late-stage clinical trials by the end of the year and would like to move 10 products into late-stage development in 2013.

On the diagnostics side, newly hired chief operating officer Roland Diggelmann said that Roche is aiming to grow its presence in the testing market by becoming “the partner of choice” for developing companion assays and collaborating internally with Roche pharma to advance personalized medicine.

“We need to make sure that science translates into great medicines by designing trials that take smart risk into account, that really focus on ensuring that the molecules are being developed in the right diseases; to make sure we have the right dose; to make sure, whenever possible, we have the … companion diagnostic strategies,” Chief Medical Officer Hal Barron said at the meeting. “This whole strategy needs to result in a higher probability of success so that the return on investment is above the cost of capital and an important driver for our business.”

While Roche plans on identifying new product opportunities through a mix of its internal capabilities and external collaborations, growth through large mergers and acquisitions – a strategy that other large pharmaceutical companies have readily utilized to expand product portfolios – doesn’t seem to be a priority at the company. Noting that there may be opportunities for smaller M&A deals, Alan Hippe, chief financial and information technology officer, noted that at Roche, “we are not big fans of big mergers and big M&A.”

Targeting Cancer

A large portion of Roche’s personalized medicine strategy will be directed toward oncology, where the company has allocated 50 percent of its research and development budget.

In June, the FDA approved Perjeta in combination with Herceptin and docetaxel chemotherapy as a treatment for metastatic breast cancer patients whose tumors overexpress the HER2 protein. The agency simultaneously approved two companion tests that can help doctors discern best responders to the treatment (PGx Reporter 6/13/2012).

Herceptin (trastuzumab), approved in 1998, still comprises a big chunk of Roche’s therapeutics business, contributing 11 percent of the $18.2 billion the firm netted in overall drug sales in the first half of the year. Roche is hoping to preserve earnings from this blockbuster drug — often hailed as the first personalized medicine success story — by combining it with Perjeta and linking it with a derivative of the chemotherapy maytansine, DM1.

Recently, Roche announced data from a late-stage clinical trial called EMILIA that showed that advanced breast cancer patients receiving the antibody drug conjugate trastuzumab emtansine, or T-DM1, lived “significantly” longer than those treated with a combination of Genentech’s Xeloda (capecitabine) and GlaxoSmithKline’s Tykerb (lapatinib). The patients in EMILIA had to have progressed after initial treatment with Herceptin and taxane chemotherapy.

According to Loew, the company is currently conducting a study looking at T-DM1 as a potential option for first-line metastatic breast cancer patients. In addition, Roche is also studying T-DM1 as an adjuvant treatment in early-stage breast cancer patients with residual disease; comparing T-DM1 plus Perjeta against Herceptin plus Perjeta in the adjuvant early-stage breast cancer setting; and looking at T-DM1-based chemotherapy in the neoadjuvant setting.

“So if we are successfully delivering those results, I think the HER2-positive breast cancer space has been completely changed and redefined,” Loew told investors.

At the end of the year, another study, called the Protocol of Herceptin Adjuvant with Reduced Exposure, or PHARE, is slated to report results, and the outcome could have a negative impact on Herceptin sales. PHARE is comparing whether patients given Herceptin for 12 months, which is currently the standard of care in the US, fare better than those given the drug for six months.

Industry observers have projected that Perjeta and T-DM1 could be a sufficient buffer against a scenario in which six months of Herceptin is found to be non-inferior to a year of the drug.

Barron noted that Roche is readily applying the strategy behind antibody-drug conjugates such as T-DM1 – in which antibodies attach to antigens on the surface of cancer cells to localize chemotherapy delivery and reduce adverse reactions – in 25 projects across its portfolio. He added that antibody-drug conjugates offer a promising mechanism for personalizing treatments.

In non-small cell lung cancer, Roche is studying MetMab (onartuzumab) in combination with Tarceva in patients with tumors that overexpress the Met protein. Data from this Phase III trial, called METLUNG, are expected in 2014. Data from a Phase II study looking at MetMab and Tarceva as a second-line NSCLC treatment yielded negative results when all comers were considered. However, the subgroup of patients who overexpressed Met had a “doubling” of progression-free survival and a “pronounced” effect on overall survival compared to the low-Met group.

Roche is also investigating MetMab in metastatic gastric cancer (Phase III), triple-negative breast cancer (Phase II), metastatic colorectal cancer (Phase II), and glioblastoma (Phase II), as well as in combination with Avastin in various cancer indications.

Other Areas of Personalization

Outside of oncology, Roche is exploring biomarker strategies to personalize drugs for Alzheimer’s disease and schizophrenia. Phase I data from a study involving gantenerumab, an IgG1 monoclonal antibody, suggest that the drug could potentially reduce amyloid plaque in Alzheimer’s patients’ brains.

Investigational drugs targeting beta-amyloid, which many researchers believe to be involved in the pathogenesis of Alzheimer’s disease, haven’t fared well in clinical trials. Most recently, Johnson & Johnson/Pfizer’s drug bapineuzumab, which also targeted the β-amyloid protein, failed to benefit Alzheimer’s patients who were non-carriers of APOE4 gene variations.

Wall Street analysts are hoping that Roche’s biomarker-driven strategy for gantenerumab will help it avoid a similar fate. The company is currently conducting a 770-patient trial called Scarlet Road, in which researchers will measure Tau/Aβ levels in study participants’ spinal fluid to identify early onset or prodromal Alzheimer’s patients and treat them with gantenerumab. Roche is developing a companion test to gauge Tau/Aβ levels in trial participants. Results from Scarlet Road are expected in 2015.

Roche subsidiary Genentech is testing another compound, crenezumab, to see if it can prevent Alzheimer’s in a population genetically predisposed to getting the disease. Genentech, in collaboration with Banner Alzheimer’s Institute and the National Institutes of Health, is conducting a Phase II trial investigating crenezumab in the residents of Medellin, Colombia, where people share a common ancestor and have a high prevalence of mutations in the presenilin 1 gene. Those harboring the dominant gene mutation will start to lose their memory in their mid-40s and their cognitive functions will deteriorate by age 50.

The five-year study will involve approximately 300 participants, of whom approximately 100 mutation carriers will receive crenezumab and another 100 mutation carriers will receive a placebo. In a third arm, approximately 100 participants who don’t carry the mutations will receive a placebo. Study investigators will begin recruiting patients for this study next year.

In schizophrenia, Roche is exploring bitopertin, a glycine reuptake inhibitor, in six Phase III studies slated for completion next year. Three of these studies are looking at the drug’s ability to control negative symptoms in schizophrenia, while the other three trials are studying the drug’s impact on sub-optimally controlled disease symptoms. “A companion diagnostics assay is in development to validate the hypothesis for an exploratory biomarker predicting response to therapy with bitopertin,” Roche said in a statement.

For lupus, Roche is conducting a proof of concept Phase II trial involving rontalizumab, an anti-interferon-alpha antibody, in which researchers are using a biomarker to identify patients most likely to respond to the drug. Data from this trial will be presented at a medical conference later this year.

Growing Role of Diagnostics

Daniel O’Day, who served as CEO of Roche Molecular Diagnostics until last week when he was appointed chief operating officer of the company’s pharma division, valued the worldwide diagnostics market at $53 billion. “We represent 20 percent of that, or around 10 billion Swiss francs ($11 billion),” he said in his investor day presentation.

While molecular diagnostics promise to be a growing part of Roche’s business in the coming years, these products currently only represent a single-digit percent of Roche’s overall diagnostics business. For the first half of this year, molecular diagnostics comprised around 6 percent of Roche’s diagnostics sales of $5.3 billion.

Roche’s Ventana Medical Systems subsidiary will likely play a large role in advancing Roche’s presence in the companion diagnostics space. This year, Ventana announced it was developing companion tests for a number of drug makers, including Aeterna Zentaris, Syndax Pharmaceuticals, Pfizer, and Bayer (PGx Reporter 1/18/2012).

In addition to these external collaborations, Roche officials highlighted the company’s internal diagnostics capabilities as particularly advantageous for expanding its presence in the personalized medicine space. For example, Roche developed the BRAF companion test for Zelboraf. The company is also developing a companion EGFR-mutation test for its non-small cell lung cancer drug Tarceva in the first-line setting, and a test to gauge so-called “super-responders” to the investigational asthma drug lebrikizumab being developed by Genentech.

In terms of molecular diagnostics, O’Day highlighted a test that measures overexpression of the p16 gene in cervical Pap test samples to determine whether women have precancerous lesions.

Additionally, the FDA this year approved the use of Ventana’s INFORM HER2 Dual ISH DNA Probe cocktail on the BenchMark ULTRA automated slide staining platform, which allows labs to analyze fluorescent in situ hybridization and immunohistochemistry samples in one assay. According to O’Day, this test has been more successful than standard FISH tests in identifying HER2 status in difficult-to-diagnose patients. The company will be publishing data on this test soon, showing that it can “identify about 4 percent more [HER2-positive patients] than FISH alone.”

When it comes to molecular technologies, Roche, like other pharma and biotech players, appears to be sticking to tried and tested technologies, such as IHC, FISH, and PCR, and reserving whole-genome sequencing for research use. “Today, sequencing is predominantly a research tool. And it’s a very valuable research tool in the future,” O’Day said, estimating that sequencing-based tests will “go into the clinic” in the next half decade.

Turna Ray is the editor of GenomeWeb’s Pharmacogenomics Reporter. She covers pharmacogenomics, personalized medicine, and companion diagnostics. Follow her GenomeWeb Twitter account at @PGxReporter.


Read Full Post »

Reporter: Aviva Lev-Ari, PhD, RN

Medical Education Firm Launches Online Tool to Help Docs Guide Personalized Rx Decisions in NSCLC

September 12, 2012

Clinical Care Options, a developer of continuing education and medical decision support resources, has launched a web-based tool to help oncologists figure out which lung cancer patients may benefit from molecularly guided personalized treatments.

The online decision-support tool provides oncologists with expert recommendations on first-line and maintenance treatment options for non-small cell lung cancer patients based on their patients’ medical information and tumor features, including oncogenic markers.

Clinical Care Options developed the online tool based on the treatment choices made by five US experts who were presented 96 cases with specific variables regarding patients’ medical history, such as tumor histology, genomic mutations, age, and smoking history.

In order to use the tool, oncologists enter their patients’ medical information and preferences and select their treatment of choice. The tool then displays how the five experts would treat this patient. The program then surveys users about how the expert recommendations impacted their treatment decisions.

The firm presented the results of this survey in a poster at the Chicago Multidisciplinary Symposium in Thoracic Oncology this week. The tool has been used by approximately 1,000 physicians around the world, according to Jim Mortimer, senior director of oncology programs and partnership development at Clinical Care Options. Overall, approximately 23 percent of clinicians who used the tool have said it helped change their decisions, while 50 percent indicated the tool helped confirm their initial treatment strategy.

Specifically, with regard to genomically guided personalized NSCLC treatments, all five of the experts selected Pfizer’s Xalkori (crizotinib) whenever a patient case involved the ALK fusion gene. However, out of 80 cases entered by oncologists involving this marker, only around 40 percent selected Xalkori. And although in NSCLC cases with mutated EGFR the experts selected Genentech’s Tarceva (erlotinib), only 60 percent of the 100 such cases entered by clinicians into the tool chose the drug.

The data collected by Clinical Care Options suggest that its decision-support tool may be a useful resource when oncologists want to assess how their peers would prescribe a genomically targeted personalized treatment. These drugs, compared to standard treatments, are relatively new to the market and expensive. Pfizer’s Xalkori was approved by the US Food and Drug Administration last year while Genentech is in the process of getting approval for Tarceva in the US as a first-line treatment for NSCLC patients who have EGFR mutations. Last year, the European Commission approved the use of Tarceva as a first-line treatment for NSCLC in patients with EGFR mutations (PGx Reporter 9/7/2011).

Clinical Care Options said it launched the online tool because it noticed that physicians often look for advice beyond broad treatment guidelines when it comes to making decisions for specific patients.

“The tool recommendations align very well with the treatment guidelines but the advantage of the tool is the granularity of the case specifics. Users of the tool can quickly enter in details of a case and see the results for what five experts would recommend,” Mortimer told PGx Reporter. “This contrasts with guidelines that apply to broad groups and provide lists of suitable treatments.”

Mortimer noted that some of the experts’ recommendations included in the tool are outside of the exact indication of a particular drug. However, because the experts’ treatment decisions were evidence based, they “did not indicate any issues with reimbursement.”

Clinical Care Options has developed a continuing medical education-certified program that includes the tool with educational grants from Genentech and Pfizer.

Read Full Post »

Head and Neck Cancer Studies Suggest Alternative Markers More Prognostically Useful than HPV DNA Testing

Reporter: Aviva Lev-Ari, PhD, RN

September 18, 2012

NEW YORK (GenomeWeb News) – The presence or absence of human papillomavirus DNA on its own in an individual’s head or neck cancer does not provide enough information to help predict a patient’s survival, according to a pair of new papers in the journal Cancer Research.

Two research teams — headed by investigators at Brown University and Heidelberg University, respectively — looked at the reliability of using PCR-based HPV testing to determine which head and neck squamous cell carcinomas were HPV-related and, thus, more apt to respond to treatment.

Previous studies have shown that individuals with HPV-associated head and neck cancers tend to have more favorable outcomes than individuals whose head and neck cancers are not related to HPV infection.

“Everybody who has studied it has shown that people with virally associated disease do better,” Brown University pathology researcher Karl Kelsey, a senior author on one of the new studies, explained in a statement.

“There are now clinical trials underway to determine if they should be treated differently,” he added. “The problem is that you need to appropriately diagnose virally related disease, and our data suggests that people need to take a close look at that.”

For their part, Kelsey and his co-authors from the US and Germany assessed the utility of testing for the presence of HPV by various means in individuals with head and neck cancer. This included PCR-based tests for HPV DNA in the tumor itself, tests aimed at detecting infection-associated antibodies in an individual’s blood, and tests for elevated levels of an HPV-related tumor suppressor protein.

For 488 individuals with HNSCC, researchers did blood-based testing for antibodies targeting HPV16 in general, as well as testing for antibodies that target the viral proteins E6 and E7.

For a subset of patients, the team assessed the tumors themselves for the presence of HPV DNA and/or for elevated levels of the host tumor suppressor protein p16.

Based on patterns in the samples, the group determined that the presence of viral E6 and E7 proteins in the blood was linked to increased survival for individuals with an oropharyngeal form of HNSCC, which affects part of the throat known as the oropharynx.

A positive test for HPV DNA alone was not significantly linked to head and neck cancer outcomes. On the other hand, when found in combination with E6 and E7 expression, a positive HPV16 test did coincide with improved oropharyngeal cancer outcomes.

Likewise, elevated levels of p16 in a tumor were not especially informative on their own, though they did correspond to better oropharyngeal cancer survival when found together with positive blood tests for E6 and E7.

Based on these findings, Kelsey and his team concluded that “[a] stronger association of HPV presence with prognosis (assessed by all-cause survival) is observed when ‘HPV-associated’ HNSCC is defined using tumor status (HPV DNA or P16) and HPV E6/E7 serology in combination rather [than] using tumor HPV status alone.”

In a second study, meanwhile, a German group that focused on the oropharyngeal form of the disease found its own evidence arguing against the use of HPV DNA as a solo marker for HPV-associated head and neck cancer.

For that analysis, researchers assessed 199 fresh-frozen oropharyngeal squamous cell carcinoma samples, testing the tumors for HPV DNA and p16. They also considered the viral load in the tumors and looked for gene expression profiles resembling those described in cervical carcinoma — another cancer associated with HPV infection.

Again, the presence of HPV DNA appeared to be a poor indicator of HPV-associated cancers or predictor of cancer outcomes. Whereas nearly half of the tumors tested positive for HPV16 DNA, just 16 percent and 20 percent had high viral loads and cervical cancer-like expression profiles, respectively.

The researchers found that a subset of HPV DNA-positive tumors with high viral load or HPV-associated expression patterns belonged to individuals with better outcomes. In particular, they found that cervical cancer-like expression profiles in oropharyngeal tumors coincided with the most favorable outcomes, while high viral load in the tumors came a close second.

“We showed that high viral load and a cancer-specific pattern of viral gene expression are most suited to identify patients with HPV-driven tumors among patients with oropharyngeal cancer,” Dana Holzinger, that study’s corresponding author, said in a statement.

“Once standardized assays for these markers, applicable in routine clinical laboratories, are established, they will allow precise identification of patients with oropharyngeal cancer with or without HPV-driven cancers and, thus, will influence prognosis and potentially treatment decisions,” added Holzinger, who is affiliated with the German Cancer Research Center and Heidelberg University.

In a commentary article online today in Cancer Research, Eduardo Méndez, a head and neck surgery specialist with the University of Washington and Fred Hutchinson Cancer Research Center, discussed the significance of the two studies and their potential impact on oropharyngeal squamous cell carcinoma prognoses and treatment.

But he also cautioned that more research is needed to understand whether the patterns described in the new studies hold in other populations and to tease apart the prognostic importance of HPV infection in relation to additional prognostic markers.
Read Full Post »

Reporter: Aviva Lev-Ari, PhD, RN

Set of Papers Outline ENCODE Findings as Consortium Looks Ahead to Future Studies

NEW YORK (GenomeWeb News) – An international collaboration involving more than 400 researchers working to characterize gene regulatory networks in the human genome is publishing dozens of new studies this week.

In papers appearing in Nature, Science, Genome Research, Genome Biology, the Journal of Biological Chemistry, and elsewhere, members of the Encyclopedia of DNA Elements, or ENCODE, consortium describe approaches used to define some four million regulatory regions in the genome, among other things. All told, the team explained, ENCODE efforts have made it possible to assign biological functions to around 80 percent of genome sequences — filling in large gaps left by studies that focused on protein-coding sequences alone.

“We found that a much bigger part of the genome — a surprising amount, in fact — is involved in controlling when and where proteins are produced, than in simply manufacturing the building blocks,” ENCODE’s lead analysis coordinator Ewan Birney, associate director of the European Molecular Biology Laboratory European Bioinformatics Institute, said in a statement.

“This concept of ‘junk DNA,’ which has been sort of perpetuated for the past 20 years or so is really not accurate,” ENCODE researcher Rick Myers, director of the HudsonAlpha Institute for Biotechnology, said during a telephone briefing with reporters today. “Most of the genome — more than 80 percent of the base pairs in the genome — has some biological activity, some biological function.”

Researchers participating in a complementary effort within the larger ENCODE project, known as GENCODE, worked to more completely characterize the coding portions of the genome. “As part of the ENCODE project, we both tidied up the protein-coding genes and we also found many non-coding RNA genes as well,” Birney said during today’s telebriefing.

Based on the success of ENCODE so far, the project is expected to be extended by another four years or so. The amount of new funding from the National Human Genome Research Institute for that follow-up work is expected to be as high as $123 million.

“Later this month, NHGRI will be announcing a new round of funding that will take the ENCODE project into its next phase,” NHGRI Director Eric Green said during the call.

Studies done in the decade or so since the human genome was deciphered have highlighted how little of the genome is actually comprised of gene sequences. With the realization that only around 2 percent of the genome is dedicated to protein-coding functions came a spate of speculation about the role of the other 98 percent of the genome.

While this portion of the genome was suspected of harboring regulatory sequences, the extent of that regulation and its impact on coding sequences in human tissues over time was not known.

“When the Human Genome Project ended in 2003, we quickly realized that we understood the meaning of only a very small percent of the human genome’s letters,” Green explained. “We did know the genetic code for determining the order of amino acids and proteins, but we understood precious little about the signals that turned genes on or off — or that controlled the amount of proteins produced in different tissues.”

To begin studying such control networks systematically, the international ENCODE consortium kicked off the main phase of its analyses in 2007, following an earlier pilot study.

NHGRI has provided $123 million for the project over the past five years. Another $30 million went to support the development of ENCODE-related technologies since the ENCODE pilot started in 2003, while $40.6 million from NHGRI went towards the pilot itself.

During the study’s main phase, investigators from nearly three-dozen labs around the world took multi-pronged approaches to assess transcription factor binding patterns, histone modification patterns, chromatin structure signatures and other features of the genome that interact with one another to control gene expression over time and across different tissues in the body.

To accomplish the roughly 1,600 experiments done to test some 180 cell types for ENCODE, teams turned to methods such as chromatin immunoprecipitation coupled with sequencing to define the genome-wide binding patterns for more than 100 different transcription factors, for example, while other strategies were used to profile DNA methylation patterns, chromatin features, and so forth.

“It’s really a detailed hierarchy, where proteins bind and epigenetic marks — like DNA methylation and other marks — precisely cooperate and regulate how the genes are going to get turned on [or off] and the amount of this,” Myers said. “These complex networks are one of the big components of the contributions of the 30 papers that are being published today.”

For example, a University of Washington-led team reporting in Science online today defined millions of regulatory regions, including some that are operational during normal development, by taking advantage of an enzyme known as DNase I, which chops off DNA specifically at open chromatin sites in the genome. That group found that more than three-quarters of disease-associated variants identified in genome-wide association studies fall in parts of the genome that overlap with regulatory sites.

“We now know that the majority of these changes that are associated with common diseases and traits that don’t fall within genes actually occur within the gene-controlling switches,” University of Washington genome sciences researcher John Stamatoyannopoulos, senior author on that study, said during today’s telebriefing. “This phenomenon is not confined to a particular type of disease. It seems to be present across the board for a very wide variety of different diseases and traits.”
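The underlying computation behind these figures is an interval-overlap count: what fraction of GWAS variant positions fall inside regulatory regions such as DNase I hypersensitive sites? A minimal sketch, with invented coordinates (this is not the ENCODE pipeline):

```python
# Illustrative overlap computation: count the fraction of variant
# positions falling inside regulatory intervals. Coordinates are
# invented; real analyses use per-chromosome sorted interval sets.
import bisect

def fraction_in_regions(variants, regions):
    """variants: list of (chrom, pos); regions: dict mapping chrom to a
    sorted list of non-overlapping (start, end) half-open intervals."""
    def in_region(pos, intervals):
        starts = [s for s, _ in intervals]
        i = bisect.bisect_right(starts, pos) - 1
        return i >= 0 and intervals[i][0] <= pos < intervals[i][1]

    if not variants:
        return 0.0
    hits = sum(1 for chrom, pos in variants
               if in_region(pos, regions.get(chrom, [])))
    return hits / len(variants)

regions = {"chr1": [(100, 200), (500, 650)]}
variants = [("chr1", 150), ("chr1", 400), ("chr1", 600), ("chr2", 50)]
```

Here two of the four variants land inside a regulatory interval, giving a fraction of 0.5; the ENCODE-scale version of this count is what yields the "more than three-quarters" figure cited above.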

Results from such analyses also hint that some outwardly unrelated conditions might be traced back to similar regulatory processes. And, researchers say, by bringing together information on active regulatory regions with disease-risk variants, it may be possible to define new functionally important tissues for certain conditions.

“By creating these extensive blueprints of the control circuitry, we’re now exposing previously hidden connections between different kinds of diseases that may explain common clinical features,” Stamatoyannopoulos said.

“This has also allowed us to see that the GWAS studies that have been performed contain far more information than was previously believed,” he added, “because hundreds of additional DNA changes that were not thought to be important also appear to affect these gene-controlling switches.”

The new data are also expected to help in understanding genetic disease and interpreting information from personal genomes, according to Michael Snyder, an ENCODE investigator and director of Stanford University’s Center for Genomics and Personalized Medicine.

“We believe the ENCODE project will have a profound impact on personal genomes and, ultimately on personalized medicine,” Snyder told reporters. “We can now better see what personal variants do, in terms of causing phenotypic differences, drug responses, and disease risk.”

Many of the studies stemming from ENCODE can be viewed through a website conceived by Nature, Genome Research, and Genome Biology that links ENCODE papers sharing related themes, or “threads.”

Along with the newly published papers, the ENCODE team is making data available to other members of the research community through the project’s website. Data from studies can also be accessed through an ENCODE browser housed at the University of California at Santa Cruz or via NCBI or EBI sites.

“For basic researchers, the ENCODE data represents a powerful resource for understanding fundamental questions about how life is encoded in our genome,” NHGRI’s Green said. “For more clinically-oriented researchers, the ENCODE data provide key information about which genome sequences are functionally important.”

    Source:

    NEWS & VIEWS

    52 | NATURE | VOL 489 | 6 SEPTEMBER 2012

    FORUM: Genomics

    ENCODE explained

    The Encyclopedia of DNA Elements (ENCODE) project dishes up a hearty banquet of data that illuminate the roles of the functional elements of the human genome. Here, five scientists describe the project and discuss how the data are influencing research directions across many fields. See Articles p.57, p.75, p.83, p.91, p.101 & Letter p.109

    Serving up a genome feast

    JOSEPH R. ECKER

    Starting with a list of simple ingredients and blending them in the precise amounts needed to prepare a gourmet meal is a challenging task. In many respects, this task is analogous to the goal of the ENCODE project1, the recent progress of which is described in this issue2–7. The project aims to fully describe the list of common ingredients (functional elements) that make up the human genome (Fig. 1). When mixed in the right proportions, these ingredients constitute the information needed to build all the types of cells, body organs and, ultimately, an entire person from a single genome.

    The ENCODE pilot project8 focused on just 1% of the genome — a mere appetizer — and its results hinted that the list of human genes was incomplete. Although there was scepticism about the feasibility of scaling up the project to the entire genome and to many hundreds of cell types, recent advances in low-cost, rapid DNA-sequencing technology radically changed that view9. Now the ENCODE consortium presents a menu of 1,640 genome-wide data sets prepared from 147 cell types, providing a six-course serving of papers in Nature, along with many companion publications in other journals.

    One of the more remarkable findings described in the consortium’s ‘entrée’ paper (page 57)2 is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly ‘junk DNA’. The authors report that the space between genes is filled with enhancers (regulatory DNA elements), promoters (the sites at which DNA’s transcription into RNA is initiated) and numerous previously overlooked regions that encode RNA transcripts that are not translated into proteins but might have regulatory roles. Of note, these results show that many DNA variants previously correlated with certain diseases lie within or very near non-coding functional DNA elements, providing new leads for linking genetic variation and disease.

    The five companion articles3–7 dish up diverse sets of genome-wide data regarding the mapping of transcribed regions, DNA binding of regulatory proteins (transcription factors) and the structure and modifications of chromatin (the association of DNA and proteins that makes up chromosomes), among other delicacies.

    Djebali and colleagues3 (page 101) describe ultra-deep sequencing of RNAs prepared from many different cell lines and from specific compartments within the cells. They conclude that about 75% of the genome is transcribed at some point in some cells, and that genes are highly interlaced with overlapping transcripts that are synthesized from both DNA strands. These findings force a rethink of the definition of a gene and of the minimum unit of heredity.

    Moving on to the second and third courses, Thurman et al.4 and Neph et al.5 (pages 75 and 83) have prepared two tasty chromatin-related treats. Both studies are based on the DNase I hypersensitivity assay, which detects genomic regions at which enzyme access to, and subsequent cleavage of, DNA is unobstructed by chromatin proteins. The authors identified cell-specific patterns of DNase I hypersensitive sites that show remarkable concordance with experimentally determined and computationally predicted binding sites of transcription factors. Moreover, they have doubled the number of known recognition sequences for DNA-binding proteins in the human genome, and have revealed a 50-base-pair ‘footprint’ that is present in thousands of promoters5.

    The next course, provided by Gerstein and colleagues6 (page 91), examines the principles behind the wiring of transcription-factor networks. In addition to assigning relatively simple functions to genome elements (such as ‘protein X binds to DNA element Y’), this study attempts to clarify the hierarchies of transcription factors and how the intertwined networks arise.

    Beyond the linear organization of genes and transcripts on chromosomes lies a more complex (and still poorly understood) network of chromosome loops and twists through which promoters and more distal elements, such as enhancers, can communicate their regulatory information to each other. In the final course of the ENCODE genome feast, Sanyal and colleagues7 (page 109) map more than 1,000 of these long-range signals in each cell type. Their findings begin to overturn the long-held (and probably oversimplified) prediction that the regulation of a gene is dominated by its proximity to the closest regulatory elements.

    One of the major future challenges for ENCODE (and similarly ambitious projects) will be to capture the dynamic aspects of gene regulation. Most assays provide a single snapshot of cellular regulatory events, whereas a time series capturing how such processes change is preferable. Additionally, the examination of large batches of cells — as required for the current assays — may present too simplified a view of the underlying regulatory complexity, because individual cells in a batch (despite being genetically identical) can sometimes behave in different ways. The development of new technologies aimed at the simultaneous capture of multiple data types, along with their regulatory dynamics in single cells, would help to tackle these issues.

    A further challenge is identifying how the genomic ingredients are combined to assemble the gene networks and biochemical pathways that carry out complex functions, such as cell-to-cell communication, which enable organs and tissues to develop. An even greater challenge will be to use the rapidly growing body of data from genome-sequencing projects to understand the range of human phenotypes (traits), from normal developmental processes, such as ageing, to disorders such as Alzheimer’s disease10.

    © 2012 Macmillan Publishers Limited. All rights reserved

    Achieving these ambitious goals may require a parallel investment of functional studies using simpler organisms — for example, of the type that might be found scampering around the floor, snatching up crumbs in the chefs’ kitchen. All in all, however, the ENCODE project has served up an all-you-can-eat feast of genomic data that we will be digesting for some time. Bon appétit!

    Joseph R. Ecker is at the Howard Hughes Medical Institute and the Salk Institute for Biological Studies, La Jolla, California 92037, USA.

    e-mail: ecker@salk.edu

    Figure 1 | Beyond the sequence. The ENCODE project2–7 provides information on the human genome far beyond that contained within the DNA sequence — it describes the functional genomic elements that orchestrate the development and function of a human. The project contains data about the degree of DNA methylation and chemical modifications to histones that can influence the rate of transcription of DNA into RNA molecules (histones are the proteins around which DNA is wound to form chromatin). ENCODE also examines long-range chromatin interactions, such as looping, that alter the relative proximities of different chromosomal regions in three dimensions and also affect transcription. Furthermore, the project describes the binding activity of transcription-factor proteins and the architecture (location and sequence) of gene-regulatory DNA elements, which include the promoter region upstream of the point at which transcription of an RNA molecule begins, and more distant (long-range) regulatory elements. Another section of the project was devoted to testing the accessibility of the genome to the DNA-cleavage protein DNase I. These accessible regions, called DNase I hypersensitive sites, are thought to indicate specific sequences at which the binding of transcription factors and transcription-machinery proteins has caused nucleosome displacement. In addition, ENCODE catalogues the sequences and quantities of RNA transcripts, from both non-coding and protein-coding regions.

    Expression control

    WENDY A. BICKMORE

    Once the human genome had been sequenced, it became apparent that an encyclopaedic knowledge of chromatin organization would be needed if we were to understand how gene expression is regulated. The ENCODE project goes a long way to achieving this goal and highlights the pivotal role of transcription factors in sculpting the chromatin landscape.

    Although some of the analyses largely confirm conclusions from previous smaller-scale studies, this treasure trove of genome-wide data provides fresh insight into regulatory pathways and identifies prodigious numbers of regulatory elements. This is particularly so for Thurman and colleagues’ data4 regarding DNase I hypersensitive sites (DHSs) and for Gerstein and colleagues’ results6 concerning DNA binding of transcription factors. DHSs are genomic regions that are accessible to enzymatic cleavage as a result of the displacement of nucleosomes (the basic units of chromatin) by DNA-binding proteins (Fig. 1). They are the hallmark of cell-type-specific enhancers, which are often located far away from promoters.

    The ENCODE papers expose the profusion of DHSs — more than 200,000 per cell type, far outstripping the number of promoters — and their variability between cell types. Through the simultaneous presence in the same cell type of a DHS and a nearby active promoter, the researchers paired half a million enhancers with their probable target genes. But this leaves more than 2 million putative enhancers without known targets, revealing the enormous expanse of the regulatory genome landscape that is yet to be explored. Chromosome-conformation-capture methods that detect long-range physical associations between distant DNA regions are attempting to bridge this gap. Indeed, Sanyal and colleagues7 applied these techniques to survey such associations across 1% of the genome.
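The pairing described above — linking a distal DHS to a promoter that is active in the same cell types — can be caricatured as a correlation screen across cell types. A toy sketch, with invented activity values and an arbitrary threshold (the published analysis correlates DNase I signal across many cell types and applies genomic-distance cutoffs):

```python
# Toy sketch of pairing distal DHSs with promoters by correlated activity
# across cell types. All activity values are invented for illustration; the
# real analysis correlated DNase I signal across ~100 cell types and
# restricted candidate pairs by genomic distance.

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

promoter_activity = {                  # one value per cell type
    "geneA": [9.1, 0.2, 8.7, 0.1, 7.9],
    "geneB": [0.3, 6.5, 0.2, 7.1, 0.4],
}
dhs_activity = {
    "DHS1": [8.8, 0.1, 9.0, 0.3, 8.1],
    "DHS2": [0.2, 7.2, 0.1, 6.8, 0.3],
}

pairs = [(dhs, gene)
         for dhs, dvec in dhs_activity.items()
         for gene, pvec in promoter_activity.items()
         if pearson(dvec, pvec) > 0.7]  # arbitrary threshold for the sketch
print(pairs)  # [('DHS1', 'geneA'), ('DHS2', 'geneB')]
```

Co-activity alone cannot prove a physical contact, which is why chromosome-conformation-capture data such as Sanyal and colleagues’ provide a complementary line of evidence.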

    The ENCODE data start to paint a picture of the logic and architecture of transcriptional networks, in which DNA binding of a few high-affinity transcription factors displaces nucleosomes and creates a DHS, which in turn facilitates the binding of further, lower-affinity factors. The results also support the idea that transcription-factor binding can block DNA methylation (a chemical modification of DNA that affects gene expression), rather than the other way around — which is highly relevant to the interpretation of disease-associated sites of altered DNA methylation11.

    The exquisite cell-type specificity of regulatory elements revealed by the ENCODE studies emphasizes the importance of having appropriate biological material on which to test hypotheses. The researchers have focused their efforts on a set of well-established cell lines, with selected assays extended to some freshly isolated cells. Challenges for the future include following the dynamic changes in the regulatory landscape during specific developmental pathways, and understanding chromatin structure in tissues containing heterogeneous cell populations.

    Wendy A. Bickmore is in the Medical Research Council Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK.

    e-mail: wendy.bickmore@igmm.ed.ac.uk 


    11 Years Ago

    The draft human genome

    OUR GENOME UNVEILED

    Unless the human genome contains a lot of genes that are opaque to our computers, it is clear that we do not gain our undoubted complexity over worms and plants by using many more genes. Understanding what does give us our complexity — our enormous behavioural repertoire, ability to produce conscious action, remarkable physical coordination (shared with other vertebrates), precisely tuned alterations in response to external variations of the environment, learning, memory … need I go on? — remains a challenge for the future.

    David Baltimore

    From Nature 15 February 2001

    GENOME SPEAK

    With the draft in hand, researchers have a new tool for studying the regulatory regions and networks of genes. Comparisons with other genomes should reveal common regulatory elements, and the environments of genes shared with other species may offer insight into function and regulation beyond the level of individual genes. The draft is also a starting point for studies of the three-dimensional packing of the genome into a cell’s nucleus. Such packing is likely to influence gene regulation … The human genome lies before us, ready for interpretation.

    Peer Bork and Richard Copley

    From Nature 15 February 2001

    Non-coding but functional

    INÊS BARROSO

    The vast majority of the human genome does not code for proteins and, until now, did not seem to contain defined gene-regulatory elements. Why evolution would maintain large amounts of ‘useless’ DNA had remained a mystery, and seemed wasteful. It turns out, however, that there are good reasons to keep this DNA. Results from the ENCODE project2–8 show that most of these stretches of DNA harbour regions that bind proteins and RNA molecules, bringing these into positions from which they cooperate with each other to regulate the function and level of expression of protein-coding genes. In addition, it seems that widespread transcription from non-coding DNA potentially acts as a reservoir for the creation of new functional molecules, such as regulatory RNAs.

    What are the implications of these results for genetic studies of complex human traits and disease? Genome-wide association studies (GWAS), which link variations in DNA sequence with specific traits and diseases, have in recent years become the workhorse of the field, and have identified thousands of DNA variants associated with hundreds of complex traits (such as height) and diseases (such as diabetes). But association is not causality, and identifying those variants that are causally linked to a given disease or trait, and understanding how they exert such influence, has been difficult. Furthermore, most of these associated variants lie in non-coding regions, so their functional effects have remained undefined.

    The ENCODE project provides a detailed map of additional functional non-coding units in the human genome, including some that have cell-type-specific activity. In fact, the catalogue contains many more functional non-coding regions than genes. These data show that results of GWAS are typically enriched for variants that lie within such non-coding functional units, sometimes in a cell-type-specific manner that is consistent with certain traits, suggesting that many of these regions could be causally linked to disease. Thus, the project demonstrates that non-coding regions must be considered when interpreting GWAS results, and it provides a strong motivation for reinterpreting previous GWAS findings. Furthermore, these results imply that sequencing studies focusing on protein-coding sequences (the ‘exome’) risk missing crucial parts of the genome and the ability to identify true causal variants.

    However, although the ENCODE catalogues represent a remarkable tour de force, they contain only an initial exploration of the depths of our genome, because many more cell types must yet be investigated. Some of the remaining challenges for scientists searching for causal disease variants lie in: accessing data derived from cell types and tissues relevant to the disease under study; understanding how these functional units affect genes that may be distantly located7; and the ability to generalize such results to the entire organism.

    Inês Barroso is at the Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK, and at the University of Cambridge Metabolic Research Laboratories and NIHR Cambridge Biomedical Research Centre, Cambridge, UK.

    e-mail: ib1@sanger.ac.uk

    © 2012 Macmillan Publishers Limited. All rights reserved

    Evolution and the code

    JONATHAN K. PRITCHARD & YOAV GILAD

    One of the great challenges in evolutionary biology is to understand how differences in DNA sequence between species determine differences in their phenotypes. Evolutionary change may occur both through changes in protein-coding sequences and through sequence changes that alter gene regulation.

    There is growing recognition of the importance of this regulatory evolution, on the basis of numerous specific examples as well as on theoretical grounds. It has been argued that potentially adaptive changes to protein-coding sequences may often be prevented by natural selection because, even if they are beneficial in one cell type or tissue, they may be detrimental elsewhere in the organism. By contrast, because gene-regulatory sequences are frequently associated with temporally and spatially specific gene-expression patterns, changes in these regions may modify the function of only certain cell types at specific times, making it more likely that they will confer an evolutionary advantage12.

    However, until now there has been little information about which genomic regions have regulatory activity. The ENCODE project has provided a first draft of a ‘parts list’ of these regulatory elements, in a wide range of cell types, and moves us considerably closer to one of the key goals of genomics: understanding the functional roles (if any) of every position in the human genome.

    Nonetheless, it will take a great deal of work to identify the critical sequence changes in the newly identified regulatory elements that drive functional differences between humans and other species. There are some precedents for identifying key regulatory differences (see, for example, ref. 13), but ENCODE’s improved identification of regulatory elements should greatly accelerate progress in this area. The data may also allow researchers to begin to identify sequence alterations occurring simultaneously in multiple genomic regions, which, when added together, drive phenotypic change — a process called polygenic adaptation14.

    However, despite the progress brought by the ENCODE consortium and other research groups, it remains difficult to discern with confidence which variants in putative regulatory regions will drive functional changes, and what these changes will be. We also still have an incomplete understanding of how regulatory sequences are linked to target genes. Furthermore, the ENCODE project focused mainly on the control of transcription, but many aspects of post-transcriptional regulation, which may also drive evolutionary changes, are yet to be fully explored.

    Nonetheless, these are exciting times for studies of the evolution of gene regulation. With such new resources in hand, we can expect to see many more descriptions of adaptive regulatory evolution, and how this has contributed to human evolution.

    Jonathan K. Pritchard and Yoav Gilad are in the Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA. J.K.P. is also at the Howard Hughes Medical Institute, University of Chicago.

    e-mails: pritch@uchicago.edu; gilad@uchicago.edu 

    From catalogue to function

    ERAN SEGAL

    Projects that produce unprecedented amounts of data, such as the human genome project15 or the ENCODE project, present new computational and data-analysis challenges and have been a major force driving the development of computational methods in genomics. The human genome project produced one bit of information per DNA base pair, and led to advances in algorithms for sequence matching and alignment. By contrast, in its 1,640 genome-wide data sets, ENCODE provides a profile of the accessibility, methylation, transcriptional status, chromatin structure and bound molecules for every base pair. Processing the project’s raw data to obtain this functional information has been an immense effort.

    For each of the molecular-profiling methods used, the ENCODE researchers devised novel processing algorithms designed to remove outliers and protocol-specific biases, and to ensure the reliability of the derived functional information. These processing pipelines and quality-control measures have been adopted by the research community as the standard for the analysis of such data. The high quality of the functional information they produce is evident from the exquisite detail and accuracy achieved, such as the ability to observe the crystallographic topography of protein–DNA interfaces in DNase I footprints5, and the observation of more than one-million-fold variation in dynamic range in the concentrations of different RNA transcripts3.

    But beyond these individual methods for data processing, the profound biological insights of ENCODE undoubtedly come from computational approaches that integrated multiple data types. For example, by combining data on DNA methylation, DNA accessibility and transcription-factor expression, Thurman et al.4 provide fascinating insight into the causal role of DNA methylation in gene silencing. They find that transcription-factor binding sites are, on average, less frequently methylated in cell types that express those transcription factors, suggesting that binding-site methylation often results from a passive mechanism that methylates sites not bound by transcription factors.
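The inference sketched above boils down to a simple comparison: methylation at a factor’s binding sites in cell types that express the factor versus cell types that do not. A schematic illustration with fabricated numbers (the cell names, sites and values are all hypothetical):

```python
# Schematic sketch of the comparison behind the passive-methylation model:
# average methylation at a transcription factor's binding sites, split by
# whether each cell type expresses the factor. All values are fabricated.
from statistics import mean

methylation = {                   # methylation fraction at the same sites
    "cellA": [0.05, 0.10, 0.08],  # factor expressed: sites largely bound
    "cellB": [0.07, 0.12, 0.06],  # factor expressed
    "cellC": [0.80, 0.75, 0.90],  # factor absent: sites free to be methylated
    "cellD": [0.85, 0.70, 0.95],  # factor absent
}
expresses_factor = {"cellA": True, "cellB": True, "cellC": False, "cellD": False}

bound = [m for c, ms in methylation.items() if expresses_factor[c] for m in ms]
unbound = [m for c, ms in methylation.items() if not expresses_factor[c] for m in ms]

# Lower methylation where the factor is expressed is consistent with binding
# shielding sites from a default (passive) methylation machinery.
print(mean(bound) < mean(unbound))  # True
```

Of course, the correlation alone does not establish direction; the ENCODE authors’ inference rests on the aggregate pattern across many factors and cell types.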

    Despite the extensive functional information provided by ENCODE, we are still far from the ultimate goal of understanding the function of the genome in every cell of every person, and across time within the same person. Even if the throughput rate of the ENCODE profiling methods increases dramatically, it is clear that brute-force measurement of this vast space is not feasible. Rather, we must move on from descriptive and correlative computational analyses, and work towards deriving quantitative models that integrate the relevant protein, RNA and chromatin components. We must then describe how these components interact with each other, how they bind the genome and how these binding events regulate transcription.

    If successful, such models will be able to predict the genome’s function at times and in settings that have not been directly measured. By allowing us to determine which assumptions regarding the physical interactions of the system lead to models that better explain measured patterns, the ENCODE data provide an invaluable opportunity to address this next immense computational challenge. ■

    Eran Segal is in the Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.

    e-mail: eran.segal@weizmann.ac.il

    1. The ENCODE Project Consortium Science 306, 636–640 (2004).

    2. The ENCODE Project Consortium Nature 489, 57–74 (2012).

    3. Djebali, S. et al. Nature 489, 101–108 (2012).

    4. Thurman, R. E. et al. Nature 489, 75–82 (2012).

    5. Neph, S. et al. Nature 489, 83–90 (2012).

    6. Gerstein, M. B. et al. Nature 489, 91–100 (2012).

    7. Sanyal, A., Lajoie, B., Jain, G. & Dekker, J. Nature 489, 109–113 (2012).

    8. Birney, E. et al. Nature 447, 799–816 (2007).

    9. Mardis, E. R. Nature 470, 198–203 (2011).

    10. Gonzaga-Jauregui, C., Lupski, J. R. & Gibbs, R. A. Annu. Rev. Med. 63, 35–61 (2012).

    11. Sproul, D. et al. Proc. Natl Acad. Sci. USA 108, 4364–4369 (2011).

    12. Carroll, S. B. Cell 134, 25–36 (2008).

    13. Prabhakar, S. et al. Science 321, 1346–1350 (2008).

    14. Pritchard, J. K., Pickrell, J. K. & Coop, G. Curr. Biol. 20, R208–R215 (2010).

    15. Lander, E. S. et al. Nature 409, 860–921 (2001).



    http://www.sciencemag.org SCIENCE VOL 337 7 SEPTEMBER 2012 1159

    NEWS&ANALYSIS

    When researchers fi rst sequenced the human

    genome, they were astonished by how few

    traditional genes encoding proteins were

    scattered along those 3 billion DNA bases.

    Instead of the expected 100,000 or more

    genes, the initial analyses found about 35,000

    and that number has since been whittled down

    to about 21,000. In between were megabases

    of “junk,” or so it seemed.

    This week, 30 research papers, including

    six in Nature and additional papers published

    by Science, sound the death knell for

    the idea that our DNA is mostly littered with

    useless bases. A decadelong project, the

    Encyclopedia of DNA Elements (ENCODE),

    has found that 80% of the human genome

    serves some purpose, biochemically speaking.

    “I don’t think anyone would have anticipated

    even close to the amount of sequence

    that ENCODE has uncovered that looks like

    it has functional importance,” says John A.

    Stamatoyannopoulos, an ENCODE re searcher

    at the University of Washington, Seattle.

    Beyond defi ning proteins, the DNA bases

    highlighted by ENCODE specify landing

    spots for proteins that infl uence gene activity,

    strands of RNA with myriad roles, or

    simply places where chemical modifi cations

    serve to silence stretches of our chromosomes.

    These results are going “to change

    the way a lot of [genomics] concepts are

    written about and presented in textbooks,”

    Stamatoyannopoulos predicts.

    The insights provided by ENCODE into

    how our DNA works are already clarifying

    genetic risk factors for a variety of diseases

    and offering a better understanding of gene

    regulation and function. “It’s a treasure trove

    of information,” says Manolis Kellis, a computational

    biologist at Massachusetts Institute

    of Technology (MIT) in Cambridge who analyzed

    data from the project.

    The ENCODE effort has revealed that

    a gene’s regulation is far more complex

    than previously thought, being infl uenced

    by multiple stretches of regulatory DNA

    located both near and far from the gene

    itself and by strands of RNA not translated

    into proteins, so-called noncoding RNA.

    “What we found is how beautifully complex

    the biology really is,” says Jason Lieb,

    an ENCODE researcher at the University of

    North Carolina, Chapel Hill.

    Throughout the 1990s, various researchers

    called the idea of junk DNA into question.

    With the human genome in hand, the

    National Human Genome Research Institute

    (NHGRI) in Bethesda, Maryland, decided it

    wanted to fi nd out once and for all how much

    of the genome was a wasteland with no functional

    purpose. In 2003, it funded a pilot

    ENCODE, in which 35 research teams analyzed

    44 regions of the genome—30 million

    bases in all, about 1% of the total genome. In

    2007, the pilot project’s results revealed that

    much of this DNA sequence was active in

    some way. The work called into serious question

    our gene-centric view of the genome,

    fi nding extensive RNA-generating activity

    beyond traditional gene boundaries (Science,

    15 June 2007, p. 1556). But the question

    remained whether the rest of the genome was

    like this 1%. “We want to know what all the

    bases are doing,” says Yale University bioinformatician

    Mark Gerstein.

    Teams at 32 institutions worldwide have

    now carried out scores of tests, generating

    1640 data sets. While the pilot phase tests

    depended on computer chip–like devices

    called microarrays to analyze DNA samples,

    the expanded phase benefi ted from the arrival

    of new sequencing technology, which made it

    cost-effective to directly read the DNA bases.

    Taken together, the tests present “a greater

    idea of what the landscape of the genome

    looks like,” says NHGRI’s Elise Feingold.

    Because the parts of the genome used

    could differ among various kinds of cells,

    ENCODE needed to look at DNA function

    in multiple types of cells and tissues. At

    fi rst the goal was to study intensively three

    types of cells. They included GM12878, the

    immature white blood cell line used in the

    1000 Genomes Project, a large-scale effort to

    catalog genetic variation across humans; a leukemia

    cell line called K562; and an approved

    human embryonic stem cell line, H1-hESC.

    As ENCODE was ramping up, new

    sequencing technology brought the cost of

    sequencing down enough to make it feasible

    to test extensively even more cell types.

    ENCODE added a liver cancer cell line,

    HepG2; the laboratory workhorse cancer cell

    line, HeLa S3; and human umbilical cord tissue

    to the mix. Another 140 cell types were

    studied to a much lesser degree.

In these cells, ENCODE researchers closely examined which DNA bases are transcribed into RNA and then whether those strands of RNA are subsequently translated into proteins, verifying predicted protein-coding genes and more precisely locating each gene’s beginning, end, and coding regions. The latest protein-coding gene count is 20,687, with hints of about 50 more, the consortium reports in Nature. Those genes account for about 3% of the human genome, less if one counts only their coding regions. Another 11,224 DNA stretches are classified as pseudogenes, “dead” genes now known to be active in some cell types or individuals.
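The coverage arithmetic behind figures like “genes account for about 3% of the genome” is an interval-union calculation. A minimal Python sketch, with invented toy coordinates rather than real GENCODE annotations:

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals; half-open coordinates."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by the union of the intervals."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / genome_length

# Toy example: three overlapping "gene" annotations on a 1,000-base genome.
genes = [(0, 100), (50, 200), (500, 530)]
print(covered_fraction(genes, 1000))  # 0.23
```

The same merge-then-sum step, run over real annotation files, is how coverage percentages of this kind are typically computed.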

ENCODE Project Writes Eulogy For Junk DNA
(GENOMICS, News & Analysis, Science Vol. 337, 7 September 2012, p. 1161; published by AAAS)

[Figure: “Zooming in. A diagram of DNA in ever-greater detail shows how ENCODE’s various tests (gray boxes) translate DNA’s features into functional elements along a chromosome.” The tests shown are ChIP-seq, RNA-seq, DNase-seq, FAIRE-seq, 5C, and computational predictions with RT-PCR; the features include hypersensitive sites, epigenetic modifications (CH3, CH3CO), long-range regulatory elements (enhancers, repressors/silencers, insulators), cis-regulatory elements (promoters, transcription factor binding sites), and RNA polymerase at a gene transcript. Credit: adapted from The ENCODE Project Consortium, PLoS Biology 9, 4 (April 2011).]

ENCODE drives home, however, that there are many “genes” out there in which DNA codes for RNA, not a protein, as the end product. The big surprise of the pilot project was that 93% of the bases studied were transcribed into RNA; in the full genome, 76% is transcribed. ENCODE defined 8800 small RNA molecules and 9600 long noncoding RNA molecules, each of which is at least 200 bases long. Thomas Gingeras of Cold Spring Harbor Laboratory in New York has found that various ones home in on different cell compartments, as if they have fixed addresses where they operate. Some go to the nucleus, some to the nucleolus, and some to the cytoplasm, for example. “So there’s quite a lot of sophistication in how RNA works,” says Ewan Birney of the European Bioinformatics Institute in Hinxton, U.K., one of the key leaders of ENCODE (see p. 1162).

As a result of ENCODE, Gingeras and others argue that the fundamental unit of the genome and the basic unit of heredity should be the transcript—the piece of RNA decoded from DNA—and not the gene. “The project has played an important role in changing our concept of the gene,” Stamatoyannopoulos says.

Another way to test for functionality of DNA is to evaluate whether specific base sequences are conserved between species, or among individuals in a species. Previous studies have shown that 5% of the human genome is conserved across mammals, even though ENCODE studies implied that much more of the genome is functional. So MIT’s Lucas Ward and Kellis compared functional regions newly identified by ENCODE among multiple humans, sampling from the 1000 Genomes Project. Some DNA sequences not conserved between humans and other mammals were nonetheless very much preserved across multiple people, indicating that an additional 4% of the genome is newly under selection in the human lineage, they report in a paper published online by Science (http://scim.ag/WardKellis). Two such regions were near genes for nerve growth and the development of cone cells in the eye, which underlie distinguishing traits in humans. On the flip side, they also found that some supposedly conserved regions of the human genome, as highlighted by the comparison with 29 mammals, actually varied among humans, suggesting these regions were no longer functional.
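The Ward and Kellis approach rests on measuring how variable a candidate region is across sampled individuals: strong constraint shows up as columns that rarely differ. A toy sketch of that within-species measure, with made-up sequences standing in for 1000 Genomes samples (not their actual pipeline):

```python
def polymorphic_fraction(alignment):
    """Fraction of aligned columns at which the sampled individuals differ.

    alignment: list of equal-length sequence strings, one per individual.
    A low fraction suggests the region is under purifying selection.
    """
    n_cols = len(alignment[0])
    variable = sum(1 for i in range(n_cols)
                   if len({seq[i] for seq in alignment}) > 1)
    return variable / n_cols

# Toy sample at one candidate regulatory element: one variable site of 8.
humans = ["ACGTACGT", "ACGTACGT", "ACGTACCT"]
print(polymorphic_fraction(humans))  # 0.125
```

Comparing this within-human diversity against cross-mammal conservation is what lets regions be flagged as newly constrained in the human lineage.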

Beyond transcription, DNA’s bases function in gene regulation through their interactions with transcription factors and other proteins. ENCODE carried out several tests to map where those proteins bind along the genome (Science, 25 May 2007, p. 1120). Two, DNase-seq and FAIRE-seq, gave an overview of the genome, identifying where the protein-DNA complex chromatin unwinds and a protein can hook up with the DNA, and were applied to multiple cell types. ENCODE’s DNase-seq found 2.89 million such sites in 125 cell types. Stamatoyannopoulos and his colleagues describe their more extensive DNase-seq studies in Science (p. 1190): His team examined 349 types of cells, including 233 60- to 160-day-old fetal tissue samples. Each type of cell had about 200,000 accessible locations, and there seemed to be at least 3.9 million regions where transcription factors can bind in the genome. Across all cell types, about 42% of the genome can be accessible, he and his colleagues report. In many cases, the assays were able to pinpoint the specific bases involved in binding.
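The “about 42% of the genome can be accessible” figure is a union across cell types: a base counts if it is open in at least one. A rough sketch with invented DNase peaks; a boolean mask is fine at toy scale, though genome-wide work would use sorted-interval merging instead:

```python
def accessible_fraction(peaks_by_cell, genome_length):
    """Fraction of bases open in at least one cell type.

    peaks_by_cell: dict mapping cell type -> list of (start, end) peaks,
    half-open coordinates. Marks each covered base in a boolean mask.
    """
    open_base = [False] * genome_length
    for peaks in peaks_by_cell.values():
        for start, end in peaks:
            for i in range(start, end):
                open_base[i] = True
    return sum(open_base) / genome_length

# Invented peaks for two cell types on a 1,000-base genome.
peaks = {
    "K562":    [(10, 60)],
    "GM12878": [(40, 80), (300, 320)],  # (40, 80) partly overlaps K562's peak
}
print(accessible_fraction(peaks, 1000))  # 0.09
```

The overlapping bases are counted once, which is why the union (90 bases here) is smaller than the sum of per-cell peak lengths.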

Last year, Stamatoyannopoulos showed that these newly discovered functional regions sometimes overlap with specific DNA bases linked to higher or lower risks of various diseases, suggesting that the regulation of genes might be at the heart of these risk variations (Science, 27 May 2011, p. 1031). The work demonstrated how researchers could use ENCODE data to come up with new hypotheses about the link between genetics and a particular disorder. (The ENCODE analysis found that 12% of these bases, or SNPs, colocate with transcription factor binding sites and 34% are in open chromatin defined by the DNase-seq tests.) Now, in their new work published in Science, Stamatoyannopoulos’s lab has linked those regulatory regions to their specific target genes, homing in on the risk-enhancing ones. In addition, the group finds it can predict the cell type involved in a given disease. For example, the analysis fingered two types of T cells as pathogenic in Crohn’s disease, both of which are involved in this inflammatory bowel disorder. “We are informing disease studies in a way that would be very hard to do otherwise,” Birney says.
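The colocation statistics in the parenthetical (12% of SNPs in binding sites, 34% in open chromatin) are, at heart, membership tests: what fraction of disease-associated positions fall inside a set of regulatory intervals? A hedged sketch with invented positions and intervals:

```python
def fraction_in_sites(snps, sites):
    """Fraction of SNP positions that fall inside any (start, end) site.

    snps: list of integer positions; sites: list of half-open intervals.
    A linear scan per SNP; real analyses index the intervals first.
    """
    def inside(pos):
        return any(start <= pos < end for start, end in sites)
    return sum(inside(p) for p in snps) / len(snps)

# Invented transcription factor binding sites and disease SNPs.
tf_sites = [(100, 120), (500, 540)]
disease_snps = [105, 510, 900, 950]
print(fraction_in_sites(disease_snps, tf_sites))  # 0.5
```

Whether an observed fraction like this is surprising is then judged against matched random SNP sets, a step omitted from this sketch.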

Another test, called ChIP-seq, uses an antibody to home in on a particular DNA-binding protein and helps pinpoint the locations along the genome where that protein works. To date, ENCODE has examined about 100 of the 1500 or so transcription factors and about 20 other DNA binding proteins, including those involved in modifying the chromatin-associated proteins called histones. The binding sites found through ChIP-seq coincided with the sites mapped through FAIRE-seq and DNase-seq. Overall, 8% of the genome falls within a transcription factor binding site, a percentage that is expected to double once more transcription factors have been tested.
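The claim that ChIP-seq sites “coincided” with DNase-seq and FAIRE-seq sites can be quantified as the fraction of ChIP-seq peaks overlapping at least one open-chromatin region. A small sketch with invented coordinates:

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def fraction_coinciding(chip_peaks, open_regions):
    """Fraction of ChIP-seq peaks that overlap any open-chromatin region."""
    hit = sum(1 for p in chip_peaks
              if any(overlaps(p, r) for r in open_regions))
    return hit / len(chip_peaks)

# Invented peak sets: 2 of the 3 ChIP-seq peaks land in open chromatin.
chip = [(100, 150), (400, 450), (900, 950)]
open_chromatin = [(120, 300), (410, 420)]
print(fraction_coinciding(chip, open_chromatin))
```

This is the same pairwise-overlap primitive that tools such as BEDTools apply at genome scale with sorted inputs.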

Yale’s Gerstein used these results to figure out all the interactions among the transcription factors studied and came up with a network view of how these regulatory proteins work. These transcription factors formed a three-layer hierarchy, with the ones at the top having the broadest effects and the ones in the middle working together to coregulate a common target gene, he and his colleagues report in Nature.
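One simple way to derive such a layered hierarchy from regulator-to-target edges is to place a node one level deeper than the deepest factor that regulates it. A minimal sketch over a hypothetical edge list; this is an illustration of the idea, not Gerstein’s actual network or method:

```python
def hierarchy_levels(edges):
    """Assign each node a level: 0 for top regulators, deeper for targets.

    edges: list of (regulator, target) pairs forming a DAG.
    """
    nodes = {n for edge in edges for n in edge}
    regulators = {n: set() for n in nodes}
    for src, dst in edges:
        regulators[dst].add(src)

    levels = {}
    def level(n, seen=()):
        if n in levels:
            return levels[n]
        ups = [level(r, seen + (n,)) for r in regulators[n] if r not in seen]
        levels[n] = 1 + max(ups) if ups else 0
        return levels[n]

    for n in nodes:
        level(n)
    return levels

# Hypothetical three-layer network: one top factor, two middle coregulators.
edges = [("TF_top", "TF_mid1"), ("TF_top", "TF_mid2"),
         ("TF_mid1", "geneX"), ("TF_mid2", "geneX")]
print(hierarchy_levels(edges))
```

Here TF_top lands at level 0, the two middle factors at level 1, and their common target at level 2, mirroring the top/middle/target structure described above.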

Using a technique called 5C, other researchers looked for places where DNA from distant regions of a chromosome, or even different chromosomes, interacted. It found that an average of 3.9 distal stretches of DNA linked up with the beginning of each gene. “Regulation is a 3D puzzle that has to be put together,” Gingeras says. “That’s what ENCODE is putting out on the table.”
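The 5C summary quoted above (an average of 3.9 distal contacts per gene) is a mean over distinct partners per gene start. A toy sketch with an invented contact list:

```python
def mean_contacts_per_gene(contacts):
    """Mean number of distinct distal partners per gene.

    contacts: list of (gene, distal_fragment) pairs; duplicate pairs
    (the same contact observed twice) are counted once.
    """
    partners = {}
    for gene, fragment in contacts:
        partners.setdefault(gene, set()).add(fragment)
    return sum(len(s) for s in partners.values()) / len(partners)

# Invented contacts: geneA touches two enhancers (one seen twice), geneB one.
contacts = [("geneA", "enh1"), ("geneA", "enh2"),
            ("geneB", "enh3"), ("geneA", "enh1")]
print(mean_contacts_per_gene(contacts))  # 1.5
```

Deduplicating via sets before averaging is the key step; otherwise repeated observations of one contact would inflate the mean.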

To date, NHGRI has put $288 million toward ENCODE, including the pilot project, technology development, and ENCODE efforts for the mouse, nematode, and fruit fly. All together, more than 400 papers have been published by ENCODE researchers. Another 110 or more studies have used ENCODE data, says NHGRI molecular biologist Michael Pazin. Molecular biologist Mathieu Lupien of the University of Toronto in Canada authored one of those papers, a study looking at epigenetics and cancer. “ENCODE data were fundamental” to the work, he says. “The cost is definitely worth every single dollar.”

–ELIZABETH PENNISI

ENCODE By the Numbers

- 147 cell types studied
- 80% functional portion of human genome
- 20,687 protein-coding genes
- 18,400 RNA genes
- 1640 data sets
- 30 papers published this week
- 442 researchers
- $288 million funding for pilot, technology, model organism, and current project


http://www.nature.com/encode/


Comprehensive Genomic Characterization of Squamous Cell Lung Cancers

Reporter: Aviva Lev-Ari, PhD, RN

Nature (2012) doi:10.1038/nature11404
Received 09 March 2012; Accepted 09 July 2012; Published online 09 September 2012
Correspondence to:

The primary and processed data used to generate the analyses presented here can be downloaded by registered users from The Cancer Genome Atlas (https://tcga-data.nci.nih.gov/tcga/tcgaDownload.jsp, https://cghub.ucsc.edu/, and https://tcga-data.nci.nih.gov/docs/publications/lusc_2012/).

Lung squamous cell carcinoma is a common type of lung cancer, causing approximately 400,000 deaths per year worldwide. Genomic alterations in squamous cell lung cancers have not been comprehensively characterized, and no molecularly targeted agents have been specifically developed for its treatment. As part of The Cancer Genome Atlas, here we profile 178 lung squamous cell carcinomas to provide a comprehensive landscape of genomic and epigenomic alterations. We show that the tumour type is characterized by complex genomic alterations, with a mean of 360 exonic mutations, 165 genomic rearrangements, and 323 segments of copy number alteration per tumour. We find statistically recurrent mutations in 11 genes, including mutation of TP53 in nearly all specimens. Previously unreported loss-of-function mutations are seen in the HLA-A class I major histocompatibility gene. Significantly altered pathways included NFE2L2 and KEAP1 in 34%, squamous differentiation genes in 44%, phosphatidylinositol-3-OH kinase pathway genes in 47%, and CDKN2A and RB1 in 72% of tumours. We identified a potential therapeutic target in most tumours, offering new avenues of investigation for the treatment of squamous cell lung cancers.
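The pathway percentages quoted in the abstract (NFE2L2/KEAP1 in 34%, CDKN2A/RB1 in 72%, and so on) are per-tumour tallies: a tumour counts if any gene in the pathway is altered. A hypothetical sketch of that tally; the gene sets and mutation calls below are invented for illustration, not TCGA data:

```python
# Illustrative pathway gene sets (not the paper's curated definitions).
PATHWAYS = {
    "NFE2L2/KEAP1": {"NFE2L2", "KEAP1", "CUL3"},
    "CDKN2A/RB1": {"CDKN2A", "RB1"},
}

def pathway_frequencies(tumour_mutations, pathways=PATHWAYS):
    """Percent of tumours with at least one altered gene per pathway.

    tumour_mutations: dict mapping tumour id -> set of mutated genes.
    """
    n = len(tumour_mutations)
    return {
        name: 100.0 * sum(1 for genes in tumour_mutations.values()
                          if genes & members) / n
        for name, members in pathways.items()
    }

# Invented mutation calls for four tumours.
calls = {
    "T1": {"TP53", "KEAP1"},
    "T2": {"TP53", "CDKN2A"},
    "T3": {"TP53", "NFE2L2", "RB1"},
    "T4": {"TP53"},
}
print(pathway_frequencies(calls))  # each pathway hit in 2 of 4 tumours: 50.0
```

Counting a tumour once per pathway, regardless of how many member genes are hit, is what keeps these percentages bounded by 100.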

