Gene Expression: Algorithms for Protein Dynamics

October 26, 2013 by 2012pharmaceutical

Reporter: Aviva Lev-Ari, PhD, RN

Stanford-developed algorithm reveals complex protein dynamics behind gene expression

BY KRISTA CONGER

In yet another coup for a research concept known as “big data,” researchers at the Stanford University School of Medicine have developed a computerized algorithm to understand the complex and rapid choreography of hundreds of proteins that interact in mind-boggling combinations to govern how genes are flipped on and off within a cell.

To do so, they coupled findings from 238 DNA-protein-binding experiments performed by the ENCODE project — a massive, multiyear international effort to identify the functional elements of the human genome — with a laboratory-based technique to identify binding patterns among the proteins themselves.

The analysis is sensitive enough to have identified many previously unsuspected, multipartner trysts. It can also be performed quickly and repeatedly to track how a cell responds to environmental changes or crucial developmental signals.

“At a very basic level, we are learning who likes to work with whom to regulate around 20,000 human genes,” said Michael Snyder, PhD, professor and chair of genetics at Stanford. “If you had to look through all possible interactions pair-wise, it would be ridiculously impossible. Here we can look at thousands of combinations in an unbiased manner and pull out important and powerful information. It gives us an unprecedented level of understanding.”

Snyder is the senior author of a paper describing the research published Oct. 24 in Cell. The lead authors are postdoctoral scholars Dan Xie, PhD, Alan Boyle, PhD, and Linfeng Wu, PhD.

Proteins control gene expression either by binding to specific regions of DNA or by interacting with other DNA-bound proteins to modulate their function. Previously, researchers could analyze only two to three proteins and DNA sequences at a time, and were unable to see the true complexities of the interactions among proteins and DNA that occur in living cells.

The challenge resembled trying to figure out interactions in a crowded mosh pit by studying a few waltzing couples in an otherwise empty ballroom, and it has severely limited what could be learned about the dynamics of gene expression.

The ENCODE (Encyclopedia of DNA Elements) project was a five-year collaboration of more than 440 scientists in 32 labs around the world to reveal the complex interplay among regulatory regions, proteins and RNA molecules that governs when and how genes are expressed. For the last eight years, the project has been generating a treasure trove of data for researchers to analyze.

In this study, the researchers combined data from genomics (a field devoted to the study of genes) and proteomics (which focuses on proteins and their interactions). They studied 128 proteins, called trans-acting factors, which are known to regulate gene expression by binding to regulatory regions within the genome. Some of the regions control the expression of nearby genes; others affect the expression of genes great distances away.

The researchers used 238 data sets generated by the ENCODE project to study the specific DNA sequences bound by each of the 128 trans-acting factors. But these factors aren’t monogamous; they bind many different sequences in a variety of protein-DNA combinations. Xie, Boyle and Snyder designed a machine-learning algorithm to analyze all the data and identify which trans-acting factors tend to be seen together and which DNA sequences they prefer.
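
The idea of finding factors that "tend to be seen together" can be illustrated with a simple co-occurrence count. The sketch below is not the machine-learning algorithm from the study; it is a minimal illustration that assumes each genomic region is represented as the set of factors bound there. The region names, the bindings and the placeholder factor `FACTOR_X` are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each region maps to the set of factors observed there.
# Region names, bindings and FACTOR_X are invented for illustration only.
regions = {
    "region_1": {"FOS", "JUN", "FACTOR_X"},
    "region_2": {"FOS", "JUN"},
    "region_3": {"FOS", "FACTOR_X"},
    "region_4": {"JUN"},
}


def pair_counts(region_to_factors):
    """Count how many regions each unordered pair of factors shares."""
    counts = Counter()
    for factors in region_to_factors.values():
        # Sorting gives each pair a single canonical (alphabetical) key.
        for pair in combinations(sorted(factors), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(regions)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts are candidates for factors that work together. A count alone cannot say whether two factors touch each other or merely prefer neighboring DNA sites, which is the question the immunoprecipitation experiments address.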

Wu then performed immunoprecipitation experiments, which use antibodies to identify protein interactions in the cell nucleus. In this way, they were able to tell which proteins interacted directly with one another, and which were seen together because their preferred DNA binding sites were adjoining.

“Before our work, only the combination of two or three regulatory proteins were studied, which oversimplified how gene regulators collaborate to find their targets,” Xie said. “With our method we are able to study the combination of more than 100 regulators and see a much more complex structure of collaboration. For example, it had been believed that a key regulator of cell proliferation called FOS typically only works with JUN protein family members. We show, in addition to JUN, FOS has different partners under different circumstances. In fact, we found almost all the canonical combinations of two or three trans-acting factors have many more partners than we previously thought.”
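
To give a sense of the scale behind Xie's comment, the short calculation below counts how many distinct groups of a given size can be drawn from the 128 factors in the study. It is plain arithmetic and says nothing about which combinations actually occur in cells; the function name `subset_counts` is introduced here for illustration.

```python
from math import comb

N_FACTORS = 128  # number of trans-acting factors studied, per the article


def subset_counts(n, sizes):
    """Return {k: number of distinct k-member groups drawn from n factors}."""
    return {k: comb(n, k) for k in sizes}


group_counts = subset_counts(N_FACTORS, (2, 3, 4, 5))
for size, count in group_counts.items():
    print(f"groups of {size}: {count:,}")
```

Pairs alone number 8,128 and triples 341,376, and the total keeps growing steeply with group size, which is why testing each combination one at a time is impractical.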

To broaden their analysis, the researchers included data from other sources that explored protein-binding patterns in five cell types. They found that patterns of co-localization among proteins, in which several proteins are found clustered closely on the DNA to govern gene expression, vary according to cell type and the conditions under which the cells are grown. They also found that many of these clusters can be explained through interactions among proteins, and that not every protein bound to DNA directly.

“We’d like to understand how these interactions work together to make different cell types and how they gain their unique identities in development,” Snyder said. “Furthermore, diseased cells will have a very different type of wiring diagram. We hope to understand how these cells go astray.”

Other Stanford co-authors include life science research assistant Jie Zhai and life science research associate Trupti Kawli, PhD.

The research was supported by the National Human Genome Research Institute (grants U54HG004558 and U54HG006996).

Information about Stanford’s Department of Genetics, which also supported the work, is available at http://genetics.stanford.edu.

PRINT MEDIA CONTACT: Krista Conger | Tel (650) 725-5371; kristac@stanford.edu

BROADCAST MEDIA CONTACT: M.A. Malone | Tel (650) 723-6912; mamalone@stanford.edu

Stanford Medicine integrates research, medical education and patient care at its three institutions – Stanford University School of Medicine, Stanford Hospital & Clinics and Lucile Packard Children’s Hospital. For more information, please visit the Office of Communication & Public Affairs site at http://mednews.stanford.edu/.

http://med.stanford.edu/ism/2013/october/snyder.html

Dynamic trans-Acting Factor Colocalization in Human Cells

Cell, Volume 155, Issue 3, 713-724, 24 October 2013
Copyright © 2013 Elsevier Inc. All rights reserved.
10.1016/j.cell.2013.09.043

Authors

Dan Xie, Alan P. Boyle, Linfeng Wu, Jie Zhai, Trupti Kawli, Michael Snyder

Highlights
Colocalization patterns of 128 TFs in human cells
An application of SOMs to study high-dimensional TF colocalization patterns
Colocalization patterns are dynamic through stimulation and across cell types
Many TF colocalizations can be explained by protein-protein interaction
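
The highlights mention self-organizing maps (SOMs), which place similar high-dimensional patterns on nearby nodes of a small grid. The sketch below is a generic, minimal SOM in plain Python, not the implementation used in the paper. It assumes each region is encoded as a 0/1 vector with one entry per factor; the grid size, learning rate, neighborhood width and toy data are all arbitrary choices made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows=3, cols=3, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols SOM; hyperparameters here are illustrative guesses."""
    rng = random.Random(seed)
    n_features = len(data[0])
    weights = [[[rng.random() for _ in range(n_features)]
                for _ in range(cols)] for _ in range(rows)]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 1e-3     # neighborhood shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    influence = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    # Pull this node toward x, more strongly near the winner.
                    for k in range(n_features):
                        w[k] += lr * influence * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical toy input: rows are regions, columns are four unnamed factors.
patterns = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
som = train_som(patterns)
for p in patterns:
    print(p, "->", best_matching_unit(som, p))
```

After training, regions with the same binding pattern land on the same node, and each node's weight vector summarizes the colocalization pattern it represents.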

Summary

Different trans-acting factors (TFs) collaborate and act in concert at distinct loci to perform accurate regulation of their target genes. To date, the cobinding of TF pairs has been investigated in a limited context both in terms of the number of factors within a cell type and across cell types and the extent of combinatorial colocalizations. Here, we use an approach to analyze TF colocalization within a cell type and across multiple cell lines at an unprecedented level. We extend this approach with large-scale mass spectrometry analysis of immunoprecipitations of 50 TFs. Our combined approach reveals large numbers of interesting TF-TF associations. We observe extensive change in TF colocalizations both within a cell type exposed to different conditions and across multiple cell types. We show distinct functional annotations and properties of different TF cobinding patterns and provide insights into the complex regulatory landscape of the cell.

http://www.cell.com/abstract/S0092-8674%2813%2901217-8#!

Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes

Authors

Department of Genetics, Stanford University School of Medicine, Stanford University, Stanford, CA 94305, USA
Division of Systems Medicine and Division of Immunology and Allergy, Department of Pediatrics, Stanford University, Stanford, CA 94305, USA
Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University, Stanford, CA 94305, USA
Division of Hematology, Department of Medicine, Stanford University, Stanford, CA 94305, USA

Personalized medicine aims to assess medical risks, monitor, diagnose and treat patients according to their specific genetic composition and molecular phenotype. The advent of genome sequencing and the analysis of physiological states has proven to be powerful (Cancer Genome Atlas Research Network, 2011). However, its implementation for the analysis of otherwise healthy individuals for estimation of disease risk and medical interpretation is less clear. Much of the genome is difficult to interpret and many complex diseases, such as diabetes, neurological disorders and cancer, likely involve a large number of different genes and biological pathways (Ashley et al., 2010; Grayson et al., 2011; Li et al., 2011), as well as environmental contributors that can be difficult to assess. As such, the combination of genomic information along with a detailed molecular analysis of samples will be important for predicting, diagnosing and treating diseases as well as for understanding the onset, progression, and prevalence of disease states (Snyder et al., 2009).

Presently, healthy and diseased states are typically followed using a limited number of assays that analyze a small number of markers of distinct types. With the advancement of many new technologies, it is now possible to analyze upward of 10⁵ molecular constituents. For example, DNA microarrays have allowed the subcategorization of lymphomas and gliomas (Mischel et al., 2003), and RNA sequencing (RNA-Seq) has identified breast cancer transcript isoforms (Li et al., 2011; van der Werf et al., 2007; Wu et al., 2010; Lapuk et al., 2010). Although transcriptome and RNA splicing profiling are powerful and convenient, they provide a partial portrait of an organism’s physiological state. Transcriptomic data, when combined with genomic, proteomic, and metabolomic data, are expected to provide a much deeper understanding of normal and diseased states (Snyder et al., 2010). To date, comprehensive integrative omics profiles have been limited and have not been applied to the analysis of generally healthy individuals.

To obtain a better understanding of: (1) how to generate an integrative personal omics profile (iPOP) and examine as many biological components as possible, (2) how these components change during healthy and diseased states, and (3) how this information can be combined with genomic information to estimate disease risk and gain new insights into diseased states, we performed extensive omics profiling of blood components from a generally healthy individual over a 14-month period (24 months total when including time points with other molecular analyses). We determined the whole-genome sequence (WGS) of the subject, and together with transcriptomic, proteomic, metabolomic, and autoantibody profiles, used this information to generate an iPOP. We analyzed the iPOP of the individual over the course of healthy states and two viral infections (Figure 1A). Our results indicate that disease risk can be estimated by a whole-genome sequence, and by regularly monitoring health states with iPOP, disease onset may also be observed. The wealth of information provided by detailed longitudinal iPOP revealed unexpected molecular complexity, which exhibited dynamic changes during healthy and diseased states, and provided insight into multiple biological processes. Detailed omics profiling coupled with genome sequencing can provide molecular and physiological information of medical significance. This approach can be generalized for personalized health monitoring and medicine.

Figure 1. Summary of Study

http://www.cell.com/abstract/S0092-8674%2812%2900166-3#Introduction
