Evolution of the Human Cell Genome Biology Field of Gene Expression, Gene Regulation, Gene Regulatory Networks and Application of Machine Learning Algorithms in Large-Scale Biological Data Analysis

Evolution of the Human Cell Genome Biology Field of Gene Expression, Gene Regulation, Gene Regulatory Networks and Application of Machine Learning Algorithms in Large-Scale Biological Data Analysis

Curator & Reporter: Aviva Lev-Ari, PhD, RN



The Scientific Frontier is presented in Deciphering eukaryotic gene-regulatory logic with 100 million random promoters

Boer, C.G., Vaishnav, E.D., Sadeh, R. et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promotersNat Biotechnol (2019) doi:10.1038/s41587-019-0315-8


How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation.

The Evolution of the Human Cell Genome Biology Field of Gene Expression, Gene Regulation, Gene Regulatory Networks and Application of Machine Learning Algorithms in Large-Scale Biological Data Analysis is presented in the following Table


50 Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034 e1026 (2019).
5 Muerdter, F. et al. Resolving systematic errors in widely used enhancer activity assays in human cells. Nat. Methods 15, 141–149 (2018).
6 Wang, X. et al. High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human. Nat. Commun. 9, 5380 (2018).
15 Yona, A. H., Alm, E. J. & Gore, J. Random sequences rapidly evolve into de novo promoters. Nat. Commun. 9, 1530 (2018).
4 van Arensbergen, J. et al. Genome-wide mapping of autonomous promoter activity in human cells. Nat. Biotechnol. 35, 145–153 (2017).
14 Cuperus, J. T. et al. Deep learning of the regulatory grammar of yeast 5’ untranslated regions from 500,000 random sequences. Genome Res. 27, 2015–2024 (2017).
31 Levo, M. et al. Systematic investigation of transcription factor activity in the context of chromatin using massively parallel binding and expression assays. Mol. Cell 65, 604–617 e606 (2017).
49 Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
54 de Boer, C. High-efficiency S. cerevisiae lithium acetate transformation. protocols.io https://doi.org/10.17504/protocols.io.j4tcqwn (2017).
59 Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous systems. arXiv 1603.04467 (2016).
20 Shalem, O. et al. Systematic dissection of the sequence determinants of gene 3’ end mediated expression control. PLoS Genet. 11, e1005147 (2015).
55 Deng, C., Daley, T. & Smith, A. D. Applications of species accumulation curves in large-scale biological data analysis. Quant. Biol. 3, 135–144 (2015).
9 Hughes, T. R. & de Boer, C. G. Mapping yeast transcriptional networks. Genetics 195, 9–36 (2013).
10 Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).
19 Kosuri, S. et al. Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl Acad. Sci. USA 110, 14024–14029 (2013).
7 Sharon, E. et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat. Biotechnol. 30, 521–530 (2012).
18 de Boer, C. G. & Hughes, T. R. YeTFaSCo: a database of evaluated yeast transcription factor sequence specificities. Nucleic Acids Res. 40, D169–D179 (2012).
56 Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
61 Cherry, J. M. et al. Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res. 40, D700–D705 (2012).
11 Nutiu, R. et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol. 29, 659–664 (2011).
26 Zhang, Z. et al. A packing mechanism for nucleosome organization reconstituted across a eukaryotic genome. Science 332, 977–980 (2011).
30 Ganapathi, M. et al. Extensive role of the general regulatory factors, Abf1 and Rap1, in determining genome-wide chromatin structure in budding yeast. Nucleic Acids Res. 39, 2032–2044 (2011).
52 Erb, I. & van Nimwegen, E. Transcription factor binding site positioning in yeast: proximal promoter motifs characterize TATA-less promoters. PloS One 6, e24279 (2011).
3 Kinney, J. B., Murugan, A., Callan, C. G. Jr. & Cox, E. C. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl Acad. Sci. USA107, 9158–9163 (2010).
8 Gertz, J., Siggia, E. D. & Cohen, B. A. Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 457, 215–218 (2009).
16 Wunderlich, Z. & Mirny, L. A. Different gene regulation strategies revealed by analysis of binding motifs. Trends Genet. 25, 434–440 (2009).
27 Hesselberth, J. R. et al. Global mapping of protein–DNA interactions in vivo by digital genomic footprinting. Nat. Methods 6, 283–289 (2009).
29 Hartley, P. D. & Madhani, H. D. Mechanisms that specify promoter nucleosome location and identity. Cell 137, 445–458 (2009).
51 Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
58 Segal, E. & Widom, J. From DNA sequence to transcriptional behaviour: a quantitative approach. Nat. Rev. Genet. 10, 443–456 (2009).
2 Yuan, Y., Guo, L., Shen, L. & Liu, J. S. Predicting gene expression from sequence: a reexamination. PLoS Comput. Biol. 3, e243 (2007).
46 Hibbs, M. A. et al. Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics 23, 2692–2699 (2007).
25 Liu, X., Lee, C. K., Granek, J. A., Clarke, N. D. & Lieb, J. D. Whole-genome comparison of Leu3 binding in vitro and in vivo reveals the importance of nucleosome occupancy in target site selection. Genome Res. 16, 1517–1528 (2006).
34 Roberts, G. G. & Hudson, A. P. Transcriptome profiling of Saccharomyces cerevisiae during a transition from fermentative to glycerol-based respiratory growth reveals extensive metabolic and structural remodeling. Mol. Genet. Genomics 276, 170–186 (2006).
48 Tanay, A. Extensive low-affinity transcriptional interactions in the yeast genome. Gen. Res. 16, 962–972 (2006).
53 Tong, A. H. & Boone, C. Synthetic genetic array analysis in Saccharomyces cerevisiae. Methods Mol. Biol. 313, 171–192 (2006).
57 Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
62 Chua, G. et al. Identifying transcription factor functions and targets by phenotypic activation. Proc. Natl Acad. Sci. USA 103, 12045–12050 (2006).
17 Arnosti, D. N. & Kulkarni, M. M. Transcriptional enhancers: intelligent enhanceosomes or flexible billboards? J. Cell. Biochem. 94, 890–898 (2005).
21 Granek, J. A. & Clarke, N. D. Explicit equilibrium modeling of transcription-factor binding and gene regulation. Genome Biol. 6, R87 (2005).
1 Beer, M. A. & Tavazoie, S. Predicting gene expression from sequence. Cell 117, 185–198 (2004).
28 Bernstein, B. E., Liu, C. L., Humphrey, E. L., Perlstein, E. O. & Schreiber, S. L. Global nucleosome occupancy in yeast. Genome Biol. 5, R62 (2004).
44 Kim, T. S., Kim, H. Y., Yoon, J. H. & Kang, H. S. Recruitment of the Swi/Snf complex by Ste12-Tec1 promotes Flo8-Mss11-mediated activation of STA1 expression. Mol. Cell. Biol. 24, 9542–9556 (2004).
45 Harbison, C. T. et al. Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104 (2004).
60 Kent, N. A., Eibert, S. M. & Mellor, J. Cbf1p is required for chromatin remodeling at promoter-proximal CACGTG motifs in yeast. J. Biol. Chem. 279, 27116–27123 (2004).
22 Kulkarni, M. M. & Arnosti, D. N. Information display by transcriptional enhancers. Development 130, 6569–6575 (2003).
24 Conlon, E. M., Liu, X. S., Lieb, J. D. & Liu, J. S. Integrating regulatory motif discovery and genome-wide expression analysis. Proc. Natl Acad. Sci. USA 100, 3339–3344 (2003).
43 Neely, K. E., Hassan, A. H., Brown, C. E., Howe, L. & Workman, J. L. Transcription activator interactions with multiple SWI/SNF subunits. Mol. Cell. Biol. 22, 1615–1625 (2002).
23 Bussemaker, H. J., Li, H. & Siggia, E. D. Regulatory element detection using correlation with expression. Nat. Genet. 27, 167–171 (2001).
37 Haurie, V. et al. The transcriptional activator Cat8p provides a major contribution to the reprogramming of carbon metabolism during the diauxic shift in Saccharomyces cerevisiae. J. Biol. Chem. 276, 76–85 (2001).
39 Grauslund, M. & Ronnow, B. Carbon source-dependent transcriptional regulation of the mitochondrial glycerol-3-phosphate dehydrogenase gene, GUT2, from Saccharomyces cerevisiae. Can. J. Microbiol. 46, 1096–1100 (2000).
42 Cullen, P. J. & Sprague, G. F. Jr. Glucose depletion causes haploid invasive growth in yeast. Proc. Natl Acad. Sci. USA 97, 13619–13624 (2000).
38 Sato, T. et al. TheE-box DNA binding protein Sgc1p suppresses the gcr2 mutation, which is involved in transcriptional activation of glycolytic genes in Saccharomyces cerevisiae. FEBS Lett. 463, 307–311 (1999).
40 Madhani, H. D. & Fink, G. R. Combinatorial control required for the specificity of yeast MAPK signaling. Science 275, 1314–1317 (1997).
41 Gavrias, V., Andrianopoulos, A., Gimeno, C. J. & Timberlake, W. E. Saccharomyces cerevisiae TEC1 is required for pseudohyphal growth. Mol. Microbiol. 19, 1255–1263 (1996).
36 Hedges, D., Proft, M. & Entian, K. D. CAT8, a new zinc cluster-encoding gene necessary for derepression of gluconeogenic enzymes in the yeast Saccharomyces cerevisiae. Mol. Cell. Biol. 15, 1915–1922 (1995).
47 Bednar, J. et al. Determination of DNA persistence length by cryo-electron microscopy. Separation of the static and dynamic contributions to the apparent persistence length of DNA. J. Mol. Biol. 254, 579–594 (1995).
32 Axelrod, J. D., Reagan, M. S. & Majors, J. GAL4 disrupts a repressing nucleosome during activation of GAL1 transcription in vivo. Genes Dev. 7, 857–869 (1993).
33 Morse, R. H. Nucleosome disruption by transcription factor binding in yeast. Science 262, 1563–1566 (1993).
12 Oliphant, A. R., Brandl, C. J. & Struhl, K. Defining the sequence specificity of DNA-binding proteins by selecting binding sites from random-sequence oligonucleotides: analysis of yeast GCN4 protein. Mol. Cell. Biol. 9, 2944–2949 (1989).
35 Forsburg, S. L. & Guarente, L. Identification and characterization of HAP4: a third component of the CCAAT-bound HAP2/HAP3 heteromer. Genes Dev. 3, 1166–1178 (1989).
13 Horwitz, M. S. & Loeb, L. A. Promoters selected from random DNA sequences. Proc. Natl Acad. Sci. USA 83, 7405–7409 (1986).


To access each reference as a live link, go to the number in the first column in the Table and look it up in the List of References in the Link, below


Author information

C.G.D. and A.R. drafted the manuscript, with all authors contributing. C.G.D. analyzed the data. C.G.D., E.D.V., E.L.A. and R.S. performed the experiments. A.R. and N.F. supervised the research.

Correspondence to Carl G. de Boer or Aviv Regev.

Ethics declarations

Competing interests

A.R. is an SAB member of Thermo Fisher Scientific, Neogene Therapeutics, Asimov, and Syros Pharmaceuticals, an equity holder of Immunitas, and a founder of and equity holder in Celsius Therapeutics. All other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Boer, C.G., Vaishnav, E.D., Sadeh, R. et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol (2019) doi:10.1038/s41587-019-0315-8

Download citation

Comments RSS

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: