Reporter and Curator: Dr. Sudipta Saha, Ph.D.
Negative selection was examined using two measures that highlight different periods of selection in the human genome. The first measure, inter-species, pan-mammalian constraint (GERP-based scores; 24 mammals) addresses selection during mammalian evolution. The second measure is intra-species constraint estimated from the numbers of variants discovered in human populations using data from the 1000 Genomes project and covers selection over human evolution.
For DNaseI elements and bound motifs, most sets of elements show enrichment in pan-mammalian constraint and decreased human population diversity, though for some cell types the DNaseI sites do not appear overall to be subject to pan-mammalian constraint. Bound TF motifs have a natural control in the set of TF motifs with equal sequence potential for binding but without binding evidence from ChIP-seq experiments; in all cases, the bound motifs showed both more mammalian constraint and higher suppression of human diversity.
Consistent with previous findings, genome-wide evidence was not observed for pan-mammalian selection of novel RNA sequences. There are also a large number of elements without mammalian constraint, between 17% and 90% for TF-binding regions as well as DHSs and FAIRE regions. Previous studies could not determine whether these sequences are biochemically active but with little overall impact on the organism, or are under lineage-specific selection. This issue was examined specifically by isolating sequences preferentially inserted into the primate lineage, which is only feasible given the genome-wide scale of this data. The majority of primate-specific sequence is due to retrotransposon activity, but an appreciable proportion is non-repetitive primate-specific sequence. Of 104,343,413 primate-specific bases (excluding repetitive elements), 67,769,372 (65%) are found within ENCODE-identified elements. Examination of 227,688 variants segregating in these primate-specific regions revealed that all classes of elements (RNA and regulatory) show depressed derived allele frequencies, consistent with recent negative selection occurring in at least some of these regions. This suggests that an appreciable proportion of the unconstrained elements are lineage-specific elements required for organismal function, consistent with long-standing views of recent evolution, and that the remainder are likely to be “neutral” elements which are not currently under selection, but may still affect cellular or larger-scale phenotypes without an effect on fitness.
The binding patterns of TFs are not uniform, and both inter- and intra-species measures of negative selection can be correlated with the overall information content of motif positions. The selection on some motif positions is as high as on protein-coding exons. These aggregate measures across motifs show that the binding preferences found in the population of sites are also relevant to the per-site behavior. By developing a per-site metric of population effect on bound motifs, it was found that highly constrained bound instances across mammals are able to buffer the impact of individual variation.
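To make "information content of motif positions" concrete, the sketch below computes the per-position information content (in bits) of a motif. It assumes a uniform background of 0.25 per base, and the three-position motif is a made-up example, not one of the motifs analysed in the study.

```python
import math

def information_content(column, background=0.25):
    """Information content (bits) of one motif position.

    `column` holds the probabilities of A, C, G, T at that position.
    Assumption: a uniform background frequency for all four bases.
    """
    # Relative entropy against the background; zero-probability bases add nothing.
    return sum(p * math.log2(p / background) for p in column if p > 0)

# Hypothetical motif: one row per position, columns are P(A), P(C), P(G), P(T).
motif = [
    [1.0, 0.0, 0.0, 0.0],      # fully specified position
    [0.5, 0.5, 0.0, 0.0],      # two equally likely bases
    [0.25, 0.25, 0.25, 0.25],  # uninformative position
]

ic_per_position = [information_content(col) for col in motif]
print(ic_per_position)
```

Positions with high information content tolerate few substitutions, which is why they are the natural candidates for stronger constraint in both the mammalian and the human-population measures.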
It was proposed to express the deleterious effect of TFBS mutations in terms of mutational load, a known population genetics metric that combines the frequency of mutation with predicted phenotypic consequences that it causes. This metric was adapted to use the reduction in PWM score associated with a mutation as a crude but computable measure of such phenotypic consequences. It was not assumed that TFBS load at a given site reduces an individual’s biological fitness. Rather, it was argued that binding sites that tolerate a higher load are less functionally constrained. This approach, although undoubtedly a crude one, makes it possible to consistently estimate TFBS constraints for different TFs and even different organisms and ask why TFBS mutations are tolerated differently in different contexts.
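The text does not give the formula for motif load, so the following is only a minimal sketch under stated assumptions. It borrows the classical genetic load form, (w_max - w_mean) / w_max, and substitutes a log-odds PWM score for fitness. The function names, the toy PWM, and the allele frequencies are all hypothetical.

```python
import math

BASES = "ACGT"

def pwm_score(seq, pwm, background=0.25):
    """Log-odds score of `seq` under a position probability matrix.

    Assumption: all matrix entries are non-zero (e.g. pseudocounts applied).
    """
    return sum(math.log2(pwm[i][BASES.index(b)] / background)
               for i, b in enumerate(seq))

def motif_load(alleles, pwm):
    """Load of one binding site.

    `alleles` is a list of (sequence, population frequency) pairs whose
    frequencies sum to 1. Assumed form: (w_max - w_mean) / w_max, where w is
    the PWM score; this requires the best allele to score above zero.
    """
    scored = [(pwm_score(seq, pwm), freq) for seq, freq in alleles]
    w_max = max(w for w, _ in scored)
    w_mean = sum(w * freq for w, freq in scored)
    return (w_max - w_mean) / w_max

# Hypothetical 3-bp motif preferring "ACG".
toy_pwm = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]

# A site where 10% of chromosomes carry a score-reducing G>T change.
polymorphic_load = motif_load([("ACG", 0.9), ("ACT", 0.1)], toy_pwm)
# A monomorphic site carries no load under this definition.
monomorphic_load = motif_load([("ACG", 1.0)], toy_pwm)
print(polymorphic_load, monomorphic_load)
```

Under this form, load grows with both the frequency of the weaker allele and the size of its PWM score drop, which matches the description of a metric that combines mutation frequency with predicted consequences.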
It was first asked whether motif load would be able to detect the expected link between evolutionary and individual variation. A published metric, Branch Length Score (BLS), was used to characterise the evolutionary conservation of a motif instance. This metric utilises a PWM-based model of the conservation of bases and also allows for motif movement. Reassuringly, mutational load correlated with BLS in both species (human and Drosophila), with evolutionarily non-conserved motifs (BLS=0) showing by far the highest degree of variation in the population. At the same time, ∼40% of human and fly TFBSs with an appreciable load (L>5e-3) still mapped to reasonably conserved sites (BLS>0.2, ∼50th percentile in both organisms), demonstrating that score-reducing mutations at evolutionarily preserved sequences can be tolerated in these populations.
Using this metric, the original findings were confirmed, suggesting that TFBSs with higher PWM scores are generally more functionally constrained compared to ‘weaker’ sites. The fraction of detected sites mapping to bound regions remained similar across the whole analysed score range, suggesting that this relationship is unlikely to be an artefact of higher false-positive rates at ‘weaker’ sites. This global observation, however, does not rule out the possibility that a weaker match at some sites is specifically preserved to ensure dose-specific TF binding. This may be the case, for example, for Drosophila Bric-à-brac motifs, which exhibited no correlation between motif load and PWM score, consistent with the known dosage-dependent function of Bric-à-brac in embryo patterning.
Motif load was used to address whether TFBSs proximal to transcription start sites (TSS) are more constrained compared to more distant regulatory regions. This was found to be the case in the human, but not in Drosophila. CTCF binding sites in both species were a notable exception, tolerating the lowest mutational load at locations 500bp-1kb from TSS, but not closer to the TSS, suggesting that the putative role of CTCF in establishing chromatin domains is particularly important in proximity of gene promoters.
To gain further insight into the functional effects of TFBS mutations, a dataset was used that mapped human CTCF binding sites across four individuals. TFBS mutations detected in this dataset often did not result in a significant loss of binding, with ∼75% of mutated sites retaining at least two thirds of the binding signal. This was particularly prominent at conserved sites (BLS>0.5), 90% of which showed this ‘buffering’ effect. To address whether buffering could be explained solely by the flexibility of CTCF sequence preferences, between-allele differences in the PWM score at polymorphic binding sites were analysed. As expected, CTCF binding signal correlated globally with the PWM score of the underlying motifs. Consistent with this, alleles with minor differences in PWM match generally had little effect on the binding signal compared to sites with larger PWM score changes, suggesting that the PWM model adequately describes the functional constraints of CTCF binding sites. At the same time, it was found that CTCF binding signals could be maintained even in those cases where mutations resulted in significant changes of PWM score, particularly at evolutionarily conserved sites. A linear interaction model confirmed that the effect of motif mutations on CTCF binding was significantly reduced with increasing conservation. These effects were not due to the presence of additional CTCF motifs (as 96% of bound regions contained only a single motif), while differences between more and less conserved sites could not be explained away by differences in the PWM scores of their major alleles. A CTCF dataset from three additional individuals generated by a different laboratory yielded consistent conclusions, suggesting that the observations were not due to over-fitting.
Taken together, CTCF binding data for multiple individuals show that mutations can be buffered to maintain the levels of binding signal, particularly at highly conserved sites, and this effect cannot be explained solely by the flexibility of CTCF’s sequence consensus. It was asked whether mechanisms potentially accountable for such buffering would also affect the relationship between sequence and binding in the absence of mutations. Training a linear interaction model across the whole set of mapped CTCF binding sites revealed that conservation consistently weakens the relationship between PWM score and binding intensity. Thus, CTCF binding to evolutionarily conserved sites may generally have a reduced dependence on sequence.
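A linear interaction model of this kind can be sketched as binding = b0 + b1·PWM + b2·conservation + b3·PWM·conservation, where a negative b3 means that conservation weakens the dependence of binding on PWM score. The exact model specification used in the study is not given here, so this form is an assumption, and the data below are synthetic and noise-free, chosen only to show how the interaction term is read.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_interaction(pwm, cons, binding):
    """Ordinary least squares fit; returns [b0, b1, b2, b3]."""
    rows = [[1.0, p, c, p * c] for p, c in zip(pwm, cons)]
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, binding)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic sites on a small grid of PWM scores and conservation values (hypothetical).
pwm_scores = [p for p in (2.0, 4.0, 6.0, 8.0) for _ in range(3)]
conservation = [c for _ in range(4) for c in (0.0, 0.5, 1.0)]
binding = [1.0 + 2.0 * p + 0.5 * c - 1.5 * p * c
           for p, c in zip(pwm_scores, conservation)]

b0, b1, b2, b3 = fit_interaction(pwm_scores, conservation, binding)

# Slope of binding on PWM score at unconserved vs fully conserved sites.
slope_unconserved = b1 + b3 * 0.0
slope_conserved = b1 + b3 * 1.0
print(slope_unconserved, slope_conserved)
```

In this toy fit the slope of binding on PWM score is smaller at conserved sites than at unconserved ones, which is the pattern the text describes as a reduced dependence on sequence. With real data one would also want standard errors for b3 before drawing that conclusion.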
Source References:
http://www.nature.com/encode/threads/impact-of-evolutionary-selection-on-functional-regions
Nice job.
Just finished reading the post.
Dr. Saha, do you see a connection between this field of genomics and the development of applications to pharmaceutics and then implementation in medicine? To answer this, I believe one needs to review the resources in use for genomics research and envision their future potential. Let’s try to do that for Reproductive Medicine.
I thank you for embarking on the new path of Genomics and Reproduction.
[…] Impact of evolutionary selection on functional regions: The imprint of evolutionary selection on ENC… […]
PUT IT IN CONTEXT OF CANCER CELL MOVEMENT
The contraction of skeletal muscle is triggered by nerve impulses, which stimulate the release of Ca2+ from the sarcoplasmic reticulum, a specialized network of internal membranes, similar to the endoplasmic reticulum, that stores high concentrations of Ca2+ ions. The release of Ca2+ from the sarcoplasmic reticulum increases the concentration of Ca2+ in the cytosol from approximately 10⁻⁷ to 10⁻⁵ M. The increased Ca2+ concentration signals muscle contraction via the action of two accessory proteins bound to the actin filaments: tropomyosin and troponin (Figure 11.25). Tropomyosin is a fibrous protein that binds lengthwise along the groove of actin filaments. In striated muscle, each tropomyosin molecule is bound to troponin, which is a complex of three polypeptides: troponin C (Ca2+-binding), troponin I (inhibitory), and troponin T (tropomyosin-binding). When the concentration of Ca2+ is low, the complex of the troponins with tropomyosin blocks the interaction of actin and myosin, so the muscle does not contract. At high concentrations, Ca2+ binding to troponin C shifts the position of the complex, relieving this inhibition and allowing contraction to proceed.
Figure 11.25
Association of tropomyosin and troponins with actin filaments. (A) Tropomyosin binds lengthwise along actin filaments and, in striated muscle, is associated with a complex of three troponins: troponin I (TnI), troponin C (TnC), and troponin T (TnT).
Contractile Assemblies of Actin and Myosin in Nonmuscle Cells
Contractile assemblies of actin and myosin, resembling small-scale versions of muscle fibers, are present also in nonmuscle cells. As in muscle, the actin filaments in these contractile assemblies are interdigitated with bipolar filaments of myosin II, consisting of 15 to 20 myosin II molecules, which produce contraction by sliding the actin filaments relative to one another (Figure 11.26). The actin filaments in contractile bundles in nonmuscle cells are also associated with tropomyosin, which facilitates their interaction with myosin II, probably by competing with filamin for binding sites on actin.
Figure 11.26
Contractile assemblies in nonmuscle cells. Bipolar filaments of myosin II produce contraction by sliding actin filaments in opposite directions.
Two examples of contractile assemblies in nonmuscle cells, stress fibers and adhesion belts, were discussed earlier with respect to attachment of the actin cytoskeleton to regions of cell-substrate and cell-cell contacts (see Figures 11.13 and 11.14). The contraction of stress fibers produces tension across the cell, allowing the cell to pull on a substrate (e.g., the extracellular matrix) to which it is anchored. The contraction of adhesion belts alters the shape of epithelial cell sheets: a process that is particularly important during embryonic development, when sheets of epithelial cells fold into structures such as tubes.
The most dramatic example of actin-myosin contraction in nonmuscle cells, however, is provided by cytokinesis, the division of a cell into two following mitosis (Figure 11.27). Toward the end of mitosis in animal cells, a contractile ring consisting of actin filaments and myosin II assembles just underneath the plasma membrane. Its contraction pulls the plasma membrane progressively inward, constricting the center of the cell and pinching it in two. Interestingly, the thickness of the contractile ring remains constant as it contracts, implying that actin filaments disassemble as contraction proceeds. The ring then disperses completely following cell division.
Figure 11.27
Cytokinesis. Following completion of mitosis (nuclear division), a contractile ring consisting of actin filaments and myosin II divides the cell in two.
http://www.ncbi.nlm.nih.gov/books/NBK9961/
This is good. I don’t recall seeing it in the original comment. I am very aware of the actin myosin troponin connection in heart and in skeletal muscle, and I did know about the nonmuscle work. I won’t deal with it now, and I have been working with Aviral now online for 2 hours.
I have had a considerable background from way back in atomic orbital theory, physical chemistry, organic chemistry, and the equilibrium necessary for cations and anions. Despite the calcium role in contraction, I would not discount hypomagnesemia in having a disease role because of the intracellular-extracellular connection. The description you pasted reminds me also of a lecture given a few years ago by the Nobel Laureate that year on the mechanism of cell division.