Curators: Larry H. Bernstein, MD, FCAP and Aviva Lev-Ari, PhD, RN
UPDATED on 5/14/2021
Original Investigations
Ishani Ganguli, Jinghan Cui, Nitya Thakore, John Orav, James L. Januzzi, Christopher W. Baugh, Thomas D. Sequist, and
J Am Coll Cardiol. May 3, 2021. Epub ahead of print. DOI: 10.1016/j.jacc.2021.04.049
Editorial Comment: Downstream consequences of implementing high-sensitivity cardiac troponin: why indication and education matter
Abstract
Background
Chest pain patients are often evaluated for acute myocardial infarction through troponin testing, which may prompt downstream services (cascades) of uncertain value.
Objective
Determine the association of high-sensitivity cardiac troponin (hs-cTn) assay implementation with cascade events.
Methods
Using electronic health record and billing data, we examined patient-visits to five emergency departments, April 1, 2017 – April 1, 2019. Difference-in-differences analysis compared patient-visits for chest pain (n=7,564) to patient-visits for other symptoms (n=100,415) (irrespective of troponin testing) before and after hs-cTn assay implementation. Outcomes included presence of any cascade event potentially associated with an initial hs-cTn test (primary), individual cascade events, length of stay, and spending on cardiac services.
Results
Following hs-cTn implementation, patients with chest pain had a 2.8% (95%CI 0.72, 4.9) net increase in experiencing any cascade event. They were more likely to have multiple troponin tests (10.5%, 95%CI 9.0, 12.0) and electrocardiograms (7.1 per 100 patient-visits, 95%CI 1.8, 12.4). However, they received net fewer computed tomography scans (-1.5 per 100 patient-visits, 95%CI -1.8, -1.1), stress tests (-5.9 per 100 patient-visits, 95%CI -6.5, -5.3), and cardiac catheterizations (-0.65 per 100 patient-visits, 95%CI -1.01, -0.30), and were less likely to receive cardiac medications, undergo cardiology evaluation (-3.5%, 95%CI -4.5, -2.6), or be hospitalized (-5.8%, 95%CI -7.7, -3.8). Chest pain patients had a lower net mean length of stay (-0.24 days, 95%CI -0.32, -0.16) but no net change in spending.
Conclusions
Hs-cTn assay implementation was associated with more net upfront tests yet fewer net stress tests, catheterizations, cardiology evaluations, and hospital admissions in chest pain patients relative to patients with other symptoms.
SOURCE
https://www.jacc.org/doi/10.1016/j.jacc.2021.04.049
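The core estimator in the study above is a difference-in-differences: the pre-to-post change in cascade events among chest-pain visits is compared against the same change among all other visits, so that secular trends common to both groups cancel out. Below is a minimal, hedged sketch of that estimator on synthetic visit-level data using statsmodels; the variable names are invented, and the study's published 2.8-point net increase is used only as a simulated target, not as real data.

```python
# Difference-in-differences sketch on synthetic data (illustrative only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 20_000
chest_pain = rng.binomial(1, 0.07, n)   # ~7% of visits are for chest pain (assumed)
post = rng.binomial(1, 0.5, n)          # 1 = visit after hs-cTn implementation

# Simulate a binary "any cascade event" outcome with a +2.8-point net effect
# for chest-pain visits after implementation (the study's point estimate).
p = 0.20 + 0.05 * chest_pain + 0.01 * post + 0.028 * chest_pain * post
cascade = rng.binomial(1, p)

df = pd.DataFrame({"cascade": cascade, "chest_pain": chest_pain, "post": post})

# The interaction coefficient of a linear probability model is the
# difference-in-differences estimate; it should recover ~0.028 on average.
model = smf.ols("cascade ~ chest_pain * post", data=df).fit()
print(model.params["chest_pain:post"])
```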
UPDATED on 3/18/2020
Interference in Troponin Assays: What’s Going On?
— Heterophile antibodies, biotin, and more with Robert Christenson, PhD
https://www.medpagetoday.com/blogs/ap-cardiology/85409
UPDATED on 5/1/2019
Background: We assessed whether plasma troponin I measured by a high-sensitivity assay (hs-TnI) is associated with incident cardiovascular disease (CVD) and mortality in a community-based sample without prior CVD.
Methods: ARIC study (Atherosclerosis Risk in Communities) participants aged 54 to 74 years without baseline CVD were included in this study (n=8121). Cox proportional hazards models were constructed to determine associations between hs-TnI and incident coronary heart disease (CHD; myocardial infarction and fatal CHD), ischemic stroke, atherosclerotic CVD (CHD and stroke), heart failure hospitalization, global CVD (atherosclerotic CVD and heart failure), and all-cause mortality. The comparative association of hs-TnI and high-sensitivity troponin T with incident CVD events was also evaluated. Risk prediction models were constructed to assess prediction improvement when hs-TnI was added to traditional risk factors used in the Pooled Cohort Equation.
Results: The median follow-up period was ≈15 years. Detectable hs-TnI levels were observed in 85% of the study population. In adjusted models, in comparison to low hs-TnI (lowest quintile, hs-TnI ≤1.3 ng/L), elevated hs-TnI (highest quintile, hs-TnI ≥3.8 ng/L) was associated with greater incident CHD (hazard ratio [HR], 2.20; 95% CI, 1.64-2.95), ischemic stroke (HR, 2.99; 95% CI, 2.01-4.46), atherosclerotic CVD (HR, 2.36; 95% CI, 1.86-3.00), heart failure hospitalization (HR, 4.20; 95% CI, 3.28-5.37), global CVD (HR, 3.01; 95% CI, 2.50-3.63), and all-cause mortality (HR, 1.83; 95% CI, 1.56-2.14). hs-TnI was observed to have a stronger association with incident global CVD events in white than in black individuals and a stronger association with incident CHD in women than in men. hs-TnI and high-sensitivity troponin T were only modestly correlated (r=0.47) and were complementary in prediction of incident CVD events, with elevation of both troponins conferring the highest risk in comparison with elevation in either one alone. The addition of hs-TnI to the Pooled Cohort Equation model improved risk prediction for atherosclerotic CVD, heart failure, and global CVD.
Conclusions: Elevated hs-TnI is strongly associated with increased global CVD incidence in the general population independent of traditional risk factors. hs-TnI and high-sensitivity troponin T provide complementary rather than redundant information.
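Methodologically, the quintile comparisons above amount to Cox proportional hazards fits with a top-versus-bottom-quintile exposure. Here is a minimal sketch of that modelling pattern on synthetic data with the lifelines package; the column names, covariate set, and effect sizes are illustrative assumptions, not ARIC data or the authors' code.

```python
# Cox model of event risk for top-quintile hs-TnI (synthetic sketch).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 8000
hs_tni = rng.lognormal(mean=0.7, sigma=0.6, size=n)   # ng/L, illustrative values
age = rng.uniform(54, 74, n)

# Indicator for the highest quintile, mirroring the comparison reported above.
top_quintile = (hs_tni >= np.quantile(hs_tni, 0.8)).astype(int)

# Simulate ~15 years of follow-up with a higher hazard in the top quintile
# (coefficient 0.8 gives HR ~ 2.2, chosen to echo the reported CHD HR).
hazard = 0.01 * np.exp(0.8 * top_quintile + 0.02 * (age - 64))
time = rng.exponential(1 / hazard)
event = (time < 15).astype(int)
time = np.minimum(time, 15.0)                         # administrative censoring at 15 y

df = pd.DataFrame({"time": time, "event": event,
                   "top_quintile": top_quintile, "age": age})
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```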
UPDATED on 8/14/2018
The new troponin I assays can detect lower levels of troponin compared to conventional testing
July 25, 2018 — The U.S. Food and Drug Administration (FDA) cleared the Siemens Healthineers high-sensitivity troponin I assays (TnIH) for the Atellica IM and ADVIA Centaur XP/XPT in vitro diagnostic analyzers, to aid in the early diagnosis of myocardial infarction.
The new tests can shorten the time doctors need to diagnose a life-threatening heart attack; the time to first results is 10 minutes. When a patient experiencing chest pain enters the emergency department, a physician orders a blood test to determine whether troponin is present. As blood flow to the heart is blocked, the heart muscle begins to die in as few as 30 to 60 minutes, releasing troponin into the bloodstream.
The company said the high-sensitivity performance of the two new Siemens TnIH assays offers the ability to detect lower levels of troponin with significantly improved precision at the 99th percentile, and to detect smaller changes in a patient’s troponin level as repeat testing occurs. This design affords clinicians greater confidence in the results, with precision sufficient to measure slight yet critical changes and begin treatment.[1,2]
Chest pain is the cause of more than 8 million visits annually nationwide to emergency departments, but only 5.5 percent of those visits lead to serious diagnoses such as heart attacks.[3] Armed with data to properly triage patients sooner or to exclude myocardial infarctions, the Siemens Healthineers TnIH assays can help support testing initiatives tied to improving patient experience.
“Our emergency department is overcrowded with patients. If we can do a more efficient job at triaging patients to receive the proper level of care and to discharge the patients who do not need to stay in the emergency department, this will have a tremendous economic advantage for our healthcare system,” said Alan Wu, M.D., chief of clinical chemistry and toxicology at Zuckerberg San Francisco General Hospital and Trauma Center.
Siemens is launching the product at the 70th AACC Annual Scientific Meeting and Clinical Lab Expo taking place July 31 to Aug. 2 in Chicago.
For more information: http://www.siemens-healthineers.com
Watch the related VIDEO: Use of High Sensitivity Troponin Testing in the Emergency Department — Interview with James Januzzi, M.D., Massachusetts General Hospital
Increases in levels of cardiac troponin T by high-sensitivity assay (hs-cTnT) over time are associated with later risk of death, coronary heart disease (CHD), and especially heart failure in apparently healthy middle-aged people, according to a report published June 8, 2016 in JAMA Cardiology[1].
The novel findings, based on a cohort of >8000 participants from the Atherosclerosis Risk in Communities (ARIC) study followed for up to 16 years, are the first to show “an association between temporal hs-cTnT change and incident CHD events” in asymptomatic middle-aged adults, write the authors, led by Dr John W McEvoy (Johns Hopkins University School of Medicine, Baltimore, MD).
Individuals with the greatest troponin increases over time had the highest risk for poor cardiac outcomes. The strongest association was for risk of heart failure, which was increased almost eightfold (by nearly 800%) in those with the sharpest hs-cTnT rises.
Intriguingly, those in whom troponin levels fell at least 50% had a reduced mortality risk and may have had a slightly decreased risk of later HF or CHD.
“Serial testing over time with high-sensitivity cardiac troponins provided additional prognostic information over and above the usual clinical risk factors, [natriuretic peptide] levels, and a single troponin measurement. Two measurements appear better than one when it comes to informing risk for future coronary heart disease, heart failure, and death,” McEvoy told heartwire from Medscape.
He cautioned, though, that the conclusion is based on observational data and would need to be confirmed in clinical trials. Moreover, high-sensitivity cardiac troponin assays are widely used in Europe but were not yet approved in the US at the time of the report.
An important next step after this study, according to an accompanying editorial from Dr James Januzzi (Massachusetts General Hospital, Boston, MA), would be to evaluate whether the combination of hs-troponin and natriuretic peptides improves predictive value in this population[2].
“To the extent prevention is ultimately the holy grail for defeating the global pandemic of CHD, stroke, and HF, the main reason to do a biomarker study such as this would be to set the stage for a biomarker-guided strategy to improve the medical care for those patients at highest risk, as has been recently done with [natriuretic peptides],” he wrote.
The ARIC prospective cohort study entered and followed 8838 participants (mean age 56, 59% female, 21.4% black) in North Carolina, Mississippi, Minneapolis, and Maryland from January 1990 to December 2011. At baseline, participants had no clinical signs of CHD or heart failure.
Levels of hs-cTnT, obtained 6 years apart, were categorized as undetectable (<0.005 ng/mL), detectable (≥0.005 to <0.014 ng/mL), and elevated (≥0.014 ng/mL).
Troponin increases from <0.005 ng/mL to 0.005 ng/mL or higher independently predicted development of CHD (HR 1.41; 95% CI 1.16–1.63), HF (HR 1.96; 95% CI 1.62–2.37), and death (HR 1.50; 95% CI 1.31–1.72), compared with undetectable levels at both measurements.
Hazard ratios were adjusted for age, sex, race, body-mass index, C-reactive protein, smoking status, alcohol-intake history, systolic blood pressure, current antihypertensive therapy, diabetes, serum lipid and cholesterol levels, lipid-modifying therapy, estimated glomerular filtration rate, and left ventricular hypertrophy.
Subjects with >50% increase in hs-cTnT had a significantly increased risk of CHD (HR 1.28; 95% CI 1.09–1.52), HF (HR 1.60; 95% CI 1.35–1.91), and death (HR 1.39; 95% CI 1.22–1.59).
Risks for those end points fell somewhat for those with a >50% decrease in hs-cTnT (CHD: HR 0.47, 95% CI 0.22–1.03; HF: HR 0.49, 95% CI 0.23–1.01; death: HR 0.57, 95% CI 0.33–0.99).
Among participants with an adjudicated HF hospitalization, the group writes, associations of hs-cTnT changes with outcomes were of similar magnitude for those with HF with preserved ejection fraction (HFpEF) and HF with reduced ejection fraction (HFrEF).
Few biomarkers have been linked to increased risk for HFpEF, and few effective therapies exist for it. That may be due to problems identifying and enrolling patients with HFpEF in clinical trials, Dr McEvoy pointed out.
“We think the increased troponin over time reflects progressive myocardial injury or progressive myocardial damage,” Dr McEvoy said. “This is a window into future risk, particularly with respect to heart failure but other outcomes as well. It may suggest high-sensitivity troponins as a marker of myocardial health and help guide interventions targeting the myocardium.”
Moreover, he said, “We think that high-sensitivity troponin may also be a useful biomarker along with [natriuretic peptides] for emerging trials of HFpEF therapy.”
But whether hs-troponin has the potential for use as a screening tool is a question for future studies, according to McEvoy.
In his editorial, Januzzi pointed out several implications of the study, including the possibility for lowering cardiac risk in those with measurable hs-troponin, and that HF may be the most obvious outcome to target. Also, optimizing treatment and using cardioprotective therapies may reduce risk linked to increases in hs-troponin. Finally, long-term, large clinical trials on this issue will require a multidisciplinary team effort from various sectors.
“What is needed now are efforts toward developing strategies to upwardly bend the survival curves of those with a biomarker signature of risk, leveraging the knowledge gained from studies such as the report by McEvoy et al to improve public health,” he concluded.

Curators: Larry H. Bernstein, MD, FCAP and Aviva Lev-Ari, PhD, RN
SURVIV for survival analysis of mRNA isoform variation
Shihao Shen, Yuanyuan Wang, Chengyang Wang, Ying Nian Wu & Yi Xing
Nature Communications 7, Article number: 11548 (2016). doi:10.1038/ncomms11548
The rapid accumulation of clinical RNA-seq data sets has provided the opportunity to associate mRNA isoform variations to clinical outcomes. Here we report a statistical method SURVIV (Survival analysis of mRNA Isoform Variation), designed for identifying mRNA isoform variation associated with patient survival time. A unique feature and major strength of SURVIV is that it models the measurement uncertainty of mRNA isoform ratio in RNA-seq data. Simulation studies suggest that SURVIV outperforms the conventional Cox regression survival analysis, especially for data sets with modest sequencing depth. We applied SURVIV to TCGA RNA-seq data of invasive ductal carcinoma as well as five additional cancer types. Alternative splicing-based survival predictors consistently outperform gene expression-based survival predictors, and the integration of clinical, gene expression and alternative splicing profiles leads to the best survival prediction. We anticipate that SURVIV will have broad utilities for analysing diverse types of mRNA isoform variation in large-scale clinical RNA-seq projects.
Eukaryotic cells generate remarkable regulatory and functional complexity from a finite set of genes. Production of mRNA isoforms through alternative processing and modification of RNA is essential for generating this complexity. A prevalent mechanism for producing mRNA isoforms is the alternative splicing of precursor mRNA [1]. Over 95% of the multi-exon human genes undergo alternative splicing [2, 3], resulting in an enormous level of plasticity in the regulation of gene function and protein diversity. In the last decade, extensive genomic and functional studies have firmly established the critical role of alternative splicing in cancer [4, 5, 6]. Alternative splicing is involved in a full spectrum of oncogenic processes including cell proliferation, apoptosis, hypoxia, angiogenesis, immune escape and metastasis [7, 8]. These cancer-associated alternative splicing patterns are not merely the consequences of disrupted gene regulation in cancer but in numerous instances actively contribute to cancer development and progression. For example, alternative splicing of genes encoding the Bcl-2 family of apoptosis regulators generates both anti-apoptotic and pro-apoptotic protein isoforms [9]. Alternative splicing of the pyruvate kinase M (PKM) gene has a significant impact on cancer cell metabolism and tumour growth [10]. A transcriptome-wide switch of the alternative splicing programme during the epithelial–mesenchymal transition plays an important role in cancer cell invasion and metastasis [11, 12].
RNA sequencing (RNA-seq) has become a popular and cost-effective technology to study transcriptome regulation and mRNA isoform variation [13, 14]. As the cost of RNA-seq continues to decline, it has been widely adopted in large-scale clinical transcriptome projects, especially for profiling transcriptome changes in cancer. For example, as of April 2015 The Cancer Genome Atlas (TCGA) consortium had generated RNA-seq data on over 11,000 cancer patient specimens from 34 different cancer types. Within the TCGA data, breast invasive carcinoma (BRCA) has the largest sample size of RNA-seq data, covering over 1,000 patients, and clinical information such as survival times, tumour stages and histological subtypes is available for the majority of the BRCA patients [15]. Moreover, the median follow-up time of BRCA patients is ~400 days, and 25% of the patients have more than 1,200 days of follow-up. Collectively, the large sample size and long follow-up time of the TCGA BRCA data set allow us to correlate genomic and transcriptomic profiles to clinical outcomes and patient survival times.
To date, systematic analyses have been performed to reveal the association of copy number variation, DNA methylation, gene expression and microRNA expression profiles with cancer patient survival [16, 17]. By contrast, despite the importance of mRNA isoform variation and alternative splicing, there have been limited efforts in transcriptome-wide survival analysis of alternative splicing in cancer patients. Most RNA-seq studies of alternative splicing in cancer transcriptomes focus on identifying ‘cancer-specific’ alternative splicing events by comparing cancer tissues with normal controls (see refs 18–23 for examples). A recent analysis of TCGA RNA-seq data identified 163 recurrent differential alternative splicing events between cancer and normal tissues of three cancer types, among which five were found to have suggestive survival signals for breast cancer at a nominal P-value cutoff of 0.05 (ref. 21). Some other studies reported a significant survival difference between cancer patient subgroups after stratifying patients by overall mRNA isoform expression profiles [24, 25]. However, systematic cancer survival analyses of alternative splicing at the individual exon resolution have been lacking. Two main challenges exist for survival analyses of mRNA isoform variation and alternative splicing using RNA-seq data. The first challenge is to account for the estimation uncertainty of mRNA isoform ratios inferred from RNA-seq read counts. The statistical confidence of mRNA isoform ratio estimation depends on the RNA-seq read coverage for the events of interest, with larger read coverage leading to a more reliable estimation [14]. Modelling the estimation uncertainty of mRNA isoform ratio is an essential component of RNA-seq analyses of alternative splicing, as shown by various statistical algorithms developed for detecting differential alternative splicing from multi-group RNA-seq data [14, 26, 27, 28, 29]. The second challenge, which is a general issue in survival analysis, is to properly model the association of mRNA isoform ratio with survival time while accounting for missing data in survival time because of censoring, that is, patients still alive at the end of the survival study, whose precise survival time is uncertain. To date, no algorithm has been developed for survival analyses of mRNA isoform variation that accounts for these sources of uncertainty simultaneously.
Here we introduce SURVIV (Survival analysis of mRNA Isoform Variation), a statistical model for identifying mRNA isoform ratios associated with patient survival times in large-scale cancer RNA-seq data sets. SURVIV models the estimation uncertainty of mRNA isoform ratios in RNA-seq data and tests the survival effects of isoform variation in both censored and uncensored survival data. In simulation studies, SURVIV consistently outperforms the conventional Cox regression survival analysis that ignores the measurement uncertainty of mRNA isoform ratio. We used SURVIV to identify alternatively spliced exons whose exon-inclusion levels significantly correlated with the survival times of invasive ductal carcinoma (IDC) patients from the TCGA breast cancer cohort. Survival-associated alternative splicing events are identified in gene pathways associated with apoptosis, oxidative stress and DNA damage repair. Importantly, we show that alternative splicing-based survival predictors outperform gene expression-based survival predictors in the TCGA IDC RNA-seq data set, as well as in TCGA data of five additional cancer types. Moreover, the integration of clinical information, gene expression and alternative splicing profiles leads to the best prediction of survival time.
SURVIV statistical model
The statistical model of SURVIV assesses the association between mRNA isoform ratio and patient survival time. While the model is generic for many types of alternative isoform variation, here we use the exon-skipping type of alternative splicing to illustrate the model (Fig. 1a). For each alternative exon involved in exon-skipping, we can use the RNA-seq reads mapping to its exon-inclusion or -skipping isoform to estimate its exon-inclusion level (denoted as ψ, or PSI, for Per cent Spliced In [14]). A key feature of SURVIV is that it models the RNA-seq estimation uncertainty of the exon-inclusion level as influenced by the sequencing coverage for the alternative splicing event of interest. This is a critical issue in accurate quantitative analyses of mRNA isoform ratio in large-scale RNA-seq data sets [14, 26, 27, 28, 29]. Therefore, SURVIV contains two major components: the first models the association of mRNA isoform ratio with patient survival time across all patients, and the second models the estimation uncertainty of mRNA isoform ratio in each individual patient (Fig. 1a).
Figure 1: The statistical framework of the SURVIV model.
(a) For each patient k, the patient’s hazard rate λk(t) is associated with the baseline hazard rate λ0(t) and this patient’s exon-inclusion level ψk. The association of exon-inclusion level with patient survival is estimated by the survival coefficient β. The exon-inclusion level ψk is estimated from the read counts for the exon-inclusion isoform ICk and the exon-skipping isoform SCk. The proportion of the inclusion and skipping reads is adjusted by a normalization function f that considers the lengths of the exon-inclusion and -skipping isoforms (see details in Results and Supplementary Methods). (b) A hypothetical example to illustrate the association of exon-inclusion level with patient survival probability over time Sk(t), with the survival coefficient β=−1 and a constant baseline hazard rate λ0(t)=1. In this example, patients with higher exon-inclusion levels have lower hazard rates and higher survival probabilities. (c) The schematic diagram of an exon-skipping event. The exon-inclusion reads ICk are the reads from the upstream splice junction, the alternative exon itself and the downstream splice junction. The exon-skipping reads SCk are the reads from the skipping splice junction that directly connects the upstream exon to the downstream exon.
Briefly, for any individual exon-skipping event, the first component of SURVIV uses a proportional hazards model to establish the relationship between patient k’s exon-inclusion level ψk and hazard rate λk(t):

λk(t) = λ0(t) · exp(β · ψk)

For each exon, the association between the exon-inclusion level and patient survival time is reflected by the survival coefficient β. A positive β means increased exon inclusion is associated with a higher hazard rate and poorer survival, while a negative β means increased exon inclusion is associated with a lower hazard rate and better survival. λ0(t) is the baseline hazard rate estimated from the survival data of all patients (see Supplementary Methods for the detailed estimation procedure). A particular patient’s survival probability over time, Sk(t), can be calculated from the patient-specific hazard rate λk(t) as

Sk(t) = exp(−Λk(t)), where Λk(t) is the cumulative hazard, the integral of λk(u) from u=0 to t.

Figure 1b illustrates a simple example with a negative β=−1 and a constant baseline hazard rate λ0(t)=1, where higher exon-inclusion levels are associated with lower hazard rates and higher survival probabilities.
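To make the two formulas concrete, the short sketch below simply evaluates them for the toy setting of Fig. 1b (β=−1, constant baseline hazard λ0(t)=1). It is a direct transcription of the equations above, not the authors' implementation.

```python
# Evaluate the SURVIV hazard and survival formulas for Fig. 1b's toy setting.
import numpy as np

beta = -1.0      # survival coefficient (negative: inclusion is protective)
lambda0 = 1.0    # constant baseline hazard, as in Fig. 1b

def hazard(psi):
    """Patient-specific hazard: lambda_k(t) = lambda0 * exp(beta * psi_k)."""
    return lambda0 * np.exp(beta * psi)

def survival(psi, t):
    """S_k(t) = exp(-cumulative hazard); constant hazard => exp(-lambda * t)."""
    return np.exp(-hazard(psi) * t)

for psi in (0.1, 0.5, 0.9):
    print(f"psi={psi:.1f}  hazard={hazard(psi):.3f}  S(t=1)={survival(psi, 1.0):.3f}")
# Higher exon inclusion -> lower hazard -> higher survival probability.
```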
The second component of SURVIV models the exon-inclusion level and its estimation uncertainty in individual patient samples. As illustrated in Fig. 1c, the exon-inclusion level ψk of a given exon in a particular sample can be estimated from the RNA-seq read counts specific to the exon-inclusion isoform (ICk) and the exon-skipping isoform (SCk). Other types of alternative splicing and mRNA isoform variation can be similarly modelled by this framework [29]. Given the effective lengths (that is, the number of unique isoform-specific read positions) of the exon-inclusion isoform (lI) and the exon-skipping isoform (lS), the exon-inclusion level ψk can be estimated as

ψ̂k = (ICk/lI) / (ICk/lI + SCk/lS)

Assuming that the exon-inclusion read count ICk follows a binomial distribution with the total read count nk = ICk + SCk, we have

ICk ~ Binomial(nk, pk), where pk = f(ψk) = lI·ψk / (lI·ψk + lS·(1−ψk))

The binomial distribution models the estimation uncertainty of ψk as influenced by the total read count nk, in which the parameter pk represents the proportion of reads from the exon-inclusion isoform, given the exon-inclusion level ψk adjusted by the length normalization function f(ψk) based on the effective lengths of the isoforms. The definitions of effective lengths for all basic types of alternative splicing patterns are described in ref. 29.
Distinct from conventional survival analyses, in which predictors carry no estimation uncertainty, the predictors in SURVIV are exon-inclusion levels ψk estimated from RNA-seq count data, and the confidence of the ψk estimate for a given exon in a particular sample depends on the RNA-seq read coverage. We use the statistical framework of the survival measurement error model [30] to incorporate the estimation uncertainty of the isoform ratio into the proportional hazards model. Using a likelihood ratio test, we test whether the exon-inclusion levels have a significant association with patient survival over the null hypothesis H0: β=0. The false discovery rate (FDR) is estimated using the Benjamini and Hochberg approach [31]. Details of the parameter estimation and likelihood ratio test in SURVIV are described in Supplementary Methods.
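A small numerical sketch of this machinery: the length-normalized ψ estimator from the preceding paragraph, plus conversion of likelihood-ratio statistics into Benjamini–Hochberg FDRs. The LRT statistics below are placeholders; SURVIV's actual likelihood and test are specified in its Supplementary Methods.

```python
# Length-normalized PSI estimation plus BH correction of LRT p-values (sketch).
import numpy as np
from scipy.stats import chi2

def estimate_psi(ic, sc, l_inc, l_skip):
    """psi_hat = (IC/lI) / (IC/lI + SC/lS), the length-normalized inclusion level."""
    inc, skip = ic / l_inc, sc / l_skip
    return inc / (inc + skip)

print(estimate_psi(ic=30, sc=10, l_inc=3, l_skip=1))  # -> 0.5: counts balance lengths

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    out = np.empty_like(ranked)
    out[order] = np.clip(ranked, 0, 1)
    return out

# One likelihood-ratio statistic per exon; under H0: beta=0 it is ~ chi-square(1).
lr_stats = np.array([0.5, 3.9, 10.2, 15.0])           # placeholder values
pvals = chi2.sf(lr_stats, df=1)
print(pvals, bh_fdr(pvals))
```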
Figure 2: Simulation studies to assess the performance of SURVIV and the importance of modelling the estimation uncertainty of mRNA isoform ratio.
We compared our SURVIV model with Cox regression using point estimates of exon-inclusion levels, which does not consider the estimation uncertainty of the mRNA isoform ratio. (a) To study the effect of RNA-seq depth, we simulated the mean total splice junction read counts equal to 5, 10, 20, 50, 80 and 100 reads. We generated two sets of simulations with and without data-censoring. For each simulation, the true-positive rate (TPR) at 5% false-positive rate is plotted. The inset figure shows the empirical distribution of the mean total splice junction read counts in the TCGA IDC RNA-seq data (x axis in the log10 scale). (b) To faithfully represent the read count distribution in a real data set, we performed another simulation with read counts directly sampled from the TCGA IDC data. Sampled read counts were then multiplied by different factors ranging from 10 to 300% to simulate data sets with different RNA-seq read depth. Continuous and dashed lines represent the performance of SURVIV and Cox regression, respectively. Red lines represent the area under curve (AUC) of the ROC curve (TPR versus false-positive rate plot). Black lines represent the TPR at 5% false-positive rate.
Using these simulated data, we compared SURVIV with Cox regression in two settings, without or with censoring of the survival time. In the setting without censoring, the death and survival time of each individual is known. In the setting with censoring, certain individuals are still alive at the end of the survival study. Consequently, these patients have unknown death and survival time. Here, in the simulation with censoring, we assumed that 85% of the patients were still alive at the end of the study, similar to the censoring rate of the TCGA IDC data set. In both settings and with different depths of RNA-seq coverage, SURVIV consistently outperformed Cox regression in the true-positive rate at the same false-positive rate of 5% (Fig. 2a). As expected, we observed a more significant improvement in SURVIV over Cox regression when the RNA-seq read coverage was low (Fig. 2a).
To more faithfully recapitulate the read count distribution in a real cancer RNA-seq data set, we performed another simulation study with read counts directly sampled from the TCGA IDC data. To assess the influence of RNA-seq read depth on the performance of SURVIV and Cox regression, sampled read counts were multiplied by factors ranging from 10 to 300% to simulate data sets with different RNA-seq read depths (Fig. 2b). The TCGA IDC data set has an average RNA-seq depth of ~60 million paired-end reads per patient. Thus, the read depth of these simulated RNA-seq data sets ranged from ~6 million to ~180 million reads per patient, representing low-coverage RNA-seq studies designed primarily for gene expression analysis [32] up to high-coverage RNA-seq studies designed primarily for alternative isoform analysis [29]. At all levels of RNA-seq depth, SURVIV consistently outperformed Cox regression, as reflected by the area under the receiver operating characteristic (ROC) curve as well as the true-positive rate at a 5% false-positive rate (Fig. 2b). The improvement of SURVIV over Cox regression was particularly prominent when the read depth was low. For example, at 10% read depth, SURVIV had a 7-point improvement in area under the curve (68% versus 61%) and an 8-point improvement in the true-positive rate at a 5% false-positive rate (46% versus 38%). Collectively, these simulation results suggest that SURVIV achieves higher accuracy by accounting for the estimation uncertainty of mRNA isoform ratio in RNA-seq data.
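The simulation design can be sketched schematically: draw true inclusion levels, generate exponential survival times whose hazard depends on ψ through β, censor roughly 85% of patients, and fit a plain Cox regression on binomially noised point estimates of ψ. Comparing low against high read counts exposes the attenuation that SURVIV's measurement-error model is built to correct. This is an assumed reconstruction of the setup using the lifelines package, not the authors' simulation code.

```python
# Schematic version of the depth simulation: noisier PSI estimates attenuate
# the fitted survival coefficient in a plain Cox regression.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(42)
n, beta_true = 700, 1.5

def fit_cox_at_depth(mean_reads):
    psi = rng.uniform(0.05, 0.95, n)                      # true inclusion levels
    time = rng.exponential(1 / (0.05 * np.exp(beta_true * psi)))
    censor = rng.exponential(np.quantile(time, 0.15), n)  # roughly 85% censored
    event = (time <= censor).astype(int)
    obs_time = np.minimum(time, censor)
    reads = rng.poisson(mean_reads, n) + 1                # total junction reads per patient
    psi_hat = rng.binomial(reads, psi) / reads            # noisy point estimate
    df = pd.DataFrame({"t": obs_time, "e": event, "psi": psi_hat})
    cph = CoxPHFitter().fit(df, duration_col="t", event_col="e")
    return cph.params_["psi"]

for depth in (5, 100):
    print(f"mean reads={depth:3d}  fitted beta={fit_cox_at_depth(depth):.2f}")
# The low-depth fit is biased toward zero relative to beta_true = 1.5.
```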
SURVIV analysis of TCGA IDC breast cancer data
To illustrate the practical utility of SURVIV, we used it to analyse the overall survival time of 682 IDC patients from the TCGA breast cancer (BRCA) RNA-seq data set (see Methods for details of the data source and processing pipeline). We chose to analyse IDC because it is the most frequent type of breast cancer [33], comprising ~70% of patients in the TCGA breast cancer data set. To control for the effects of significant clinical parameters such as tumour stage and subtype, and to identify alternative splicing events associated with patient outcomes across multiple molecular and clinical subtypes, we followed the procedure of Croce and colleagues in analysing the mRNA and microRNA prognostic signature of IDC [33] and stratified the patients according to their clinical parameters. We then conducted SURVIV analysis in 26 clinical subgroups with at least 50 patients each. We identified 229 exon-skipping events associated with patient survival in multiple clinical subgroups, meeting the criterion of SURVIV P-value ≤0.01 in at least two subgroups of the same clinical parameter (cancer subtype, stage, lymph node, metastasis, tumour size, oestrogen receptor status, progesterone receptor status, HER2 status and age, as shown in Fig. 3). DAVID (Database for Annotation, Visualization and Integrated Discovery) Gene Ontology analyses [34] of the 229 alternative splicing events suggest an enrichment of genes in cancer-related functional categories such as intracellular signalling, apoptosis, oxidative stress and response to DNA damage (Supplementary Fig. 1). Table 1 shows a few selected examples of survival-associated alternative splicing events in cancer-related genes. Using two-means clustering of each individual exon’s inclusion levels, the 682 IDC patients can be segregated into two subgroups with significantly different survival times, as illustrated by the Kaplan–Meier survival plot (Fig. 4). We also carried out hierarchical clustering of IDC patients using 176 survival-associated alternative exons (P≤0.01; SURVIV analysis of all IDC patients). Using the exon-inclusion levels of these 176 exons, we clustered IDC patients into three major subgroups, with 95, 194 and 389 patients, respectively. As illustrated by the Kaplan–Meier survival plots, the three subgroups had significantly different survival times (Supplementary Fig. 2).
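The patient stratification behind Fig. 4 can be reproduced in outline: cluster patients into two groups by k-means with k=2 on a single exon's inclusion levels, then draw Kaplan–Meier curves for the two groups. The sketch below uses synthetic data with scikit-learn and lifelines, and a log-rank test as a stand-in for the SURVIV P-values reported in the figure.

```python
# Two-means stratification of patients by one exon's inclusion level,
# followed by Kaplan-Meier curves (synthetic data, illustrative only).
import numpy as np
from sklearn.cluster import KMeans
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(7)
n = 682                                      # matches the IDC cohort size
psi = np.clip(rng.normal(0.9, 0.08, n), 0, 1)
time = rng.exponential(1 / (0.03 * np.exp(-2.0 * psi)))  # higher inclusion -> longer survival
event = rng.binomial(1, 0.15, n)             # heavy censoring, as in TCGA IDC

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(psi.reshape(-1, 1))
high = labels == (0 if psi[labels == 0].mean() > psi[labels == 1].mean() else 1)

kmf = KaplanMeierFitter()
for mask, name in ((high, "high inclusion"), (~high, "low inclusion")):
    kmf.fit(time[mask], event_observed=event[mask], label=name)
    # Under heavy censoring the median can be inf; this is expected here.
    print(name, "median survival:", kmf.median_survival_time_)

res = logrank_test(time[high], time[~high],
                   event_observed_A=event[high], event_observed_B=event[~high])
print("log-rank p =", res.p_value)
```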
Figure 3: SURVIV analysis of exon-skipping events in the TCGA IDC RNA-seq data set.
IDC patients are stratified into multiple clinical subgroups based on clinical parameters including cancer subtype, stage, lymph node status, metastasis, tumour size, oestrogen receptor status, progesterone receptor status, HER2 status and age. Only clinical subgroups with at least 50 patients are included in further analyses. Numbers of patients in the subgroups are indicated next to the names of the subgroups. Shown in the heatmap are the log10 SURVIV P-values of the 229 exons associated with patient survival (P≤0.01) in at least two subgroups of the same class of clinical parameters. Turquoise indicates a positive correlation: higher exon-inclusion levels are associated with higher survival probabilities. Magenta indicates a negative correlation: lower exon-inclusion levels are associated with higher survival probabilities.
TABLE 1 (not shown)
Figure 4: Kaplan–Meier survival plots of IDC patients stratified by two-means clustering of the exon-inclusion levels of four survival-associated alternative splicing events.
Clustering was generated for each of the four exons separately. Black lines represent patients with high exon-inclusion levels. Red lines represent patients with low exon-inclusion levels. The P-values are from SURVIV analysis of the TCGA IDC RNA-seq data. (a) ATRIP. (b) BCL2L11. (c) CD74. (d) PCBP4.
Figure 5: Alternative splicing of STAT5A exon 5 is significantly associated with IDC patient survival.
(a) The gene structure of the STAT5A full-length isoform compared to the ΔEx5 isoform skipping the 5th exon. (b) Kaplan–Meier survival plot of IDC patients stratified by two-means clustering using exon-inclusion levels of STAT5A exon 5. The 420 patients in Group 1 (average exon 5 inclusion level=95%) have significantly higher survival probabilities than the 262 patients in Group 2 (average exon 5 inclusion level=85%) (SURVIV P=6.8e−4). (c) Exon 5 inclusion levels of IDC patients stratified by two-means clustering using exon 5 inclusion levels. Group 1 has 420 patients with average exon-inclusion level at 95%. Group 2 has 262 patients with average exon-inclusion level at 85%. (d) STAT5A exon 5 inclusion levels in normal breast tissues versus breast cancer tumour samples. Exon-inclusion levels are extracted from 86 TCGA breast cancer patients with matched normal and tumour samples. Normal breast tissues have average exon 5 inclusion level at 95%, compared to 91% average exon-inclusion level in tumour samples. Error bars represent 95% confidence interval of the mean.
Network of survival-associated alternative splicing events
…see http://www.nature.com/ncomms/2016/160609/ncomms11548/full/ncomms11548.html
Figure 6: Splicing factor regulatory network of survival-associated alternative splicing events in IDC.
(a–c) Kaplan–Meier survival plots of IDC patients stratified by the gene expression levels of three splicing factors: TRA2B (a, Cox regression P=1.8e−4), HNRNPH1 (b, P=3.4e−4) and SFRS3 (c, P=2.8e−3). Black lines represent patients with high gene expression levels. Red lines represent patients with low gene expression levels. (d) The exon-inclusion levels of a DHX30 alternative exon are negatively correlated with TRA2B gene expression levels (robust correlation coefficient r=−0.26, correlation P=1.2e−17). (e) The exon-inclusion levels of a MAP3K4 alternative exon are positively correlated with HNRNPH1 gene expression levels (robust correlation coefficient r=0.16, correlation P=2.6e−06). (f) A splicing co-expression network of the three splicing factors and their correlated survival-associated alternative exons. In total, 84 survival-associated alternative exons are significantly correlated with the three splicing factors. The positive/negative correlation between splicing factors and alternative exons is represented by blue/red lines, respectively. Exons whose inclusion levels are positively/negatively correlated with survival times are represented by blue/red dots, respectively. The size of the splicing factor circles is proportional to the number of correlated exons within the network.
…..
Alternative splicing predictors of cancer patient survival
see http://www.nature.com/ncomms/2016/160609/ncomms11548/full/ncomms11548.html
Figure 7: Cross-validation of different classes of IDC survival predictors, measured by the C-index.
A C-index of 1 indicates perfect prediction accuracy and a C-index of 0.5 indicates a random guess. The plots show the distribution of C-indexes from 100 rounds of cross-validation. The centre value of each box plot is the median C-index from 100 rounds of cross-validation. The notch represents the 95% confidence interval of the median. The box represents the 25 and 75% quantiles. The whiskers extending out from the box represent the 5 and 95% quantiles. A two-sided Wilcoxon test was used to compare different survival predictors. The different classes of predictors are: (a) clinical information (median C-index 0.67); (b) gene expression (median C-index 0.68); (c) alternative splicing (median C-index 0.71); (d) clinical information+gene expression (median C-index 0.69); (e) clinical information+alternative splicing (median C-index 0.73); (f) clinical information+gene expression+alternative splicing (median C-index 0.74). Note that ‘Gene’ refers to ‘gene-level expression’ in these plots.
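The C-index reported in Fig. 7 is the fraction of comparable patient pairs in which the patient with the higher predicted risk actually dies earlier. A minimal illustration with lifelines' concordance_index on synthetic risk scores follows; it is not the paper's cross-validation harness.

```python
# C-index: concordance between predicted risk and observed survival (sketch).
import numpy as np
from lifelines.utils import concordance_index

rng = np.random.default_rng(3)
n = 500
risk = rng.normal(size=n)                        # model's predicted risk score
time = rng.exponential(1 / (0.1 * np.exp(risk))) # higher risk -> shorter survival
event = rng.binomial(1, 0.3, n)

# concordance_index expects scores where HIGHER means LONGER survival,
# so pass the negated risk.
c = concordance_index(time, -risk, event_observed=event)
print(f"C-index = {c:.2f}")   # ~0.5 would be a random guess, 1.0 a perfect ranking
```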
Next, we carried out the SURVIV analysis in five additional cancer types in TCGA, including GBM (glioblastoma multiforme), KIRC (kidney renal clear cell carcinoma), LGG (lower grade glioma), LUSC (lung squamous cell carcinoma) and OV (ovarian serous cystadenocarcinoma). As expected, the number of significant events at different FDR or P-value significance cutoffs varied across cancer types, with LGG having the strongest survival-associated alternative splicing signals with 660 significant exon-skipping events at FDR≤5% (Supplementary Data 3 and 4). Strikingly, regardless of the number of significant events, alternative splicing-based survival predictors outperformed gene expression-based survival predictors across all cancer types (Supplementary Fig. 3), consistent with our initial observation on the IDC data set.
Alternative processing and modification of mRNA, such as alternative splicing, allow cells to generate a large number of mRNA and protein isoforms with diverse regulatory and functional properties. The plasticity of alternative splicing is often exploited by cancer cells to produce isoform switches that promote cancer cell survival, proliferation and metastasis [7, 8]. The widespread use of RNA-seq in cancer transcriptome studies [15, 47, 48] has provided the opportunity to comprehensively elucidate the landscape of alternative splicing in cancer tissues. While existing studies of alternative splicing in large-scale cancer transcriptome data largely focused on the comparison of splicing patterns between cancer and normal tissues or between different subtypes of cancer [18, 21, 49], additional computational tools are needed to characterize the clinical relevance of alternative splicing using massive RNA-seq data sets, including the association of alternative splicing with phenotypes and patient outcomes.
We have developed SURVIV, a novel statistical model for survival analysis of alternative isoform variation using cancer RNA-seq data. SURVIV uses a survival measurement error model to simultaneously model the estimation uncertainty of mRNA isoform ratio in individual patients and the association of mRNA isoform ratio with survival time across patients. Compared with the conventional Cox regression model that uses each patient’s mRNA isoform ratio as a point estimate, SURVIV achieves a higher accuracy as indicated by simulation studies under a variety of settings. Of note, we observed a particularly marked improvement of SURVIV over Cox regression for low- and moderate-depth RNA-seq data (Fig. 2b). This has important practical value because many clinical RNA-seq data sets have large sample size but relatively modest sequencing depth.
Using the TCGA IDC breast cancer RNA-seq data of 682 patients, SURVIV identified 229 alternative splicing events associated with patient survival time, which met the criteria of SURVIV P-values ≤0.01 in multiple clinical subgroups. While the statistical threshold seemed loose, several lines of evidence suggest the functional and clinical relevance of these survival-associated alternative splicing events. These alternative splicing events were frequently identified and enriched in gene functional groups important for cancer development and progression, including apoptosis, DNA damage response and oxidative stress. While some of these events may simply reflect correlation rather than a causal effect on cancer patient survival, other events may play an active role in regulating cancer cell phenotypes. For example, a survival-associated alternative splicing event involving exon 5 of STAT5A is known to regulate the activity of this transcription factor, which has important roles in epithelial cell growth and apoptosis [37]. Using a co-expression network analysis of splicing factor-to-exon correlation across all patients, we identified three splicing factors (TRA2B, HNRNPH1 and SFRS3) as potential hubs of the survival-associated alternative splicing network of IDC. The expression levels of all three splicing factors were negatively associated with patient survival times (Fig. 6a–c), and both TRA2B and HNRNPH1 were previously reported to have an impact on cancer-related molecular pathways [40–45]. Finally, despite the limited power in detecting individual events, we show that the survival-associated alternative splicing events can be used to construct a predictor for patient survival, with an accuracy higher than that of predictors based on clinical parameters or gene expression profiles (Fig. 7). This further demonstrates the potential biological relevance and clinical utility of the identified alternative splicing events.
We performed cross-validation analyses to evaluate and compare the prognostic value of alternative splicing, gene expression and clinical information for predicting patient survival, either independently or in combination. As expected, the combined use of all three types of information led to the best prediction accuracy. Because we used penalized regression to build the prediction model, combining information from multiple layers of data did not necessarily increase the number of predictors in the model. The perhaps more surprising and intriguing result is that alternative splicing-based predictors appear to outperform gene expression-based predictors, both when used alone and when either type of data is combined with clinical information (Fig. 7). We observed the same trend in five additional cancer types (Supplementary Fig. 3). We note that this finding is consistent with a previous report that cancer subtype classification based on splicing isoform expression performed better than gene expression-based classification [25]. While this trend seems counterintuitive, because accurate estimation of gene expression requires much lower RNA-seq depth than accurate estimation of alternative splicing [29], one possible explanation may be an inherent characteristic of isoform ratio data. By definition, mRNA isoform ratio is estimated as the ratio of multiple mRNA isoforms from a single gene. Therefore, mRNA isoform ratio data have a ‘built-in’ internal control that could be more robust against certain artefacts and confounding issues that influence gene expression estimates across large clinical RNA-seq data sets, such as poor sample quality and RNA degradation [12]. Regardless of the reasons, our data call for further studies to fully explore the utility of mRNA isoform ratio data for various clinical research applications.
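The combined predictor can be sketched as a single penalized Cox fit over stacked feature blocks; the L2 penalty shrinks coefficients so that adding blocks does not automatically inflate the number of effective predictors. In the sketch below, all feature names are invented, and lifelines' ridge-style penalizer stands in for whatever penalized regression the paper's Methods actually specify.

```python
# Penalized Cox model over clinical + expression + splicing feature blocks (sketch).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(11)
n = 682
df = pd.DataFrame({
    "age": rng.uniform(30, 85, n),        # clinical block (invented features)
    "stage": rng.integers(1, 4, n),
    "gene_expr_1": rng.normal(size=n),    # gene-expression block
    "gene_expr_2": rng.normal(size=n),
    "psi_exon_1": rng.uniform(0, 1, n),   # alternative-splicing block
    "psi_exon_2": rng.uniform(0, 1, n),
})
hazard = 0.02 * np.exp(0.02 * (df["age"] - 55) + 0.3 * df["stage"]
                       - 1.0 * df["psi_exon_1"])
df["time"] = rng.exponential(1 / hazard)
df["event"] = rng.binomial(1, 0.15, n)

# penalizer adds an L2 penalty on the coefficients (ridge-style shrinkage).
cph = CoxPHFitter(penalizer=0.1)
cph.fit(df, duration_col="time", event_col="event")
print(cph.params_.sort_values())
```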
The SURVIV source code is available for download at https://github.com/Xinglab/SURVIV. SURVIV is a general statistical model for survival analysis of mRNA isoform ratio using RNA-seq data. The current statistical framework of SURVIV is applicable to RNA-seq based count data for all basic types of alternative splicing patterns involving two isoform choices from an alternatively spliced region, such as exon-skipping, alternative 5′ splice sites, alternative 3′ splice sites, mutually exclusive exons and retained introns, as well as other forms of alternative isoform variation such as RNA editing. With the rapid accumulation of clinical RNA-seq data sets, SURVIV will be a useful tool for elucidating the clinical relevance and potential functional significance of alternative isoform variation in cancer and other diseases.

Switching on genes
Curator: Larry H. Bernstein, MD, FCAP
LPBI
UPDATED 3/17/2020
Gene Expression Controls Revealed
Researchers have modelled every atom in a key part of the process for switching on genes, revealing a whole new area for potential drug targets.
Proteins are essential for processes that sustain life. They are created in cells through a process called gene expression, which uses instructions from stretches of DNA called genes to build proteins. Sometimes genes are faulty and give rise to proteins that contain errors, preventing the cell from functioning properly; such errors lead to genetic diseases like cystic fibrosis and haemophilia.
Gene expression is controlled by molecules called transcription factors, which bind to the start of a gene sequence at its ‘basal machinery’ and tell it to switch on and start creating certain proteins.
The way transcription factors bind to the basal machinery is a ‘fuzzy’ process, meaning the exact sequence of events is unknown because the steps do not exist for long enough to be captured by traditional imaging techniques.
But now, by creating a computer simulation of all of the tens of thousands of atoms making up the process and modelling their movements in 50 million separate steps, researchers at Imperial College London have been able to determine the sequence of events that lead to genes being switched on.
DISRUPTING DETRIMENTAL GENES
The simulated process revealed ‘pockets’ in the gene basal machinery, which the transcription factors move in and out of during binding. Knowing how these structures fit together could lead to the design of molecules that interfere with or disrupt the process, potentially tackling diseases.
Lead researcher Dr Robert Weinzierl from Imperial’s Department of Life Sciences said: “For the first time, we can fill in the dynamic landscape of interaction between transcription factors and basal machinery. This is a central mechanism for gene expression – the interactions here determine whether a gene gets switched on and creates proteins.”
“Gene regulation is a completely new drug target that has previously been too challenging to explore,” added Dr Weinzierl. “This process influences biology on a really fundamental level, and could allow us to prevent the expression of detrimental genes.”
FASTER DRUG SCREENING
The researchers’ new technique predicts the movements of all the atoms to build up a picture of the structures involved, changing every couple of femtoseconds (quadrillionths of a second); at roughly 2 fs per step, the 50 million simulated steps correspond to about 100 nanoseconds of molecular motion. The results of the first trial of the technique are reported today in PLOS Computational Biology.
Dr Weinzierl has submitted a patent application for his computer-based approach to studying gene expression interactions. Using this, compounds could be screened for possible fit into the basal machinery pockets.
“With computer simulation, it becomes easy to identify candidate compounds that could target these interactions without the need to test them first in real life, cutting down the time required to sift for new drugs,” said Dr Weinzierl.
Steps that lead to genes being switched on revealed in atomic simulation
by Hayley Dunning 13 May 2016
http://www3.imperial.ac.uk/newsandeventspggrp/imperialcollege/newssummary/news_13-5-2016-10-29-43

Molecular Dynamics of “Fuzzy” Transcriptional Activator-Coactivator Interactions
Natalie S. Scholes, Robert O. J. Weinzierl
Transcriptional activation domains (ADs) are generally thought to be intrinsically unstructured, but capable of adopting limited secondary structure upon interaction with a coactivator surface. The indeterminate nature of this interface made it hitherto difficult to study structure/function relationships of such contacts. Here we used atomistic accelerated molecular dynamics (aMD) simulations to study the conformational changes of the GCN4 AD and variants thereof, either free in solution, or bound to the GAL11 coactivator surface. We show that the AD-coactivator interactions are highly dynamic while obeying distinct rules. The data provide insights into the constant and variable aspects of orientation of ADs relative to the coactivator, changes in secondary structure and energetic contributions stabilizing the various conformers at different time points. We also demonstrate that a prediction of α-helical propensity correlates directly with the experimentally measured transactivation potential of a large set of mutagenized ADs. The link between α-helical propensity and the stimulatory activity of ADs has fundamental practical and theoretical implications concerning the recruitment of ADs to coactivators.
Author Summary
The regulated transcription of eukaryotic genes is governed by gene-specific transcription factors that contain activation domains to stimulate the expression of nearby genes. Activation domains are unable to take up a defined three-dimensional conformation. Nevertheless, as we demonstrate in our study, molecular dynamics simulations reveal that the key docking point of such domains (centered around several large hydrophobic amino acid sidechains) folds into fluctuating α-helical conformations. Analysis of published data shows that this tendency of adopting such local structures correlates directly with stimulation activity. We also investigate the interaction of these structurally unstable domains with a coactivator interaction partner. Computational simulations are ideally suited for analysing the rapidly changing, “fuzzy” interactions occurring between these protein partners. We gained new insights into the competitive nature of the key hydrophobic sidechains in binding to a pocket on the coactivator surface and documented for the first time the rapidly changing movements of an activation domain during these interactions.
Transcription Factor Effector Domains
Transcriptional activation is a stepwise process that requires (a) creating and maintaining an open chromatin structure, (b) assembly of the preinitiation complex, and (c) transition to productive elongation (Fig. 12.1). Successful completion of each of these steps involves a diverse group of proteins, some of which function in a relatively promoter-specific manner whereas others regulate large sets of genes. Recent advances in molecular and computational biology allow histone and DNA modifications, TFs, and RNA polymerases to be precisely mapped throughout the genome, relative to active or silent promoters (see [1–3] for reviews). From this research, it is becoming clear that there is a complex interaction between the chromatin landscape and the transcriptional machinery and that the dynamic relationship of this interface is central to biological control over gene expression [4]. It is now recognized that regulatory factors can exert their influence on transcriptional activation either via co-localization with other proteins that are bound at or near core promoter regions or they can be recruited to distal enhancer regions and interact with promoter-bound proteins via looping mechanisms. However, generally speaking, the chromatin remodeling enzymes and the general transcription factors involved in initiation and elongation cannot, on their own, recognize and stably bind to the promoter or enhancer regions.
One way in which chromatin remodeling enzymes and general transcription factors are recruited to cis-regulatory regions is through interaction with site-specific DNA binding TFs (Fig. 12.2a). The three largest classes of site-specific DNA binding proteins in mammals contact the genome via conserved DNA binding domains called zinc fingers, homeodomains, and helix–loop–helix domains [5] (Chapter 3 of this volume provides a catalog of eukaryotic DNA binding domains, and Chapters 4 and 5 specifically review C2H2 zinc fingers and homeodomains). Each of these classes of site-specific DNA binding factors contains many different proteins; for example, in humans there are over 650 zinc finger proteins, ~250 homeodomain proteins, and ~80 helix–loop–helix proteins [5]. Within each class, individual TFs can bind to and regulate hundreds to thousands of different genes. Site-specific TFs are modular in their structure, reflecting their ability to bind to DNA via their DNA binding domains and simultaneously bind to other transcriptional regulatory proteins via so-called effector domains. The modular nature of site-specific TFs has been repeatedly demonstrated using in vitro and in vivo reporter assays. In these experiments, effector domains are separated from their natural DNA binding domains and then engineered to be part of a fusion protein having a heterologous DNA binding domain. Numerous studies have shown that simply bringing such effector domains to promoter regions can modulate transcription [6–8].
Another way in which chromatin remodeling enzymes and general transcription factors can be brought to the genome is via effector domains that reside in proteins that recognize epigenomic marks. Just as a DNA binding protein recognizes a short nucleotide motif, other proteins can distinguish distinctively modified DNA and histone protein “motifs”. For example, methylated cytosine in the 5′-CpG-3′ dinucleotide sequence is specifically recognized by members of a family of proteins containing a conserved methyl-CpG binding domain (MBD). MBD-containing proteins, which include MeCP2, MBD1, MBD2, and MBD4, bind specifically to methyl-CpG motifs located throughout the genome [9]; see Fig. 12.2b. MBD-containing proteins function by recruiting various co-regulators to methyl-CpG sites. For example, MeCP2 simultaneously binds promoter regions containing methyl-CpG motifs and the Sin3-containing histone deacetylase complex via a transcriptional repression domain (TRD), resulting in histone deacetylation and transcriptional silencing [10, 11]. Likewise, MBD1 and MBD2 copurify with distinct cellular complexes that link DNA methylation with chromatin modification and transcriptional repression.

Similarly, posttranslational modifications of the amino termini of core histones correlate with transcriptional states and are recognized by dedicated chromatin-associated proteins (Fig. 12.2c). Several different histone modifications have been identified, including acetylation, phosphorylation, and methylation, and specific protein domains have evolved to recognize several of these modifications. For example, different methylation states of histone H3 at lysine 4 can be recognized by tudor, chromo, and plant homeodomain (PHD) domains, by malignant brain tumor (MBT) domains, and by WD40 repeat domains (many of these domains are structurally related and are collectively referred to as the “royal family” [12]; reviewed in [13, 14]). Other examples of this family include the chromodomain of HP1, which interacts with lower (mono- and di-) methylation states of lysine 9 of histone H3 but preferentially binds the trimethylated state [15, 16], and the tudor domain of 53BP1, which can discriminate between the di- and tri-methyl states of H4K20, preferring the dimethyl form [17, 18]. Acetylated lysine is recognized by a specific protein module called the bromodomain [19], which is found in many chromatin-associated proteins and in nearly all known nuclear histone acetyltransferases (HATs).

Of course, epigenetic marks such as DNA methylation and histone modifications are located at specific genomic regions (which can vary in different cell types), indicating that DNA methylases and histone modifying enzymes must themselves be recruited to the genome by sequence-specific mechanisms such as site-specific TFs or RNAs. For example, KRAB-ZNFs can recruit the KAP1/SETDB1 histone methylating complex, and long non-coding RNAs can recruit the PRC2 histone methylation complex [20–23].
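As a quick illustrative summary (not part of the chapter), the mark-to-reader pairings described above can be collected into a simple lookup table. The entries are drawn directly from the text; the table name and helper function are hypothetical conveniences.

```python
# Illustrative lookup table of the mark-to-reader relationships
# described above. Entries summarize the text; the names here are
# hypothetical conveniences, not an established API.
READER_DOMAINS = {
    "methyl-CpG (5mC)": ["MBD (e.g., MeCP2, MBD1, MBD2, MBD4)"],
    "H3K4me (various states)": ["tudor", "chromo", "PHD", "MBT", "WD40 repeat"],
    "H3K9me1/2/3": ["chromodomain of HP1 (prefers the trimethyl state)"],
    "H4K20me2/3": ["tudor domain of 53BP1 (prefers the dimethyl state)"],
    "acetyl-lysine": ["bromodomain (found in nearly all nuclear HATs)"],
}

def readers_for(mark):
    """Return the reader domains listed above for a given mark."""
    return READER_DOMAINS.get(mark, [])

print(readers_for("acetyl-lysine"))
```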
The focus of this chapter is on the effector domains that are brought to specific sites of the genome by DNA binding proteins, methyl-CpG binding proteins, or histone binding proteins. (The interaction of TFs with chromatin more generally is discussed in Chapter 11). We provide examples of common effector domains that can function in transcriptional regulation via their ability to influence each of the steps outlined in Fig. 12.1. Specifically, we discuss effector domains that can: (a) interact with the basal transcriptional machinery and general co-activators, (b) interact with other TFs to allow cooperative binding, and (c) directly or indirectly recruit histone and chromatin modifying enzymes.
Eukaryotic transcriptional dynamics: from single molecules to cell populations
Transcriptional regulation is achieved through combinatorial interactions between regulatory elements in the human genome and a vast range of factors that modulate the recruitment and activity of RNA polymerase. Experimental approaches for studying transcription in vivo now extend from single-molecule techniques to genome-wide measurements. Parallel to these developments is the need for testable quantitative and predictive models for understanding gene regulation. These conceptual models must also provide insight into the dynamics of transcription and the variability that is observed at the single-cell level. In this Review, we discuss recent results on transcriptional regulation and the models those results engender. We show how a non-equilibrium description informs our view of transcription by explicitly considering time- and energy-dependence at the molecular level.
Transcriptional regulation in the nucleus is the culmination of the actions of a diverse range of factors, such as transcription factors, chromatin remodellers, polymerases, helicases, topoisomerases, kinases, chaperones, proteasomes, acetyltransferases, deacetylases and methyltransferases. Determining how these molecules work in concert in the eukaryotic nucleus to regulate genes remains a central challenge in molecular biology. Dynamics lie at the heart of this mystery. Megadalton complexes assemble and disassemble on genes within seconds1,2; nucleosome turnover ranges from minutes to hours3; and gene activity demonstrates complex temporal patterns such as oscillation and transcriptional bursting4,5. Exciting new experimental advances have enabled the study of dynamic transcriptional regulation at the single-molecule6 and genome-wide7 levels, thus enhancing our understanding of transcriptional regulation in vivo. These approaches also necessitate new models for describing gene expression. In this Review, we discuss recent in vivo results and the quantitative models that are motivated by those results.
Chromatin immunoprecipitation (ChIP) provides genome-wide occupancy profiles for chromatin-interacting factors at near base-pair resolution in populations of cells8,9. Using this approach on a genome-wide level has generated comprehensive maps of regulation on a gene-by-gene basis7,8,10. This population approach has been complemented by single-cell imaging techniques. Almost all factors that have been studied by live-cell microscopy exhibit dwell times on chromatin on the order of seconds11, and single-cell studies demonstrate great variability in gene expression among cells in a population, owing in part to the stochastic nature of transcription12. Despite these tremendous advances in understanding the behaviour of individual factors, both methods fall short of capturing the sequence of events that is required to activate or repress a gene in vivo. Ideally, the occupancy of many factors that are coincident on a single stretch of DNA would be measured to obtain a sense of the complexes and intermediates that assemble in vivo. However, this is a daunting experimental challenge. Current re-ChIP (also known as sequential ChIP) experiments usually look at two factors4,13, but it would be necessary to look at an order of magnitude more factors to begin to capture the combinatorial complexity of transcriptional regulation in metazoans4,14–16.
The gulf between actual mechanisms of transcriptional regulation and experimental capabilities could be bridged by using quantitative models of transcription. Decades of biochemical, structural and genetic data have spawned multiple models of transcriptional regulation, several of which we discuss below (FIG. 1). Even though these views are not mutually exclusive and the boundaries between them are not clear-cut, they reflect fundamental differences regarding the mechanisms of the underlying molecular processes. Currently, most quantitative theoretical models describe transcriptional regulation as an equilibrium thermodynamic phenomenon — an assumption that allows model building without explicitly considering the dynamics. Here we explain how this description is fundamentally inconsistent with the canonical view of gene regulation based on a sequential, ordered recruitment of factors, which is an example of a non-equilibrium model. In the context of a non-equilibrium model, transcriptional dynamics can exhibit a form of molecular memory, such that the future behaviour of the system depends on its history. We outline this gap between the molecular biologist's canonical view of transcription and the quantitative approaches that are often used to describe it, and we argue for a non-equilibrium view of transcriptional regulation that is informed and constrained by single-cell observations. With the ability to observe single transcription factors17 and single transcribing genes18 in living cells, new experimental and modelling possibilities are emerging for understanding transcription dynamics in vivo.
See more at: doi: 10.1038/nrg3484
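To make the equilibrium-versus-non-equilibrium distinction above concrete, here is a minimal sketch of the classic two-state "telegraph" promoter model, simulated with the Gillespie algorithm. This is not from the Review itself; the rate constants are illustrative assumptions, chosen only to produce visible bursting.

```python
import random

# Minimal two-state ("telegraph") promoter model, simulated with the
# Gillespie algorithm. All rate constants are illustrative assumptions.
K_ON, K_OFF = 0.05, 0.2   # promoter ON/OFF switching rates (per min)
K_TX, K_DEG = 5.0, 0.1    # mRNA synthesis (ON only) and decay rates

def simulate(t_end=500.0, seed=1):
    random.seed(seed)
    t, promoter_on, mrna = 0.0, False, 0
    trace = []
    while t < t_end:
        # Propensities of the four possible reactions
        rates = [
            K_ON if not promoter_on else 0.0,   # promoter turns ON
            K_OFF if promoter_on else 0.0,      # promoter turns OFF
            K_TX if promoter_on else 0.0,       # make one mRNA
            K_DEG * mrna,                       # degrade one mRNA
        ]
        total = sum(rates)
        t += random.expovariate(total)          # waiting time to next event
        r, acc, event = random.uniform(0.0, total), 0.0, 0
        for event, rate in enumerate(rates):    # choose which event fires
            acc += rate
            if r < acc:
                break
        if event == 0:
            promoter_on = True
        elif event == 1:
            promoter_on = False
        elif event == 2:
            mrna += 1
        else:
            mrna -= 1
        trace.append((round(t, 2), mrna))
    return trace

# Bursts appear as sharp rises in mRNA copy number while the promoter is ON.
print(simulate()[-5:])
```

In an equilibrium treatment only the ratios of these rates would enter; explicitly simulating the sequence of events in time, as here, is the simplest form of the time- and energy-aware description the Review argues for.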
UPDATED on 3/17/2020
From Chromatin Biology in the journal Science
Gregory D. Bowman and Sebastian Deindl
Science 04 Oct 2019:
Vol. 366, Issue 6461, pp. 35-36
DOI: 10.1126/science.aay4317
In complex organisms such as humans, a single genetic blueprint can give rise to a multitude of different cell types, from nerve to liver to muscle. Such cellular diversity relies on restricting which portions of genomic DNA are accessible and therefore can be read by cellular machinery. Ultimately, access to DNA depends on placement of a repetitive, spool-like structure called the nucleosome, the basic packaging unit of chromosomes. The nucleosome occludes two tight loops of DNA and thus represents a fundamentally repressive element. When and where nucleosomes are positioned can affect complex transcriptional programs, and therefore disruptions in the factors responsible for nucleosome positioning often result in cancers and multisystem developmental diseases. Although the mechanism of shifting nucleosomes along DNA has long proved elusive, a recent flurry of structural, biophysical, and biochemical work has revealed a core mechanistic framework explaining how nucleosomes are actively repositioned throughout the genome.
Nucleosomes are the most ubiquitous protein-DNA complexes in all eukaryotic cells. The core of each nucleosome is a symmetric, disk-like structure made of histone proteins that provides a scaffold around which two loops of the DNA helix are snugly wrapped (1). Histones are often modified through, for example, acetylation, methylation, and phosphorylation, which add an additional layer of information on top of the genetic code. This epigenetic information demarcates functionally distinct regions of the genome—for instance, whether a gene is active or designated to remain silent—for each cell type.
Owing to their extensive protein-DNA interface, nucleosomes are relatively stable structures. Active placement and reorganization of nucleosomes depend on chromatin remodelers. As the gatekeepers of nucleosome packaging, these enzymes participate both in activating and repressing gene expression. Remodelers can assemble, disassemble, and exchange histones within the nucleosome, as well as shift the position of the histone core along DNA. Acting on either face of the nucleosome disk, remodelers can move the histone core back and forth on DNA, changing which parts of DNA are exposed and which are wrapped up in the nucleosome. Increased exposure of DNA occurs when remodelers shift adjacent nucleosomes into each other, resulting in histone ejection (2).
The authors suggest some questions that might direct future research. For example, in addition to DNA geometry and energetics, to what extent is twist diffusion dependent on other characteristics of the nucleosome?
Posted in Artificial Intelligence - Breakthroughs in Theories and Technologies on May 10, 2016| Leave a Comment »
Building AI Is Hard—So Facebook Is Building AI That Builds AI
Reporter: Aviva Lev-Ari, PhD, RN
By forcing computers to do more of the grunt work, the world’s biggest tech companies are accelerating how quickly AI enters the everyday world.
Sourced through Scoop.it from: www.wired.com
See on Scoop.it – Cardiovascular and vascular imaging
In other words, for computers to get smarter faster, computers themselves must handle even more of the grunt work. The giants of the Internet are building computing systems that can test countless machine learning algorithms on behalf of their engineers, that can cycle through so many possibilities on their own. Better yet, these companies are building AI algorithms that can help build AI algorithms. No joke. Inside Facebook, engineers have designed what they like to call an “automated machine learning engineer,” an artificially intelligent system that helps create artificially intelligent systems. It’s a long way from perfection. But the goal is to create new AI models using as little human grunt work as possible.
Feeling the Flow

After Facebook’s $104 billion IPO in 2012, Hussein Mehanna and other engineers on the Facebook ads team felt added pressure to improve the company’s ad targeting, to more precisely match ads to the hundreds of millions of people using its social network. This meant building deep neural networks and other machine learning algorithms that could make better use of the vast amounts of data Facebook collects on the characteristics and behavior of those hundreds of millions of people.
SOURCE
https://www.wired.com/2016/05/facebook-trying-create-ai-can-create-ai/
Posted in Artificial Intelligence - Breakthroughs in Theories and Technologies on May 10, 2016| Leave a Comment »
The next AI is no AI
Reporter: Aviva Lev-Ari, PhD, RN
Artificial Intelligence is starting to turn invisible from the outside in — and vice versa. The exact effects and workings of AI technologies are becoming…
Sourced through Scoop.it from: techcrunch.com
See on Scoop.it – Cardiovascular and vascular imaging
Incomprehensible intelligence
As a result, we can perceive the manifestations and presentations of artificial intelligence, but the intelligence itself becomes unknowable through human senses. Currently, there are two distinct trends in this development.
First, most algorithmic systems, as well as the latest advancements in AI technologies, are black boxes: inaccessible, unfathomable, and uncontrollable to most people.
Therefore, it’s hard to perceive or assess how intelligent systems shape your life online and offline, from your latest song recommendations to your personalized insurance policy, not to mention the algorithmic stock market trading that shapes the global market economy affecting almost every aspect of modern life.
Concretely, when the actions of intelligent systems become more holistically intertwined with personal, social, cultural, political and economical systems, it becomes challenging to distinguish the exact effects or impact of the machine intelligence itself.
Second, AI technologies are becoming so complex that they are hard to understand — even for the experts designing and developing them. In his recent book, The Master Algorithm, machine learning expert Pedro Domingos points out that as far back as the 1950s, scientists created an algorithm that could do something humans couldn’t fully comprehend.
This development has not changed course; quite the contrary. At the current pace of AI development, even seasoned experts have a hard time keeping up.
Today’s various machine learning systems can already provide unexpected insights in varying fields, from personalization technologies to particle physics, from cooking recipes and outlandish game moves to crime prevention and bioengineering. Concretely, specialized systems can empower scientific discoveries in biology or help you choose the best route to your next meeting.
SOURCE
This is a lovely method and should find wide applicability in many settings, especially for microorganisms and cell lines. However, it is not clear that this approach will be, as implied by the discussion, an efficient mapping method for all multicellular organisms. I have performed similar experiments in Drosophila, focused on meiotic recombination, on a much smaller scale, and found that CRISPR-Cas9 can indeed generate targeted recombination at gRNA target sites. In every case I tested, I found that the recombination event was associated with a deletion at the gRNA site, which is probably unimportant for most mapping efforts but may be a concern in some specific cases, for example, in clinical applications. It would be interesting to know how often mutations occurred at the targeted gRNA site in this study.
The wider issue, however, is whether CRISPR-mediated recombination will be more efficient than other methods of mapping. After careful consideration of all the costs and the time involved in each of the steps for Drosophila, we have decided that targeted meiotic recombination using flanking visible markers will be, in most cases, considerably more efficient than CRISPR-mediated recombination. This is mainly due to the large expense of injecting embryos and the extensive effort and time required to screen injected animals for appropriate events. It is both cheaper and faster to generate markers (with CRISPR) and then perform a large meiotic recombination mapping experiment than it would be to generate the lines required for CRISPR-mediated recombination mapping. It is possible to dramatically reduce costs by, for example, mapping sequentially at finer resolution. But this approach would require much more time than marker-assisted mapping. If someone develops a rapid and cheap method of reliably introducing DNA into Drosophila embryos, then this calculus might change.
However, it is possible to imagine situations where CRISPR-mediated mapping would be preferable, even for Drosophila. For example, some genomic regions display extremely low or highly non-uniform recombination rates. It is possible that CRISPR-mediated mapping could provide a reasonable approach to fine mapping genes in these regions.
The authors also propose the exciting possibility that CRISPR-mediated loss of heterozygosity could be used to map traits in sterile species hybrids. It is not entirely obvious to me how this experiment would proceed, and I hope the authors can enlighten me. If we imagine driving a recombination event in the early embryo (with maternal Cas9 from one parent and gRNA from a second parent), then at best we would end up with chimeric individuals carrying mitotic clones. I don’t think one could generate diploid animals where all cells carried the same loss-of-heterozygosity event. Even if we could, this experiment would require construction of a substantial number of stable transgenic lines expressing gRNAs: mapping an ~20 Mbp chromosome arm to ~10 kb resolution would require on the order of two thousand transgenic lines (see the arithmetic sketched below). That is not an undertaking to be taken lightly. It is already possible to perform similar tests (hemizygosity tests) using D. melanogaster deficiency lines in crosses with D. simulans, so perhaps CRISPR-mediated LOH could complement these deficiency screens for fine mapping efforts. But, at the moment, it is not clear to me how to do the experiment.
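For completeness, the two-thousand-line estimate is just the arm length divided by the desired mapping resolution, a back-of-the-envelope check using the figures stated above:

```latex
% Lines needed to tile a chromosome arm with one gRNA insertion
% per mapping interval (figures taken from the paragraph above).
\[
N_{\text{lines}} \approx \frac{\text{arm length}}{\text{resolution}}
                 = \frac{2 \times 10^{7}\ \text{bp}}{1 \times 10^{4}\ \text{bp}}
                 = 2 \times 10^{3}
\]
```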