
Posts Tagged ‘multivariate statistical analysis’


mRNA Data Survival Analysis

Curators: Larry H. Bernstein, MD, FCAP and Aviva Lev-Ari, PhD, RN

 

 

SURVIV for survival analysis of mRNA isoform variation

Shihao Shen, Yuanyuan Wang, Chengyang Wang, Ying Nian Wu & Yi Xing
Nature Communications 7, Article number: 11548
Feb 2016      doi:10.1038/ncomms11548

The rapid accumulation of clinical RNA-seq data sets has provided the opportunity to associate mRNA isoform variations to clinical outcomes. Here we report a statistical method SURVIV (Survival analysis of mRNA Isoform Variation), designed for identifying mRNA isoform variation associated with patient survival time. A unique feature and major strength of SURVIV is that it models the measurement uncertainty of mRNA isoform ratio in RNA-seq data. Simulation studies suggest that SURVIV outperforms the conventional Cox regression survival analysis, especially for data sets with modest sequencing depth. We applied SURVIV to TCGA RNA-seq data of invasive ductal carcinoma as well as five additional cancer types. Alternative splicing-based survival predictors consistently outperform gene expression-based survival predictors, and the integration of clinical, gene expression and alternative splicing profiles leads to the best survival prediction. We anticipate that SURVIV will have broad utilities for analysing diverse types of mRNA isoform variation in large-scale clinical RNA-seq projects.

Eukaryotic cells generate remarkable regulatory and functional complexity from a finite set of genes. Production of mRNA isoforms through alternative processing and modification of RNA is essential for generating this complexity. A prevalent mechanism for producing mRNA isoforms is the alternative splicing of precursor mRNA1. Over 95% of the multi-exon human genes undergo alternative splicing2, 3, resulting in an enormous level of plasticity in the regulation of gene function and protein diversity. In the last decade, extensive genomic and functional studies have firmly established the critical role of alternative splicing in cancer4, 5, 6. Alternative splicing is involved in a full spectrum of oncogenic processes including cell proliferation, apoptosis, hypoxia, angiogenesis, immune escape and metastasis7, 8. These cancer-associated alternative splicing patterns are not merely the consequences of disrupted gene regulation in cancer but in numerous instances actively contribute to cancer development and progression. For example, alternative splicing of genes encoding the Bcl-2 family of apoptosis regulators generates both anti-apoptotic and pro-apoptotic protein isoforms9. Alternative splicing of the pyruvate kinase M (PKM) gene has a significant impact on cancer cell metabolism and tumour growth10. A transcriptome-wide switch of the alternative splicing programme during the epithelial–mesenchymal transition plays an important role in cancer cell invasion and metastasis11, 12.

RNA sequencing (RNA-seq) has become a popular and cost-effective technology to study transcriptome regulation and mRNA isoform variation13, 14. As the cost of RNA-seq continues to decline, it has been widely adopted in large-scale clinical transcriptome projects, especially for profiling transcriptome changes in cancer. For example, as of April 2015 The Cancer Genome Atlas (TCGA) consortium had generated RNA-seq data on over 11,000 cancer patient specimens from 34 different cancer types. Within the TCGA data, breast invasive carcinoma (BRCA) has the largest sample size of RNA-seq data covering over 1,000 patients, and clinical information such as survival times, tumour stages and histological subtypes is available for the majority of the BRCA patients15. Moreover, the median follow-up time of BRCA patients is ~400 days, and 25% of the patients have more than 1,200 days of follow-up. Collectively, the large sample size and long follow-up time of the TCGA BRCA data set allow us to correlate genomic and transcriptomic profiles to clinical outcomes and patient survival times.

To date, systematic analyses have been performed to reveal the association between copy number variation, DNA methylation, gene expression and microRNA expression profiles with cancer patient survival16, 17. By contrast, despite the importance of mRNA isoform variation and alternative splicing, there have been limited efforts in transcriptome-wide survival analysis of alternative splicing in cancer patients. Most RNA-seq studies of alternative splicing in cancer transcriptomes focus on identifying ‘cancer-specific’ alternative splicing events by comparing cancer tissues with normal controls (see refs 18, 19, 20, 21, 22, 23 for examples). A recent analysis of TCGA RNA-seq data identified 163 recurrent differential alternative splicing events between cancer and normal tissues of three cancer types, among which five were found to have suggestive survival signals for breast cancer at a nominal P-value cutoff of 0.05 (ref. 21). Some other studies reported a significant survival difference between cancer patient subgroups after stratifying patients with overall mRNA isoform expression profiles24, 25. However, systematic cancer survival analyses of alternative splicing at the individual exon resolution have been lacking. Two main challenges exist for survival analyses of mRNA isoform variation and alternative splicing using RNA-seq data. The first challenge is to account for the estimation uncertainty of mRNA isoform ratios inferred from RNA-seq read counts. The statistical confidence of mRNA isoform ratio estimation depends on the RNA-seq read coverage for the events of interest, with larger read coverage leading to a more reliable estimation14. Modelling the estimation uncertainty of mRNA isoform ratio is an essential component of RNA-seq analyses of alternative splicing, as shown by various statistical algorithms developed for detecting differential alternative splicing from multi-group RNA-seq data14, 26, 27, 28,29. 
The second challenge, which is a general issue in survival analysis, is to properly model the association of mRNA isoform ratio with survival time, while accounting for missing data in survival time because of censoring, that is, patients still alive at the end of the survival study, whose precise survival time would be uncertain. To date, no algorithm has been developed for survival analyses of mRNA isoform variation that accounts for these sources of uncertainty simultaneously.

Here we introduce SURVIV (Survival analysis of mRNA Isoform Variation), a statistical model for identifying mRNA isoform ratios associated with patient survival times in large-scale cancer RNA-seq data sets. SURVIV models the estimation uncertainty of mRNA isoform ratios in RNA-seq data and tests the survival effects of isoform variation in both censored and uncensored survival data. In simulation studies, SURVIV consistently outperforms the conventional Cox regression survival analysis that ignores the measurement uncertainty of mRNA isoform ratio. We used SURVIV to identify alternatively spliced exons whose exon-inclusion levels significantly correlated with the survival times of invasive ductal carcinoma (IDC) patients from the TCGA breast cancer cohort. Survival-associated alternative splicing events are identified in gene pathways associated with apoptosis, oxidative stress and DNA damage repair. Importantly, we show that alternative splicing-based survival predictors outperform gene expression-based survival predictors in the TCGA IDC RNA-seq data set, as well as in TCGA data of five additional cancer types. Moreover, the integration of clinical information, gene expression and alternative splicing profiles leads to the best prediction of survival time.

SURVIV statistical model

The statistical model of SURVIV assesses the association between mRNA isoform ratio and patient survival time. While the model is generic for many types of alternative isoform variation, here we use the exon-skipping type of alternative splicing to illustrate the model (Fig. 1a). For each alternative exon involved in exon-skipping, we can use the RNA-seq reads mapping to its exon-inclusion or -skipping isoform to estimate its exon-inclusion level (denoted as ψ, or PSI, that is, Per cent Spliced In14). A key feature of SURVIV is that it models the RNA-seq estimation uncertainty of exon-inclusion level as influenced by the sequencing coverage for the alternative splicing event of interest. This is a critical issue in accurate quantitative analyses of mRNA isoform ratio in large-scale RNA-seq data sets14, 26, 27, 28, 29. Therefore, SURVIV contains two major components: the first to model the association of mRNA isoform ratio with patient survival time across all patients, and the second to model the estimation uncertainty of mRNA isoform ratio in each individual patient (Fig. 1a).

Figure 1: The statistical framework of the SURVIV model.

(a) For each patient k, the patient’s hazard rate λk(t) is associated with the baseline hazard rate λ0(t) and this patient’s exon-inclusion level ψk. The association of exon-inclusion level with patient survival is estimated by the survival coefficient β. The exon-inclusion level ψk is estimated from the read counts for the exon-inclusion isoform ICk and the exon-skipping isoform SCk. The proportion of the inclusion and skipping reads is adjusted by a normalization function f that considers the lengths of the exon-inclusion and -skipping isoforms (see details in Results and Supplementary Methods). (b) A hypothetical example to illustrate the association of exon-inclusion level with patient survival probability over time Sk(t), with the survival coefficient β=−1 and a constant baseline hazard rate λ0(t)=1. In this example, patients with higher exon-inclusion levels have lower hazard rates and higher survival probabilities. (c) The schematic diagram of an exon-skipping event. The exon-inclusion reads ICk are the reads from the upstream splice junction, the alternative exon itself and the downstream splice junction. The exon-skipping reads SCk are the reads from the skipping splice junction that directly connects the upstream exon to the downstream exon.

Briefly, for any individual exon-skipping event, the first component of SURVIV uses a proportional hazards model to establish the relationship between patient k’s exon-inclusion level ψk and hazard rate λk(t):

$$\lambda_k(t) = \lambda_0(t)\,\exp(\beta\,\psi_k)$$

For each exon, the association between the exon-inclusion level and patient survival time is reflected by the survival coefficient β. A positive β means increased exon inclusion is associated with higher hazard rate and poorer survival, while a negative β means increased exon inclusion is associated with lower hazard rate and better survival. λ0(t) is the baseline hazard rate estimated from the survival data of all patients (see Supplementary Methods for the detailed estimation procedure). A particular patient’s survival probability over time Sk(t) can be calculated from the patient-specific hazard rate λk(t) as $S_k(t) = \exp\left(-\int_0^t \lambda_k(u)\,du\right)$. Figure 1b illustrates a simple example with a negative β=−1 and a constant baseline hazard rate λ0(t)=1, where higher exon-inclusion levels are associated with lower hazard rates and higher survival probabilities.
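The hypothetical example of Fig. 1b can be reproduced numerically. The sketch below assumes the standard proportional hazards form, λk(t)=λ0(t)·exp(βψk), and a baseline hazard that is constant in time, so that the survival probability reduces to exp(−λk·t). The function names are illustrative and are not taken from the SURVIV source code.

```python
import math


def hazard_rate(psi, beta, baseline_hazard=1.0):
    """Proportional hazards: lambda_k = lambda_0 * exp(beta * psi_k)."""
    return baseline_hazard * math.exp(beta * psi)


def survival_probability(t, psi, beta, baseline_hazard=1.0):
    """S_k(t) = exp(-lambda_k * t); valid only for a time-constant hazard."""
    return math.exp(-hazard_rate(psi, beta, baseline_hazard) * t)


# Setting of Fig. 1b: beta = -1, constant baseline hazard of 1.
for psi in (0.1, 0.5, 0.9):
    lam = hazard_rate(psi, beta=-1.0)
    s1 = survival_probability(1.0, psi, beta=-1.0)
    print(f"psi={psi:.1f}  hazard={lam:.3f}  S(t=1)={s1:.3f}")
```

With a negative β, the printed hazard falls and the survival probability rises as ψ increases, which is the pattern described for Fig. 1b.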

The second component of SURVIV models the exon-inclusion level and its estimation uncertainty in individual patient samples. As illustrated in Fig. 1c, the exon-inclusion level ψk of a given exon in a particular sample can be estimated by the RNA-seq read count specific to the exon-inclusion isoform (ICk) and the exon-skipping isoform (SCk). Other types of alternative splicing and mRNA isoform variation can be similarly modelled by this framework29. Given the effective lengths (that is, the number of unique isoform-specific read positions) of the exon-inclusion isoform (lI) and the exon-skipping isoform (lS), the exon-inclusion level ψk can be estimated as $\hat{\psi}_k = \dfrac{IC_k/l_I}{IC_k/l_I + SC_k/l_S}$. Assuming that the exon-inclusion read count ICk follows a binomial distribution with the total read count nk=ICk+SCk, we have:

$$IC_k \sim \mathrm{Binomial}\bigl(n_k,\ p_k = f(\psi_k)\bigr), \qquad f(\psi_k) = \frac{l_I\,\psi_k}{l_I\,\psi_k + l_S\,(1-\psi_k)}$$

The binomial distribution models the estimation uncertainty of ψk as influenced by the total read count nk, in which the parameter pk represents the proportion of reads from the exon-inclusion isoform, given the exon-inclusion level ψk adjusted by a length normalization function f(ψk) based on the effective lengths of the isoforms. The definitions of effective lengths for all basic types of alternative splicing patterns are described in ref. 29.
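To make the role of read coverage concrete, here is a minimal sketch of the length-normalized estimate of ψ and of the binomial log-likelihood of the inclusion read count. It assumes that the normalization function has the form f(ψ)=lI·ψ/(lI·ψ+lS·(1−ψ)), which is the form that makes the length-normalized estimate the maximum-likelihood value; the function names and the example counts are hypothetical.

```python
import math


def estimate_psi(ic, sc, len_inc, len_skp):
    """Length-normalized exon-inclusion level from inclusion/skipping read counts."""
    inc_density = ic / len_inc
    skp_density = sc / len_skp
    if inc_density + skp_density == 0:
        raise ValueError("no reads cover this event")
    return inc_density / (inc_density + skp_density)


def inclusion_read_proportion(psi, len_inc, len_skp):
    """Assumed f(psi): expected share of reads that come from the inclusion isoform."""
    return len_inc * psi / (len_inc * psi + len_skp * (1.0 - psi))


def binomial_log_likelihood(ic, sc, psi, len_inc, len_skp):
    """log P(IC = ic | n = ic + sc, p = f(psi)) for 0 < psi < 1."""
    if not 0.0 < psi < 1.0:
        raise ValueError("psi must lie strictly between 0 and 1")
    p = inclusion_read_proportion(psi, len_inc, len_skp)
    n = ic + sc
    log_choose = math.lgamma(n + 1) - math.lgamma(ic + 1) - math.lgamma(sc + 1)
    return log_choose + ic * math.log(p) + sc * math.log(1.0 - p)


# Same read ratio at low and high coverage: the likelihood penalizes a wrong
# psi far more strongly when many reads support the estimate.
for ic, sc in ((2, 1), (200, 100)):
    best = binomial_log_likelihood(ic, sc, 0.5, 2, 1)
    off = binomial_log_likelihood(ic, sc, 0.3, 2, 1)
    print(ic + sc, "reads: log-likelihood drop at psi=0.3 is", round(best - off, 3))
```

The two coverage levels give the same point estimate of ψ, but the log-likelihood is much flatter at low coverage, which is the uncertainty that a point-estimate analysis discards.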

Distinct from conventional survival analyses in which predictors do not have estimation uncertainty, the predictors in SURVIV are exon-inclusion levels ψk estimated from RNA-seq count data, and the confidence of ψk estimate for a given exon in a particular sample depends on the RNA-seq read coverage. We use the statistical framework of survival measurement error model30 to incorporate the estimation uncertainty of isoform ratio in the proportional hazards model. Using a likelihood ratio test, we test whether the exon-inclusion levels have a significant association with patient survival over the null hypothesis H0:β=0. The false discovery rate (FDR) is estimated using the Benjamini and Hochberg approach31. Details of the parameter estimation and likelihood ratio test in SURVIV are described in Supplementary Methods.
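The testing step can be sketched as follows. This is an illustration only: it assumes the likelihood ratio statistic is referred to a chi-square distribution with one degree of freedom (the usual asymptotic choice when testing a single coefficient; the actual procedure is in the Supplementary Methods), and it implements the Benjamini and Hochberg adjustment directly. The function names are hypothetical.

```python
import math


def lrt_p_value(loglik_null, loglik_alt):
    """P-value of a likelihood ratio test with 1 degree of freedom (assumed)."""
    statistic = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # For a chi-square variable with 1 d.f., P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(lrt_p_value(loglik_null=-120.0, loglik_alt=-115.0))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

An exon would then be called significant at a given FDR level if its adjusted value falls at or below that level.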

 

Figure 2: Simulation studies to assess the performance of SURVIV and the importance of modelling the estimation uncertainty of mRNA isoform ratio.

We compared our SURVIV model with Cox regression using point estimates of exon-inclusion levels, which does not consider the estimation uncertainty of the mRNA isoform ratio. (a) To study the effect of RNA-seq depth, we simulated the mean total splice junction read counts equal to 5, 10, 20, 50, 80 and 100 reads. We generated two sets of simulations with and without data-censoring. For each simulation, the true-positive rate (TPR) at 5% false-positive rate is plotted. The inset figure shows the empirical distribution of the mean total splice junction read counts in the TCGA IDC RNA-seq data (x axis in the log10 scale). (b) To faithfully represent the read count distribution in a real data set, we performed another simulation with read counts directly sampled from the TCGA IDC data. Sampled read counts were then multiplied by different factors ranging from 10 to 300% to simulate data sets with different RNA-seq read depth. Continuous and dashed lines represent the performance of SURVIV and Cox regression, respectively. Red lines represent the area under curve (AUC) of the ROC curve (TPR versus false-positive rate plot). Black lines represent the TPR at 5% false-positive rate.

 

Using these simulated data, we compared SURVIV with Cox regression in two settings, without or with censoring of the survival time. In the setting without censoring, the death and survival time of each individual is known. In the setting with censoring, certain individuals are still alive at the end of the survival study. Consequently, these patients have unknown death and survival time. Here, in the simulation with censoring, we assumed that 85% of the patients were still alive at the end of the study, similar to the censoring rate of the TCGA IDC data set. In both settings and with different depths of RNA-seq coverage, SURVIV consistently outperformed Cox regression in the true-positive rate at the same false-positive rate of 5% (Fig. 2a). As expected, we observed a more significant improvement in SURVIV over Cox regression when the RNA-seq read coverage was low (Fig. 2a).
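The censored setting can be imitated with a small simulation. The sketch below is not the authors' simulation procedure: it assumes exponentially distributed death times under a proportional hazards model and a single fixed study end (administrative censoring), with the study end chosen so that roughly 85% of patients are still alive. All names and parameter values are illustrative.

```python
import math
import random


def simulate_survival(psis, beta, baseline_hazard, study_end, rng):
    """Return (observed_time, death_observed) for each patient.

    Death times are exponential with rate lambda_0 * exp(beta * psi);
    patients still alive at study_end are censored at study_end.
    """
    records = []
    for psi in psis:
        rate = baseline_hazard * math.exp(beta * psi)
        death_time = rng.expovariate(rate)
        death_observed = death_time <= study_end
        records.append((min(death_time, study_end), death_observed))
    return records


rng = random.Random(0)
psis = [rng.random() for _ in range(1000)]
# With beta = 0 and unit baseline hazard, P(alive at T) = exp(-T) = 0.85.
study_end = -math.log(0.85)
records = simulate_survival(psis, beta=0.0, baseline_hazard=1.0,
                            study_end=study_end, rng=rng)
censored = sum(1 for _, observed in records if not observed)
print("censored fraction:", censored / len(records))
```

For a censored patient only a lower bound on the survival time is known, which is why such records cannot simply be treated as deaths at the study end.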

To more faithfully recapitulate the read count distribution in a real cancer RNA-seq data set, we performed another simulation study with read counts directly sampled from the TCGA IDC data. To assess the influence of RNA-seq read depth on the performance of SURVIV and Cox regression, sampled read counts were then multiplied by different factors ranging from 10 to 300% to simulate data sets with different RNA-seq read depths (Fig. 2b). The TCGA IDC data set has an average RNA-seq depth of ~60 million paired-end reads per patient. Thus, the read depth of these simulated RNA-seq data sets ranged from ~6 million reads to 180 million reads per patient, representing low-coverage RNA-seq studies designed primarily for gene expression analysis32 up to high-coverage RNA-seq studies designed primarily for alternative isoform analysis29. At all levels of RNA-seq depth, SURVIV consistently outperformed Cox regression, as reflected by the area under curve of the receiver operating characteristic (ROC) curve as well as the true-positive rate at 5% false-positive rate (Fig. 2b). The improvement of SURVIV over Cox regression was particularly prominent when the read depth was low. For example, at 10% read depth, SURVIV had 7% improvement in area under curve (68% versus 61%) and 8% improvement in the true-positive rate at 5% false-positive rate (46% versus 38%). Collectively, these simulation results suggest that SURVIV achieves a higher accuracy by accounting for the estimation uncertainty of mRNA isoform ratio in RNA-seq data.
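The two performance measures used in these comparisons can be computed from per-exon scores as sketched below. The sketch assumes that each simulated exon has a score for which larger means more significant (for example, −log10 of the P-value) and that the simulation records which exons are truly associated with survival. The function names are hypothetical.

```python
def tpr_at_fpr(null_scores, alt_scores, fpr=0.05):
    """True-positive rate at the threshold that admits at most `fpr` of the nulls."""
    ranked = sorted(null_scores, reverse=True)
    allowed = int(fpr * len(ranked) + 1e-9)   # tolerated number of false positives
    allowed = min(allowed, len(ranked) - 1)
    threshold = ranked[allowed]               # calls are scores strictly above this
    return sum(1 for s in alt_scores if s > threshold) / len(alt_scores)


def auc(null_scores, alt_scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    wins = 0.0
    for a in alt_scores:
        for n in null_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(alt_scores) * len(null_scores))


null_scores = [i / 100 for i in range(100)]   # exons with no survival effect
alt_scores = [0.5, 0.945, 0.995, 2.0]         # exons with a true survival effect
print(tpr_at_fpr(null_scores, alt_scores), auc(null_scores, alt_scores))
```

The pairwise AUC is quadratic in the number of exons, which is acceptable for a sketch; a rank-based formula would be preferable for transcriptome-wide simulations.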

SURVIV analysis of TCGA IDC breast cancer data

To illustrate the practical utility of SURVIV, we used it to analyse the overall survival time of 682 IDC patients from the TCGA breast cancer (BRCA) RNA-seq data set (see Methods for details of the data source and processing pipeline). We chose to analyse IDC because it is the most frequent type of breast cancer33, comprising ~70% of patients in the TCGA breast cancer data set. To control for the effects of significant clinical parameters such as tumour stage and subtype and identify alternative splicing events associated with patient outcomes across multiple molecular and clinical subtypes, we followed the procedure of Croce and colleagues in analysing mRNA and microRNA prognostic signature of IDC33 and stratified the patients according to their clinical parameters. We then conducted SURVIV analysis in 26 clinical subgroups with at least 50 patients in each subgroup. We identified 229 exon-skipping events associated with patient survival in multiple clinical subgroups that met the criteria of SURVIV P-value≤0.01 in at least two subgroups of the same clinical parameter (cancer subtype, stage, lymph node, metastasis, tumour size, oestrogen receptor status, progesterone receptor status, HER2 status and age as shown in Fig. 3). DAVID (Database for Annotation, Visualization and Integrated Discovery) Gene Ontology analyses34 of the 229 alternative splicing events suggest an enrichment of genes in cancer-related functional categories such as intracellular signalling, apoptosis, oxidative stress and response to DNA damage (Supplementary Fig. 1). Table 1 shows a few selected examples of survival-associated alternative splicing events in cancer-related genes. Using two-means clustering of each individual exon’s inclusion levels, the 682 IDC patients can be segregated into two subgroups with significantly different survival times as illustrated by the Kaplan–Meier survival plot (Fig. 4). 
We also carried out hierarchical clustering of IDC patients using 176 survival-associated alternative exons (P≤0.01; SURVIV analysis of all IDC patients). Using the exon-inclusion levels of these 176 exons, we clustered IDC patients into three major subgroups, with 95, 194 and 389 patients, respectively. As illustrated by the Kaplan–Meier survival plots, the three subgroups had significantly different survival times (Supplementary Fig. 2).
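Two-means clustering of a single exon's inclusion levels is a one-dimensional k-means with k=2. A minimal sketch follows; the initialization at the minimum and maximum values is an assumption made here for determinism, and the inclusion levels are made-up numbers, not TCGA data.

```python
def two_means(values, max_iterations=100):
    """Split 1-D values into a low and a high cluster (k-means with k = 2).

    Returns (labels, (low_centre, high_centre)); label 1 marks the high cluster.
    """
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("all values are identical; nothing to split")
    for _ in range(max_iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):
            break                               # assignments have stabilized
        low, high = new_low, new_high
    labels = [int(abs(v - low) > abs(v - high)) for v in values]
    return labels, (low, high)


inclusion_levels = [0.95, 0.96, 0.94, 0.85, 0.84, 0.86]
labels, centres = two_means(inclusion_levels)
print(labels, centres)
```

The two resulting patient groups are what a Kaplan–Meier plot would then compare.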

Figure 3: SURVIV analysis of exon-skipping events in the TCGA IDC RNA-seq data set.

IDC patients are stratified into multiple clinical subgroups based on clinical parameters including cancer subtype, stage, lymph node status, metastasis, tumour size, oestrogen receptor status, progesterone receptor status, HER2 status and age. Only clinical subgroups with at least 50 patients are included in further analyses. Numbers of patients in the subgroups are indicated next to the names of the subgroups. Shown in the heatmap are the log10 SURVIV P-values of the 229 exons associated with patient survival (P≤0.01) in at least two subgroups of the same class of clinical parameters. Turquoise colour indicates positive correlation that higher exon-inclusion levels are associated with higher survival probabilities. Magenta colour indicates negative correlation that lower exon-inclusion levels are associated with higher survival probabilities.

TABLE 1 (not shown)

Figure 4: Kaplan–Meier survival plots of IDC patients stratified by two-means clustering of the exon-inclusion levels of four survival-associated alternative splicing events.

Clustering was generated for each of the four exons separately. Black lines represent patients with high exon-inclusion levels. Red lines represent patients with low exon-inclusion levels. The P-values are from SURVIV analysis of the TCGA IDC RNA-seq data. (a) ATRIP. (b) BCL2L11. (c) CD74. (d) PCBP4.
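For readers unfamiliar with how such curves are built, the Kaplan–Meier estimator can be sketched in a few lines. This is the textbook product-limit estimator rather than code from the paper, and the example records are invented.

```python
def kaplan_meier(records):
    """Product-limit survival estimate.

    records: iterable of (time, death_observed); censored patients have
    death_observed False. Returns [(time, survival_probability)] at each
    time where at least one death occurred.
    """
    records = list(records)
    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in records}):
        deaths = sum(1 for time, observed in records if time == t and observed)
        leaving = sum(1 for time, _ in records if time == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # both deaths and censored patients leave the risk set
    return curve


# Four patients; the one followed for 2 time units is censored.
print(kaplan_meier([(1, True), (2, False), (3, True), (4, True)]))
```

Running the estimator separately on the high- and low-inclusion groups gives the two curves drawn in each panel.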

 

Figure 5: Alternative splicing of STAT5A exon 5 is significantly associated with IDC patient survival.

(a) The gene structure of the STAT5A full-length isoform compared to the ΔEx5 isoform skipping the 5th exon. (b) Kaplan–Meier survival plot of IDC patients stratified by two-means clustering using exon-inclusion levels of STAT5A exon 5. The 420 patients in Group 1 (average exon 5 inclusion level=95%) have significantly higher survival probabilities than the 262 patients in Group 2 (average exon 5 inclusion level=85%) (SURVIV P=6.8e−4). (c) Exon 5 inclusion levels of IDC patients stratified by two-means clustering using exon 5 inclusion levels. Group 1 has 420 patients with average exon-inclusion level at 95%. Group 2 has 262 patients with average exon-inclusion level at 85%. (d) STAT5A exon 5 inclusion levels in normal breast tissues versus breast cancer tumour samples. Exon-inclusion levels are extracted from 86 TCGA breast cancer patients with matched normal and tumour samples. Normal breast tissues have average exon 5 inclusion level at 95%, compared to 91% average exon-inclusion level in tumour samples. Error bars represent 95% confidence interval of the mean.

Network of survival-associated alternative splicing events

…see http://www.nature.com/ncomms/2016/160609/ncomms11548/full/ncomms11548.html

Figure 6: Splicing factor regulatory network of survival-associated alternative splicing events in IDC.

(a–c) Kaplan–Meier survival plots of IDC patients stratified by the gene expression levels of three splicing factors: TRA2B (a, Cox regression P=1.8e−4), HNRNPH1 (b, P=3.4e−4) and SFRS3 (c, P=2.8e−3). Black lines represent patients with high gene expression levels. Red lines represent patients with low gene expression levels. (d) The exon-inclusion levels of a DHX30 alternative exon are negatively correlated with TRA2B gene expression levels (robust correlation coefficient r=−0.26, correlation P=1.2e−17). (e) The exon-inclusion levels of a MAP3K4 alternative exon are positively correlated with HNRNPH1 gene expression levels (robust correlation coefficient r=0.16, correlation P=2.6e−06). (f) A splicing co-expression network of the three splicing factors and their correlated survival-associated alternative exons. In total, 84 survival-associated alternative exons are significantly correlated with the three splicing factors. The positive/negative correlation between splicing factors and alternative exons is represented by blue/red lines, respectively. Exons whose inclusion levels are positively/negatively correlated with survival times are represented by blue/red dots, respectively. The size of the splicing factor circles is proportional to the number of correlated exons within the network.

…..

Alternative splicing predictors of cancer patient survival

see http://www.nature.com/ncomms/2016/160609/ncomms11548/full/ncomms11548.html

Figure 7: Cross-validation of different classes of IDC survival predictors measured by the C-index

A C-index of 1 indicates perfect prediction accuracy and a C-index of 0.5 indicates random guess. The plots indicate the distribution of C-indexes from 100 rounds of cross-validation. The centre value of the box plot is the median C-index from 100 rounds of cross-validation. The notch represents the 95% confidence interval of the median. The box represents the 25 and 75% quantiles. The whiskers extending out from the box represent the 5 and 95% quantiles. Two-sided Wilcoxon test was used to compare different survival predictors. The different classes of predictors are: (a) clinical information (median C-index 0.67). (b) Gene expression (median C-index 0.68). (c) Alternative splicing (median C-index 0.71). (d) Clinical information+gene expression (median C-index 0.69). (e) Clinical information+alternative splicing (median C-index 0.73). (f) Clinical information+gene expression+alternative splicing (median C-index 0.74). Note that ‘Gene’ refers to ‘Gene-level expression’ in these plots.
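The C-index can be illustrated with a short sketch. The version below is Harrell's concordance index, assumed here as the definition in use since the excerpt does not spell it out: among patient pairs whose ordering is known despite censoring, it is the fraction in which the patient with the higher predicted risk died first. The function name and inputs are hypothetical.

```python
def concordance_index(times, death_observed, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i's death was observed and
    occurred before patient j's last follow-up time. Ties in the predicted
    risk count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not death_observed[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]
deaths = [True, True, True, True]
print(concordance_index(times, deaths, [4, 3, 2, 1]))   # higher risk dies earlier
print(concordance_index(times, deaths, [1, 1, 1, 1]))   # uninformative predictor
```

In a cross-validation setting the risk scores would come from a model fitted on the training folds and the index would be computed on the held-out patients.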

Next, we carried out the SURVIV analysis in five additional cancer types in TCGA, including GBM (glioblastoma multiforme), KIRC (kidney renal clear cell carcinoma), LGG (lower grade glioma), LUSC (lung squamous cell carcinoma) and OV (ovarian serous cystadenocarcinoma). As expected, the number of significant events at different FDR or P-value significance cutoffs varied across cancer types, with LGG having the strongest survival-associated alternative splicing signals with 660 significant exon-skipping events at FDR≤5% (Supplementary Data 3 and 4). Strikingly, regardless of the number of significant events, alternative splicing-based survival predictors outperformed gene expression-based survival predictors across all cancer types (Supplementary Fig. 3), consistent with our initial observation on the IDC data set.

 

Alternative processing and modification of mRNA, such as alternative splicing, allow cells to generate a large number of mRNA and protein isoforms with diverse regulatory and functional properties. The plasticity of alternative splicing is often exploited by cancer cells to produce isoform switches that promote cancer cell survival, proliferation and metastasis7, 8. The widespread use of RNA-seq in cancer transcriptome studies15, 47, 48 has provided the opportunity to comprehensively elucidate the landscape of alternative splicing in cancer tissues. While existing studies of alternative splicing in large-scale cancer transcriptome data largely focused on the comparison of splicing patterns between cancer and normal tissues or between different subtypes of cancer18, 21, 49, additional computational tools are needed to characterize the clinical relevance of alternative splicing using massive RNA-seq data sets, including the association of alternative splicing with phenotypes and patient outcomes.

We have developed SURVIV, a novel statistical model for survival analysis of alternative isoform variation using cancer RNA-seq data. SURVIV uses a survival measurement error model to simultaneously model the estimation uncertainty of mRNA isoform ratio in individual patients and the association of mRNA isoform ratio with survival time across patients. Compared with the conventional Cox regression model that uses each patient’s mRNA isoform ratio as a point estimate, SURVIV achieves a higher accuracy as indicated by simulation studies under a variety of settings. Of note, we observed a particularly marked improvement of SURVIV over Cox regression for low- and moderate-depth RNA-seq data (Fig. 2b). This has important practical value because many clinical RNA-seq data sets have large sample size but relatively modest sequencing depth.

Using the TCGA IDC breast cancer RNA-seq data of 682 patients, SURVIV identified 229 alternative splicing events associated with patient survival time, which met the criteria of SURVIV P-values≤0.01 in multiple clinical subgroups. While the statistical threshold seemed loose, several lines of evidence suggest the functional and clinical relevance of these survival-associated alternative splicing events. These alternative splicing events were frequently identified and enriched in the gene functional groups important for cancer development and progression, including apoptosis, DNA damage response and oxidative stress. While some of these events may simply reflect correlation but not causal effect on cancer patient survival, other events may play an active role in regulating cancer cell phenotypes. For example, a survival-associated alternative splicing event involving exon 5 of STAT5A is known to regulate the activity of this transcription factor with important roles in epithelial cell growth and apoptosis37. Using a co-expression network analysis of splicing factor to exon correlation across all patients, we identified three splicing factors (TRA2B, HNRNPH1 and SFRS3) as potential hubs of the survival-associated alternative splicing network of IDC. The expression levels of all three splicing factors were negatively associated with patient survival times (Fig. 6a–c), and both TRA2B and HNRNPH1 were previously reported to have an impact on cancer-related molecular pathways40, 41, 42, 43, 44, 45. Finally, despite the limited power in detecting individual events, we show that the survival-associated alternative splicing events can be used to construct a predictor for patient survival, with an accuracy higher than predictors based on clinical parameters or gene expression profiles (Fig. 7). This further demonstrates the potential biological relevance and clinical utility of the identified alternative splicing events.

We performed cross-validation analyses to evaluate and compare the prognostic value of alternative splicing, gene expression and clinical information for predicting patient survival, either independently or in combination. As expected, the combined use of all three types of information led to the best prediction accuracy. Because we used penalized regression to build the prediction model, combining information from multiple layers of data did not necessarily increase the number of predictors in the model. The perhaps more surprising and intriguing result is that alternative splicing-based predictors appear to outperform gene expression-based predictors when used alone and when either type of data was combined with clinical information (Fig. 7). We observed the same trend in five additional cancer types (Supplementary Fig. 3). We note that this finding was consistent with a previous report that cancer subtype classification based on splicing isoform expression performed better than gene expression-based classification25. While this trend seems counterintuitive because accurate estimation of gene expression requires much lower RNA-seq depth than accurate estimation of alternative splicing29, one possible explanation may be the inherent characteristic of isoform ratio data. By definition, mRNA isoform ratio is estimated as the ratio of multiple mRNA isoforms from a single gene. Therefore, mRNA isoform ratio data have a ‘built-in’ internal control that could be more robust against certain artefacts and confounding issues that influence gene expression estimates across large clinical RNA-seq data sets, such as poor sample quality and RNA degradation12. Regardless of the reasons, our data call for further studies to fully explore the utility of mRNA isoform ratio data for various clinical research applications.

The SURVIV source code is available for download at https://github.com/Xinglab/SURVIV. SURVIV is a general statistical model for survival analysis of mRNA isoform ratio using RNA-seq data. The current statistical framework of SURVIV is applicable to RNA-seq based count data for all basic types of alternative splicing patterns involving two isoform choices from an alternatively spliced region, such as exon-skipping, alternative 5′ splice sites, alternative 3′ splice sites, mutually exclusive exons and retained introns, as well as other forms of alternative isoform variation such as RNA editing. With the rapid accumulation of clinical RNA-seq data sets, SURVIV will be a useful tool for elucidating the clinical relevance and potential functional significance of alternative isoform variation in cancer and other diseases.

 



Genetic Analysis of Atrial Fibrillation

Author and Curator: Larry H Bernstein, MD, FCAP, and Curator: Aviva Lev-Ari, PhD, RN

This article is a follow-up to the wonderful study of the effect of oxidation of a methionine residue in calcium/calmodulin-dependent kinase II (Ox-CaMKII) on stabilizing the atrial cardiomyocyte, giving protection from atrial fibrillation. It is also closely related to the work reviewed earlier, mostly on the ventricular myocyte, covering calcium signaling by initiation of the ryanodine receptor (RyR2) in calcium sparks and the CaMKIIδ isoenzyme.

We refer to the following related articles published in Pharmaceutical Intelligence:

Oxidized Calcium Calmodulin Kinase and Atrial Fibrillation
Author: Larry H. Bernstein, MD, FCAP and Curator: Aviva Lev-Ari, PhD, RN
https://pharmaceuticalintelligence.com/2013/10/26/oxidized-calcium-calmodulin-kinase-and-atrial-fibrillation/

Jmjd3 and Cardiovascular Differentiation of Embryonic Stem Cells

Author: Larry H. Bernstein, MD, FCAP and Curator: Aviva Lev-Ari, PhD, RN

https://pharmaceuticalintelligence.com/2013/10/26/jmjd3-and-cardiovascular-differentiation-of-embryonic-stem-cells/

Contributions to cardiomyocyte interactions and signaling
Author and Curator: Larry H Bernstein, MD, FCAP  and Curator: Aviva Lev-Ari, PhD, RN
https://pharmaceuticalintelligence.com/2013/10/21/contributions-to-cardiomyocyte-interactions-and-signaling/

Cardiac Contractility & Myocardium Performance: Therapeutic Implications for Ryanopathy (Calcium Release-related Contractile Dysfunction) and Catecholamine Responses
Editor: Justin Pearlman, MD, PhD, FACC, Author and Curator: Larry H Bernstein, MD, FCAP, and Article Curator: Aviva Lev-Ari, PhD, RN
https://pharmaceuticalintelligence.com/2013/08/28/cardiac-contractility-myocardium-performance-ventricular-arrhythmias-and-non-ischemic-heart-failure-therapeutic-implications-for-cardiomyocyte-ryanopathy-calcium-release-related-contractile/

Part I. Identification of Biomarkers that are Related to the Actin Cytoskeleton
Curator and Writer: Larry H Bernstein, MD, FCAP
https://pharmaceuticalintelligence.com/2012/12/10/identification-of-biomarkers-that-are-related-to-the-actin-cytoskeleton/

Part II: Role of Calcium, the Actin Skeleton, and Lipid Structures in Signaling and Cell Motility
Larry H. Bernstein, MD, FCAP, Stephen Williams, PhD and Aviva Lev-Ari, PhD, RN
https://pharmaceuticalintelligence.com/2013/08/26/role-of-calcium-the-actin-skeleton-and-lipid-structures-in-signaling-and-cell-motility/

Part IV: The Centrality of Ca(2+) Signaling and Cytoskeleton Involving Calmodulin Kinases and Ryanodine Receptors in Cardiac Failure, Arterial Smooth Muscle, Post-ischemic Arrhythmia, Similarities and Differences, and Pharmaceutical Targets
Larry H Bernstein, MD, FCAP, Justin Pearlman, MD, PhD, FACC and Aviva Lev-Ari, PhD, RN
https://pharmaceuticalintelligence.com/2013/09/08/the-centrality-of-ca2-signaling-and-cytoskeleton-involving-calmodulin-kinases-and-ryanodine-receptors-in-cardiac-failure-arterial-smooth-muscle-post-ischemic-arrhythmia-similarities-and-differen/

Part VI: Calcium Cycling (ATPase Pump) in Cardiac Gene Therapy: Inhalable Gene Therapy for Pulmonary Arterial Hypertension and Percutaneous Intra-coronary Artery Infusion for Heart Failure: Contributions by Roger J. Hajjar, MD
Aviva Lev-Ari, PhD, RN
https://pharmaceuticalintelligence.com/2013/08/01/calcium-molecule-in-cardiac-gene-therapy-inhalable-gene-therapy-for-pulmonary-arterial-hypertension-and-percutaneous-intra-coronary-artery-infusion-for-heart-failure-contributions-by-roger-j-hajjar/

Part VII: Cardiac Contractility & Myocardium Performance: Ventricular Arrhythmias and Non-ischemic Heart Failure – Therapeutic Implications for Cardiomyocyte Ryanopathy (Calcium Release-related Contractile Dysfunction) and Catecholamine Responses
Justin Pearlman, MD, PhD, FACC, Larry H Bernstein, MD, FCAP and Aviva Lev-Ari, PhD, RN
https://pharmaceuticalintelligence.com/2013/08/28/cardiac-contractility-myocardium-performance-ventricular-arrhythmias-and-non-ischemic-heart-failure-therapeutic-implications-for-cardiomyocyte-ryanopathy-calcium-release-related-contractile/

Part VIII: Disruption of Calcium Homeostasis: Cardiomyocytes and Vascular Smooth Muscle Cells: The Cardiac and Cardiovascular Calcium Signaling Mechanism
Justin Pearlman, MD, PhD, FACC, Larry H Bernstein, MD, FCAP and Aviva Lev-Ari, PhD, RN
https://pharmaceuticalintelligence.com/2013/09/12/disruption-of-calcium-homeostasis-cardiomyocytes-and-vascular-smooth-muscle-cells-the-cardiac-and-cardiovascular-calcium-signaling-mechanism/

Part IX: Calcium-Channel Blockers, Calcium Release-related Contractile Dysfunction (Ryanopathy) and Calcium as Neurotransmitter Sensor
Justin Pearlman, MD, PhD, FACC, Larry H Bernstein, MD, FCAP and Aviva Lev-Ari, PhD, RN
https://pharmaceuticalintelligence.com/2013/09/16/calcium-channel-blocker-calcium-as-neurotransmitter-sensor-and-calcium-release-related-contractile-dysfunction-ryanopathy/

Part X: Synaptotagmin functions as a Calcium Sensor: How Calcium Ions Regulate the fusion of vesicles with cell membranes during Neurotransmission
Larry H Bernstein, MD, FCAP and Aviva Lev-Ari, PhD, RN
https://pharmaceuticalintelligence.com/2013/09/10/synaptotagmin-functions-as-a-calcium-sensor-how-calcium-ions-regulate-the-fusion-of-vesicles-with-cell-membranes-during-neurotransmission/

The material presented is very focused and cannot be found elsewhere in Pharmaceutical Intelligence with respect to genetics and heart disease. However, there are other articles that may be of interest to the reader.

Volume Three: Etiologies of Cardiovascular Diseases – Epigenetics, Genetics & Genomics

Curators: Larry H Bernstein, MD, FCAP and Aviva Lev-Ari, PhD, RN
https://pharmaceuticalintelligence.com/biomed-e-books/series-a-e-books-on-cardiovascular-diseases/volume-three-etiologies-of-cardiovascular-diseases-epigenetics-genetics-genomics/

PART 3.  Determinants of Cardiovascular Diseases: Genetics, Heredity and Genomics Discoveries

3.2 Leading DIAGNOSES of Cardiovascular Diseases covered in Circulation: Cardiovascular Genetics, 3/2010 – 3/2013

The Diagnoses covered include the following – relevant to this discussion

  • MicroRNA in Serum as Biomarker for Cardiovascular Pathologies: acute myocardial infarction, viral myocarditis, diastolic dysfunction, and acute heart failure
  • Genomics of Ventricular arrhythmias, A-Fib, Right Ventricular Dysplasia, Cardiomyopathy
  • Heredity of Cardiovascular Disorders Inheritance

3.2.1: Heredity of Cardiovascular Disorders Inheritance

The implications of heredity extend beyond serving as a platform for genetic analysis, influencing diagnosis, prognostication, and treatment of both index cases and relatives, and enabling rational targeting of genotyping resources.

This review covers acquisition of a family history, evaluation of heritability and inheritance patterns, and the impact of inheritance on subsequent components of the clinical pathway.

SOURCE:   Circulation: Cardiovascular Genetics.2011; 4: 701-709.  http://dx.doi.org/10.1161/CIRCGENETICS.110.959379

3.2.2: Myocardial Damage

3.2.2.1 MicroRNA in Serum as Biomarker for Cardiovascular Pathologies: acute myocardial infarction, viral myocarditis,  diastolic dysfunction, and acute heart failure

Increased MicroRNA-1 and MicroRNA-133a Levels in Serum of Patients With Cardiovascular Disease Indicate Myocardial Damage
Y Kuwabara, Koh Ono, T Horie, H Nishi, K Nagao, et al.
SOURCE:  Circulation: Cardiovascular Genetics. 2011; 4: 446-454   http://dx.doi.org/10.1161/CIRCGENETICS.110.958975

3.2.2.2 Circulating MicroRNA-208b and MicroRNA-499 Reflect Myocardial Damage in Cardiovascular Disease

MF Corsten, R Dennert, S Jochems, T Kuznetsova, Y Devaux, et al.
SOURCE: Circulation: Cardiovascular Genetics. 2010; 3: 499-506.  http://dx.doi.org/10.1161/CIRCGENETICS.110.957415

3.2.4.2 Large-Scale Candidate Gene Analysis in Whites and African Americans Identifies IL6R Polymorphism in Relation to Atrial Fibrillation

The National Heart, Lung, and Blood Institute’s Candidate Gene Association Resource (CARe) Project
RB Schnabel, KF Kerr, SA Lubitz, EL Alkylbekova, et al.
SOURCE:  Circulation: Cardiovascular Genetics.2011; 4: 557-564   http://dx.doi.org/10.1161/CIRCGENETICS.110.959197

 Weighted Gene Coexpression Network Analysis of Human Left Atrial Tissue Identifies Gene Modules Associated With Atrial Fibrillation

N Tan, MK Chung, JD Smith, J Hsu, D Serre, DW Newton, L Castel, E Soltesz, G Pettersson, AM Gillinov, DR Van Wagoner and J Barnard
From the Cleveland Clinic Lerner College of Medicine (N.T.), Department of Cardiovascular Medicine (M.K.C., D.W.N.), and Department of Thoracic & Cardiovascular Surgery (E.S., G.P., A.M.G.); and Department of Cellular & Molecular Medicine (J.D.S., J.H.), Genomic Medicine Institute (D.S.), Department of Molecular Cardiology (L.C.), and Department of Quantitative Health Sciences (J.B.), Cleveland Clinic Lerner Research Institute, Cleveland, OH
Circ Cardiovasc Genet. 2013;6:362-371; http://dx.doi.org/10.1161/CIRCGENETICS.113.000133
http://circgenetics.ahajournals.org/content/6/4/362   The online-only Data Supplement is available at http://circgenetics.ahajournals.org/lookup/suppl/doi:10.1161/CIRCGENETICS.113.000133/-/DC1

Background—Genetic mechanisms of atrial fibrillation (AF) remain incompletely understood. Previous differential expression studies in AF were limited by small sample size and provided limited understanding of global gene networks, prompting the need for larger-scale, network-based analyses.

Methods and Results—Left atrial tissues from Cleveland Clinic patients who underwent cardiac surgery were assayed using Illumina Human HT-12 mRNA microarrays. The data set included 3 groups based on cardiovascular comorbidities: mitral valve (MV) disease without coronary artery disease (n=64), coronary artery disease without MV disease (n=57), and lone AF (n=35). Weighted gene coexpression network analysis was performed in the MV group to detect modules of correlated genes. Module preservation was assessed in the other 2 groups. Module eigengenes were regressed on AF severity or atrial rhythm at surgery. Modules whose eigengenes correlated with either AF phenotype were analyzed for gene content. A total of 14 modules were detected in the MV group; all were preserved in the other 2 groups. One module (124 genes) was associated with AF severity and atrial rhythm across all groups. Its top hub gene, RCAN1, is implicated in calcineurin-dependent signaling and cardiac hypertrophy. Another module (679 genes) was associated with atrial rhythm in the MV and coronary artery disease groups. It was enriched with cell signaling genes and contained cardiovascular developmental genes including TBX5.

Conclusions—Our network-based approach found 2 modules strongly associated with AF. Further analysis of these modules may yield insight into AF pathogenesis by providing novel targets for functional studies. (Circ Cardiovasc Genet. 2013;6:362-371.)

Key Words: arrhythmias, cardiac • atrial fibrillation • bioinformatics • gene coexpression • gene regulatory networks • genetics • microarrays

Introduction

Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia, with a prevalence of ≈1% to 2% in the general population.1,2 Although AF may be an isolated condition (lone AF [LAF]), it often occurs concomitantly with other cardiovascular diseases, such as coronary artery disease (CAD) and valvular heart disease.1 In addition, stroke risk is increased 5-fold among patients with AF, and ischemic strokes attributed to AF are more likely to be fatal.1 Current antiarrhythmic drug therapies are limited in terms of efficacy and safety.1,3,4 Thus, there is a need to develop better risk prediction tools as well as mechanistically targeted therapies for AF. Such developments can only come about through a clearer understanding of its pathogenesis.

Family history is an established risk factor for AF. A Danish Twin Registry study estimated AF heritability at 62%, indicating a significant genetic component.5 Substantial progress has been made to elucidate this genetic basis. For example, genome-wide association studies (GWASs) have identified several susceptibility loci and candidate genes linked with AF. Initial studies performed in European populations found 3 AF-associated genomic loci.6–9 Of these, the most significant single-nucleotide polymorphisms (SNPs) mapped to an intergenic region of chromosome 4q25. The closest gene in this region, PITX2, is crucial in left-right asymmetrical development of the heart and thus seems promising as a major player in initiating AF.10,11 A large-scale GWAS meta-analysis discovered 6 additional susceptibility loci, implicating genes involved in cardiopulmonary development, ion transport, and cellular structural integrity.12

Differential expression studies have also provided insight into the pathogenesis of AF. A study by Barth et al13 found that about two-thirds of the genes expressed in the right atrial appendage were downregulated during permanent AF, and that many of these genes were involved in calcium-dependent signaling pathways. In addition, ventricular-predominant genes were upregulated in right atrial appendages of subjects with AF.13 Another study showed that inflammatory and transcription-related gene expression was increased in right atrial appendages of subjects with AF versus controls.14 These results highlight the adaptive responses to AF-induced stress and ischemia taking place within the atria.

Despite these advances, much remains to be discovered about the genetic mechanisms of AF. The AF-associated SNPs found thus far only explain a fraction of its heritability15; furthermore, the means by which the putative candidate genes cause AF have not been fully established.9,15,16 Additionally, previous differential expression studies in human tissue were limited to the right atrial appendage, had small sample sizes, and provided little understanding of global gene interactions.13,14 Weighted gene coexpression network analysis (WGCNA) is a technique to construct gene modules within a network based on correlations in gene expression (ie, coexpression).17,18 WGCNA has been used to study genetically complex diseases, such as metabolic syndrome,19 schizophrenia,20 and heart failure.21 Here, we obtained mRNA expression profiles from human left atrial appendage tissue and implemented WGCNA to identify gene modules associated with AF phenotypes.

Methods

Subject Recruitment

From 2001 to 2008, patients undergoing cardiac surgery at the Cleveland Clinic were prospectively screened and recruited. Informed consent for research use of discarded atrial tissues was obtained from each patient by a study coordinator during the presurgical visit. Demographic and clinical data were obtained from the Cardiovascular Surgery Information Registry and by chart review. Use of human atrial tissues was approved by the Institutional Review Board of the Cleveland Clinic.

Table S1: Clinical definitions of cardiovascular phenotype groups

Mitral Valve (MV) Disease
Inclusion criteria:
  • Surgical indication: mitral valve repair or replacement
Exclusion criteria (significant coronary artery disease):
  • Significant (≥50%) stenosis in at least one coronary artery via cardiac catheterization
  • History of revascularization (percutaneous coronary intervention or coronary artery bypass graft surgery)

Coronary Artery Disease (CAD)
Inclusion criteria:
  • Surgical indication: coronary artery bypass graft
Exclusion criteria (significant mitral valve disease):
  • Documented echocardiography finding of mitral regurgitation (≥3) or mitral stenosis
  • History of mitral valve repair or replacement

Lone Atrial Fibrillation (LAF)
Inclusion criteria:
  • History of atrial fibrillation
  • Surgical indication: MAZE procedure
  • Preserved ejection fraction (≥50%)
Exclusion criteria (significant coronary artery disease):
  • Significant (≥50%) stenosis in at least one coronary artery via cardiac catheterization
  • History of revascularization (percutaneous coronary intervention or coronary artery bypass graft surgery)
Exclusion criteria (significant valvular heart disease):
  • Documented echocardiography finding of valvular regurgitation (≥3) or stenosis
  • History of valve repair or replacement

RNA Isolation and Microarray Profiling

Left atrial appendage specimens were dissected during cardiac surgery and stored frozen at −80°C. Total RNA was extracted using the Trizol technique. RNA samples were processed by the Cleveland Clinic Genomics Core. For each sample, 250 ng of RNA was reverse transcribed into cRNA and biotin-UTP labeled using the TotalPrep RNA Amplification Kit (Ambion, Austin, TX). cRNA was quantified using a Nanodrop spectrophotometer, and cRNA size distribution was assessed on a 1% agarose gel. cRNA was hybridized to Illumina Human HT-12 Expression BeadChip arrays (v.3). Arrays were scanned using a BeadArray reader.

Expression Data Preprocessing

Raw expression data were extracted using the beadarray package in R, and bead-level data were averaged after log base-2 transformation. Background correction was performed by fitting a normal-gamma deconvolution model using the NormalGamma R package.22 Quantile normalization and batch effect adjustment with the ComBat method were performed using R.23 Probes that were not detected (at a P<0.05 threshold) in all samples as well as probes with relatively lower variances (interquartile range ≤log2[1.2]) were excluded.
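
The variance filter described above can be sketched in a few lines of Python. This is an illustration only and not the authors' R code: the function names and toy values are hypothetical, and the quartile convention (the "inclusive" method of `statistics.quantiles`) is an assumption, since the text does not say how the interquartile range was computed.

```python
import math
import statistics

def iqr(values):
    """Interquartile range using inclusive quartiles (an assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def keep_variable_probes(expr, min_fold=1.2):
    """Keep probes whose log2-expression IQR exceeds log2(min_fold).

    expr maps a probe ID to its log2 expression values across samples.
    """
    threshold = math.log2(min_fold)
    return {probe: vals for probe, vals in expr.items() if iqr(vals) > threshold}

# Toy log2 expression values (hypothetical, for illustration only).
toy = {
    "probe_flat": [8.00, 8.05, 8.10, 8.02, 8.07],
    "probe_variable": [6.0, 7.5, 9.0, 6.5, 8.5],
}
kept = keep_variable_probes(toy)
print(sorted(kept))
```

Because the data are on a log2 scale, an IQR of log2(1.2) corresponds to a 1.2-fold spread between the first and third quartiles, which is why the threshold is written in that form.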

The WGCNA approach requires that genes be represented as singular nodes in such a network. However, a small proportion of the genes in our data have multiple probe mappings. To facilitate the representation of singular genes within the network, a probe must be selected to represent its associated gene. Hence, for genes that mapped to multiple probes, the probe with the highest mean expression level was selected for analysis (which often selects the splice isoform with the highest expression and signal-to-noise ratio), resulting in a total of 6168 genes.
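
The probe-selection rule (one probe per gene, chosen by highest mean expression) might look like the following sketch. The function name, probe IDs, and expression values are hypothetical; only the selection rule itself comes from the text.

```python
from statistics import mean

def select_representative_probes(probe_to_gene, expr):
    """For each gene, keep the probe with the highest mean expression."""
    best = {}  # gene -> (probe ID, mean expression)
    for probe, gene in probe_to_gene.items():
        avg = mean(expr[probe])
        if gene not in best or avg > best[gene][1]:
            best[gene] = (probe, avg)
    return {gene: probe for gene, (probe, _) in best.items()}

# Hypothetical probe IDs and values; the gene symbols are only labels here.
probe_to_gene = {"p1": "TBX5", "p2": "TBX5", "p3": "RCAN1"}
expr = {
    "p1": [7.0, 7.2, 6.9],
    "p2": [9.1, 9.3, 9.0],
    "p3": [8.0, 8.1, 7.9],
}
chosen = select_representative_probes(probe_to_gene, expr)
print(chosen)
```

After this step every gene corresponds to exactly one row of the expression matrix, which is what the network construction needs.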

Defining Training and Test Sets

Currently, no large external mRNA microarray data from human left atrial tissues are publicly available. To facilitate internal validation of results, we divided our data set into 3 groups based on cardiovascular comorbidities: mitral valve (MV) disease without CAD (MV group; n=64), CAD without MV disease (CAD group; n=57), and LAF (LAF group; n=35). LAF was defined as the presence of AF without concomitant structural heart disease, according to the guidelines set by the European Society of Cardiology.1 The MV group, which was the largest and had the most power for detecting significant modules, served as the training set for module derivation, whereas the other 2 groups were designated test sets for module reproducibility. To minimize the effect of population stratification, the data set was limited to white subjects. Differences in clinical characteristics among the groups were assessed using Kruskal–Wallis rank-sum tests for continuous variables and Pearson χ² tests for categorical variables.

Weighted Gene Coexpression Network Analysis

WGCNA is a systems-biology method to identify and characterize gene modules whose members share strong coexpression. We applied previously validated methodology in this analysis.17 Briefly, pairwise gene (Pearson) correlations were calculated using the MV group data set. A weighted adjacency matrix was then constructed, in which β is a soft-thresholding parameter that provides emphasis on stronger correlations over weaker and less meaningful ones while preserving the continuous nature of gene–gene relationships. β=3 was selected in this analysis based on the criterion outlined by Zhang and Horvath17 (see the online-only Data Supplement).
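
The adjacency equation itself did not survive in this excerpt. As a sketch, the block below assumes the commonly used unsigned power form, in which the adjacency between genes i and j is the absolute Pearson correlation raised to the soft-thresholding power; that form, the function names, and the toy profiles are assumptions rather than a statement of the authors' exact formula.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weighted_adjacency(profiles, beta=3):
    """Assumed unsigned form: a_ij = |cor(x_i, x_j)| ** beta, zero diagonal."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(profiles[i], profiles[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj

# Toy expression profiles (hypothetical): genes 0 and 1 are perfectly
# correlated, gene 2 is only weakly correlated with gene 0.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 1.0, -1.0],
]
adj = weighted_adjacency(profiles, beta=3)
print(round(adj[0][1], 4), round(adj[0][2], 4))
```

In this toy case the perfect correlation keeps an adjacency of 1, while the weak correlation of about 0.45 in absolute value shrinks to about 0.09, which is the "emphasis on stronger correlations" the text describes.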

Next, the topological overlap–based dissimilarity matrix was computed from the weighted adjacency matrix. The topological overlap, developed by Ravasz et al,24 reflects the relative interconnectedness (ie, shared neighbors) between 2 genes.17 Hence, construction of the network dendrogram based on this dissimilarity measure allows for the identification of gene modules whose members share strong interconnectivity patterns. The WGCNA cutreeDynamic R function was used to identify a suitable cut height for module identification via an adaptive cut height selection approach.18 Gene modules, defined as branches of the network dendrogram, were assigned colors for visualization.
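
The topological overlap equation is also missing from this excerpt. The sketch below assumes the weighted-network form usually attributed to Zhang and Horvath, in which the overlap of genes i and j is (shared-neighbor weight + a_ij) / (min(k_i, k_j) + 1 − a_ij), with k the connectivity; the function names and the toy adjacency matrix are hypothetical.

```python
def topological_overlap(adj):
    """Topological overlap matrix from a symmetric adjacency with zero diagonal."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Weight of neighbors shared by genes i and j.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            w = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = w
    return tom

def tom_dissimilarity(tom):
    """Dissimilarity used for the dendrogram: 1 minus the topological overlap."""
    return [[1.0 - w for w in row] for row in tom]

# Toy adjacency matrix (hypothetical values).
adj = [
    [0.0, 0.8, 0.6],
    [0.8, 0.0, 0.5],
    [0.6, 0.5, 0.0],
]
tom = topological_overlap(adj)
diss = tom_dissimilarity(tom)
print([round(v, 4) for v in (tom[0][1], tom[0][2], tom[1][2])])
```

Hierarchical clustering on `diss` would then give the network dendrogram; the adaptive branch cutting done by cutreeDynamic is not reproduced here.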

Network Preservation Analysis

Module preservation between the MV and CAD groups as well as the MV and LAF groups was assessed using network preservation statistics as described in Langfelder et al.25 Module density–based statistics (to assess whether genes in each module remain highly connected in the test set) and connectivity-based statistics (to assess whether connectivity patterns between genes in the test set remain similar compared with the training set) were considered in this analysis.25 In each comparison, a Z statistic representing a weighted summary of module density and connectivity measures was computed for every module (Zsummary). The Zsummary score was used to evaluate module preservation, with values ≥8 indicating strong preservation, as proposed by Langfelder et al.25 The WGCNA R function network preservation was used to implement this analysis.25

Table S2: Network preservation analysis between the MV and CAD groups – size and Zsummary scores of gene modules detected.

Module Module Size ZSummary
Black 275 15.52
Blue 964 44.79
Brown 817 12.80
Cyan 119 13.42
Green 349 14.27
Green-Yellow 215 19.31
Magenta 239 15.38
Midnight-Blue 83 15.92
Pink 252 23.31
Purple 224 16.96
Red 278 17.30
Salmon 124 13.84
Tan 679 28.48
Turquoise 1512 44.03


Table S3: Network preservation analysis between the MV and LAF groups – size and Zsummary scores of gene modules detected

Module Module Size ZSummary
Black 275 13.14
Blue 964 39.26
Brown 817 14.98
Cyan 119 11.46
Green 349 14.91
Green-Yellow 215 20.99
Magenta 239 18.58
Midnight-Blue 83 13.87
Pink 252 19.10
Purple 224 8.80
Red 278 16.62
Salmon 124 11.57
Tan 679 28.61
Turquoise 1512 42.07

Clinical Significance of Preserved Modules

Principal component analysis of the expression data for each gene module was performed. The first principal component of each module, designated the eigengene, was identified for the 3 cardiovascular disease groups; this served as a summary expression measure that explained the largest proportion of the variance of the module.26 Multivariate linear regression was performed with the module eigengenes as the outcome variables and AF severity (no AF, paroxysmal AF, persistent AF, permanent AF) as the predictor of interest (adjusting for age and sex). A similar regression analysis was performed with atrial rhythm at surgery (no AF history, AF history in sinus rhythm, AF history in AF rhythm) as the predictor of interest. The false discovery rate method was used to adjust for multiple comparisons. Modules whose eigengenes associated with AF severity and atrial rhythm were identified for further analysis.
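
To make the eigengene concrete, here is a minimal sketch that extracts the first principal component of a module by power iteration and reports the fraction of variance it explains. It is not the authors' implementation: the gene-wise standardization, the sign convention, the function names, and the toy data are all assumptions, and production code would use a proper SVD routine instead.

```python
import math

def standardize(row):
    """Center a gene's values and scale them to unit sample variance."""
    n = len(row)
    m = sum(row) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in row) / (n - 1))
    return [(v - m) / sd for v in row]

def module_eigengene(expr_rows, iterations=500):
    """Return (eigengene, fraction of variance explained) for one module.

    expr_rows is a genes x samples list of lists.
    """
    x = [standardize(r) for r in expr_rows]
    n = len(x[0])
    # Sample-by-sample cross-product matrix; its leading eigenvector is the
    # first principal component expressed across samples.
    c = [[sum(g[a] * g[b] for g in x) for b in range(n)] for a in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(c[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    eigenvalue = sum(v[a] * sum(c[a][b] * v[b] for b in range(n)) for a in range(n))
    explained = eigenvalue / sum(c[a][a] for a in range(n))
    # The sign of a principal component is arbitrary; orient it along the
    # average standardized expression of the module.
    avg = [sum(g[s] for g in x) / len(x) for s in range(n)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-t for t in v]
    return v, explained

# Toy module of 3 strongly coexpressed genes over 4 samples (hypothetical).
toy_module = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.1, 1.9, 3.2, 3.8],
]
eigengene, explained = module_eigengene(toy_module)
print([round(t, 3) for t in eigengene], round(explained, 3))
```

The resulting per-sample eigengene values are what would be regressed on AF severity or atrial rhythm, with age and sex as covariates.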

In addition, hierarchical clustering of module eigengenes and selected clinical traits (age, sex, hypertension, cholesterol, left atrial size, AF state, and atrial rhythm) was used to identify additional module–trait associations. Clusters of eigengenes/traits were detected based on a dissimilarity measure D, as given by

D = 1 − cor(V_i, V_j),  i ≠ j    (3)

where V is the eigengene or clinical trait.
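
Equation 3 is simple to compute directly. The sketch below builds the dissimilarity matrix for a handful of variables; the variable names and values are hypothetical, and the clustering step itself is omitted because the text does not specify the linkage method.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def dissimilarity(variables):
    """D = 1 - cor(V_i, V_j) for every pair of distinct variables."""
    names = list(variables)
    return {
        (a, b): 1.0 - pearson(variables[a], variables[b])
        for a in names
        for b in names
        if a != b
    }

# Hypothetical eigengene and trait values for 4 subjects.
variables = {
    "ME_salmon": [0.1, 0.3, 0.5, 0.9],
    "AF_severity": [0.0, 1.0, 2.0, 3.0],
    "age": [60.0, 52.0, 70.0, 58.0],
}
d = dissimilarity(variables)
print(round(d[("ME_salmon", "AF_severity")], 3), round(d[("ME_salmon", "age")], 3))
```

D ranges from 0 (perfect positive correlation) to 2 (perfect negative correlation), so a negatively correlated eigengene and trait end up far apart in the dendrogram even when the association is strong.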

Enrichment Analysis

Gene modules significantly associated with AF severity and atrial rhythm were submitted to Ingenuity Pathway Analysis (IPA) to determine enrichment for functional/disease categories. IPA is an application of gene set over-representation analysis; for each disease/functional category annotation, a P value is calculated (using Fisher exact test) by comparing the number of genes from the module of interest that participate in the said category against the total number of participating genes in the background set.27 All 6168 genes in the current data set served as the background set for the enrichment analysis.
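
The over-representation test can be illustrated with a hypergeometric upper tail, which is equivalent to a one-sided Fisher exact test. IPA is proprietary, so this is a generic sketch and not its implementation; the one-sided choice, the function name, and the category counts in the example (200 annotated genes, 12 of them in the module) are assumptions, while the module size of 124 and the background of 6168 genes come from the text.

```python
from math import comb

def enrichment_p_value(module_hits, module_size, background_hits, background_size):
    """P(X >= module_hits) when module_size genes are drawn without
    replacement from a background containing background_hits category genes."""
    total = comb(background_size, module_size)
    upper = min(module_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, module_size - k)
        for k in range(module_hits, upper + 1)
    )
    return tail / total

# Hypothetical category: 200 of the 6168 background genes are annotated,
# and 12 of them fall in a 124-gene module (about 4 expected by chance).
p = enrichment_p_value(12, 124, 200, 6168)
print(f"{p:.2e}")
```

Python integers are exact, so the large binomial coefficients do not overflow; the final division converts the exact ratio to a float.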

Hub Gene Analysis

Hub genes are defined as genes that have high intramodular connectivity.17,20 Alternatively, they may also be defined as genes with high module membership.21,25 Both definitions were used to identify the hub genes of modules associated with AF phenotype.

To confirm that the hub genes identified were themselves associated with AF phenotype, the expression data of the top 10 hub genes (by intramodular connectivity) were regressed on atrial rhythm (adjusting for age and sex). In addition, eigengenes of AF-associated modules were regressed on their respective (top 10) hub gene expression profiles, and the model R² indices were computed.
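
The formulas for the two hub gene definitions are not reproduced in this excerpt. The sketch below illustrates only the first one, under the assumption that intramodular connectivity is the sum of a gene's adjacencies to the other genes in its module; the function names, gene labels, and adjacency values are hypothetical.

```python
def intramodular_connectivity(adj, module):
    """Sum of adjacencies between each module gene and the other module genes."""
    return {i: sum(adj[i][j] for j in module if j != i) for i in module}

def top_hubs(adj, module, names, n_top=10):
    """Names of the n_top module genes with the highest intramodular connectivity."""
    k = intramodular_connectivity(adj, module)
    ranked = sorted(module, key=lambda i: k[i], reverse=True)
    return [names[i] for i in ranked[:n_top]]

# Toy adjacency over 4 genes; the module contains genes 0, 1 and 2.
names = ["geneA", "geneB", "geneC", "geneD"]
adj = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.3, 0.2],
    [0.8, 0.3, 0.0, 0.7],
    [0.1, 0.2, 0.7, 0.0],
]
module = [0, 1, 2]
k = intramodular_connectivity(adj, module)
hubs = top_hubs(adj, module, names, n_top=2)
print(hubs)
```

Note that geneC has a strong link to geneD, but because geneD lies outside the module that link does not count toward geneC's intramodular connectivity. The second definition, module membership, is commonly taken to be the correlation between a gene's expression and the module eigengene.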

Membership of AF-Associated Candidate Genes From Previous Studies

Previous GWAS studies identified multiple AF-associated SNPs.8,9,12,15,28 We selected candidate genes closest to or containing these SNPs and identified their module locations as well as their closest within-module partners (absolute Pearson correlations).

Sensitivity Analysis of Soft-Thresholding Parameter

To verify that the key results obtained from the above analysis were robust with respect to the chosen soft-thresholding parameter (β=3), we repeated the module identification process using β=5. The eigengenes of the detected modules were computed and regressed on atrial rhythm (adjusting for age and sex). Modules significantly associated with atrial rhythm in ≥2 groups of the data set were compared with the AF phenotype–associated modules from the original analysis.

Results

Subject Characteristics

Table 1 describes the clinical characteristics of the cardiac surgery patients who were recruited for the study. Subjects in the LAF group were generally younger and less likely to be current smokers (P=2.0×10−4 and 0.032, respectively). Subjects in the MV group had lower body mass indices (P=2.7×10−6), and a larger proportion had paroxysmal AF compared with the other 2 groups (P=0.033).

Table 1. Clinical Characteristics of Study Subjects

Characteristics | MV Group (n=64) | CAD Group (n=57) | LAF Group (n=35) | P Value*
Age, median y (1st–3rd quartiles) | 60 (51.75–67.25) | 64 (58.00–70.00) | 56 (45.50–60.50) | 2.0×10−4
Sex, female (%) | 19 (29.7) | 6 (10.5) | 7 (20.0) | 0.033
BMI, median (1st–3rd quartiles) | 25.97 (24.27–28.66) | 29.01 (27.06–32.11) | 29.71 (26.72–35.10) | 2.7×10−6
Current smoker (%) | 29 (45.3) | 35 (61.4) | 12 (21.1) | 0.032
Hypertension (%) | 21 (32.8) | 39 (68.4) | 16 (45.7) | 4.4×10−4
AF severity (%) | | | | 0.033
  No AF | 7 (10.9) | 7 (12.3) | 0 (0.0) |
  Paroxysmal | 19 (29.7) | 10 (17.5) | 7 (20.0) |
  Persistent | 30 (46.9) | 26 (45.6) | 15 (42.9) |
  Permanent | 8 (12.5) | 14 (24.6) | 13 (37.1) |
Atrial rhythm at surgery (%) | | | | 0.065
  No AF history in sinus rhythm | 7 (10.9) | 7 (12.3) | 0 (0) |
  AF history in sinus rhythm | 28 (43.8) | 16 (28.1) | 11 (31.4) |
  AF history in AF rhythm | 29 (45.3) | 34 (59.6) | 24 (68.6) |

Gene Coexpression Network Construction and Module Identification

See the source document at http://circgenetics.ahajournals.org/content/6/4/362

A total of 14 modules were detected using the MV group data set (Figure 1), with module sizes ranging from 83 genes to 1512 genes; 38 genes did not share similar coexpression with the other genes in the network and were therefore not included in any of the identified modules.

Figure 1. Network dendrogram (top) and colors of identified modules (bottom). The dendrogram was constructed using the topological overlap matrix as the similarity measure. Modules corresponded to branches of the dendrogram and were assigned colors for visualization.

Network Preservation Analysis Revealed Strong Preservation of All Modules Between the Training and Test Sets

All 14 modules showed strong preservation across the CAD and LAF groups in both comparisons, with Zsummary scores of >10 in most modules (Figure 2). No major deviations in the Zsummary score distributions for the 2 comparisons were noted, indicating that modules were preserved to a similar extent across the 2 groups.

Figure 2. Preservation of modules between mitral valve (MV) and coronary artery disease (CAD) groups (left), and MV and lone atrial fibrillation (LAF) groups (right). A Zsummary statistic was computed for each module as an overall measure of its preservation relating to density and connectivity. All modules showed strong preservation in both comparisons with Zsummary scores >8 (red dotted line).

Regression Analysis of Module Eigengene Profiles Identified 2 Modules Associated With AF Severity and Atrial Rhythm

Table IV in the online-only Data Supplement summarizes the proportion of variance explained by the first 3 principal components for each module. On average, the first principal component (ie, the eigengene) explained ≈18% of the total variance of its associated module. For each group, the module eigengenes were extracted and regressed on AF severity (with age and sex as covariates). The salmon module (124 genes) eigengene was strongly associated with AF severity in the MV and CAD groups (P=1.7×10−6 and 5.2×10−4, respectively); this association was less significant in the LAF group (P=9.0×10−2). Eigengene levels increased with worsening AF severity across all 3 groups, with the greatest stepwise change taking place between the paroxysmal AF and persistent AF categories (Figure 3A). When the module eigengenes were regressed on atrial rhythm, the salmon module eigengene showed significant association in all groups (MV: P=1.1×10−14; CAD: P=1.36×10−6; LAF: P=2.1×10−4). Eigengene levels were higher in the AF history in AF rhythm category (Figure 3B).

Table S4: Proportion of variance explained by the principal components for each module.

Dataset Group | Principal Component | Black | Blue | Brown | Cyan | Green | Green-Yellow | Magenta
Mitral | 1 | 20.5% | 22.2% | 20.1% | 21.8% | 21.4% | 22.8% | 19.6%
Mitral | 2 | 4.1% | 3.6% | 4.8% | 5.7% | 4.5% | 5.9% | 3.9%
Mitral | 3 | 3.4% | 3.1% | 3.8% | 4.4% | 3.9% | 3.7% | 3.7%
CAD | 1 | 12.5% | 18.6% | 7.1% | 16.8% | 12.2% | 20.3% | 12.8%
CAD | 2 | 6.0% | 5.5% | 5.0% | 7.0% | 5.5% | 6.1% | 6.4%
CAD | 3 | 4.9% | 4.1% | 4.4% | 6.5% | 4.8% | 4.4% | 4.8%
LAF | 1 | 14.0% | 16.6% | 11.7% | 14.3% | 14.7% | 20.8% | 20.2%
LAF | 2 | 8.9% | 8.5% | 7.6% | 9.3% | 7.3% | 11.1% | 6.9%
LAF | 3 | 6.5% | 6.3% | 5.5% | 8.2% | 6.1% | 5.3% | 6.2%

Dataset Group | Principal Component | Midnight-Blue | Pink | Purple | Red | Salmon | Tan | Turquoise
Mitral | 1 | 28.5% | 22.6% | 18.7% | 20.5% | 22.3% | 19.0% | 25.8%
Mitral | 2 | 4.6% | 6.0% | 4.7% | 4.1% | 6.9% | 4.0% | 3.5%
Mitral | 3 | 4.2% | 4.2% | 4.2% | 3.5% | 4.0% | 3.6% | 3.3%
CAD | 1 | 23.4% | 17.1% | 15.5% | 15.0% | 18.0% | 14.6% | 18.2%
CAD | 2 | 7.4% | 8.6% | 6.0% | 6.4% | 7.2% | 5.8% | 6.6%
CAD | 3 | 5.1% | 5.4% | 5.3% | 5.4% | 6.2% | 5.1% | 4.5%
LAF | 1 | 23.5% | 18.4% | 12.0% | 15.9% | 16.9% | 13.7% | 16.5%
LAF | 2 | 7.9% | 8.5% | 9.8% | 9.4% | 9.5% | 9.1% | 9.6%
LAF | 3 | 6.7% | 7.0% | 6.6% | 6.0% | 6.9% | 6.8% | 6.3%

Figure 3. Boxplots of salmon module eigengene expression levels with respect to atrial fibrillation (AF) severity (A) and atrial rhythm (B).
A, Eigengene expression correlated positively with AF severity, with the largest stepwise increase between the paroxysmal AF and permanent AF categories. B, Eigengene expression was highest in the AF history in AF rhythm category in all 3 groups. CAD indicates coronary artery disease; LAF, lone AF; and MV, mitral valve.

The regression analysis also revealed statistically significant associations between the tan module (679 genes) eigengene and atrial rhythm in the MV and CAD groups (P=5.8×10−4 and 3.4×10−2, respectively). Eigengene levels were lower in the AF history in AF rhythm category compared with the AF history in sinus rhythm category (Figure 4); this trend was also observed in the LAF group, albeit with weaker statistical evidence (P=0.15).

Figure 4. Boxplots of tan module eigengene expression levels with respect to atrial rhythm.
Eigengene expression levels were lower in the atrial fibrillation (AF) history in AF rhythm category compared with the AF history in sinus rhythm category. CAD indicates coronary artery disease; LAF, lone AF; and MV, mitral valve.

Hierarchical Clustering of Eigengene Profiles With Clinical Traits

Hierarchical clustering was performed to identify relationships between gene modules and selected clinical traits. The salmon module clustered with AF severity and atrial rhythm; in addition, left atrial size was found in the same cluster, suggesting a possible relationship between salmon module gene expression and atrial remodeling (Figure 5A). Although the tan module was in a separate cluster from the salmon module, it was negatively correlated with both atrial rhythm and AF severity (Figure 5B).

Figure 5. Dendrogram (A) and correlation heatmap (B) of module eigengenes and clinical traits.

A, The salmon module eigengene but not the tan module eigengene clustered with atrial fibrillation (AF) severity, atrial rhythm, and left atrial size. B, AF severity and atrial rhythm at surgery correlated positively with the salmon module eigengene and negatively with the tan module eigengene. Arhythm indicates atrial rhythm at surgery; Chol, cholesterol; HTN, hypertension; and LASize, left atrial size.

IPA Enrichment Analysis of Salmon and Tan Modules

The salmon module was enriched in genes involved in cardiovascular function and development (smallest P=4.4×10⁻⁴) and organ morphology (smallest P=4.4×10⁻⁴). In addition, the top disease categories identified included endocrine system disorders (smallest P=4.4×10⁻⁴) and cardiovascular disease (smallest P=2.59×10⁻³).

The tan module was enriched in genes involved in cell-to-cell signaling and interaction (smallest P=8.9×10⁻⁴) and cell death and survival (smallest P=1.5×10⁻³). Enriched disease categories included cancer (smallest P=2.2×10⁻⁴) and cardiovascular disease (smallest P=4.5×10⁻⁴).

See the full article at http://circgenetics.ahajournals.org/content/6/4/362

Hub Gene Analysis of Salmon and Tan Modules

We identified hub genes in the 2 modules based on intramodular connectivity and module membership. For the salmon module, the gene RCAN1 exhibited the highest intramodular connectivity and module membership. The top 10 hub genes (by intramodular connectivity) were significantly associated with atrial rhythm, with false discovery rate–adjusted P values ranging from 1.5×10⁻⁵ to 4.2×10⁻¹². These hub genes accounted for 95% of the variation in the salmon module eigengene.

In the tan module, the top hub gene was CPEB3. The top 10 hub genes (by intramodular connectivity) correlated with atrial rhythm as well, although the statistical associations in the lower-ranked hub genes were relatively weaker (false discovery rate–adjusted P values ranging from 1.1×10⁻¹ to 3.4×10⁻⁴). These hub genes explained 94% of the total variation in the tan module eigengene.

The names and connectivity measures of the hub genes found in both modules are presented in Table 2.

Table 2. Top 10 Hub Genes in the Salmon (Left) and Tan (Right) Modules as Defined by Intramodular Connectivity and Module Membership

Salmon Module                            Tan Module
Gene      IMC    Gene      MM            Gene      IMC    Gene      MM
RCAN1     8.2    RCAN1     0.81          CPEB3     43.3   CPEB3     0.85
DNAJA4    7.7    DNAJA4    0.81          CPLX3     42.4   CPLX3     0.84
PDE8B     7.7    PDE8B     0.80          NEDD4L    40.8   NEDD4L    0.83
PRKAR1A   6.9    PRKAR1A   0.77          SGSM1     40.7   SGSM1     0.82
PTPN4     6.7    PTPN4     0.75          UCKL1     39.0   UCKL1     0.81
SORBS2    6.0    FHL2      0.69          SOSTDC1   37.2   SOSTDC1   0.79
ADCY6     5.7    ADCY6     0.69          PRDX1     35.5   RCOR2     0.78
FHL2      5.7    SORBS2    0.68          RCOR2     35.4   EEF2K     0.77
BVES      5.4    DHRS9     0.67          NPPB      35.3   PRDX1     0.76
TMEM173   5.3    LAPTM4B   0.65          LRRN3     34.6   MMP11     0.76

IMC indicates intramodular connectivity; MM, module membership.

A visualization of the salmon module is shown using the Cytoscape tool (Figure 6). A full list of the genes in the salmon and tan modules is provided in the online-only Data Supplement.

Figure 6. Cytoscape visualization of genes in the salmon module.
Nodes representing genes with high intramodular connectivities, such as RCAN1 and DNAJA4, appear larger in the network. Strong connections are visualized with darker lines, whereas weak connections appear more translucent.

Membership of AF-Associated Candidate Genes From Previous Studies

The tan module contained MYOZ1, which was identified as a candidate gene from the recent AF meta-analysis. PITX2 was located in the green module (n=349), and ZFHX3 was located in the turquoise module (n=1512). The locations of other candidate genes (and their closest partners) are reported in the online-only Data Supplement.

Sensitivity Analysis of Key Results

We repeated the WGCNA module identification approach using a different soft-thresholding parameter (β=5). One module (n=121) was found to be strongly associated with atrial rhythm at surgery across all 3 groups of the data set, whereas another module (n=244) was associated with atrial rhythm at surgery in the MV and CAD groups. The first module overlapped significantly with the salmon module in terms of gene membership, whereas most of the second module’s genes were contained within the tan module. The top hub genes found in the salmon and tan modules remained present and highly connected in the 2 new modules identified with the different soft-thresholding parameter.
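As a rough illustration of how overlap in gene membership between two modules can be quantified, the sketch below counts shared genes, reports the fraction of one module contained in the other, and computes a hypergeometric upper-tail probability. The article does not state which overlap test was used, so the hypergeometric test, the function name overlap_summary, and the toy gene identifiers are all assumptions made for illustration.

```python
from math import comb

def overlap_summary(module_a, module_b, n_background):
    """Summarize how much of module_a is contained in module_b.

    module_a, module_b: sets of gene identifiers.
    n_background: total number of genes considered (assumed known).
    Returns (shared_count, fraction_of_a_in_b, hypergeometric_p_value).
    """
    shared = len(module_a & module_b)
    size_a, size_b = len(module_a), len(module_b)
    # Upper-tail probability of at least `shared` common genes if module_a
    # were a random draw of size_a genes from the background.
    total = comb(n_background, size_a)
    p_value = sum(
        comb(size_b, i) * comb(n_background - size_b, size_a - i)
        for i in range(shared, min(size_a, size_b) + 1)
    ) / total
    return shared, shared / size_a, p_value

# Hypothetical toy modules (not the study's gene lists).
new_module = {"g1", "g2", "g3", "g4"}
old_module = {"g1", "g2", "g3", "g7", "g8"}
print(overlap_summary(new_module, old_module, n_background=20))
```

A small p-value here would indicate more shared genes than expected by chance, which is one common way to read "overlapped significantly".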

Discussion

To our knowledge, our study is the first implementation of an unbiased, network-based analysis in a large sample of human left atrial appendage gene expression profiles. We found 2 modules associated with AF severity and atrial rhythm in 2 to 3 of our cardiovascular comorbidity groups. Functional analyses revealed significant enrichment of cardiovascular-related categories for both modules. In addition, several of the hub genes identified are implicated in cardiovascular disease and may play a role in AF initiation and progression.

In our study, WGCNA was used to construct modules based on gene coexpression, thereby reducing the network’s dimensionality to a smaller set of elements.17,21 Relating modulewise changes to phenotypic traits allowed statistically significant associations to be detected at a lower false discovery rate compared with traditional differential expression studies. Furthermore, shared functions and pathways among genes in the modules could be inferred via enrichment analyses.

We divided our data set into 3 groups to verify the reproducibility of the modules identified by WGCNA; 14 modules were identified in the MV group in our gene network. All were strongly preserved in the CAD and LAF groups, suggesting that gene coexpression patterns are robust and reproducible despite differences in cardiovascular comorbidities.

The use of module eigengene profiles as representative summary measures has been validated in a number of studies.20,26 Additionally, we found that the eigengenes accounted for a significant proportion (average 18%) of gene expression variability in their respective modules. Regression analysis of the module eigengenes found 2 modules associated with AF severity and atrial rhythm in ≥2 groups of the data set. The association between the salmon module eigengene and AF severity was statistically weaker in the LAF group (adjusted P=9.0×10⁻²). This was probably because of its significantly smaller sample size compared with the MV and CAD groups. Despite this weaker association, the relationship between the salmon module eigengene and AF severity remained consistent among the 3 groups (Figure 3A). Similarly, the lack of statistical significance for the association between the tan module eigengene and atrial rhythm at surgery in the LAF group was likely driven by the smaller sample size and (by definition) lack of samples in the no AF category.
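The idea of an eigengene as a summary measure can be sketched in a few lines. The example below assumes the usual WGCNA convention that a module eigengene is the first principal component of the standardized expression values of the module's genes, and that the proportion of variance explained is the leading eigenvalue divided by the number of genes. The function names and toy values are hypothetical, and power iteration is used only to keep the sketch free of external libraries.

```python
import math
import random

def standardize(values):
    """Scale one gene's values to mean 0 and unit sample variance.

    Assumes the gene is not constant across samples.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / sd for x in values]

def module_eigengene(expr, iterations=500, seed=0):
    """expr: list of genes, each a list of per-sample expression values.

    Returns (eigengene, proportion_of_variance_explained). The sign of the
    eigengene is arbitrary.
    """
    z = [standardize(g) for g in expr]            # genes x samples
    n_genes, n_samples = len(z), len(z[0])
    # Gene-gene correlation matrix of the standardized data.
    corr = [[sum(a * b for a, b in zip(zi, zj)) / (n_samples - 1)
             for zj in z] for zi in z]
    # Power iteration for the leading eigenvector of the correlation matrix.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in corr]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(vi * sum(c * x for c, x in zip(row, v))
                     for vi, row in zip(v, corr))
    # Project each sample onto the leading eigenvector.
    eigengene = [sum(v[g] * z[g][s] for g in range(n_genes))
                 for s in range(n_samples)]
    # The trace of a correlation matrix equals the number of genes.
    return eigengene, eigenvalue / n_genes

# Hypothetical toy module: 3 genes measured in 5 samples.
toy = [[1.0, 2.0, 3.0, 4.0, 5.0],
       [2.1, 3.9, 6.2, 8.1, 9.8],
       [5.0, 3.0, 4.0, 1.0, 2.0]]
eigengene, explained = module_eigengene(toy)
print(round(explained, 3))
```

In a module of several hundred genes, a first component explaining roughly 12% to 28% of the variance, as in the table of principal components above, still leaves most gene-level variability unsummarized.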

A major part of our analysis focused on the identification of module hub genes. Hubs are connected with a large number of nodes; disruption of hubs therefore leads to widespread changes within the network. This concept has powerful applications in the study of biology, genetics, and disease.29,30 Although mutations of peripheral genes can certainly lead to disease, gene network changes are more likely to be motivated by changes in hub genes, making them more biologically interesting targets for further study.17,29,31 Indeed, the hub genes of the salmon and tan modules accounted for the vast majority of the variation in their respective module eigengenes, signaling their importance in driving gene module behavior.

The hub genes identified in the salmon and tan modules were significantly associated with AF phenotype overall. It was noted that this association was statistically weaker for the lower-ranked hub genes in the tan module. This highlights an important aspect and strength of WGCNA—to be able to capture module-wide changes with respect to disease despite potentially weaker associations among individual genes.
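The two hub-gene measures named in Table 2 can be illustrated with a small sketch. It assumes the common WGCNA definitions: intramodular connectivity is the sum of a gene's adjacencies to the other genes in its module, with adjacency taken here as the absolute Pearson correlation raised to the power β, and module membership is the correlation between a gene's expression and the module eigengene. Whether the study used signed or unsigned adjacency is not stated, and the gene labels, values, and eigengene below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def intramodular_connectivity(expr, beta=3):
    """expr: dict mapping gene label -> expression values across samples.

    Sums |correlation|**beta over all other genes in the module
    (unsigned adjacency is an assumption of this sketch).
    """
    genes = list(expr)
    return {g: sum(abs(pearson(expr[g], expr[h])) ** beta
                   for h in genes if h != g)
            for g in genes}

def module_membership(expr, eigengene):
    """Correlation of each gene with a supplied module eigengene."""
    return {g: pearson(values, eigengene) for g, values in expr.items()}

# Hypothetical toy module of three genes in four samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, -1.0, 1.0, -1.0],
}
imc = intramodular_connectivity(toy, beta=3)
hubs = sorted(imc, key=imc.get, reverse=True)   # most connected first
print(hubs)
```

Ranking genes by either measure gives a hub list; as Table 2 shows, the two rankings need not agree below the top few genes.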

The implementation of WGCNA necessitated the selection of a soft-thresholding parameter β. Unlike hard-thresholding (where gene correlations below a certain value are shrunk to zero), the soft-thresholding approach gives greater weight to stronger correlations while maintaining the continuous nature of gene–gene relationships. We selected a β value of 3 based on the criteria outlined by Zhang and Horvath.17 These and other investigators have demonstrated that module identification is robust with respect to the β parameter.17,19–21 In our data, we were also able to reproduce the key findings reported with a different, larger β value, thereby verifying the stability of our results relating to β.
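The contrast between the two thresholding schemes can be seen numerically. The sketch below assumes the unsigned power adjacency, |r| raised to the power β, for soft-thresholding, and a 0/1 cutoff for hard-thresholding; the cutoff value tau is a hypothetical choice, not one taken from the study.

```python
def soft_adjacency(r, beta=3):
    """Soft threshold: connection strength is |r| ** beta (stays continuous)."""
    return abs(r) ** beta

def hard_adjacency(r, tau=0.5):
    """Hard threshold: a connection exists only if |r| reaches the cutoff tau.

    tau is a hypothetical cutoff chosen for illustration.
    """
    return 1.0 if abs(r) >= tau else 0.0

# Compare the two schemes for the beta values mentioned in the text (3 and 5).
for r in (0.2, 0.45, 0.55, 0.9):
    print(r,
          hard_adjacency(r),
          round(soft_adjacency(r, beta=3), 4),
          round(soft_adjacency(r, beta=5), 4))
```

Correlations of 0.45 and 0.55 land on opposite sides of the hard cutoff even though they are nearly equal, whereas under soft-thresholding their adjacencies remain close; raising β from 3 to 5 suppresses weak correlations more strongly without changing their ordering.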

The salmon module (124 genes) was associated with both AF phenotypes; furthermore, IPA analysis of its gene contents suggested enrichment in cardiovascular development as well as disease. Its eigengene increased with worsening AF severity, with the largest stepwise change occurring between the paroxysmal AF and persistent AF categories (Figure 3). Hence, the gene expression changes within the salmon module may reflect the later stages of AF pathophysiology.

The top hub gene of the salmon module was RCAN1 (regulator of calcineurin 1). Calcineurin is a cytoplasmic Ca²⁺/calmodulin-dependent protein phosphatase that stimulates cardiac hypertrophy via its interactions with NFAT and L-type Ca²⁺ channels.32,33 RCAN1 is known to inhibit calcineurin and its associated pathways.32,34 However, some data suggest that RCAN1 may instead function as a calcineurin activator when highly expressed and consequently potentiate hypertrophic signaling.35 Thus, perturbations in RCAN1 levels (attributable to genetic variants or mutations) may cause an aberrant switching in function, which in turn triggers atrial remodeling and arrhythmogenesis.

Other hub genes found in the salmon module are also involved in cardiovascular development and function and may be potential targets for further study.

  • DNAJA4 (DnaJ homolog, subfamily A, member 4) regulates the trafficking and maturation of KCNH2 potassium channels, which have a prominent role in cardiac repolarization and are implicated in the long-QT syndromes.36

FHL2 (four-and-a-half LIM domain protein 2) interacts with numerous cellular components, including

  1. actin cytoskeleton,
  2. transcription machinery, and
  3. ion channels.37

FHL2 was shown to enhance the hypertrophic effects of isoproterenol, indicating that FHL2 may modulate the effect of environmental stress on cardiomyocyte growth.38 FHL2 also interacts with several potassium channels in the heart, such as KCNQ1, KCNE1, and KCNA5.37,39

Additionally, blood vessel epicardial substance (BVES) and other members of its family were shown to be highly expressed in cardiac pacemaker cells. BVES knockout mice exhibited sinus nodal dysfunction, suggesting that BVES regulates the development of the cardiac pacemaking and conduction system40 and may therefore be involved in the early phase of AF development.

The tan module (679 genes) eigengene was negatively correlated with atrial rhythm in the MV and CAD groups (Figure 4); this may indicate a general decrease in gene expression of its members in fibrillating atrial tissue. IPA analysis revealed enrichment in genes involved in cell signaling as well as apoptosis. The top-ranked hub gene, cytoplasmic polyadenylation element binding protein 3 (CPEB3), regulates mRNA translation and has been associated with synaptic plasticity and memory formation.41 The role of CPEB3 in the heart is currently unknown, so further exploration via animal model studies may be warranted.

Natriuretic peptide-precursor B (NPPB), another highly interconnected hub gene, produces a precursor peptide of brain natriuretic peptide, which regulates blood pressure through natriuresis and vasodilation.42 NPPB gene variants have been linked with diabetes mellitus, although associations with cardiac phenotypes are less clear.42 TBX5 and GATA4, which play important roles in embryonic heart development,43 were members of the tan module. Although not hub genes, they may also contribute toward developmental susceptibility of AF. In addition, TBX5 was previously reported to be near an SNP associated with PR interval and AF in separate large-scale GWAS studies.12,28 MYOZ1, another candidate gene identified in the recent AF GWAS meta-analysis, was found to be a member as well; it associates with proteins found in the Z-disc of skeletal and cardiac muscle and may suppress calcineurin-dependent hypertrophic signaling.12

Some, but not all, of the candidate genes found in previous GWAS studies were located in the AF-associated modules. One possible explanation for this could be the difference in sample sizes. The meta-analysis involved thousands of individuals, whereas the current study had <100 in each group of the data set, which limited the power to detect significant differences between levels of AF phenotype even with the module-wise approach. Additionally, transcription factors like PITX2 are most highly expressed during the fetal phase of development. Perturbations in these genes (attributable to genetic variants or mutations) may therefore initiate the development of AF at this stage and play no significant role in adults (when we obtained their tissue samples).

Limitations in Study

We noted several limitations in this study. First, no human left atrial mRNA data set of adequate size currently exists publicly. Hence, we were unable to validate our results with an external, independent data set. However, the network preservation assessment performed within our data set showed strong preservation in all modules, indicating that our findings are robust and reproducible.

Although the module eigengenes captured a significant proportion of module variance, a large fraction of variability did remain unaccounted for, which may limit their use as representative summary measures.

We extracted RNA from human left atrial appendage tissue, which consists primarily of cardiomyocytes and fibroblasts. Atrial fibrosis is known to occur with AF-associated remodeling.44 As such, the cardiomyocyte to fibroblast ratio is likely to change with different levels of AF severity, which in turn influences the amount of RNA extracted from each cell type. Hence, true differences in gene expression (and coexpression) within cardiomyocytes may be confounded by changes in cellular composition attributable to atrial remodeling. Also, there may be significant regional heterogeneity in the left atrium with respect to structure, cellular composition, and gene expression,45 which may limit the generalizability of our results to other parts of the left atrium.

All subjects in the study were white to minimize the effects of population stratification. However, it is recognized that the genetic basis of AF may differ among ethnic groups.9 Thus, our results may not be generalizable to other ethnicities.

Finally, it is possible for genes to be involved in multiple processes and functions that require different sets of genes. However, WGCNA does not allow for overlapping modules to be formed. Thus, this limits the method’s ability to characterize such gene interactions.

Conclusions

In summary, we constructed a weighted gene coexpression network based on RNA expression data from the largest collection of human left atrial appendage tissue specimens to date. We identified 2 gene modules significantly associated with AF severity or atrial rhythm at surgery. Hub genes within these modules may be involved in the initiation or progression of AF and may therefore be candidates for functional studies.

References

1. European Heart Rhythm Association, European Association for Cardio-Thoracic Surgery, Camm AJ, Kirchhof P, Lip GY, Schotten U, et al. Guidelines for the management of atrial fibrillation: the task force for the management of atrial fibrillation of the European Society of Cardiology (ESC). Eur Heart J. 2010;31:2369–2429.

2. Lemmens R, Hermans S, Nuyens D, Thijs V. Genetics of atrial fibrillation and possible implications for ischemic stroke. Stroke Res Treat. 2011;2011:208694.

3. Wann LS, Curtis AB, January CT, Ellenbogen KA, Lowe JE, Estes NA III, et al; ACCF/AHA/HRS. 2011 ACCF/AHA/HRS focused update on the management of patients with atrial fibrillation (Updating the 2006 Guideline): a report of the American College of Cardiology Foundation/American Heart Association Task Force on Practice Guidelines. J Am Coll Cardiol. 2011;57:223–242.

4. Dobrev D, Carlsson L, Nattel S. Novel molecular targets for atrial fibrillation therapy. Nat Rev Drug Discov. 2012;11:275–291.

5. Christophersen IE, Ravn LS, Budtz-Joergensen E, Skytthe A, Haunsoe S, Svendsen JH, et al. Familial aggregation of atrial fibrillation: a study in Danish twins. Circ Arrhythm Electrophysiol. 2009;2:378–383.

6. Gudbjartsson DF, Arnar DO, Helgadottir A, Gretarsdottir S, Holm H, Sigurdsson A, et al. Variants conferring risk of atrial fibrillation on chromosome 4q25. Nature. 2007;448:353–357.

7. Ellinor PT, Lunetta KL, Glazer NL, Pfeufer A, Alonso A, Chung MK, et al. Common variants in KCNN3 are associated with lone atrial fibrillation. Nat Genet. 2010;42:240–244.

8. Benjamin EJ, Rice KM, Arking DE, Pfeufer A, van Noord C, Smith AV, et al. Variants in ZFHX3 are associated with atrial fibrillation in individuals of European ancestry. Nat Genet. 2009;41:879–881.

9. Sinner MF, Ellinor PT, Meitinger T, Benjamin EJ, Kääb S. Genome-wide association studies of atrial fibrillation: past, present, and future. Cardiovasc Res. 2011;89:701–709.

10. Clauss S, Kääb S. Is Pitx2 growing up? Circ Cardiovasc Genet. 2011;4:105–107.

11. Kirchhof P, Kahr PC, Kaese S, Piccini I, Vokshi I, Scheld HH, et al. PITX2c is expressed in the adult left atrium, and reducing Pitx2c expression promotes atrial fibrillation inducibility and complex changes in gene expression. Circ Cardiovasc Genet. 2011;4:123–133.

12. Ellinor PT, Lunetta KL, Albert CM, Glazer NL, Ritchie MD, Smith AV, et al. Meta-analysis identifies six new susceptibility loci for atrial fibrillation. Nat Genet. 2012;44:670–675.

13. Barth AS, Merk S, Arnoldi E, Zwermann L, Kloos P, Gebauer M, et al. Reprogramming of the human atrial transcriptome in permanent atrial fibrillation: expression of a ventricular-like genomic signature. Circ Res. 2005;96:1022–1029.

The reference list continues to 45; see http://circgenetics.ahajournals.org/content/6/4/362

CLINICAL PERSPECTIVE

Atrial fibrillation is the most common sustained cardiac arrhythmia in the United States. The genetic and molecular mechanisms governing its initiation and progression are complex, and our understanding of these mechanisms remains incomplete despite recent advances via genome-wide association studies, animal model experiments, and differential expression studies. In this study, we used weighted gene coexpression network analysis to identify gene modules significantly associated with atrial fibrillation in a large sample of human left atrial appendage tissues. We further identified highly interconnected genes (ie, hub genes) within these gene modules that may be novel candidates for functional studies. The discovery of the atrial fibrillation-associated gene modules and their corresponding hub genes provides novel insight into the gene network changes that occur with atrial fibrillation, and closer study of these findings can lead to more effective targeted therapies for disease management.


A Software Agent for Diagnosis of ACUTE MI

Authors: Isaac E. Mayzlin, Ph.D.1, David Mayzlin1, Larry H. Bernstein, M.D.2

1MayNet, Carlsbad, CA, 2Department of Pathology and Laboratory Medicine, Bridgeport Hospital, Bridgeport, CT.

Agent-based decision support systems are designed to provide medical staff with the information needed for making critical decisions. We describe a Software Agent for evaluating multiple tests based on a large database. It is especially efficient when the time available for making the decision is critical for successful treatment of serious conditions, such as stroke or acute myocardial infarction (AMI).

Goldman and others (1) developed a screening algorithm based on characteristics of the chest pain, EKG changes, and key clinical findings to separate high-risk from low-risk patients at the time they present, using clinical features without a serum marker. The Goldman algorithm was not widely used because of a 7 percent misclassification error, mostly false positives. Nonetheless, a third of emergency room visits by patients presenting with symptoms that require ruling out AMI are not associated with chest pain. A related issue is the finding that a significant number of patients who are at high risk have to be identified using a cardiac marker. Cardiac isoenzymes have been used to classify patients meeting the high-risk criteria, many of whom are not subsequently found to have AMI.

Software Agent for Diagnosis based on the Knowledge incorporated in the Trained Artificial Neural Network and Data Clustering

This Software Agent is based on the combination of clustering by Euclidean distances in multi-dimensional space and non-linear discrimination performed by the Artificial Neural Network (ANN) trained on the clusters’ averages. Our studies indicate that at an optimum clustering distance the number of classes is minimized with efficient training of the ANN, retaining accuracy of classification by the ANN at 97%. The studies conducted involve training and testing on separate clinical data sets. We perform clustering using the geometrical (Euclidean) distance between two points in n-dimensional space, formed by n variables, including both input and output variables. Since this distance assumes compatibility of different variables, the values of all input variables are linearly transformed (scaled) to the range from 0 to 1.

For readers accustomed to classical statistics, the ANN technique can be viewed as an extension of multivariate regression analysis with new features such as non-linearity and the ability to process categorical data. Categorical (not continuous) variables represent two or more levels, groups, or classes of corresponding features, and in our case this concept is used to signify patient condition, for example the presence or absence of AMI.
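The scaling and distance steps described above can be sketched as follows. The text does not specify the clustering procedure itself, so the simple "leader" scheme below (a point joins the first cluster whose current average lies within a maximum distance, otherwise it starts a new cluster) is an assumption, as are the function names and the toy values. The cluster averages it returns stand in for the patterns on which the ANN would be trained.

```python
import math

def min_max_scale(rows):
    """Linearly rescale every column of `rows` to the range 0 to 1."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]

def euclidean(p, q):
    """Geometrical (Euclidean) distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cluster_averages(points, max_distance):
    """Group points with a simple leader scheme and return cluster averages.

    This is an illustrative stand-in, not the Software Agent's procedure.
    """
    clusters = []                       # each cluster is a list of points
    for p in points:
        for members in clusters:
            centroid = [sum(c) / len(members) for c in zip(*members)]
            if euclidean(p, centroid) <= max_distance:
                members.append(p)
                break
        else:
            clusters.append([p])        # no cluster was close enough
    return [[sum(c) / len(m) for c in zip(*m)] for m in clusters]

# Hypothetical rows: two input variables and a 0/1 output per patient.
rows = [[5.0, 20.0, 0], [6.0, 22.0, 0], [40.0, 60.0, 1], [42.0, 58.0, 1]]
scaled = min_max_scale(rows)
print(len(cluster_averages(scaled, max_distance=0.3)))
```

A larger maximum distance merges more points into each cluster and so yields fewer classes for training.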

Process description. We implemented the proposed algorithm for diagnosis of AMI. All the calculations were performed on the authors’ unique Software Agent Maynet. First, using the automatic random extraction procedure, the initial data set (139 patients) was partitioned into two sets, training and testing. This randomization also determined the size of these sets (96 and 43, respectively), since the program was instructed to assign approximately 70% of the data to the training set.
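One way a randomization can determine the set sizes, rather than fixing them in advance, is to assign each record to the training set independently with probability 0.7. The sketch below illustrates that idea; the actual extraction procedure used by the Software Agent is not described, so this mechanism, the function name, and the seed are assumptions.

```python
import random

def random_partition(records, train_fraction=0.7, seed=0):
    """Assign each record to training with probability `train_fraction`.

    The resulting set sizes are only approximately in the requested ratio.
    """
    rng = random.Random(seed)
    train, test = [], []
    for record in records:
        if rng.random() < train_fraction:
            train.append(record)
        else:
            test.append(record)
    return train, test

# 139 placeholder records standing in for the patients.
train, test = random_partition(list(range(139)))
print(len(train), len(test))
```

Under such a scheme a split of 96 and 43 (about 69% training) is a typical outcome for 139 records.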

The main process consists of three successive steps:

(1) clustering performed on the training data set,

(2) the neural network’s training on the clusters from the previous step, and

(3) evaluation of the classifier’s accuracy on the testing data.

The classifier in this research is the ANN created in step 2, with output in the range [0,1], which provides a binary result (1 – AMI, 0 – not AMI) using a decision point of 0.5.

In this paper we used the data of two previous studies (2,3) with three patients, potential outliers, removed (n = 139). The data contain three input variables, CK-MB, LD-1, and LD-1/total LD, and one output variable, diagnosis, coded as 1 (for AMI) or 0 (non-AMI).
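The decision rule and the accuracy measure reported in Table 1 can be illustrated with a short sketch. It assumes that an output exactly equal to the decision point is classified as AMI, which the text does not specify; the function names and the placeholder outputs are hypothetical.

```python
def classify(output, decision_point=0.5):
    """Turn a network output in [0, 1] into 1 (AMI) or 0 (not AMI).

    Treating an output equal to the decision point as AMI is an assumption.
    """
    return 1 if output >= decision_point else 0

def misrecognized(outputs, diagnoses, decision_point=0.5):
    """Return (count, percent) of test cases classified incorrectly."""
    errors = sum(classify(o, decision_point) != d
                 for o, d in zip(outputs, diagnoses))
    return errors, 100.0 * errors / len(diagnoses)

# Placeholder outputs for a testing set of 43 with one wrong classification.
diagnoses = [1] * 20 + [0] * 23
outputs = [0.9] * 20 + [0.1] * 22 + [0.7]
count, percent = misrecognized(outputs, diagnoses)
print(count, round(percent, 1))
```

One misrecognized pattern in a testing set of 43 corresponds to about 2.3%, the value that appears most often in Table 1.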

Table 1. Effect of selection of maximum distance on the number of classes formed and on the accuracy of recognition by ANN

Clustering Distance Factor F (D = F * R): 1, 0.9, 0.8, 0.7
Number of Classes: 24, 14, 13, 5

Number of Nodes in the Hidden Layers   Number of Misrecognized Patterns in the Testing Set of 43   Percent of Misrecognized
1, 0                                   1                                                           2.3
2, 0                                   2                                                           4.6
3, 0                                   1                                                           2.3
1, 0                                   1                                                           2.3
2, 0                                   2                                                           4.6
3, 0                                   1                                                           2.3
3, 2                                   1                                                           2.3
3, 2                                   1                                                           2.3

Abbreviations: creatine kinase MB isoenzyme: CK-MB; lactate dehydrogenase isoenzyme-1: LD1; LD1/total LD ratio: %LD1; acute myocardial infarction: AMI; artificial neural network: ANN
