data integration | Leaders in Pharmaceutical Business Intelligence (LPBI) Group

Posts Tagged ‘data integration’

Summary of Transcription, Translation and Transcription Factors

Posted in Amino acids, Biochemical pathways, Computational Biology/Systems and Bioinformatics, Curation, Innovations, Interviews with Scientific Leaders, Investment in Technological Breakthrough, Metabolism, Metabolomics, Pharmaceutical Discovery, Proteins, Proteomics, RNA Biology, Cancer and Therapeutics, Signaling & Cell Circuits, Small Molecules in Development of Therapeutic Drugs, Transcriptomics, Translational Effectiveness, Translational Research, Translational Science, tagged data integration, graphics, metabolites, pathway linkages, transcription, transcriptome, transcripts on November 5, 2014


Author and Curator: Larry H. Bernstein, MD, FCAP

Article ID #158: Summary of Transcription, Translation and Transcription Factors. Published on 11/5/2014

WordCloud Image Produced by Adam Tubman

Proteins are integral to the composition of the cytoskeleton and of the extracellular matrix. Many proteins are enzymes that catalyze the transformation of a substrate, often a derivative of the food we ingest. Enzymes have a catalytic site, and many function with a cofactor, typically a metal ion or a nucleotide-derived coenzyme. Proteins are also critically involved in the regulation of cell metabolism and in the transcription of the DNA code, since transcription factors (TFs) are proteins. Twenty standard amino acids go into protein synthesis; nine of them are essential in humans and must be obtained from dietary animal or plant protein. For protein synthesis, mRNA is transported out of the nucleus to the ribosome, where each tRNA carrying its matching amino acid pairs with a codon on the mRNA, and the primary sequence of the protein is assembled as a linear chain of amino acids.

This is illustrated in three figures (not shown): protein synthesis; mcell-transcription-translation; transcription_translation.
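As a rough illustration of the flow described above (DNA to mRNA to a linear chain of amino acids), the following Python sketch transcribes a short, made-up coding-strand sequence and translates it with the standard genetic code. The function names and the example sequence are illustrative assumptions, and the sketch ignores splicing, tRNA charging, and other details of the real process.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order
# at each of the three positions (stop codons are written as "*").
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand_dna: str) -> str:
    """Return the mRNA matching the coding (sense) DNA strand: T becomes U."""
    return coding_strand_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG up to the first stop codon (one-letter code)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon ends the chain
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical 12-nucleotide example: Met-Ala-Trp followed by a stop codon.
mrna = transcribe("ATGGCCTGGTAA")
print(mrna, translate(mrna))
```

The primary sequence comes out as a plain string, which mirrors the "linear string of amino acids" in the text.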

Proteins synthesized at distal locations frequently contain intrinsically disordered segments. These regions are generally rich in assembly-promoting modules and are often regulated by post-translational modifications. Such proteins are tightly regulated but display distinct temporal dynamics upon stimulation with growth factors. Thus, proteins synthesized on-site may rapidly alter proteome composition and act as dynamically regulated scaffolds to promote the formation of reversible cellular assemblies.
RJ Weatheritt, et al. Nature Structural & Molecular Biology 24 Aug, 2014; 21: 833–839 http://dx.doi.org/10.1038/nsmb.2876

An overview of the potential advantages conferred by distal-site protein synthesis

Turquoise and red filled circles represent off-target and correct interaction partners, respectively. Wavy lines represent a disordered region within a distal-site synthesis protein. Grey and red lines in graphs represent profiles of t… http://www.nature.com/nsmb/journal/v21/n9/carousel/nsmb.2876-F5.jpg

In the transcription process, a DNA sequence is read to produce an RNA transcript. The resulting mRNA is essential for protein synthesis because it sets the order of the amino acids in the primary structure. However, there are also microRNAs and other noncoding RNAs, and there are transcription factors. The transcription factors bind to chromatin, and the RNAs also have some role in regulating the transcription process (see picture above).

Transcription factors (TFs) interact dynamically in vivo with chromatin binding sites. Four different techniques are currently used to measure their kinetics in live cells,

fluorescence recovery after photobleaching (FRAP),
fluorescence correlation spectroscopy (FCS),
single molecule tracking (SMT) and
competition ChIP (CC).

A comparison of data from each of these techniques raises an important question:

do measured transcription kinetics reflect biologically functional interactions at specific sites (i.e. working TFs) or
do they reflect non-specific interactions (i.e. playing TFs)?

There are five key unresolved biological questions related to

the functionality of transient and prolonged binding events at both
specific promoter response elements as well as non-specific sites.

In support of functionality,

there are data suggesting that TF residence times are tightly regulated, and
that this regulation modulates transcriptional output at single genes.

In addition to this site-specific regulatory role, TF residence times

also determine the fraction of promoter targets occupied within a cell
thereby impacting the functional status of cellular gene networks.
TF residence times, then, are key parameters that could influence transcription in multiple ways.

Quantifying transcription factor kinetics: At work or at play? Mueller F., et al. http://dx.doi.org/10.3109/10409238.2013.833891
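To make the link between residence time and promoter occupancy concrete, here is a minimal sketch that assumes a simple two-state (bound/unbound) binding model. This is a textbook simplification, not the analysis used by Mueller et al. In this model the residence time is 1/k_off and the occupied fraction of sites is k_on·[TF] / (k_on·[TF] + k_off). All parameter names and values are hypothetical.

```python
def occupancy(tf_conc_nM: float, k_on_per_nM_s: float, residence_time_s: float) -> float:
    """Steady-state fraction of binding sites occupied in a two-state model.

    Assumption: one TF species, independent sites, no competition.
    """
    k_off = 1.0 / residence_time_s          # dissociation rate (1/s)
    k_on_eff = k_on_per_nM_s * tf_conc_nM   # effective association rate (1/s)
    return k_on_eff / (k_on_eff + k_off)


# Hypothetical numbers: same concentration and on-rate, different residence times.
for tau in (1.0, 10.0, 100.0):
    print(f"residence time {tau:6.1f} s -> occupancy {occupancy(10.0, 0.01, tau):.2f}")
```

Under these assumptions, lengthening the residence time at a fixed TF concentration raises the fraction of occupied targets, which is the dependence the text describes.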

Dr. Virginie Mattot works in the team “Angiogenesis, endothelium activation and Cancer” directed by Dr. Fabrice Soncin at the Institut de Biologie de Lille in France where she studies the roles played by microRNAs in endothelial cells during physiological and pathological processes such as angiogenesis or endothelium activation. She has been using Target Site Blockers to investigate the role of microRNAs on putative targets.

A few years ago, the team identified

an endothelial cell-specific gene which
harbors a microRNA in its intronic sequence.

They have since been working on understanding the functions of

both this new gene and its intronic microRNA in endothelial cells.

While they were searching for the functions of the intronic microRNA,

they identified an unknown gene as a putative target.

The aim of my project was to investigate if this unknown gene was actually a genuine target and

if regulation of this gene by the microRNA was involved in endothelial cell function.

They had already shown the endothelial cell phenotype is associated with the inhibition of the intronic microRNA.
They then used miRCURY LNA™ Target Site Blockers to demonstrate

the expression of this unknown gene is actually controlled by this microRNA.
the microRNA regulates specific endothelial cell properties through regulation of this unknown gene.

MicroRNA function in endothelial cells – Solving the mystery of an unknown target gene using Target Site Blockers to investigate the role of microRNAs on putative targets

We first verified that this TSB was functional by analyzing

the expression of the miRNA target against which the TSB was directed
we then showed the TSB induced similar phenotypes as those when we inhibited the microRNA in the same cells.

Target Site Blockers were shown to be efficient tools to demonstrate the specific involvement of

putative microRNA targets
in the function played by this microRNA.

Some genes are known to have several different alternatively spliced protein variants, but the Scripps Research Institute’s Paul Schimmel and his colleagues have uncovered almost 250 protein splice variants of an essential, evolutionarily conserved family of human genes. The results were published July 17 in Science.

Focusing on the 20-gene family of aminoacyl tRNA synthetases (AARSs),

the team captured AARS transcripts from human tissues—some fetal, some adult—and showed that
many of these messenger RNAs (mRNAs) were translated into proteins.

Previous studies have identified several splice variants of these enzymes that have novel functions, but uncovering so many more variants was unexpected, Schimmel said. Most of these new protein products

lack the catalytic domain but retain other AARS non-catalytic functional domains.

This study fundamentally affects how we view protein synthesis, according to Michael Ibba (who was not involved in the work), The Scientist reported. “The unexpected and potentially vast expanded functional networks that emerge from this study have the potential to influence virtually any aspect of cell growth.”

The team comprehensively captured and sequenced the AARS mRNAs from six human tissue types using high-throughput deep sequencing. They next showed that a proportion of these transcripts, including those missing the catalytic domain, indeed resulted in stable protein products:

48 of these splice variants associated with polysomes.

In vitro translation assays and the expression of more than 100 of these variants in cells confirmed that

many of these variants could be made into stable protein products.

The AARS enzymes—of which there’s one for each of the 20 amino acids—bring together an amino acid with its appropriate transfer RNA (tRNA) molecule. This reaction allows a ribosome to add the amino acid to a growing peptide chain during protein translation. AARS enzymes can be found in all living organisms and are thought to be among the first proteins to have originated on Earth.

One goal of human genetics is to understand how the information for precise and dynamic gene expression programs is encoded in the genome. The interactions of transcription factors (TFs) with DNA regulatory elements clearly

play an important role in determining gene expression outputs, yet
the regulatory logic underlying functional transcription factor binding is poorly understood.

An important question in genomics is to understand how a class of proteins called ‘‘transcription factors’’ controls the expression level of other genes in the genome in a cell type-specific manner – a process that is essential to human development. One major approach to this problem is to study where these transcription factors bind in the genome, but this does not tell us about the effect of that binding on gene expression levels and

it is generally accepted that much of the binding does not strongly influence gene expression.

DA Cusanovich et al. PLoS Genet 2014;10(3):e1004226. http://dx.doi.org/10.1371/journal.pgen.1004226

We knocked down 59 TFs and chromatin modifiers in one HapMap lymphoblastoid cell line

to evaluate the context of functional TF binding.

We then identified genes whose expression was affected by the knockdowns

by intersecting the gene expression data with transcription factor binding data
(based on ChIP-seq and DNase-seq)
within 10 kb of the transcription start sites of expressed genes.

This combination of data allowed us to infer functional TF binding.
Only a small subset of genes bound by a factor were

differentially expressed following the knockdown of that factor,
suggesting that most interactions between TF and chromatin
do not result in measurable changes in gene expression levels
of putative target genes.
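The intersection step described above can be sketched in a few lines of Python. This is only a toy illustration: the gene names, coordinates, and function names are hypothetical, and the real study derived binding from ChIP-seq and DNase-seq data instead of a hand-written list of positions.

```python
WINDOW_BP = 10_000  # 10 kb window around each transcription start site (TSS)


def bound_genes(tss_by_gene, binding_sites, window=WINDOW_BP):
    """Genes with at least one binding site within `window` bp of their TSS.

    tss_by_gene:   {gene: (chromosome, tss_position)}
    binding_sites: [(chromosome, position), ...] for a single factor
    """
    bound = set()
    for gene, (chrom, tss) in tss_by_gene.items():
        if any(c == chrom and abs(pos - tss) <= window for c, pos in binding_sites):
            bound.add(gene)
    return bound


def functional_targets(bound, differentially_expressed):
    """Bound genes whose expression also changed after the knockdown."""
    return bound & differentially_expressed


# Hypothetical toy data for one factor.
tss = {"geneA": ("chr1", 50_000), "geneB": ("chr1", 200_000), "geneC": ("chr2", 50_000)}
sites = [("chr1", 58_000), ("chr1", 195_000), ("chr2", 90_000)]
de_genes = {"geneB", "geneC"}

bound = bound_genes(tss, sites)
functional = functional_targets(bound, de_genes)
print(sorted(bound), sorted(functional), len(functional) / len(bound))
```

In this toy case only half of the bound genes respond to the knockdown; the study reports that in the real data the responding subset is small.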

We found that functional TF binding is enriched

in regulatory elements that harbor a large number of TF binding sites,
at sites with predicted higher binding affinity, and
at sites that are enriched in genomic regions annotated as ‘‘active enhancers.’’

We aim to be able to predict the expression pattern of a gene based on its regulatory
sequence alone.

Combining a TF knockdown approach with TF binding data can help us to

distinguish functional binding from non-functional binding

This approach has previously been applied to the study of human TFs, although for the most part studies have only focused on

the regulatory relationship of a single factor with its downstream targets.

The FANTOM consortium knocked down 52 different transcription factors in

the THP-1 cell line, an acute monocytic leukemia-derived cell line, and
used a subset of these to validate certain regulatory predictions based on binding motif enrichments.

We and others previously studied the regulatory architecture of gene expression in

the model system of HapMap lymphoblastoid cell lines (LCLs) using both
binding map strategies and QTL mapping strategies.

We now sought to use knockdown experiments targeting transcription factors in a HapMap LCL

to refine our understanding of the gene regulatory circuitry of the human genome.

Therefore, we integrated the results of the knockdown experiments with previous data on TF binding to

better characterize the regulatory targets of 59 different factors and
to learn when a disruption in transcription factor binding
is most likely to be associated with variation in the expression level of a nearby gene.

Gene expression levels following the knockdown were compared to

expression data collected from six samples that were transfected with negative control siRNA.

Depending on the factor targeted, the knockdowns resulted in

between 39 and 3,892 differentially expressed genes at an FDR of 5%
(Figure 1B; see Table S3 for a summary of the results).
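The excerpt does not say which procedure was used to control the FDR at 5%. As an illustration only, the sketch below applies the widely used Benjamini-Hochberg step-up procedure to a list of made-up p-values; treating it as the authors' method would be an assumption.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of tests called significant at the given FDR.

    Step-up rule: find the largest rank k with p_(k) <= k / m * fdr and
    call the k smallest p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical per-gene p-values from one knockdown versus control comparison.
p = [0.60, 0.001, 0.041, 0.008, 0.039]
print(benjamini_hochberg(p))
```

The number of indices returned corresponds to the count of differentially expressed genes for one factor.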

The knockdown efficiency for the 59 factors ranged

from 50% to 90% (based on qPCR; Table S1).

The qPCR measurements of the knockdown level were significantly

correlated with estimates of the TF expression levels
based on the microarray data (P = 0.001; Figure 1C).

Did the factors tend to have a consistent effect (either up- or down-regulation)

on the expression levels of genes they purportedly regulated?

All factors we tested are associated with both up- and down-regulation of downstream targets (Figure 6).

While there is compelling evidence for our inferences, the current chromatin functional annotations

do not fully explain the regulatory effects of the knockdown experiments.

For example, the enrichments for binding in ‘‘strong enhancer’’ regions of the genome range from 7.2% to 50.1% (median = 19.2%),

much beyond what is expected by chance alone, but far from accounting for all functional binding.

A slight majority of downstream target genes were expressed at higher levels

following the knockdown for 15 of the 29 factors for which we had binding information (Figure 6B).

The factor that is associated with the largest fraction (68.8%) of up-regulated target genes following the knockdown is EZH2,

the enzymatic component of the Polycomb group complex.

On the other end of the spectrum was JUND, a member of the AP-1 complex, for which

66.7% of differentially expressed targets were down-regulated following the knockdown.

Our results, combined with the previous work from our group and others make for a complicated view

of the role of transcription factors in gene regulation as
it seems difficult to reconcile the inference from previous work that
many transcription factors should primarily act as activators with the results presented here.

One somewhat complicated hypothesis, which nevertheless can resolve the apparent discrepancy, is that

the ‘‘repressive’’ effects we observe for known activators may be
at sites in which the activator is acting as a weak enhancer of transcription and
that reducing the cellular concentration of the factor
releases the regulatory region to binding by an alternative, stronger activator.

Integrative study of Arabidopsis thaliana metabolomic and transcriptomic data with the interactive MarVis-Graph software

M Landesfeind, A Kaever, K Feussner, C Thurow, C Gatz, I Feussner and P Meinicke
PeerJ 2:e239; http://dx.doi.org/10.7717/peerj.239

High-throughput technologies notoriously generate large datasets often including data from different omics platforms. Each dataset contains data for several thousand experimental markers, e.g., mass-to-charge ratios in mass spectrometry or spots in DNA microarray analysis. An experimental marker is associated with an intensity profile which may include several measurements according to different experimental conditions (Dettmer, Aronov & Hammock, 2007).

The combined analysis and visualization of data from different high-throughput technologies remains a key challenge in bioinformatics. We present here the MarVis-Graph software for integrative analysis of metabolic and transcriptomic data. All experimental data is investigated in terms of the full metabolic network obtained from a reference database. The reactions of the network are scored based on the associated data, and

sub-networks, according to connected high-scoring reactions, are identified.

Finally, MarVis-Graph scores the detected sub-networks,

evaluates them by means of a random permutation test and
presents them as a ranked list.

Furthermore, MarVis-Graph features an interactive network visualization that provides researchers with a convenient view on the results.

The key advantage of MarVis-Graph is the analysis of reactions detached from their pathways so that

it is possible to identify new pathways or
to connect known pathways by previously unrelated reactions.

The MarVis-Graph software is freely available for academic use and can be downloaded at: http://marvis.gobics.de/marvis-graph.

Significant differences or clusters may be explained by associated annotations, e.g., in terms of metabolic pathways or biological functions. During recent years, numerous specialized tools have been developed to aid biological researchers in automating all these steps (e.g., Medina et al., 2010; Kaever et al., 2009; Waegele et al., 2012). Comprehensive studies can be performed by combining technologies from different omics fields. The combination of transcriptomic and proteomic data sets revealed a strong correlation between both kinds of data (Nie et al., 2007) and supported the detection of complex interactions, e.g., in RNA silencing (Haq et al., 2010). Moreover, correlations were detected between RNA expression levels and metabolite abundances (Gibon et al., 2006). Therefore, tools that integrate, analyze and visualize experimental markers from different platforms are needed. To cope with the complexity of genome-wide studies, pathway models are utilized extensively as a simple abstraction of the underlying complex mechanisms. Set Enrichment Analysis (Subramanian et al., 2005) and Over-Representation Analysis (Huang, Sherman & Lempicki, 2009) have become state-of-the-art tools for analyzing large-scale datasets: both methods evaluate predefined sets of entities, e.g., the accumulation of differentially expressed genes in a pathway.
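For readers unfamiliar with Over-Representation Analysis, the following sketch shows one common way to score "the accumulation of differentially expressed genes in a pathway": a one-sided hypergeometric test. The function name and the numbers are hypothetical, and specific ORA tools may use other tests or corrections.

```python
from math import comb


def ora_p_value(n_background: int, n_in_set: int, n_selected: int, n_overlap: int) -> float:
    """Probability of seeing at least `n_overlap` set members among the selection.

    n_background: all genes considered
    n_in_set:     genes annotated to the pathway (the predefined set)
    n_selected:   differentially expressed genes
    n_overlap:    differentially expressed genes that are in the pathway
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20,000 genes, a 100-gene pathway, 500 selected genes, 10 in the pathway.
print(ora_p_value(20_000, 100, 500, 10))
```

Because the test only looks at one predefined set at a time, it cannot by itself reveal links between sets, which is the limitation the following paragraphs raise.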

While manually curated pathways are convenient and easy to interpret, experimental studies have shown that all metabolic and signaling pathways are heavily interconnected (Kunkel & Brooks, 2002; Laule et al., 2003). Data from biomolecular databases support these studies: the metabolic network of Arabidopsis thaliana in the KEGG database (Kanehisa et al., 2012; Kanehisa & Goto, 2000) contains 1606 reactions, of which 1464 are connected in a single sub-network (>91%), i.e., they share a metabolite as product or substrate. In the AraCyc 10.0 database (Mueller, Zhang & Rhee, 2003; Rhee et al., 2006), more than 89% of the reactions are counted in a single sub-network. In both databases, most other reactions are completely disconnected. Additionally, Set Enrichment Analyses cannot easily identify links between the predefined sets. This becomes even more important when analyzing smaller pathways as provided by the MetaCyc (Caspi et al., 2008; Caspi et al., 2012) database. Moreover, methods that utilize pathways as predefined sets ignore reactions and related biomolecular entities (e.g., metabolites, genes) which are not associated with a single pathway. For example, this affects 4000 reactions in MetaCyc and 2500 in KEGG, respectively (Altman et al., 2013). Therefore, it is desirable to develop additional methods

that do not require predefined sets but may detect enriched sub-networks in the full metabolic network.

While several tools support the statistical analysis of experimental markers from one or more omics technologies and then utilize variants of Set Enrichment Analysis (Xia et al., 2012; Chen et al., 2013; Howe et al., 2011),

no tool is able to explicitly search for connected reactions that include
most of the metabolites, genes, and enzymes with experimental evidence.

However, the automatic identification of sub-networks has been proven useful in other contexts, e.g., in the analysis of protein–protein-interaction networks (Alcaraz et al., 2012; Baumbach et al., 2012; Maeyer et al., 2013).

MarVis-Graph imports experimental markers from different high-throughput experiments and

analyses them in the context of reaction-chains in full metabolic networks.

Then, MarVis-Graph scores the reactions in the metabolic network

according to the number of associated experimental markers and
identifies sub-networks consisting of subsequent, high-scoring reactions.

The resulting sub-networks are

ranked according to a scoring method and visualized interactively.

Hereby, sub-networks consisting of reactions from different pathways may be identified to be important

whereas the single pathways may not be found to be significantly enriched.

MarVis-Graph may also connect reactions without an assigned pathway

to reactions within a particular pathway.

The MarVis-Graph tool was applied in a case-study investigating the wound response in Arabidopsis thaliana to analyze combined metabolomic and transcriptomic high-throughput data.

Figure 1 Schema of the metabolic network representation in MarVis-Graph. Metabolite markers are shown in gray, metabolites in red, reactions in blue, enzymes in green, genes in yellow, transcript markers in pink, and pathways in turquoise color. The edges are shown in black with labels that comply with the biological meaning. The orange arrows depict the flow of score for the initial scoring (described in section “Initial Scoring”). (not shown)

In MarVis-Graph, metabolite markers obtained from mass-spectrometry experiments additionally contain the experimental mass. The experimental mass has to be calculated based on the mass-to-charge ratio (m/z-value) and specific isotope- or adduct-corrections (Draper et al., 2009) by means of specialized tools, e.g., MarVis-Filter (Kaever et al., 2012).

For each transcript marker the corresponding annotation has to be given. In DNA microarray experiments, each spot (transcript marker) is specific for a gene and can therefore be used for annotation. For other technologies an annotation has to be provided by external tools.

In MarVis-Graph, each reaction is scored initially based on the associated experimental data (see “Initial scoring”). This initial scoring is refined (see “Refining the scoring”) and afterwards reactions with a score below a user-defined threshold are removed. The network is

decomposed into subsequent high-scoring reactions that constitute the sub-networks.

The weight of each experimental marker (see “Experimental markers”) is equally distributed over all metabolites and genes associated with the metabolite marker or transcript marker, respectively. For all vertices, this is repeated as illustrated in Fig. 1 until the weights are accumulated by the reactions.
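The equal distribution of marker weights can be sketched as follows. This is a simplified reading of the description, not MarVis-Graph code: it covers only the metabolite branch (marker to metabolite to reaction), assumes a weight of 1.0 per marker, and uses hypothetical marker and reaction identifiers.

```python
from collections import defaultdict


def spread(scores, links):
    """Split each source's score equally over all of its linked targets."""
    out = defaultdict(float)
    for source, score in scores.items():
        targets = links.get(source, [])
        for target in targets:
            out[target] += score / len(targets)
    return dict(out)


# Hypothetical toy network: two metabolite markers, three metabolites, three reactions.
marker_weights = {"m1": 1.0, "m2": 1.0}
marker_to_metabolite = {
    "m1": ["linoleate"],
    "m2": ["alpha-linolenate", "crepenynate"],  # ambiguous mass match
}
metabolite_to_reaction = {
    "linoleate": ["R1"],
    "alpha-linolenate": ["R1", "R2"],
    "crepenynate": ["R3"],
}

metabolite_scores = spread(marker_weights, marker_to_metabolite)
reaction_scores = spread(metabolite_scores, metabolite_to_reaction)
print(reaction_scores)
```

Because every split is an equal share, the total weight is conserved on its way from the markers to the reactions.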

The initial reaction scores are used as input scoring for the random walk algorithm. The algorithm is performed as described by Glaab et al. (2012) with a user-defined restart-probability r (default value 0.8). After convergence of the algorithm, reactions with a score lower than the user-defined threshold t (default value t = 1−r) are removed from the reaction network. During the removal process,

the network is decomposed into pairwise disconnected sub-networks containing only high-scoring reactions.
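A minimal sketch of this refinement step is given below, with several assumptions made loudly: the walk is a generic random walk with restart in which each reaction passes its score equally to its neighbours, the initial scores are used unnormalized as the restart vector, and the toy reaction chain is invented. The details of Glaab et al. (2012) and of MarVis-Graph itself may differ.

```python
def random_walk_with_restart(neighbors, initial, r=0.8, tol=1e-9, max_iter=1000):
    """Iterate score = r * initial + (1 - r) * (score received from neighbours)."""
    scores = dict(initial)
    for _ in range(max_iter):
        updated = {}
        for node in neighbors:
            # Each neighbour hands over an equal share of its current score.
            inflow = sum(scores[n] / len(neighbors[n]) for n in neighbors[node])
            updated[node] = r * initial[node] + (1 - r) * inflow
        change = max(abs(updated[n] - scores[n]) for n in neighbors)
        scores = updated
        if change < tol:
            break
    return scores


def high_scoring_subnetworks(neighbors, scores, threshold):
    """Drop reactions below the threshold and return the connected components."""
    keep = {n for n, s in scores.items() if s >= threshold}
    seen, components = set(), []
    for start in sorted(keep):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in neighbors[node] if n in keep and n not in component)
        seen |= component
        components.append(component)
    return components


# Hypothetical chain of six reactions; R3 and R4 have no experimental evidence.
chain = {"R1": ["R2"], "R2": ["R1", "R3"], "R3": ["R2", "R4"],
         "R4": ["R3", "R5"], "R5": ["R4", "R6"], "R6": ["R5"]}
initial = {"R1": 1.0, "R2": 1.0, "R3": 0.0, "R4": 0.0, "R5": 1.0, "R6": 1.0}

r = 0.8
scores = random_walk_with_restart(chain, initial, r=r)
subnetworks = high_scoring_subnetworks(chain, scores, threshold=1 - r)
print(subnetworks)
```

In this toy chain the two unsupported reactions in the middle stay below t = 1 − r, so removing them splits the chain into two disconnected sub-networks.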

In the following, a resulting sub-network is denoted by a prime: G′ = (V′, L′) with V′ = M′ ∪ C′ ∪ R′ ∪ E′ ∪ G′ ∪ T′ ∪ P′.

The scores of the identified sub-networks can be assessed using a random permutation test, evaluating the marker annotations under the null hypothesis of being connected randomly. Here, the assignments

from metabolite markers to metabolites and from transcript markers to genes are randomized.

For each association between a metabolite marker and a metabolite,

this connection is replaced by a connection between a randomly chosen metabolite marker and a randomly chosen metabolite.

The random metabolite marker is chosen from the pool of formerly connected metabolite markers. Each connected transcript marker

is associated with a randomly chosen gene.

Choosing from the list of already connected experimental markers ensures that

the sum of weights from the original and the permuted network are equal.

This method differs from the commonly utilized XSwap permutation (Hanhijärvi, Garriga & Puolamäki, 2009) that is based on swapping endpoints of two random edges. The main difference of our permutation method is that it results in a network with different topological structure, i.e., different degree of the metabolite and gene nodes.

Finally, the sub-networks are detected and scored with the same parameters applied for the original network. Based on the scores of the networks identified in the random permutations, the family-wise-error-rate (FWER) and false-discovery-rate (FDR) are calculated for each originally identified sub-network.
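The permutation scheme and the resulting error rate can be sketched as below. Both functions are hypothetical illustrations of one reading of the description: the first re-draws each marker association as stated in the text, and the second uses a common empirical FWER estimate (the share of permutations whose best sub-network scores at least as high as the observed one). The re-scoring of each permuted network and the FDR calculation are omitted.

```python
import random


def permute_assignments(assignments, all_targets, rng):
    """Replace every (marker, target) link by a random marker and a random target.

    Markers are drawn from the pool of markers that were connected before,
    targets (metabolites or genes) from the full list of candidates.
    """
    connected_markers = sorted({marker for marker, _ in assignments})
    return [
        (rng.choice(connected_markers), rng.choice(all_targets))
        for _ in assignments
    ]


def family_wise_error_rate(observed_score, permutation_max_scores):
    """Fraction of permutations whose top sub-network score reaches the observed score."""
    hits = sum(1 for s in permutation_max_scores if s >= observed_score)
    return hits / len(permutation_max_scores)


# Hypothetical data: three marker-metabolite links and four candidate metabolites.
links = [("m1", "A"), ("m2", "B"), ("m2", "C")]
permuted = permute_assignments(links, ["A", "B", "C", "D"], random.Random(0))
print(permuted)

# Hypothetical top scores from four permuted networks versus an observed score of 5.0.
print(family_wise_error_rate(5.0, [1.0, 2.0, 6.0, 3.0]))
```

Because targets are drawn freely, the degree of each metabolite or gene node can change between permutations, which matches the contrast with XSwap drawn in the text.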

MarVis-Graph was applied in a case study investigating the A. thaliana wound response. Data from a metabolite fingerprinting (Meinicke et al., 2008) and a DNA microarray experiment (Yan et al., 2007) were imported into a metabolic network specific for A. thaliana created from the AraCyc 10.0 database (Lamesch et al., 2011). The metabolome and transcriptome have been measured before wounding as control and at specific time points after wounding in wild-type and in the allene oxide synthase (AOS) knock-out mutant dde-2-2 (Park et al., 2002) of A. thaliana Columbia (see Table 1). The AOS mutant was chosen because AOS catalyzes the first specific step in the biosynthesis of the hormone jasmonic acid, which is the key regulator in wound response of plants (Wasternack & Hause, 2013).

Both datasets have been preprocessed with the MarVis-Filter tool (Kaever et al., 2012) utilizing the Kruskal–Wallis p-value calculation on the intensity profiles. Based on the ranking of ascending p-values,

the first 25% of the metabolite markers and 10% of the transcript markers have been selected for further investigation (Data S2).
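The selection by ranked p-value amounts to a sort followed by a cut-off. The sketch below assumes the p-values have already been computed (for example by a Kruskal–Wallis test in a statistics package) and rounds the cut-off down; the marker names, p-values, and rounding rule are hypothetical.

```python
def select_top_fraction(p_values_by_marker, fraction):
    """Return the markers with the smallest p-values, keeping the given fraction."""
    ranked = sorted(p_values_by_marker, key=p_values_by_marker.get)
    n_keep = int(len(ranked) * fraction)  # assumption: round down
    return ranked[:n_keep]


# Hypothetical p-values for eight metabolite markers.
p_values = {"a": 0.20, "b": 0.001, "c": 0.03, "d": 0.5,
            "e": 0.004, "f": 0.9, "g": 0.07, "h": 0.6}
print(select_top_fraction(p_values, 0.25))
```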

The filtered metabolite and transcript markers were imported into the metabolic network. For metabolite markers, metabolites were associated

if the metabolite marker’s detected mass differs from the metabolite’s monoisotopic mass by a maximum of 0.005 u.
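The mass-matching rule can be illustrated with a short sketch. The function name is hypothetical, and the monoisotopic masses are approximate values calculated here from the sum formulas C18H30O2 and C18H32O2 for the neutral acids; they are not taken from the AraCyc database.

```python
TOLERANCE_U = 0.005  # maximum allowed mass difference in unified atomic mass units


def match_metabolites(marker_mass, monoisotopic_masses, tolerance=TOLERANCE_U):
    """Return all metabolites whose monoisotopic mass lies within the tolerance."""
    return sorted(
        name
        for name, mass in monoisotopic_masses.items()
        if abs(marker_mass - mass) <= tolerance
    )


# Approximate masses (assumed values): two isomers share C18H30O2.
masses = {
    "alpha-linolenate": 278.2246,
    "crepenynate": 278.2246,
    "linoleate": 280.2402,
}

print(match_metabolites(278.2270, masses))  # hypothetical marker mass
print(match_metabolites(280.2380, masses))
```

A marker whose mass fits one isomer necessarily fits the other as well, so mass alone cannot separate metabolites that share a sum formula.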

Transcript markers were linked to the genes whose ID equaled the ID given in the CATMA database (Sclep et al., 2007) for that transcript marker.

Table 2 Vertices in the A. thaliana specific metabolic network after import of experimental markers. Number of objects in the metabolic network in absolute counts and relative abundances. For experimental markers, the “with annotation” column gives the number of metabolite markers and transcript markers that were annotated with a metabolite or gene, respectively. The “direct evidence” column contains the number of metabolites and genes that are associated with a metabolite marker or transcript marker. For enzymes, this is the number of enzymes encoded by a gene with direct evidence. The number of vertices with an association to a reaction is given in the “with reaction” column. In the last column, this is given for associations to metabolic pathways. (not shown)

MarVis-Graph detected a total of 133 sub-networks. The sub-networks were ranked according to size S_s, diameter S_d, and sum-of-weights S_sow scores (Table S4). Interestingly, the different rankings show a high correlation with all pairwise correlations higher than 0.75 (Pearson correlation coefficient) and 0.6 (Spearman rank correlation).

Allene-oxide cyclase sub-network
In all rankings, the sub-network allene-oxide cyclase (named after the reaction with the highest score in this sub-network) appeared as top candidate.

This sub-network is constituted of reactions from different pathways related to fatty acids. Figure 2 shows a visualization of the sub-network.

Jasmonic acid biosynthesis. The main part of the sub-network is formed by reactions from the “jasmonic acid biosynthesis” (Plant Metabolic Network, 2013) resulting in jasmonic acid (jasmonate). The presence of this pathway is very well established because of its central role in mediating the plant’s wound response (Reymond & Farmer, 1998; Creelman, Tierney & Mullet, 1992). Additionally, metabolites and transcripts from this pathway were expected to show prominent expression profiles because AOS, a key enzyme in this pathway, is knocked out in the mutant plant.

Jasmonic acid derivatives and hormones. Jasmonate is a precursor for a broad variety of plant hormones (Wasternack & Hause, 2013), e.g., the derivative (-)-jasmonic acid methyl ester (also Methyl Jasmonic Acid; MeJA) is a volatile, airborne signal mediating wound response between plants (Farmer & Ryan, 1990). Reactions from the jasmonoyl-amino acid conjugates biosynthesis I (PMN, 2013a) pathway connect jasmonate to different amino acids, including L-valine, L-leucine, and L-isoleucine. Via these amino acids, this sub-network is connected to the indole-3-acetylamino acid biosynthesis (PMN, 2013b) (IAA biosynthesis). Again, this pathway produces a well-known plant hormone: auxin (Woodward & Bartel, 2005). Even though jasmonate and auxin are both plant hormones, their connection in this sub-network is of minor relevance because amino acid conjugates are often utilized as active or storage forms of signaling molecules. While jasmonoyl-amino acid conjugates represent the active signaling form of jasmonates, IAA amino acid conjugates are the storage form of this hormone (Staswick et al., 2005).

polyhydroxy fatty acids synthesis

Figure 2 Schema of the allene-oxide cyclase sub-network. Metabolites are shown in red, reactions in blue, and enzymes in green color. Metabolites and reactions without direct experimental evidence are marked by a dashed outline and a brighter color while enzymes without experimental evidence are hidden. The metabolic pathways described in section “Resulting sub-networks” are highlighted with different colors. The orange and green parts indicate the reaction chains required to build jasmonate and its amino acid conjugates. The coloring of pathways was done manually after export from MarVis-Graph.

The ω-3-fatty acid desaturase should catalyze a reaction from linoleate to α-linolenate. Metabolite markers that match the mass of crepenynic acid also match α-linolenate because both molecules have the same sum-formula and monoisotopic mass. As mentioned above, MarVis-Graph compiled the metabolic network for this study from the AraCyc database version 10.0. On June 4th, a curator changed the database to remove the Δ12-fatty acid dehydrogenase prior to the release of AraCyc version 11.0.

The presented new software tool MarVis-Graph supports the investigation and visualization of omics data from different fields of study. The introduced algorithm for identification of sub-networks is able to identify reaction-chains across different pathways and includes reactions that are not associated with a single pathway. The application of MarVis-Graph in the case study on A. thaliana wound response resulted in a convenient graphical representation of high-throughput data which allows the analysis of the complex dynamics in a metabolic network.


Ca2+ Signaling: Transcriptional Control

Posted in Biological Networks, Cell Biology, Chemical Biology and its relations to Metabolic Disease, Computational Biology/Systems and Bioinformatics, Frontiers in Cardiology and Cardiovascular Disorders, Gene Regulation and Evolution, International Global Work in Pharmaceutical, Metabolomics, Molecular Genetics & Pharmaceutical, Origins of Cardiovascular Disease, Pharmaceutical Industry Competitive Intelligence, Proteomics, Signaling & Cell Circuits, Statistical Methods for Research Evaluation, Technology Transfer: Biotech and Pharmaceutical, tagged Ca2+/calmodulin-dependent protein kinase, Cornell University, data integration, Domínguez-Rodríguez, Excitation-contraction coupling, genome scale network reconstruction, heart, Messenger RNA, metabolic network reconstruction, Mount Sinai School of Medicine, predictive models, tissue-specific metabolism, United States on March 6, 2013

Reporter: Larry H. Bernstein, MD, FCAP

Cardiac Physiology (excitation-transcription coupling)(transient receptor potential channels canonical; TRPCs)
The other side of cardiac Ca2+ signaling: transcriptional control
Domínguez-Rodríguez A, Ruiz-Hurtado G, Benitah J-P and Gómez AM
Front. Physiol. 2012; 3:452. http://dx.doi.org/10.3389/fphys.2012.00452
http://www.FrontPhysiol.com/The_other_side_of_cardiac_Ca2+_signaling:_transcriptional_control
http://www.frontiersin.org/Computational_Physiology_and_Medicine/10.3389/fphys.2012.00299/full

Integration of expression data in genome-scale metabolic network reconstructions
Anna S. Blazier and Jason A. Papin*
Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
Front. Physiol., 06 August 2012 | http://dx.doi.org/10.3389/fphys.2012.00299
http://

The other side of cardiac Ca2+ signaling: transcriptional control
Alejandro Domínguez-Rodríguez1, Gema Ruiz-Hurtado2, Jean-Pierre Benitah1 and Ana M. Gómez1*
Ca2+ is probably the most versatile signal transduction element used by all cell types. In the heart, it is essential to activate cellular contraction in each heartbeat. Nevertheless Ca2+ is not only a key element in excitation-contraction coupling (EC coupling), but it is also

a pivotal second messenger in cardiac signal transduction, being able to control processes such as
- excitability, metabolism, and transcriptional regulation.

Regarding the latter, Ca2+ activates Ca2+-dependent transcription factors by a process called excitation-transcription coupling (ET coupling). ET coupling is an integrated process by which

the common signaling pathways that regulate EC coupling
- activate transcription factors.

In studies on the development of cardiac hypertrophy, two Ca2+-dependent enzymes are key actors:

Ca2+/Calmodulin kinase II (CaMKII) and
phosphatase calcineurin,

both of which are activated by the complex Ca2+/Calmodulin.

The question now is how ET coupling occurs in cardiomyocytes, where intracellular Ca2+ is continuously oscillating. We draw attention to the location of Ca2+ signaling:

intranuclear ([Ca2+]n) or cytoplasmic ([Ca2+]c), and
the specific ionic channels involved in the activation of cardiac ET coupling.

We highlight the role of the inositol 1,4,5-trisphosphate receptors (IP3Rs) in the elevation of [Ca2+]n levels, which are important to

locally activate CaMKII, and
the role of transient receptor potential channels canonical (TRPCs) in [Ca2+]c,
- needed to activate calcineurin (Cn).

Keywords: heart, calcium, excitation-transcription coupling, TRPC, nuclear calcium
Citation: Domínguez-Rodríguez A, Ruiz-Hurtado G, Benitah J-P and Gómez AM (2012) The other side of cardiac Ca2+ signaling: transcriptional control.
Front. Physiol. 3:452. http://dx.doi.org/10.3389/fphys.2012.00452 Published online: 28 November 2012.
Edited by: Eric A. Sobie, Mount Sinai School of Medicine, USA; Reviewed by: Jeffrey Varner, Cornell University, USA; Ravi Radhakrishnan, University of Pennsylvania, USA

Integration of expression data in genome-scale metabolic network reconstructions
Anna S. Blazier and Jason A. Papin*
Front. Physiol., 06 August 2012 | doi: 10.3389/fphys.2012.00299

With the advent of high-throughput technologies, the field of systems biology has amassed an abundance of “omics” data,

quantifying thousands of cellular components across a variety of scales,
- ranging from mRNA transcript levels to metabolite quantities.

Methods are needed to not only

integrate this omics data but to also
use this data to heighten the predictive capabilities of computational models.

Several recent studies have successfully demonstrated how flux balance analysis (FBA), a constraint-based modeling approach, can be used

to integrate transcriptomic data into genome-scale metabolic network reconstructions
- to generate predictive computational models.

We summarize such FBA-based methods for integrating expression data into genome-scale metabolic network reconstructions, highlighting their advantages as well as their limitations.

Introduction

Genomics provides data on a cell’s DNA sequence,
transcriptomics on the mRNA expression of cells,
proteomics on a cell’s protein composition, and
metabolomics on a cell’s metabolite abundance.

Computational methods are needed to reduce this dimensionality across the wide spectrum of omics data to improve understanding of the underlying biological processes (Cakir et al., 2006; Pfau et al., 2011).

Metabolic network reconstructions are an advantageous platform for the integration of omics data (Palsson, 2002). Assembled in part from

annotated genomes as well as
- biochemical, genetic, and cell phenotype data,
a metabolic network reconstruction is a manually-curated, computational framework that
- enables the description of gene-protein-reaction relationships (Chavali et al., 2012).

Numerous studies have demonstrated how such reconstructions of metabolism can guide the development of biological hypotheses and discoveries (Oberhardt et al., 2010; Sigurdsson et al., 2010; Chang et al., 2011).

Flux balance analysis (FBA), a constraint-based modeling approach, can be used to probe these network reconstructions by

predicting physiologically relevant growth rates as a function of the underlying biochemical networks (Gianchandani et al., 2009).

To do so, FBA involves delineating constraints on the network according to

physicochemical,
environmental,
regulatory, and
thermodynamic principles
- (Kauffman et al., 2003; Price et al., 2003).

After applying constraints, the solution space of possible phenotypes narrows, allowing for more accurate characterization of the reconstructed metabolic network.

Omics data can be used to further constrain the possible solution space and
enhance the model’s predictive powers
- (Palsson, 2002; Lewis et al., 2012).

Given the wealth of transcriptomic data, efforts to integrate mRNA expression data with metabolic network reconstructions have, in particular, made significant progress when using FBA as an analytical platform (Covert and Palsson, 2002; Akesson et al., 2004; Covert et al., 2004). However, despite this abundance of data, the integration of expression data faces unique challenges such as

experimental and inherent biological noise,
variation among experimental platforms,
detection bias, and the
unclear relationship between gene expression and reaction flux
- (Zhang et al., 2010).

The past few years have witnessed several advances in the integration of transcriptomic data with genome-scale metabolic network reconstructions. Specifically, numerous FBA-driven algorithms have been introduced that use experimentally derived mRNA transcript levels to modify the network’s reactions either by

inactivating them entirely or
by constraining their activity levels.

Such algorithms have demonstrated their applicability by, for example,

classifying tissue-specific metabolic activity in the human network and
by identifying novel drug targets in Mycobacterium tuberculosis
- (Shlomi et al., 2008; Colijn et al., 2009).

We give an overview of the formulation of FBA.
We summarize various FBA-driven methods for integrating expression data into genome-scale metabolic network reconstructions.
We survey the limitations of these algorithms as well as look to the future of

multi-omics data integration using genome-scale metabolic network reconstructions as the scaffold.

Flux balance analysis

FBA is a constraint-based modeling approach that characterizes and predicts aspects of an organism’s metabolism (Gianchandani et al., 2009). To use FBA, the user supplies a metabolic network reconstruction in the form of a stoichiometric matrix, S, where

the rows in S correspond to the metabolites of the reconstruction and
the columns in S represent reactions in the reconstruction.
A stoichiometric coefficient s_ij conveys the molecularity of a certain metabolite in a particular reaction, with
- s_ij ≥ 1 indicating that the metabolite is a product of the reaction,
- s_ij ≤ −1 indicating that it is a reactant, and
- s_ij = 0 indicating that the metabolite is not involved.
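
As a small illustration of the row/column convention described above, the following sketch builds S for a hypothetical three-reaction network (the reaction and metabolite names are invented for this example, not taken from the paper) and computes S · v for two candidate flux vectors.

```python
# Hypothetical toy network: -> A, A -> B, B ->
# Each reaction maps metabolite name -> stoichiometric coefficient
# (positive = product, negative = reactant).
reactions = {
    "R_uptake": {"A": 1},             # -> A
    "R_convert": {"A": -1, "B": 1},   # A -> B
    "R_export": {"B": -1},            # B ->
}
metabolites = ["A", "B"]


def build_stoichiometric_matrix(metabolites, reactions):
    """Rows are metabolites, columns are reactions; absent metabolites get 0."""
    return [[stoich.get(m, 0) for stoich in reactions.values()]
            for m in metabolites]


def net_production(S, v):
    """Return S . v, the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


S = build_stoichiometric_matrix(metabolites, reactions)
balanced = net_production(S, [2.0, 2.0, 2.0])    # every metabolite balanced
unbalanced = net_production(S, [2.0, 1.0, 1.0])  # A accumulates
print(S, balanced, unbalanced)
```

A flux vector whose product with S is all zeros satisfies the steady-state condition; the second vector does not, because metabolite A is produced faster than it is consumed.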

A system of linear equations is established by multiplying the S matrix by a column vector, v, which contains the unknown fluxes through each of the reactions of the S matrix. Under the assumption that the system operates at steady-state, that is to say there is no net production or consumption of mass within the system, the product of this matrix multiplication must equal zero, S · v = 0 (Gianchandani et al., 2009). Because the resulting system is underdetermined (i.e., too few equations, too many unknowns), linear programming (LP) is used to optimize for a particular flux, Z, the objective function, subject to underlying constraints. The objective function typically takes on the form of: Z = c ⋅ v
where c is a row vector of weights for each of the fluxes in column vector v, indicating how much each reaction in v contributes to the objective function, Z (Lee et al., 2006; Orth et al., 2010). Examples of objective functions include maximizing biomass, ATP production, and the production of a metabolite of interest (Lewis et al., 2012).

$$\max_{v} \; Z = c \cdot v \tag{1}$$

subject to

$$S \cdot v = 0 \tag{2}$$

$$\mathrm{lb} \le v \le \mathrm{ub} \tag{3}$$

(1) outlines the objective function to be optimized,

(2) the steady state assumption, and

(3) describes the upper and lower bounds, ub and lb, of each of the fluxes in v according to such constraints as

enzyme capacities,
maximum uptake and secretion rates, and
thermodynamic constraints
- (Price et al., 2003; Jensen and Papin, 2011).

Through this application of constraints, the solution space of physiologically feasible flux distributions for v shrinks. Thus, the task of FBA is to find a solution to v that lies within the bounded solution space and that optimizes the objective function at the same time.
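
The formulation above can be tried on a toy problem. The sketch below is only an illustration: the five-reaction network, its bounds, and the choice of objective are hypothetical, and SciPy's general-purpose `linprog` solver stands in for whatever LP solver a dedicated FBA package would use.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not from the paper):
#   R1: -> A    R2: A -> B    R3: A -> C    R4: B ->    R5: C ->
S = np.array([
    [1, -1, -1,  0,  0],   # metabolite A
    [0,  1,  0, -1,  0],   # metabolite B
    [0,  0,  1,  0, -1],   # metabolite C
], dtype=float)

lb = np.zeros(5)                                      # all reactions irreversible
ub = np.array([10.0, 1000.0, 1000.0, 1000.0, 1000.0]) # uptake R1 capped at 10
c = np.array([0.0, 0.0, 0.0, 1.0, 0.0])               # objective: flux through R4


def run_fba(S, c, lb, ub):
    """Maximize Z = c . v subject to S . v = 0 and lb <= v <= ub."""
    # linprog minimizes, so the objective weights are negated.
    res = linprog(-c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=list(zip(lb, ub)), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x, float(c @ res.x)


v, Z = run_fba(S, c, lb, ub)
print(np.round(v, 6), Z)
```

In this toy case the uptake bound is the limiting constraint: all of the available A is routed through R2 and R4, so the optimal objective value equals the uptake cap.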

Several recently developed algorithms have demonstrated how expression data can be incorporated into FBA models to further constrain the flux distribution solution space in genome-scale metabolic network reconstructions.
Table 1. Summary of the algorithms for the integration of expression data. Image URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3429070/table/T1/?report=thumb

List of Methods:

GIMME The Gene Inactivity Moderated by Metabolism and Expression (GIMME) algorithm both guarantees a functioning metabolic model based on gene expression levels and quantifies the agreement between the model and the data (Becker and Palsson, 2008).
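
The idea of keeping a model functional while scoring its disagreement with expression data can be sketched as a two-step linear program. This is a simplified illustration in the spirit of GIMME, not the published implementation: the network, expression values, threshold, and required fraction of the optimum are all hypothetical, reactions are assumed irreversible, and the penalty (threshold minus expression for reactions below the threshold) is an assumption of this sketch.

```python
import numpy as np
from scipy.optimize import linprog


def gimme_sketch(S, c, lb, ub, expression, threshold, fraction=0.9):
    """Two-step GIMME-style LP, assuming irreversible reactions (lb >= 0)."""
    b_eq = np.zeros(S.shape[0])
    bounds = list(zip(lb, ub))

    # Step 1: plain FBA gives the best achievable objective value.
    fba = linprog(-c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
    if not fba.success:
        raise RuntimeError(fba.message)
    z_opt = float(c @ fba.x)

    # Step 2: minimize penalized flux through lowly expressed reactions
    # while keeping the objective at or above fraction * z_opt.
    penalty = np.maximum(threshold - expression, 0.0)
    res = linprog(penalty,
                  A_ub=-c.reshape(1, -1), b_ub=[-fraction * z_opt],
                  A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x, float(penalty @ res.x)   # fluxes, inconsistency score


# Hypothetical network: R1: -> A, R2: A -> B, R3: A -> C, R4: C -> B, R5: B ->
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0,  1, -1],   # B
    [0,  0,  1, -1,  0],   # C
], dtype=float)
c = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
lb = np.zeros(5)
expression = np.array([10.0, 1.0, 9.0, 9.0, 10.0])   # R2 is lowly expressed

# Case 1: the well-expressed route R3/R4 can carry all of the flux.
ub_open = np.array([10.0, 1000.0, 1000.0, 1000.0, 1000.0])
v_open, score_open = gimme_sketch(S, c, lb, ub_open, expression, threshold=5.0)

# Case 2: R3 is capacity-limited, so some flux must use the lowly expressed R2.
ub_tight = ub_open.copy()
ub_tight[2] = 5.0
v_tight, score_tight = gimme_sketch(S, c, lb, ub_tight, expression, threshold=5.0)
print(score_open, score_tight)
```

In the first case the lowly expressed reaction carries no flux and the score is zero; in the second it is brought back into use because the required objective level cannot be met otherwise, and the score becomes positive.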

iMAT Similar to GIMME, the Integrative Metabolic Analysis Tool (iMAT) results in a functioning model in which the fluxes of reactions correlated with high mRNA levels are maximized and the fluxes of reactions associated with low mRNA levels are minimized (Shlomi et al., 2008; Zur et al., 2010). A key difference is that iMAT does not require a priori knowledge of a defined metabolic functionality. Briefly, this method establishes a tri-valued gene-to-reaction mapping for each reaction in the model according to the level of gene expression in the data. iMAT requires that reactions catalyzed by the products of highly expressed genes are able to carry a minimum flux. By removing this need for user-specified objective functions, iMAT bypasses assumptions about metabolic functionalities of a particular network, which proves advantageous for models where there is no clear objective function, as in models of mammalian cells.
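
The tri-valued mapping mentioned above can be illustrated with a short sketch. The two cut-offs, the reaction names, and the expression values are hypothetical, and the optimization step that iMAT performs afterwards is not shown.

```python
def discretize_expression(expression, low, high):
    """Map each reaction's expression value to -1 (low), 0 (moderate) or +1 (high)."""
    states = {}
    for rxn, value in expression.items():
        if value >= high:
            states[rxn] = 1
        elif value <= low:
            states[rxn] = -1
        else:
            states[rxn] = 0
    return states


# Hypothetical per-reaction expression values (already mapped from genes).
expression = {"R1": 12.0, "R2": 5.0, "R3": 0.4}
states = discretize_expression(expression, low=1.0, high=10.0)
print(states)
```

Reactions labelled +1 are the ones that would be required to carry a minimum flux, and reactions labelled -1 are the ones whose flux would be pushed toward zero.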

MADE While both GIMME and iMAT rely on user-specified threshold values to determine which reactions are highly expressed and which reactions are lowly expressed, Metabolic Adjustment by Differential Expression (MADE) uses statistically significant changes in gene expression measurements to determine sequences of highly and lowly expressed reactions (Jensen and Papin, 2011). The lack of correlation between mRNA levels and protein levels makes it difficult to accurately determine when genes are “turned on,” and when they are “turned off.” Therefore, in eliminating this need for thresholding, MADE removes significant user-bias from the system.

E-Flux Whereas GIMME, iMAT, and MADE incorporate gene expression data into their models by reducing gene expression levels to binary states, the method E-Flux attempts to more directly incorporate gene expression data into FBA optimization problems by constraining the maximum possible flux through the reactions (Colijn et al., 2009). Rather than setting the upper bounds of a reaction to some large constant or 0, mirroring the implementation of binary-based algorithms, E-Flux constrains the upper bound of a reaction according to its respective gene expression level relative to a particular threshold. In cases where the gene expression data is below a certain threshold, tight constraints are placed on the flux through the corresponding reactions in the reconstruction; conversely, in cases where the gene expression is above a certain threshold, loose constraints are placed on the flux through the corresponding reactions.
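
A minimal sketch of expression-dependent upper bounds follows. The normalization used here (scaling each reaction's bound by its expression relative to the highest value in the data set) is an assumption chosen for illustration and is not necessarily the scheme used by Colijn et al.; the reaction names and values are hypothetical.

```python
def eflux_upper_bounds(expression, max_flux=1000.0):
    """Scale each reaction's upper bound by its expression relative to the peak value."""
    peak = max(expression.values())
    if peak <= 0:
        raise ValueError("at least one expression value must be positive")
    return {rxn: max_flux * value / peak for rxn, value in expression.items()}


# Hypothetical per-reaction expression values.
expression = {"R1": 8.0, "R2": 2.0, "R3": 0.0}
upper_bounds = eflux_upper_bounds(expression, max_flux=100.0)
print(upper_bounds)
```

The resulting values would replace the constant upper bounds in the FBA problem, so that weakly expressed reactions are tightly constrained and strongly expressed ones are left nearly unconstrained.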

PROM In contrast to the other methods discussed, which focused solely on integrating gene expression data into genome-scale metabolic network reconstructions, Probabilistic Regulation of Metabolism (PROM) aims to fuse together metabolic networks and transcription regulatory networks with expression data (Chandrasekaran and Price, 2010). To run PROM, the user supplies a genome-scale metabolic network reconstruction, a regulatory network structure describing transcription factors and their targets, and a range of expression data from various environmental and genetic perturbations. Given this expression data, PROM binarizes the genes with respect to a user-supplied threshold to evaluate the likelihood of the expression of a target gene given the expression of that gene’s transcription factor.
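
The probability estimate described above can be sketched from paired expression samples. The function name, the sample values, and the single shared threshold are hypothetical; how such probabilities are then turned into flux constraints is not shown.

```python
def conditional_on_probability(tf_expr, target_expr, threshold, tf_state):
    """Estimate P(target gene on | transcription factor in tf_state).

    tf_expr and target_expr are paired measurements across conditions;
    a gene counts as "on" when its value exceeds the threshold.
    Returns None when no sample has the TF in the requested state.
    """
    tf_on = [x > threshold for x in tf_expr]
    target_on = [x > threshold for x in target_expr]
    matching = [t for f, t in zip(tf_on, target_on) if f == tf_state]
    if not matching:
        return None
    return sum(matching) / len(matching)


# Hypothetical expression values across six conditions.
tf_expr = [9.0, 8.0, 1.0, 2.0, 0.5, 7.0]
target_expr = [6.0, 7.0, 0.3, 5.0, 0.2, 1.0]

p_on_given_tf_off = conditional_on_probability(tf_expr, target_expr, 3.0, tf_state=False)
p_on_given_tf_on = conditional_on_probability(tf_expr, target_expr, 3.0, tf_state=True)
print(p_on_given_tf_off, p_on_given_tf_on)
```

With these numbers the target is on in one of the three TF-off samples and in two of the three TF-on samples, which is the kind of contrast a regulatory interaction would be expected to produce.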

Challenges facing the integration of expression data

Each of the methods discussed hinges on the assumption that mRNA transcript levels are a strong indicator for the level of protein activity. For instance, GIMME and iMAT assume that mRNA levels below a certain threshold suggest that the corresponding reactions are inactive. MADE follows a similar logic, turning reactions on and off depending on the changes in mRNA transcript levels. E-Flux and PROM assume that transcript levels indicate the degree to which reactions are active, evident in the constraining of the upper bounds in the FBA optimization problems associated with these methods.

Rather than requiring that the reconstruction mirror the expression data exactly, the methods allow for deviations in the FBA flux solution space in order to generate a functioning model that adheres to the specified constraints. In the case of GIMME, highly expressed reactions are prioritized relative to lowly expressed reactions; however, in the event that an optimal, functioning solution cannot be found, the assumption can be violated and lowly expressed reactions can be added back into the reconstruction. Thus, this assumption that mRNA transcript levels correlate to protein levels serves as a cue rather than a mandate.

Conclusion

The above methods have been used to not only integrate expression data from a variety of sources but to also make progress toward overcoming key challenges in the field of systems biology. For instance, iMAT, highlighting its applicability in multi-cellular organisms, was used to curate the human metabolic network reconstruction and predict tissue-specific gene activity levels in ten human tissues (Duarte et al., 2007; Shlomi et al., 2008). Additionally, both E-Flux and PROM have been used to discover novel drug targets in Mycobacterium tuberculosis (Colijn et al., 2009; Chandrasekaran and Price, 2010).

Given the recent success with using genome-scale metabolic network reconstructions as a platform for integrating expression data, efforts should focus on multi-omics data integration. A handful of methods have already been introduced that integrate two or more types of omics data into genome-scale metabolic network reconstructions. For example, despite the current dearth of quantitative metabolomics data, a method has been developed that demonstrates how semi-quantitative metabolomics data can be used with transcriptomic data to curate genome-scale metabolic network reconstructions and identify key reactions involved in the production of certain metabolites (Cakir et al., 2006). Another algorithm, called Integrative Omics-Metabolic Analysis (IOMA), integrates metabolomics data and proteomics data into a genome-scale metabolic network reconstruction by evaluating kinetic rate equations subject to quantitative omics measurements (Yizhak et al., 2010). Furthermore, Mass Action Stoichiometric Simulation (MASS) uses metabolomic, fluxomic, and proteomic data to transform a static stoichiometric reconstruction of an organism into a large-scale dynamic network model (Jamshidi and Palsson, 2010). And finally, building off of iMAT, the Model-Building Algorithm (MBA) utilizes literature-based knowledge, transcriptomic, proteomic, metabolomic, and phenotypic data to curate the human metabolic network reconstruction to derive a more complete picture of tissue-specific metabolism (Jerby et al., 2010). Such algorithms show promise in their ability to easily integrate high-throughput data into genome-scale metabolic network reconstructions to generate phenotypically accurate and predictive computational models.