RNAi – On Transcription and Metabolic Control

Writer and Curator: Larry H Bernstein, MD, FCAP



This is the third contribution to a series on transcription and metabolic control. It reveals the enormous complexity in this emerging research.


mRNA, small RNAs, long RNAs, RNAi and DicAR

Aberrant mRNA translation in cancer pathogenesis
Pier Paolo Pandolfi
Oncogene (2004) 23, 3134–3137

As the molecular processes that control mRNA translation and ribosome biogenesis in the eukaryotic cell are extremely complex and multilayered, their deregulation can in principle occur at multiple levels, leading to both disease and cancer pathogenesis. For a long time, it was speculated that disruption of these processes may participate in tumorigenesis, but this notion was, until recently, solely supported by correlative studies. Strong genetic support is now being accrued, while new molecular links between tumor-suppressive and oncogenic pathways and the control of protein synthetic machinery are being unraveled. The importance of aberrant protein synthesis in tumorigenesis is further underscored by the discovery that compounds such as Rapamycin, known to modulate signaling pathways regulatory of this process, are effective anticancer drugs. A number of fundamental questions remain to be addressed and a number of novel ones emerge as this exciting field evolves.


mRNA Translation and Energy Metabolism in Cancer
I. Topisirovic and N. Sonenberg
Cold Spring Harbor Symposia on Quantitative Biology, Volume LXXVI

A prominent feature of cancer cells is the use of aerobic glycolysis under conditions in which oxygen levels are sufficient to support energy production in the mitochondria (Jones and Thompson 2009; Cairns et al. 2010). This phenomenon, named the “Warburg effect” after its discoverer Otto Warburg, is thought to fuel the biosynthetic requirements of neoplastic growth (Warburg 1956; Koppenol et al. 2011) and has recently been acknowledged as one of the hallmarks of cancer (Hanahan and Weinberg 2011). mRNA translation is the most energy-demanding process in the cell (Buttgereit and Brand 1995). In mammalian cells it consumes >20% of cellular ATP, not considering the energy that is required for the biosynthesis of the components of the translational machinery (e.g., ribosome biogenesis; Buttgereit and Brand 1995). Control of mRNA translation plays a pivotal role in the regulation of gene expression (Sonenberg and Hinnebusch 2009). In fact, a recent study demonstrated that the mammalian proteome is mostly governed at the mRNA translation level (Schwanhausser et al. 2011). Malfunction of mRNA translation critically contributes to human disease, including diabetes, heart disease, blood disorders, and, most notably, cancer (Fig. 1; Crozier et al. 2006; Narla and Ebert 2010; Silvera et al. 2010; Spriggs et al. 2010). The first account of changes in the translational apparatus in cancer dates back to 1896, showing enlarged and irregularly shaped nucleoli, which are the site of ribosome biogenesis (Pianese 1896). Rapidly proliferating cancer cells have more ribosomes than normal cells.

Figure 1. Dysregulated mRNA translation plays a pivotal role in cancer. Malignant cells are characterized by enlarged nucleoli and a larger number of ribosomes than their normal counterparts. Mutations and/or altered expression of ribosomal proteins (e.g., RPS19, RPS24), rRNA-modifying enzymes (e.g., dyskerin), translation initiation factors (e.g., eIF4E), or the initiator tRNA (tRNAiMet) result in malignant transformation. Signaling pathways whose dysfunction is frequent in cancer (e.g., MAPK, PI3K/AKT) affect mRNA translation. Perturbations in the translatome result in aberrant cellular growth, proliferation, and survival characteristic of tumorigenesis.


In stark contrast to normal cells, in cancer cells ribosomal biogenesis is uncoupled from cell proliferation (Stanners et al. 1979). Accordingly, cancer cells exhibit abnormally high rates of protein synthesis (Silvera et al. 2010). That ribosomal dysfunction plays a central role in cancer is further corroborated by the findings that genetic alterations, which encompass the components of the ribosome machinery (i.e., “ribosomopathies”), are characterized by elevated cancer risk (Narla and Ebert 2010).

mRNA translation is the most energy-consuming process in the cell and strongly correlates with cellular metabolic activity. Translation and energy metabolism play important roles in homeostatic cell growth and proliferation, and when dysregulated lead to cancer. eIF4E is a key regulator of translation, which promotes oncogenesis by selectively enhancing translation of a subset of tumor-promoting mRNAs (e.g., cyclins and c-myc). PI3K/AKT and mitogen-activated protein kinase (MAPK) pathways, which are strongly implicated in cancer etiology, exert a number of their biological effects by modulating translation. The PI3K/AKT pathway regulates eIF4E function by inactivating the inhibitory 4E-BPs via mTORC1, whereas MAPKs activate MAP kinase signal-integrating kinases 1 and 2, which phosphorylate eIF4E. In addition, AMP-activated protein kinase, which is a central sensor of the cellular energy balance, impairs translation by inhibiting mTORC1. Thus, eIF4E plays a major role in mediating the effects of PI3K/AKT, MAPK, and cellular energetics on mRNA translation.

Figure 2. eIF4E is regulated by multiple mechanisms. The expression of eIF4E is regulated by several transcription factors (e.g., c-myc, hnRNPK, p53) and adenine-uracil-rich element binding proteins (i.e., HuR and AUF1). eIF4E is suppressed by 4E-BPs, which are regulated by mTORC1. MAP kinase signal-integrating kinases 1 and 2 (MNKs) phosphorylate eIF4E.


Figure 3. Ras/MAPK and PI3K/AKT/mTORC1 regulate the activity of eIF4E. Various stimuli activate phosphoinositide-3-kinase (PI3K) through the receptor tyrosine kinases (RTKs). Upon activation, PI3K converts phosphatidylinositol 4,5-bisphosphate (PIP2) into phosphatidylinositol-3,4,5-trisphosphate (PIP3). This reaction is reversed by PTEN. Phosphoinositide-dependent protein kinase 1 (PDK1) and AKT bind to PIP3 via their pleckstrin homology domains, which allows for the phosphorylation and activation of AKT by PDK1. In addition, the mammalian target of rapamycin complex 2 (mTORC2) modulates the activity of AKT by phosphorylating its hydrophobic motif. AKT phosphorylates tuberous sclerosis complex 2 (TSC2) at multiple sites, which results in its inhibition and consequent activation of Ras homolog enriched in brain (Rheb), which is a small GTPase that activates mTORC1. mTORC1 phosphorylates 4E-BPs, leading to their dissociation from eIF4E. In addition to the PI3K/AKT pathway, the activity of mTORC1 is regulated by the serine/threonine kinase 11/LKB1/AMP-kinase (LKB1/AMPK) pathway, regulated in development and DNA damage response 1 (REDD1), and Rag GTPases in response to changes in cellular energy balance, oxygen, and amino acid availability, respectively. Ras and the MAPK pathways are activated by various stimuli through receptor tyrosine kinases (RTKs). In addition, the MAPK pathway is activated through the G protein–coupled receptors (GPCRs) and by protein kinase C (PKC; not shown). The MAPK pathways encompass an initial GTPase-regulated kinase (MAPKKK), which activates an effector kinase (MAPK) via an intermediate kinase (MAPKK). In response to stimuli such as growth factors, hormones, and phorbol esters, Ras GTPase stimulates Raf kinase (MAPKKK), which activates extracellular signal-regulated kinases 1 and 2 (ERK 1 and 2) via extracellular signal-regulated kinase activator kinases MEK1 and 2 (MAPKK). Cellular stresses, including osmotic shock, inflammatory cytokines, and UV light, activate p38 MAPKs via multiple mechanisms including Rac kinase (MAPKKK) and MKK3 and 6 (MAPKK). p38 MAPK and ERK activate the MAPK signal–integrating kinases 1 and 2 (MNK1/2), which phosphorylate eIF4E. Additional abbreviations are provided in the text.


Cancer Exosomes Perform Cell-Independent MicroRNA Biogenesis and Promote Tumorigenesis
Cancer Cell Nov, 2014; 26: 707–721.

Breast cancer cells secrete exosomes with specific capacity for cell-independent miRNA biogenesis, while normal cell-derived exosomes lack this ability. Exosomes derived from cancer cells and serum from patients with breast cancer contain the RISC loading complex proteins, Dicer, TRBP, and AGO2, which process pre-miRNAs into mature miRNAs. Cancer exosomes alter the transcriptome of target cells in a Dicer-dependent manner, which stimulates nontumorigenic epithelial cells to form tumors. This study identifies a mechanism whereby cancer cells impart an oncogenic field effect by manipulating the surrounding cells via exosomes. Presence of Dicer in exosomes may serve as a biomarker for detection of cancer.

Dicers at RISC. The Mechanism of RNAi

Marcel Tijsterman and Ronald H.A. Plasterk
Cell, Apr 2004; 117:1–4

Figure 1. Model for RNA Silencing in Drosophila. In an ordered biochemical pathway, miRNAs (left panel) and siRNAs (right panel) are processed from double-stranded precursor molecules by Dcr-1 and Dcr-2, respectively, and stay attached to Dicer-containing complexes, which assemble into RISC. The degree of complementarity between the RNA silencing molecule (in red) and its cognate target determines the fate of the mRNA: blocked translation or immediate destruction.

Argonaute2 Cleaves the Anti-Guide Strand of siRNA during RISC Activation
Cell 2005; 123:621-629

Dicing and slicing: The core machinery of the RNA interference pathway
Scott C Hammond
FEBS Letters 579 (2005) 5822–5829

Fig. 1. Domain organization of RNaseIII gene family. Three classes of RNaseIII genes are shown. The PAZ domain in Dm-Dicer-2 contains mutations in several residues required for RNA binding and may not be functional.

Fig. 2. Model for Dicer catalysis. The PAZ domain binds the 2 nt 3′ overhang of a dsRNA terminus. The RNaseIII domains form a pseudo-dimer. Each domain hydrolyzes one strand of the substrate. The binding site of the dsRBD is not defined. The function of the helicase domain is not known.

Fig. 3. Biogenesis pathway of microRNAs. MicroRNA genes are transcribed by RNA polymerase II. The primary transcript is referred to as “pri-microRNA”. Drosha processing occurs in the nucleus. The resulting precursor, “pre-microRNA”, is exported to the cytoplasm for Dicer processing. In a coordinated manner, the mature microRNA is transferred to RISC and unwound by a helicase. mRNA targets that duplex in the Slicer scissile site are cleaved and degraded, if the microRNA is loaded into an Ago2 RISC. Mismatched targets are translationally suppressed. All Ago family members are believed to function in translational suppression.

Fig. 4. Model for Slicer catalysis. The siRNA guide strand is bound at the 5′ end by the PIWI domain and at the 3′ end by the PAZ domain. The 5′ phosphate is coordinated by conserved basic residues. mRNA targets are initially bound by the seed region of the siRNA and pairing is extended to the 3′ end. The RNaseH fold hydrolyzes the target in a cation-dependent manner. Slicer cleavage is measured from the 5′ end of the siRNA. Product is released by an unknown mechanism and the enzyme recycles.
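
The statement that Slicer cleavage is measured from the 5′ end of the guide can be illustrated with a small sketch. It assumes a perfectly complementary target and uses the widely reported convention that the scissile bond lies between the target nucleotides paired to guide positions 10 and 11; that convention, the function names and the example sequence are additions for illustration and are not taken from the caption.

```python
# Toy model of Slicer cleavage on a fully complementary target (RNA alphabet).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def slicer_cut(mrna: str, guide: str):
    """Return (5' fragment, 3' fragment) of the mRNA, or None if the mRNA
    has no site fully complementary to the guide strand."""
    site = reverse_complement(guide)
    start = mrna.find(site)
    if start == -1:
        return None
    # Guide nucleotide i (1-based, counted from the guide 5' end) pairs with
    # mrna[start + len(guide) - i]. Assumption: the cut falls between the
    # target nucleotides paired to guide positions 10 and 11.
    cut = start + len(guide) - 10
    return mrna[:cut], mrna[cut:]


# Hypothetical example: a 22-nt guide and an mRNA carrying its target site.
guide = "UGAGGUAGUAGGUUGUAUAGUU"
mrna = "AAAA" + reverse_complement(guide) + "CCCC"
five_prime, three_prime = slicer_cut(mrna, guide)
```

Counting from the guide 5′ end rather than from either end of the mRNA means the cut position is fixed by the guide alone, wherever the site sits in the transcript.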



RNA interference (RNAi) is a biological process in which RNA molecules inhibit gene expression, typically by causing the destruction of specific mRNA molecules. Historically, it was known by other names, including co-suppression, post transcriptional gene silencing (PTGS), and quelling. Only after these apparently unrelated processes were fully understood did it become clear that they all described the RNAi phenomenon. Andrew Fire and Craig C. Mello shared the 2006 Nobel Prize in Physiology or Medicine for their work on RNA interference in the nematode worm Caenorhabditis elegans, which they published in 1998.


Two types of small ribonucleic acid (RNA) molecules – microRNA (miRNA) and small interfering RNA (siRNA) – are central to RNA interference. RNAs are the direct products of genes, and these small RNAs can bind to other specific messenger RNA (mRNA) molecules and either increase or decrease their activity, for example by preventing an mRNA from producing a protein. RNA interference has an important role in defending cells against parasitic nucleotide sequences – viruses and transposons. It also influences development.


The RNAi pathway is found in many eukaryotes, including animals, and is initiated by the enzyme Dicer, which cleaves long double-stranded RNA (dsRNA) molecules into short double stranded fragments of ~20 nucleotide siRNAs. Each siRNA is unwound into two single-stranded RNAs (ssRNAs), the passenger strand and the guide strand. The passenger strand is degraded and the guide strand is incorporated into the RNA-induced silencing complex (RISC). The most well-studied outcome is post-transcriptional gene silencing, which occurs when the guide strand pairs with a complementary sequence in a messenger RNA molecule and induces cleavage by Argonaute, the catalytic component of the RISC complex. In some organisms, this process spreads systemically, despite the initially limited molar concentrations of siRNA.

The enzyme Dicer trims double-stranded RNA to form small interfering RNA or microRNA. These processed RNAs are incorporated into the RNA-induced silencing complex (RISC).

MiRNA biogenesis and function. (A) The canonical miRNA biogenesis pathway is Drosha- and Dicer-dependent. It begins with RNA Pol II-mediated transcription…


Dicer Promotes Transcription Termination at Sites of Replication Stress to Maintain Genome Stability
Cell Oct 2014; 159(3): 572–583


Figure 18-13. miRNA-protein complex. (a) Primary miRNA transcript. (b) Generation and function of miRNAs.



Identification and characterization of small RNAs involved in RNA silencing
FEBS Letters 579 (2005) 5830–5840

Fig. 1. Small RNA cloning procedure. Outline of the small RNA cloning procedure. RNA is dephosphorylated (step 1) for joining the 3′ adapter by T4 RNA ligase 1 in the presence of ATP (step 2). The use of a chemically adenylated adapter and a truncated form of T4 RNA ligase 2 (Rnl2) allows eliminating the dephosphorylation step (step 4). If the RNA was dephosphorylated, it is re-phosphorylated (step 3) prior to 5′ adapter ligation with T4 RNA ligase 1 and ATP (step 5). After 5′ adapter ligation, a standard reverse transcription is performed (step 6). Alternatively, after 3′ adapter ligation, the RNA is used directly for reverse transcription simultaneously with 5′ adapter joining (step 7). In this case, the property of reverse transcriptase to add non-templated cytidine residues at the 5′ end of synthesized DNA is used to facilitate template switch of the reverse transcriptase to the 3′ guanosine residues of the 5′ adapter (SMART technology, Invitrogen). Abbreviations: P and OH indicate phosphate and hydroxyl ends of the RNA; App indicates 5′ chemically adenylated adapter; L, 3′ blocking group; CIP, calf alkaline phosphatase and PNK, polynucleotide kinase.


Transcriptional regulatory functions of nuclear long noncoding RNAs
Trends in Genetics, Aug 2014; 30(8):348-356

Figure panels: cis-acting lncRNA; enhancer-associated lncRNA; intergenic lncRNA; promoter-associated lncRNA; proximity transfer; trans-acting lncRNA.


Functional interactions among microRNAs and long noncoding RNAs
Sem Cell Dev Biol 2014; 34:9-14

Genome-wide application of RNAi to the discovery of potential drug targets
FEBS Letters 579 (2005) 5988–599

Fig. 1. Schematic representation of gene silencing by an shRNA-expression vector. The shRNA is processed by Dicer. The processed siRNA enters the RNA-induced silencing complex (RISC), where it targets mRNA for degradation.

Fig. 2. Schematic representation of a transcription system for production of siRNA

Fig. 3. (A) Schematic representation of the proposed siRNA-expression system. Three or four C to U or A to G mutations are introduced into the sense strand. (B) Schematic representation of the discovery of a novel gene using an siRNA library.


Imperfect centered miRNA binding sites are common and can mediate repression of target mRNAs
Martin et al. Genome Biology 2014, 15:R51





Table 1 Number of inferred targets for each miRNA tested

miRNA            Probes   Transcripts   Genes
miR-10a          2,206    5,963         1,887
miR-10a-iso      1,648    4,211         1,468
miR-10b          1,588    3,940         1,365
miR-10b-iso        963    2,235           889
miR-17-5p        1,223    2,862         1,137
miR-17-5p-iso    1,656    3,731         1,461
miR-182          2,261    6,423         2,008
miR-182-iso      1,569    4,316         1,444
miR-23b          2,248    5,383         1,990
miR-27a          2,334    5,310         2,069

Probes: number of probes significantly enriched in pull-downs compared to controls (5% FDR). Transcripts: number of transcripts to which those probes map exactly. Genes: number of genes from which those transcripts originate.

Figure 2. Biotin pull-downs identify bona fide miRNA targets. (A) Volcano plot showing the significance of the difference in expression between the miR-17-5p pull-down and the mock-transfected control, for all transcripts expressed in HEK293T cells. Both targets predicted by TargetScan and targets validated previously via luciferase assay were significantly enriched in the pull-down compared to the controls. (B) Results from luciferase assays on previously untested targets predicted using TargetScan and uncovered using the biotin pull-down. The plot indicates mean luciferase activity from either the empty plasmid or from pMIR containing a miRNA binding site in the 3′ UTR, relative to a negative control. Asterisks indicate a significant reduction in luciferase activity (one-sided t-test; P<0.05) and error bars the standard error of the mean over three replicates. (C-E) Targets identified through PAR-CLIP or through miRNA over-expression studies show greater enrichment in the pull-down. Cumulative distribution of log fold-change in the pull-down for transcripts identified as targets by the indicated miRNA over-expression study or not. Red, canonical transcripts found to be miR-17-5p targets in the indicated study (Table S5 in Additional file 1); black, all other canonical transcripts; p, one-sided P-value from Kolmogorov-Smirnov test for a difference in distributions. (F) To confirm that our results were dependent on RISC association, cells were transfected with either single or double-stranded synthetic miRNAs, then subjected to AGO2 immunoprecipitation. The biotin pull-down was performed in the AGO2-enriched and AGO2-depleted fractions. (G-H) Quantitative RT-PCR revealed that, with double-stranded (ds) miRNA (G), four out of five known targets were enriched relative to input mRNA (*P≤0.05, **P<0.01, ***P<0.001) in the AGO2-enriched but not in the AGO2-depleted fractions, but this enrichment was not seen for the cells transfected with a single-stranded (ss) miRNA (H). The numbers on the x-axis correspond to those in Figure 2F. Error bars represent the standard error of the mean (SEM).

Figure 5 IsomiRs and canonical miRNAs target many of the same transcripts.

Hammerhead ribozymes in therapeutic target discovery and validation
Drug Disc Today 2009; 14(15/16): 776-783

Figure 1. Features of hammerhead ribozymes. A generic diagram of a hammerhead ribozyme bound to its target substrate: NUH is the cleavage triplet on target sequence, stems I and III are sites of the specific interactions between ribozyme and target, stem II is the structural element connecting separate parts of the catalytic core. Arrows represent the cleavage site, numbering system according to Hertel et al. [60].
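
As a rough illustration of the NUH rule in the caption above, the following sketch scans a target RNA for candidate cleavage triplets. In IUPAC notation N is any nucleotide and H is A, C or U (anything except G). The function name and the example sequence are hypothetical, and the sketch ignores target accessibility and the stem I and stem III pairing that a real design would have to consider.

```python
import re

# N = any nucleotide, U = uridine, H = A, C or U (IUPAC: not G).
# A lookahead is used so that overlapping triplets are all reported.
NUH = re.compile(r"(?=([ACGU]U[ACU]))")


def nuh_sites(target: str):
    """Return (0-based index of the H nucleotide, triplet) for every NUH
    triplet in an RNA target sequence written 5'->3'."""
    return [(m.start() + 2, m.group(1)) for m in NUH.finditer(target.upper())]


# Hypothetical target fragment with a single GUC triplet.
sites = nuh_sites("GGUCAAGUG")
```

Each hit marks a position where the ribozyme binding arms could be designed on either side of the triplet; choosing among hits is outside the scope of this sketch.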


Figure 1  Schematic (A) and ribbon (B) diagrams depicting the crystal structure of the full-length hammerhead ribozyme. The sequence and secondary structure


TABLE 1 Typical examples of successful applications of hammerhead ribozymes. Most of the data are derived from [10] and [11], the others are expressly specified.

  • Growth factors, receptors, transduction elements
  • Oncogenes, proto-oncogenes, fusion genes
  • Apoptosis, survival factors, drug resistance
  • Transcription factors
  • Extracellular matrix, matrix modulating factors
  • Circulating factors
  • Viral genome, viral genes

Figure 2. Target–ribozyme interactions. (a) A scheme of ribozyme binding to the full substrate. The calculated energy of this binding ensures the formation of a stable complex. Its denaturation temperature, Tm, allows this complex to survive under biological conditions. Conversely, after cleavage, the binding energies calculated for the single ribozyme arms, (b) and (c), are very low and the complexes are no longer stable. These properties ensure both the efficient release of cleavage fragments and the prevention of binding to unrelated targets. RNAs complementary to one binding arm only will not be bound or cleaved by the hammerhead catalytic sequence.

Figure 3. ‘Chemical omics’ approach. According to this target discovery strategy: (1) a first round of ‘omic’ study (proteomic, genomic, metabolomic, …) will enable the discovery of a set of (2) putative markers. A series of hammerhead ribozymes will then be prepared in order to target each marker. (4) A second ‘omic’ study round will be performed on (3) knocked down samples obtained after ribozymes administration. (5) A new series of markers will then be produced. An expanding analytical process of this type may be further repeated. Finally, a robust bioinformatic algorithm will make it possible to connect the different markers and draw new hypothetical links and pathways.



ADAR Enzyme and miRNA Story
Sara Tomaselli, Barbara Bonamassa, Anna Alisi, et al.
Int. J. Mol. Sci. 2013, 14, 22796-22816;

Adenosine deaminase acting on RNA (ADAR) enzymes convert adenosine (A) to inosine (I) in double-stranded (ds) RNAs. Since Inosine is read as Guanosine, the biological consequence of ADAR enzyme activity is an A/G conversion within RNA molecules. A-to-I editing events can occur on both coding and non-coding RNAs, including microRNAs (miRNAs), which are small regulatory RNAs of ~20–23 nucleotides that regulate several cell processes by annealing to target mRNAs and inhibiting their translation. Both miRNA precursors and mature miRNAs undergo A-to-I RNA editing, affecting the miRNA maturation process and activity. ADARs can also edit 3′ UTR of mRNAs, further increasing the interplay between mRNA targets and miRNAs. In this review, we provide a general overview of the ADAR enzymes and their mechanisms of action as well as miRNA processing and function. We then review the more recent findings about the impact of ADAR-mediated activity on the miRNA pathway in terms of biogenesis, target recognition, and gene expression regulation.
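
The A-to-I conversion and its reading as G can be made concrete with a short sketch. The function names, the symbol I for inosine within the sequence string, and the example sequence are assumptions made for illustration; the sketch says nothing about which adenosines ADARs actually edit.

```python
def edit_a_to_i(rna: str, positions) -> str:
    """Return the RNA with adenosine replaced by inosine (written 'I') at the
    given 0-based positions. Raises ValueError if a position is not an A."""
    bases = list(rna)
    for pos in positions:
        if bases[pos] != "A":
            raise ValueError(f"position {pos} is {bases[pos]!r}, not A")
        bases[pos] = "I"
    return "".join(bases)


def read_as_guanosine(edited: str) -> str:
    """Return the sequence as it is decoded: inosine is read as guanosine."""
    return edited.replace("I", "G")


# Hypothetical example: editing one adenosine in a short miRNA 5' region.
original = "UGAGGUAG"
edited = edit_a_to_i(original, [2])
decoded = read_as_guanosine(edited)
```

Because a single edited position in the 5′ region changes the sequence that pairs with targets, even one A-to-I event can redirect a miRNA to a different set of mRNAs, which is the interplay the review describes.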

Figure 1. Structure of ADAR family proteins: ADAR1, ADAR2, and ADAR3. The ADAR enzymes contain a C-terminal conserved catalytic deaminase domain (DM) and two or three dsRBDs in the N-terminal portion. ADAR1 full-length protein also contains an N-terminal Zα domain with a nuclear export signal (NES) and a Zβ domain, while ADAR3 has an R-domain. A nuclear localization signal is also indicated.


Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites
Doron Betel, Anjali Koppal, Phaedra Agius, Chris Sander, Christina Leslie
Genome Biology 2010, 11:R90

microRNAs are a class of small regulatory RNAs that are involved in post-transcriptional gene silencing. These small (approximately 22 nucleotide) single-strand RNAs guide a gene silencing complex to an mRNA by complementary base pairing, mostly at the 3′ untranslated region (3′ UTR). The association of the RNA-induced silencing complex (RISC) to the conjugate mRNA results in silencing the gene either by translational repression or by degradation of the mRNA. Reliable microRNA target prediction is an important and still unsolved computational challenge, hampered both by insufficient knowledge of microRNA biology as well as the limited number of experimentally validated targets.

mirSVR is a new machine learning method for ranking microRNA target sites by a down-regulation score. The algorithm trains a regression model on sequence and contextual features extracted from miRanda-predicted target sites. In a large-scale evaluation, miRanda-mirSVR is competitive with other target prediction methods in identifying target genes and predicting the extent of their downregulation at the mRNA or protein levels. Importantly, the method identifies a significant number of experimentally determined non-canonical and non-conserved sites.

Human RISC – MicroRNA Biogenesis and Posttranscriptional Gene Silencing
Cell 2005; 123:631-640

Development of microRNA therapeutics
Eva van Rooij & Sakari Kauppinen
EMBO Mol Med (2014) 6: 851–864

MicroRNAs (miRNAs) play key regulatory roles in diverse biological processes and are frequently dysregulated in human diseases. Thus, miRNAs have emerged as a class of promising targets for therapeutic intervention. Here, we describe the current strategies for therapeutic modulation of miRNAs and provide an update on the development of miRNA-based therapeutics for the treatment of cancer, cardiovascular disease and hepatitis C virus (HCV) infection.

Figure 1. miRNA biogenesis and modulation of miRNA activity by miRNA mimics and antimiR oligonucleotides. MiRNA genes are transcribed by RNA polymerase II from intergenic, intronic or polycistronic loci to long primary miRNA transcripts (pri-miRNAs) and processed in the nucleus by the Drosha–DGCR8 complex to approximately 70 nt pre-miRNA hairpin structures. The most common alternative miRNA biogenesis pathway involves short intronic hairpins, termed mirtrons, that are spliced and debranched to form pre-miRNA hairpins. Pre-miRNAs are exported into the cytoplasm and then cleaved by the Dicer–TRBP complex to imperfect miRNA:miRNA* duplexes about 22 nucleotides in length. In the cytoplasm, miRNA duplexes are incorporated into the Argonaute-containing miRNA-induced silencing complex (miRISC), followed by unwinding of the duplex and retention of the mature miRNA strand in miRISC, while the complementary strand is released and degraded. The mature miRNA functions as a guide molecule for miRISC by directing it to partially complementary sites in the target mRNAs, resulting in translational repression and/or mRNA degradation. Currently, two strategies are employed to modulate miRNA activity: restoring the function of a miRNA using double-stranded miRNA mimics, and inhibition of miRNA function using single-stranded anti-miR oligonucleotides.

Figure 2. Design of chemically modified miRNA modulators. (A) Structures of chemical modifications used in miRNA modulators. A number of different sugar modifications are used to increase the duplex melting temperature (Tm) of anti-miR oligonucleotides. The 2′-O-methyl (2′-O-Me), 2′-O-methoxyethyl (2′-MOE) and 2′-fluoro (2′-F) nucleotides are modified at the 2′ position of the sugar moiety, whereas locked nucleic acid (LNA) is a bicyclic RNA analogue in which the ribose is locked in a C3′-endo conformation by introduction of a 2′-O,4′-C methylene bridge. To increase nuclease resistance and enhance the pharmacokinetic properties, most anti-miR oligonucleotides harbor phosphorothioate (PS) backbone linkages, in which sulfur replaces one of the non-bridging oxygen atoms in the phosphate group. In morpholino oligomers, a six-membered morpholine ring replaces the sugar moiety. Morpholinos are uncharged and exhibit a slight increase in binding affinity to their cognate miRNAs. PNA oligomers are uncharged oligonucleotide analogues, in which the sugar–phosphate backbone has been replaced by a peptide-like backbone consisting of N-(2-aminoethyl)-glycine units. (B) An example of a synthetic double-stranded miRNA mimic described in this review. One way to therapeutically mimic a miRNA is by using synthetic RNA duplexes that harbor chemical modifications for improved stability and cellular uptake. In such constructs, the antisense (guide) strand is identical to the miRNA of interest, while the sense (passenger) strand is modified and can be linked to a molecule, such as cholesterol, for enhanced cellular uptake. The sense strand contains chemical modifications to prevent miRISC loading. Several mismatches can be introduced to prevent this strand from functioning as an anti-miR, while it is further left unmodified to ensure rapid degradation. The 2′-F modification helps to protect the antisense strand against exonucleases, hence making the guide strand more stable, while it does not interfere with miRISC loading. (C) Design of chemically modified anti-miR oligonucleotides described in this review. Antagomirs are 3′ cholesterol-conjugated, 2′-O-Me oligonucleotides fully complementary to the mature miRNA sequence with several PS moieties to increase their in vivo stability. The use of unconjugated 2′-F/MOE-, 2′-MOE- or LNA-modified anti-miR oligonucleotides harboring a complete PS backbone represents another approach for inhibition of miRNA function in vivo. The high duplex melting temperature of LNA-modified oligonucleotides allows efficient miRNA inhibition using truncated, high-affinity 15–16-nucleotide LNA/DNA anti-miR oligonucleotides targeting the 5′ region of the mature miRNA. Furthermore, the high binding affinity of fully LNA-modified 8-mer PS oligonucleotides, designated as tiny LNAs, facilitates simultaneous inhibition of entire miRNA seed families by targeting the shared seed sequence.
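
At the sequence level, the three anti-miR designs in panel (C) differ mainly in which stretch of the mature miRNA they are complementary to. The sketch below derives such sequences and ignores all chemistry (LNA, 2′ modifications, PS linkages, cholesterol). The helper names, the example miRNA sequence, and the exact windows used for the truncated 15-mer (nucleotides 1–15) and the 8-mer (nucleotides 2–9) are assumptions for illustration, not values stated in the caption.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def anti_mir(mirna: str, first: int = 1, last=None) -> str:
    """Sequence complementary to miRNA nucleotides first..last
    (1-based, inclusive); defaults to the full mature miRNA."""
    last = len(mirna) if last is None else last
    return antisense(mirna[first - 1:last])


# Example input only: a 22-nt mature miRNA sequence chosen for illustration.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"

full_length = anti_mir(mirna)        # antagomir-style, fully complementary
truncated = anti_mir(mirna, 1, 15)   # assumed window for a 15-mer design
tiny = anti_mir(mirna, 2, 9)         # assumed window for a seed-directed 8-mer
```

Because the 8-mer covers only the seed region, the same sequence would be complementary to every miRNA sharing that seed, which is the family-wide inhibition the caption refers to.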

Human MicroRNA Targets
Bino John, Anton J. Enright, Alexei Aravin, Thomas Tuschl, …, Debora S. Marks
PLoS Biol 2004; 2(11): e363

More than ten years after the discovery of the first miRNA gene, lin-4 (Chalfie et al. 1981; Lee et al. 1993), we know that miRNA genes constitute about 1%–2% of the known genes in eukaryotes. Investigation of miRNA expression combined with genetic and molecular studies in Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana have identified the biological functions of several miRNAs (recent review, Bartel 2004). In C. elegans, lin-4 and let-7 were first discovered as key regulators of developmental timing in early larval developmental transitions (Ambros 2000; Abrahante et al. 2003; Lin et al. 2003; Vella et al. 2004). More recently lsy-6 was shown to determine the left–right asymmetry of chemoreceptor expression (Johnston and Hobert 2003). In D. melanogaster, miR-14 has a role in apoptosis and fat metabolism (Xu et al. 2003) and the bantam miRNA targets the gene hid involved in apoptosis and growth control (Brennecke et al. 2003).

MicroRNAs (miRNAs) interact with target mRNAs at specific sites to induce cleavage of the message or inhibit translation. The specific function of most mammalian miRNAs is unknown. We have predicted target sites on the 3′ untranslated regions of human gene transcripts for all currently known 218 mammalian miRNAs to facilitate focused experiments. We report about 2,000 human genes with miRNA target sites conserved in mammals and about 250 human genes conserved as targets between mammals and fish. The prediction algorithm optimizes sequence complementarity using position-specific rules and relies on strict requirements of interspecies conservation. Experimental support for the validity of the method comes from known targets and from strong enrichment of predicted targets in mRNAs associated with the fragile X mental retardation protein in mammals. This is consistent with the hypothesis that miRNAs act as sequence-specific adaptors in the interaction of ribonuclear particles with translationally regulated messages. Overrepresented groups of targets include mRNAs coding for transcription factors, components of the miRNA machinery, and other proteins involved in translational regulation, as well as components of the ubiquitin machinery, representing novel feedback loops in gene regulation. Detailed information about target genes, target processes, and open-source software for target prediction (miRanda) is available online. Our analysis suggests that miRNA genes, which are about 1% of all human genes, regulate protein production for 10% or more of all human genes.

Figure 1. Target Prediction Pipeline for miRNA Targets in Vertebrates. The mammalian (human, mouse, and rat) and fish (zebrafish and fugu) 3′ UTRs were first scanned for miRNA target sites using position-specific rules of sequence complementarity. Next, aligned UTRs of orthologous genes were used to check for conservation of miRNA–target relationships (“target conservation”) between mammalian genomes and, separately, between fish genomes. The main results (bottom) are the conserved mammalian and conserved fish targets for each miRNA, as well as a smaller set of super-conserved vertebrate targets.
Figure 2. Distribution of Transcripts with Cooperativity of Target Sites and Estimated Number of False Positives. Each bar reflects the number of human transcripts with a given number of target sites on their UTR. The estimated rate of false positives (e.g., 39% for 2 targets) is given by the number of target sites predicted using shuffled miRNAs processed in a way identical to real miRNAs, including the use of the interspecies conservation filter.
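The shuffled-miRNA control described in Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions: sites are counted with a plain seed match rather than the published scoring scheme, no conservation filter is applied, and the function names, shuffle count, and toy UTRs are hypothetical.

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def count_sites(mirna, utrs):
    """Count seed-complementary 7-mers (miRNA nt 2-8) across a list of UTRs."""
    seed = mirna[1:8]
    site = "".join(COMPLEMENT[b] for b in reversed(seed))
    return sum(utr.count(site) for utr in utrs)   # non-overlapping matches


def shuffled_false_positive_rate(mirna, utrs, n_shuffles=200, rng_seed=0):
    """Estimate false positives as (mean sites for shuffled miRNAs) / (sites for real miRNA).

    Returns None when the real miRNA has no predicted sites.
    """
    rng = random.Random(rng_seed)     # fixed seed keeps the estimate reproducible
    real = count_sites(mirna, utrs)
    if real == 0:
        return None
    total = 0
    for _ in range(n_shuffles):
        bases = list(mirna)
        rng.shuffle(bases)            # same base composition, scrambled order
        total += count_sites("".join(bases), utrs)
    return (total / n_shuffles) / real


toy_utrs = ["AAACUACCUCAGGGCUACCUCUU", "GGGCUACCUCAAAUUUGGG"]   # hypothetical UTRs
print(shuffled_false_positive_rate("UGAGGUAGUAGGUUGUAUAGUU", toy_utrs))
```

Shuffling preserves base composition, so the control asks how many sites would be predicted by chance for a sequence with the same nucleotide content.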

Conserved Seed Pairing, Often Flanked by Adenosines, Indicates that Thousands of Human Genes are MicroRNA Targets
Cell, Jan 2005; 120: 15–20

Integrated analysis of microRNA and mRNA expression: adding biological significance to microRNA target predictions.
Maarten van Iterson, Sander Bervoets, Emile J. de Meijer, et al.
Nucleic Acids Research, 2013; 41(15), e146

Current microRNA target predictions are based on sequence information and empirically derived rules but do not make use of the expression of microRNAs and their targets. This study aimed to improve microRNA target predictions in a given biological context, using in silico predictions, microRNA and mRNA expression. We used target prediction tools to produce lists of predicted targets and used a gene set test designed to detect consistent effects of microRNAs on the joint expression of multiple targets. In a single test, association between microRNA expression and target gene set expression as well as the contribution of the individual target genes on the association are determined. The strongest negatively associated mRNAs as measured by the test were prioritized. We applied our integration method to a well-defined muscle differentiation model. Validation of our predictions in C2C12 cells confirmed predicted targets of known as well as novel muscle-related microRNAs. We further studied associations between microRNA–mRNA pairs in human prostate cancer, finding some pairs that have been recently experimentally validated by others. Using the same study, we showed the advantages of the global test over Pearson correlation and lasso. We conclude that our integrated approach successfully identifies regulated microRNAs and their targets.
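To make the idea of prioritizing negatively associated targets concrete, the sketch below ranks predicted targets of one microRNA by the Pearson correlation between microRNA and mRNA expression across samples. Note the assumption: the study itself uses a gene set (global) test and reports advantages over Pearson correlation, so this is only the simpler baseline, and all names and expression values here are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_targets(mirna_expr, target_expr):
    """Return (gene, r) pairs sorted from most negative to most positive correlation."""
    scores = {gene: pearson(mirna_expr, values) for gene, values in target_expr.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical expression values across five samples.
mirna = [1, 2, 3, 4, 5]
targets = {
    "geneA": [10, 8, 6, 4, 2],   # falls as the miRNA rises
    "geneB": [1, 2, 3, 4, 5],    # rises with the miRNA
    "geneC": [3, 1, 4, 1, 5],    # weakly related
}

for gene, r in rank_targets(mirna, targets):
    print(f"{gene}\t{r:+.2f}")
```

Genes at the top of the list are the candidates a follow-up experiment would examine first; a pairwise correlation ignores the joint behavior of the target set, which is what the gene set test is designed to capture.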

Long non-coding RNA and microRNAs might act in regulating the expression of BARD1 mRNAs
Int J Biochem & Cell Biol 2014; 54:356-367


Passenger-Strand Cleavage Facilitates Assembly of siRNA into Ago2-Containing RNAi Enzyme Complexes
Cell 2005; 123:607-620


RNAi: RISC Gets Loaded
Cell 2005; 123:543-553
RNAi: The Nuts and Bolts of the RISC Machine
Cell 2005; 122:17-20
Structural domains in RNAi
FEBS Letters 579 (2005) 5841–5849

Fig. 1. A “domain-centric” view of RNAi. (A) The conserved pathways of RNA silencing. The domain structure of each protein in (hypothetical) interaction with its RNA is shown. For clarity, the second column lists domains in order N- to C-terminal. Figures are not to scale. In brief, Drosha, an RNase III enzyme, and its obligate binding partner, Pasha, recognize pri-miRNA loops and cut these into 70-nt hairpin pre-miRNAs. Dicer utilizes a PAZ domain to sense the 3′ 2-nt overhang created, and further processes these, and dsRNAs, into miRNAs and siRNAs. Argonaute binds the 5′ end of guide RNAs via its PIWI domain, and the 3′ end via a PAZ domain, yielding RISCs that effect RNA silencing through several mechanisms. A viral protein, VP19, can suppress RNA silencing by sequestering siRNAs. (B) A summary of known siRNA structural biology. Listed by domain are solved structures, their protein/organism of origin, and ligands, where applicable. Also shown are PDB codes.

Fig. 2. Novel modes of RNA recognition. (A) A typical dsRBD: Xenopus binding protein A (1DI2). An RNA helix is modeled pink, and the protein is rendered in transparent electrostatic contours (blue is basic, red acidic). Note the interaction of helices along the major groove, and the position of helix 1. A second dsRBD protein is visible in the lower right. (B) A dsRBD, Saccharomyces Rnt1P (1T4L), recognizes hairpin loops. A novel third helix (top) pushes helix one into the loop of a hairpin RNA. (C) 3′-OH recognition by PAZ. Human Eif2c1 (1SI3) bound to RNA (pink) is shown. PAZ is green, with transparent electrostatic surface plot. The OB-fold (nucleotide binding fold) and the insertion domain are labeled. Note the glove-and-thumb-like cleft they form, into which the 3′-OH is inserted. A basic groove (blue) along which the RNA binds outside the cleft is visible. (D) A close-up view of PAZ, as in C (surface not transparent, slightly rotated). See white arrows for orientation, and location of the 3′-OH binding site. RNA is shown red in sticks. The terminal –OH is barely visible, buried in a cleft. It and the carbon it bonds have been colored yellow for clarity. (E) The PIWI domain (2BGG). Note the insertion of the 5′P red (labeled) into the binding site. Its complementary strand (pink) is not annealed to it, and the 3′ overhang and first complementary bases sit on the protein surface. (F) An enlarged view of (E), with protein in slate and RNA modeled as red sticks. The coordinated magnesium is a grey sphere, which is coordinated by the terminal carboxylate of the protein, protein side chains, and RNA phosphate oxygens. The 5′ base stacks against a conserved Tyr. Several other sidechain contacts are shown.

Fig. 3. Argonaute/RISC. (A) P. furiosus Argonaute (PDB 1Z26). A color-guided key to the domains is presented. PAZ sits over the PIWI/N/MID bowl and active site. The liganding atoms for the catalytic metal are depicted as yellow balls for clarity. The tungstate binding site (5′P surrogate) is shown as tan spheres. (B) A guide strand channel. Looking down from the PAZ domain towards the active site, Z-sections are clipped off. Colors of domains are as in the key in (A). Wrapping down along a basic cleft from the PAZ 3′-OH binding site (approximate position labeled), an RNA binding groove passes the active site (yellow), and runs down to the 5′P binding site (tan balls). A second cleft running perpendicular to this one at its entry may accommodate target strand RNA. For more detail, and models of siRNA placed into the grooves, see [27,29].

Fig. 4. VP19 sequestration of siRNA. (A) CIRV VP19 (1RPU, RNA removed). Two monomers (blue and cyan) form an 8-strand, concave β-sheet with bracketing helices at the ends. (B) Tombus viral VP19 bound to siRNA (1 monomer shown). RNA strands are modeled as sticks, with one strand pink and one red. The bracketing helix places two tryptophans in position to stack over the terminal RNA bases. On the β-sheet surface, an Arg and a Lys interact with the phosphate backbone, and at the center of the RNA binding surface, a number of Ser and Thr mediate an extensive hydrogen bond network. Both the Trp brackets and RNA binding by an extended β-sheet are unique.


Small RNA asymmetry in RNAi: Function in RISC assembly and gene regulation
FEBS Letters 579 (2005) 5850–5857


The role of the oncofetal IGF2 mRNA-binding protein 3 (IGF2BP3) in cancer
Seminars in Cancer Biol 2014; 29:3-12

Table 1 – Target mRNAs of IGF2BP3.

Target | cis-Element | Regulation
CD44 | 3′-UTR | Control of mRNA stability
IGF2 | 5′-UTR | Translational control
H19 | ncRNA | Unknown
ACTB | 3′-UTR | Unknown
MYC | CRD | Unknown
CD164 | Unknown | Control of mRNA stability
MMP9 | Unknown | Control of mRNA stability
ABCG2 | Unknown | Unknown
PDPN | 3′-UTR | Control of mRNA stability
HMGA2 | 3′-UTR | Protection from miR-directed degradation
CCND1 | 3′-UTR | Translational control
CCND3 | 3′-UTR | Translational control
CCNG1 | 3′-UTR | Translational control


Targeting glucose uptake with siRNA-based nanomedicine for cancer therapy
Biomaterials 2015; 51:1-11
The therapeutic potential of RNA interference
FEBS Letters 579 (2005) 5996–6007

Table 1. Companies developing RNAi therapeutics that include cancer

Company name | Primary areas of interest
Atugen AG | Metabolic disease; cancer; ocular disease; skin disease
Benitec Australia Limited | Hepatitis C virus; HIV/AIDS; cancer; diabetes/obesity
Calando Pharmaceuticals | Nanoparticle technology
Genta Incorporated | Cancer
Intradigm Corporation | Cancer; SARS; arthritis
Sirna Therapeutics, Inc. | AMD; Hepatitis C virus; asthma; diabetes; cancer; Huntington’s disease; hearing loss


The Noncoding RNA Revolution—Trashing Old Rules to Forge New Ones
Cell 2014; 157:77-94

Figure 1. Noncoding RNAs Function in Diverse Contexts Noncoding RNAs function in all domains of life, regulating gene expression from transcription to splicing to translation and contributing to genome organization and stability. Self-splicing RNAs, ribosomes, and riboswitches function in both eukaryotes and bacteria. Archaea (not shown) also utilize ncRNA systems including ribosomes, riboswitches, snoRNPs, and CRISPR. Orange strands, ncRNA performing the action indicated; red strands, the RNA acted upon by the ncRNA. Blue strands, DNA. Triangle, small-molecule metabolite bound by a riboswitch. Ovals indicate protein components of an RNP, such as the spliceosome (white oval), ribosome (two purple subunits), or other RNPs (yellow ovals). Because of the importance of RNA structure in these ncRNAs, some structures are shown but they are not meant to be realistic.


miRNAs and cancer targeting

Table 1. miRNA targets by cancer type

miRNA | Cancer type | Reference
NA | GI cancer | Current status of miRNA-targeting therapeutics and preclinical studies against gastroenterological carcinoma
NA | Renal cell | Differential expression profiling of microRNAs and their potential involvement in renal cell carcinoma pathogenesis
NA | Urothelial | A microRNA expression ratio defining the invasive phenotype in bladder tumors
miR-31 | Breast | A Pleiotropically Acting MicroRNA, miR-31, inhibits breast cancer growth
miR-512-3p | NSCLC | Inhibition of RAC1-GEF DOCK3 by miR-512-3p contributes to suppression of metastasis in non-small cell lung cancer
miR-495 | Gastric | Methylation-associated silencing of miR-495 inhibit the migration and invasion of human gastric cancer cells
microRNA-218 | Prostate | microRNA-218 inhibits prostate cancer cell growth and promotes apoptosis by repressing TPD52 expression
MicroRNA-373 | Cervical cancer | MicroRNA-373 functions as an oncogene and targets YOD1 gene in cervical cancer
miR-25 | NSCLC | miR-25 modulates NSCLC cell radio-sensitivity – inhibiting BTG2 expression
miR-92a | Cervical cancer | miR-92a is upregulated in cervical cancer and promotes cell proliferation and invasion by targeting FBXW7
MiR-153 | NSCLC | MiR-153 inhibits migration and invasion of human non-small-cell lung cancer by targeting ADAM19
miR-203 | Melanoma | miR-203 inhibits melanoma invasive and proliferative abilities by targeting the polycomb group gene BMI1
miR-204-5p | Papillary thyroid | miR-204-5p suppresses cell proliferation by inhibiting IGFBP5 in papillary thyroid carcinoma
miR-342-3p | Hepatocellular | miR-342-3p affects hepatocellular carcinoma cell proliferation via regulating NF-κB pathway
miR-1271 | NSCLC | miR-1271 promotes non-small-cell lung cancer cell proliferation and invasion via targeting HOXA5
miR-203 | Pancreas | Pancreatic cancer derived exosomes regulate the expression of TLR4 in dendritic cells via miR-203
miR-203 | Metastatic SCC | Rewiring of an Epithelial Differentiation Factor, miR-203, to Inhibit Human SCC Metastasis
miR-204 | RCC | TRPM3 and miR-204 Establish a Regulatory Circuit that Controls Oncogenic Autophagy in Clear Cell Renal Cell Carcinoma
NA | Urologic | MicroRNAs and cancer. Current and future perspectives in urologic oncology
NA | RCC | MicroRNAs and their target gene networks in renal cell carcinoma
NA | Osteosarcoma | MicroRNAs in osteosarcoma
NA | Urologic | MicroRNA in Prostate, Bladder, and Kidney Cancer
NA | Urologic | Micro-RNA profiling in kidney and bladder cancers


Current status of miRNA-targeting therapeutics and preclinical studies against gastroenterological carcinoma
Shibata et al. Molecular and Cellular Therapies 2013, 1:5

Differential expression profiling of microRNAs and their potential involvement in renal cell carcinoma pathogenesis
Clinical Biochemistry 43 (2010) 150–158

A microRNA expression ratio defining the invasive phenotype in bladder tumors
Urologic Oncology: Seminars and Original Investigations 28 (2010) 39–48

A Pleiotropically Acting MicroRNA, miR-31, inhibits breast cancer growth
Cell 137, 1032–1046, June 12, 2009

Inhibition of RAC1-GEF DOCK3 by miR-512-3p contributes to suppression of metastasis in non-small cell lung cancer
Intl J Biochem & Cell Biol 2015; 61:103-114

Methylation-associated silencing of miR-495 inhibit the migration and invasion of human gastric cancer cells by directly targeting PRL-3
Biochem Biophys Res Commun 2014; 456:344-350

microRNA-218 inhibits prostate cancer cell growth and promotes apoptosis by repressing TPD52 expression
Biochem Biophys Res Commun 2015; 456:804-809

MicroRNA-373 functions as an oncogene and targets YOD1 gene in cervical cancer
BBRC 2015; xx:1-6

miR-25 modulates NSCLC cell radio-sensitivity – inhibiting BTG2 expression
BBRC 2015; 457:235-241

miR-92a is upregulated in cervical cancer and promotes cell proliferation and invasion by targeting FBXW7
BBRC 2015; 458:63-69

MiR-153 inhibits migration and invasion of human non-small-cell lung cancer by targeting ADAM19
BBRC 2015; 456:381-385

miR-203 inhibits melanoma invasive and proliferative abilities by targeting the polycomb group gene BMI1
BBRC 2015; 456:361-366

miR-204-5p suppresses cell proliferation by inhibiting IGFBP5 in papillary thyroid carcinoma
BBRC 2015; 457:621-627

miR-342-3p affects hepatocellular carcinoma cell proliferation via regulating NF-κB pathway
BBRC 2015; 457:370-377

miR-1271 promotes non-small-cell lung cancer cell proliferation and invasion via targeting HOXA5
BBRC 2015; 458:714-719

Pancreatic cancer derived exosomes regulate the expression of TLR4 in dendritic cells via miR-203
Cell Immunol 2014; 292:65-69

Rewiring of an Epithelial Differentiation Factor, miR-203, to Inhibit Human Squamous Cell Carcinoma Metastasis
Cell Reports 2014; 9:104-117

TRPM3 and miR-204 Establish a Regulatory Circuit that Controls Oncogenic Autophagy in Clear Cell Renal Cell Carcinoma
Cancer Cell Nov 10, 2014; 26: 738–753

MicroRNA in Prostate, Bladder, and Kidney Cancer
Eur Urol 2011; 59:671-681

Micro-RNA profiling in kidney and bladder cancers
Urologic Oncology: Seminars and Original Investigations 2007; 25:387–392

MicroRNAs and cancer. Current and future perspectives in urologic oncology
Urologic Oncology: Seminars and Original Investigations 2010; 28:4–13

MicroRNAs and their target gene networks in renal cell carcinoma
BBRC 2011; 405:153-156

MicroRNAs in osteosarcoma
Clin Chim Acta 2015; 444:9-17


Table 2. miRNA cancer therapeutics



  • miRNA and mRNA cancer signatures determined by analysis of expression levels in large cohorts of patients
    PNAS, Nov 19, 2013; 110(47): 19160–19165
    The study of mRNA and microRNA (miRNA) expression profiles of cells and tissue has become a major tool for therapeutic development. The results of such experiments are expected to change the methods used in the diagnosis and prognosis of disease. We introduce surprisal analysis, an information-theoretic approach grounded in thermodynamics, to compactly transform the information acquired from microarray studies into applicable knowledge about the cancer phenotypic state. The analysis of mRNA and miRNA expression data from ovarian serous carcinoma, prostate adenocarcinoma, breast invasive carcinoma, and lung adenocarcinoma cancer patients and organ specific control patients identifies cancer-specific signatures. We experimentally examine these signatures and their respective networks as possible therapeutic targets for cancer in single cell experiments.



RNA editing is vital for providing the RNA and protein complexity needed to regulate gene expression. Correct RNA editing maintains cell function and organism development. Imbalance of the RNA editing machinery may lead to diseases, including cancer. Recently, RNA editing has been recognized as a target for drug discovery, although few studies targeting RNA editing for disease and cancer therapy have been reported in the field of natural products. Therefore, RNA editing may be a potential target for therapeutic natural products.


Aberrant microRNA (miRNA) expression is implicated in tumorigenesis. The underlying mechanisms are unclear because the regulations of each miRNA on potentially hundreds of mRNAs are sample specific.


We describe a novel approach to infer Probabilistic MiRNA–mRNA Interaction Signature (‘ProMISe’) from a single pair of miRNA–mRNA expression profiles. Our model considers mRNA and miRNA competition as a probabilistic function of the expressed seeds (matches). To demonstrate ProMISe, we extensively exploited The Cancer Genome Atlas data. As a target predictor, ProMISe identifies more confidence/validated targets than other methods. Importantly, ProMISe confers higher cancer diagnostic power than using expression profiles alone.

Gene set enrichment analysis on averaged ProMISe uniquely revealed respective target enrichments of oncomirs miR-21 and 145 in glioblastoma and ovarian cancers. Moreover, comparing matched breast (BRCA) and thyroid (THCA) tumor/normal samples uncovered thousands of tumor-related interactions. For example, the ProMISe–BRCA network involves miR-155/183/21, which exhibits higher ProMISe coupled with coherently higher miRNA expression and lower target expression; oncomirs miR-221/222 in the ProMISe–THCA network engage with many downregulated target genes. Together, our probabilistic approach of integrating expression and sequence scores establishes a functional link between the aberrant miRNA and mRNA expression, which was previously under-appreciated due to the methodological differences.







CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics – Part IIB

Curator: Larry H Bernstein, MD, FCAP

Part I: The Initiation and Growth of Molecular Biology and Genomics – Part I From Molecular Biology to Translational Medicine: How Far Have We Come, and Where Does It Lead Us?

Part II: CRACKING THE CODE OF HUMAN LIFE is divided into a three-part series.

Part IIA. “CRACKING THE CODE OF HUMAN LIFE: Milestones along the Way” reviews the Human Genome Project and the decade beyond.

Part IIB. “CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics” lays out the manifold multivariate systems-analysis tools that have moved the science forward to a ground that ensures clinical application.

Part IIC. “CRACKING THE CODE OF HUMAN LIFE: Recent Advances in Genomic Analysis and Disease” will extend the discussion to advances in the management of patients as well as providing a roadmap for pharmaceutical drug targeting.

To be followed by:
Part III will conclude with Ubiquitin, its role in Signaling and Regulatory Control.

Part IIB. “CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics” is a continuation of a previous discussion on the role of genomics in discovery of therapeutic targets titled Directions for Genomics in Personalized Medicine, which focused on:

  • key drivers of cellular proliferation,
  • stepwise mutational changes coinciding with cancer progression, and
  • potential therapeutic targets for reversal of the process.

It is a direct extension of The Initiation and Growth of Molecular Biology and Genomics – Part I 

These articles review a web-like connectivity between inter-connected scientific discoveries, as significant findings have led to novel hypotheses and many expectations over the last 75 years. This largely post WWII revolution has driven our understanding of biological and medical processes at an exponential pace owing to successive discoveries of
  • chemical structure,
  • the basic building blocks of DNA and proteins,
  • nucleotide and protein-protein interactions,
  • protein folding,
  • allostericity,
  • genomic structure,
  • DNA replication,
  • nuclear polyribosome interaction, and
  • metabolic control.


In addition, the emergence of methods for

  • copying,
  • removal, and
  • insertion, together with
  • improvements in structural analysis and
  • developments in applied mathematics, has transformed the research framework.

This last point,

  • developments in applied mathematics have transformed the research framework, is developed in this very article

CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics – Part IIB

Computational Genomics

1. Three-Dimensional Folding and Functional Organization Principles of The Drosophila Genome

Sexton T, Yaffe E, Kenigsberg E, Bantignies F,…Cavalli G. Institut de Génétique Humaine, Montpellier GenomiX, and Weizmann Institute, France and Israel. Cell 2012; 148(3): 458-472.

Chromosomes are the physical realization of genetic information and thus form the basis for its readout and propagation.

[Figures: DNA diagram showing base pairing; circular genome map]

Here we present a high-resolution chromosomal contact map derived from

  • a modified genome-wide chromosome conformation capture approach applied to Drosophila embryonic nuclei.
  • The entire genome is linearly partitioned into well-demarcated physical domains that overlap extensively with active and repressive epigenetic marks.
  • Chromosomal contacts are hierarchically organized between domains.
  • Global modeling of contact density and clustering of domains show that
  • inactive domains are condensed and confined to their chromosomal territories, whereas
  • active domains reach out of the territory to form remote intra- and interchromosomal contacts.

Moreover, we systematically identify

  • specific long-range intrachromosomal contacts between Polycomb-repressed domains.

Together, these observations

  • allow for quantitative prediction of the Drosophila chromosomal contact map,
  • laying the foundation for detailed studies of chromosome structure and function in a genetically tractable system.


2A. Architecture Reveals Genome’s Secrets

Three-dimensional genome maps – Human chromosome

Genome sequencing projects have provided rich troves of information about

  • stretches of DNA that regulate gene expression, as well as
  • how different genetic sequences contribute to health and disease.

But these studies miss a key element of the genome—its spatial organization—which has long been recognized as an important regulator of gene expression.

  • Regulatory elements often lie thousands of base pairs away from their target genes, and recent technological advances are allowing scientists to begin examining
  • how distant chromosome locations interact inside a nucleus.
  • The creation and function of 3-D genome organization, some say, is the next frontier of genetics.

Mapping and sequencing may be completely separate processes. For example, it’s possible to determine the location of a gene (to “map” the gene) without sequencing it. Thus, a map may tell you nothing about the sequence of the genome, and a sequence may tell you nothing about the map. But the landmarks on a map are DNA sequences, and mapping is the cousin of sequencing. A map of a sequence might look like this:
On this map, GCC is one landmark; CCCC is another. Here, the sequence itself is a landmark on the map. In general, particularly for humans and other species with large genomes,

  • creating a reasonably comprehensive genome map is quicker and cheaper than sequencing the entire genome.
  • mapping involves less information to collect and organize than sequencing does.
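The landmark idea can be illustrated in a few lines: given a sequence, a map is simply the list of positions at which each landmark occurs. The sequence below is hypothetical and chosen only so that it contains the two landmarks named above (GCC and CCCC); the function name is likewise an assumption.

```python
def find_landmarks(sequence, landmarks):
    """Map each landmark to the 0-based positions where it occurs in the sequence."""
    positions = {}
    for landmark in landmarks:
        hits = []
        pos = sequence.find(landmark)
        while pos != -1:              # keep searching past each hit
            hits.append(pos)
            pos = sequence.find(landmark, pos + 1)
        positions[landmark] = hits
    return positions


toy_sequence = "ATGCCATTCCCCAGT"      # hypothetical stretch of DNA
print(find_landmarks(toy_sequence, ["GCC", "CCCC"]))
# {'GCC': [2], 'CCCC': [8]}
```

The output records only where the landmarks sit, not the bases between them, which is why a map carries far less information than the full sequence.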

Completed in 2003, the Human Genome Project (HGP) was a 13-year project. The goals were:

  • identify all the approximately 20,000-25,000 genes in human DNA,
  • determine the sequences of the 3 billion chemical base pairs that make up human DNA,
  • store this information in databases,
  • improve tools for data analysis,
  • transfer related technologies to the private sector, and
  • address the ethical, legal, and social issues (ELSI) that may arise from the project.

Though the HGP is finished, analyses of the data will continue for many years. By licensing technologies to private companies and awarding grants for innovative research, the project catalyzed the multibillion-dollar U.S. biotechnology industry and fostered the development of new medical applications. When genes are expressed, their sequences are first converted into messenger RNA transcripts, which can be isolated in the form of complementary DNAs (cDNAs). A small portion of each cDNA sequence is all that is needed to develop unique gene markers, known as sequence tagged sites or STSs, which can be detected using the polymerase chain reaction (PCR). To construct a transcript map, cDNA sequences from a master catalog of human genes were distributed to mapping laboratories in North America, Europe, and Japan. These cDNAs were converted to STSs and their physical locations on chromosomes determined on one of two radiation hybrid (RH) panels or a yeast artificial chromosome (YAC) library containing human genomic DNA. This mapping data was integrated relative to the human genetic map and then cross-referenced to cytogenetic band maps of the chromosomes. (Further details are available in the accompanying article in the 25 October issue of SCIENCE).

Tremendous progress has been made in the mapping of human genes, a major milestone in the Human Genome Project. Apart from its utility in advancing our understanding of the genetic basis of disease, it provides a framework and focus for accelerated sequencing efforts by highlighting key landmarks (gene-rich regions) of the chromosomes. The construction of this map has been possible through the cooperative efforts of an international consortium of scientists who provide equal, full and unrestricted access to the data for the advancement of biology and human health.

There are two types of maps: the genetic linkage map and the physical map. The genetic linkage map shows the arrangement of genes and genetic markers along the chromosomes as calculated by the frequency with which they are inherited together. The physical map is a representation of the chromosomes, providing the physical distance between landmarks on the chromosome, ideally measured in nucleotide bases. Physical maps can be divided into three general types: chromosomal or cytogenetic maps, radiation hybrid (RH) maps, and sequence maps.
[Figures: radiation hybrid maps; subchromosomal mapping]
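How a linkage map turns inheritance frequencies into distances can be sketched with a small calculation. The text does not say which mapping function is used, so the example assumes Haldane's function, d = −50 ln(1 − 2r) centimorgans, where r is the recombination fraction between two markers; the offspring counts are hypothetical.

```python
from math import log


def recombination_fraction(recombinants, total_offspring):
    """Fraction of offspring in which the two markers were NOT inherited together."""
    return recombinants / total_offspring


def haldane_cm(r):
    """Genetic map distance in centimorgans from recombination fraction r (0 <= r < 0.5).

    Uses Haldane's mapping function, an assumption made for this illustration.
    """
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * log(1.0 - 2.0 * r)


r = recombination_fraction(recombinants=10, total_offspring=100)   # hypothetical cross
print(f"r = {r:.2f}, distance = {haldane_cm(r):.2f} cM")
```

For closely linked markers the distance is close to 100 × r, so rarely separated markers sit near one another on the map, while the function corrects for multiple crossovers as r grows.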

2B. Genome-nuclear lamina interactions and gene regulation.

Kind J, van Steensel B. Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, The Netherlands.
The nuclear lamina, a filamentous protein network that coats the inner nuclear membrane, has long been thought to interact with specific genomic loci and regulate their expression. Molecular mapping studies have now identified
  • large genomic domains that are in contact with the lamina.
Genes in these domains are typically repressed, and artificial tethering experiments indicate that
  • the lamina can actively contribute to this repression.
Furthermore, the lamina indirectly controls gene expression in the nuclear interior by sequestration of certain transcription factors.
Mol Cell. 2010; 38(4):603-13.
Peric-Hupkes D, Meuleman W, Pagie L, Bruggeman SW, Solovei I,  …., van Steensel B.  Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, The Netherlands.
To visualize three-dimensional organization of chromosomes within the nucleus, we generated high-resolution maps of genome-nuclear lamina interactions during subsequent differentiation of mouse embryonic stem cells via lineage-committed neural precursor cells into terminally differentiated astrocytes.  A basal chromosome architecture present in embryonic stem cells is cumulatively altered at hundreds of sites during lineage commitment and subsequent terminal differentiation. This remodeling involves both
  • individual transcription units and multigene regions and
  • affects many genes that determine cellular identity.
  • Genes that move away from the lamina are concomitantly activated;
  • others remain inactive yet become unlocked for activation in a next differentiation step.

Lamina-genome interactions are widely involved in the control of gene expression programs during lineage commitment and terminal differentiation.

Authors:  Daan Peric-Hupkes, Wouter Meuleman, Ludo Pagie, Sophia W.M. Bruggeman, et al.
  • Various cell types share a core architecture of genome-nuclear lamina interactions
  • During differentiation, hundreds of genes change their lamina interactions
  • Changes in lamina interactions reflect cell identity
  • Release from the lamina may unlock some genes for activation

Fractal “globule”

About 10 years ago, just as the Human Genome Project was completing its first draft sequence, Dekker pioneered a new technique, called chromosome conformation capture (3C), that allowed researchers to get a glimpse of how chromosomes are arranged relative to each other in the nucleus. The technique relies on the physical cross-linking of chromosomal regions that lie in close proximity to one another. The regions are then sequenced to identify which regions have been cross-linked. In 2009, using a high-throughput version of this basic method, called Hi-C, Dekker and his collaborators discovered that the human genome appears to adopt a “fractal globule” conformation:

  • a manner of crumpling without knotting.
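The step from sequenced cross-linked pairs to a contact map can be sketched as a simple binning exercise. This is an illustration under stated assumptions, not the published Hi-C pipeline: each pair is taken to be two genomic coordinates on a single chromosome, and the bin size, coordinates, and function name are hypothetical.

```python
def contact_matrix(pairs, genome_length, bin_size):
    """Count cross-linked pairs per (bin, bin) cell of a symmetric contact matrix."""
    n_bins = -(-genome_length // bin_size)          # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:                                  # mirror off-diagonal contacts
            matrix[j][i] += 1
    return matrix


# Hypothetical cross-linked coordinate pairs on a 3,000-bp toy chromosome.
toy_pairs = [(100, 2500), (150, 900), (2100, 2900)]
for row in contact_matrix(toy_pairs, genome_length=3000, bin_size=1000):
    print(row)
# [1, 0, 1]
# [0, 0, 0]
# [1, 0, 1]
```

Each cell counts how often two stretches of the chromosome were found cross-linked; high counts far from the diagonal indicate loci that are distant along the sequence but close in space.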


In the last 3 years, Job Dekker and others have advanced the technology even further, allowing them to paint a more refined picture of how the genome folds and how this influences gene expression and disease states. Dekker’s 2009 findings were a breakthrough in modeling genome folding, but the resolution (about 1 million base pairs) was too crude to allow scientists to really understand how genes interacted with specific regulatory elements. The researchers report two striking findings.

First, the human genome is organized into two separate compartments, keeping

  • active genes separate and accessible
  • while sequestering unused DNA in a denser storage compartment.
  • Chromosomes snake in and out of the two compartments repeatedly
  • as their DNA alternates between active, gene-rich and inactive, gene-poor stretches.

Second, at a finer scale, the genome adopts an unusual organization known in mathematics as a “fractal.” The specific architecture the scientists found, called

  • a “fractal globule,” enables the cell to pack DNA incredibly tightly

(the information density in the nucleus is trillions of times higher than on a computer chip) while avoiding the knots and tangles that might interfere with the cell’s ability to read its own genome. Moreover, the DNA can easily unfold and refold during

  • gene activation,
  • gene repression, and
  • cell replication.

Dekker and his colleagues discovered, for example, that chromosomes can be divided into folding domains—megabase-long segments within which

  • genes and regulatory elements associate more often with one another than with other chromosome sections.
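One simple way to express this property numerically is to compare the average contact count between bins in the same domain with the average between bins in different domains. The sketch below assumes domain assignments are already known and ignores the effect of genomic distance, which a real analysis would need to control for; the matrix, labels, and function name are hypothetical.

```python
def intra_inter_means(matrix, domain_of_bin):
    """Return (mean intra-domain contacts, mean inter-domain contacts) over bin pairs i < j."""
    intra, inter = [], []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            same_domain = domain_of_bin[i] == domain_of_bin[j]
            (intra if same_domain else inter).append(matrix[i][j])

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(intra), mean(inter)


# Hypothetical 4-bin contact matrix with two domains: bins 0-1 and bins 2-3.
toy_matrix = [
    [0, 9, 1, 2],
    [9, 0, 2, 1],
    [1, 2, 0, 8],
    [2, 1, 8, 0],
]
print(intra_inter_means(toy_matrix, domain_of_bin=[0, 0, 1, 1]))   # (8.5, 1.5)
```

A ratio well above one, as in this toy case, is the signature of a folding domain: loci inside it contact each other far more often than they contact the rest of the chromosome.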

The DNA forms loops within the domains that bring a gene into close proximity with a specific regulatory element at a distant location along the chromosome. Another group, that of molecular biologist Bing Ren at the University of California, San Diego, published a similar finding in the same issue of Nature.  Dekker thinks the discovery of [folding] domains will be one of the most fundamental [genetics] discoveries of the last 10 years. The big questions now are

  • how these domains are formed, and
  • what determines which elements are looped into proximity.

“By breaking the genome into millions of pieces, we created a spatial map showing how close different parts are to one another,” says co-first author Nynke van Berkum, a postdoctoral researcher at UMass Medical School in Dekker’s laboratory. “We made a fantastic three-dimensional jigsaw puzzle and then, with a computer, solved the puzzle.”

Lieberman-Aiden, van Berkum, Lander, and Dekker’s co-authors are Bryan R. Lajoie of UMMS; Louise Williams, Ido Amit, and Andreas Gnirke of the Broad Institute; Maxim Imakaev and Leonid A. Mirny of MIT; Tobias Ragoczy, Agnes Telling, and Mark Groudine of the Fred Hutchinson Cancer Research Center and the University of Washington; Peter J. Sabo, Michael O. Dorschner, Richard Sandstrom, M.A. Bender, and John Stamatoyannopoulos of the University of Washington; and Bradley Bernstein of the Broad Institute and Harvard Medical School.

2C. Three-dimensional structure of the human genome

Lieberman-Aiden et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 2009; DOI: 10.1126/science.1181369.
Harvard University (2009, October 11). 3-D Structure Of Human Genome: Fractal Globule Architecture Packs Two Meters Of DNA Into Each Cell. ScienceDaily.   Retrieved February 2, 2013, from

The researchers used a new technology called Hi-C to answer the thorny question of how each of our cells stows some three billion base pairs of DNA while maintaining access to functionally crucial segments. The paper comes from a team led by scientists at Harvard University, the Broad Institute of Harvard and MIT, University of Massachusetts Medical School, and the Massachusetts Institute of Technology. “We’ve long known that on a small scale, DNA is a double helix,” says co-first author Erez Lieberman-Aiden, a graduate student in the Harvard-MIT Division of Health Sciences and Technology and a researcher at Harvard’s School of Engineering and Applied Sciences and in the laboratory of Eric Lander at the Broad Institute. “But if the double helix didn’t fold further, the genome in each cell would be two meters long. Scientists have not really understood how the double helix folds to fit into the nucleus of a human cell, which is only about a hundredth of a millimeter in diameter. This new approach enabled us to probe exactly that question.”

The mapping technique that Aiden and his colleagues have come up with bridges a crucial gap in knowledge—between what goes on at the smallest levels of genetics (the double helix of DNA and the base pairs) and the largest levels (the way DNA is gathered up into the 23 chromosomes that contain much of the human genome). The intermediate level, on the order of thousands or millions of base pairs, has remained murky.  As the genome is so closely wound, base pairs in one end can be close to others at another end in ways that are not obvious merely by knowing the sequence of base pairs. Borrowing from work that was started in the 1990s, Aiden and others have been able to figure out which base pairs have wound up next  to one another. From there, they can begin to reconstruct the genome—in three dimensions.
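The idea of recording which stretches of DNA "have wound up next to one another" can be sketched as a binned contact matrix. This is only a toy illustration and not the published Hi-C pipeline: the function name `contact_matrix`, the `read_pairs` input format (pairs of positions on a single linear sequence) and the bin size are all assumptions made for the example.

```python
def contact_matrix(read_pairs, genome_length, bin_size):
    """Count observed contacts between fixed-size bins of a linear sequence.

    read_pairs: iterable of (position_a, position_b) tuples, each pair being
    two loci that were found physically close together (hypothetical input).
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


# Two pairs link the first and last kilobase; one pair stays inside the middle bin.
pairs = [(100, 2500), (150, 2600), (1200, 1300)]
for row in contact_matrix(pairs, genome_length=3000, bin_size=1000):
    print(row)
```

High counts far from the diagonal mark loci that are distant along the sequence yet close in space, which is the kind of signal a three-dimensional reconstruction starts from.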

4C profiles validate the Hi-C Genome wide map

Even as the multi-dimensional mapping techniques remain in their early stages, their importance in basic biological research is becoming ever more apparent. “The three-dimensional genome is a powerful thing to know,” Aiden says. “A central mystery of biology is the question of how different cells perform different functions—despite the fact that they share the same genome.” How does a liver cell, for example, “know” to perform its liver duties when it contains the same genome as a cell in the eye? As Aiden and others reconstruct the trail of letters into a three-dimensional entity, they have begun to see that “the way the genome is folded determines which genes were

2D. “Mr. President; The Genome is Fractal !”

Eric Lander (Science Adviser to the President and Director of Broad Institute) et al. delivered the message on the cover of Science Magazine (Oct. 9, 2009), which generated interest from the International HoloGenomics Society at a September meeting.

First, it may seem to be trivial to rectify the statement in “About cover” of Science Magazine by AAAS.

  • The statement “the Hilbert curve is a one-dimensional fractal trajectory” needs mathematical clarification.

The mathematical concept of a Hilbert space, named after David Hilbert, generalizes the notion of Euclidean space. It extends the methods of vector algebra and calculus from the two-dimensional Euclidean plane and three-dimensional space to spaces with any finite or infinite number of dimensions. A Hilbert space is an abstract vector space possessing the structure of an inner product that allows length and angle to be measured. Furthermore, Hilbert spaces must be complete, a property that stipulates the existence of enough limits in the space to allow the techniques of calculus to be used. A Hilbert curve (also known as a Hilbert space-filling curve), by contrast, is a continuous fractal space-filling curve first described by the German mathematician David Hilbert in 1891, as a variant of the space-filling curves discovered by Giuseppe Peano in 1890. For multidimensional databases, Hilbert order has been proposed to be used instead of Z order because it has better locality-preserving behavior.

Representation as Lindenmayer system
The Hilbert Curve can be expressed by a rewrite system (L-system).

Alphabet : A, B

Constants : F + –

Axiom : A

Production rules:

A → – B F + A F A + F B –

B → + A F – B F B – F A +

Here, F means “draw forward”, – means “turn left 90°”, and + means “turn right 90°” (see turtle graphics).
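The rewrite system above can be run directly. The following is a minimal sketch that expands the axiom with the two production rules and then follows the turtle instructions on an integer grid. The function names are illustrative, and the ASCII hyphen `-` stands in for the "–" constant used in the rules.

```python
RULES = {"A": "-BF+AFA+FB-", "B": "+AF-BFB-FA+"}


def expand(order):
    """Apply the production rules `order` times, starting from the axiom A."""
    commands = "A"
    for _ in range(order):
        commands = "".join(RULES.get(symbol, symbol) for symbol in commands)
    return commands


def trace(commands):
    """Follow the turtle commands and return the list of visited grid points."""
    x, y = 0, 0
    dx, dy = 1, 0  # start facing along the positive x axis
    points = [(x, y)]
    for symbol in commands:
        if symbol == "F":      # draw forward one unit
            x, y = x + dx, y + dy
            points.append((x, y))
        elif symbol == "-":    # turn left 90 degrees
            dx, dy = -dy, dx
        elif symbol == "+":    # turn right 90 degrees
            dx, dy = dy, -dx
        # A and B only drive the rewriting; the turtle ignores them
    return points


def hilbert_points(order):
    return trace(expand(order))


print(hilbert_points(1))  # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Each additional rewriting step quadruples the number of visited points, so an order-n curve passes once through every cell of a 2^n by 2^n grid without crossing itself.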


While the paper itself does not make this statement, the new Editorship of the AAAS Magazine might be even more advanced if the previous Editorship did not reject (without review) a Manuscript by 20+ Founders of (formerly) International PostGenetics Society in December, 2006.

Second, it may not be sufficiently clear for the reader that the reasonable requirement for the DNA polymerase to crawl along a “knot-free” (or “low knot”) structure does not need fractals. A “knot-free” structure could be spooled by an ordinary “knitting globule” (such that the DNA polymerase does not bump into a “knot” when duplicating the strand; just like someone knitting can go through the entire thread without encountering an annoying knot): Just to be “knot-free” you don’t need fractals. Note, however, that

  • the “strand” can be accessed only at its beginning – it is impossible, e.g., to pluck a segment from deep inside the “globulus”.

This is where certain fractals provide a major advantage – that could be the “Eureka” moment for many readers. For instance,

  • the mentioned Hilbert-curve is not only “knot free” –
  • but provides an easy access to “linearly remote” segments of the strand.

If the Hilbert curve starts from the lower right corner and ends at the lower left corner, for instance

  • the path shows the very easy access of what would be the mid-point
  • if the Hilbert-curve is measured by the Euclidean distance along the zig-zagged path.

Likewise, even the path from the beginning of the Hilbert-curve is about equally easy to access – easier than to reach from the origin a point that is about 2/3 down the path. The Hilbert-curve provides an easy access between two points within the “spooled thread”; from a point that is about 1/5 of the overall length to about 3/5 is also in a “close neighborhood”.

This may be the “Eureka-moment” for some readers, to realize that

  • the strand of “the Double Helix” requires quite some finesse to fold into the densest possible globuli (the chromosomes) in such a clever way
  • that various segments can be easily accessed and, moreover, that distances between various segments are minimized.

This marvellous fractal structure is illustrated by the 3D rendering of the Hilbert-curve. Once you observe such fractal structure, you’ll never again think of a chromosome as a “brillo mess”, will you? It will dawn on you that the genome is orders of magnitude more finessed than we ever thought.

Those embarking on a somewhat complex review of some historical aspects of the power of fractals may wish to consult the oeuvre of Mandelbrot (also, to celebrate his 85th birthday). For the more sophisticated readers, even the fairly simple Hilbert-curve (a representative of the Peano-class) becomes even more stunningly brilliant than just some “see through density”. Those who are familiar with the classic “Traveling Salesman Problem” know that finding “the shortest path along which every given n locations can be visited once, and only once” requires fairly sophisticated algorithms (and a tremendous amount of computation if n>10, or much more). Some readers will be amazed, therefore, that for n=9 the underlying Hilbert-curve helps to provide an empirical solution.
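One common way a Hilbert curve is used for the Traveling Salesman Problem is as a space-filling-curve heuristic: visit the locations in the order in which the curve passes through them. The sketch below assumes points in the unit square and a grid size `n` that is a power of two; the function names are illustrative. It gives a reasonable tour quickly but does not guarantee the shortest one, and it is not the neural network approach discussed in this post.

```python
import math


def _rotate(n, x, y, rx, ry):
    """Rotate or flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Step at which the Hilbert curve on an n x n grid visits cell (x, y)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_tour(points, n=1024):
    """Order points in the unit square by their position along the curve."""
    def key(point):
        gx = min(n - 1, int(point[0] * n))
        gy = min(n - 1, int(point[1] * n))
        return hilbert_index(n, gx, gy)
    return sorted(points, key=key)


def path_length(path):
    """Total Euclidean length of the open path through the points in order."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


cities = [(0.9, 0.1), (0.1, 0.1), (0.5, 0.9), (0.1, 0.8), (0.8, 0.7)]
tour = hilbert_tour(cities)
print(tour, round(path_length(tour), 3))
```

The same index also illustrates the easy access between "linearly remote" segments: on a 4 x 4 grid, `hilbert_index(4, 1, 0)` is 1 and `hilbert_index(4, 2, 0)` is 14, so two cells that are immediate neighbours in space lie 13 steps apart along the path.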

Briefly, the significance of the above realization, that the (recursive) Fractal Hilbert Curve is intimately connected to the (recursive) solution of the Traveling Salesman Problem, a core concept of Artificial Neural Networks, can be summarized as below.

Accomplished physicist John Hopfield (already a member of the National Academy of Sciences) aroused great excitement in 1982 with his (recursive) design of artificial neural networks and learning algorithms which were able to find reasonable solutions to combinatorial problems such as the Traveling Salesman Problem. (Book review Clark Jeffries, 1991, see also 2. J. Anderson, R. Rosenfeld, and A. Pellionisz (eds.), Neurocomputing 2: Directions for research, MIT Press, Cambridge, MA, 1990):

“Perceptrons were modeled chiefly with neural connections in a “forward” direction: A -> B -> C -> D. The analysis of networks with strong backward coupling proved intractable. All our interesting results arise as consequences of the strong back-coupling” (Hopfield, 1982).

The Principle of Recursive Genome Function surpassed obsolete axioms that blocked, for half a century, entry of recursive algorithms to interpretation of the structure and function of the (Holo)Genome. This breakthrough, by uniting the two largely separate fields of Neural Networks and Genome Informatics, is particularly important for those who focused on Biological (actually occurring) Neural Networks, rather than on abstract algorithms that may not, or because of their core axioms simply could not, represent neural networks under the governance of DNA information.

DNA base triplets

3A. The FractoGene Decade

from Inception in 2002 to Proofs of Concept and Impending Clinical Applications by 2012

  1. Junk DNA Revisited (SF Gate, 2002)
  2. The Future of Life, 50th Anniversary of DNA (Monterey, 2003)
  3. Mandelbrot and Pellionisz (Stanford, 2004)
  4. Morphogenesis, Physiology and Biophysics (Simons, Pellionisz 2005)
  5. PostGenetics; Genetics beyond Genes (Budapest, 2006)
  6. ENCODE-conclusion (Collins, 2007)
  7. The Principle of Recursive Genome Function (paper, YouTube, 2008)
  8. Cold Spring Harbor presentation of FractoGene (Cold Spring Harbor, 2009)
  9. Mr. President, the Genome is Fractal! (2009)
  10. HolGenTech, Inc. Founded (2010)
  11. Pellionisz on the Board of Advisers in the USA and India (2011)
  12. ENCODE – final admission (2012)
  13. Recursive Genome Function is Clogged by Fractal Defects in Hilbert-Curve (2012)
  14. Geometric Unification of Neuroscience and Genomics (2012)
  15. US Patent Office issues FractoGene 8,280,641 to Pellionisz (2012)

When the human genome was first sequenced in June 2000, there were two pretty big surprises. The first was that humans have only about 30,000-40,000 identifiable genes, not the 100,000 or more many researchers were expecting. The lower, and more humbling, number means humans have just one-third more genes than a common species of worm.

The second stunner was how much human genetic material, more than 90 percent, is made up of what scientists were calling “junk DNA.”

The term was coined to describe similar but not completely identical repetitive sequences of nucleotides (the same building blocks that make up genes), which appeared to have no function or purpose. The main theory at the time was that these apparently non-working sections of DNA were just evolutionary leftovers, much like our earlobes.

If biophysicist Andras Pellionisz is correct, genetic science may be on the verge of yielding its third — and by far biggest — surprise.

With a doctorate in physics, Pellionisz is the holder of Ph.D.’s in computer sciences and experimental biology from the prestigious Budapest Technical University and the Hungarian National Academy of Sciences. A biophysicist by training, the 59-year-old is a former research associate professor of physiology and biophysics at New York University, author of numerous papers in respected scientific journals and textbooks, a past winner of the prestigious Humboldt Prize for scientific research, a former consultant to NASA and holder of a patent on the world’s first artificial cerebellum, a technology that has already been integrated into research on advanced avionics systems. Because of his background, the Hungarian-born brain researcher might also become one of the first people to successfully launch a new company by using the Internet to gather momentum for a novel scientific idea.

The genes we know about today, Pellionisz says, can be thought of as something similar to machines that make bricks (proteins, in the case of genes), with certain junk-DNA sections providing a blueprint for the different ways those proteins are assembled. Reflecting the notion that at least certain parts of junk DNA might have a purpose, many researchers now refer to those parts with a far less derogatory term: introns.

In a provisional patent application filed July 31, Pellionisz claims to have unlocked a key to the hidden role junk DNA plays in growth — and in life itself. His patent application covers all attempts to count, measure and compare the fractal properties of introns for diagnostic and therapeutic purposes.

3B. The Hidden Fractal Language of Intron DNA

To fully understand Pellionisz’ idea, one must first know what a fractal is.

Fractals are a way that nature organizes matter. Fractal patterns can be found in anything that has a nonsmooth surface (unlike a billiard ball), such as coastal seashores, the branches of a tree or the contours of a neuron (a nerve cell in the brain). Some, but not all, fractals are self-similar and stop repeating their patterns at some stage; the branches of a tree, for example, can get only so small. Because they are geometric, meaning they have a shape, fractals can be described in mathematical terms. It’s similar to the way a circle can be described by using a number to represent its radius (the distance from its center to its outer edge). When that number is known, it’s possible to draw the circle it represents without ever having seen it before.

Although the math is much more complicated, the same is true of fractals. If one has the formula for a given fractal, it’s possible to use that formula to construct, or reconstruct, an image of whatever structure it represents, no matter how complicated.
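As a small illustration of a formula regenerating a complicated-looking structure, the sketch below draws a Sierpinski triangle from a single rule: a grid cell (x, y) is filled exactly when the bitwise AND of x and y is zero. The choice of this particular fractal and the function name are assumptions made for the example; it is not the fractal model proposed for DNA.

```python
def sierpinski_rows(order):
    """Return text rows of a Sierpinski triangle on a 2**order square grid."""
    size = 2 ** order
    return [
        "".join("#" if (x & y) == 0 else " " for x in range(size))
        for y in range(size)
    ]


for row in sierpinski_rows(4):
    print(row)
```

The one-line rule plays the role that the radius plays for a circle: knowing it is enough to redraw the whole pattern at any size, and each increase of `order` by one triples the number of filled cells.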

The mysteriously repetitive but not identical strands of genetic material are in reality building instructions organized in a special type of pattern known as a fractal. It’s this pattern of fractal instructions, he says, that tells genes what they must do in order to form living tissue, everything from the wings of a fly to the entire body of a full-grown human.

In a move sure to alienate some scientists, Pellionisz has chosen the unorthodox route of making his initial disclosures online on his own Web site. He picked that strategy, he says, because it is the fastest way he can document his claims and find scientific collaborators and investors. Most mainstream scientists usually blanch at such approaches, preferring more traditionally credible methods, such as publishing articles in peer-reviewed journals.

Basically, Pellionisz’ idea is that a fractal set of building instructions in the DNA plays a similar role in organizing life itself. Decode the way that language works, he says, and in theory it could be reverse engineered. Just as knowing the radius of a circle lets one create that circle, the more complicated fractal-based formula would allow us to understand how nature creates a heart or simpler structures, such as disease-fighting antibodies. At a minimum, we’d get a far better understanding of how nature gets that job done.

The complicated quality of the idea is helping encourage new collaborations across the boundaries that sometimes separate the increasingly intertwined disciplines of biology, mathematics and computer sciences.

Hal Plotkin, Special to SF Gate. Thursday, November 21, 2002.


3C. Multifractal analysis

The human genome: a multifractal analysis. Moreno PA, Vélez PE, Martínez E, et al.

BMC Genomics 2011, 12:506.

Background: Several studies have shown that genomes can be studied via a multifractal formalism. Recently, we used a multifractal approach to study the genetic information content of the Caenorhabditis elegans genome. Here we investigate the possibility that the human genome shows a similar behavior to that observed in the nematode.
Results: We report here multifractality in the human genome sequence. This behavior correlates strongly on the

  • presence of Alu elements and
  • to a lesser extent on CpG islands and (G+C) content.

In contrast, no or low relationship was found for LINE, MIR, MER, LTRs elements and DNA regions poor in genetic information.

  • Gene function,
  • cluster of orthologous genes,
  • metabolic pathways, and
  • exons tended to increase their frequencies with ranges of multifractality and
  • large gene families were located in genomic regions with varied multifractality.

Additionally, a multifractal map and classification for human chromosomes are proposed.


Conclusions: We propose a descriptive non-linear model for the structure of the human genome.

This model reveals

  • a multifractal regionalization where many regions coexist that are far from equilibrium and
  • a non-linear organization that has significant molecular and medical genetic implications for understanding the role of Alu elements in genome stability and structure of the human genome.

Given the role of Alu sequences in

  • gene regulation,
  • genetic diseases,
  • human genetic diversity,
  • adaptation
  • and phylogenetic analyses,

these quantifications are especially useful.

MiIP: The Monomer Identification and Isolation Program

Bun C, Ziccardi W, Doering J and Putonti C. Evolutionary Bioinformatics 2012:8 293-300.

Repetitive elements within genomic DNA are both functionally and evolutionarily informative. Discovering these sequences ab initio is

  • computationally challenging, compounded by the fact that
  • sequence identity between repetitive elements can vary significantly.

Here we present a new application, the Monomer Identification and Isolation Program (MiIP), which provides functionality to both

  • search for a particular repeat as well as
  • discover repetitive elements within a larger genomic sequence.
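To make the first of these two tasks concrete, here is a minimal sketch of searching for a known monomer while tolerating a few mismatches, which is why varying sequence identity makes repeat detection harder than exact string matching. It is a brute-force illustration only and not MiIP's algorithm; the function name and the `max_mismatches` parameter are assumptions.

```python
def find_approximate_repeats(sequence, monomer, max_mismatches):
    """Return (start, mismatches) for every window close enough to the monomer."""
    hits = []
    k = len(monomer)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        # Hamming distance between the window and the query monomer
        mismatches = sum(a != b for a, b in zip(window, monomer))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


print(find_approximate_repeats("ACGTACGAACGT", "ACGT", max_mismatches=1))
# [(0, 0), (4, 1), (8, 0)]
```

Raising `max_mismatches` trades specificity for sensitivity, loosely mirroring the user-selected search sensitivity described for the tool.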

To compare MiIP’s performance with other repeat detection tools, analysis was conducted for

  • synthetic sequences as well as
  • several α21-II clones and
  • HC21 BAC sequences.

The primary benefit of MiIP is the fact that it is a single tool capable of searching for both

  • known monomeric sequences as well as
  • discovering the occurrence of repeats ab initio, per the user’s required sensitivity of the search.

Methods for Examining Genomic and Proteomic Interactions

1. An Integrated Statistical Approach to Compare Transcriptomics Data Across Experiments: A Case Study on the Identification of Candidate Target Genes of the Transcription Factor PPARα

Ullah MO, Müller M and Hooiveld GJEJ. Bioinformatics and Biology Insights 2012:6 145–154.

An effective strategy to elucidate the signal transduction cascades activated by a transcription factor is to compare the transcriptional profiles of wild type and transcription factor knockout models. Many statistical tests have been proposed for analyzing gene expression data, but most tests are based on pair-wise comparisons. Since the analysis of microarrays involves the testing of multiple hypotheses within one study, it is generally accepted that one should control for false positives by the false discovery rate (FDR). However, it has been reported that this may be an inappropriate metric for comparing data across different experiments.
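For readers unfamiliar with FDR control within a single experiment, the sketch below shows the Benjamini-Hochberg step-up procedure, a widely used way to do it. The abstract does not say which FDR procedure is meant, so choosing Benjamini-Hochberg is an assumption, as are the function name and the example p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most (k / m) * alpha
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
# [True, False, False, False]
```

Because the threshold depends on the number of tests and on the ranking of p-values inside one study, a gene called significant in one experiment and not in another may simply reflect different p-value distributions, which is one reason gene lists obtained this way can be hard to compare across experiments.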

Here we propose an approach that addresses the above mentioned problem by the simultaneous testing and integration of the three hypotheses (contrasts) using the cell means ANOVA model.

These three contrasts test for the effect of

  • a treatment in wild type,
  • gene knockout, and
  • globally over all experimental groups.

We illustrate our approach on microarray experiments that focused on the identification of candidate target genes and biological processes governed by the fatty acid sensing transcription factor PPARα in liver. Compared to the often applied FDR-based across-experiment comparison, our approach identified a conservative but less noisy set of candidate genes with the same sensitivity and specificity. However, our method had the advantage of

  • properly adjusting for multiple testing while
  • integrating data from two experiments, and
  • was driven by biological inference.

We present a simple, yet efficient strategy to compare

  • differential expression of genes across experiments
  • while controlling for multiple hypothesis testing.

2. Managing biological complexity across orthologs with a visual knowledgebase of documented biomolecular interactions

Vincent VanBuren & Hailin Chen.   Scientific Reports 2, Article number: 1011  Received 02 October 2012 Accepted 04 December 2012 Published 20 December 2012

The complexity of biomolecular interactions and influences is a major obstacle to their comprehension and elucidation. Visualizing knowledge of biomolecular interactions increases comprehension and facilitates the development of new hypotheses. The rapidly changing landscape of high-content experimental results also presents a challenge for the maintenance of comprehensive knowledgebases. Distributing the responsibility for maintenance of a knowledgebase to a community of subject matter experts is an effective strategy for large, complex and rapidly changing knowledgebases.
Cognoscente serves these needs by

  • building visualizations for queries of biomolecular interactions on demand,
  • by managing the complexity of those visualizations, and
  • by crowdsourcing to promote the incorporation of current knowledge from the literature.

Imputing functional associations between biomolecules and imputing directionality of regulation for those predictions each require a corpus of existing knowledge as a framework to build upon. Comprehension of the complexity of this corpus of knowledge will be facilitated by effective visualizations of the corresponding biomolecular interaction networks.

Cognoscente was designed and implemented to serve these roles as

  • a knowledgebase and
  • as an effective visualization tool for systems biology research and education.

Cognoscente currently contains over 413,000 documented interactions, with coverage across multiple species. Perl, HTML, GraphViz, and a MySQL database were used in the development of Cognoscente. Cognoscente was motivated by the need to

  • update the knowledgebase of biomolecular interactions at the user level, and
  • flexibly visualize multi-molecule query results for heterogeneous interaction types across different orthologs.

Satisfying these needs provides a strong foundation for developing new hypotheses about regulatory and metabolic pathway topologies. Several existing tools provide functions that are similar to Cognoscente, so we selected several popular alternatives to assess how their feature sets compare with Cognoscente (Table 1). All databases assessed had easily traceable documentation for each interaction and included protein-protein interactions in the database.

Most databases, with the exception of BIND,

  • provide an open-access database that can be downloaded as a whole.

Most databases, with the exceptions of EcoCyc and HPRD, provide

  • support for multiple organisms.

Most databases support web services for interacting with the database contents programmatically, whereas this is a planned feature for Cognoscente. MINT, STRING, IntAct, EcoCyc, DIP and Cognoscente provide built-in visualizations of query results, which we consider among the most important features for facilitating comprehension of query results. BIND supports visualizations via Cytoscape. Cognoscente is among a few other tools that support

  • multiple organisms in the same query,
  • protein->DNA interactions, and
  • multi-molecule queries.

Cognoscente has planned support for small molecule interactants (i.e. pharmacological agents). MINT, STRING, and IntAct provide a prediction (i.e. score) of functional associations, whereas Cognoscente does not currently support this. Cognoscente provides

  • support for multiple edge encodings to visualize different types of interactions in the same display,
  • a crowdsourcing web portal that allows users to submit interactions that are then automatically incorporated in the knowledgebase, and
  • a display of orthologs as compound nodes to provide clues about potential orthologous interactions.

The main strengths of Cognoscente are that

  1. it provides a combined feature set that is superior to any existing database,
  2. it provides a unique visualization feature for orthologous molecules, and
  3. it provides relatively unique support for multiple edge encodings, crowdsourcing, and connectivity parameterization.

The current weaknesses of Cognoscente relative to these other tools are

  • that it does not fully support web service interactions with the database,
  • it does not fully support small molecule interactants, and
  • it does not score interactions to predict functional associations.

Web services and support for small molecule interactants are currently under development.

Other related articles in this Open Access Online Scientific Journal include the following:

Big Data in Genomic Medicine                    lhb                

BRCA1 a tumour suppressor in breast and ovarian cancer – functions in transcription, ubiquitination and DNA repair S Saha                                                                         

Computational Genomics Center: New Unification of Computational Technologies at Stanford A Lev-Ari

Paradigm Shift in Human Genomics – Predictive Biomarkers and Personalized Medicine – Part 1 A Lev-Ari

LEADERS in Genome Sequencing of Genetic Mutations for Therapeutic Drug Selection in Cancer Personalized Treatment: Part 2 A Lev-Ari

Personalized Medicine: An Institute Profile – Coriell Institute for Medical Research: Part 3 A Lev-Ari

GSK for Personalized Medicine using Cancer Drugs needs Alacris systems biology model to determine the in silico effect of the inhibitor in its “virtual clinical trial” A Lev-Ari

Recurrent somatic mutations in chromatin-remodeling and ubiquitin ligase complex genes in serous endometrial tumors S Saha

Human Variome Project: encyclopedic catalog of sequence variants indexed to the human genome sequence A Lev-Ari

Prostate Cancer Cells: Histone Deacetylase Inhibitors Induce Epithelial-to-Mesenchymal Transition sjwilliams

Directions for genomics in personalized medicine lhb

How mobile elements in “Junk” DNA promote cancer. Part 1: Transposon-mediated tumorigenesis. Sjwilliams

Mitochondrial fission and fusion: potential therapeutic targets? Ritu saxena

Mitochondrial mutation analysis might be “1-step” away ritu saxena

mRNA interference with cancer expression lhb

Expanding the Genetic Alphabet and linking the genome to the metabolome

Breast Cancer: Genomic profiling to predict Survival: Combination of Histopathology and Gene Expression Analysis A Lev-Ari

Ubiquinin-Proteosome pathway, autophagy, the mitochondrion, proteolysis and cell apoptosis lhb

Genomic Analysis: FLUIDIGM Technology in the Life Science and Agricultural Biotechnology A Lev-Ari

2013 Genomics: The Era Beyond the Sequencing Human Genome: Francis Collins, Craig Venter, Eric Lander, et al.
