Posts Tagged ‘RNA’

Reporter: Aviva Lev-Ari, PhD, RN

Press Release 16 January, 2013

Dr. Rotem Karni and PhD student Vered Ben Hur at the Institute for Medical Research Israel-Canada of the Hebrew University,

Word Cloud By Danielle Smolyar

Mechanism involved in breast cancer cell growth provides opening for early detection, treatment


Researchers at the Hebrew University Institute of Medical Research Israel-Canada have discovered a new mechanism by which breast cancer cells switch on their aggressive cancerous behavior. The discovery provides a valuable marker for the early diagnosis and follow-up treatment of malignant growths.

In normal cell reproduction, a process of RNA splicing takes place. RNA (ribonucleic acid) is a family of large biological molecules that performs multiple, vital roles in the coding, decoding, regulation and expression of genes. Cellular organisms use messenger RNA, called mRNA, to convey genetic information that directs synthesis of specific proteins.

RNA splicing is similar to the process of editing a movie. In this process, the information needed for the production of a mature protein is encoded in segments called exons (which like important movie scenes are needed in a specific sequence in order to understand the movie). In the splicing process, the non-coding segments of the RNA (unimportant scenes, called introns) are spliced from the pre-mRNA and the exons are joined together.

Alternative splicing is when a specific “scene” (or exon) is either inserted or deleted from the movie (mRNA), thus changing its meaning. Over 90 percent of the genes in our genome undergo alternative splicing of one or more of their exons, and the resulting changes in the proteins encoded by these different mRNAs are required for normal function. In cancer, the normal process of alternative splicing is altered, and “bad” protein forms are generated that aid cancer cell proliferation and survival.
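The movie-editing analogy can be made concrete with a toy sketch. The sequence, the exon coordinates, and the `splice` helper below are all made up for illustration and do not describe S6K1 or any real gene; the sketch only shows how dropping introns and optionally skipping one exon yields two different mRNAs from the same pre-mRNA.

```python
def splice(pre_mrna, exons, skip=()):
    """Join the exon segments of a pre-mRNA, dropping the introns.

    exons: list of (start, end) half-open index pairs, in transcript order.
    skip:  indices of exons to leave out (a simple model of exon skipping).
    """
    return "".join(
        pre_mrna[start:end]
        for i, (start, end) in enumerate(exons)
        if i not in skip
    )

# Hypothetical pre-mRNA: uppercase = exon, lowercase = intron.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guaagcccacag" + "UGGUAA"
exons = [(0, 6), (18, 24), (36, 42)]  # assumed coordinates for the toy sequence

full_isoform = splice(pre_mrna, exons)             # all three exons joined
short_isoform = splice(pre_mrna, exons, skip={1})  # middle exon skipped

print(full_isoform)
print(short_isoform)
```

In this toy model the shorter isoform simply lacks the middle "scene"; in a real transcript, skipping an exon can also shift the reading frame or remove a functional domain.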

In a study published in the online edition of Cell Reports, conducted by Ph.D. student Vered Ben Hur in the lab of Dr. Rotem Karni at the Institute for Medical Research Israel-Canada of the Hebrew University, the researchers found that breast cancer cells change the alternative splicing of an important enzyme, called S6K1, which is a protein involved in the transmission of information into the cell.

The researchers found that when this happens, breast cancer cells start to produce shorter versions of this enzyme and that these shorter versions transmit signals ordering the cells to grow, proliferate, survive and invade other tissues. On the other hand, the researchers found that the long form of this protein acts as a tumor suppressor that protects normal cells from becoming cancerous.

There are several medical implications emanating from the research, say the researchers. One of them is the use of the newly discovered short forms of S6K1 as a diagnostic marker for the detection of breast cancer. Several new anticancer drugs, which have entered the clinic recently, can inhibit the cancerous activity of the short forms of S6K1. Thus, the detection of these new forms can predict the efficacy of these drugs to treat cancer patients.

These implications were recently submitted as a patent application by Yissum, the technology transfer company of the Hebrew University. Another future application will be to “reverse” the alternative splicing of S6K1 in cancer cells back to the normal situation as a novel anti-cancer therapy. The research group of Dr. Karni is actively engaged in this effort.

SOURCE:

How Mobile Elements in “Junk” DNA Promote Cancer – Part 1: Transposon-mediated Tumorigenesis

Author, Writer and Curator: Stephen J. Williams, Ph.D.

Word Cloud by Daniel Menzin

SOURCE

Landscape of Somatic Retrotransposition in Human Cancers. Science (2012); Vol. 337:967-971. (1)

Sequencing of the human genome via massive programs such as The Cancer Genome Atlas (TCGA) and the Encyclopedia of DNA Elements (ENCODE) consortium, in conjunction with considerable bioinformatics efforts led by the National Center for Biotechnology Information (NCBI), has unlocked a myriad of as yet unclassified genes (for a good review see (2)).  The ENCODE project encompasses 32 institutions worldwide which, so far, have generated 1,640 data sets, initially depending on microarray platforms but now moving to the more cost-effective new sequencing technology.  Initially the ENCODE project focused on three types of cells: an immature white blood cell line GM12878, the leukemic line K562, and an approved human embryonic cell line H1-hESC.  The analysis was rapidly expanded to another 140 cell types.  DNA sequencing revealed 20,687 known coding regions with hints of 50 more coding regions.  Another 11,224 DNA stretches were classified as pseudogenes.  The ENCODE project reveals that many genes encode an RNA rather than a protein product, the so-called regulatory RNAs.

However, some of the most recent and interesting results focus on the noncoding regions of the human genome, previously discarded as uninteresting or “junk” DNA.  Only 2% of the human genome contains coding regions, while the remaining 98%, the noncoding part of the genome, is actually found to be highly active “with about 4 million constantly communicating switches” (3).  Some of these “switches” in the noncoding portion contain small, repetitive elements which are mobile throughout the genome, and can control gene expression and/or predispose to diseases such as cancer.  These mobile elements, found in almost all organisms, are classified as transposable elements (TE), which insert themselves into far-reaching regions of the genome.  Retrotransposons are capable of generating new insertions through RNA intermediates.  These transposable elements are normally kept immobile by epigenetic mechanisms (4-6); however, some TEs can escape epigenetic repression and insert into new areas of the genome, a process described as insertional mutagenesis because it can lead to the gene alterations seen in disease (7).  In addition, this insertional mutagenesis can lead to the transformation of cells and, as described in Post 2, act as a model system to determine drivers of oncogenesis.  Insertional mutagenesis is distinct from other mechanisms of genetic alteration and rearrangement seen in cancer, such as the recombination and fusion of gene fragments seen with the Philadelphia chromosome and the BCR/ABL fusion protein (8).  The mechanism of transposition and putative effects leading to mutagenesis are described in the following figure:

Figure.  Insertional mutagenesis based on a transposon-mediated mechanism.  A) The basic structure of a transposon contains a gene/sequence flanked by two inverted repeats (IR) and/or direct repeats (DR).  An enzyme, the transposase (red hexagon), binds and cuts at the IR/DR, and the transposon is pasted at another site in the DNA containing an insertion site.  B) Multiple transpositions may result in oncogenic events by inserting in promoters, leading to altered expression of genes driving oncogenesis, or by inserting within coding regions and inactivating tumor suppressors or activating oncogenes.  Deep sequencing of the resultant tumor genomes (based on nested PCR from the IR/DRs) may reveal common insertion sites (CIS), from which oncogenic mutations could be identified.

In a bioinformatics study, Eunjung Lee et al. (1), in collaboration with the Cancer Genome Atlas Research Network, analyzed 43 high-coverage whole-genome sequencing datasets from five cancer types to determine transposable element insertion sites.  Using a novel computational method, the authors identified 194 high-confidence somatic TE insertion sites present in cancers of epithelial origin such as colorectal, prostate and ovarian, but not in brain or blood cancers.  Sixty-four of the 194 detected somatic TE insertions were located within 62 annotated genes.  Genes with TE insertions in colon cancers commonly have high mutation rates, and the affected genes were enriched for cell adhesion functions (CDH12, ROBO2, NRXN3, FPR2, COL1A1, NEGR1, NTM and CTNNA2) or tumor suppressor functions (NELL1, ROBO2, DBC1, and PARK2).  None of the somatic events were located within coding regions; the TE sequences were detected in untranslated regions (UTR) or intronic regions.  Previous studies had shown that insertion in these regions (UTR or intronic) can disrupt gene expression (9).  Interestingly, most of the genes with insertion sites were down-regulated.  Recent papers have also shown that local changes in the methylation status of transposable elements can drive retrotransposition (10,11).  Indeed, the authors found that somatic insertions are biased toward the hypomethylated regions in cancer cell DNA.  The authors also confirmed that the insertion sites were unique to cancer and were somatic rather than germline in origin (germline: inherited and therefore present in every cell of the body) by analyzing 44 normal genomes (41 normal blood samples from cancer patients and three healthy individuals).

The authors conclude:

“that some TE insertions provide a selective advantage during tumorigenesis, rather than being merely passenger events that precede clonal expansion” (1).

The authors also suggest that more bioinformatics studies, which utilize the expansive genomic and epigenetic databases, could determine the functional consequences of such transposable elements in cancer.  The following Post will describe how the use of transposon-mediated insertional mutagenesis is leading to discoveries of the drivers (main genetic events) leading to oncogenesis.

1. Lee, E., Iskow, R., Yang, L., Gokcumen, O., Haseley, P., Luquette, L. J., 3rd, Lohr, J. G., Harris, C. C., Ding, L., Wilson, R. K., Wheeler, D. A., Gibbs, R. A., Kucherlapati, R., Lee, C., Kharchenko, P. V., and Park, P. J. (2012) Science 337, 967-971

2. Pennisi, E. (2012) Science 337, 1159, 1161

3. Park, A. (2012) Don’t Trash These Genes. “Junk” DNA may lead to valuable cures. Time, Time, Inc., New York, N.Y.

4. Maksakova, I. A., Mager, D. L., and Reiss, D. (2008) Cellular and molecular life sciences : CMLS 65, 3329-3347

5. Slotkin, R. K., and Martienssen, R. (2007) Nature reviews. Genetics 8, 272-285

6. Yang, N., and Kazazian, H. H., Jr. (2006) Nature structural & molecular biology 13, 763-771

7. Hancks, D. C., and Kazazian, H. H., Jr. (2012) Current opinion in genetics & development 22, 191-203

8. Sattler, M., and Griffin, J. D. (2001) International journal of hematology 73, 278-291

9. Han, J. S., Szak, S. T., and Boeke, J. D. (2004) Nature 429, 268-274

10. Reichmann, J., Crichton, J. H., Madej, M. J., Taggart, M., Gautier, P., Garcia-Perez, J. L., Meehan, R. R., and Adams, I. R. (2012) PLoS computational biology 8, e1002486

11. Byun, H. M., Heo, K., Mitchell, K. J., and Yang, A. S. (2012) Journal of biomedical science 19, 13

Other research papers on ENCODE and cancer were published on this scientific web site as follows:

Expanding the Genetic Alphabet and linking the genome to the metabolome

Junk DNA codes for valuable miRNAs: non-coding DNA controls Diabetes

ENCODE Findings as Consortium

Reveals from ENCODE project will invite high synergistic collaborations to discover specific targets

ENCODE: the key to unlocking the secrets of complex genetic diseases

Impact of evolutionary selection on functional regions: The imprint of evolutionary selection on ENCODE regulatory elements is manifested between species and within human populations

Metabolite Identification Combining Genetic and Metabolic Information: Genetic association links unknown metabolites to functionally related genes

Advances in Separations Technology for the “OMICs” and Clarification of Therapeutic Targets

Commentary on Dr. Baker’s post “Junk DNA codes for valuable miRNAs: non-coding DNA controls Diabetes”

Cancer Genomics – Leading the Way by Cancer Genomics Program at UC Santa Cruz

Author and Curator: Ritu Saxena, Ph.D.

A recent post by Dr. Margaret Baker entitled “Junk DNA codes for valuable miRNAs: non-coding DNA controls Diabetes” discusses how the ENCODE project is revealing new insights into the functions of the non-coding regions of the human genome previously labeled as “junk DNA”. MicroRNAs (miRNAs), as stated by Dr. Baker, “are among the non-gene encoding sequences in the genome and have been shown to play a major post-transcriptional role in expression of multiple genes.”

The post has touched upon several aspects of miRNA including origin, function, and mechanism of action. This commentary is an extension of Dr. Baker’s post, expanding upon the mechanism of action of miRNAs along with their role in potential disease therapy.

microRNA: Revisiting the past

MicroRNAs were discovered relatively recently; in fact, it was only in 1998 that non-coding RNAs were shown to be involved in switching certain genes ‘on’ and ‘off’. In 2006, the Nobel Prize in Physiology or Medicine was awarded to Andrew Fire and Craig Mello for their discovery of this new role of RNA molecules.

A breakthrough study, “Mammalian microRNAs predominantly act to decrease target mRNA levels”, was published in the September 2010 issue of the journal Nature. miRNAs were initially thought to repress protein output without changes in the corresponding mRNA levels. Guo et al. challenged this notion of ‘translational repression’ and concluded from their experimental results that ‘mRNA destabilization’ is, for the most part, responsible for the miRNA-mediated repression of protein expression. The authors used ‘ribosome profiling’ to measure the overall effects of miRNAs on protein production and compared these with simultaneously measured effects on mRNA levels. Ribosome profiling maps the exact positions of ribosomes on transcripts after nucleases digest the exposed parts of the transcripts that are not covered by ribosomes.

miR-1 and miR-155, neither of which is normally expressed in HeLa cells, were introduced into the HeLa cell line. Another miRNA used was miR-223, which is expressed in significant amounts in neutrophils. These miRNAs were chosen because proteomics research had already shown that they repress protein levels. The study found that miRNA-mediated repression was similar regardless of target expression level and further stated that “for both ectopic and endogenous miRNA regulatory interactions, lowered mRNA levels accounted for most (≥84%) of the decreased protein production.” These results show that changes in mRNA levels closely reflect the impact of miRNAs on gene expression and indicate that destabilization of target mRNAs is the predominant reason for reduced protein output.
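To illustrate the kind of comparison described above, the following is a minimal sketch of how one might estimate the share of reduced protein production that is accounted for by lower mRNA levels. It is not the authors' analysis pipeline: the function name, the use of mean log2 fold changes, and all numbers are assumptions chosen for illustration.

```python
from statistics import mean

def fraction_explained_by_mrna(mrna_log2fc, footprint_log2fc):
    """Share of the change in protein production attributable to mRNA loss.

    mrna_log2fc:      log2 fold changes in mRNA abundance for the target genes.
    footprint_log2fc: log2 fold changes in ribosome footprints for the same
                      genes, used here as a proxy for protein production.
    """
    mean_mrna = mean(mrna_log2fc)
    mean_footprint = mean(footprint_log2fc)
    if mean_footprint == 0:
        raise ValueError("no net change in protein production to explain")
    return mean_mrna / mean_footprint

# Made-up fold changes for four hypothetical miRNA targets.
mrna = [-0.40, -0.55, -0.30, -0.45]
footprints = [-0.50, -0.60, -0.35, -0.55]

fraction = fraction_explained_by_mrna(mrna, footprints)
print(f"{fraction:.0%} of the repression is accounted for by lower mRNA levels")
```

With these invented numbers the ratio comes out at about 0.85; whatever is left over would be attributed to reduced translation per mRNA molecule.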

Authors concluded that the discovery “will apply broadly to the vast majority of miRNA targeting interactions. If indeed general, this conclusion will be welcome news to biologists wanting to measure the ultimate impact of miRNAs on their direct regulatory targets.”

Since then and even before the paper was published, several other miRNAs and their roles have been discovered. Information on miRNAs has been consolidated in a database that can be accessed online at http://www.mirbase.org/

microRNA: From bench to bedside

The scientific community had speculated about the role of non-coding RNAs in disease treatment right after their discovery. One study demonstrating the use of microRNA for cancer treatment was published in the September 2010 issue of the journal Nature Medicine: “miR-380-5p represses p53 to control cellular survival and is associated with poor outcome in MYCN-amplified neuroblastoma”.

The p53 gene is known as a tumor suppressor gene, and its inactivation has been associated with some cancers such as neuroblastoma. The study reported that microRNA-380 (miR-380) was able to repress the expression of the p53 gene in cancer patients, causing uninhibited cell survival and proliferation. The research group was able to decrease tumor size in vivo in a mouse model of neuroblastoma by delivering a miR-380 antagonist. The researchers also observed that inhibition of endogenous miR-380 in embryonic stem cells or neuroblastoma cells resulted in induction of p53 and extensive apoptotic cell death.

Thus, the success of the miR antagonist in decreasing tumor size speaks to the potential of miRNAs as therapeutic targets for cancer treatment.

In conclusion, as stated by Dr. Baker in her post, “the miRNA data for tissues and specific cell types involved in disease pathology form a new approach to either detecting or possibly correcting gene (coding or non-coding) dysregulation. miRNA mimics and anti-miRNA agents are being developed as new therapeutic modalities.”

Reference:

Pharmaceutical Intelligence post, Author, Dr. Margaret Baker: Junk DNA codes for valuable miRNAs: non-coding DNA controls Diabetes

http://pharmaceuticalintelligence.com/2012/09/24/junk-dna-codes-for-valuable-mirnas/

 

Research articles: Mammalian microRNAs predominantly act to decrease target mRNA levels

miR-380-5p represses p53 to control cellular survival and is associated with poor outcome in MYCN-amplified neuroblastoma

Expert reviews- miRNA and Cancer treatment

 

News briefs: http://ygoy.com/2010/10/02/new-treatment-for-junk-dna-induced-cancers-discovered/

http://www.evolutionnews.org/2010/10/micrornas–once_dismissed_as_j038861.html

 

ENCODE data reveals important information from Genome Wide Association Studies relevant to understanding complex genetic diseases

Author: Ritu Saxena, Ph.D.

 

Introduction

“The depth, quality, and diversity of the ENCODE data are unprecedented,” stated John Stamatoyannopoulos, professor of genomic sciences at the University of Washington and one of the many principal investigators of the ENCODE project. ENCODE (Encyclopedia of DNA Elements) was indeed an ambitious project, launched as a pilot in 2003 and then expanded in 2007 for whole-genome analysis and identification of all the functional elements of the human genome. The findings were striking, as they challenged the definition of “gene” and the central dogma of genetics (gene-mRNA-protein). In fact, the non-coding part that constitutes about 80% of the genome, the so-called “junk DNA”, was found to contain elements crucial for gene regulation. These elements, in large part, include RNA transcripts that are not translated into proteins but might have a regulatory role. For detailed reading, refer to the findings published in Nature: The ENCODE Project Consortium, “An integrated encyclopedia of DNA elements in the human genome,” Nature 489, 57–74 (2012).

Key features of the data, as explained in the National Human Genome Research Institute website (National Human Genome Research Institute News feature), include comprehensive mapping of:

  • Protein-coding genes – Proteins are molecules made of amino acids linked together in a specific sequence; the amino acid sequence is encoded by the sequence of DNA subunits called nucleotides that make up genes.
  • Non-coding genes – Stretches of DNA that are read by the cell as if they were genes but do not encode proteins. These appear to help regulate the activity of the genome.
  • Chromatin structure features – Complex physical structures made from a combination of DNA and binding proteins that make up the contents of the nucleus and affect genome function.
  • Histone modifications – Histones are the proteins that make up the chromatin structures that help shape and control the genome. In addition, histone proteins can be physically modified by adding chemical groups, such as a methyl group, that further regulate genomic activity.
  • DNA methylation – Just like histones, DNA itself can have methyl groups added to it, in a process called DNA methylation. Chemically attaching methyl groups to DNA physically changes the ability of enzymes to reach the DNA and thus alters the gene expression pattern in cells. Methylation helps cells “remember what they are doing” or alter levels of gene expression, and it is a crucial part of normal development and cellular differentiation in higher organisms.
  • Transcription factor binding sites – Transcription factors are proteins that bind to specific DNA sequences, controlling the flow (or transcription) of genetic information from DNA to mRNA. Mapping the binding sites can help researchers understand how genomic activity is controlled.

How could ENCODE be helpful in the study of complex human diseases?

Complex diseases and Genome wide association studies (GWAS)

Coronary artery disease, type 2 diabetes and many forms of cancer are complex human diseases that have a significant genetic component. Unlike Mendelian disorders, which have defined loci, the genetic component of complex disorders takes the form of genetic variations spread across the genome that make an individual susceptible to these diseases.

Researchers have performed Genome-wide association studies (GWAS) of the human genome, leading to the identification of thousands of DNA variants that could be linked with complex traits and diseases. However, identifying the variants, referred to as SNPs (Single Nucleotide Polymorphisms), that actually contribute to the disease, and understanding how they exert influence on a disease has been more of a mystery.

How would ENCODE solve the puzzle?

The puzzle lies in interpreting how the SNPs found in the genome affect a person’s susceptibility to a particular trait or disease, and through what mechanism. As identified in the GWAS, most variants that are associated with the phenotype of the trait or disease lie in the non-coding region of the genome. In fact, in more than 400 studies compiled in the GWAS catalog, only a small minority of the trait/disease-associated SNPs occur in protein-coding regions; the large majority (89%) are in noncoding regions. These variants fall in the gene deserts that lie far from protein-coding regions, similar to those where cis-regulatory modules (CRMs) are found. CRMs such as promoters and enhancers are groups of binding sites for transcription factors, and the presence of transcription factors bound to these sites is a good indicator of potential regulatory regions.

The integrative analysis of ENCODE data has given important insights into the results of GWAS. Investigators have employed ENCODE data as an initial guide to discover regulatory regions in which genetic variation is affecting a complex trait. Additionally, when the ENCODE study examined the GWAS SNPs associated with the phenotype of a trait, it found that they are enriched in DNase-sensitive regions, i.e., they lie in function-associated regions of the genome that could be bound by transcription factors affecting the regulation of gene expression. Thus, the project demonstrates that non-coding regions must be considered when interpreting GWAS results, and it provides a strong motivation for reinterpreting previous GWAS findings.
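The overlap analysis described above can be sketched in a few lines. The SNP names, coordinates, and region list below are hypothetical placeholders rather than ENCODE or GWAS catalog data, and the helper functions are illustrative only; the sketch assumes half-open, non-overlapping regions on each chromosome.

```python
from bisect import bisect_right

def index_regions(regions):
    """Group (chrom, start, end) regions by chromosome, sorted by start."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    indexed = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        ends = [e for _, e in intervals]
        indexed[chrom] = (starts, ends)
    return indexed

def snps_in_regions(snps, regions):
    """Return the names of SNPs whose position falls inside any region."""
    indexed = index_regions(regions)
    hits = set()
    for name, (chrom, pos) in snps.items():
        starts, ends = indexed.get(chrom, ([], []))
        i = bisect_right(starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and pos < ends[i]:
            hits.add(name)
    return hits

# Hypothetical DNase-sensitive regions and trait-associated SNPs.
dnase_regions = [("chr8", 1000, 1500), ("chr8", 4000, 4200), ("chr2", 300, 900)]
gwas_snps = {
    "snp_a": ("chr8", 1200),
    "snp_b": ("chr8", 2000),
    "snp_c": ("chr2", 300),
    "snp_d": ("chr5", 50),
}

hits = snps_in_regions(gwas_snps, dnase_regions)
fraction = len(hits) / len(gwas_snps)
print(sorted(hits), fraction)
```

A real enrichment analysis would also compare this fraction against a background set of matched control SNPs, which the sketch leaves out.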

Using ENCODE Data to Interpret GWAS Results

ENCODE and predisposition to CANCER:

C-Myc, a proto-oncogene, codes for a transcription factor that, when expressed constitutively, leads to uninhibited cell proliferation resulting in cancer. Common variants within a ~1 Mb region upstream of the c-Myc gene have been associated with cancers of the colon, prostate, and breast. Several SNPs have been reported in this region that, although they affect the phenotype, lie in the distal cis-region of the MYC gene. Alignment of the ENCODE data in this region with the significant variants from the GWAS also reveals that key variants are found in the transcription factor-occupied DNA segments mapped by the consortium. One variant, rs698327, lies within a DNase hypersensitive site that is bound by several transcription factors and the enhancer-associated protein p300, and contains histone modifications characteristic of enhancers (high H3K4me1, low H3K4me3). ENCODE data indicate that non-coding regions in the human chromosome 8q24 locus are associated with cancer and, as observed in the case of the c-Myc gene, similar studies on other cancer-related genes could help explain predisposition to cancer.

ENCODE and fetal hemoglobin expression:

Another example of the use of ENCODE data is the regulation of fetal hemoglobin. Several regions involved in the regulation of fetal hemoglobin were predicted via ENCODE. These predicted regions were found to be close to the SNPs in the BCL11A gene that are associated with persistent expression of fetal hemoglobin.

Future perspective

As evident from the above examples, the ENCODE data show that genetic variants do affect the regulated expression of a target gene. Recently, several research groups in the UK performed a large-scale GWAS to determine the genetic predisposition to fracture risk. The collaborative effort, published in a recent issue of PLoS Genetics, set out to identify genetic variants associated with cortical bone thickness (CBT) and bone mineral density (BMD) using data from more than 10,000 subjects. http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002745 The study generated a wealth of data, including the identification of SNPs in WNT16 and its adjacent gene, FAM3C, that were found to be relevant to CBT and BMD. ENCODE data, in this case, could help in extracting more detailed information, including additional SNPs and the regulatory information of the genes involved. Thus, ENCODE data could be immensely useful in interpreting associations between disease and DNA sequences that vary from person to person.

Sources:

Research articles

An integrated encyclopedia of DNA elements in the human genome

A User’s Guide to the Encyclopedia of DNA Elements (ENCODE)

What does our genome encode?

Genome-wide Epigenetic Data Facilitate Understanding of Disease Susceptibility Association Studies

Genomics: ENCODE explained

ENCODE Project Writes Eulogy For Junk DNA

WNT16 Influences Bone Mineral Density, Cortical Bone Thickness, Bone Strength, and Osteoporotic Fracture Risk

 News articles

ENCODE project: In massive genome analysis new data suggests ‘gene’ redefinition

National Human Genome Research Institute News feature

Related posts

Expanding the Genetic Alphabet and linking the genome to the metabolome

Junk DNA codes for valuable miRNAs: non-coding DNA controls Diabetes

ENCODE Findings as Consortium

Author: Margaret Baker, PhD, Registered Patent Agent

The Encyclopedia of DNA Elements (ENCODE) Project was launched in September of 2003. In 2007 the ENCODE project was expanded to study the entire human genome and its links to genome-wide association studies (GWAS). This month the consortium published a Nature paper entitled “An integrated encyclopedia of DNA elements in the human genome,” and all data are available at http://genome.ucsc.edu/ENCODE/.  Novel functional roles have been discovered for both transcribed and non-transcribed portions of DNA.  See several articles and commentary in Science 7 September 2012: Vol. 337 no. 6099, including Maurano et al. pp. 1190-1195  DOI: 10.1126/science.1222794b

For the first time, the 3-dimensional connections that cross the genome have been mapped as long-range looping interactions between functional elements and the genes controlled. These regions of the genome, formerly referred to as “junk DNA”, have the potential to be involved in disease initiation, pathophysiology, and complications. Further, epigenetic factors may be seen to play a more direct role in the expression or silencing of protein coding genes as DNase I hot spots, nucleosomal anchor points, and DNA methylation sites are added to the map.

Non-coding transcribed DNA includes a large percentage of sequences coding for RNA. In fact, the number of RNA-encoding genes is nearly equal to that of protein-encoding genes (18,400 vs. 20,687), and previously unknown non-coding RNAs (ncRNAs) have also been characterized.

Some of the known elements that were cataloged include:

  • cis elements – promoters, transcription factor binding sites;
  • gene contiguous non-coding stretches such as introns, polyA, and UTR, splice variants;
  • pseudogenes (11,224);
  • long range gene associated elements – enhancers, insulators, suppressors, and predicted promoter flanking regions;
  • ribosomal RNA genes; and
  • sequences for 7,052 small RNAs of which 85% are small nuclear(sn)RNA, small nucleolar(sno)RNA, transfer(t)RNA, and micro(mi)RNA.

What has been found is that distinct non-coding regions, including ncRNA, can be associated with distinct disease traits. miRNAs are among the non-gene encoding sequences in the genome which have already been shown to play a major post-transcriptional role in the expression of multiple genes.

Most miRNA genes are intergenic or oriented antisense to neighboring genes and are therefore assumed to be controlled by independent promoter units. However, in some cases a microRNA gene is transcribed together with its target gene, implying coupled regulation of the miRNA and the protein-coding gene. About one third of miRNA genes reside in polycistronic clusters. miRNA genes can occupy the introns of protein-coding genes, of non-protein-coding genes, or of non-protein-coding transcripts. The promoters have been shown to have some similarities in their motifs to promoters of other genes transcribed by RNA polymerase II, such as protein-coding genes. The ENCODE project also noted that miRNA promoters were in chromatin regions of high promiscuity. There may be up to 1000 miRNA genes in the human genome. In addition, human miRNAs show RNA editing of sequences to yield products different from those encoded by their DNA.  miRNAs are implicated in cellular roles as diverse as developmental timing in worms, cell death and fat metabolism in flies, haematopoiesis in mammals, and leaf development and floral patterning in plants.

The final miRNA gene product is a ∼22 nt functional RNA molecule. The mature miRNA (designated miR-#) is processed from a characteristic stem–loop sequence (called a pre-mir), which in turn may be excised from a longer primary transcript (or pri-mir). It is processed by the same enzyme (DICER) that processes short hairpin RNA, forming interfering RNA, which provides an additional level of control.

miRNA controls gene expression by binding to complementary regions in the 3’ untranslated region of messenger transcripts to repress their translation or regulate their degradation. What makes the mechanism more powerful (or complicated) is that the imperfect but specific binding motif associates with a large number of mRNAs having the complementary motif in their 3’ untranslated region.  Conversely, each mRNA can potentially associate with a number of miRNAs. Mature processed cytosolic miRNA can act in a manner akin to small interfering(si)RNA and form the RNA-induced silencing complex (RISC) to block translation. Computational methods have been used to identify potential gene targets based on complementarity between the miRNA and mRNA sequences.
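As a rough illustration of complementarity-based target prediction, the sketch below scans a 3’ untranslated region for perfect matches to the reverse complement of a miRNA “seed”. The choice of nucleotides 2-8 as the seed is a common convention that is assumed here rather than taken from the post, and the sequences are invented; real prediction tools add many further criteria.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A-U, G-C pairing)."""
    return rna.translate(COMPLEMENT)[::-1]

def seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Find 0-based positions in the UTR that pair perfectly with the seed.

    The seed is taken as mirna[seed_start:seed_end], i.e. nucleotides 2-8
    with the defaults (an assumed convention, not stated in the text).
    """
    target = reverse_complement(mirna[seed_start:seed_end])
    sites = []
    pos = utr.find(target)
    while pos != -1:
        sites.append(pos)
        pos = utr.find(target, pos + 1)
    return sites

# Invented 22-nt miRNA and a toy 3' UTR containing two seed matches.
mirna = "UACGGAUCCAUGGCAUUCGAAC"
utr = "AAA" + "GAUCCGU" + "AAACCC" + "GAUCCGU" + "UU"

print(seed_sites(mirna, utr))
```

Because only a short stretch has to match, a single miRNA seed can occur in many different transcripts, which is the one-to-many relationship the paragraph above describes.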

Gerstein et al. explored the “Architecture of the human regulatory network derived from ENCODE data” Nature 489:91-100 (06 Sep 2012) focusing on the regulation of transcription factors (TF) and association between TF and miRNAs, miRNA and miRNA, protein-protein interactions, and protein phosphorylation. Not surprisingly, not all TF are the upstream factor in each network.

These new and remarkably detailed examinations of the different elements within and transcribed from the human genome perhaps do more to aid our knowledge of why we have stumbled in attempts to eradicate diseases, initially by focusing on a single gene or constellation of coding regions. The miRNA wikipedia is also being re-written on a daily basis and new disease associations made*. As an example of a pathological state that may be linked to miRNA-controlled elements, in vitro studies as well as small population studies have examined miRNA species in diabetogenic conditions and in patients with diabetes (Type I and Type II).

Diabetes and miRNA

In adult β-cell islets, miR-375 is low when glucose is freely available and low miR-375 induces insulin secretion. Interestingly, miR-375 is found only in brain and β-cells which share a secretion pathway.

Diabetic Complications

Organ specific miRNA have been identified in liver, skeletal muscle, kidney, vascular, and adipose tissue which are responsive to transient or sustained hyperglycemia.

miR-17-5p and miR-132 were reported to show significant differences between obese and non-obese omental fat and were also abnormal in the blood of obese subjects. Altered expression of miR-17-5p and miR-132 was found to correlate significantly with BMI, fasting blood glucose and glycosylated hemoglobin (Kloting et al. PLoS ONE 4(3), e4699 (2009)).

Clinical practice related to miRNA in diabetes may be possible as one group has identified eight miRNAs (miR-144, miR-146a, miR-150, miR-182, miR-192, miR-29a, miR-30d and miR-320) as potential ‘signature miRNAs’ that could distinguish prediabetic patients from those with overt T2D (Karolina DS, Armugam A, Tavintharan S et al. MicroRNA 144 impairs insulin signaling by inhibiting the expression of insulin receptor substrate 1 in Type 2 diabetes mellitus. PLoS ONE 6(8), e22839 (2011)).

Due to the autoimmune component of T1D, the constellation of miRNA would be expected to be different: upregulation of miR-510 and underexpression of miR-191 and miR-342 were observed in the Tregs (regulatory T-cells) of T1D patients (Hezova R, Slaby O, Faltejskova P et al. microRNA-342, microRNA-191 and microRNA-510 are differentially expressed in T regulatory cells of Type 1 diabetic patients. Cell. Immunol. 260(2), 70–74 (2010)).

Taken together with the “physical” mapping of miRNA genes in the context of the 3-dimensional genome provided by the ENCODE studies and new understanding of potential concerted regulatory mechanisms, the miRNA data for tissues and specific cell types involved in disease pathology form a new approach to either detecting or possibly correcting gene (coding or non-coding) dysregulation.  miRNA mimics and anti-miRNA agents are being developed as new therapeutic modalities.

References

Bartel, DP et al. “MicroRNAs: Genomics, Biogenesis, Mechanism, and Function” Cell 2004, 116:281-297.

Fernandez-Valverde, SL et al. MicroRNAs in beta-cell Biology, insulin resistance, diabetes and its complications. Diabetes July 2011 60 (7):1825-31.

Kantharidis, et al.  Diabetes Complications: The MicroRNA Perspective http://diabetes.diabetesjournals.org/content/60/7/1832.short

MEDSCAPE Review article: “miRNAs and Diabetes Mellitus: miRNAs in Diabetic Complications” http://www.medscape.org/viewarticle/763729_6

*Based on initial studies in the worm C. elegans showing the temporal appearance of 21- and 22-nt RNAs during development, a family of highly conserved micro RNA sequences (miRNA) existing in invertebrates and vertebrates was cataloged by Tuschl et al. at the Max-Planck-Institute and others (see Eddy, SR. Non-coding RNA genes and the modern RNA world. Nature Reviews Genetics, 2:920-929, 2001). The sequence-specific post-transcriptional regulatory mechanisms mediated by these miRNAs have been associated with certain disease states such as cancer (miR-21) and, more specifically, lung cancer (miR-124) or breast cancer (miR-7, miR-21), and new species and functions continue to be found (see http://www.mirbase.org/ ).


Curator: Aviva Lev-Ari, PhD, RN

Population Genetics

HAPAA: a tool for ancestral haploblock reconstruction. Specifically, given the genotype  (for instance, as derived by an Illumina genotyping array) of an individual of admixed ancestry, find the source population for each segment of the individual’s genome.

Protein Interaction Networks

Graemlin: a tool for aligning multiple global protein interaction networks; Graemlin also supports search for homology between a query module of proteins and a database of interaction networks.

Machine Learning

CONTRA: Conditionally trained models for sequence analysis. See CONTRAlign, a protein sequence aligner with very high accuracy, especially in twilight alignments. See CONTRAfold, an RNA secondary structure prediction tool. Stay tuned for more…

RNA Structure Prediction

CONTRAfold: Prediction of RNA secondary structure with a Conditional Log-Linear model that relies on automatically trained parameters, rather than on a physics-based energy model of RNA folding.

Protein Alignment

CONTRAlign: A protein sequence aligner that users can optionally train on feature sets such as secondary structure and solvent accessibility; see the CONTRA project above.
A protein multiple sequence aligner that exhibits high accuracy on popular benchmarks.
A protein multiple aligner that automatically finds domain structures of sequences with shuffled and repeated domain architectures.

Motif Finding

MotifCut: a non-parametric graph-based motif finding algorithm.
MotifScan: a non-parametric method for representing motifs and scanning DNA sequences for known motifs.
CompareProspector: motif finding with Gibbs sampling & alignment.

Genomic Alignment

Stanford ENCODE: Multiple Alignments of 1% of the Human genome.
Typhon: BLAST-like sequence search to a multiple alignments database.
LAGAN: tools for genomic alignment. These include the MLAGAN multiple alignment tool, and Shuffle-LAGAN for alignment with rearrangements.

Microarray Analysis

Application of Independent Component Analysis (ICA) to microarrays.

Researchers Hope New Database Becomes Universal Cancer Genomics Tool

Swiss scientists hope that a new online database called “arrayMap” will bring cancer genomics to the desktop, laptop, and tablet computers of pathologists and researchers everywhere.

The database combines genomic information from three sources: large repositories such as the NCBI Gene Expression Omnibus (GEO) and Cancer Genome Atlas (CGA); journal literature; and submissions from individual investigators. It incorporates more than 42,000 genomic copy number arrays—normal and abnormal DNA comparisons—from 195 cancer types.

“arrayMap includes a wider range of human cancer copy number samples than any single repository,” said principal investigator Michael Baudis, M.D. Ease of access, visualization, and data manipulation, he added, are top priorities in its ongoing development.

A product of the University of Zurich Institute for Molecular Life Sciences, where Baudis researches bioinformatics and oncogenomics, arrayMap illustrates the importance of copy number abnormalities (CNA)—dysfunctional DNA gains or losses that visibly lengthen or shorten certain chromosomes—in the diagnosis, staging, and treatment of various malignancies.

“I have this particular tumor type—are there any CNAs in it that can tell me anything about prognosis or treatment?” said Michael Rossi, Ph.D., director of the Winship Cancer Institute cancer genomics program at the Emory University School of Medicine in Atlanta. “Data mining tools like arrayMap are incredibly useful to help answer such questions.”

arrayMap – genomic arrays for copy number profiling in human cancer

arrayMap is a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The arrayMap database provides an entry point for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data. The current data reflects:

  • 42875 genomic copy number arrays
  • 634 experimental series
  • 256 array platforms
  • 197 ICD-O cancer entities
  • 480 publications (Pubmed entries)

For the majority of the samples, probe level visualization as well as customized data representation facilitate gene level and genome wide data review. Results from multi-case selections can be connected to downstream data analysis and visualization tools, as we provide through our Progenetix project.

arrayMap is developed by the group “Theoretical Cytogenetics and Oncogenomics” at the Institute of Molecular Life Sciences of the University of Zurich.

These tools were developed for our research projects. You are welcome to try them out, but there is only sparse documentation. If more support and/or custom analysis is needed, please contact Michael Baudis regarding a collaborative project.

MIT: A New Approach Uses Compression to Speed Up Genome Analysis

Public-Domain Computing Resources

Structural Bioinformatics

The BetaWrap program detects the right-handed parallel beta-helix super-secondary structural motif in primary amino acid sequences by using beta-strand interactions learned from non-beta-helix structures.
Wrap-and-pack detects beta-trefoils in protein sequences by using both pairwise beta-strand interactions and 3-D energetic packing information.
The BetaWrapPro program predicts right-handed beta-helices and beta-trefoils by using both sequence profiles and pairwise beta-strand interactions, and returns coordinates for the structure.
The MSARi program identifies conserved RNA secondary structure in non-coding RNA genes and mRNAs by searching multiple sequence alignments of a large set of candidate catalogs for correlated arrangements of reverse-complementary regions.
The Paircoil2 program predicts coiled-coil domains in protein sequences by using pairwise residue correlations obtained from a coiled-coil database. The original Paircoil program is still available for use.
The MultiCoil program predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric. An updated version, Multicoil2, will soon be available.
The LearnCoil Histidine Kinase program uses an iterative learning algorithm to detect possible coiled-coil domains in histidine kinase receptors.
The LearnCoil-VMF program uses an iterative learning algorithm to detect coiled-coil-like regions in viral membrane-fusion proteins.
The Trilogy program discovers novel sequence-structure patterns in proteins by exhaustively searching through three-residue motifs using both sequence and structure information.
The ChainTweak program efficiently samples from the neighborhood of a given base configuration by iteratively modifying a conformation using a dihedral angle representation.
The TreePack program uses a tree-decomposition based algorithm to solve the side-chain packing problem more efficiently. This algorithm is more efficient than SCWRL 3.0 while maintaining the same level of accuracy.
PartiFold: Ensemble prediction of transmembrane protein structures. Using statistical mechanics principles, partiFold computes residue contact probabilities and samples super-secondary structures from sequence only.
tFolder: Prediction of beta sheet folding pathways. Predict a coarse grained representation of the folding pathway of beta sheet proteins in a couple of minutes.
RNAmutants: Algorithms for exploring the RNA mutational landscape. Predict the effect of mutations on structures and reciprocally the influence of structures on mutations. A tool for molecular evolution studies and RNA design.
AmyloidMutants is a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability.

Genomics

GLASS aligns large orthologous genomic regions using an iterative global alignment system. Rosetta identifies genes based on conservation of exonic features in sequences aligned by GLASS.
RNAiCut – Automated Detection of Significant Genes from Functional Genomic Screens.
MinoTar – Predict microRNA Targets in Coding Sequence.

Systems Biology

The Struct2Net program predicts protein-protein interactions (PPI) by integrating structure-based information with other functional annotations, e.g. GO, co-expression and co-localization etc. The structure-based protein interaction prediction is conducted using a protein threading server RAPTOR plus logistic regression.
IsoRank is an algorithm for global alignment of multiple protein-protein interaction (PPI) networks. The intuition is that a protein in one PPI network is a good match for a protein in another network if the former’s neighbors are good matches for the latter’s neighbors.
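The neighbor-matching intuition behind IsoRank can be sketched as a small fixed-point iteration. The code below is a simplified illustration and not the IsoRank software itself: the pair score of two proteins is the sum of their neighbors' pair scores, each shared out over the neighbors' degrees, blended with a uniform prior. The uniform prior, the weight `alpha` and the function name `neighbor_match_scores` are assumptions made here; they stand in for the sequence-similarity information a real network aligner would use.

```python
def neighbor_match_scores(net_a, net_b, alpha=0.8, iterations=50):
    """Score every cross-network protein pair by how well their neighbors match.

    net_a, net_b: dict mapping each protein to the set of its interaction partners
    (each interaction must be listed in both directions).
    """
    pairs = [(i, j) for i in net_a for j in net_b]
    prior = 1.0 / len(pairs)                 # uniform stand-in for sequence similarity
    score = {p: prior for p in pairs}
    for _ in range(iterations):
        updated = {}
        for i, j in pairs:
            # each neighboring pair (u, v) passes on an equal share of its own score
            support = sum(
                score[(u, v)] / (len(net_a[u]) * len(net_b[v]))
                for u in net_a[i]
                for v in net_b[j]
            )
            updated[(i, j)] = alpha * support + (1 - alpha) * prior
        score = updated
    return score

# Two toy networks, each a hub with two partners (hypothetical protein names)
net_a = {"a_hub": {"a1", "a2"}, "a1": {"a_hub"}, "a2": {"a_hub"}}
net_b = {"b_hub": {"b1", "b2"}, "b1": {"b_hub"}, "b2": {"b_hub"}}
scores = neighbor_match_scores(net_a, net_b)
print(max(scores, key=scores.get))
```

In this toy case the two hubs end up as the best-scoring pair, because all of their neighbors can be matched to each other. Turning the scores into an actual alignment (for example, by greedy or bipartite matching) is a separate step that is not shown.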

Other

t-sample is an online algorithm for time-series experiments that allows an experimenter to determine which biological samples should be hybridized to arrays to recover expression profiles within a given error bound.

http://people.csail.mit.edu/bab/computing_new.html#systems

Compressive genomics

http://www.nature.com/nbt/journal/v30/n7/abs/nbt.2241.html

Nature Biotechnology 30, 627–630 (2012) doi:10.1038/nbt.2241

Published online 10 July 2012

STANFORD UNIVERSITY: Resources

BMIR is committed to the development of research tools as part of its goal to provide reusable, computational building blocks to facilitate the development of a vast array of systems. Some of these resources are described below.

Resources

The National Center for Biomedical Ontology (NCBO)

The National Center for Biomedical Ontology is a consortium of leading biologists, clinicians, informaticians, and ontologists who develop innovative technology and methods that allow scientists to create, disseminate, and manage biomedical information and knowledge in machine-processable form.


Protégé

Protégé is a free, open-source platform that provides its community of more than 80,000 users with a suite of tools to construct domain models and knowledge-based applications with ontologies.


PharmGKB

PharmGKB curates information that establishes knowledge about the relationships among drugs, diseases and genes, including their variations and gene products. Our mission is to catalyze pharmacogenomics research.


Simbios

About Simbios

Simbios, the National NIH Center for Physics-based Simulation of Biological Structures is devoted to helping biomedical researchers understand biological form and function. It provides infrastructure, software, and training to assist users as they create novel drugs, synthetic tissues, medical devices, and surgical interventions.

Simbios scientists investigate structure-function studies on a wide scale of biology – from molecules to organisms, and are currently focusing on challenging biological problems in RNA folding, myosin dynamics, neuromuscular biomechanics and cardiovascular dynamics.


Stanford BioMedical Informatics Research (BMIR) – Publications by Project

There are 8 publications for the project “Genomic Nosology for Medicine (GNOMED)”.

BMIR-2009-1362
Identifying compartment-specific non-HLA targets after renal transplantation by integrating transcriptome and ‘‘antibodyome’’ measures
L. Li, P. Wadia, M. Sarwal, N. Kambham, T. Sigdel, D. B. Miklos, R. Chen, M. Naesens, A. J. Butte
PNAS, 106, 11, 4148-4153. Published in 2009
BMIR-2008-1338
Using SNOMED-CT For Translational Genomics Data Integration
J. Dudley, D. P. Chen, A. J. Butte
Ronald Cornet, Kent Spackman (eds.): Representing and sharing knowledge using SNOMED. Proceedings of the 3rd International Conference on Knowledge Rep, Phoenix (AZ), USA, CEUR Workshop Proceedings, ISSN 1613-0073, online CEUR-WS.org/Vol-410/, 91-96. Published in 2008
BMIR-2008-1303
The Ultimate Model Organism
A. J. Butte
Science, 320, 5874, 325-327. Published in 2008
BMIR-2008-1293
Novel Integration of Hospital Electronic Medical Records and Gene Expression Measurements to Identify Genetic Markers of Maturation
D. P. Chen, S. C. Weber, P. S. Constantinou, T. A. Ferris, H. J. Lowe, A. J. Butte
Pacific Symposium on Biocomputing, Big Island, Hawaii, 13, 243-254. Published in 2008
BMIR-2008-1292
Enabling Integrative Genomic Analysis of High-Impact Human Diseases through Text Mining
J. Dudley, A. J. Butte
Pacific Symposium on Biocomputing, Big Island, Hawaii, 13, 580-591. Published in 2008
BMIR-2007-1297
Methodologies for Extracting Functional Pharmacogenomic Experiments from International Repository
Y. Lin, A. P. Chiang, P. Yao, R. Chen, A. J. Butte, R. S. Lin
AMIA Annual Symposium, Chicago, IL, 463-467. Published in 2007
BMIR-2007-1296
Clinical Arrays of Laboratory Measures, or “Clinarrays”, Built from an Electronic Health Record Enable Disease Subtyping by Severity
D. P. Chen, S. C. Weber, P. S. Constantinou, T. A. Ferris, H. J. Lowe, A. J. Butte
AMIA Annual Symposium, Chicago, IL, 115-119. Published in 2007
BMIR-2006-1232
Finding Disease-Related Genomic Experiments Within an International Repository: First Steps in Translational Bioinformatics
A. J. Butte, R. Chen
Annual Symposium of the American Medical Informatics Association, Washington, D.C., 106-10. Published in 2006
http://bmir.stanford.edu/publications/project.php/genomic_nosology_for_medicine_gnomed

Featured Publications

BMIR-2011-1468
The National Center for Biomedical Ontology
M. A. Musen, N. F. Noy, C. G. Chute, M. A. Storey, B. Smith, N. H. Shah
Published in 2011
BMIR-2009-1378
Prototyping a Biomedical Ontology Recommender Service
C. Jonquet, N. H. Shah, M. A. Musen
Bio-Ontologies: Knowledge in Biology, SIG, ISMB ECCB 2009, Stockholm, Sweden. Published in 2009
BMIR-2009-1376
Translational bioinformatics applications in genome medicine
A. J. Butte
Genome Medicine, 1, 6, 64. Published in 2009
BMIR-2009-1362
Identifying compartment-specific non-HLA targets after renal transplantation by integrating transcriptome and ‘‘antibodyome’’ measures
L. Li, P. Wadia, M. Sarwal, N. Kambham, T. Sigdel, D. B. Miklos, R. Chen, M. Naesens, A. J. Butte
PNAS, 106, 11, 4148-4153. Published in 2009
BMIR-2009-1361
Technology for Building Intelligent Systems: From Psychology to Engineering
M. A. Musen
Modeling Complex Systems, Bill Shuart, Will Spaulding and Jeffrey Poland, U Nebraska P, Lincoln, Nebraska, Vol 52 of the Nebraska Symposium on Motivation, 145-184. Published in 2009
BMIR-2009-1358
Software-Engineering Challenges of Building and Deploying Reusable Problem Solvers
M. J. O’Connor, C. I. Nyulas, A. Okhmatovskaia, D. Buckeridge, S. W. Tu, M. A. Musen
Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 24, 3. Published in 2009
BMIR-2009-1355
Data-Driven Methods to Discover Molecular Determinants of Serious Adverse Drug Events
A. P. Chiang, A. J. Butte
Clinical Pharmacology and Therapeutics, 28 January 2009, Advance online publication, doi:10.1038/clpt.2008.274. Published in 2009
BMIR-2009-1318
Knowledge-Data Integration for Temporal Reasoning in a Clinical Trial System
M. J. O’Connor, R. D. Shankar, D. B. Parrish, A. K. Das
International Journal of Medical Informatics, 78, Suppl. 1, S77-S85. Published in 2009
BMIR-2008-1353
GeneChaser: Identifying all biological and clinical conditions in which genes of interest are differentially expressed
R. Chen, R. Mallelwar, A. Thosar, S. Venkatasubrahmanyam, A. J. Butte
BMC Bioinformatics, 9, 1, 548. (doi:10.1186/1471-2105-9-548). Published in 2008
BMIR-2008-1346
FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease
R. Chen, A. A. Morgan, J. Dudley, A. M. Deshpande, L. Li, K. Kodama, A. P. Chiang, A. J. Butte
Genome Biology, 9, 12, R170 (doi:10.1186/gb-2008-9-12-r170). Published in 2008
BMIR-2008-1341
Translational Bioinformatics: Coming of Age
A. J. Butte
Journal of the American Medical Informatics Association, JAMIA, 15, 6, 709-14. Published in 2008
BMIR-2008-1329
An Ontology-Driven Framework for Deploying JADE Agent Systems
C. I. Nyulas, M. J. O’Connor, S. W. Tu, A. Okhmatovskaia, D. Buckeridge, M. A. Musen
IEEE/WIC/ACM International Conference on Intelligent Agent Technology, Sydney, Australia, 2, 573-577. Published in 2008
BMIR-2008-1322
Understanding Detection Performance in Public Health Surveillance: Modeling Aberrancy-Detection Algorithms
D. Buckeridge, A. Okhmatovskaia, S. W. Tu, C. I. Nyulas, M. J. O’Connor, M. A. Musen
Journal of the American Medical Informatics Association, 15, 6, 760-769. Published in 2008
BMIR-2008-1319
Network Analysis of Intrinsic Functional Brain Connectivity in Alzheimer’s Disease
K. S. Supekar, V. Menon, M. A. Musen, D. L. Rubin, M. Greicius
Public Library of Science-Computational Biology., PLoS Computational Biology, June 2008. Published in 2008
BMIR-2008-1315
Medical Imaging on the Semantic Web: Annotation and Image Markup
D. L. Rubin, P. Mongkolwat, V. Kleper, K. S. Supekar, D. S. Channin
AAAI Spring Symposium Series, Semantic Scientific Knowledge Integration, Stanford. Published in 2008
BMIR-2008-1303
The Ultimate Model Organism
A. J. Butte
Science, 320, 5874, 325-327. Published in 2008
BMIR-2008-1298
BioPortal: A Web Portal to Biomedical Ontologies
D. L. Rubin, D. de Abreu Moreira, P. P. Kanjamala, M. A. Musen
AAAI Spring Symposium Series, Symbiotic Relationships between Semantic Web and Knowledge Engineering, Stanford University, (in press). Published in 2008
BMIR-2007-1295
AILUN: reannotating gene expression data automatically
R. Chen, L. Li, A. J. Butte
Nature Methods, 4, 11, 879. Published in 2007
BMIR-2007-1281
Evaluation and Integration of 49 Genome-wide Experiments and the Prediction of Previously Unknown Obesity-related Genes
S. B. English, A. J. Butte
Bioinformatics, Epub. Published in 2007
BMIR-2007-1261
Protege: A Tool for Managing and Using Terminology in Radiology Applications
D. L. Rubin, N. F. Noy, M. A. Musen
Journal of Digital Imaging, J Digit Imaging. Published in 2007
BMIR-2007-1244
Efficiently Querying Relational Databases using OWL and SWRL
M. J. O’Connor, R. D. Shankar, S. W. Tu, C. I. Nyulas, A. K. Das, M. A. Musen
The First International Conference on Web Reasoning and Rule Systems, Innsbruck, Austria, Springer, LNCS 4524, 361-363. Published in 2007
BMIR-2006-1090
Creation and implications of a phenome-genome network
A. J. Butte, I. S. Kohane
Nature Biotechnology, 24, 1, 55 – 62. Published in 2006
http://bmir.stanford.edu/publications/

NATIONAL CENTERS FOR BIOMEDICAL COMPUTING

SimBioS
National Center for Simulation of Biological Structures (SimBioS) at Stanford University

MAGNet
National Center for the Multiscale Analysis of Genomic and Cellular Networks (MAGNet) at Columbia University

NA-MIC
National Alliance for Medical Image Computing (NA-MIC) at Brigham and Women’s Hospital, Boston, MA

I2B2
Integrating Biology and the Bedside (I2B2) at Brigham and Women’s Hospital, Boston, MA

NCBO
National Center for Biomedical Ontology (NCBO) at Stanford University

IDASH
Integrate Data for Analysis, Anonymization, and Sharing (IDASH) at the University of California, San Diego

http://www.ncbcs.org/

 


Reporter: Aviva Lev-Ari, PhD, RN

A New Approach Uses Compression to Speed Up Genome Analysis

Compressive genomics

http://www.nature.com/nbt/journal/v30/n7/abs/nbt.2241.html

Nature Biotechnology 30, 627–630 (2012) doi:10.1038/nbt.2241

Published online 10 July 2012
Algorithms that compute directly on compressed genomic data allow analyses to keep pace with data generation.

Introduction

In the past two decades, genomic sequencing capabilities have increased exponentially [1–3], outstripping advances in computing power [4–8]. Extracting new insights from the data sets currently being generated will require not only faster computers, but also smarter algorithms. However, most genomes currently sequenced are highly similar to ones already collected [9]; thus, the amount of new sequence information is growing much more slowly.
Here we show that this redundancy can be exploited by compressing sequence data in such a way as to allow direct computation on the compressed data using methods we term ‘compressive’ algorithms. This approach reduces the task of computing on many similar genomes to only slightly more than that of operating on just one. Moreover, its relative advantage over existing algorithms will grow with the accumulation of genomic data. We demonstrate this approach by implementing compressive versions of both the Basic Local Alignment Search Tool (BLAST) [10] and the BLAST-Like Alignment Tool (BLAT) [11], and we emphasize how compressive genomics will enable biologists to keep pace with current data.
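The principle of computing directly on compressed data can be illustrated with a toy Python sketch. This is not the algorithm used in the compressive BLAST and BLAT tools; it is a deliberately simple block-deduplication scheme assumed here for illustration. Each genome is cut into fixed-size blocks, every distinct window (a block plus its right-hand neighbor) is stored once with links to all the places it occurs, and an exact-match search scans each unique window a single time before expanding the hits through the links. The function names `compress` and `search` and the block size are hypothetical, and because blocks are cut at fixed offsets the scheme only finds redundancy between genomes that differ by substitutions, not insertions or deletions.

```python
from collections import defaultdict

def compress(genomes, block=8):
    """Store each distinct window (a block plus its right neighbor) once,
    with links to every (genome index, start position) where it occurs."""
    links = defaultdict(list)
    for g, seq in enumerate(genomes):
        for start in range(0, len(seq), block):
            window = seq[start:start + 2 * block]
            links[window].append((g, start))
    return links

def search(links, query, block=8):
    """Exact search for `query` (at most block + 1 characters long).
    Each unique window is scanned once; hits are expanded via the links."""
    if len(query) > block + 1:
        raise ValueError("toy index only supports queries up to block + 1 characters")
    hits = set()
    for window, places in links.items():
        pos = window.find(query)
        while pos != -1 and pos < block:      # count matches starting in the first block only
            hits.update((g, start + pos) for g, start in places)
            pos = window.find(query, pos + 1)
    return sorted(hits)

# Three toy "genomes": two identical, one with a single substitution
genomes = [
    "ACGTTGCAAGGCTTACGGATCCAT",
    "ACGTTGCAAGGCTTACGGATCCAT",
    "ACGTTGCAAGGCTAACGGATCCAT",
]
links = compress(genomes)
print(len(links), "unique windows instead of", 3 * len(genomes))
print(search(links, "GCAAGG"))
```

Adding another near-identical genome adds only the windows that actually differ, so the search work grows with the amount of non-redundant sequence rather than with the total number of genomes.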

Conclusions

Compressive algorithms for genomics have the great advantage of becoming proportionately faster with the size of the available data. Although the compression schemes for BLAST and BLAT that we presented yield an increase in computational speed and, more importantly, in scaling, they are only a first step. Many enhancements of our proof-of-concept implementations are possible; for example, hierarchical compression structures, which respect the phylogeny underlying a set of sequences, may yield additional long-term performance gains. Moreover, analyses of such compressive structures will lead to insights as well. As sequencing technologies continue to improve, the compressive genomic paradigm will become critical to fully realizing the potential of large-scale genomics. Software is available at http://cast.csail.mit.edu/.
References
  1. Lander, E.S. et al. Nature 409, 860–921 (2001).
  2. Venter, J.C. et al. Science 291, 1304–1351 (2001).
  3. Kircher, M. & Kelso, J. Bioessays 32, 524–536 (2010).
  4. Kahn, S.D. Science 331, 728–729 (2011).
  5. Gross, M. Curr. Biol. 21, R204–R206 (2011).
  6. Huttenhower, C. & Hofmann, O. PLoS Comput. Biol. 6, e1000779 (2010).
  7. Schatz, M., Langmead, B. & Salzberg, S. Nat. Biotechnol. 28, 691–693 (2010).
  8. 1000 Genomes Project data available on Amazon Cloud. NIH press release, 29 March 2012.
  9. Stratton, M. Nat. Biotechnol. 26, 65–66 (2008).
  10. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. J. Mol. Biol. 215, 403–410 (1990).
  11. Kent, W.J. Genome Res. 12, 656–664 (2002).
  12. Grumbach, S. & Tahi, F. J. Inf. Process. Manag. 30, 875–886 (1994).
  13. Chen, X., Li, M., Ma, B. & Tromp, J. Bioinformatics 18, 1696–1698 (2002).
  14. Christley, S., Lu, Y., Li, C. & Xie, X. Bioinformatics 25, 274–275 (2009).
  15. Brandon, M.C., Wallace, D.C. & Baldi, P. Bioinformatics 25, 1731–1738 (2009).
  16. Mäkinen, V., Navarro, G., Sirén, J. & Välimäki, N. in Research in Computational Molecular Biology, vol. 5541 of Lecture Notes in Computer Science (Batzoglou, S., ed.) 121–137 (Springer Berlin/Heidelberg, 2009).
  17. Kozanitis, C., Saunders, C., Kruglyak, S., Bafna, V. & Varghese, G. in Research in Computational Molecular Biology, vol. 6044 of Lecture Notes in Computer Science (Berger, B., ed.) 310–324 (Springer Berlin/Heidelberg, 2010).
  18. Hsi-Yang Fritz, M., Leinonen, R., Cochrane, G. & Birney, E. Genome Res. 21, 734–740 (2011).
  19. Mäkinen, V., Navarro, G., Sirén, J. & Välimäki, N. J. Comput. Biol. 17, 281–308 (2010).
  20. Deorowicz, S. & Grabowski, S. Bioinformatics 27, 2979–2986 (2011).
  21. Li, H., Ruan, J. & Durbin, R. Genome Res. 18, 1851–1858 (2008).
  22. Li, H. & Durbin, R. Bioinformatics 25, 1754–1760 (2009).
  23. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. Genome Biol. 10, R25 (2009).
  24. Carter, D.M. Saccharomyces genome resequencing project. Wellcome Trust Sanger Institute http://www.sanger.ac.uk/Teams/Team118/sgrp/ (2005).
  25. Tweedie, S. et al. Nucleic Acids Res. 37, D555–D559 (2009).

Primary authors

  1. P.-R.L. and M.B. contributed equally to this work.
    • Po-Ru Loh &
    • Michael Baym

Affiliations

  1. Po-Ru Loh, Michael Baym and Bonnie Berger are in the Department of Mathematics and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
  2. Michael Baym is also in the Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA.

Competing financial interests

The authors declare no competing financial interests.

September 2012

Compressing a dataset with specialized algorithms is typically done in the context of data storage, where compression tools can shrink data to save space on a hard drive. But a group of researchers at MIT has developed tools that compute directly on compressed genomic datasets by exploiting the fact that most sequenced genomes are very similar to previously sequenced genomes.

 Speed Up Genome Analysis

Led by MIT professor Bonnie Berger, the group has recently released tools called CaBlast and CaBlat, compressive versions of the widely used Blast and Blat alignment tools, respectively.

In a Nature Biotechnology paper published in July, Berger and her colleagues describe how the algorithms deliver alignment and analysis results up to four times faster than Blast and Blat when searching for a particular sequence in 36 yeast genomes.

“What we demonstrate is that the more highly similar genomes there are in a database, the greater the relative speed of CaBlast and CaBlat compared to the original non-compressive versions,” Berger says. “As we increase the number of genomes, the amount of work required for compressive algorithms scales only linearly in the amount of non-redundant data. The idea is that we’ve already done most of the work on the first genome.”

These two algorithms are still in the beta phase, and the MIT team has several refinements planned for future releases to optimize performance. To that end, Berger has made the code for both algorithms available in the hope that developers will help the team build “industrial-strength” software that can be used by the research community.

“To achieve optimal performance in real-use cases, we expect the code will need to be tuned for the engineering trade-offs specific to the application at hand,” she says. “The algorithm used to find and compress similar sequences in the database may need to be tweaked to take this issue into account, and the coarse- and fine-search steps should be aware of these constraints as well.”
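The coarse- and fine-search idea Berger describes can be illustrated with a toy sketch. This is not the CaBlast or CaBlat code: it assumes exact substring matching instead of alignment, fixed-size blocks instead of similarity-based compression, and the names `compress_db` and `coarse_fine_search` are hypothetical. It only shows why the coarse step touches each non-redundant block once, however many genomes share it.

```python
from collections import defaultdict


def compress_db(genomes, block=8):
    """Store each distinct fixed-size block once, with links to every place it occurs."""
    block_ids = {}                    # block text -> block id
    blocks = []                       # block id -> block text (the non-redundant data)
    occurrences = defaultdict(list)   # block id -> [(genome name, offset), ...]
    for name, seq in genomes.items():
        for off in range(0, len(seq), block):
            chunk = seq[off:off + block]
            if chunk not in block_ids:
                block_ids[chunk] = len(blocks)
                blocks.append(chunk)
            occurrences[block_ids[chunk]].append((name, off))
    return blocks, occurrences


def seed_positions(text, seed):
    """Yield every start index of seed in text, including overlapping ones."""
    pos = text.find(seed)
    while pos != -1:
        yield pos
        pos = text.find(seed, pos + 1)


def coarse_fine_search(query, genomes, blocks, occurrences, seed_len=4):
    """Coarse: scan only unique blocks for a seed. Fine: verify in the original genomes.

    Toy limitation: a seed that straddles a block boundary is missed.
    """
    seed = query[:seed_len]
    hits = set()
    for block_id, text in enumerate(blocks):          # coarse pass over unique data
        for pos in seed_positions(text, seed):
            for name, off in occurrences[block_id]:   # fine pass over linked locations
                start = off + pos
                if genomes[name][start:start + len(query)] == query:
                    hits.add((name, start))
    return sorted(hits)


# Three made-up, highly similar "genomes" (illustrative data only).
genomes = {
    "a": "ACGTACGTTTGACCAA",
    "b": "ACGTACGTTTGACCAT",
    "c": "ACGTACGTGGGACCAA",
}
blocks, occurrences = compress_db(genomes)
matches = coarse_fine_search("GACCAA", genomes, blocks, occurrences)
```

In this example the three genomes contain six blocks in total but only four distinct ones, so the coarse pass inspects four blocks; adding further near-identical genomes would grow the link lists rather than the data that has to be scanned.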

While computing resources are becoming increasingly powerful, Berger contends that better algorithms and the use of compression technology will play a crucial role in helping researchers to keep up with the production of next-generation sequencing data.

Matthew Dublin is a senior writer at Genome Technology.

Read Full Post »

Curated by: Dr. Venkat S. Karra, Ph.D

Auguste Deter. Alois Alzheimer's patient in No...

Alzheimer’s, a neurodegenerative disease, is presumed to be caused by the accumulation of β-amyloid.

The diagnosis of Alzheimer’s disease focuses on β-amyloid protein and tau protein.

Though much attention is on radiolabeled markers, imaging β-amyloid is problematic because many cognitively normal elderly people have large amounts of β-amyloid in their brains and appear as “positives” in the imaging tests.

At the same time, therapeutic approaches for Alzheimer’s disease have paid little attention to the process that produces neurofibrillary tangles, which are composed of tau protein.

Various brain sections showing tau protein (Photo credit: WBUR)

Now researchers at Boston University School of Medicine (BUSM) have identified a new group of proteins, termed RNA-binding proteins, which accumulate in the brains of patients with Alzheimer’s disease and are present at much lower levels in subjects who are cognitively intact.

The researchers believe this work opens up novel approaches to diagnosing the disease and staging its likelihood of progression, by quantifying the levels of the RNA-binding protein biomarkers that accumulate in the brains of Alzheimer’s patients.

The group found two different sets of proteins, both of which show striking patterns of accumulation. “Proteins such as TIA-1 and TTP accumulate in neurons that accumulate tau protein, and co-localize with neurofibrillary tangles. These proteins also bind to tau, and so might participate in the disease process,” explained senior author Benjamin Wolozin, MD, PhD, a professor in the departments of pharmacology and neurology at BUSM.

“A different RNA binding protein, G3BP, accumulates primarily in neurons that do not accumulate pathological tau protein.

This observation is striking because it shows that neurons lacking tau aggregates (and neurofibrillary tangles) are also affected by the disease process,” he added.

Wolozin’s group also pursued the observation that some of the RNA binding proteins bind to tau protein, and tested whether one of these proteins, TIA-1, might contribute to the disease process.

‘Stress’ induced aggregation of RNA-binding proteins

Previously, Tara Vanderweyde et al. demonstrated that TIA-1 spontaneously aggregates in response to stress as a normal part of the stress response. They examined the relationship between stress granules (SGs) and neuropathology in brain tissue from P301L tau transgenic mice, as well as in cases of Alzheimer’s disease and FTDP-17.

Stress granules (SGs) are stress-induced aggregates of RNA-binding proteins.

The pattern of SG pathology differed dramatically depending on the RNA-binding protein examined. SGs positive for T-cell intracellular antigen-1 (TIA-1) or tristetraprolin (TTP) initially do not co-localize with tau pathology, but then merge with tau inclusions as disease severity increases. In contrast, G3BP (ras GAP-binding protein) identifies a novel type of molecular pathology that shows increasing accumulation in neurons with increasing disease severity, but is often not associated with classic markers of tau pathology. TIA-1 and TTP both bind phospho-tau, and TIA-1 overexpression induces the formation of inclusions containing phospho-tau. These data suggest that SG formation might stimulate tau pathophysiology.

Thus, study of RNA-binding proteins and SG biology highlights novel pathways interacting with the pathophysiology of AD.

With this understanding, Wolozin and his colleagues hypothesized that since TIA-1 binds tau, it might stimulate tau aggregation during the stress response. They introduced TIA-1 into neurons containing tau protein and subjected the neurons to stress. Consistent with their hypothesis, tau spontaneously aggregated in the presence of TIA-1, but not in its absence. Thus, the group has potentially identified an entirely novel mechanism that induces tau aggregates de novo.

In future work, the group hopes to use this novel finding to understand how neurofibrillary tangles form in Alzheimer’s disease and to screen for novel compounds that might inhibit the progression of the disease.

They believe the finding may also open up novel approaches to diagnosing the disease and staging its likelihood of progression in Alzheimer’s patients.

Read Full Post »
