Curator: Aviva Lev-Ari, PhD, RN

Population Genetics

HAPAA: a tool for ancestral haploblock reconstruction. Specifically, given the genotype (for instance, as derived by an Illumina genotyping array) of an individual of admixed ancestry, find the source population for each segment of the individual’s genome.
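
To make the segment-labelling task concrete, here is a minimal sketch that treats it as a hidden Markov model decoded with the Viterbi algorithm. It is not HAPAA's implementation: the function name viterbi_ancestry, the single haploid allele sequence (instead of a diploid genotype), the per-population allele frequencies, and the constant switch_prob are all simplifying assumptions made for illustration.

```python
import math


def viterbi_ancestry(alleles, freqs, switch_prob=0.01):
    """Label each site of one haplotype with its most likely source population.

    alleles: non-empty list of observed alleles (0 or 1), one per site.
    freqs: dict mapping population name -> list of P(allele == 1) per site.
           Frequencies must lie strictly between 0 and 1.
    switch_prob: assumed probability of changing ancestry between adjacent sites.
    """
    pops = list(freqs)
    n_pops = len(pops)
    stay = math.log(1.0 - switch_prob)
    # Spread the switch probability evenly over the other populations.
    switch = math.log(switch_prob / (n_pops - 1)) if n_pops > 1 else float("-inf")

    def emit(pop, site):
        p = freqs[pop][site]
        return math.log(p if alleles[site] == 1 else 1.0 - p)

    # Uniform prior over populations at the first site.
    score = {pop: math.log(1.0 / n_pops) + emit(pop, 0) for pop in pops}
    back = []
    for site in range(1, len(alleles)):
        new_score, pointers = {}, {}
        for pop in pops:
            best_prev = max(
                pops, key=lambda q: score[q] + (stay if q == pop else switch)
            )
            step = stay if best_prev == pop else switch
            new_score[pop] = score[best_prev] + step + emit(pop, site)
            pointers[pop] = best_prev
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(pops, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    toy_alleles = [1, 1, 1, 1, 0, 0, 0, 0]
    toy_freqs = {"A": [0.9] * 8, "B": [0.1] * 8}
    print(viterbi_ancestry(toy_alleles, toy_freqs))
```

Consecutive sites with the same label form one ancestral segment; a real tool would also need to handle phasing, linkage between markers, and genetic map distances, none of which this sketch attempts.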

Protein Interaction Networks

Graemlin: a tool for aligning multiple global protein interaction networks; Graemlin also supports search for homology between a query module of proteins and a database of interaction networks.

Machine Learning

CONTRA: Conditionally trained models for sequence analysis. See CONTRAlign, a protein sequence aligner with very high accuracy, especially in twilight alignments. See CONTRAfold, an RNA secondary structure prediction tool. Stay tuned for more…

RNA Structure Prediction

CONTRAfold: Prediction of RNA secondary structure with a Conditional Log-Linear model that relies on automatically trained parameters, rather than on a physics-based energy model of RNA folding.

Protein Alignment

CONTRAlign: A protein sequence aligner that users can optionally train on feature sets such as secondary structure and solvent accessibility; see the CONTRA project above.
A protein multiple sequence aligner that exhibits high accuracy on popular benchmarks.
A protein multiple aligner that automatically finds domain structures of sequences with shuffled and repeated domain architectures.

Motif Finding

MotifCut: a non-parametric graph-based motif finding algorithm.
MotifScan: a non-parametric method for representing motifs and scanning DNA sequences for known motifs.
CompareProspector: motif finding with Gibbs sampling & alignment.

Genomic Alignment

Stanford ENCODE: Multiple Alignments of 1% of the Human genome.
Typhon: BLAST-like sequence search to a multiple alignments database.
LAGAN: tools for genomic alignment. These include the MLAGAN multiple alignment tool, and Shuffle-LAGAN for alignment with rearrangements.

Microarray Analysis

Application of Independent Component Analysis (ICA) to microarrays.

Researchers Hope New Database Becomes Universal Cancer Genomics Tool

Swiss scientists hope that a new online database called “arrayMap” will bring cancer genomics to the desktop, laptop, and tablet computers of pathologists and researchers everywhere.

The database combines genomic information from three sources: large repositories such as the NCBI Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA); journal literature; and submissions from individual investigators. It incorporates more than 42,000 genomic copy number arrays (normal and abnormal DNA comparisons) from 195 cancer types.

“arrayMap includes a wider range of human cancer copy number samples than any single repository,” said principal investigator Michael Baudis, M.D. Ease of access, visualization, and data manipulation, he added, are top priorities in its ongoing development.

A product of the University of Zurich Institute for Molecular Life Sciences, where Baudis researches bioinformatics and oncogenomics, arrayMap illustrates the importance of copy number abnormalities (CNA)—dysfunctional DNA gains or losses that visibly lengthen or shorten certain chromosomes—in the diagnosis, staging, and treatment of various malignancies.

“I have this particular tumor type—are there any CNAs in it that can tell me anything about prognosis or treatment?” said Michael Rossi, Ph.D., director of the Winship Cancer Institute cancer genomics program at the Emory University School of Medicine in Atlanta. “Data mining tools like arrayMap are incredibly useful to help answer such questions.”

arrayMap – genomic arrays for copy number profiling in human cancer

arrayMap is a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The arrayMap database provides an entry point for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data. The current data reflects:

  • 42875 genomic copy number arrays
  • 634 experimental series
  • 256 array platforms
  • 197 ICD-O cancer entities
  • 480 publications (PubMed entries)

For the majority of the samples, probe level visualization as well as customized data representation facilitate gene level and genome wide data review. Results from multi-case selections can be connected to downstream data analysis and visualization tools, as we provide through our Progenetix project.
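
As an illustration of what a multi-case review can look like, the following minimal sketch turns per-sample log2 copy number ratios into gain and loss frequencies per genomic region. It is not arrayMap or Progenetix code: the function names, the region labels, the input layout, and the ±0.15 thresholds are assumptions chosen only for the example.

```python
from collections import defaultdict


def call_state(log2_ratio, gain_threshold=0.15, loss_threshold=-0.15):
    """Classify one log2 ratio as 'gain', 'loss', or 'neutral' (assumed thresholds)."""
    if log2_ratio >= gain_threshold:
        return "gain"
    if log2_ratio <= loss_threshold:
        return "loss"
    return "neutral"


def cna_frequencies(samples, gain_threshold=0.15, loss_threshold=-0.15):
    """Return {region: (gain_fraction, loss_fraction)} across samples.

    samples: list of dicts mapping a region label to that sample's log2 ratio.
    A region's fractions are computed over the samples that report it.
    """
    counts = defaultdict(lambda: {"gain": 0, "loss": 0, "neutral": 0})
    for sample in samples:
        for region, ratio in sample.items():
            state = call_state(ratio, gain_threshold, loss_threshold)
            counts[region][state] += 1

    result = {}
    for region, c in counts.items():
        total = c["gain"] + c["loss"] + c["neutral"]
        result[region] = (c["gain"] / total, c["loss"] / total)
    return result


if __name__ == "__main__":
    # Three hypothetical tumor samples with made-up values.
    toy_samples = [
        {"8q": 0.4, "9p": -0.5},
        {"8q": 0.3, "9p": 0.0},
        {"8q": 0.0, "9p": -0.2},
    ]
    for region, (gain, loss) in sorted(cna_frequencies(toy_samples).items()):
        print(region, round(gain, 2), round(loss, 2))
```

Fixed thresholds are the simplest possible calling rule; real array data usually needs segmentation and platform-specific calibration before such frequencies are meaningful.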

arrayMap is developed by the group “Theoretical Cytogenetics and Oncogenomics” at the Institute of Molecular Life Sciences of the University of Zurich.

These tools were developed for our research projects. You are welcome to try them out, but there is only sparse documentation. If more support and/or custom analysis is needed, please contact Michael Baudis regarding a collaborative project.

MIT: A New Approach Uses Compression to Speed Up Genome Analysis

Public-Domain Computing Resources

Structural Bioinformatics

The BetaWrap program detects the right-handed parallel beta-helix super-secondary structural motif in primary amino acid sequences by using beta-strand interactions learned from non-beta-helix structures.
Wrap-and-pack detects beta-trefoils in protein sequences by using both pairwise beta-strand interactions and 3-D energetic packing information.
The BetaWrapPro program predicts right-handed beta-helices and beta-trefoils by using both sequence profiles and pairwise beta-strand interactions, and returns coordinates for the structure.
The MSARi program identifies conserved RNA secondary structure in non-coding RNA genes and mRNAs by searching multiple sequence alignments of a large set of candidate catalogs for correlated arrangements of reverse-complementary regions.
The Paircoil2 program predicts coiled-coil domains in protein sequences by using pairwise residue correlations obtained from a coiled-coil database. The original Paircoil program is still available for use.
The MultiCoil program predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric. An updated version, Multicoil2, will soon be available.
The LearnCoil Histidine Kinase program uses an iterative learning algorithm to detect possible coiled-coil domains in histidine kinase receptors.
The LearnCoil-VMF program uses an iterative learning algorithm to detect coiled-coil-like regions in viral membrane-fusion proteins.
The Trilogy program discovers novel sequence-structure patterns in proteins by exhaustively searching through three-residue motifs using both sequence and structure information.
The ChainTweak program efficiently samples from the neighborhood of a given base configuration by iteratively modifying a conformation using a dihedral angle representation.
The TreePack program uses a tree-decomposition based algorithm to solve the side-chain packing problem more efficiently. This algorithm is more efficient than SCWRL 3.0 while maintaining the same level of accuracy.
PartiFold: Ensemble prediction of transmembrane protein structures. Using statistical mechanics principles, partiFold computes residue contact probabilities and samples super-secondary structures from sequence only.
tFolder: Prediction of beta sheet folding pathways. Predict a coarse grained representation of the folding pathway of beta sheet proteins in a couple of minutes.
RNAmutants: Algorithms for exploring the RNA mutational landscape. Predict the effect of mutations on structures and, reciprocally, the influence of structures on mutations. A tool for molecular evolution studies and RNA design.
AmyloidMutants is a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability.


GLASS aligns large orthologous genomic regions using an iterative global alignment system. Rosetta identifies genes based on conservation of exonic features in sequences aligned by GLASS.
RNAiCut – Automated Detection of Significant Genes from Functional Genomic Screens.
MinoTar – Predict microRNA Targets in Coding Sequence.

Systems Biology

The Struct2Net program predicts protein-protein interactions (PPI) by integrating structure-based information with other functional annotations, e.g. GO, co-expression, and co-localization. The structure-based protein interaction prediction is conducted using the protein threading server RAPTOR plus logistic regression.
IsoRank is an algorithm for global alignment of multiple protein-protein interaction (PPI) networks. The intuition is that a protein in one PPI network is a good match for a protein in another network if the former’s neighbors are good matches for the latter’s neighbors.
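
The neighbor-matching intuition can be sketched as a fixed-point iteration over pairwise scores, restricted here to two networks. This is a minimal illustration and not the IsoRank software: the function name isorank_scores, the adjacency-dict input, the mixing weight alpha, and the uniform prior (standing in for sequence similarity) are assumptions, and the step that extracts a final alignment from the scores is omitted.

```python
from itertools import product


def isorank_scores(adj1, adj2, alpha=0.6, prior=None, iterations=100):
    """Score every node pair (i, j) by how well their neighbors match.

    adj1, adj2: dicts mapping each node to the set of its neighbors
                (undirected graphs, every node has at least one neighbor).
    alpha: assumed weight of network evidence versus the prior.
    prior: optional dict {(i, j): weight} summing to 1; uniform if omitted.
    """
    pairs = list(product(adj1, adj2))
    if prior is None:
        prior = {p: 1.0 / len(pairs) for p in pairs}
    scores = dict(prior)

    for _ in range(iterations):
        updated = {}
        for i, j in pairs:
            # Each neighboring pair (u, v) shares its score equally among
            # all of its own neighboring pairs.
            propagated = sum(
                scores[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                for u in adj1[i]
                for v in adj2[j]
            )
            updated[(i, j)] = alpha * propagated + (1 - alpha) * prior[(i, j)]
        total = sum(updated.values())
        scores = {p: s / total for p, s in updated.items()}
    return scores


if __name__ == "__main__":
    # Two toy path graphs: a-b-c and x-y-z.
    g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    g2 = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
    ranked = sorted(isorank_scores(g1, g2).items(), key=lambda kv: -kv[1])
    for pair, score in ranked[:3]:
        print(pair, round(score, 4))
```

Mixing in a prior with alpha below 1 also keeps the iteration from oscillating on bipartite networks, which a pure neighbor-propagation update can do.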


t-sample is an online algorithm for time-series experiments that allows an experimenter to determine which biological samples should be hybridized to arrays to recover expression profiles within a given error bound.


Compressive genomics


Nature Biotechnology 30, 627–630 (2012) doi:10.1038/nbt.2241

Published online 10 July 2012


BMIR is committed to the development of research tools as part of its goal to provide reusable, computational building blocks to facilitate the development of a vast array of systems. Some of these resources are described below.


The National Center for Biomedical Ontology (NCBO)


The National Center for Biomedical Ontology is a consortium of leading biologists, clinicians, informaticians, and ontologists who develop innovative technology and methods that allow scientists to create, disseminate, and manage biomedical information and knowledge in machine-processable form.


Protégé is a free, open-source platform that provides its community of more than 80,000 users with a suite of tools to construct domain models and knowledge-based applications with ontologies.



PharmGKB curates information that establishes knowledge about the relationships among drugs, diseases and genes, including their variations and gene products. Our mission is to catalyze pharmacogenomics research.


About Simbios

Simbios, the National NIH Center for Physics-based Simulation of Biological Structures, is devoted to helping biomedical researchers understand biological form and function. It provides infrastructure, software, and training to assist users as they create novel drugs, synthetic tissues, medical devices, and surgical interventions.

Simbios scientists study structure-function relationships across a wide range of biological scales, from molecules to organisms, and are currently focusing on challenging biological problems in RNA folding, myosin dynamics, neuromuscular biomechanics and cardiovascular dynamics.

Stanford BioMedical Informatics Research (BMIR) – Publications by Project

There are 8 publications for the project “Genomic Nosology for Medicine (GNOMED)”.

Identifying compartment-specific non-HLA targets after renal transplantation by integrating transcriptome and “antibodyome” measures
L. Li, P. Wadia, M. Sarwal, N. Kambham, T. Sigdel, D. B. Miklos, R. Chen, M. Naesens, A. J. Butte
PNAS, 106, 11, 4148-4153. Published in 2009
Using SNOMED-CT For Translational Genomics Data Integration
J. Dudley, D. P. Chen, A. J. Butte
Ronald Cornet, Kent Spackman (eds.): Representing and sharing knowledge using SNOMED. Proceedings of the 3rd International Conference on Knowledge Rep, Phoenix (AZ), USA, CEUR Workshop Proceedings, ISSN 1613-0073, online CEUR-WS.org/Vol-410/, 91-96. Published in 2008
The Ultimate Model Organism
A. J. Butte
Science, 320, 5874, 325-327. Published in 2008
Novel Integration of Hospital Electronic Medical Records and Gene Expression Measurements to Identify Genetic Markers of Maturation
D. P. Chen, S. C. Weber, P. S. Constantinou, T. A. Ferris, H. J. Lowe, A. J. Butte
Pacific Symposium on Biocomputing, Big Island, Hawaii, 13, 243-254. Published in 2008
Enabling Integrative Genomic Analysis of High-Impact Human Diseases through Text Mining
J. Dudley, A. J. Butte
Pacific Symposium on Biocomputing, Big Island, Hawaii, 13, 580-591. Published in 2008
Methodologies for Extracting Functional Pharmacogenomic Experiments from International Repository
Y. Lin, A. P. Chiang, P. Yao, R. Chen, A. J. Butte, R. S. Lin
AMIA Annual Symposium, Chicago, IL, 463-467. Published in 2007
Clinical Arrays of Laboratory Measures, or “Clinarrays”, Built from an Electronic Health Record Enable Disease Subtyping by Severity
D. P. Chen, S. C. Weber, P. S. Constantinou, T. A. Ferris, H. J. Lowe, A. J. Butte
AMIA Annual Symposium, Chicago, IL, 115-119. Published in 2007
Finding Disease-Related Genomic Experiments Within an International Repository: First Steps in Translational Bioinformatics
A. J. Butte, R. Chen
Annual Symposium of the American Medical Informatics Association, Washington, D.C., 106-10. Published in 2006

Featured Publications

The National Center for Biomedical Ontology
M. A. Musen, N. F. Noy, C. G. Chute, M. A. Storey, B. Smith, N. H. Shah
Published in 2011
Prototyping a Biomedical Ontology Recommender Service
C. Jonquet, N. H. Shah, M. A. Musen
Bio-Ontologies: Knowledge in Biology, SIG, ISMB ECCB 2009, Stockholm, Sweden. Published in 2009
Translational bioinformatics applications in genome medicine
A. J. Butte
Genome Medicine, 1, 6, 64. Published in 2009
Identifying compartment-specific non-HLA targets after renal transplantation by integrating transcriptome and “antibodyome” measures
L. Li, P. Wadia, M. Sarwal, N. Kambham, T. Sigdel, D. B. Miklos, R. Chen, M. Naesens, A. J. Butte
PNAS, 106, 11, 4148-4153. Published in 2009
Technology for Building Intelligent Systems: From Psychology to Engineering
M. A. Musen
Modeling Complex Systems, Bill Shuart, Will Spaulding and Jeffrey Poland, U Nebraska P, Lincoln, Nebraska, Vol 52 of the Nebraska Symposium on Motivation, 145-184. Published in 2009
Software-Engineering Challenges of Building and Deploying Reusable Problem Solvers
M. J. O’Connor, C. I. Nyulas, A. Okhmatovskaia, D. Buckeridge, S. W. Tu, M. A. Musen
Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 24, 3. Published in 2009
Data-Driven Methods to Discover Molecular Determinants of Serious Adverse Drug Events
A. P. Chiang, A. J. Butte
Clinical Pharmacology and Therapeutics, 28 January 2009, Advance online publication, doi:10.1038/clpt.2008.274. Published in 2009
Knowledge-Data Integration for Temporal Reasoning in a Clinical Trial System
M. J. O’Connor, R. D. Shankar, D. B. Parrish, A. K. Das
International Journal of Medical Informatics, 78, Suppl. 1, S77-S85. Published in 2009
GeneChaser: Identifying all biological and clinical conditions in which genes of interest are differentially expressed
R. Chen, R. Mallelwar, A. Thosar, S. Venkatasubrahmanyam, A. J. Butte
BMC Bioinformatics, 9, 1, 548. (doi:10.1186/1471-2105-9-548). Published in 2008
FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease
R. Chen, A. A. Morgan, J. Dudley, A. M. Deshpande, L. Li, K. Kodama, A. P. Chiang, A. J. Butte
Genome Biology, 9, 12, R170 (doi:10.1186/gb-2008-9-12-r170). Published in 2008
Translational Bioinformatics: Coming of Age
A. J. Butte
Journal of the American Medical Informatics Association, JAMIA, 15, 6, 709-14. Published in 2008
An Ontology-Driven Framework for Deploying JADE Agent Systems
C. I. Nyulas, M. J. O’Connor, S. W. Tu, A. Okhmatovskaia, D. Buckeridge, M. A. Musen
IEEE/WIC/ACM International Conference on Intelligent Agent Technology, Sydney, Australia, 2, 573-577. Published in 2008
Understanding Detection Performance in Public Health Surveillance: Modeling Aberrancy-Detection Algorithms
D. Buckeridge, A. Okhmatovskaia, S. W. Tu, C. I. Nyulas, M. J. O’Connor, M. A. Musen
Journal of the American Medical Informatics Association, 15, 6, 760-769. Published in 2008
Network Analysis of Intrinsic Functional Brain Connectivity in Alzheimer’s Disease
K. S. Supekar, V. Menon, M. A. Musen, D. L. Rubin, M. Greicius
PLoS Computational Biology, June 2008. Published in 2008
Medical Imaging on the Semantic Web: Annotation and Image Markup
D. L. Rubin, P. Mongkolwat, V. Kleper, K. S. Supekar, D. S. Channin
AAAI Spring Symposium Series, Semantic Scientific Knowledge Integration, Stanford. Published in 2008
The Ultimate Model Organism
A. J. Butte
Science, 320, 5874, 325-327. Published in 2008
BioPortal: A Web Portal to Biomedical Ontologies
D. L. Rubin, D. de Abreu Moreira, P. P. Kanjamala, M. A. Musen
AAAI Spring Symposium Series, Symbiotic Relationships between Semantic Web and Knowledge Engineering, Stanford University, (in press). Published in 2008
AILUN: reannotating gene expression data automatically
R. Chen, L. Li, A. J. Butte
Nature Methods, 4, 11, 879. Published in 2007
Evaluation and Integration of 49 Genome-wide Experiments and the Prediction of Previously Unknown Obesity-related Genes
S. B. English, A. J. Butte
Bioinformatics, Epub. Published in 2007
Protege: A Tool for Managing and Using Terminology in Radiology Applications
D. L. Rubin, N. F. Noy, M. A. Musen
Journal of Digital Imaging. Published in 2007
Efficiently Querying Relational Databases using OWL and SWRL
M. J. O’Connor, R. D. Shankar, S. W. Tu, C. I. Nyulas, A. K. Das, M. A. Musen
The First International Conference on Web Reasoning and Rule Systems, Innsbruck, Austria, Springer, LNCS 4524, 361-363. Published in 2007
Creation and implications of a phenome-genome network
A. J. Butte, I. S. Kohane
Nature Biotechnology, 24, 1, 55-62. Published in 2006


National Center for Simulation of Biological Structures (SimBioS) at Stanford University

National Center for the Multiscale Analysis of Genomic and Cellular Networks (MAGNet) at Columbia University

National Alliance for Medical Image Computing (NA-MIC) at Brigham and Women’s Hospital, Boston, MA

Integrating Biology and the Bedside (I2B2) at Brigham and Women’s Hospital, Boston, MA

National Center for Biomedical Ontology (NCBO) at Stanford University

Integrate Data for Analysis, Anonymization, and Sharing (IDASH) at the University of California, San Diego


