UK Biobank Makes 200,000 Whole Genomes Available Open Access

Reporter: Stephen J. Williams, Ph.D.

The following is a summary of an article by Jocelyn Kaiser, published in the November 26, 2021 issue of the journal Science.

To see the full article please go to https://www.science.org/content/article/200-000-whole-genomes-made-available-biomedical-studies-uk-effort

The UK Biobank (UKBB) this week unveiled to scientists the entire genomes of 200,000 people who are part of a long-term British health study.

The trove of genomes, each linked to anonymized medical information, will allow biomedical scientists to scour the full 3 billion base pairs of human DNA for insights into the interplay of genes and health that could not be gleaned from partial sequences or scans of genome markers. “It is thrilling to see the release of this long-awaited resource,” says Stephen Glatt, a psychiatric geneticist at the State University of New York Upstate Medical University.

Other biobanks have also begun to compile vast numbers of whole genomes, 100,000 or more in some cases (see table, below). But UKBB stands out because it offers easy access to the genomic information, according to some of the more than 20,000 researchers in 90 countries who have signed up to use the data. “In terms of availability and data quality, [UKBB] surpasses all others,” says physician and statistician Omar Yaxmehen Bello-Chavolla of the National Institute for Geriatrics in Mexico City.


The UK Biobank cohort comprises 500,000 middle-aged and elderly participants recruited from 2006 to 2010, mostly of European descent. Other large-scale genome sequencing ventures are less open: Iceland’s deCODE genetics, which has collected over 100,000 genomes, is now a subsidiary of Amgen, and its data are mostly behind IP protection rather than open access like this database.

UK Biobank is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. The database is regularly augmented with additional data and is globally accessible to approved researchers undertaking vital research into the most common and life-threatening diseases. It is a major contributor to the advancement of modern medicine and treatment and has enabled several scientific discoveries that improve human health.

A summary of some large-scale genome sequencing projects is shown in the table below:

Biobank | Completed whole genomes | Release information
UK Biobank | 200,000 | 300,000 more in early 2023
Trans-Omics for Precision Medicine (TOPMed) | 161,000 | NIH requires project-specific request
Million Veteran Program | 125,000 | Non-Veterans Affairs researchers get first access
100,000 Genomes Project | 120,000 | Researchers must join Genomics England collaboration
All of Us | 90,000 | NIH expects release in 2022

Brain Biobank and studies of disease structure correlates

Larry H. Bernstein, MD, FCAP, Curator



Unveiling Psychiatric Diseases

Researchers create neuropsychiatric cellular biobank

Image: iStock/mstroz
Researchers from Harvard Medical School and Massachusetts General Hospital have completed the first stage of an important collaboration aimed at understanding the intricate variables of neuropsychiatric disease—something that currently eludes clinicians and scientists.

The research team, led by Isaac Kohane at HMS and Roy Perlis at Mass General, has created a neuropsychiatric cellular biobank—one of the largest in the world.

It contains induced pluripotent stem cells, or iPSCs, derived from skin cells taken from 100 people with neuropsychiatric diseases such as schizophrenia, bipolar disorder and major depression, and from 50 people without neuropsychiatric illness.

In addition, a detailed profile of each patient, obtained from hours of in-person assessment as well as from electronic medical records, is matched to each cell sample.

As a result, the scientific community can now for the first time access cells representing a broad swath of neuropsychiatric illness. This enables researchers to correlate molecular data with clinical information in areas such as variability of drug reactions between patients. The ultimate goal is to help treat, with greater precision, conditions that often elude effective management.

The cell collection and generation were led by investigators at Mass General, who, in collaboration with Kohane and his team, are working to characterize the cell lines at a molecular level. The cell repository, funded by the National Institutes of Health, is housed at Rutgers University.

“This biobank, in its current form, is only the beginning,” said Perlis, director of the MGH Psychiatry Center for Experimental Drugs and Diagnostics and HMS associate professor of psychiatry. “By next year we’ll have cells from a total of four hundred patients, with additional clinical detail and additional cell types that we will share with investigators.”

A current major limitation to understanding brain diseases is the inability to access brain biopsies on living patients. As a result, researchers typically study blood cells from patients or examine post-mortem tissue. This is in stark contrast with diseases such as cancer, for which there are many existing repositories of highly characterized cells from patients.

The new biobank offers a way to push beyond this limitation.


A Big Step Forward

While the biobank is already a boon to the scientific community, researchers at MGH and the HMS Department of Biomedical Informatics will be adding additional layers of molecular data to all of the cell samples. This information will include whole genome sequencing and transcriptomic and epigenetic profiling of brain cells made from the stem cell lines.

Collaborators in the HMS Department of Neurobiology, led by Michael Greenberg, department chair and Nathan Marsh Pusey Professor of Neurobiology, will also work to examine characteristics of other types of neurons derived from these stem cells.

“This can potentially alter the entire way we look at and diagnose many neuropsychiatric conditions,” said Perlis.

One example may be to understand how the cellular responses to medication correspond to the patient’s documented responses, comparing in vitro with in vivo. “This would be a big step forward in bringing precision medicine to psychiatry,” Perlis said.

“It’s important to recall that in the field of genomics, we didn’t find interesting connections to disease until we had large enough samples to really investigate these complex conditions,” said Kohane, chair of the HMS Department of Biomedical Informatics.

“Our hypothesis is that here we will require far fewer patients,” he said. “By measuring the molecular functioning of the cells of each patient rather than only their genetic risk, and combining that with all that’s known of these people in terms of treatment response and cognitive function, we will discover a great deal of valuable information about these conditions.”

Added Perlis, “In the early days of genetics, there were frequent false positives because we were studying so few people. We’re hoping to avoid the same problem in making cellular models, by ensuring that we have a sufficient number of cell lines to be confident in reporting differences between patient groups.”
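Perlis’s point about sample size can be illustrated with a standard power calculation. The sketch below is not the study’s design: it assumes a simple comparison of one continuous cell-line measurement between two groups, a standardized effect size d (Cohen’s d) chosen purely for illustration, a two-sided significance level of 0.05, 80% power, and the normal-approximation formula n = 2(z for 1 − α/2 + z for power)² / d² per group. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of samples (e.g. cell lines) needed in EACH of two groups.

    Normal approximation for a two-sided, two-sample comparison of means:
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2
    where d is the standardized effect size (difference in means / SD).
    """
    std_normal = NormalDist()  # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    for d in (1.0, 0.8, 0.5):
        print(f"effect size {d}: {n_per_group(d)} per group")
```

Under these assumptions, effect sizes of 1.0, 0.8, and 0.5 call for 16, 25, and 63 lines per group. Halving the expected effect roughly quadruples the required number of lines, which is why small collections tend to yield unreliable group differences.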

The generation of stem cell lines and the characterization of patients and brain cell lines are funded jointly by the National Institute of Mental Health, the National Human Genome Research Institute and a grant from the Centers of Excellence in Genomic Science program.


On C.T.E. and Athletes, Science Remains in Its Infancy

Se Hoon Choi, Young Hye Kim, Matthias Hebisch, et al.


Alzheimer’s disease is the most common form of dementia, characterized by two pathological hallmarks: amyloid-β plaques and neurofibrillary tangles[1]. The amyloid hypothesis of Alzheimer’s disease posits that the excessive accumulation of amyloid-β peptide leads to neurofibrillary tangles composed of aggregated hyperphosphorylated tau[2,3]. However, to date, no single disease model has serially linked these two pathological events using human neuronal cells. Mouse models with familial Alzheimer’s disease (FAD) mutations exhibit amyloid-β-induced synaptic and memory deficits but they do not fully recapitulate other key pathological events of Alzheimer’s disease, including distinct neurofibrillary tangle pathology[4,5]. Human neurons derived from Alzheimer’s disease patients have shown elevated levels of toxic amyloid-β species and phosphorylated tau but did not demonstrate amyloid-β plaques or neurofibrillary tangles[6–11]. Here we report that FAD mutations in β-amyloid precursor protein and presenilin 1 are able to induce robust extracellular deposition of amyloid-β, including amyloid-β plaques, in a human neural stem-cell-derived three-dimensional (3D) culture system. More importantly, the 3D-differentiated neuronal cells expressing FAD mutations exhibited high levels of detergent-resistant, silver-positive aggregates of phosphorylated tau in the soma and neurites, as well as filamentous tau, as detected by immunoelectron microscopy. Inhibition of amyloid-β generation with β- or γ-secretase inhibitors not only decreased amyloid-β pathology, but also attenuated tauopathy. We also found that glycogen synthase kinase 3 (GSK3) regulated amyloid-β-mediated tau phosphorylation. We have successfully recapitulated amyloid-β and tau pathology in a single 3D human neural cell culture system.
Our unique strategy for recapitulating Alzheimer’s disease pathology in a 3D neural cell culture model should also serve to facilitate the development of more precise human neural cell models of other neurodegenerative disorders.



Figure 2: Robust increases of extracellular amyloid-β deposits in 3D-differentiated hNPCs with FAD mutations.

a, Thin-layer 3D culture protocol. HC, histochemistry; IF, immunofluorescence; IHC, immunohistochemistry. b, Amyloid-β deposits in 6-week differentiated control and FAD ReN cells in 3D Matrigel (green, GFP; blue, 3D6; scale bar, …


Stem Cell-Based Spinal Cord Repair Enables Robust Corticospinal Regeneration


Novel use of EPR spectroscopy to study in vivo protein structure



α-synuclein is a protein found abundantly throughout the brain. It is present mainly at presynaptic nerve terminals, where it is thought to help maintain the supply of synaptic vesicles required for the release of neurotransmitters that relay signals between neurons. It is critical for normal brain function.

However, α-synuclein is also the primary protein component of the cerebral amyloid deposits characteristic of Parkinson’s disease and its precursor is found in the amyloid plaques of Alzheimer’s disease. Although α-synuclein is present in all areas of the brain, these disease-state amyloid plaques only arise in distinct areas.

Alpha-synuclein protein. May play role in Parkinson’s and Alzheimer’s disease.  © molekuul.be / Shutterstock.com

Imaging of isolated samples of α-synuclein in vitro indicates that it does not have the precise 3D folded structure usually associated with proteins. It is therefore classed as an intrinsically disordered protein. However, it was not known whether the protein also lacked a precise structure in vivo.

There have been reports that it can form helical tetramers. Since the 3D structure of a biological protein is usually precisely matched to the specific function it performs, knowing the structure of α-synuclein within a living cell will help elucidate its role and may also improve understanding of the disease states with which it is associated.

If α-synuclein remains disordered in vivo, it may be possible for the protein to achieve different structures, and have different properties, depending on its surroundings.

Techniques for determining protein structure

It has long been known that elucidating the structure of a protein at an atomic level is fundamental for understanding its normal function and behavior. Furthermore, such knowledge can also facilitate the development of targeted drug treatments. Unfortunately, observing the atomic structure of a protein in vivo is not straightforward.

X-ray diffraction is the technique usually adopted for visualizing structures at atomic resolution, but this requires crystals of the molecule to be produced and this cannot be done without separating the molecules of interest from their natural environment. Such processes can modify the protein from its usual state and, particularly with complex structures, such effects are difficult to predict.

The development of nuclear magnetic resonance (NMR) spectroscopy improved the situation by making it possible to analyze molecules in solution under near-physiological conditions, i.e. the same pH, temperature and ionic concentration.

More recently, increases in the sensitivity of NMR and the use of isotope labelling have made it possible to determine the atomic-level structure and dynamics of proteins within living cells[1]. NMR has been used to determine the structure of a bacterial protein within living cells[2], but it is difficult to obtain sufficient quantities of the required protein within mammalian cells and to keep the cells alive long enough for NMR measurements to be conducted.

Electron paramagnetic resonance (EPR) spectroscopy for determining protein structure

Recently, researchers have overcome these obstacles by combining in-cell NMR with electron paramagnetic resonance (EPR) spectroscopy. EPR spectroscopy is similar to NMR spectroscopy in that it measures and interprets the energy differences between spin states in an applied magnetic field.

In EPR spectroscopy it is unpaired electrons that are excited, whereas NMR signals arise from the spins of atomic nuclei. EPR was developed to measure radicals and metal complexes, but has also been used to study the dynamic organization of lipids in biological membranes[3].
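The spin-state energy gaps behind both techniques follow simple resonance conditions: for an unpaired electron, h·ν = g·μB·B0, and for a proton, ν = (γ/2π)·B0, where h is Planck’s constant, μB the Bohr magneton, g the electron g-factor, γ the proton gyromagnetic ratio and B0 the applied field. The short sketch below compares the two at the same field. It assumes the free-electron g-factor and a field of 0.35 T (roughly where X-band EPR spectrometers operate); the function names are illustrative.

```python
# CODATA 2018 constants in SI units
PLANCK_H = 6.62607015e-34               # J s (exact by definition)
BOHR_MAGNETON = 9.2740100783e-24        # J/T
G_FREE_ELECTRON = 2.00231930436         # magnitude of the free-electron g-factor
PROTON_GAMMA_OVER_2PI = 42.577478518e6  # Hz/T


def epr_frequency_hz(b0_tesla: float, g: float = G_FREE_ELECTRON) -> float:
    """EPR resonance condition h * nu = g * mu_B * B0, solved for nu."""
    return g * BOHR_MAGNETON * b0_tesla / PLANCK_H


def proton_nmr_frequency_hz(b0_tesla: float) -> float:
    """1H NMR resonance condition nu = (gamma / 2 pi) * B0."""
    return PROTON_GAMMA_OVER_2PI * b0_tesla


if __name__ == "__main__":
    b0 = 0.35  # tesla; roughly the field used for X-band EPR
    nu_e = epr_frequency_hz(b0)
    nu_h = proton_nmr_frequency_hz(b0)
    print(f"EPR: {nu_e / 1e9:.2f} GHz")     # microwave range
    print(f"1H NMR: {nu_h / 1e6:.2f} MHz")  # radio-frequency range
    print(f"ratio: {nu_e / nu_h:.0f}")
```

At 0.35 T this gives about 9.81 GHz for the electron and 14.90 MHz for the proton, a ratio of roughly 658. That gap is why EPR is run with microwaves and NMR with radio-frequency pulses.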

In-cell NMR and EPR have now been used to provide, for the first time, atomic-resolution information on the structure of α-synuclein in living mammalian cells[4,5].

α-Synuclein produced in bacteria and labelled with the 15N isotope was introduced into five types of mammalian cell by electroporation. Intracellular concentrations close to those found in vivo were achieved, and the 15N label allowed the protein’s signals to be distinguished from those of other cellular components by NMR. The conformation of the protein was then further examined using electron paramagnetic resonance (EPR).

The results showed that within living mammalian cells α-synuclein remains as a disordered and highly dynamic monomer. Different intracellular environments did not induce major conformational changes.


The novel use of EPR spectroscopy has resolved the mystery surrounding the in vivo conformation of α-synuclein. It showed that α-synuclein maintains its disordered monomeric form under physiological cell conditions. It has been demonstrated for the first time that even in crowded intracellular environments α-synuclein does not form oligomers, showing that intrinsic structural disorder can be sustained within mammalian cells.


  1. Freedberg DI, Selenko P. Live cell NMR. Annu Rev Biophys. 2014;43:171–192.
  2. Sakakibara D, et al. Protein structure determination in living cells by in-cell NMR spectroscopy. Nature. 2009;458:102–105.
  3. Yashroy RC. Magnetic resonance studies of dynamic organisation of lipids in chloroplast membranes. J Biosci. 1990;15(4):281.
  4. Alderson TR, Bax A. Parkinson’s disease. Disorder in the court. Nature. 2016. doi:10.1038/nature16871.
  5. Theillet FX, et al. Structural disorder of monomeric α-synuclein persists in mammalian cells. Nature. 2016. doi:10.1038/nature16531.


Personalized Medicine: Clinical Aspiration of Microarrays

Reporter, Writer: Stephen J. Williams, Ph.D.

In this month’s Science, Mike May (at http://www.sciencemag.org/site/products/lst_20130215.xhtml) describes some of the challenges and successes in introducing microarray analysis into the clinical setting. Traditionally used for investigational research, microarrays are now being developed, customized, and used in a disease-specific manner for biomarker analysis and for their prognostic and predictive value.

Challenges in data interpretation

In an interview, Seth Crosby, director of the Genome Technology Access Center at Washington University School of Medicine in St. Louis, said that “the biggest challenge” in moving microarrays into the clinical setting is data interpretation. Current technology makes it possible to evaluate the expression of thousands of genes from a patient’s sample; the difficulty, as Crosby describes it, is assigning clinical relevance to the data. For example, Crosby explains that Washington University had validated a panel of 45 oncology genes by next-generation sequencing and is using these genes to develop diagnostic tests that screen patient tumors in order to determine a personalized therapeutic strategy. Crosby noted it took “hundreds of Ph.D. and M.D. hours” to sift through hundreds of papers to determine which genes were relevant to a specific cancer type. However, he notes that once we better understand which changes in a patient’s genome are related to a specific disease, we will be able to narrow down the list and produce microarrays that are both more economical and more disease-relevant.

Is this aberration pathogenic or not?

Microarrays are becoming an invaluable tool in cytogenetics, as alluded to by Andy Last, executive vice president of the genetic analysis business unit at Affymetrix. Certain disorders, like Down syndrome, have well-characterized chromosomal alterations, such as additions or deletions of parts of chromosomes or of entire chromosomes. According to Affymetrix, the most common use of microarrays is for determining copy number variation. However, according to James Clough, vice president of clinical and genomic services at Oxford Gene Technology, although microarrays afford a much higher diagnostic yield and speed of analysis than traditional microscopic techniques, the hundreds of syndromes associated with chromosomal rearrangements mean that the challenge will be to determine whether a small chromosomal aberration has pathologic significance. To address this challenge, Oxford Gene Technology, PerkinElmer, Affymetrix, and Agilent all offer custom-designed, disease-specific copy number and SNP (single nucleotide polymorphism) microarrays. For example, PerkinElmer designed OncoChip™ to evaluate copy number variation in more than 1,800 cancer genes. Agilent makes combined CGH (comparative genomic hybridization) plus SNP microarrays that evaluate both copy number variation and SNPs. Patricia Barco, product manager for cytogenetics at Agilent, notes these arrays can be used in prenatal and postnatal research and cancer, and “can be customized from more than 28 million probes in our library”.
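Copy number calls from array CGH are usually based on the log2 ratio of test to reference signal at each probe: in a diploid genome, losing one copy gives log2(1/2) = −1 and gaining one gives log2(3/2) ≈ 0.58. The sketch below averages probe log2 ratios within each region and applies fixed cut-offs. The ±0.3 thresholds, region names, and intensities are hypothetical, and production tools typically segment the genome algorithmically rather than relying on predefined regions.

```python
import math
from collections import defaultdict
from statistics import mean


def call_regions(probes, loss_cutoff=-0.3, gain_cutoff=0.3):
    """Average probe log2(test/reference) ratios per region and call loss/gain/neutral.

    probes: iterable of (region, test_intensity, reference_intensity) tuples.
    Returns {region: (mean_log2_ratio rounded to 2 places, call)}.
    """
    ratios = defaultdict(list)
    for region, test, ref in probes:
        ratios[region].append(math.log2(test / ref))

    calls = {}
    for region, values in ratios.items():
        m = mean(values)
        if m <= loss_cutoff:
            call = "loss"     # e.g. one-copy deletion, expected near -1
        elif m >= gain_cutoff:
            call = "gain"     # e.g. one-copy gain, expected near +0.58
        else:
            call = "neutral"
        calls[region] = (round(m, 2), call)
    return calls


if __name__ == "__main__":
    # Hypothetical probe intensities from a test/reference co-hybridization
    probes = [
        ("region_A", 1000, 1000), ("region_A", 980, 1010),
        ("region_B", 510, 1000), ("region_B", 495, 1005),
        ("region_C", 1520, 1000), ("region_C", 1490, 995),
    ]
    for region, (m, call) in call_regions(probes).items():
        print(region, m, call)
```

With these made-up values, region_A is called neutral (about −0.02), region_B a loss (about −1.0), and region_C a gain (about 0.59). Deciding whether such a call is pathogenic is the separate interpretive step Clough describes.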

Custom Tools and Software to Handle the Onslaught of Big Data

There is a need for FDA-approved diagnostic tools based on microarrays. Pathwork Diagnostics has one such tool (the Pathwork Tissue of Origin test), which uses 2,000 transcript markers and a proprietary computational algorithm to determine, from expression analysis, the tissue of origin of a patient’s tumor. Pathwork also provides a fast-turnaround custom analytical service for pathologists who encounter difficult-to-interpret samples. Illumina provides the Infinium HumanCore BeadChip family of microarrays, which can determine genetic variations for the purposes of biological tissue banking. This system uses a set of over 300,000 SNP probes plus 240,000 exome-based markers.

Tools have also been developed to validate microarray results. A common validation strategy is the use of quantitative real-time PCR to verify the expression changes seen on the microarray. Life Technologies developed the TaqMan OpenArray Real-Time PCR plates, which have 3,072 wells and can be custom-formatted using their library of eight million validated TaqMan assays.
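When qPCR is used to confirm a microarray result, relative expression is commonly computed with the 2^−ΔΔCt (Livak) method, which normalizes the target gene to a reference gene and then to a control sample. The sketch below assumes roughly 100% amplification efficiency for both assays; the Ct values, the microarray log2 fold change, and the helper `direction_agrees` are hypothetical.

```python
import math


def fold_change_ddct(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt (Livak) method.

    Assumes near-100% amplification efficiency for both target and reference assays.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample      # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control                  # normalize to control sample
    return 2 ** (-dd_ct)


def direction_agrees(array_log2_fc, qpcr_fold_change):
    """True if both measurements agree on whether the gene is up-regulated."""
    return (array_log2_fc > 0) == (math.log2(qpcr_fold_change) > 0)


if __name__ == "__main__":
    fc = fold_change_ddct(22.0, 18.0, 24.0, 18.0)  # hypothetical Ct values
    print(fc, direction_agrees(1.8, fc))  # prints: 4.0 True
```

Here the target gene crosses threshold two cycles earlier (relative to the reference gene) in the sample than in the control, so its expression is about 2² = 4-fold higher, consistent in direction with the hypothetical microarray log2 fold change of 1.8.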

Making Sense of the Big Data: Bridging the Knowledge Gap using Bioinformatics

The use of microarrays has spurred industries devoted to developing bioinformatics software that analyzes the massive amounts of data and establishes their clinical significance. For example, companies such as Expression Analysis use their bioinformatics software to provide pathway analysis for microarray data in order to translate the data into biology. Such strategies can also be used to validate the design of microarrays for various diseases.
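Pathway analysis of a gene list often starts with an over-representation test. The sketch below computes a one-sided hypergeometric p-value (equivalent to a one-sided Fisher’s exact test) for the overlap between a gene list and a pathway. It illustrates the generic technique, not any vendor’s method, and the gene counts are hypothetical; in practice, p-values across many pathways would also be corrected for multiple testing.

```python
from math import comb


def enrichment_p_value(universe, pathway_size, list_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe:     number of genes measured (N)
    pathway_size: genes in the pathway (K)
    list_size:    genes in the list of interest (n)
    overlap:      genes in both the list and the pathway (k)
    """
    max_k = min(pathway_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, list_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / comb(universe, list_size)  # exact integer arithmetic until this division


if __name__ == "__main__":
    # Hypothetical: 20,000 genes measured, 100-gene pathway, 200-gene list, 10 shared.
    # About 1 shared gene would be expected by chance (200 * 100 / 20,000).
    p = enrichment_p_value(20_000, 100, 200, 10)
    print(f"p = {p:.2e}")
```

Because the counts are combined as exact integers before the final division, the function stays accurate even for genome-scale numbers, at the cost of some speed.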

Foundation Medicine, Inc., a molecular information company, provides cancer genomics test solutions. It offers FoundationOne, an informative genomic profile to identify a patient’s individual molecular alterations and match them with relevant targeted therapies and clinical trials. The company’s product enables physicians to recommend treatment options for patients based on the molecular subtype of their cancer.

The Canadian Bioinformatics Workshops series recently offered a course on using bioinformatic approaches to analyze clinical data generated from microarray approaches (http://bioinformatics.ca/workshops/2012/bioinformatics-cancer-genomics-bicg).   The course objectives are described below:

Course Objectives

Cancer research has rapidly embraced high throughput technologies into its research, using various microarray, tissue array, and next generation sequencing platforms. The result has been a rapid increase in cancer data output and data types. Now more than ever, having the bioinformatic skills and knowledge of available bioinformatic resources specific to cancer is critical. The CBW will host a 5-day workshop covering the key bioinformatics concepts and tools required to analyze cancer genomic data sets. Participants will gain experience in genomic data visualization tools which will be applied throughout the development of the skills required to analyze cancer -omic data for gene expression, genome rearrangement, somatic mutations and copy number variation. The workshop will conclude with analyzing and conducting pathway analysis on the resultant cancer gene list and integration of clinical data.

Successful Examples of Clinical Ventures Integrating Bioinformatics in Cancer Treatment Decision-Making

The University of Pavia, Italy developed a fully integrated oncology bioinformatics workflow as described on their website and at the ESMO 2012 Congress meeting:







ESMO 2012




Translational research


A. Zambelli, D. Segagni, V. Tibollo, A. Dagliati, A. Malovini, V. Fotia, S. Manera, R. Bellazzi; Pavia/IT


The ONCO-i2b2 project, supported by the University of Pavia and the Fondazione Salvatore Maugeri (FSM), aims to support translational research in oncology and exploits the software solutions implemented by the Informatics for Integrating Biology and the Bedside (i2b2) research centre, an initiative funded by the NIH Roadmap National Centres for Biomedical Computing. The ONCO-i2b2 software is designed to integrate the i2b2 infrastructure with the FSM hospital information system and the Bruno Boerci Biobank, in order to provide well-characterized cancer specimens along with an accurate clinical database of the patients. The i2b2 infrastructure provides web-based access to all the electronic medical records of cancer patients and allows researchers to analyze the vast amount of biological and clinical information through a user-friendly interface. Data coming from multiple sources are integrated and jointly queried.

At the 2011 AIOM meeting we reported our preliminary experience with the ONCO-i2b2 project; we are now able to present the up-and-running platform and the extended data set. Currently, more than 4,400 specimens are stored and more than 600 breast cancer patients have given consent for the use of their specimens in clinical research; in addition, more than 5,000 histological reports are stored in order to integrate clinical data.

Within the ONCO-i2b2 project it is possible to query and merge data regarding:

• Anonymous patient personal data;

• Diagnosis and therapy data (ICD-9-CM subset) from the hospital information system;

• Histological data (tumour SNOMED and TNM codes) and receptor profile testing (Her2, Ki67) from anatomic pathology database;

• Specimen molecular characteristics (DNA, RNA, blood, plasma and cancer tissues) from the Bruno Boerci Biobank management system.
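The kind of cross-source query listed above can be sketched with an in-memory SQLite table loosely modeled on i2b2’s star schema, in which each fact is a row keyed by an anonymized patient number and a concept code. This is a simplified assumption, not the ONCO-i2b2 implementation: the real observation_fact table has many more columns, and the `PATH:` and `BIOBANK:` codes are hypothetical placeholders (only the ICD-9-CM codes 174.9, breast, and 162.9, lung, are real).

```python
import sqlite3


def build_demo_db():
    """In-memory table loosely modeled on i2b2's observation_fact (heavily simplified)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT, source TEXT)"
    )
    facts = [
        (1, "ICD9:174.9", "hospital"),      # breast cancer diagnosis
        (1, "PATH:HER2_POS", "pathology"),  # hypothetical receptor-status code
        (1, "BIOBANK:DNA", "biobank"),      # hypothetical stored-specimen code
        (2, "ICD9:174.9", "hospital"),
        (2, "PATH:HER2_NEG", "pathology"),
        (3, "ICD9:162.9", "hospital"),      # lung cancer diagnosis
        (3, "BIOBANK:DNA", "biobank"),
    ]
    conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", facts)
    return conn


def patients_with_all(conn, concepts):
    """Anonymized patient numbers having every (distinct) concept code in `concepts`."""
    placeholders = ", ".join("?" for _ in concepts)
    sql = (
        f"SELECT patient_num FROM observation_fact WHERE concept_cd IN ({placeholders}) "
        "GROUP BY patient_num HAVING COUNT(DISTINCT concept_cd) = ? ORDER BY patient_num"
    )
    return [row[0] for row in conn.execute(sql, (*concepts, len(concepts)))]


if __name__ == "__main__":
    conn = build_demo_db()
    # Breast cancer patients with a HER2-positive result AND banked DNA
    print(patients_with_all(conn, ["ICD9:174.9", "PATH:HER2_POS", "BIOBANK:DNA"]))  # [1]
```

Storing every source’s facts in one long table is what lets a single query combine hospital, pathology, and biobank information without a separate join for each source.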

The research infrastructure will be completed by the development of a new set of components designed to enhance the ability of an i2b2 hive to use data generated by NGS technology, providing a mechanism to apply custom genomic annotations. The translational tool created at FSM is a concrete example of how integrating different information from heterogeneous sources could bring scientific research closer to understanding the nature of disease itself and to creating novel diagnostics through handy interfaces.


All authors have declared no conflicts of interest.

NCI has undertaken a similar effort under the Recovery Act (the full text of the latest report is taken from their website, http://www.cancer.gov/aboutnci/recovery/recoveryfunding/investmentreports/bioinformatics):

Cancer Bioinformatics: Recovery Act Investment Report

November 2009

Public Health Burden of Cancer

Cancer is the second leading cause of death in the United States after heart disease. In 2009, it is estimated that nearly 1.5 million new cases of invasive cancer will be diagnosed in this country and more than 560,000 people will die of the disease.


Cancer Bioinformatics Program Overview

Over the past five years, NCI’s Center for Biomedical Informatics and Information Technology (CBIIT) has led the effort to develop and deploy the cancer Biomedical Informatics Grid® (caBIG) in partnership with the broader cancer community.  The caBIG network is designed to enable the integration and exchange of data among researchers in the laboratory and the clinic, simplify collaboration, and realize the potential of information-based (personalized) medicine in improving patient outcomes. caBIG has connected major components of the cancer community, including NCI-designated Cancer Centers, participating institutions of the NCI Community Cancer Centers Program (NCCCP), and numerous large-scale scientific endeavors, as well as basic, translational, and clinical researchers at public and private institutions across the United States and around the world.  Beyond cancer research, caBIG capabilities—infrastructure, standards, and tools—provide a prototype for linking other disease communities and catalyzing a new 21st-century biomedical ecosystem that unifies research and care. ARRA funding will allow NCI to accelerate the ongoing development of the Cancer Knowledge Cloud and Oncology Electronic Health Records (EHRs) initiatives, thereby providing for continued job creation in the areas of biomedical informatics development and application as well as healthcare delivery.

The caBIG Cancer Knowledge Cloud: Extending the Research Infrastructure

The Cancer Knowledge Cloud is a virtual biomedical capability that utilizes caBIG tools, infrastructure, and security frameworks to integrate distributed individual and organizational data, software applications, and computational capacity throughout the broad cancer research and treatment community. The Cancer Knowledge Cloud connects, integrates, and facilitates sharing of the diverse primary data generated through basic and clinical research and care delivery to enable personalized medicine. The cloud includes information generated through large-scale research projects such as The Cancer Genome Atlas (TCGA), the cancer Human Biobank (caHUB) tissue acquisition network, the NCI Functional Biology Consortium, the NCI Patient Characterization Center, and the NCI Preclinical Development Pipeline, academic and industry counterparts to these projects, and clinical observations (from entities such as the NCCCP) captured in oncology-extended Electronic Health Records.  Through the use of the caBIG Data Sharing and Security Framework, the Cloud will support appropriate sharing of information, supporting in silico hypothesis generation and testing, and enabling a learning healthcare system.

A caBIG-Based Rapid-Learning Healthcare System: Incorporating Oncology-Extended Electronic Healthcare Records (EHRs)

The 21st-century Cancer Knowledge Cloud will connect individuals, organizations, institutions, and their associated information within an information technology-enabled cycle of discovery, development, and clinical care—the paradigm of a rapid-learning healthcare system. This will transform these disconnected sectors into a system that is personalized, preventive, pre-emptive, and patient-participatory.  To be realized, this model requires the adoption of standards-based EHRs. Presently, however, no certified oncology-based EHR exists, and fewer than 3 percent of oncologists with outpatient-based practices utilize EHRs. caBIG has recently established a collaboration with the American Society of Clinical Oncology (ASCO) to develop an oncology-specific EHR (caEHR) specification based on open standards already in use in the oncology community that will utilize caBIG standards for interoperability. NCI will implement an open-source version of this specification to validate the specification and to provide a free alternative to sites that choose not to purchase a commercial system. The launch customer for the caEHR will be NCCCP participating sites. NCI will work with appropriate entities to provide a mechanism for certifying that caEHR implementations are consistent with the NCI/ASCO specification.

Barts Cancer Institute has its own clinical bioinformatics program to support its clinical efforts:

Clinical Bioinformatics Program in Oncology at Barts Cancer Institute at Barts and the London School of Medicine




Why we focus on Cancer Bioinformatics

Bioinformatics is a new interdisciplinary area involving biological, statistical and computational sciences. Bioinformatics will enable cancer researchers not only to manage, analyze, mine and understand the currently accumulated, valuable, high-throughput data, but also to integrate these in their current research programs. The need for bioinformatics will become ever more important as new technologies increase the already exponential rate at which cancer data are generated.

What we do

  • We work alongside clinical and basic scientists to support the cancer projects within BCI.  This is an ideal partnership between scientific experts, who know the research questions that will be relevant from a cancer biologist or clinician’s perspective, and bioinformatics experts, who know how to develop the proposed methods to provide answers.
  • We also conduct independent bioinformatics research, focusing on the development of computational and integrative methods, algorithms, databases and tools to tackle the analysis of the high volumes of cancer data.
  • We also are actively involved in the development of bioinformatics educational courses at BCI. Our courses offer a unique opportunity for biologists to gain a basic understanding in the use of bioinformatics methods to access and harness large complicated high-throughput data and uncover meaningful information that could be used to understand molecular mechanisms and develop novel targeted therapeutics/diagnostic tools.

Developing Criteria for Genomic Profiling in Lung Cancer:

A Report from U.S. Cancer Centers

In a report by Pao et al., a group of clinicians describe a meeting they organized to standardize protocols for integrating microarray and genomic data from lung cancer patients into the clinical setting.[1] There is ample evidence that adenocarcinomas can be classified into “clinically relevant molecular subsets” based on distinct genomic changes. For example, EGFR (epidermal growth factor receptor) exon 19 deletions and exon 21 point mutations predict sensitivity to tyrosine kinase inhibitors (TKIs) such as gefitinib, whereas exon 20 insertions predict primary resistance[2].

However, as the authors note, “mutational profiling has not been widely accepted or adopted into practice in thoracic oncology”.  
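The three EGFR associations described by Pao et al. can be written as a small lookup table, which is the simplest form a mutation-to-therapy rule in a profiling workflow could take. The sketch below encodes only those three rules; the (exon, alteration type) key format and the function name are hypothetical conventions, and any alteration outside the listed classes is reported as not covered rather than guessed.

```python
# Only the three associations stated in the text; keys are (exon, alteration type).
EGFR_TKI_RULES = {
    (19, "deletion"): "predicted sensitive to EGFR TKIs",
    (21, "point mutation"): "predicted sensitive to EGFR TKIs",
    (20, "insertion"): "predicted primary resistance to EGFR TKIs",
}


def egfr_tki_prediction(exon: int, alteration: str) -> str:
    """Look up the predicted TKI response for an EGFR alteration class."""
    return EGFR_TKI_RULES.get((exon, alteration), "not covered by these rules")


if __name__ == "__main__":
    for alt in [(19, "deletion"), (21, "point mutation"), (20, "insertion"), (18, "point mutation")]:
        print(alt, "->", egfr_tki_prediction(*alt))
```

Keeping the rules as data rather than branching code makes it easy to review them against the literature and to update them as new associations are validated.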

Therefore, a multi-institutional workshop was held in 2009 among participants from the Massachusetts General Hospital (MGH) Cancer Center, Memorial Sloan-Kettering Cancer Center (MSKCC), the Dana-Farber/Brigham and Women’s Cancer Center (DF/BWCC), the M.D. Anderson Cancer Center (MDACC), and the Vanderbilt-Ingram Cancer Center (VICC) to discuss their institutions’ molecular profiling programs, with emphasis on:

• Organization/workflow

• Mutation detection technologies

• Clinical protocols and reporting

• Patient consent

In addition to the aforementioned challenges, the panel discussed further issues for developing improved science-driven criteria for determining targeted therapies including:

1) Including pathologists in criteria development, as pathology departments are usually the main repositories for specimens

2) Developing integrated informatics systems

3) Standardizing new target validation methodology across cancer centers


1. Pao W, Kris MG, Iafrate AJ, Ladanyi M, Jänne PA, Wistuba II, Miake-Lye R, Herbst RS, Carbone DP, Johnson BE, et al. Integration of molecular profiling into the lung cancer clinic. Clin Cancer Res. 2009;15(17):5317–5322.

2. Wu JY, Wu SG, Yang CH, Gow CH, Chang YL, Yu CJ, Shih JY, Yang PC. Lung cancer with epidermal growth factor receptor exon 20 mutations is associated with poor gefitinib treatment response. Clin Cancer Res. 2008;14(15):4877–4882.

