Posts Tagged ‘Genome Institute’

Searchable Genome for Drug Development

Reporter: Aviva Lev-Ari, PhD, RN

The Druggable Genome Is Now Googleable

By Aaron Krol

November 22, 2013 | Relationships between human genetic variation and drug responses are being documented at an accelerating rate, and have become some of the most promising avenues of research for understanding the molecular pathways of diseases and pharmaceuticals alike. Drug-gene interactions are a cornerstone of personalized medicine, and learning about the drugs that mediate gene expression can point the way toward new therapeutics with more targeted effects, or novel disease targets for existing drugs. So it may seem surprising that, until October of this year, a researcher interested in pharmacogenetics generally needed the help of a dedicated bioinformatician just to access the known background on a gene’s drug associations.

Obi and Malachi Griffith are particularly dedicated bioinformaticians, who specialize in applying data analytics to cancer research, a rich field for drug-gene information. Like many professionals in their budding field, the Griffiths pursued doctoral research in bioinformatics applications at a time when this was not quite recognized as a distinct discipline, and quickly found their data-mining talents in hot demand. “We found ourselves answering the same questions over and over again,” says Malachi. “A clinician or researcher, who perhaps wasn’t a bioinformatician, would have a list of genes, and would ask, ‘Well, which of these genes are kinases? Which of these genes has a known drug or is potentially druggable?’ And we would spend time writing custom scripts and doing ad hoc analyses, and eventually decided that you really shouldn’t need a bioinformatics expert to answer this question for you.”

The Griffiths – identical twin brothers, though Malachi helpfully sports a beard – had by this time joined each other at one of the world’s premier genomic research centers, the Genome Institute at Washington University in St. Louis, and figured they had the resources to improve this state of affairs. The Genome Institute is generously funded by the NIH and was a major contributor to the Human Genome Project; the Griffiths had congregated there deliberately after completing post-doctoral fellowships at the Lawrence Berkeley National Laboratory in California (Obi) and the Michael Smith Genome Sciences Centre in Vancouver (Malachi). “When we finished our PhDs, we knew we would like to set up a lab together,” says Obi. At the Genome Institute, they pitched the idea of building a free, searchable online database of drug-gene associations, and soon the Drug Gene Interaction Database (DGIdb) was under development.

In Search of the Druggable Genome

Existing public databases, like DrugBank, the Therapeutic Target Database, and PharmGKB, were the first ports of call, where a wealth of information was waiting to be re-aggregated in a searchable format. “For their use cases [these databases] are quite powerful,” says Obi. “They were just missing that final component, which is user accessibility for the non-informatics expert.” Getting all this data into DGIdb was and remains the most labor-intensive part of the project. At least two steps removed from the original sources establishing each interaction, the Griffiths felt they had to reexamine each data point, tracing it back to publication and scrutinizing its reliability. “It’s sort of become a rite of passage in our group,” says Malachi. “When new people join the lab, they have to really dig into this resource, learn what it’s all about, and then contribute some of their time toward manual curation.”

The website’s main innovation, however, is its user interface, which presents itself like Google but returns results a little more like a good medical records system. The homepage lets you enter a gene or panel of genes into a search box, and if desired, add a few basic filters. Entering search terms brings up a chart that quickly summarizes any known drug interactions, which can then be further filtered or tracked back to the original sources. The emphasis is not on a detailed breakdown of publications or molecular behavior, but on immediately viewing which drugs affect a given gene’s expression and how. “We did try to place quite a bit of emphasis on creating something that was intuitive and easy to use,” says Malachi. Beta testing involved watching unfamiliar users navigate the website and taking notes on how they interacted with the platform.
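The kind of lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DGIdb's code or API: the table `INTERACTIONS`, the function `find_interactions`, and the source labels are hypothetical names made up for this example, and the four drug-gene pairs are simply well-known targeted cancer therapies.

```python
# Hypothetical in-memory table of (gene, drug, interaction type, source label).
# The drug-gene pairs are well-known targeted therapies; the source labels
# are placeholders and do not refer to any real database.
INTERACTIONS = [
    ("EGFR", "erlotinib", "inhibitor", "source_a"),
    ("ERBB2", "trastuzumab", "antibody", "source_a"),
    ("BRAF", "vemurafenib", "inhibitor", "source_b"),
    ("ABL1", "imatinib", "inhibitor", "source_b"),
]


def find_interactions(genes, interaction_type=None):
    """Return {gene: [(drug, type, source), ...]} for every gene in the panel.

    Genes with no known interaction map to an empty list, so the caller can
    see at a glance which genes in the panel have no recorded drug.
    """
    # Normalize the panel: trim whitespace, upper-case, drop empty entries.
    wanted = {g.strip().upper() for g in genes if g.strip()}
    results = {gene: [] for gene in sorted(wanted)}
    for gene, drug, kind, source in INTERACTIONS:
        # Optional filter: keep only the requested interaction type.
        if gene in wanted and interaction_type in (None, kind):
            results[gene].append((drug, kind, source))
    return results


if __name__ == "__main__":
    panel = ["egfr", "BRAF", "TP53"]
    for gene, hits in find_interactions(panel).items():
        drugs = ", ".join(drug for drug, _, _ in hits) or "no known interactions"
        print(f"{gene}: {drugs}")
```

Each hit keeps its source label, so a result can be traced back to where it came from, much as the article describes for the real database.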

DGIdb went live in February of this year, followed by a publication in Nature Methods this October, and the database is now readily accessible at http://dgidb.org/. The code is open source and can be modified for any specific use case, using the Perl, Ruby, Shell, or Python programming languages, and the Genome Institute has also made available their internal API for users who want to run documents through the database automatically, or perform more sophisticated search functions. User response will be key to sustaining and expanding the project, and the Griffiths are looking forward to an update that draws on outside researchers’ knowledge. “A lot of this information [on drug-gene interactions] really resides in the minds of experts,” says Malachi, “and isn’t in a form that we can easily aggregate it from… We’re really motivated to have a crowdsourcing element, so that we can start to harness all of that information.” In the meantime, the bright orange “Feedback” button on every page of the site is being bombarded with requests to add specific interactions to the database.

Not all these interactions are easy to validate. “Another area that we’re really actively trying to pursue,” adds Malachi, “is getting information out of sources where text mining is required, where information is really not in a form where the interaction between genes and drugs is laid out quickly.” He cites the example of clinicaltrials.gov, where the results of all registered clinical trials in the United States are made available online. This surely includes untapped material on drug-gene interactions, but nowhere are those results neatly summarized. “You either have a huge manual curation problem on your hands – there’s literally hundreds of thousands of clinical trial records – or you have to come up with some kind of machine learning, text-mining approach.” So far, the Genome Institute has been limited to manual curation for this kind of scenario, but with a resource as large as the clinical trials registry, the Griffiths hope to bring their programming savvy to bear on a more efficient attack.

In the meantime, new resources are continuously being brought into the database, rising from eleven data sources on launch to sixteen now, with more in the curation pipeline. DGIdb is already regularly incorporated in the Genome Institute’s research. Every cancer patient sequenced at Washington University has her genetic data run first through an analytics pipeline to find genes with unusual variants or levels of expression, and then through DGIdb to see whether any of these genes are known to be druggable. This is an ideal use case for the database, which is presently biased toward cancer-related interactions, the Griffiths’ own area of research.

The twins have a personal investment in advancing cancer therapeutics. Their mother died in her forties from an aggressive case of breast cancer, while Obi and Malachi were still in high school, and their family has continued to suffer disproportionately from cancer ever since. Says Obi, “We’ve had the opportunity to see [everything from] terrible, tragic outcomes… to the other end of the spectrum, where advances in the way cancer is treated were able to really make a huge difference to both our cousin and our brother,” both in remission after life-threatening cases of childhood leukemia and Ewing’s sarcoma, respectively. “Everyone can tell these stories,” Malachi adds, “but we’ve had a little more than our fair share.”

DGIdb can’t influence cancer care directly – most of the data available on drug-gene interactions is too tentative for clinical use – but it can spur research into more personalized treatments for genetically distinct cancers, and increasingly for other diseases as more information is brought inside. Meanwhile, companies like Foundation Medicine and MolecularHealth are drawing on similar drug-gene datasets, narrowed down to the most actionable information, to tailor clinical action to individual cancer patients. The Griffiths are cautiously optimistic that research like the Genome Institute’s is approaching the crucial tipping point where finely tuned clinical decisions could be made based on a patient’s genetic profile. “We’re still firmly on the academic research side,” says Malachi, but “we’re definitely at the stage where this idea needs to be pursued aggressively.”



Reporter: Aviva Lev-Ari, PhD, RN

Genetic Basis of Complex Human Diseases: Dan Koboldt’s Advice to Next-Generation Sequencing Neophytes

Word Cloud by Daniel Menzin

UPDATED 3/27/2013

The Exome is Not Enough

March 27, 2013

Dan Koboldt at MassGenomics explains why exome sequencing often fails to identify causal variants, even in Mendelian disorders — “the very plausible possibility that a noncoding functional variant is responsible.”

Koboldt, the analysis manager in the human genetics group at the Genome Institute at Washington University, says that researchers shouldn’t overlook the importance of noncoding functional variants, which require a suite of technologies to detect, including RNA-seq, ChIP-seq, DNase sequencing and footprinting, bisulfite sequencing, and chromosome conformation capture.

“These types of experiments generate a wealth of data about regulatory activity in genomes,” he says. “While studying each of these independently is certainly informative, integrative analysis will be required to elucidate how all of these different regulatory mechanisms work together.”

While this effort will require “robust statistical models, substantial computing resources, and productive collaboration among research groups,” the end result “will be a far more complete understanding of how the genome works,” he says.


Dan Koboldt works as a staff scientist in the Human Genetics group of the Genome Institute at Washington University in St. Louis. There, he works with scientists, physicians, programmers, and data analysts to understand the genetic basis of complex human diseases such as cancer, vision disorders, and metabolic syndromes through next-gen sequencing analysis. He received bachelor’s degrees in Computer Science and French from the University of Missouri-Columbia, and a master’s degree in Biology from Washington University.

Dan has worked in the field of human genetics since 2003, when he joined the lab of Raymond E. Miller, which played a role in the International HapMap Project and later the genetic map of C. briggsae, a model organism related to C. elegans.

Disclaimer: The views expressed on this site, including blog posts and static pages, do not necessarily reflect the opinions of the Genome Institute at Washington University, the Washington University School of Medicine, or Washington University in St. Louis.

Before diving in with both feet, next-generation sequencing neophytes might want to take a gander at a post by Dan Koboldt at MassGenomics where he describes his 10 commandments for good next-gen sequencing.

In his post, Koboldt breaks up his instructions into four categories: analysis, publications, data sharing and submissions, and research ethics and cost.

His list includes some oft-repeated warnings. For example, he cautions against reinventing the wheel when it comes to developing analysis software and, for pity’s sake, against inventing any more words that end in “ome” or “omics.”

Some other no-no’s, according to Koboldt, include publishing results before they’ve been vetted properly, testing new methods on simulated data only, and taking “unfair advantage of submitted data.”

He also admonishes newcomers to think a little bit about the cost of analysis, without which “your sequencing data, your $1,000 genome, is about as useful as a chocolate teapot,” and to have a care for the privacy of their study participants’ samples and data.

Ten Commandments for Next-Gen Sequencing

Just as the reach of next-generation sequencing has continued to grow, in both research and clinical realms, so too has the community of NGS users. Some have been around since the early days. The days of 454 and Solexa sequencing. Since then, the field has matured at an astonishing pace. Many standards were established to help everyone make sense of this flood of data. The recent democratization of sequencing has made next-gen sequencing available to just about anyone.

And yet, there have been growing pains. With great power comes great responsibility. To help some of the newcomers into the field, I’ve drafted these ten commandments for next-gen sequencing.

NGS Analysis

1. Thou shalt not reinvent the wheel. In spite of rapid technological advances, NGS is not a new field. Most of the current “workhorse” technologies have been on the market for a couple of years or more. As such, we have a plethora of short read aligners, de novo assemblers, variant callers, and other tools already. Even so, there is a great temptation for bioinformaticians to write their own “custom scripts” to perform these tasks. There’s a new “Applications Note” every day with some tool that claims to do something new or better.

Can you really write an aligner that’s better than BWA? More importantly, do we need one? Unless you have some compelling reason to develop something new (as we did when we developed SomaticSniper and VarScan), take advantage of what’s already out there.

2. Thou shalt not coin any new term ending with “ome” or “omics”. We have enough of these already, to the point where it’s getting ridiculous. Genome, transcriptome, and proteome are obvious applications of this nomenclature. Epigenome, sure. But the metabolome, interactome, and various other “ome” words are starting to detract from the naming system. The ones we need have already been coined. Don’t give in to the temptation.

3. Thou shalt follow thy field’s conventions for jargon. Technical terms, acronyms, and abbreviations are inherent to research. We need them both for precision and brevity. Where we get into trouble is when people feel the need to create their own acronyms when a suitable one already exists. Is there a significant difference between next-generation sequencing (NGS), high-throughput sequencing (HTS), and massively parallel sequencing (MPS)?

Widely accepted terms provide something of a standard, and they should be used whenever possible. Insertion/deletion variants are indels, not InDels, INDELs, or DIPs. Structural variants are SVs, not SVars or GVs. We don’t need any more acronyms!

NGS Publications

These commandments address behaviors that get on my nerves, both as a blogger and a peer reviewer.

4. Thou shalt not publish by press release. This is a disturbing trend that seems to happen more and more frequently in our field: the announcement of “discoveries” before they have been accepted for publication. Peer review is the required vetting process for scientific research. Yes, it takes time and yes, your competitors are probably on the verge of the same discovery. That doesn’t mean you get to skip ahead and claim credit by putting out a press release.

There are already examples of how this can come back to bite you. When the reviewers trash your manuscript, or (gasp) you learn that a mistake was made, it looks bad. It reflects poorly on the researchers and the institution, both in the field and in the eyes of the public.

5. Thou shalt not rely only on simulated data. Often when I read a paper on a new method or algorithm, they showcase it using simulated data. This often serves a noble purpose, such as knowing the “correct” answer and demonstrating that your approach can find it. Even so, you’d better apply it to some real data too. Simulations simply can’t replicate the true randomness of nature and the crap-that-can-go-wrong reality of next-gen sequencing. There’s plenty of freely available data out there; go get some of it.

6. Thou shalt obtain enough samples. One consequence of the rapid growth of our field (and accompanying drop in sequencing costs) is that small sample numbers no longer impress anyone. They don’t impress me, and they certainly don’t impress the statisticians upstairs. The novelty of exome or even whole-genome sequencing has long worn off. Now, high-profile studies must back their findings with statistically significant results, and that usually means finding a cohort of hundreds (or thousands) of patients with which to extend your findings.

This new reality may not be entirely bad news, because it surely will foster collaboration between groups that might otherwise not be able to publish individually.
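To put rough numbers on "enough samples," here is a back-of-the-envelope sketch using the textbook normal-approximation formula for comparing two proportions, such as the frequency of a variant in cases versus controls. It is an illustration only, not a method from any particular study: the function name `samples_per_group` and the example frequencies are assumptions, and a real study design would need a statistician's input (multiple-testing correction in particular).

```python
import math
from statistics import NormalDist


def samples_per_group(p_cases, p_controls, alpha=0.05, power=0.8):
    """Approximate per-group sample size needed to detect a difference
    between two proportions with a two-sided test.

    Uses the normal approximation with unpooled variance:
        n = (z_{1-alpha/2} + z_{power})^2
            * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    if p_cases == p_controls:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired power
    variance = p_cases * (1 - p_cases) + p_controls * (1 - p_controls)
    n = (z_alpha + z_power) ** 2 * variance / (p_cases - p_controls) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical scenario: a variant carried by 10% of cases and 5% of controls.
    print(samples_per_group(0.10, 0.05))
    # Same effect at a stringent genome-wide threshold needs far more samples.
    print(samples_per_group(0.10, 0.05, alpha=5e-8))
```

Even this modest hypothetical effect calls for several hundred samples per group at a conventional threshold, and considerably more once the threshold is tightened for genome-wide testing.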

Data Sharing and Submissions

7. Thou shalt withhold no data. With some exceptions, sequencing datasets are meant to be shared. Certain institutions, such as large-scale sequencing centers in the U.S., are mandated by their funding agencies to deposit data generated using public funds on a timely basis following its generation. Since the usual deposition site is dbGaP, this means that IRB approvals and dbGaP certification letters must be in hand before sequencing can begin.

Any researchers who plan to publish their findings based on sequencing datasets will have to submit them to public datasets before publication. This is not optional. It is not “something we should do when we get around to it after the paper goes out.” It is required to reproduce the work, so it should really be done before a manuscript is submitted. Consider this excerpt from Nature’s publication guidelines:

Data sets must be made freely available to readers from the date of publication, and must be provided to editors and peer-reviewers at submission, for the purposes of evaluating the manuscript.

For the following types of data set, submission to a community-endorsed, public repository is mandatory. Accession numbers must be provided in the paper.

The policies go on to list various types of sequencing data:

  • DNA and RNA sequences
  • DNA sequencing data (traces for capillary electrophoresis and short reads for next-generation sequencing)
  • Deep sequencing data
  • Epitopes, functional domains, genetic markers, or haplotypes.

Every journal should have a similar policy; most top-tier journals already do. Editors and referees need to enforce this submission requirement by rejecting any manuscripts that do not include the submission accession numbers.

8. Thou shalt not take unfair advantage of submitted data. Many investigators are concerned about data sharing (especially when mandated upon generation, not publication) from fear of being scooped. This is a valid concern. When you submit your data to a public repository, others can find it and (if they meet the requirements) use it. Personally, I think most of these fears are not justified — I mean, have you ever tried to get data out of dbGaP? The time it takes for someone to find, request, obtain, and use submitted data should allow the producers of the data to write it up.

Large-scale efforts to which substantial resources have been devoted — such as the Cancer Genome Atlas — have additional safeguards in place. Their data use policy states that, for a given cancer type, submitted data can’t be used until the “marker paper” has been published. This is a good rule of thumb for the NGS community, and something that journal editors (and referees) haven’t always enforced.

Just because you can scoop someone doesn’t mean that you should. It’s not only bad karma, but bad for your reputation. Scientists have long memories. They will likely review your manuscript or grant proposal sometime in the future. When that happens, you want to be the person who took the high road.

Research Ethics and Cost

9. Thou shalt not discount the cost of analysis. It’s true that since the advent of NGS technology, the cost of sequencing has plummeted. The cost of analysis, however, has not. And making sense of genomic data — alignment, quality control, variant calling, annotation, interpretation — is a daunting task indeed. It takes computational resources as well as expertise. This infrastructure is not free; in fact, it can be more expensive than the sequencing itself. 

Without analysis, your sequencing data, your $1,000 genome, is about as useful as a chocolate teapot.

10. Thou shalt honor thy patients and their samples. Earlier this month, I wrote about how supposedly anonymous individuals from the CEPH collection were identified using a combination of genetic markers and online databases. It is a simple fact that we can no longer guarantee a sequenced sample’s anonymity. That simple fact, combined with our growing ability to interpret the possible consequences of an individual genome, means a great deal of risk for study volunteers.

We must safeguard the privacy of study participants — and find ways to protect them from privacy violations and/or discrimination — if we want their continued cooperation.

This means obtaining good consent documents and ensuring that they’re all correct before sequencing begins. It also means adhering to the data use policies those consents specify. As I’ve written before, samples are the new commodity in our field. Anyone can rent time on a sequencer. If you don’t make an effort to treat your samples right, someone else will.


Dan Koboldt’s Publications

Bose R, Kavuri SM, Searleman AC, Shen W, Shen D, Koboldt DC, Monsey J, Goel N, Aronson AB, Li S, Ma CX, Ding L, Mardis ER, & Ellis MJ (2013). Activating HER2 mutations in HER2 gene amplification negative breast cancer. Cancer discovery PMID: 23220880

The 1000 Genomes Project Consortium (2012). An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65. DOI: 10.1038/nature11632

Cancer Genome Atlas Network (2012). Comprehensive molecular portraits of human breast tumours. Nature, 490 (7418), 61-70 PMID: 23000897

Ellis MJ, Ding L, Shen D, Luo J, Suman VJ, Wallis JW, Van Tine BA, Hoog J, Goiffon RJ, Goldstein TC, Ng S, Lin L, Crowder R, Snider J, Ballman K, Weber J, Chen K, Koboldt DC, Kandoth C, Schierding WS, McMichael JF, Miller CA, Lu C, Harris CC, McLellan MD, Wendl MC, DeSchryver K, Allred DC, Esserman L, Unzeitig G, Margenthaler J, Babiera GV, Marcom PK, Guenther JM, Leitch M, Hunt K, Olson J, Tao Y, Maher CA, Fulton LL, Fulton RS, Harrison M, Oberkfell B, Du F, Demeter R, Vickery TL, Elhammali A, Piwnica-Worms H, McDonald S, Watson M, Dooling DJ, Ota D, Chang LW, Bose R, Ley TJ, Piwnica-Worms D, Stuart JM, Wilson RK, & Mardis ER (2012). Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature, 486 (7403), 353-60 PMID: 22722193

Welch JS, Ley TJ, Link DC, Miller CA, Larson DE, Koboldt DC, Wartman LD, Lamprecht TL, Liu F, Xia J, Kandoth C, Fulton RS, McLellan MD, Dooling DJ, Wallis JW, Chen K, Harris CC, Schmidt HK, Kalicki-Veizer JM, Lu C, Zhang Q, Lin L, O’Laughlin MD, McMichael JF, Delehaunty KD, Fulton LA, Magrini VJ, McGrath SD, Demeter RT, Vickery TL, Hundal J, Cook LL, Swift GW, Reed JP, Alldredge PA, Wylie TN, Walker JR, Watson MA, Heath SE, Shannon WD, Varghese N, Nagarajan R, Payton JE, Baty JD, Kulkarni S, Klco JM, Tomasson MH, Westervelt P, Walter MJ, Graubert TA, DiPersio JF, Ding L, Mardis ER, & Wilson RK (2012). The origin and evolution of mutations in acute myeloid leukemia. Cell, 150 (2), 264-78 PMID: 22817890

Cancer Genome Atlas Network (2012). Comprehensive molecular characterization of human colon and rectal cancer. Nature, 487(7407), 330-7 PMID: 22810696

Dees ND, Zhang Q, Kandoth C, Wendl MC, Schierding W, Koboldt DC, Mooney TB, Callaway MB, Dooling D, Mardis ER, Wilson RK, & Ding L (2012). MuSiC: identifying mutational significance in cancer genomes. Genome research, 22 (8), 1589-98 PMID: 22759861

Walter MJ, Shen D, Ding L, Shao J, Koboldt DC, Chen K, Larson DE, McLellan MD, Dooling D, Abbott R, Fulton R, Magrini V, Schmidt H, Kalicki-Veizer J, O’Laughlin M, Fan X, Grillot M, Witowski S, Heath S, Frater JL, Eades W, Tomasson M, Westervelt P, DiPersio JF, Link DC, Mardis ER, Ley TJ, Wilson RK, & Graubert TA (2012). Clonal architecture of secondary acute myeloid leukemia. The New England journal of medicine, 366(12), 1090-8 PMID: 22417201

Matsushita H, Vesely MD, Koboldt DC, Rickert CG, Uppaluri R, Magrini VJ, Arthur CD, White JM, Chen YS, Shea LK, Hundal J, Wendl MC, Demeter R, Wylie T, Allison JP, Smyth MJ, Old LJ, Mardis ER, & Schreiber RD (2012). Cancer exome analysis reveals a T-cell-dependent mechanism of cancer immunoediting. Nature, 482 (7385), 400-4 PMID: 22318521

Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, & Wilson RK (2012). VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research PMID: 22300766

Koboldt DC, Larson DE, Chen K, Ding L, & Wilson RK (2012). Massively parallel sequencing approaches for characterization of structural variation. Methods in molecular biology (Clifton, N.J.), 838, 369-84 PMID: 22228022

Graubert TA, Shen D, Ding L, Okeyo-Owuor T, Lunn CL, Shao J, Krysiak K, Harris CC, Koboldt DC, Larson DE, McLellan MD, Dooling DJ, Abbott RM, Fulton RS, Schmidt H, Kalicki-Veizer J, O’Laughlin M, Grillot M, Baty J, Heath S, Frater JL, Nasim T, Link DC, Tomasson MH, Westervelt P, DiPersio JF, Mardis ER, Ley TJ, Wilson RK, & Walter MJ (2011). Recurrent mutations in the U2AF1 splicing factor in myelodysplastic syndromes. Nature genetics, 44 (1), 53-7 PMID: 22158538

Larson DE, Harris CC, Chen K, Koboldt DC, Abbott TE, Dooling DJ, Ley TJ, Mardis ER, Wilson RK, & Ding L. (2011). SomaticSniper: Identification of Somatic Point Mutations in Whole Genome Sequencing Data. Bioinformatics, Online : doi: 10.1093/bioinformatics/btr665

Cancer Genome Atlas Research Network (2011). Integrated genomic analyses of ovarian carcinoma. Nature, 474 (7353), 609-15 PMID: 21720365

Marth GT, Yu F, Indap AR, Garimella K, et al & the 1000 Genomes Project (2011). The functional spectrum of low-frequency coding variation. Genome biology, 12 (9) PMID: 21917140

Ross JA, Koboldt DC, Staisch JE, Chamberlin HM, Gupta BP, Miller RD, Baird SE, & Haag ES (2011). Caenorhabditis briggsae recombinant inbred line genotypes reveal inter-strain incompatibility and the evolution of recombination. PLoS genetics, 7 (7) PMID: 21779179

Bowne SJ, Humphries MM, Sullivan LS, Kenna PF, Tam LC, Kiang AS, Campbell M, Weinstock GM, Koboldt DC, Ding L, Fulton RS, Sodergren EJ, et al (2011). A dominant mutation in RPE65 identified by whole-exome sequencing causes retinitis pigmentosa with choroidal involvement. European journal of human genetics : EJHG, 19 (10) PMID:21938004

Link DC, Schuettpelz LG, Shen D, Wang J, Walter MJ, Kulkarni S, Payton JE, Ivanovich J, Goodfellow PJ, Le Beau M, Koboldt DC, Dooling DJ, Fulton RS, et al (2011). Identification of a novel TP53 cancer susceptibility mutation through whole-genome sequencing of a patient with therapy-related AML. JAMA : the journal of the American Medical Association, 305 (15), 1568-76 PMID: 21505135

Ley T, Ding L, Walter M, McLellan M, Lamprecht T, Larson D, Kandoth C, Payton J, Baty J, Welch J, Harris C, Lichti C, Townsend R, Fulton R, Dooling D, Koboldt D, et al. (2010). DNMT3A Mutations in Acute Myeloid Leukemia. New England Journal of Medicine DOI: 10.1056/NEJMoa1005143

Ding L, Wendl MC, Koboldt DC, & Mardis ER (2010). Analysis of next-generation genomic data in cancer: accomplishments and challenges. Human Molecular Genetics, 19 (R2):R188-96. PMID: 20843826

Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, Tsalenko A, Sampas N, Bruhn L, Shendure J, 1000 Genomes Project, & Eichler EE (2010). Diversity of human copy number variation and multicopy genes. Science (New York, N.Y.), 330 (6004), 641-6 PMID: 21030649

The 1000 Genomes Project Consortium (2010). A map of human genome variation from population-scale sequencing. Nature, 467(7319), 1061-1073 DOI: 10.1038/nature09534

Bowne SJ, Sullivan LS, Koboldt DC, Ding L, Fulton R, Abbott RM, Sodergren EJ, Birch DG, Wheaton DH, Heckenlively JR, Liu Q, Pierce EA, Weinstock GM, & Daiger SP (2010). Identification of Disease-Causing Mutations in Autosomal Dominant Retinitis Pigmentosa (adRP) Using Next-Generation DNA Sequencing. Investigative ophthalmology & visual science PMID: 20861475

Fehniger, T., Wylie, T., Germino, E., Leong, J., Magrini, V., Koul, S., Keppel, C., Schneider, S., Koboldt, D., Sullivan, R., Heinz, M., Crosby, S., Nagarajan, R., Ramsingh, G., Link, D., Ley, T., & Mardis, E. (2010). Next-generation sequencing identifies the natural killer cell microRNA transcriptome. Genome Research DOI: 10.1101/gr.107995.110

Ramsingh G, Koboldt DC, Trissal M, Chiappinelli KB, Wylie T, Koul S, Chang LW, Nagarajan R, Fehniger TA, Goodfellow P, Magrini V, Wilson RK, Ding L, Ley TJ, Mardis ER, & Link DC (2010). Complete characterization of the microRNAome in a patient with acute myeloid leukemia. Blood PMID: 20876853

Koboldt DC, Ding L, Mardis ER & Wilson RK. (2010). Challenges of sequencing human genomes. Briefings in Bioinformatics DOI: 10.1093/bib/bbq016

Ding L, Ellis MJ, Li S, Larson DE, Chen K, Wallis JW, Harris CC, McLellan MD, Fulton RS, Fulton LL, Abbott RM, Hoog J, Dooling DJ, Koboldt DC, et al. (2010). Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature, 464 (7291), 999-1005 PMID:20393555

Koboldt DC and Miller RD (2010). Identification of polymorphic markers for genetic mapping. Genomics: Essential Methods, In Press.

Koboldt DC, Staisch J, Thillainathan B, Haines K, Baird SE, Chamberlin HM, Haag ES, Miller RD, & Gupta BP (2010). A toolkit for rapid gene mapping in the nematode Caenorhabditis briggsae. BMC genomics, 11 (1) PMID: 20385026

Voora D, Koboldt DC, King CR, Lenzini PA, Eby CS, Porche-Sorbet R, Deych E, Crankshaw M, Milligan PE, McLeod HL, Patel SR, Cavallari LH, Ridker PM, Grice GR, Miller RD, & Gage BF (2010). A polymorphism in the VKORC1 regulator calumenin predicts higher warfarin dose requirements in African Americans. Clinical pharmacology and therapeutics, 87 (4), 445-51 PMID: 20200517

Zhang Q, Ding L, Larson DE, Koboldt DC, McLellan MD, Chen K, Shi X, Kraja A, et al (2009). CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics (Oxford, England) PMID: 20031968

Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K, Koboldt DC, et al (2009). Recurring mutations found by sequencing an acute myeloid leukemia genome. The New England journal of medicine, 361(11), 1058-66 PMID: 19657110

Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, Weinstock GM, Wilson RK, & Ding L (2009). VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics (Oxford, England), 25 (17), 2283-5 PMID: 19542151

Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, Dooling D, Dunford-Shore BH, McGrath S, Hickenbotham M, Cook L, Abbott R, Larson DE, Koboldt DC, et al (2008). DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature, 456 (7218), 66-72 PMID: 18987736

Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, et al (2008). Somatic mutations affect key pathways in lung adenocarcinoma. Nature, 455 (7216), 1069-75 PMID: 18948947

Cancer Genome Atlas Research Network (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455 (7216), 1061-8 PMID: 18772890

International HapMap Consortium (2007). A second generation human haplotype map of over 3.1 million SNPs. Nature, 449 (7164), 851-61 PMID: 17943122

Sabeti PC, Varilly P, Fry B, et al (2007). Genome-wide detection and characterization of positive selection in human populations. Nature, 449 (7164), 913-8 PMID: 17943131

Hillier LW, Miller RD, Baird SE, Chinwalla A, Fulton LA, Koboldt DC, & Waterston RH (2007). Comparison of C. elegans and C. briggsae genome sequences reveals extensive conservation of chromosome organization and synteny. PLoS biology, 5 (7) PMID: 17608563

Stanley SL Jr, Frey SE, Taillon-Miller P, Guo J, Miller RD, Koboldt DC, Elashoff M, Christensen R, Saccone NL, & Belshe RB (2007). The immunogenetics of smallpox vaccination. The Journal of infectious diseases, 196 (2), 212-9 PMID: 17570108

Koboldt DC, Miller RD, & Kwok PY (2006). Distribution of human SNPs and its effect on high-throughput genotyping. Human mutation, 27(3), 249-54 PMID: 16425292

The International HapMap Consortium (2005). A haplotype map of the human genome. Nature, 437 (7063), 1299-1320 PMID: 16255080

Miller RD, Phillips MS, et al (2005). High-density single-nucleotide polymorphism maps of the human genome. Genomics, 86 (2), 117-26 PMID: 15961272

Other Writing by Dan Koboldt

Dan Koboldt is also the author of Get Your Baby to Sleep, a resource to help new parents whose baby won’t sleep with advice on establishing healthy baby sleep habits and handling baby sleep problems. He contributes to The Best of Twins and In Search of Whitetails blogs as well.


Other related articles on this Open Access Online Scientific Journal:

“Genome in a Bottle”: NIST’s new metrics for Clinical Human Genome Sequencing


DNA – The Next-Generation Storage Media for Digital Information


How Genome Sequencing is Revolutionizing Clinical Diagnostics


NGS Market: Trends and Development for Genotype-Phenotype Associations Research


What is the Future for Genomics in Clinical Medicine?


Genomically Guided Treatment after CLIA Approval: to be offered by Weill Cornell Precision Medicine Institute


Inaugural Genomics in Medicine – The Conference Program, 2/11-12/2013, San Francisco, CA


GSK for Personalized Medicine using Cancer Drugs needs Alacris systems biology model to determine the in silico effect of the inhibitor in its “virtual clinical trial”


arrayMap: Genomic Feature Mining of Cancer Entities of Copy Number Abnormalities (CNAs) Data


NGS Cardiovascular Diagnostics: Long-QT Genes Sequenced – A Potential Replacement for Molecular Pathology


Speeding Up Genome Analysis: MIT Algorithms for Direct Computation on Compressed Genomic Datasets


Clinical Genetics, Personalized Medicine, Molecular Diagnostics, Consumer-targeted DNA – Consumer Genetics Conference (CGC) – October 3-5, 2012, Seaport Hotel, Boston, MA


“CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics” lays the manifold multivariate systems analytical tools that has moved the science forward to a ground that ensures clinical application.

